Kubernetes CronJob Monitoring
Monitor every Kubernetes CronJob in your cluster without rebuilding images or modifying manifests beyond two annotations.
Why this is hard by default
Kubernetes' CronJob controller does the bare minimum: it tries to create a Job at the scheduled time and records what happened in the resource's status. If the cluster is under pressure, scheduled runs can be missed entirely — and unless someone watches kubectl get cj, the failure is invisible.
Adding heartbeat monitoring to existing CronJobs usually means modifying the workload image to call a ping URL on exit. That works for jobs you control but breaks down for third-party charts, large multi-namespace clusters, and any team that wants to opt in without rebuilding CI.
Install
The chart installs a mutating admission webhook (~15 MB controller) that watches for CronJob resources with the crond.io/monitor: "true" annotation and rewrites them at admission time to wrap the workload with the agent.
Opt in per CronJob
Add the opt-in annotations, including crond.io/monitor: "true", to the CronJob's metadata. That's it. On admission the webhook:
- Creates a monitor in crond.io if one with that name doesn't exist
- Injects an init container that drops the agent into a shared emptyDir
- Rewrites the workload command to crond-agent wrap -- <original command>
- Wires the ping URL and grace period via env vars
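The admission steps above can be sketched as a pure function over a CronJob manifest. This is an illustration only, not the chart's actual implementation: the agent image, volume name, mount path, and env var names are hypothetical placeholders, and monitor creation in crond.io is left out.

```python
import copy

MONITOR_ANNOTATION = "crond.io/monitor"  # annotation named in the docs

# Hypothetical placeholders; the real chart's values are not documented here.
AGENT_IMAGE = "example.invalid/crond-agent:latest"
AGENT_SOURCE = "/usr/local/bin/crond-agent"
AGENT_VOLUME = "crond-agent-bin"
AGENT_MOUNT = "/opt/crond"
PING_URL_ENV = "CROND_PING_URL"
GRACE_ENV = "CROND_GRACE_SECONDS"


def _agent_mount():
    """Build a fresh volumeMount dict so containers never share one object."""
    return {"name": AGENT_VOLUME, "mountPath": AGENT_MOUNT}


def mutate_cronjob(cronjob, ping_url, grace_seconds):
    """Return a wrapped copy of a batch/v1 CronJob dict.

    The input is returned unchanged when the CronJob has not opted in.
    """
    annotations = cronjob.get("metadata", {}).get("annotations") or {}
    if annotations.get(MONITOR_ANNOTATION) != "true":
        return cronjob

    patched = copy.deepcopy(cronjob)
    pod_spec = patched["spec"]["jobTemplate"]["spec"]["template"]["spec"]

    # Shared emptyDir that carries the agent binary to the workload.
    pod_spec.setdefault("volumes", []).append(
        {"name": AGENT_VOLUME, "emptyDir": {}}
    )

    # Init container that copies the agent into the shared volume.
    pod_spec.setdefault("initContainers", []).append({
        "name": "crond-agent-init",
        "image": AGENT_IMAGE,
        "command": ["cp", AGENT_SOURCE, f"{AGENT_MOUNT}/crond-agent"],
        "volumeMounts": [_agent_mount()],
    })

    for container in pod_spec["containers"]:
        original = container.get("command", []) + container.get("args", [])
        if not original:
            # The image ENTRYPOINT is not visible in the manifest; a real
            # webhook would need its own strategy for this case.
            continue
        # Equivalent of: crond-agent wrap -- <original command>
        container["command"] = [
            f"{AGENT_MOUNT}/crond-agent", "wrap", "--", *original
        ]
        container.pop("args", None)
        container.setdefault("volumeMounts", []).append(_agent_mount())
        container.setdefault("env", []).extend([
            {"name": PING_URL_ENV, "value": ping_url},
            {"name": GRACE_ENV, "value": str(grace_seconds)},
        ])
    return patched
```

Working on a deep copy leaves the submitted object untouched, which makes it straightforward to diff the two versions into the patch an admission response carries.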
What gets reported
- Start ping on Job container start
- Success ping with exit code 0 and execution duration
- Failure ping with exit code, duration, and last 4 KB of stderr
- Missed runs, detected automatically by crond.io when no ping arrives by the scheduled time plus the grace period
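The reporting behaviour listed above could look roughly like the following. This is a minimal sketch, not the real crond-agent: the payload field names and the `send_ping` callback are assumptions made for illustration, and `is_missed` takes the scheduled time as an input rather than parsing a cron expression.

```python
import subprocess
import sys
import time

STDERR_TAIL_BYTES = 4 * 1024  # "last 4 KB of stderr"


def wrap(command, send_ping):
    """Run command and report start, then success or failure.

    send_ping is any callable that accepts a payload dict (hypothetical
    shape). A real agent would also forward stderr to the container log
    instead of only capturing it.
    """
    send_ping({"event": "start"})
    started = time.monotonic()
    result = subprocess.run(command, stderr=subprocess.PIPE)
    duration = time.monotonic() - started

    payload = {"exit_code": result.returncode, "duration_seconds": duration}
    if result.returncode == 0:
        payload["event"] = "success"
    else:
        payload["event"] = "failure"
        tail = result.stderr[-STDERR_TAIL_BYTES:]
        payload["stderr_tail"] = tail.decode("utf-8", "replace")
    send_ping(payload)
    return result.returncode


def is_missed(scheduled_at, grace_seconds, last_ping_at, now):
    """Return True when a scheduled run should be flagged as missed.

    All times are Unix timestamps in seconds. A run counts as missed once
    the grace period after the scheduled time has passed without any ping
    at or after the scheduled time.
    """
    deadline = scheduled_at + grace_seconds
    got_ping = last_ping_at is not None and last_ping_at >= scheduled_at
    return now > deadline and not got_ping


if __name__ == "__main__":
    # Usage example: run a child process that fails and print both pings.
    wrap([sys.executable, "-c", "raise SystemExit(2)"], print)
```

Passing the sender in as a callable keeps the wrapper testable without network access; in a cluster it would post to the ping URL supplied through the environment.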
Cluster-wide patterns
For platform teams running CronJobs across many tenant namespaces, the chart supports a default-on mode that auto-monitors every CronJob unless explicitly opted out with crond.io/monitor: "false". Combine with a Kyverno or OPA policy to enforce naming conventions across teams.
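The opt-in and opt-out rules can be summarized as a small decision function. This is a sketch under assumptions: the `default_on` parameter stands in for whatever chart setting enables default-on mode, and falling back to the default for unrecognized annotation values is a guess, not documented behaviour.

```python
MONITOR_ANNOTATION = "crond.io/monitor"  # annotation named in the docs


def should_monitor(annotations, default_on=False):
    """Decide whether a CronJob should be wrapped.

    annotations is the CronJob's metadata.annotations mapping (or None).
    default_on is a hypothetical flag mirroring the chart's default-on mode.
    """
    value = (annotations or {}).get(MONITOR_ANNOTATION)
    if value == "true":
        return True   # explicit opt-in always wins
    if value == "false":
        return False  # explicit opt-out always wins
    # Missing or unrecognized value: fall back to the cluster-wide default.
    return default_on
```

For example, `should_monitor({}, default_on=True)` is `True`, while `should_monitor({"crond.io/monitor": "false"}, default_on=True)` is `False`.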
Compatibility
- Kubernetes 1.25+
- batch/v1 CronJob (not v1beta1, which was removed in Kubernetes 1.25)
- Container runtimes: containerd, CRI-O, Docker
- Tested on managed Kubernetes: EKS, GKE, AKS, DigitalOcean Kubernetes
- Air-gapped clusters: set .Values.image.registry to your internal mirror