Kubernetes CronJob Monitoring

Monitor every Kubernetes CronJob in your cluster without rebuilding images or modifying manifests beyond two annotations.

Why this is hard by default

The Kubernetes CronJob controller does the bare minimum: it tries to create a Job at the scheduled time and records the outcome in the CronJob's status. If the cluster is under pressure, scheduled runs can be missed entirely, and unless someone is watching the output of kubectl get cj, the failure is invisible.

Adding heartbeat monitoring to existing CronJobs usually means modifying the workload image to call a ping URL on exit. That works for jobs you control, but it breaks down for third-party charts, large multi-namespace clusters, and any team that wants to opt in without reworking its CI pipeline.

Install

# Add the chart repo
helm repo add crond https://charts.crond.io
helm repo update

# Install with your API key
helm install crond-monitor crond/crond-monitor \
  --namespace crond-system --create-namespace \
  --set apiKey="$CROND_API_KEY"

The chart installs a mutating admission webhook (a controller of roughly 15 MB) that intercepts CronJob resources carrying the crond.io/monitor: "true" annotation and rewrites them at admission time so that the workload runs under the crond agent.

Opt in per CronJob

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
  annotations:
    crond.io/monitor: "true"
    crond.io/name: "k8s-nightly-backup"
    crond.io/grace-seconds: "1800"
spec:
  schedule: "0 2 * * *"
  # everything else unchanged

That's it. On admission the webhook:

  • Creates a monitor in crond.io if one with that name doesn't exist
  • Injects an init container that drops the agent into a shared emptyDir
  • Rewrites the workload command to crond-agent wrap -- <original command>
  • Wires the ping URL and grace period via env vars
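
The rewrite steps above can be pictured as a transformation on the CronJob object. The following Python sketch is illustrative only and is not the webhook's actual implementation: the agent image, mount path, init container name, and the env var names CROND_PING_URL and CROND_GRACE_SECONDS are all assumptions, and the monitor-creation step is represented by a ping_url argument that is presumed to have been obtained already.

```python
import copy

# Hypothetical values: the real agent image, mount path, and env var names
# are not documented in this section and are assumed for illustration only.
AGENT_IMAGE = "example.invalid/crond-agent:latest"
AGENT_DIR = "/crond"
AGENT_VOLUME = "crond-agent"


def mutate_cronjob(cronjob: dict, ping_url: str) -> dict:
    """Return a copy of a batch/v1 CronJob dict with the agent wrapped in."""
    annotations = cronjob.get("metadata", {}).get("annotations", {})
    if annotations.get("crond.io/monitor") != "true":
        return cronjob  # not opted in: leave the object untouched

    patched = copy.deepcopy(cronjob)
    pod_spec = patched["spec"]["jobTemplate"]["spec"]["template"]["spec"]

    # 1. Shared emptyDir that carries the agent binary into the workload.
    pod_spec.setdefault("volumes", []).append({"name": AGENT_VOLUME, "emptyDir": {}})
    mount = {"name": AGENT_VOLUME, "mountPath": AGENT_DIR}

    # 2. Init container that copies the agent into the shared volume.
    pod_spec.setdefault("initContainers", []).append({
        "name": "crond-agent-init",
        "image": AGENT_IMAGE,
        "command": ["cp", "/crond-agent", f"{AGENT_DIR}/crond-agent"],
        "volumeMounts": [mount],
    })

    # 3. Prefix each workload command and pass settings through env vars.
    env = [{"name": "CROND_PING_URL", "value": ping_url}]
    grace = annotations.get("crond.io/grace-seconds")
    if grace is not None:
        env.append({"name": "CROND_GRACE_SECONDS", "value": grace})
    for container in pod_spec["containers"]:
        # This sketch assumes an explicit command; a container relying on
        # its image ENTRYPOINT would need that entrypoint resolved first.
        original = container.get("command", []) + container.get("args", [])
        container["command"] = [f"{AGENT_DIR}/crond-agent", "wrap", "--", *original]
        container.pop("args", None)
        container.setdefault("volumeMounts", []).append(dict(mount))
        container.setdefault("env", []).extend(copy.deepcopy(env))
    return patched
```

Under these assumptions, a container with command ["/bin/backup"] and args ["--full"] would end up running /crond/crond-agent wrap -- /bin/backup --full, with everything else in the manifest left as written.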

What gets reported

  • Start ping on Job container start
  • Success ping with exit code 0 and execution duration
  • Failure ping with exit code, duration, and last 4 KB of stderr
  • Missed runs, detected automatically by crond.io when no ping arrives within the grace period after the scheduled time
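
To make the reporting behavior above concrete, here is a minimal Python sketch of what a wrapper of this kind might do. It is not the crond-agent source: send_ping is a hypothetical callback standing in for the HTTP call to the ping URL, the payload field names are invented for illustration, and missed_deadline assumes the deadline is simply the scheduled time plus the grace period.

```python
import subprocess
import sys
import time
from datetime import datetime, timedelta

STDERR_TAIL_BYTES = 4 * 1024  # "last 4 KB of stderr" from the list above


def run_wrapped(command, send_ping):
    """Run `command`, reporting start and success/failure via `send_ping`.

    `send_ping` is a hypothetical callback taking (kind, payload); a real
    agent would send a request to the monitor's ping URL instead.
    """
    send_ping("start", {})
    started = time.monotonic()
    result = subprocess.run(command, stderr=subprocess.PIPE)
    duration = time.monotonic() - started

    payload = {"exit_code": result.returncode, "duration_seconds": duration}
    if result.returncode == 0:
        send_ping("success", payload)
    else:
        # Keep only the tail of stderr; "replace" tolerates a multibyte
        # character that was cut in half by the byte slice.
        tail = result.stderr[-STDERR_TAIL_BYTES:]
        payload["stderr_tail"] = tail.decode("utf-8", "replace")
        send_ping("failure", payload)
    return result.returncode


def missed_deadline(scheduled_at: datetime, grace_seconds: int) -> datetime:
    """Time after which a run with no ping is treated as missed (assumed)."""
    return scheduled_at + timedelta(seconds=grace_seconds)


if __name__ == "__main__":
    # Demo: a child process that writes to stderr and exits with code 3.
    run_wrapped(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"],
        lambda kind, payload: print(kind, payload),
    )
```

With the example manifest earlier on this page (schedule "0 2 * * *", grace of 1800 seconds), this reading of the rule would flag the run as missed if nothing has arrived by 02:30.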

Cluster-wide patterns

For platform teams running CronJobs across many tenant namespaces, the chart supports a default-on mode that auto-monitors every CronJob unless it is explicitly opted out with crond.io/monitor: "false". Combine this mode with a Kyverno or OPA policy to enforce naming conventions across teams.
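
The difference between the per-CronJob opt-in and the default-on mode comes down to how the crond.io/monitor annotation is interpreted. A small Python sketch of that decision follows; the default_on parameter is a hypothetical stand-in for whatever chart setting enables the mode, since its name is not given here.

```python
def should_monitor(annotations: dict, default_on: bool) -> bool:
    """Decide whether a CronJob is monitored, given its annotations.

    `default_on` is a hypothetical flag representing the chart's
    default-on mode; the real setting name is not documented here.
    """
    value = annotations.get("crond.io/monitor")
    if default_on:
        # Default-on: everything is monitored unless explicitly opted out.
        return value != "false"
    # Opt-in: only CronJobs annotated "true" are monitored.
    return value == "true"
```

In opt-in mode an unannotated CronJob is ignored, while in default-on mode the same CronJob is monitored and only an explicit "false" excludes it.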

Compatibility

  • Kubernetes 1.25+
  • batch/v1 CronJob (not the deprecated v1beta1)
  • Container runtimes: containerd, CRI-O, Docker
  • Managed Kubernetes: tested on EKS, GKE, AKS, and DigitalOcean Kubernetes
  • Air-gapped clusters: set .Values.image.registry to point at your internal mirror