PagerDuty Alerts for Cron Jobs
Page on-call rotations when critical cron jobs fail. Uses PagerDuty's Events API v2 — no Slack bridge required, full incident lifecycle (trigger, acknowledge, resolve).
1. Create the Events API integration
- In PagerDuty, go to Services → New Service (or pick an existing one).
- Add an integration of type Events API v2. Name it "crond.io".
- Copy the Integration Key (also called routing key) — a 32-character hex string.
2. Add the alert channel
The pagerduty payload template automatically formats crond.io alerts as PagerDuty Events API v2 payloads, including deduplication keys so that repeated alerts for the same monitor collapse into a single incident.
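To make the payload shape concrete, here is a minimal sketch of an Events API v2 trigger event built in Python. The top-level fields (`routing_key`, `event_action`, `dedup_key`, `payload`) and the required `summary`, `source`, and `severity` fields follow PagerDuty's Events API v2. The helper name `build_trigger_event`, the `crond-<monitor id>` dedup key scheme, and the sample key are assumptions for illustration, not crond.io's actual template.

```python
import json


def build_trigger_event(routing_key: str, monitor_id: str, summary: str,
                        severity: str = "critical") -> dict:
    """Build an Events API v2 trigger event for a failing monitor."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Hypothetical scheme: one stable key per monitor, so repeated
        # alerts for the same monitor land on the same incident.
        "dedup_key": f"crond-{monitor_id}",
        "payload": {
            "summary": summary,
            "source": "crond.io",
            "severity": severity,
        },
    }


# Placeholder key, not a real integration key.
first = build_trigger_event("0123456789abcdef0123456789abcdef",
                            "nightly-backup", "nightly-backup is down")
repeat = build_trigger_event("0123456789abcdef0123456789abcdef",
                             "nightly-backup", "nightly-backup is still down")

print(json.dumps(first, indent=2))
```

Because `first` and `repeat` carry the same `dedup_key`, PagerDuty would treat the second event as an update to the open incident rather than a new one.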
Severity mapping
By default crond.io maps monitor state to PagerDuty severity:
| Monitor state | PagerDuty severity | Event action |
|---|---|---|
| down | critical | trigger |
| late | warning | trigger |
| recovered | — | resolve |
Override per-monitor with severity_override in the alert config.
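The mapping above, including the override, can be sketched as a small lookup. This is an illustration only: `map_state` is a hypothetical helper, and the assumption that an override applies only to trigger events (resolve events carry no severity) is ours, not documented crond.io behavior.

```python
from typing import Optional, Tuple

# Default mapping from the table: state -> (severity, event action).
DEFAULT_MAPPING = {
    "down": ("critical", "trigger"),
    "late": ("warning", "trigger"),
    "recovered": (None, "resolve"),
}


def map_state(state: str,
              severity_override: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Return (severity, event_action) for a monitor state."""
    severity, action = DEFAULT_MAPPING[state]
    # Assumption: the override is ignored for resolve events.
    if severity_override is not None and action == "trigger":
        severity = severity_override
    return severity, action


print(map_state("down"))
print(map_state("late", severity_override="error"))
print(map_state("recovered", severity_override="error"))
```

PagerDuty accepts `critical`, `error`, `warning`, and `info` as severity values, so an override would need to be one of those.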
Slack-first, PagerDuty-on-escalation
Common pattern: route every failure to Slack for visibility, but page PagerDuty only when the alert is still active after N minutes. PagerDuty supports this natively via Event Rules: keep the crond.io alert channel firing immediately, and let PagerDuty's rules decide whether to escalate based on dedup key and age.
Alternatively, add both crond.io alert channels to the monitor — Slack with no delay, PagerDuty with delay_seconds: 300. PagerDuty only fires if the monitor is still down after 5 minutes.
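The two-channel variant comes down to a per-channel delay check. The sketch below models that decision under stated assumptions: the `Channel` class and `channels_to_fire` function are hypothetical, and only the `delay_seconds` name comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Channel:
    name: str
    delay_seconds: int = 0  # 0 means fire as soon as the monitor goes down


def channels_to_fire(channels: List[Channel], seconds_down: int) -> List[str]:
    """Return the names of channels whose delay has elapsed."""
    return [c.name for c in channels if seconds_down >= c.delay_seconds]


channels = [Channel("slack"), Channel("pagerduty", delay_seconds=300)]

print(channels_to_fire(channels, seconds_down=60))
print(channels_to_fire(channels, seconds_down=300))
```

With these settings, a job that recovers within five minutes reaches Slack only, and PagerDuty is paged once the monitor has been down for 300 seconds.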
Auto-resolve
When the monitored job recovers, crond.io sends a resolve event with the same dedup key. PagerDuty closes the incident automatically — no manual ack needed for self-healing failures.
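A resolve event is smaller than a trigger: the Events API v2 needs only the routing key, the `resolve` action, and the dedup key of the incident to close. The sketch below assumes a hypothetical `crond-<monitor id>` key scheme and helper names. What matters is that the trigger and the resolve derive the key the same way.

```python
def dedup_key_for(monitor_id: str) -> str:
    """Hypothetical key scheme, shared by trigger and resolve events."""
    return f"crond-{monitor_id}"


def build_resolve_event(routing_key: str, monitor_id: str) -> dict:
    """Build an Events API v2 resolve event for a recovered monitor."""
    return {
        "routing_key": routing_key,
        "event_action": "resolve",
        # Must match the dedup key of the earlier trigger event.
        "dedup_key": dedup_key_for(monitor_id),
    }


# Placeholder key, not a real integration key.
resolve = build_resolve_event("0123456789abcdef0123456789abcdef", "nightly-backup")
print(resolve)
```

If the dedup key differs from the one sent with the trigger, PagerDuty has no open incident to match and the original incident stays open.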
Troubleshooting
- No incident created: verify the routing key against PagerDuty's integration page. A 202 response with no incident means PagerDuty accepted the event but the routing key does not match the intended integration.
- Duplicate incidents: dedup key collides with another integration sharing the routing key. Use a separate Events API integration per source.
- Escalation not firing: check the service's escalation policy. Events without an urgency override use the service's default urgency.
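When checking a routing key, it can help to send a test event by hand and read the response. The sketch below uses only the Python standard library and PagerDuty's public Events API v2 endpoint. The function names, the `crond-manual-test` dedup key, and the summary text are assumptions for illustration. The network call is left commented out so that nothing is sent by accident.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_test_request(routing_key: str) -> urllib.request.Request:
    """Build (but do not send) a low-severity test trigger event."""
    body = {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": "crond-manual-test",  # hypothetical key for this test
        "payload": {
            "summary": "Manual test event from crond.io troubleshooting",
            "source": "crond.io",
            "severity": "info",
        },
    }
    return urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> tuple:
    """Send the request and return (status, body).

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8")


request = build_test_request("0123456789abcdef0123456789abcdef")  # placeholder key
# status, body = send(request)  # uncomment after substituting a real key
```

After a successful send, look for the test incident on the service you expected, then resolve it from the PagerDuty UI or with a matching resolve event.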