PagerDuty Alerts for Cron Jobs

Page on-call rotations when critical cron jobs fail. The integration talks to PagerDuty's Events API v2 directly, so no Slack bridge is required and the full incident lifecycle (trigger, acknowledge, resolve) is available.

1. Create the Events API integration

  1. In PagerDuty, go to Services → New Service (or pick an existing one).
  2. Add an integration of type Events API v2. Name it "crond.io".
  3. Copy the Integration Key (also called routing key) — a 32-character hex string.
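A quick format check on the copied key can catch paste errors before the channel is created. The function below is a minimal sketch: the name is_valid_routing_key is hypothetical, and it assumes a key is exactly 32 alphanumeric characters, which is slightly looser than strict hex.

```shell
# Returns 0 if the argument looks like a 32-character integration key.
# Assumption: keys are 32 alphanumeric characters; tighten the pattern to
# [0-9a-f] if your keys are strictly hex.
is_valid_routing_key() {
  key=$1
  # Length must be exactly 32.
  [ "${#key}" -eq 32 ] || return 1
  # Reject anything outside A-Z, a-z, 0-9.
  case $key in
    *[!A-Za-z0-9]*) return 1 ;;
  esac
  return 0
}
```

Usage: is_valid_routing_key "$PD_ROUTING_KEY" || echo "key does not look right" (PD_ROUTING_KEY is a placeholder variable name).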

2. Add the alert channel

curl -X POST "https://api.crond.io/api/v1/monitors/$MONITOR_ID/alerts" \
  -H "Authorization: Bearer $CROND_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "channel": "webhook",
    "url": "https://events.pagerduty.com/v2/enqueue",
    "payload_template": "pagerduty",
    "routing_key": "your-32-char-integration-key"
  }'

The pagerduty payload template formats our alerts as PagerDuty Events API v2 payloads automatically — including deduplication keys so repeated alerts collapse into a single incident.
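For orientation, an Events API v2 trigger event has roughly the shape printed by the sketch below. The field names (routing_key, event_action, dedup_key, payload.summary, payload.source, payload.severity) come from PagerDuty's API. The function name pd_trigger_event, the dedup key format "crond-<monitor id>", and the source value are assumptions for illustration, not what crond.io is known to send.

```shell
# Print a PagerDuty Events API v2 "trigger" event as JSON.
# Arguments: $1 routing key, $2 monitor ID, $3 severity, $4 summary.
# Assumptions: the dedup key format and the "crond.io" source are
# illustrative only. Values are not JSON-escaped, so avoid quotes in them.
pd_trigger_event() {
  printf '{\n'
  printf '  "routing_key": "%s",\n' "$1"
  printf '  "event_action": "trigger",\n'
  printf '  "dedup_key": "crond-%s",\n' "$2"
  printf '  "payload": {\n'
  printf '    "summary": "%s",\n' "$4"
  printf '    "source": "crond.io",\n'
  printf '    "severity": "%s"\n' "$3"
  printf '  }\n'
  printf '}\n'
}
```

Because every event for the same monitor carries the same dedup_key, PagerDuty attaches repeats to the already open incident instead of opening a new one.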

Severity mapping

By default crond.io maps monitor state to PagerDuty severity:

Monitor state   PagerDuty severity   Event action
down            critical             trigger
late            warning              trigger
recovered       (none)               resolve

Override per-monitor with severity_override in the alert config.
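The default mapping can be read as a simple lookup. The sketch below restates it as a shell function; map_state is a hypothetical name and this is not crond.io's implementation. It prints the severity and the event action, using "-" where no severity applies.

```shell
# Print "<severity> <event action>" for a monitor state, following the
# default mapping described above. "-" means no severity is sent.
map_state() {
  case $1 in
    down)      echo "critical trigger" ;;
    late)      echo "warning trigger" ;;
    recovered) echo "- resolve" ;;
    *)         echo "unknown state: $1" >&2; return 1 ;;
  esac
}
```

A severity_override would replace the first field for trigger events; the resolve action carries no severity to override.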

Slack-first, PagerDuty-on-escalation

Common pattern: route every failure to Slack for visibility, but only page PagerDuty when the alert is still active after N minutes. PagerDuty supports this natively via Event Rules: keep the crond.io alert channel firing immediately, and let PagerDuty's rules decide whether to escalate based on dedup key and alert age.

Alternatively, add both crond.io alert channels to the monitor — Slack with no delay, PagerDuty with delay_seconds: 300. PagerDuty only fires if the monitor is still down after 5 minutes.
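The delayed PagerDuty channel might be created as sketched below. This assumes delay_seconds sits at the top level of the alert config next to channel and url; the exact placement is not confirmed here, and both function names are hypothetical. The Slack channel is omitted because its config fields are not covered in this guide.

```shell
# Print the alert-channel JSON for PagerDuty with a firing delay.
# Arguments: $1 routing key, $2 delay in seconds.
# Assumption: "delay_seconds" is a top-level field of the alert config.
pagerduty_channel_body() {
  printf '{\n'
  printf '  "channel": "webhook",\n'
  printf '  "url": "https://events.pagerduty.com/v2/enqueue",\n'
  printf '  "payload_template": "pagerduty",\n'
  printf '  "routing_key": "%s",\n' "$1"
  printf '  "delay_seconds": %d\n' "$2"
  printf '}\n'
}

# POST the delayed channel to the monitor (needs MONITOR_ID and CROND_API_KEY).
add_delayed_pagerduty_channel() {
  curl -X POST "https://api.crond.io/api/v1/monitors/$MONITOR_ID/alerts" \
    -H "Authorization: Bearer $CROND_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(pagerduty_channel_body "$1" 300)"
}
```

Usage: add_delayed_pagerduty_channel "$PD_ROUTING_KEY", where PD_ROUTING_KEY is a placeholder for the integration key.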

Auto-resolve

When the monitored job recovers, crond.io sends a resolve event with the same dedup key. PagerDuty closes the incident automatically — no manual ack needed for self-healing failures.
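In Events API v2 terms, a resolve event only needs the routing key, the action, and the dedup key of the incident it closes. The sketch below prints such an event; pd_resolve_event is a hypothetical helper, and the dedup key passed in must match whatever key the original trigger used.

```shell
# Print a PagerDuty Events API v2 "resolve" event as JSON.
# Arguments: $1 routing key, $2 dedup key of the open incident.
# No payload is included: resolve events are matched by dedup_key alone.
pd_resolve_event() {
  printf '{\n'
  printf '  "routing_key": "%s",\n' "$1"
  printf '  "event_action": "resolve",\n'
  printf '  "dedup_key": "%s"\n' "$2"
  printf '}\n'
}
```

If the dedup key differs from the trigger's, PagerDuty has nothing to match and the incident stays open.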

Troubleshooting

  • No incident created: verify the routing key against PagerDuty's integration page. A 202 response only means PagerDuty accepted the event; if no incident follows, the routing key most likely does not belong to the service you expect.
  • Duplicate incidents: dedup key collides with another integration sharing the routing key. Use a separate Events API integration per source.
  • Escalation not firing: check the service's escalation policy. Events without an urgency override use the service's default urgency.
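To separate crond.io problems from PagerDuty problems, it can help to send a test event straight to the Events API and look at the HTTP status. The sketch below is one way to do that: send_test_event and explain_status are hypothetical helper names, and the status descriptions are a summary of PagerDuty's documented responses rather than an exhaustive list.

```shell
# Translate an Events API v2 HTTP status into a short hint.
explain_status() {
  case $1 in
    202) echo "accepted: event queued; check the routing key if no incident appears" ;;
    400) echo "bad request: malformed JSON or invalid field" ;;
    429) echo "rate limited: too many events, retry later" ;;
    5??) echo "server error: retry after a delay" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Send a low-severity test trigger and print only the HTTP status code.
# Argument: $1 routing key. This opens a real incident on the service.
send_test_event() {
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST https://events.pagerduty.com/v2/enqueue \
    -H "Content-Type: application/json" \
    -d "{\"routing_key\": \"$1\", \"event_action\": \"trigger\", \"payload\": {\"summary\": \"crond.io test event\", \"source\": \"crond.io\", \"severity\": \"info\"}}"
}
```

Usage: explain_status "$(send_test_event "$PD_ROUTING_KEY")", with PD_ROUTING_KEY as a placeholder for the integration key. Resolve the test incident in PagerDuty afterwards.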