Cron Job Observability: Metrics, Logs, and Traces for Scheduled Tasks

Beyond "Did It Run?"

Knowing whether a cron job ran is the bare minimum. Real observability means understanding how it ran: how long it took, how much data it processed, whether it is getting slower over time, and what it did during execution.

The Three Pillars for Cron Jobs

1. Metrics

Track quantitative data about each run:

Execution duration - is the job getting slower?
Records processed - is throughput consistent?
Success/failure rate - is reliability declining?
Resource usage - CPU, memory, disk I/O during execution

#!/bin/bash
set -euo pipefail

START=$(date +%s)

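# Do the work. Assumes process_orders.py prints the processed count as its last line of stdout.
# stderr is not merged in, so a warning can never be mistaken for the count.
PROCESSED=$(python3 process_orders.py | tail -1)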

END=$(date +%s)
DURATION=$((END - START))

# Report metrics with check-in
curl -fsS https://cronguard.app/api/ping/abc123 \
  -d "duration=${DURATION}s, processed=${PROCESSED}"
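
Because of set -e, the script above exits before the check-in whenever the job fails, so failed runs leave no duration data behind. The following is a minimal sketch of one way to capture duration and exit status for every run. The helper name run_with_metrics and the RUN_DURATION and RUN_STATUS variables are hypothetical, not part of any tool mentioned here.

```bash
#!/bin/bash
# Hypothetical helper: run a command and record how long it took and how it
# exited, without aborting the calling script when the command fails.
# Results are left in RUN_DURATION (whole seconds) and RUN_STATUS (exit code).
run_with_metrics() {
  local start end
  RUN_STATUS=0
  start=$(date +%s)
  # The "||" keeps set -e from ending the script on a non-zero exit.
  "$@" || RUN_STATUS=$?
  end=$(date +%s)
  RUN_DURATION=$((end - start))
}

# Example use (not executed here):
#   run_with_metrics python3 process_orders.py
#   echo "duration=${RUN_DURATION}s, exit_status=${RUN_STATUS}"
```

The two values can then be sent with the same curl check-in as above. Whether your monitoring endpoint accepts an exit status field is something to confirm for your setup.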

2. Logs

Structured logging makes cron output searchable and parseable:

#!/bin/bash
LOG_FILE="/var/log/cron-jobs/order-processor.log"

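# Build each entry with jq so quotes and newlines in the message are escaped into valid JSON.
log_json() {
  local level=$1 msg=$2
  jq -cn --arg ts "$(date -Iseconds)" --arg level "$level" --arg msg "$msg" \
    '{timestamp: $ts, level: $level, job: "order-processor", message: $msg}' >> "$LOG_FILE"
}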

log_json "info" "Starting order processing"
RESULT=$(python3 process_orders.py 2>&1) || {
  log_json "error" "Failed: $RESULT"
  exit 1
}
log_json "info" "Completed: $RESULT"
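
Once every line is a JSON object, searching the log becomes a query rather than a grep. Here is a small sketch using jq, assuming each line in the file is valid JSON with the level and message keys shown above. The helper name messages_at_level is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the message of every log entry at a given level.
# Assumes one valid JSON object per line, with "level" and "message" keys.
messages_at_level() {
  local level=$1 log_file=$2
  jq -r --arg level "$level" 'select(.level == $level) | .message' "$log_file"
}

# Example use (not executed here):
#   messages_at_level error /var/log/cron-jobs/order-processor.log
```

The same select pattern works for any other field, for example filtering on .job when several jobs share one log file.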

3. Traces (for Complex Jobs)

For jobs that call multiple services or have multiple phases, tracing shows where time is spent and where failures occur:

Phase 1: Fetch data from API       [2.3s] OK
Phase 2: Transform records         [0.8s] OK
Phase 3: Write to database         [1.2s] OK
Phase 4: Send notifications        [0.5s] FAILED - SMTP timeout
Phase 5: Upload report to S3       [SKIPPED]
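
A full tracing system is not required to get output like this. The sketch below times each phase in plain shell and skips the remaining phases after a failure. The run_phase helper and the PIPELINE_FAILED variable are hypothetical names, and durations are whole seconds because date +%s has no finer resolution.

```bash
#!/bin/bash
# Set to 1 by run_phase as soon as any phase fails.
PIPELINE_FAILED=0

# Hypothetical helper: run one named phase and print its duration and outcome.
# Usage: run_phase "Phase name" command [args...]
run_phase() {
  local name=$1
  shift
  # After an earlier failure, report the phase as skipped and do not run it.
  if [ "$PIPELINE_FAILED" -ne 0 ]; then
    printf '%-32s [SKIPPED]\n' "$name"
    return 0
  fi
  local start elapsed
  start=$(date +%s)
  if "$@"; then
    elapsed=$(( $(date +%s) - start ))
    printf '%-32s [%ss] OK\n' "$name" "$elapsed"
  else
    elapsed=$(( $(date +%s) - start ))
    PIPELINE_FAILED=1
    printf '%-32s [%ss] FAILED\n' "$name" "$elapsed"
  fi
}

# Example use with hypothetical phase functions (not executed here):
#   run_phase "Phase 1: Fetch data from API" fetch_data
#   run_phase "Phase 2: Transform records" transform_records
```

This version does not record why a phase failed. Capturing the phase's stderr alongside the status line would be a natural extension.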

Duration Tracking

Duration trends are one of the most valuable metrics. A backup that normally takes 5 minutes but gradually increases to 30 minutes is a warning sign: growing data, degrading disk performance, or increasing contention.

Setting Duration Alerts

Alert when execution time exceeds historical norms:

Warning: 2x average duration
Critical: 5x average duration, or approaching the interval between runs (the next run could start before this one finishes)
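
These thresholds can be sketched as a small classifier. The helper name duration_level is hypothetical, and treating "approaching the interval" as 80% of the interval is an assumption made for this example. It also assumes the historical average is greater than zero.

```bash
#!/bin/bash
# Hypothetical helper: classify a run's duration as ok, warning, or critical.
# Arguments (whole seconds): current duration, historical average, interval between runs.
duration_level() {
  local current=$1 average=$2 interval=$3
  # Critical: at least 5x the average, or at least 80% of the run interval.
  # The 80% cutoff is an assumption; pick a value that suits the job.
  if [ "$current" -ge $((average * 5)) ] || [ $((current * 100)) -ge $((interval * 80)) ]; then
    echo critical
  elif [ "$current" -ge $((average * 2)) ]; then
    echo warning
  else
    echo ok
  fi
}

# Example use (not executed here):
#   duration_level "$DURATION" 300 3600
```

For an hourly job that averages 5 minutes, a 12-minute run is classified as a warning and a 25-minute run as critical.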

Output Validation

Track what your jobs produce, not just whether they ran:

#!/bin/bash
set -euo pipefail

# Run the job and capture metrics
RESULT=$(python3 sync_orders.py --json-stats)

SYNCED=$(echo "$RESULT" | jq -r '.synced')
SKIPPED=$(echo "$RESULT" | jq -r '.skipped')
ERRORS=$(echo "$RESULT" | jq -r '.errors')

# Warn if the error count is too high
if [ "$ERRORS" -gt 10 ]; then
  echo "WARNING: $ERRORS errors during sync" >&2
fi

# Check in with metrics
curl -fsS https://cronguard.app/api/ping/abc123 \
  -d "synced=$SYNCED, skipped=$SKIPPED, errors=$ERRORS"
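
The script above only checks the error count. A run that syncs almost nothing can be just as much of a problem, so here is a sketch that checks both. The helper name validate_output and both limits are hypothetical, and the right values depend on what a normal run looks like for your job.

```bash
#!/bin/bash
# Hypothetical helper: check the counts reported by a run against two limits.
# Arguments: synced count, error count, minimum expected synced, maximum allowed errors.
# Prints one warning per problem to stderr and returns non-zero if any check fails.
validate_output() {
  local synced=$1 errors=$2 min_synced=$3 max_errors=$4
  local problems=0
  if [ "$synced" -lt "$min_synced" ]; then
    echo "WARNING: only $synced records synced (expected at least $min_synced)" >&2
    problems=1
  fi
  if [ "$errors" -gt "$max_errors" ]; then
    echo "WARNING: $errors errors during sync (limit $max_errors)" >&2
    problems=1
  fi
  return "$problems"
}

# Example use (not executed here):
#   validate_output "$SYNCED" "$ERRORS" 100 10 || echo "output validation failed" >&2
```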

Historical Analysis

Store metrics over time to answer questions like:

Is this job getting slower? (performance degradation)
Is it processing fewer records? (data pipeline issue)
Does it fail on specific days? (pattern detection)
Did the last deploy affect job performance? (change correlation)
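
Answering these questions requires keeping the numbers somewhere. As a minimal sketch, one CSV row per run is enough to get started. The helper names and the row format (UTC timestamp, duration in seconds, records processed, exit status) are assumptions made for this example.

```bash
#!/bin/bash
# Hypothetical helper: append one row per run to a CSV history file.
# Row format (an assumption): timestamp,duration_seconds,records_processed,exit_status
record_run() {
  local history_file=$1 duration=$2 processed=$3 status=$4
  printf '%s,%s,%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$duration" "$processed" "$status" >> "$history_file"
}

# Hypothetical helper: average duration of the most recent N runs,
# truncated to whole seconds. Prints nothing if the file has no rows.
average_duration() {
  local history_file=$1 last_n=$2
  tail -n "$last_n" "$history_file" |
    awk -F, '{ sum += $2 } END { if (NR > 0) printf "%d\n", sum / NR }'
}

# Example use (not executed here):
#   record_run /var/lib/cron-metrics/order-processor.csv "$DURATION" "$PROCESSED" 0
#   average_duration /var/lib/cron-metrics/order-processor.csv 30
```

Comparing the average of the last few runs with the average over a longer window gives a rough answer to "is this job getting slower?" without any extra tooling.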

Dashboard Design

A good cron job dashboard shows:

Status grid - all jobs at a glance (green/yellow/red)
Last run info - when, how long, what it reported
Duration chart - execution time over days/weeks
Failure history - when and why jobs failed
Upcoming runs - what is scheduled next

Alerting Strategy

Not every metric needs an alert. Focus on actionable signals:

Signal	Alert Level	Action
Job did not check in	Critical	Investigate immediately
Job reported failure	High	Check logs, may need manual intervention
Duration 3x normal	Warning	Monitor, investigate if persistent
Output below threshold	Warning	Check data source, may be upstream issue
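
The most critical signal in this table, a job that did not check in, comes down to a timestamp comparison. The sketch below shows the idea. The helper name is_overdue is hypothetical, and the grace period is an assumption added so that a run which is only slightly late does not page anyone.

```bash
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if a job has missed its check-in.
# Arguments (epoch seconds / seconds): last check-in time, expected interval,
# grace period, and optionally the current time (defaults to now).
is_overdue() {
  local last_checkin=$1 interval=$2 grace=$3 now=${4:-$(date +%s)}
  [ $((now - last_checkin)) -gt $((interval + grace)) ]
}

# Example use (not executed here), for an hourly job with a 5-minute grace period:
#   if is_overdue "$LAST_CHECKIN" 3600 300; then
#     echo "CRITICAL: job did not check in" >&2
#   fi
```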

Frequently asked questions about cron job observability

What should I track about a cron job beyond whether it ran? Track quantitative data about each run: execution duration to see whether the job is getting slower, records processed to see whether throughput is consistent, the success and failure rate to see whether reliability is declining, and resource usage such as CPU, memory, and disk I/O during execution. Together these answer how the job ran, not just that it ran.

Why is execution duration such a valuable metric? Duration trends are one of the most valuable metrics because they surface slow decay rather than sudden breakage. A backup that normally takes 5 minutes but gradually increases to 30 minutes is a warning sign of growing data, degrading disk performance, or increasing contention. A reasonable alerting rule is a warning at 2x the average duration and a critical alert at 5x the average, or when the run starts approaching the interval between runs.

How do I make cron output easier to search? Use structured logging. Emitting one JSON object per log line, with a timestamp, a level, the job name, and the message, makes cron output searchable and parseable instead of a wall of free text you have to grep through by hand.

Which cron signals actually deserve an alert? Not every metric needs an alert, so focus on actionable signals. A job that did not check in is critical and should be investigated immediately. A job that reported a failure is high priority: check the logs, since it may need manual intervention. Duration at 3x normal is a warning worth monitoring and investigating if it persists. Output below the expected threshold is also a warning, and usually means checking the data source because it may be an upstream issue.

What should a cron job dashboard show? A good cron job dashboard shows a status grid with all jobs at a glance in green, yellow, or red; last run info covering when it ran, how long it took, and what it reported; a duration chart of execution time over days or weeks; a failure history of when and why jobs failed; and the upcoming runs that are scheduled next.
