Why Cron Jobs Fail Silently (And How to Catch Them)

The Invisible Infrastructure

Cron jobs are the workhorses of backend infrastructure. They run backups, process payments, send emails, sync data, and clean up logs. They do it quietly, reliably, and without asking for attention.

Until they stop. And nobody notices.

That is the fundamental problem with cron jobs: they are designed to be invisible. There is no user interface showing their status. No green light that turns red. Just silence where there used to be work getting done.

Common Causes of Silent Failures

1. Environment Variable Differences

Cron runs jobs in a minimal shell environment. The PATH, HOME, and other variables your script depends on may not be set. A script that works perfectly from the terminal can fail completely under cron because it cannot find a binary or config file.

# This works in your terminal
python3 /home/deploy/scripts/backup.py

# But cron might not find python3 because /usr/local/bin isn't in PATH
# Fix: use absolute paths
/usr/local/bin/python3 /home/deploy/scripts/backup.py
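
One way to catch this before deploying is to run the command under a stripped-down environment. The sketch below uses a hypothetical helper, run_like_cron. The variables and values it sets are assumptions, since the environment cron actually provides differs between implementations and distributions.

```shell
#!/bin/bash
# Hypothetical helper: run a command string under a minimal environment,
# roughly what cron provides. The values are assumptions; check your system.
run_like_cron() {
  env -i HOME="${HOME:-/}" SHELL=/bin/sh PATH=/usr/bin:/bin /bin/sh -c "$1"
}

# Does python3 still resolve with the narrower PATH?
run_like_cron 'command -v python3 || echo "python3 not found in PATH=$PATH"'
```

If the command resolves here but not in your real crontab, compare against the actual cron environment by scheduling a temporary job that writes the output of env to a file.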

2. Swallowed Exit Codes

Bash scripts with multiple commands can mask failures. If a command in the middle fails but the last command succeeds, the script exits with code 0 and everything looks fine.

#!/bin/bash
# The backup might fail, but the echo always succeeds
pg_dump mydb > backup.sql
echo "Done"  # exit code 0, even if pg_dump failed

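Fix this with set -euo pipefail. The -e flag aborts the script when a command fails, -u treats unset variables as errors, and -o pipefail makes a pipeline fail if any command in it fails: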

#!/bin/bash
set -euo pipefail

pg_dump mydb > backup.sql
echo "Done"  # only runs if pg_dump succeeded
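
The difference is easy to see in isolation. In this small demonstration, false stands in for a failing command such as pg_dump, and the function names are made up for illustration. Each variant runs in a child shell so its exit status can be inspected.

```shell
#!/bin/bash
# 'false' stands in for a failing command such as pg_dump.
# Each variant runs in a child shell so its exit status can be inspected.
unguarded() { bash -c 'false; echo "Done"'; }
guarded()   { bash -c 'set -euo pipefail; false; echo "Done"'; }

# Without pipefail, a pipeline reports only the status of its last command.
pipe_unguarded() { bash -c 'false | cat'; }
pipe_guarded()   { bash -c 'set -o pipefail; false | cat'; }

unguarded && echo "unguarded: exit 0"          # prints "Done", then this line
guarded || echo "guarded: exit $?"             # prints "guarded: exit 1"
pipe_unguarded && echo "pipe unguarded: exit 0"
pipe_guarded || echo "pipe guarded: exit $?"   # prints "pipe guarded: exit 1"
```

Keep in mind that set -e has exceptions: a failing command inside an if condition, or on the left side of && or ||, does not abort the script.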

3. Disk Space Exhaustion

A backup job that writes to a full disk produces a zero-byte or truncated file. The script might not even error out; it just silently creates a useless backup. Weeks later, when you actually need to restore, you discover every backup since the disk filled up is empty.
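
A pre-flight check can turn a full disk into an explicit failure instead of an empty file. This is a minimal sketch with a hypothetical helper, has_free_kb. It assumes the POSIX output format of df -P, and the threshold in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the filesystem holding path $1
# has at least $2 KiB available. Relies on the POSIX output format (df -P),
# where the fourth column of the second line is the available space.
has_free_kb() {
  local avail
  avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail" -ge "$2" ]
}

# Usage in a backup script (the 1 GiB threshold is a placeholder):
#   has_free_kb /backups 1048576 || { echo "not enough space" >&2; exit 1; }
```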

4. Permission Changes

System updates, deploys, or container rebuilds can change file permissions. A cron job that ran fine for months suddenly cannot read its config file or write to its output directory. The crontab still triggers the job, but the job immediately fails.
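
A script can check its own access up front and fail with a clear message rather than somewhere in the middle of the work. A sketch, assuming a hypothetical helper named check_access and placeholder paths:

```shell
#!/bin/bash
# Hypothetical helper: verify the config file ($1) is readable and the
# output directory ($2) is writable before starting the real work.
check_access() {
  local config=$1 outdir=$2
  if [ ! -r "$config" ]; then
    echo "cannot read config: $config" >&2
    return 1
  fi
  if [ ! -d "$outdir" ] || [ ! -w "$outdir" ]; then
    echo "cannot write to directory: $outdir" >&2
    return 1
  fi
}

# Usage (paths are placeholders):
#   check_access /etc/myjob/job.conf /backups || exit 1
```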

5. Network Timeouts

Jobs that call external APIs or sync with remote services fail when those services are temporarily unavailable. Without timeout handling and retry logic, the job either hangs indefinitely or fails and moves on without completing its work.
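
Both problems, the hang and the single failed attempt, can be handled with a small wrapper. The sketch below is one possible shape, not a definitive implementation: retry_with_timeout is a hypothetical helper, it assumes the timeout utility from GNU coreutils, and the fixed one-second pause is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times, killing any attempt
# that exceeds $2 seconds. Assumes the `timeout` utility from GNU coreutils,
# which runs external programs only (not shell functions).
retry_with_timeout() {
  local attempts=$1 limit=$2 n=1
  shift 2
  until timeout "$limit" "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep 1   # fixed pause between attempts; tune or replace with backoff
  done
}

# Usage (the URL is a placeholder):
#   retry_with_timeout 3 30 curl -fsS https://api.example.com/sync
```

For plain HTTP calls, curl's own --max-time and --retry options cover the same ground without a wrapper.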

6. Resource Contention

Two cron jobs scheduled at the same time can compete for CPU, memory, or database connections. One or both may fail or produce incorrect results under load. This is especially common with jobs scheduled at midnight or on the hour, when many tasks tend to run simultaneously.
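
Besides staggering schedules, jobs that share a resource can be serialized with a lock file. The sketch below assumes flock from util-linux is installed. run_serialized is a hypothetical helper, and the lock path and wait time in the usage comment are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: run a command while holding an exclusive lock on
# file $1, waiting at most $2 seconds for the lock. Assumes flock(1)
# from util-linux. Returns non-zero if the lock could not be acquired in time.
run_serialized() {
  local lockfile=$1 wait_seconds=$2
  shift 2
  flock -w "$wait_seconds" "$lockfile" "$@"
}

# Usage: two heavy jobs that share one lock never run at the same time.
#   run_serialized /var/lock/db-jobs.lock 600 /path/to/job.sh
```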

Why Default Cron Monitoring Does Not Work

Email Output

Cron can email output to MAILTO, but this only works if:

A mail server is configured on the host
Someone actually reads those emails
The job produces output on failure (many do not)

In practice, cron emails either go to /dev/null, pile up in a mailbox nobody checks, or are never configured in the first place.

Log Files

Redirecting output to log files is better, but someone needs to read those logs. Grepping through /var/log/syslog for cron entries is not a monitoring strategy.

The Dead Man's Switch Approach

The most reliable way to monitor cron jobs is the dead man's switch pattern: instead of watching for failure, you watch for the absence of success.

Your cron job "checks in" at the end of a successful run by hitting a URL. If the check-in does not arrive within the expected window, something went wrong. It does not matter why it failed; the absence of a success signal is the alert.

#!/bin/bash
set -euo pipefail

# Build the file name once so both commands use the same date,
# even if the job runs across midnight
backup_file="/backups/mydb-$(date +%Y%m%d).sql"

# Do the actual work
pg_dump mydb > "$backup_file"
gzip "$backup_file"

# Check in with CronGuard - only runs if everything above succeeded
curl -fsS --retry 3 https://cronguard.app/api/ping/your-monitor-id

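If the script fails at any point before the curl, the check-in never happens, and you get an alert. Simple, reliable, and it catches any failure that keeps the script from reaching its last line: crashes, hangs, timeouts, permission errors, and disk errors that make a command fail. The one thing it cannot see is a job that exits cleanly with bad output, such as a truncated backup, so validate the result before checking in.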
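
A check-in only proves that the script reached its last line, so it helps to verify the artifact first. A minimal sketch, assuming the backup is a gzip file: verify_backup is a hypothetical helper, and the path in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if $1 is a non-empty file that
# passes gzip's built-in integrity test.
verify_backup() {
  [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Usage: place the check between the work and the check-in, for example
#   verify_backup /path/to/backup.sql.gz || exit 1
#   curl -fsS --retry 3 https://cronguard.app/api/ping/your-monitor-id
```

This only confirms the file is a valid, non-empty gzip stream. It does not prove the dump inside is complete, which takes a test restore.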

What Good Monitoring Looks Like

Expected schedule awareness - the monitor knows when the job should run
Grace periods - jobs can vary in duration without triggering false alarms
Multiple notification channels - email, Slack, Discord, webhooks
Historical tracking - see patterns in job duration and failure frequency
Team visibility - everyone can see what is running and what is failing
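
Schedule awareness and grace periods come down to one comparison on the monitoring side. The sketch below illustrates the idea only. It is not how CronGuard or any particular service implements it, and is_overdue and its parameters are hypothetical.

```shell
#!/bin/bash
# Hypothetical check a monitor might run. All times are Unix epoch seconds.
#   $1 = time of the last successful check-in
#   $2 = expected interval between runs, in seconds
#   $3 = grace period, in seconds
#   $4 = current time
is_overdue() {
  local last=$1 interval=$2 grace=$3 now=$4
  [ "$now" -gt $((last + interval + grace)) ]
}

# Daily job with a 15-minute grace period (last_ping is a placeholder):
#   is_overdue "$last_ping" 86400 900 "$(date +%s)" && echo "alert"
```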

Prevention Checklist

Use set -euo pipefail in all bash cron scripts
Use absolute paths for binaries and files
Check disk space before writing large files
Add timeouts to network operations
Stagger job schedules to avoid resource contention
Test cron jobs in the cron environment, not just your terminal
Monitor with a dead man's switch, not just log scraping

Frequently Asked Questions About Silent Cron Failures

Why does cron not tell me when a job fails? Cron can email its output to MAILTO, but that only helps if a mail server is configured on the host, someone actually reads those emails, and the job produces output on failure — and many jobs do not. In practice those emails go to /dev/null, pile up in a mailbox nobody checks, or were never configured in the first place.

Can a script report success even though part of it failed? Yes. A bash script with multiple commands can mask failures: if a command in the middle fails but the last command succeeds, the script exits with code 0 and everything looks fine. A pg_dump that fails followed by an echo that succeeds produces a script that exits cleanly with no backup.

Why does my script work in the terminal but not under cron? Cron runs jobs in a minimal shell environment, so PATH, HOME, and other variables your script depends on may not be set. A script that works perfectly from your terminal can fail completely under cron because it cannot find a binary or a config file, which is why absolute paths for binaries and files are safer.

What is a dead man's switch for cron jobs? It inverts the question: instead of watching for failure, you watch for the absence of success. The job checks in at the end of a successful run by hitting a URL, and if that check-in does not arrive within the expected window, you get an alert — regardless of whether the cause was a crash, a hang, a timeout, a permission error, or a full disk.

How do I stop a bash script from continuing after a failed command? Put set -euo pipefail at the top of the script so it aborts on the first error instead of running to the end and exiting 0. This is the single change that turns a swallowed exit code into a visible failure, and it belongs in every bash cron script.
