An agent that goes quiet for 48 hours is not resting. It is broken.

May 7, 2026

An AI agent that hasn't logged activity in 48 hours is not on a break. It is silently broken. And almost every team running autonomous agents finds out the wrong way.



The failure mode nobody warns you about


The agent fires on schedule. The webhook returns 200. The integration logs success. And the actual work product never lands.


A draft that never reached the database. An email that never sent. A calendar event that never materialised. Every component along the path reports green, but the outcome is missing. This is the dominant failure mode of autonomous-agent systems and the reason most teams underestimate how much monitoring they need.



Triggers vs. heartbeats


The standard observability stack monitors triggers: did the cron fire, did the API return 200, did the queue advance. None of that tells you whether the work actually happened.


The fix is to monitor the heartbeat instead. A heartbeat is the artifact the agent must produce on every run: a row it touches, a column it updates, a file it writes. If the heartbeat is missing for longer than the agent's cadence allows, something upstream is broken even if every status code is green.


Heartbeats are cheap. Most agents already produce one as a side effect of doing their job. The work is naming it, recording its timestamp, and watching the gap.
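To make that concrete, here is a minimal sketch of heartbeat recording. SQLite is used purely to keep the example self-contained; the table, column names, and the `record_heartbeat` helper are illustrative assumptions, not a prescribed design. The one rule worth keeping is in the docstring: record the heartbeat only after the real artifact has landed, so the heartbeat measures the outcome rather than the trigger.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("heartbeats.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS heartbeats (
           agent       TEXT PRIMARY KEY,
           last_run_at TEXT NOT NULL   -- ISO-8601 UTC timestamp
       )"""
)

def record_heartbeat(agent: str) -> None:
    """Call this as the final step of a run, after the real artifact
    (the row, the file, the sent email) has actually been produced,
    so the heartbeat measures the outcome rather than the trigger."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO heartbeats (agent, last_run_at) VALUES (?, ?) "
        "ON CONFLICT(agent) DO UPDATE SET last_run_at = excluded.last_run_at",
        (agent, now),
    )
    conn.commit()

# e.g. at the end of a drafting agent's run, once the draft row exists:
# record_heartbeat("linkedin-drafts")
```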



What this looks like in practice


Three states per agent: green (active in the last cadence window), amber (one window late), red (two or more windows late). One row per agent, sorted by severity. The operator scans it for thirty seconds in the morning. Amber gets investigated before it goes red.
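A minimal sketch of that check, assuming each agent has a declared cadence; the `status` function, the agent names, and the timestamps below are hypothetical. The thresholds follow the rule above: green inside one cadence window, amber between one and two, red beyond.

```python
from datetime import datetime, timedelta, timezone

def status(last_run_at: datetime, cadence: timedelta, now: datetime) -> str:
    # How many cadence windows have elapsed since the last heartbeat.
    windows_late = (now - last_run_at) / cadence
    if windows_late < 1:
        return "green"  # active within the last window
    if windows_late < 2:
        return "amber"  # one window late: investigate today
    return "red"        # two or more windows late: treat as an incident

# One row per agent, sorted by severity, for the thirty-second morning scan.
SEVERITY = {"red": 0, "amber": 1, "green": 2}
agents = {  # agent -> (last heartbeat, expected cadence); illustrative values
    "linkedin-drafts": (datetime(2026, 4, 14, 6, 0, tzinfo=timezone.utc),
                        timedelta(days=1)),
    "weekly-brief":    (datetime(2026, 4, 13, 7, 0, tzinfo=timezone.utc),
                        timedelta(days=7)),
}
now = datetime(2026, 4, 15, 9, 0, tzinfo=timezone.utc)
for state, name in sorted(
    ((status(ts, cad, now), name) for name, (ts, cad) in agents.items()),
    key=lambda row: SEVERITY[row[0]],
):
    print(f"{state:5}  {name}")
```

Run as-is, this prints the daily agent as amber (27 hours since its last heartbeat against a one-day cadence) and the weekly one as green, with amber sorted to the top.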


A real example from April. Our LinkedIn agent went amber on a Tuesday morning. The cause turned out to be a credential rotation upstream that we had missed during a vendor migration. Total fix time: 90 seconds. No drafts were missed because the dashboard surfaced the gap before any downstream artifact was due.


Without the heartbeat panel, we would have found out the following Monday when a brief failed to appear in someone's queue. Six days of compounded silence.



The cost of the alternative


You will hear two arguments against this. First: "we have logging, we will catch it." You will not. Logs surface what happened. They do not surface what was supposed to happen and did not. Second: "this is over-engineering." It is thirty seconds of operator attention a day for an early-warning system on every autonomous workflow you run. The first time it saves you a cascading incident, it has paid for itself for the year.



The takeaway


Build for the silent failure, not the loud one. Loud failures are easy: a 500 response, a backed-up queue, an alert that fires. Silent failures are the ones that age into incidents. Heartbeats are how you turn silent failures into loud ones, which is the only kind of failure you can fix.