Overnight Runs
Leaving an AI coding agent to work autonomously for hours while you sleep. The premise: for a large task that would take 4-8 hours of interactive session work, define it precisely, kick it off before bed, and review the results in the morning.
What Makes This Work
Three things have to hold for overnight runs to be productive:
- Clear task definition: The agent needs a verifiable success condition. "Refactor the auth module to use JWT" is not good enough. "Refactor the auth module to use JWT, with all existing tests passing and no new TypeScript errors" is.
- Persistent environment: The session has to survive unattended. exe-dev provides remote VMs for this. A local machine that sleeps, locks, or loses its network connection can kill or stall the session.
- Observability: You need to know what happened without reading every step. Hooks for notifications (see below), commit logs, and test results provide that.
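The example success condition above can be checked mechanically, so neither you nor the agent has to take "done" on trust. A minimal sketch in Python, assuming each check is a command whose exit status 0 means success; check_success and report are hypothetical helper names, not part of any tool:

```python
import subprocess
import sys
from typing import Sequence


def check_success(commands: Sequence[Sequence[str]]) -> dict[str, bool]:
    """Run each check command; a check passes only if it exits with status 0."""
    results: dict[str, bool] = {}
    for cmd in commands:
        name = " ".join(cmd)
        try:
            proc = subprocess.run(list(cmd), capture_output=True, text=True)
        except FileNotFoundError:
            # The tool is not installed in this environment: count it as a failure.
            results[name] = False
            continue
        results[name] = proc.returncode == 0
    return results


def report(results: dict[str, bool]) -> bool:
    """Print one PASS/FAIL line per check to stderr; return True only if all passed."""
    for name, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}", file=sys.stderr)
    return all(results.values())
```

For the JWT example the call might be report(check_success([["npm", "test"], ["npx", "tsc", "--noEmit"]])), assuming an npm project with a test script and TypeScript installed locally. Note that tsc --noEmit fails on any type error, which is stricter than "no new TypeScript errors" if the codebase already has some.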
Setting Up Notifications
claude-code's hook system is the mechanism for overnight monitoring. Hook events include PreToolUse, PostToolUse, PostToolUseFailure, Stop, SessionStart, SessionEnd, UserPromptSubmit, and FileChanged; the available set varies by claude-code version, so check the hooks reference for the one you run.
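A Stop hook fires when the main agent finishes responding, which in an unattended run means it has either completed the task or given up. Configure it in .claude/settings.json:
```json
{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "notify-me.sh"
          }
        ]
      }
    ]
  }
}
```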
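notify-me.sh is your own script; it must be executable and either on PATH or referenced by an absolute path. It can hit a webhook, send a text via Twilio, post to Slack, or send email. If a notification is waiting when you wake up, you know the session ended.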
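As one possible shape for that script, here is a Python stand-in that posts to a Slack incoming webhook. It assumes claude-code passes the hook payload as JSON on stdin, as its hooks documentation describes, and that the webhook URL is in an environment variable. The SLACK_WEBHOOK_URL name, the message wording, and the payload fields read here are assumptions, which is why each field has a fallback:

```python
import json
import os
import sys
import urllib.request


def build_message(hook_input: dict) -> str:
    """Summarize the hook payload; fields are read with .get() in case some are absent."""
    event = hook_input.get("hook_event_name", "unknown event")
    session = hook_input.get("session_id", "unknown session")
    cwd = hook_input.get("cwd", "unknown directory")
    return f"Agent run finished ({event}) in {cwd}, session {session}"


def post_to_slack(webhook_url: str, text: str, timeout_s: float = 10.0) -> int:
    """POST a Slack incoming-webhook payload ({"text": ...}) and return the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


def main() -> int:
    """Entry point when this file is used as the Stop hook command."""
    hook_input = json.load(sys.stdin)  # hook payload arrives as JSON on stdin
    url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if not url:
        print("SLACK_WEBHOOK_URL is not set; skipping notification", file=sys.stderr)
        return 0
    post_to_slack(url, build_message(hook_input))
    return 0
```

To install it, save it as, say, notify_me.py, add a #!/usr/bin/env python3 line at the top and sys.exit(main()) at the bottom, make it executable, and point the Stop hook's command at its absolute path. The variable must be set in the environment the agent is launched from.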
Also add a PostToolUse hook with the matcher "Bash" to log every command the agent executes; that gives you a full audit trail.
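A sketch of such a logger in Python, assuming the PostToolUse payload on stdin carries tool_name, session_id, and, for Bash, tool_input.command (field names as described in claude-code's hooks documentation; verify them against your version). The log path is a hypothetical choice:

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Hypothetical location; keep it outside the files the agent is allowed to edit.
LOG_PATH = Path.home() / ".claude-overnight" / "bash-audit.jsonl"


def audit_record(hook_input: dict) -> Optional[dict]:
    """Build a log record from a PostToolUse payload, or None if the tool was not Bash."""
    if hook_input.get("tool_name") != "Bash":
        return None
    tool_input = hook_input.get("tool_input") or {}
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session": hook_input.get("session_id"),
        "command": tool_input.get("command"),
    }


def append_record(record: dict, log_path: Path = LOG_PATH) -> None:
    """Append one JSON object per line (JSONL) so the log is easy to grep later."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def main() -> int:
    """Entry point when this file is used as the PostToolUse hook command."""
    record = audit_record(json.load(sys.stdin))
    if record is not None:
        append_record(record)
    return 0
```

Registering it with "matcher": "Bash" means it only runs for shell commands; the tool_name check inside is a second guard in case the matcher is left empty. To run it as a hook command, add a shebang line and sys.exit(main()) at the bottom.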
Task Definition Checklist
Before starting an overnight run:
- Success condition is verifiable (tests pass, specific output exists, no errors)
- Agent has read access to all files it needs
- Agent has write access to only the files it should touch
- Git working tree is clean, and the agent is instructed to commit incrementally
- Stop hook is configured for notifications
- Irreversible operations (deletes, external API calls with side effects) are restricted or confirmed
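Two items on this checklist can be verified by a script before you leave; access scoping and irreversible operations still need a human look. A sketch in Python, assuming git is installed and the Stop hook lives in the project's .claude/settings.json (hooks configured in other settings files, such as user-level settings, would be missed). The helper names are hypothetical:

```python
import json
import subprocess
from pathlib import Path


def git_porcelain(repo: Path) -> str:
    """Output of git status --porcelain; an empty string means a clean working tree."""
    return subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout


def load_settings(repo: Path) -> dict:
    """Read .claude/settings.json, or return {} if the project has none."""
    path = repo / ".claude" / "settings.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def has_stop_hook(settings: dict) -> bool:
    """True if at least one command hook is registered under the Stop event."""
    for group in settings.get("hooks", {}).get("Stop", []):
        for hook in group.get("hooks", []):
            if hook.get("type") == "command" and hook.get("command"):
                return True
    return False


def preflight(porcelain: str, settings: dict) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    # Porcelain v1 lines are "XY path": two status characters, a space, then the path.
    dirty = [line[3:] for line in porcelain.splitlines() if line.strip()]
    if dirty:
        problems.append("working tree not clean: " + ", ".join(dirty))
    if not has_stop_hook(settings):
        problems.append("no Stop hook command in .claude/settings.json")
    return problems
```

Run it from the repository root as preflight(git_porcelain(Path(".")), load_settings(Path("."))) and start the agent only if the result is empty.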
What Goes Wrong
Infinite loops: The agent gets stuck trying to fix a test and makes the same mistake repeatedly. Mitigate with maximum token limits, a commit-check hook that fails if there have been no commits in N minutes, and task framing that includes "if you're stuck after 3 attempts, stop and report".
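One way to build the commit-check hook is a PreToolUse command that refuses further tool calls once the repository has gone too long without a commit. This sketch assumes claude-code's documented convention that exit status 2 from a PreToolUse hook blocks the call and passes stderr back to the agent, and that hook commands inherit the agent's environment. The 45-minute default and the OVERNIGHT_START variable are hypothetical choices, not claude-code settings:

```python
import os
import subprocess
import sys
import time
from pathlib import Path


def last_commit_time(repo: Path) -> float:
    """Committer timestamp of HEAD in Unix seconds (git log -1 --format=%ct)."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())


def gate(last_commit_ts: float, run_started_ts: float, now: float,
         max_idle_minutes: float) -> tuple[int, str]:
    """Return (exit_code, message). Idle time counts from the later of the last
    commit and the run start, so a commit from before the run cannot trip it."""
    idle_s = now - max(last_commit_ts, run_started_ts)
    if idle_s > max_idle_minutes * 60:
        return 2, (f"No commit for {idle_s / 60:.0f} minutes. Stop, report what is "
                   "blocking you, and end the session.")
    return 0, ""


def main(repo: Path = Path("."), max_idle_minutes: float = 45.0) -> int:
    """Entry point when this file is used as a PreToolUse hook command."""
    # OVERNIGHT_START is a hypothetical variable exported when launching the run;
    # if it is missing, fall back to "now", which means the check never trips.
    started = float(os.environ.get("OVERNIGHT_START", time.time()))
    code, message = gate(last_commit_time(repo), started, time.time(), max_idle_minutes)
    if message:
        print(message, file=sys.stderr)  # with exit code 2 this reaches the agent
    return code
```

Launch the run with OVERNIGHT_START set to the current Unix time (for example OVERNIGHT_START=$(date +%s) in the shell that starts the agent), and add sys.exit(main()) at the bottom to use the file as a hook. Once the gate trips, every tool call is refused, including git commit, so the only way forward is to stop and report; the Stop hook then notifies you.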
Context exhaustion: Very long sessions exhaust context even with compaction. Break large tasks into sessions of 2-4 hours max. Checkpoint between sessions.
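The session split can be scripted. A minimal sketch, assuming claude-code's non-interactive print mode (claude -p) and prompts that each name one chunk of the task and tell the agent to commit before finishing, so every new session starts from a committed checkpoint with a fresh context. The 3-hour cap sits inside the 2-4 hour range and is an arbitrary choice; run_sessions and claude_cmd are hypothetical helpers:

```python
import subprocess
import sys
from typing import Callable, Sequence


def claude_cmd(prompt: str) -> list[str]:
    """Non-interactive claude-code invocation: -p prints the response and exits."""
    return ["claude", "-p", prompt]


def run_sessions(
    prompts: Sequence[str],
    build_cmd: Callable[[str], list[str]] = claude_cmd,
    timeout_s: float = 3 * 3600,
) -> list[tuple[str, str]]:
    """Run one fresh session per prompt, in order, each capped at timeout_s seconds.
    Stops at the first session that fails or times out."""
    results: list[tuple[str, str]] = []
    for i, prompt in enumerate(prompts, start=1):
        try:
            proc = subprocess.run(build_cmd(prompt), capture_output=True,
                                  text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            status = "timeout"  # run() kills the child process before raising
        else:
            status = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        results.append((prompt, status))
        print(f"[{i}/{len(prompts)}] {status}", file=sys.stderr)
        if status != "ok":
            break  # don't stack the next session on top of a failed one
    return results
```

An unattended claude -p session also needs permission settings that let it edit files and run tests without prompting; those flags are not shown here, so check the CLI reference for your version.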
Cascading errors: A wrong early decision creates downstream problems. Include rollback instructions such as: "if more tests fail than failed at the start, revert and stop."
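A rollback rule needs a baseline taken before the agent starts. A sketch, assuming the test runner can write JUnit-style XML (pytest's --junitxml option does, for example); passing_tests, regressions, and revert_to are hypothetical helpers. It compares sets of passing tests rather than counts, a stricter variant of the rule: a run that fixes one test while breaking another leaves the count unchanged but still shows a regression.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def passing_tests(junit_xml: str) -> set[str]:
    """IDs of test cases with no <failure>, <error>, or <skipped> child element."""
    root = ET.fromstring(junit_xml)
    passed: set[str] = set()
    for case in root.iter("testcase"):
        if any(case.find(tag) is not None for tag in ("failure", "error", "skipped")):
            continue
        passed.add(f"{case.get('classname', '')}::{case.get('name', '')}")
    return passed


def regressions(baseline: set[str], current: set[str]) -> set[str]:
    """Tests that passed at the start of the run but do not pass now."""
    return baseline - current


def revert_to(repo: Path, start_sha: str, keep_branch: str) -> None:
    """Save the agent's commits on a branch, then reset to the starting commit.
    Uncommitted changes in the working tree are discarded by the hard reset."""
    subprocess.run(["git", "branch", keep_branch], cwd=repo, check=True)
    subprocess.run(["git", "reset", "--hard", start_sha], cwd=repo, check=True)
```

Saving HEAD on a branch before git reset --hard means a reverted run can still be inspected in the morning; the branch name is yours to choose.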
Silent failure: The agent reports success, but the task was done wrong. Always check the result against the success condition before declaring the task done.
Good Overnight Run Tasks
- Comprehensive test coverage for an existing module (clear metric: coverage %)
- Migrating a codebase from one library version to another
- Implementing a well-specified feature with existing test suite
- Refactoring to a new pattern across many files
- Documentation generation from code
Bad Overnight Run Tasks
- Exploratory architecture work (needs human judgment at decision points)
- Tasks with external side effects that are hard to reverse
- Open-ended "improve this" tasks without clear metrics
- Tasks requiring credentials/access that aren't set up in the environment
Stack
exe-dev (persistent VM) + claude-code (agent) + hook system (notifications) + git (audit trail and rollback).
Related
exe-dev · claude-code · multi-agent-setup · agentic-workflows