Overnight Runs

An overnight run leaves an AI coding agent working autonomously for hours while you sleep. The premise: for a large task that would take you 4-8 hours of interactive session work, define the task well, kick it off before bed, and review the results in the morning.

What Makes This Work

Three things have to hold for overnight runs to be productive:

Clear task definition: The agent needs a verifiable success condition. "Refactor the auth module to use JWT" is not good enough. "Refactor the auth module to use JWT, with all existing tests passing and no new TypeScript errors" is.
Persistent environment: The session has to survive unattended. exe-dev provides remote VMs for this. A local machine that sleeps, locks, or loses network connection will kill the session.
Observability: You need to know what happened without reading every step. Notification hooks (see below), commit logs, and test results cover this.
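A verifiable success condition can be reduced to a script that the agent (or you, in the morning) runs and that either passes or fails. The sketch below assumes a Node/TypeScript project where `npm test` and `npx tsc --noEmit` stand in for "all existing tests passing and no TypeScript errors"; the `CHECKS` list and function names are illustrative, not part of any tool.

```python
import subprocess

# Hypothetical check commands; substitute your project's real test and
# type-check commands.
CHECKS = [
    ["npm", "test"],
    ["npx", "tsc", "--noEmit"],
]


def run_checks(checks):
    """Run each command in order and return the ones that did not pass."""
    failed = []
    for cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            failed.append(cmd)  # a missing tool counts as a failed check
            continue
        if result.returncode != 0:
            failed.append(cmd)
    return failed


def success_condition_met(checks):
    """The task counts as done only if every check passes."""
    return not run_checks(checks)
```

Calling `success_condition_met(CHECKS)` gives a single yes/no answer, which is the property the task definition needs.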

Setting Up Notifications

claude-code's hook system is the mechanism for overnight monitoring. Hook events include PreToolUse, PostToolUse, PostToolUseFailure, Stop, SessionStart, SessionEnd, UserPromptSubmit, and FileChanged; the set varies by version, so check the hooks reference for the current list.

A Stop hook fires when the agent finishes responding, which in an unattended run means it has stopped working. In .claude/settings.json:

{
  "hooks": {
    "Stop": [{
      "matcher": "",
      "hooks": [{
        "type": "command",
        "command": "notify-me.sh"
      }]
    }]
  }
}

notify-me.sh can hit a webhook, send a text via Twilio, post to Slack, or email. When you wake up and see a notification, you know the session ended.
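As a rough sketch of what the notification command might do, here is a Python equivalent of notify-me.sh that posts to a Slack-style incoming webhook. It assumes the hook receives a JSON payload on stdin with fields such as `hook_event_name`, `session_id`, and `cwd`, and that the webhook URL is supplied in a `NOTIFY_WEBHOOK_URL` environment variable; that variable name and the function names are made up for this example.

```python
import json
import os
import sys
import urllib.request


def build_message(payload):
    """Turn the hook's JSON payload into a one-line notification."""
    event = payload.get("hook_event_name", "Stop")
    session = payload.get("session_id", "unknown")
    cwd = payload.get("cwd", "unknown")
    return f"Overnight run: {event} fired for session {session} in {cwd}"


def post_webhook(url, text):
    """POST {"text": ...} as JSON, the shape Slack incoming webhooks accept."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Only act when a webhook URL is configured (assumed variable name).
if __name__ == "__main__" and os.environ.get("NOTIFY_WEBHOOK_URL"):
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    post_webhook(os.environ["NOTIFY_WEBHOOK_URL"], build_message(payload))
```

Keeping the message builder separate from the transport makes it easy to swap the webhook for Twilio or email without touching the payload handling.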

Also hook PostToolUse on Bash tool calls to log every command the agent executes, which gives you a full audit trail.
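A minimal sketch of such an audit logger follows. It assumes the PostToolUse payload arrives as JSON on stdin with a `tool_name` field and the shell command under `tool_input.command`; the `AUDIT_LOG_PATH` environment variable and the function names are hypothetical.

```python
import json
import os
import sys
from datetime import datetime, timezone


def format_entry(payload, now=None):
    """Return one log line for a Bash tool call, or None for other tools."""
    if payload.get("tool_name") != "Bash":
        return None
    command = (payload.get("tool_input") or {}).get("command", "")
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    # json.dumps keeps multi-line commands on a single log line.
    return f"{stamp}\t{json.dumps(command)}"


def append_entry(log_path, entry):
    """Append one entry to the audit log, creating the file if needed."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")


# Only log when a destination is configured (assumed variable name).
if __name__ == "__main__" and os.environ.get("AUDIT_LOG_PATH"):
    entry = format_entry(json.loads(sys.stdin.read() or "{}"))
    if entry is not None:
        append_entry(os.environ["AUDIT_LOG_PATH"], entry)
```

Registered under PostToolUse with a matcher of `Bash`, a script like this leaves one timestamped line per command, which is quick to skim in the morning.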

Task Definition Checklist

Before starting an overnight run:

Success condition is verifiable (tests pass, specific output exists, no errors)
Agent has read access to all files it needs
Agent has write access to only the files it should touch
Git is in a clean state, and the agent is told to commit incrementally
Stop hook is configured for notifications
Irreversible operations (deletes, external API calls with side effects) are restricted or confirmed
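Two of the checklist items (clean git state, Stop hook configured) can be checked mechanically before kicking off the run. The following is a sketch only: it assumes the settings live at `.claude/settings.json` in the repository with the structure shown earlier, and the function names are illustrative.

```python
import json
import subprocess
from pathlib import Path


def git_is_clean(repo_dir):
    """True if `git status --porcelain` prints nothing (no pending changes)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == ""


def has_stop_hook(settings_path):
    """True if the settings file defines at least one Stop hook entry."""
    path = Path(settings_path)
    if not path.is_file():
        return False
    settings = json.loads(path.read_text(encoding="utf-8"))
    groups = settings.get("hooks", {}).get("Stop", [])
    return any(group.get("hooks") for group in groups)


def preflight(repo_dir):
    """Return a list of problems; an empty list means ready to start."""
    problems = []
    if not git_is_clean(repo_dir):
        problems.append("git working tree has uncommitted changes")
    if not has_stop_hook(Path(repo_dir) / ".claude" / "settings.json"):
        problems.append("no Stop hook configured in .claude/settings.json")
    return problems
```

The remaining items (file access scope, restrictions on irreversible operations) depend on how the environment is provisioned and still need a manual look.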

What Goes Wrong

Infinite loops: Agent gets stuck trying to fix a test, makes the same mistake repeatedly. Mitigate with: max token limits, commit-check hooks that fail if no commits in N minutes, task framing that includes "if you're stuck after 3 attempts, stop and report".
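The "no commits in N minutes" check could look roughly like this, run periodically on the VM (from cron or a loop) rather than from the agent itself. It reads the last commit time with `git log -1 --format=%ct`; the function names and the 30-minute threshold in the usage note are assumptions for illustration.

```python
import subprocess
import time


def last_commit_epoch(repo_dir):
    """Unix timestamp (committer date) of the most recent commit."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%ct"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def is_stalled(last_commit, now, max_idle_minutes):
    """True if more than max_idle_minutes have passed since the last commit."""
    return (now - last_commit) > max_idle_minutes * 60


def check_repo(repo_dir, max_idle_minutes):
    """Convenience wrapper: compare the repo's last commit against the clock."""
    return is_stalled(last_commit_epoch(repo_dir), time.time(), max_idle_minutes)
```

For example, `check_repo(".", 30)` returning True is a signal to send a notification or stop the session. This only works if the agent has been told to commit incrementally.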

Context exhaustion: Very long sessions exhaust context even with compaction. Break large tasks into sessions of 2-4 hours max. Checkpoint between sessions.

Cascading errors: A wrong early decision creates downstream problems. Include rollback instructions: "if fewer tests pass than at the start, revert and stop."
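One way to make the rollback rule checkable is to record which tests pass before the run and compare against that baseline later. The sketch below uses a stricter reading of the rule (any test that passed at the start and no longer passes triggers a revert); the function names and test identifiers are hypothetical, and how you collect the lists of passing tests depends on your test runner.

```python
def regressions(baseline_passing, current_passing):
    """Tests that passed at the start of the run but do not pass now."""
    return sorted(set(baseline_passing) - set(current_passing))


def should_revert(baseline_passing, current_passing):
    """Revert if any previously passing test has been broken."""
    return bool(regressions(baseline_passing, current_passing))
```

With a baseline of `["auth::login", "auth::logout"]` and a current result of only `["auth::login"]`, `regressions` reports `["auth::logout"]` and `should_revert` is True. Newly added passing tests do not affect the outcome.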

Silent failure: Agent completes "successfully" but the result is wrong. Always verify against the success condition before declaring the task done.

Good Overnight Run Tasks

Comprehensive test coverage for an existing module (clear metric: coverage %)
Migrating a codebase from one library version to another
Implementing a well-specified feature with existing test suite
Refactoring to a new pattern across many files
Documentation generation from code

Bad Overnight Run Tasks

Exploratory architecture work (needs human judgment at decision points)
Tasks with external side effects that are hard to reverse
Open-ended "improve this" tasks without clear metrics
Tasks requiring credentials/access that aren't set up in the environment

Stack

exe-dev (persistent VM) + claude-code (agent) + hook system (notifications) + git (audit trail and rollback).
