Resume Sessions Across Restarts

How to Resume Agent Sessions Across Restarts

Keep Sortie’s state intact across planned restarts and unexpected crashes — no manual intervention, no lost work, no duplicated effort.

Prerequisites

A working Sortie setup (quick start)
Sortie running against a real or file-based tracker with at least one dispatched issue
Familiarity with your WORKFLOW.md configuration

What survives a restart

Everything. Sortie stores all durable state in SQLite, not in memory. A restart is equivalent to closing and reopening the database file. Three tables hold the data that matters:

Table	What it stores	Why it matters after restart
`retry_entries`	Pending retries: issue ID, attempt number, scheduled fire time	Retries resume at the correct position in the backoff sequence. Overdue retries fire immediately on startup.
`run_history`	Completed runs: issue ID, attempt, status, timestamps, workspace path	The `max_sessions` budget check queries this table. After restart, Sortie knows exactly how many sessions each issue has used — no counter resets.
`session_metadata`	Last session ID, token counters, model name, API request count	Enables agent session resume (e.g., the `--resume` flag for Claude Code). When the same issue is dispatched again, the adapter can pick up the previous session.

The key insight: Sortie never holds state that only exists in memory. Retry attempt counts, session budgets, and token tallies all come from SQLite queries. Kill the process at any point and nothing is lost.

What happens to in-flight sessions

When Sortie stops — whether from Ctrl+C, a termination signal, or a crash — any running agent processes receive a graceful shutdown signal, then are force-terminated after a 30-second grace period. The issues those agents were working on are left in a recoverable state:

Their tracker status hasn’t changed (still “In Progress” or whatever your active state is)
Their workspace directories remain on disk, untouched
They may or may not have a pending retry entry in SQLite, depending on when the stop happened

Here’s what the startup sequence does to pick them back up:

Sortie opens the database and loads all retry entries. Overdue entries (where the fire time has passed) are marked for immediate dispatch.
The poll loop starts and fetches candidate issues from the tracker.
Previously in-flight issues appear as candidates — they’re still in an active tracker state.
Sortie dispatches them again, reusing existing workspace directories.
The before_run hook runs in the existing workspace (for example, git pull to bring the workspace up to date).
The agent starts in that workspace with all previous work preserved on disk.

No special configuration is needed for this to work. It’s the default behavior.

If your before_run hook does a git fetch && git reset, the agent picks up exactly where the previous session left off. If you haven’t configured hooks, the workspace contains whatever files the agent wrote before the process stopped.

What happens to handoff-stage PRs

The poll-based recovery in the previous section finds in-flight issues because they’re still in an active tracker state. Handoff-stage issues are different: the agent has finished, the PR is open, and the tracker issue has moved to your tracker.handoff_state (for example, “In Review”). The dispatch loop no longer sees these issues as candidates, so a restart used to leave them with no pending CI or review polling until the reviewer manually pushed them back to an active state.

On startup, Sortie now reconstructs review and CI pending entries for handoff-stage issues from three persisted sources:

The latest successful run per issue in SQLite run_history.
The current tracker state for those issues (one batched FetchIssueStatesByIDs call).
The workspace’s existing .sortie/scm.json (PR coordinates for review, branch for CI).

After the recovery summary line, normal review and CI polling resumes against those entries on the next reconcile tick. Reviewer comments left while the process was down are picked up within one review poll interval. No operator action is required.

Recovery is bounded so startup stays cheap:

The candidate set is the most recent 200 unique issues with a successful run in the last 30 days.
An issue ages out when the SCM activity is older than 30 days. Sortie reads the pushed_at field from .sortie/scm.json for this check; if it’s missing, it falls back to the run’s completed_at timestamp.
If a successful run is older than 30 days and the PR has had no fresh push, recovery skips it. Move the issue back to an active tracker state to re-engage it.

Write pushed_at from the after_run hook that pushes the PR so handoffs age out on the most recent push instead of the agent’s completion time. See Configure review feedback and Configure CI feedback for the hook examples.

Check the startup log line to confirm recovery ran:

time=2026-05-14T10:00:01.500+00:00 level=INFO msg="pending reaction recovery completed" candidates=3 cap_skipped=0 state_checked=3 review_recovered=2 ci_recovered=2 stale_skipped=0 skipped=1

review_recovered and ci_recovered show what was reinstated. stale_skipped counts handoff candidates that aged out. skipped counts candidates excluded for any other reason (terminal state, claim conflict, missing PR coordinates, malformed metadata).

Design your hooks for restartability

Workspace paths are deterministic. Issue PROJ-42 always maps to the same directory: <workspace_root>/PROJ-42. The first dispatch creates it; every subsequent dispatch reuses it — including dispatches after a restart.

This means your hooks need to handle both cases:

after_create runs once, when the workspace directory is brand new. Use it for one-time setup like cloning a repository.
before_run runs before every agent attempt, including post-restart dispatches. Use it to refresh the workspace.
after_run runs after every agent attempt. Use it to preserve work.

Here’s a hook configuration that makes restarts seamless:

# WORKFLOW.md
workspace:
  root: /var/lib/sortie/workspaces
  hooks:
    after_create: |
      git clone git@github.com:acme/backend.git .
    before_run: |
      git fetch origin main && git reset --hard origin/main
    after_run: |
      git add -A && git commit -m "sortie: {{.issue.identifier}}" --allow-empty && git push

The pattern: after_create clones fresh. before_run pulls latest. after_run commits and pushes. After a restart, the workspace already exists, so after_create is skipped. before_run refreshes the checkout, and the agent starts with a clean working tree on top of any previously pushed commits.

For deeper coverage of hook patterns, see Set Up Workspace Hooks.

Verify persistence is working

Check the database file

Sortie creates a .sortie.db file in the same directory as your WORKFLOW.md. Confirm it exists after your first run:

ls -la /etc/sortie/.sortie.db

-rw-r--r-- 1 sortie sortie 32768 Mar 29 10:15 /etc/sortie/.sortie.db

If you’ve configured a custom db_path in your workflow file, check that path instead.

Read the startup logs

On startup, Sortie logs the database path and retry recovery. Look for these lines:

time=2026-03-29T10:00:01.100+00:00 level=INFO msg="database path resolved" db_path=/etc/sortie/.sortie.db

If there were pending retries from the previous run, you’ll see the orchestrator reconstruct them before entering the main loop. Issues with overdue retries fire immediately on the first tick.

For more on reading Sortie’s logs, see Monitor with Logs.

Use the dashboard

The built-in dashboard reads directly from SQLite. Run history, active sessions, and pending retries all reflect persisted state — they survive restarts along with everything else. See the Dashboard reference.

Test it yourself

The most convincing verification is a manual test:

# Start Sortie
sortie start ./WORKFLOW.md

# Wait for it to dispatch at least one issue (watch the logs)
# Then stop it
# Ctrl+C

# Restart immediately
sortie start ./WORKFLOW.md

After restart, confirm in the logs that:

The previously dispatched issue appears as a candidate on the first poll tick
The workspace is reused (look for workspace prepared without after_create firing)
If a retry was pending, the retry entry is loaded and the timer is reconstructed

What we configured

Nothing, actually. Persistence and restart recovery are built into Sortie’s default behavior. What you got from this guide:

Confidence that no work is lost. Retry queues, session budgets, and token accounting all survive restarts because they live in SQLite.
Understanding of the restart sequence. In-flight issues are re-discovered through the normal poll loop and dispatched into their existing workspaces.
A hook pattern for seamless restarts. after_create for one-time setup, before_run for refresh, after_run for preservation.
Verification steps. Database file check, startup log messages, dashboard inspection, and a manual restart test.

Related guides

Configure Retry Behavior — tune backoff timing and session budgets
Set Up Workspace Hooks — hook writing patterns and lifecycle details
Run as a systemd Service — automatic restart on failure with Restart=on-failure
Monitor with Logs — filter and interpret startup recovery messages

Was this page helpful?

Configure Dispatch Rules Control Agent Costs