How to Troubleshoot Common Failures

Each section below covers one failure — the log line you see, why it happens, and what to do. For the full error catalog with every error kind and retry formula, see the error reference.

Agent won't start

```
level=ERROR msg="worker run failed, non-retryable, releasing claim" error="agent: agent_not_found: claude not found in PATH"
```

The agent binary isn't installed or isn't on PATH.

Check whether the binary exists:
```
which claude
```

If it's installed under a different name or path, set agent.command:

```
agent:
  kind: claude-code
  command: /usr/local/bin/claude-code
```

For SSH workers, the binary must exist on every remote host. Exit code 127 in the logs is the shell's "command not found" status, which means a remote host is missing the binary. Check each host:
```
ssh build01.internal "which claude && echo ok"
```
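
With several SSH workers, checking hosts one at a time gets tedious. The following is a minimal sketch in POSIX sh, assuming key-based SSH access; `check_agent` and `check_hosts` are hypothetical helper names, not part of Sortie. `check_agent` resolves a command the way a shell does, and `check_hosts` repeats the lookup on each remote host.

```shell
# check_agent CMD: report whether CMD resolves to an executable on this machine.
# CMD may be a bare name (searched on PATH) or an absolute path, as in the
# agent.command example above.
check_agent() {
  agent_path=$(command -v "$1" 2>/dev/null || true)
  if [ -n "$agent_path" ] && [ -x "$agent_path" ]; then
    echo "ok: $1 -> $agent_path"
  else
    echo "missing: $1"
    return 127  # the status a shell uses for "command not found"
  fi
}

# check_hosts CMD HOST...: repeat the lookup on each SSH worker.
# BatchMode makes ssh fail instead of prompting for a password.
check_hosts() {
  agent_cmd=$1
  shift
  for host in "$@"; do
    if ssh -o BatchMode=yes "$host" "command -v '$agent_cmd'" >/dev/null 2>&1; then
      echo "ok: $host"
    else
      echo "missing or unreachable: $host"
    fi
  done
}
```

For example, `check_agent claude` checks the local machine, and `check_hosts claude build01.internal build02.internal` checks two workers. A host that is unreachable or rejects the key is reported the same way as a missing binary, so confirm SSH access separately if every host fails.
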

Confirm the fix:
```
sortie validate ./WORKFLOW.md
```

Agent crashes on authentication

```
level=ERROR msg="worker run failed, scheduling retry" error="agent: port_exit: exit status 1"
```

Workers start and immediately crash. The underlying cause, a missing ANTHROPIC_API_KEY, surfaces inside the agent subprocess, not in Sortie's error output. This is the most common deployment failure.

Verify the variable is set:
```
echo "${ANTHROPIC_API_KEY:-(unset)}"
```
For AWS Bedrock or Google Vertex AI, verify that every variable your provider requires is set; see the environment variables reference for the full list.

Run with --log-level debug to see the agent's stderr, which contains the actual auth error.
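
The `echo` check above only inspects your current shell. If Sortie was started by a service manager or from another session, its environment can differ from yours. On Linux, `/proc/<pid>/environ` shows the environment a process started with. The following is a Linux-only sketch; `proc_has_var` is a hypothetical helper, and it reports only whether the variable is present so the key never reaches your terminal.

```shell
# proc_has_var PID NAME: report whether process PID started with NAME set
# to a non-empty value, without printing the value itself.
# Linux-only: /proc/PID/environ holds the NUL-separated NAME=value pairs
# the process was started with.
proc_has_var() {
  if [ ! -r "/proc/$1/environ" ]; then
    echo "cannot read /proc/$1/environ: wrong PID, or not the process owner"
    return 2
  fi
  # Turn NULs into newlines, then look for NAME= followed by at least one character.
  if tr '\0' '\n' < "/proc/$1/environ" | grep -q "^$2=."; then
    echo "$2 is set in process $1"
  else
    echo "$2 is missing or empty in process $1"
    return 1
  fi
}
```

For example, `proc_has_var "$(pgrep -o -x sortie)" ANTHROPIC_API_KEY`, assuming the process is named `sortie`. Variables exported after a process starts never appear in its environment, so restart Sortie after fixing how it is launched.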

Tracker returns 401 or 403

```
level=ERROR msg="poll failed" error="tracker: tracker_auth_error: HTTP 401: Unauthorized"
```

The API token is wrong, expired, or lacks required permissions. This error is non-retryable — Sortie stops polling until you fix it.

Verify the environment variable resolves to a non-empty value:
```
echo "${SORTIE_JIRA_API_KEY:-(unset)}"
```

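Test the token directly. Jira Cloud API tokens authenticate with HTTP Basic auth (the Atlassian account email plus the token), not a Bearer header, so substitute the email of the account that owns the token:
```
curl -s -u "you@yourcompany.com:$SORTIE_JIRA_API_KEY" \
  "https://yourcompany.atlassian.net/rest/api/3/myself" | head -5
```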

If you use handoff_state, the token needs write permissions: write:jira-work (classic) or write:issue:jira (granular).
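
The status code narrows the cause. In standard HTTP semantics, 401 means the server rejected the credentials outright, while 403 means it accepted them but the account or token is not allowed to perform the request. A minimal sketch; `explain_tracker_status` is a hypothetical helper that maps a status code to the likely fix.

```shell
# explain_tracker_status CODE: map an HTTP status from the tracker to a likely cause.
# Returns 0 only for 2xx responses.
explain_tracker_status() {
  case "$1" in
    2??) echo "token accepted" ;;
    401) echo "credentials rejected: token is wrong, expired, or sent with the wrong auth scheme"; return 1 ;;
    403) echo "authenticated but forbidden: token lacks the required scopes or project access"; return 1 ;;
    000) echo "no HTTP response: check the URL and network"; return 1 ;;  # curl reports 000 when nothing came back
    *)   echo "unexpected status: $1"; return 1 ;;
  esac
}
```

To get the code, capture it with `code=$(curl -s -o /dev/null -w '%{http_code}' ...)`, keeping the auth flag and URL from the test above, then run `explain_tracker_status "$code"`.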

Template render fails

```
level=ERROR msg="template render error in WORKFLOW.md (line 24): can't evaluate field titel in type map[string]any"
```

Sortie runs templates in strict mode — unknown variables are hard errors. Three common causes:

- Typo in a field name. Check the name against the variable table. The error message names the exact field and line.
- Unguarded nil field. `.issue.parent` is nil when no parent exists. Wrap it: `{{ if .issue.parent }}{{ .issue.parent.identifier }}{{ end }}`
- Dot rebinding inside `range`. Inside `{{ range .issue.labels }}`, `.` is the current element. Use `{{ $.issue.identifier }}` to reach the root.

Run sortie validate ./WORKFLOW.md after every template edit to catch these before runtime.
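
To make that check automatic, a Git pre-commit hook can run it whenever a commit stages WORKFLOW.md. A minimal sketch, assuming WORKFLOW.md sits at the repository root and that `sortie validate` exits non-zero when it finds problems (confirm this for your version). Save it as `.git/hooks/pre-commit` and make it executable with `chmod +x`.

```shell
#!/bin/sh
# .git/hooks/pre-commit: validate WORKFLOW.md before any commit that stages it.
# Assumes WORKFLOW.md sits at the repository root and that
# `sortie validate` exits non-zero when it reports problems.

# touches_workflow: read staged paths on stdin; succeed if WORKFLOW.md is among them.
touches_workflow() {
  grep -qxF 'WORKFLOW.md'
}

if git diff --cached --name-only 2>/dev/null | touches_workflow; then
  if ! sortie validate ./WORKFLOW.md; then
    echo "pre-commit: WORKFLOW.md failed validation; fix it or commit with --no-verify" >&2
    exit 1
  fi
fi
```

The hook validates the working-tree copy, which can differ from the staged one if you stage only part of your changes.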

Workspace won't create

```
level=ERROR msg="workspace create: permission denied: /opt/sortie_workspaces/PROJ-42"
```

Three variants:

- Permission denied. The user running Sortie can't write to `workspace.root`. Fix the directory permissions, or point `workspace.root` at a writable path such as `~/sortie-workspaces`.
- Containment violation (path escapes root). An issue identifier produced a path outside the workspace root, which is a security boundary, so Sortie refuses to create it. Investigate the identifiers in your tracker.
- Disk full. Check free space with `df -h /opt/sortie_workspaces`.
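
The permission and disk-full variants can be checked in one pass before you restart Sortie. A minimal sketch in POSIX sh; `check_workspace_root` is a hypothetical helper, and the 1 GiB default threshold is an arbitrary illustration, not a Sortie requirement. Containment violations depend on issue identifiers, so this check does not cover them.

```shell
# check_workspace_root DIR [MIN_FREE_KB]: check workspace.root for the
# permission and disk-space variants. MIN_FREE_KB defaults to 1 GiB,
# an arbitrary threshold chosen for illustration.
check_workspace_root() {
  root=$1
  min_kb=${2:-1048576}
  if [ ! -d "$root" ]; then
    echo "missing: $root"
    return 1
  fi
  # Creating a per-issue directory inside root needs write and search (x) permission.
  if [ ! -w "$root" ] || [ ! -x "$root" ]; then
    echo "not writable by $(id -un): $root"
    return 1
  fi
  # -P forces POSIX output: one line per filesystem, column 4 = available 1K blocks.
  avail_kb=$(df -Pk "$root" | awk 'NR == 2 { print $4 }')
  if [ -n "$avail_kb" ] && [ "$avail_kb" -lt "$min_kb" ]; then
    echo "low disk: ${avail_kb} KB free for $root"
    return 1
  fi
  echo "ok: $root"
}
```

For example, `check_workspace_root /opt/sortie_workspaces`. Run it as the same user that runs Sortie; a check run as root passes the write test for almost any directory.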

Hook script fails

```
level=WARN msg="worker run failed, scheduling retry" error="hook after_create: run: exit status 128"
```

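A hook exited non-zero. `after_create` and `before_run` failures are fatal for the attempt; `after_run` and `before_remove` failures are logged but ignored. Git exits with status 128 on fatal errors, so for this status, start with the git commands in the hook.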

Run with --log-level debug — Sortie captures the hook's stdout and stderr.

Test the hook manually in a fresh directory (reusing a non-empty directory makes `git clone` itself fail with status 128):
```
cd "$(mktemp -d)"
git clone --depth 1 git@github.com:acme/backend.git .
```

Common causes: SSH key not forwarded, wrong repo URL, missing dependencies.

For timeout errors, increase hooks.timeout_ms in WORKFLOW.md.
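
To reproduce a hook failure, including a timeout, outside Sortie, you can run the hook script in a fresh directory under the same time limit. A minimal sketch; `try_hook` is a hypothetical helper, it assumes GNU coreutils `timeout` (exit status 124 on timeout), and running the script with `sh` in an empty directory approximates how Sortie runs hooks rather than reproducing it exactly.

```shell
# try_hook SCRIPT [TIMEOUT_MS]: run a hook script in a fresh scratch directory
# with a time limit, then report how it ended. Requires GNU coreutils `timeout`.
try_hook() {
  script=$1
  timeout_ms=${2:-60000}   # placeholder; pass your hooks.timeout_ms value
  # Make a relative script path absolute before changing directory.
  case $script in /*) ;; *) script=$PWD/$script ;; esac
  # GNU timeout accepts fractional seconds, so convert milliseconds exactly.
  secs=$(awk -v ms="$timeout_ms" 'BEGIN { printf "%.3f", ms / 1000 }')
  scratch=$(mktemp -d)
  status=0
  # Subshell keeps the cd local; the hook's stdout and stderr stay visible.
  ( cd "$scratch" && timeout "$secs" sh "$script" ) || status=$?
  if [ "$status" -eq 0 ]; then
    echo "hook succeeded in $scratch"
  elif [ "$status" -eq 124 ]; then
    echo "hook timed out after ${timeout_ms} ms"
  else
    echo "hook exited with status $status"
  fi
  return "$status"
}
```

For example, save your `after_create` commands to `after_create.sh` and run `try_hook after_create.sh 30000` with your `hooks.timeout_ms` value. Each call leaves its scratch directory behind for inspection.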

Issues not being dispatched

```
level=INFO msg="tick completed" candidates=0 dispatched=0 running=0 retrying=0
```

Sortie is polling but finds nothing to dispatch.
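
The counters in that tick line already narrow the search. A minimal sketch; `tick_field` and `triage_tick` are hypothetical helpers, and the reading of the counters is inferred from their names: `candidates` as issues the poll returned, `dispatched` as issues started on that tick.

```shell
# tick_field LINE KEY: print the numeric value of KEY=N from a logfmt line.
tick_field() {
  printf '%s\n' "$1" | sed -n "s/.*[[:space:]]$2=\([0-9][0-9]*\).*/\1/p"
}

# triage_tick LINE: suggest which cause to check first for a "tick completed" line.
triage_tick() {
  candidates=$(tick_field "$1" candidates)
  dispatched=$(tick_field "$1" dispatched)
  if [ -z "$candidates" ] || [ -z "$dispatched" ]; then
    echo "not a tick completed line"
    return 2
  fi
  if [ "$candidates" -eq 0 ]; then
    echo "no candidates: check state names and tracker.query_filter"
  elif [ "$dispatched" -eq 0 ]; then
    echo "candidates found but none dispatched: check the concurrency cap or run --dry-run for skip_reason"
  else
    echo "dispatching: $dispatched of $candidates candidates"
  fi
}
```

For example, `triage_tick "$(grep 'tick completed' sortie.log | tail -n 1)"`, where `sortie.log` stands in for wherever your logs go.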

- State names must match exactly. Verify `tracker.active_states` matches the state names in your tracker; the comparison is case-sensitive, so "To Do" and "to do" are different states.
- Concurrency cap reached. If `running` equals `agent.max_concurrent_agents`, new issues wait. Increase the cap or wait for running agents to finish.
- Query filter too narrow. A typo in `tracker.query_filter` returns zero results. Use `--dry-run --log-level debug` to see the full query.

Use dry-run to see what Sortie would dispatch:
```
sortie --dry-run ./WORKFLOW.md
```
Each candidate gets a `would_dispatch` or `skip_reason` field in the log.
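
Because state matching is case-sensitive, comparing your configured names against the tracker's exact names catches near-misses. A minimal sketch; `check_states` is a hypothetical helper that takes two files with one state name per line: the values from `tracker.active_states`, and the names copied from your tracker.

```shell
# check_states CONFIGURED_FILE TRACKER_FILE
# For each configured state, report an exact match, a case-only mismatch,
# or no match at all against the tracker's state names.
check_states() {
  states_rc=0
  # The "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r state || [ -n "$state" ]; do
    [ -n "$state" ] || continue        # skip blank lines
    if grep -qxF -- "$state" "$2"; then
      echo "ok: $state"
      continue
    fi
    # Same text ignoring case? Then only capitalization differs.
    match=$(grep -ixF -- "$state" "$2" | head -n 1)
    if [ -n "$match" ]; then
      echo "case mismatch: '$state' (tracker has '$match')"
    else
      echo "not found: '$state'"
    fi
    states_rc=1
  done < "$1"
  return "$states_rc"
}
```

It prints `ok`, `case mismatch`, or `not found` for each configured state and returns non-zero if any state needs attention.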

Sortie won't start at all

```
dispatch preflight failed: tracker.kind is required
```

Sortie validates the config at startup and reports all failures at once. Run sortie validate ./WORKFLOW.md to see every problem. The most common missing fields:

| Field | Required |
|---|---|
| `tracker.kind` | Always |
| `tracker.project` | By the Jira adapter |
| `tracker.api_key` | By the Jira adapter; must be non-empty after `$VAR` expansion |
| `tracker.active_states` or `tracker.terminal_states` | At least one must be non-empty |

If $VAR references aren't resolving, verify the variables are exported in the shell that runs Sortie:

```
env | grep SORTIE
```
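
To check every `$VAR` reference in one pass, you can extract the names from WORKFLOW.md and test each against the exported environment. A minimal sketch; `missing_env_refs` is a hypothetical helper, it matches only UPPER_CASE names written as `$VAR` or `${VAR}`, and it may also flag uppercase `$WORDS` in the prompt text that are not meant as references.

```shell
# missing_env_refs FILE: list $VAR / ${VAR} references in FILE whose variable
# is unset, empty, or not exported in the current shell.
missing_env_refs() {
  env_rc=0
  # Only UPPER_CASE names match, so template syntax like {{ $.issue }} is skipped.
  for name in $(grep -oE '\$[{]?[A-Z_][A-Z0-9_]*' "$1" | tr -d '${' | sort -u); do
    # printenv sees only exported variables, the ones a child process inherits.
    if [ -z "$(printenv "$name")" ]; then
      echo "not exported or empty: $name"
      env_rc=1
    fi
  done
  return "$env_rc"
}
```

For example, run `missing_env_refs ./WORKFLOW.md` in the same shell you use to start Sortie.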

See the workflow configuration reference for every field, default, and constraint.