Run Sortie in a Kubernetes cluster using plain manifests — a Deployment, PersistentVolumeClaim, ConfigMap, Service, and Secret. Sortie uses SQLite for persistence, so deployments are limited to a single replica. The manifests enforce this constraint with a Recreate strategy and a ReadWriteOnce volume.
Prerequisites¶
- A Kubernetes cluster (1.25+) with `kubectl` configured
- An agent-specific container image pushed to a registry your cluster can pull from (how to build one)
- A tested `WORKFLOW.md` (quick start)
- API credentials for your agent and tracker
Build and push your image¶
Sortie's published image is distroless — it contains only the binary. Build an agent-specific image using one of the example Dockerfiles, then push it to your container registry:
```shell
docker build -f examples/docker/claude-code.Dockerfile -t registry.example.com/sortie-claude:v1.0.0 .
docker push registry.example.com/sortie-claude:v1.0.0
```
For image building details, see How to use Sortie in Docker.
Create the namespace and Secret¶
Store API keys in a Kubernetes Secret. Never put credentials in ConfigMaps or environment variable literals in manifests.
Claude Code with Jira¶
```shell
kubectl create secret generic sortie-secrets \
  --from-literal=ANTHROPIC_API_KEY="sk-..." \
  --from-literal=SORTIE_JIRA_API_KEY="..." \
  --from-literal=SORTIE_JIRA_ENDPOINT="https://your-org.atlassian.net" \
  --from-literal=SORTIE_JIRA_PROJECT="PROJ"
```
Claude Code with GitHub Issues¶
```shell
kubectl create secret generic sortie-secrets \
  --from-literal=ANTHROPIC_API_KEY="sk-..." \
  --from-literal=SORTIE_GITHUB_TOKEN="ghp_..." \
  --from-literal=SORTIE_GITHUB_PROJECT="owner/repo"
```
Copilot with GitHub Issues¶
```shell
kubectl create secret generic sortie-secrets \
  --from-literal=GITHUB_TOKEN="ghp_..." \
  --from-literal=SORTIE_GITHUB_TOKEN="ghp_..." \
  --from-literal=SORTIE_GITHUB_PROJECT="owner/repo"
```
For tracker-specific credential details, see How to connect to Jira or How to connect to GitHub Issues.
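If you prefer declarative manifests to the imperative `kubectl create` commands above, the same Secret can be written in YAML (shown here for the Jira variant). Treat such a file as sensitive: keep it out of version control, or encrypt it with a tool like sealed-secrets before committing.

```yaml
# Declarative equivalent of the Jira variant above.
# The stringData values are placeholders; never commit real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: sortie-secrets
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "sk-..."
  SORTIE_JIRA_API_KEY: "..."
  SORTIE_JIRA_ENDPOINT: "https://your-org.atlassian.net"
  SORTIE_JIRA_PROJECT: "PROJ"
```

Kubernetes base64-encodes `stringData` values into `data` on admission, so the resulting object is identical to one created with `--from-literal`.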
Write the workflow ConfigMap¶
The ConfigMap holds the WORKFLOW.md that Sortie loads at startup. Edit the data section to match your tracker and agent configuration:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: sortie-workflow
  labels:
    app.kubernetes.io/name: sortie
    app.kubernetes.io/component: orchestrator
    app.kubernetes.io/part-of: sortie
data:
  WORKFLOW.md: |
    ---
    tracker:
      kind: jira
      endpoint: $SORTIE_JIRA_ENDPOINT
      api_key: $SORTIE_JIRA_API_KEY
      project: $SORTIE_JIRA_PROJECT
      query_filter: "labels = 'agent-ready'"
      active_states:
        - To Do
        - In Progress
      in_progress_state: In Progress
      handoff_state: Human Review
      terminal_states:
        - Done
        - Won't Do
    polling:
      interval_ms: 45000
    db_path: /home/sortie/data/.sortie.db
    workspace:
      root: /home/sortie/data/workspaces
    agent:
      kind: claude-code
      command: claude
    max_concurrent_agents: 2
    server:
      port: 7678
    ---
    You are a senior engineer working on {{ .issue.identifier }}: {{ .issue.title }}

    {{ if .issue.description }}
    ## Description
    {{ .issue.description }}
    {{ end }}
```
Tracker credentials use $VAR syntax — Sortie expands environment variables at runtime from the Secret. The workflow file itself contains no sensitive values.
For the full list of configuration fields, see the WORKFLOW.md configuration reference. For prompt template syntax, see How to write prompt templates.
Create the PersistentVolumeClaim¶
SQLite requires exclusive filesystem access. The PVC must use ReadWriteOnce:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sortie-data
  labels:
    app.kubernetes.io/name: sortie
    app.kubernetes.io/component: orchestrator
    app.kubernetes.io/part-of: sortie
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```
The 1Gi default is enough for months of run history and retry state. The SQLite database is small — a few megabytes even with thousands of completed sessions. The workspace root (where agents clone repos) lives inside this volume too, so increase the size if your repositories are large or you run many concurrent agents.
If your cluster has multiple storage classes, specify one explicitly:
```yaml
spec:
  storageClassName: standard-rwo
  accessModes:
    - ReadWriteOnce
```
For background on what Sortie persists and why it matters, see Why persistence changes everything.
Deploy the application¶
The Deployment runs a single replica with Recreate strategy. SQLite does not support concurrent writers, so scaling beyond one replica corrupts the database.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sortie
  labels:
    app.kubernetes.io/name: sortie
    app.kubernetes.io/component: orchestrator
    app.kubernetes.io/part-of: sortie
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: sortie
  template:
    metadata:
      labels:
        app.kubernetes.io/name: sortie
        app.kubernetes.io/component: orchestrator
        app.kubernetes.io/part-of: sortie
    spec:
      terminationGracePeriodSeconds: 30
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: sortie
          image: registry.example.com/sortie-claude:v1.0.0
          args:
            - "--host"
            - "0.0.0.0"
            - "--log-format"
            - "json"
            - "/home/sortie/config/WORKFLOW.md"
          env:
            - name: SORTIE_DB_PATH
              value: /home/sortie/data/.sortie.db
          ports:
            - name: http
              containerPort: 7678
              protocol: TCP
          envFrom:
            - secretRef:
                name: sortie-secrets
                optional: false
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          startupProbe:
            httpGet:
              path: /readyz
              port: http
            failureThreshold: 30
            periodSeconds: 2
          livenessProbe:
            httpGet:
              path: /livez
              port: http
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /readyz
              port: http
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          volumeMounts:
            - name: data
              mountPath: /home/sortie/data
            - name: workflow
              mountPath: /home/sortie/config
              readOnly: true
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: sortie-data
        - name: workflow
          configMap:
            name: sortie-workflow
        - name: tmp
          emptyDir:
            sizeLimit: 64Mi
```
Key decisions in this manifest:
| Setting | Rationale |
|---|---|
| `replicas: 1` / `Recreate` | SQLite requires exclusive access — no rolling updates, no concurrent pods |
| `runAsNonRoot` / UID 1000 | Matches the `sortie` user created in agent Dockerfiles. Claude Code refuses to run as root. |
| `readOnlyRootFilesystem` | Write access is restricted to the PVC mount and `/tmp`. Limits the blast radius if the container is compromised. |
| `fsGroup: 1000` | Kubernetes sets group ownership on the PVC to match, so the non-root user can write to it |
| `--host 0.0.0.0` | Binds the HTTP server to all interfaces so probes and the Service can reach it |
| `--log-format json` | Produces newline-delimited JSON for log aggregation. See How to monitor with logs. |
| `SORTIE_DB_PATH` env var | Configures the SQLite database path on the PVC. Also set via `db_path` in the workflow. |
| `/tmp` emptyDir | Some agent subprocesses and Go's `os.CreateTemp` need a writable temp directory |
Replace registry.example.com/sortie-claude:v1.0.0 with your actual image reference.
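One way to keep that image reference in a single place is a kustomize overlay. A minimal sketch, assuming the example manifests are used as the base (paths and the new tag are illustrative):

```yaml
# kustomization.yaml (illustrative paths and tag; adjust to your repo layout)
resources:
  - pvc.yaml
  - configmap.yaml
  - deployment.yaml
  - service.yaml
images:
  - name: registry.example.com/sortie-claude
    newTag: v1.1.0
```

Running `kubectl apply -k .` then applies all four manifests with the image tag rewritten, so version bumps are a one-line change.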
Expose the Service¶
A ClusterIP Service exposes the HTTP observability server within the cluster:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: sortie
  labels:
    app.kubernetes.io/name: sortie
    app.kubernetes.io/component: orchestrator
    app.kubernetes.io/part-of: sortie
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/name: sortie
    app.kubernetes.io/component: orchestrator
  ports:
    - name: http
      port: 7678
      targetPort: http
      protocol: TCP
```
This Service gives in-cluster access to the HTML dashboard, JSON API, Prometheus metrics, and health probes. To expose the dashboard externally, add an Ingress or LoadBalancer in front of it.
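As an example of the Ingress route, here is a minimal sketch; the hostname and `ingressClassName` are placeholders for your environment, and nothing in it is Sortie-specific:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sortie
spec:
  ingressClassName: nginx        # placeholder: use your cluster's ingress class
  rules:
    - host: sortie.example.com   # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sortie
                port:
                  name: http
```

If the dashboard should not be reachable by everyone who can resolve the hostname, add TLS and authentication at the ingress layer.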
Apply the manifests¶
Apply everything at once from the example directory:
```shell
kubectl apply -f examples/k8s/
```
Or apply each manifest individually in order:
```shell
kubectl apply -f examples/k8s/pvc.yaml
kubectl apply -f examples/k8s/configmap.yaml
kubectl apply -f examples/k8s/deployment.yaml
kubectl apply -f examples/k8s/service.yaml
```
Verify the deployment¶
Check that the pod starts and passes health probes:
```shell
kubectl get pods -l app.kubernetes.io/name=sortie
```
Expected output:
```
NAME                      READY   STATUS    RESTARTS   AGE
sortie-7b4f9c6d88-x2k4p   1/1     Running   0          45s
```
Inspect the startup logs:
```shell
kubectl logs -l app.kubernetes.io/name=sortie --tail=20
```
Confirm the readiness probe is passing:
```shell
kubectl get endpoints sortie
```
If the ENDPOINTS column shows an IP address, the pod is ready and the Service is routing traffic.
Port-forward to access the dashboard from your workstation:
```shell
kubectl port-forward svc/sortie 7678:7678
```
Open http://localhost:7678 to view the dashboard, or query the API:
```shell
curl -s http://localhost:7678/api/status | jq .
```
Update the workflow¶
To change the workflow without rebuilding the image, edit the ConfigMap:
```shell
kubectl edit configmap sortie-workflow
```
Kubernetes propagates ConfigMap changes to the mounted volume within the kubelet sync period (typically under 60 seconds). Because the ConfigMap is mounted as a directory (not via subPath), updates reach the container filesystem automatically.
Sortie's file watcher may not detect the Kubernetes symlink-swap mechanism that delivers these updates. If the new configuration is not picked up automatically, restart the pod:
```shell
kubectl rollout restart deployment sortie
```
Handle restarts and persistence¶
Sortie stores all durable state — retry queues, run history, session metadata, token counters — in SQLite on the PVC. When Kubernetes reschedules the pod (node drain, OOM kill, manual restart), the new pod mounts the same volume and resumes from the last committed transaction.
Test this by deleting the pod:
```shell
kubectl delete pod -l app.kubernetes.io/name=sortie
```
The Deployment controller recreates it. Check the logs for a warm-start message indicating that existing state was loaded. In-flight agent sessions that were interrupted are marked as timed-out and retried according to your retry configuration.
For a deeper look at what Sortie preserves across restarts, see How to resume sessions across restarts.
Monitor the deployment¶
Prometheus¶
If you run Prometheus in the cluster, add a scrape target or ServiceMonitor for the sortie Service on port 7678 at the /metrics endpoint. See How to monitor with Prometheus for PromQL queries and a Grafana dashboard.
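If you run the Prometheus Operator, a ServiceMonitor for this setup might look like the following sketch. The `release: prometheus` label is an assumption; match whatever label selector your Prometheus instance uses to discover ServiceMonitors.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sortie
  labels:
    release: prometheus   # assumption: match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: sortie
  endpoints:
    - port: http          # named port from the Service above
      path: /metrics
      interval: 30s
```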
Logs¶
JSON-formatted logs integrate with any Kubernetes log aggregation stack — Loki, Datadog, CloudWatch, ELK. Filter by structured fields like issue_id, session_id, or level:
```shell
kubectl logs -l app.kubernetes.io/name=sortie | jq 'select(.level == "ERROR")'
```
See How to monitor with logs for field descriptions and grep/jq patterns.
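As a quick local illustration of that kind of filter, the snippet below runs the same `jq` pattern against a fabricated sample line; the exact field set in real logs depends on your Sortie build.

```shell
# Fabricated sample log line in the shape described above.
line='{"level":"ERROR","issue_id":"PROJ-42","msg":"agent session failed"}'

# Keep only ERROR entries and print the issue and message.
echo "$line" | jq -r 'select(.level == "ERROR") | "\(.issue_id): \(.msg)"'
# → PROJ-42: agent session failed
```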
Production considerations¶
Resource limits¶
The default requests (100m CPU, 256Mi memory) and limits (500m CPU, 512Mi memory) are starting points. Sortie itself is lightweight, but agent subprocesses (Claude Code, Copilot) consume resources too. Monitor actual usage with kubectl top pod and adjust:
```yaml
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "2"
    memory: 2Gi
```
Storage sizing¶
The SQLite database grows slowly — a few megabytes per thousand completed sessions. The workspace root consumes more because it holds cloned repositories. Size the PVC based on the number of concurrent agents and the size of your repositories:
| Scenario | Recommended PVC size |
|---|---|
| 1–2 agents, small repos (< 100 MB each) | 1Gi |
| 2–5 agents, medium repos (100–500 MB each) | 5Gi |
| 5+ agents, large repos or monorepos | 10Gi+ |
Node affinity¶
The PVC uses ReadWriteOnce, which binds it to a single node. If the node goes down, the pod cannot reschedule until the volume detaches. For faster recovery, use a storage class that supports node-independent access (e.g., network-attached block storage like EBS, Persistent Disk, or Ceph RBD).
Security¶
The Deployment manifest follows Kubernetes pod security hardening guidelines:
- Runs as non-root with a fixed UID/GID
- Drops all Linux capabilities
- Uses a read-only root filesystem
- Applies a `RuntimeDefault` seccomp profile
If your cluster enforces Pod Security Standards, the manifest complies with the restricted profile. See Security model for Sortie's workspace isolation guarantees.
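If your CNI enforces NetworkPolicy, you can additionally restrict which workloads reach the pod. A minimal sketch that allows ingress only from pods in the same namespace on the HTTP port; adjust the `from` clause to your topology, and verify that health probes still pass after applying it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sortie
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: sortie
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}    # any pod in the same namespace
      ports:
        - protocol: TCP
          port: 7678
```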
Graceful shutdown¶
Sortie handles SIGTERM for graceful shutdown. The terminationGracePeriodSeconds: 30 gives in-flight agent sessions time to checkpoint before the pod is killed. If your agent sessions are long-running, increase this value to avoid unnecessary retries.
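For example, raising the grace period in the pod template; 600 seconds is an arbitrary illustration, and the right value is one comfortably longer than your slowest expected session checkpoint:

```yaml
spec:
  template:
    spec:
      # Arbitrary example value; must exceed the longest checkpoint time
      terminationGracePeriodSeconds: 600
```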
Troubleshooting¶
Pod stays in Pending state: The PVC cannot be bound. Check that your cluster has a default storage class or that the PVC specifies one explicitly. Run kubectl describe pvc sortie-data to see the binding status.
Pod starts but crashes with CrashLoopBackOff: Inspect logs with kubectl logs -l app.kubernetes.io/name=sortie --previous. Common causes: missing Secret (check kubectl get secret sortie-secrets), invalid WORKFLOW.md syntax (test locally with sortie validate WORKFLOW.md), or wrong image reference.
Readiness probe fails: Sortie's /readyz endpoint returns HTTP 503 if any subsystem is unhealthy — database, workflow validation, or preflight checks. Port-forward and query the endpoint directly to see the per-subsystem status:
```shell
kubectl port-forward svc/sortie 7678:7678
curl -s http://localhost:7678/readyz | jq .
```
Permission denied on the data volume: The fsGroup: 1000 setting should handle ownership, but some storage drivers ignore it. Verify with:
```shell
kubectl exec -it deploy/sortie -- ls -la /home/sortie/data
```
If the directory is owned by root, your storage class may not support fsGroup. Add an init container to fix permissions:
```yaml
initContainers:
  - name: fix-permissions
    image: busybox:1.36
    command: ["sh", "-c", "chown -R 1000:1000 /home/sortie/data"]
    volumeMounts:
      - name: data
        mountPath: /home/sortie/data
    securityContext:
      runAsUser: 0
```
SQLite database locked after crash: This can happen if the pod was killed without a graceful shutdown and the WAL file was not checkpointed. The next startup recovers automatically — SQLite replays the WAL on open. If the pod still fails, delete the -wal and -shm files from the data volume (Sortie recreates them):
```shell
kubectl exec -it deploy/sortie -- rm -f /home/sortie/data/.sortie.db-wal /home/sortie/data/.sortie.db-shm
```
Reference manifests¶
The Sortie repository maintains reference manifests that track the latest proven configuration:
| File | Description |
|---|---|
| `deployment.yaml` | Single-replica Deployment with Recreate strategy |
| `configmap.yaml` | Sample WORKFLOW.md mounted into the container |
| `service.yaml` | ClusterIP Service exposing port 7678 |
| `pvc.yaml` | 1Gi ReadWriteOnce PVC for the SQLite database |