Operations
This page covers production deployment, backup and disaster recovery, and change management for Mantle.
Production Deployment
Using the Helm Chart
The recommended way to run Mantle in production is with the included Helm chart. It configures health probes, resource limits, and environment variables:
helm install mantle charts/mantle \
--set database.url="postgres://mantle:secret@db.internal:5432/mantle?sslmode=require" \
--set encryption.key="your-hex-key"
The Helm chart configures:
- Liveness probe pointing to
/healthz - Readiness probe pointing to
/readyz SIGTERMas the termination signal (aligns with Mantle’s graceful shutdown)
Health Probes
Configure your load balancer or orchestrator to use the health endpoints:
| Probe | Endpoint | Recommended Interval |
|---|---|---|
| Liveness | GET /healthz | 10s |
| Readiness | GET /readyz | 5s |
The readiness probe returns a non-200 status when the database connection is lost, which causes the load balancer to stop routing traffic to the unhealthy instance.
Environment Variables
In production, pass configuration through environment variables rather than config files:
export MANTLE_DATABASE_URL="postgres://mantle:secret@db.internal:5432/mantle?sslmode=require"
export MANTLE_API_ADDRESS=":8080"
export MANTLE_ENCRYPTION_KEY="your-64-char-hex-key"
export MANTLE_LOG_LEVEL="warn"
mantle serve
See Configuration for the full list of environment variables.
Migrations
The server runs migrations automatically on startup. You do not need a separate mantle init step in your deployment pipeline. This is safe to run with multiple replicas — migrations use database-level locking to prevent conflicts.
Backup and Disaster Recovery
Postgres is Mantle’s single point of state. All workflow definitions, execution history, step checkpoints, encrypted credentials, and audit events live in the database. Recovery from any failure depends on having a good database backup and access to your encryption key.
What to Back Up
| Asset | Location | Notes |
|---|---|---|
| Postgres database | Your database host | Contains all Mantle state: definitions, executions, credentials, audit events |
| Encryption key | MANTLE_ENCRYPTION_KEY env var or mantle.yaml | Required to decrypt credentials; store separately from database backups |
mantle.yaml | Configuration file on disk | Can be reconstructed, but easier to back up |
| Workflow YAML files | Version control (Git) | Authoritative source; can be re-applied with mantle apply |
Critical: Store the encryption key separately from database backups. If an attacker obtains both the database dump and the encryption key, they can decrypt all stored credentials.
Recommended Backup Approach
Managed Postgres (RDS, Cloud SQL, Azure Database):
- Enable automated daily snapshots with your cloud provider
- Enable WAL archiving (point-in-time recovery) for near-zero RPO
- Retain snapshots according to your compliance requirements
Self-hosted Postgres:
- Schedule
pg_dumpon a cron job (e.g., daily or hourly depending on RPO requirements):
pg_dump -Fc -h localhost -U mantle mantle > /backups/mantle-$(date +%Y%m%d-%H%M%S).dump
- Configure WAL archiving for continuous point-in-time recovery
- Ship backups to off-site storage (S3, GCS, or another region)
Recovery Procedure
- Restore Postgres from backup. Use your cloud provider’s restore flow or
pg_restore:
pg_restore -h localhost -U mantle -d mantle /backups/mantle-20260322-120000.dump
- Verify migration state. Run
mantle initto confirm all migrations are applied:
mantle init
- Verify credentials. Confirm the encryption key matches the one used when the backup was taken:
mantle secrets list
- Resume the server:
mantle serve
- Re-apply workflow YAML files if needed. Workflow definitions are stored in Postgres, so a database restore brings them back. However, if you need to apply changes that were made after the backup was taken, re-apply from version control:
mantle apply workflows/*.yaml
RPO and RTO Guidance
| Backup Strategy | Recovery Point Objective (RPO) | Recovery Time Objective (RTO) |
|---|---|---|
Daily pg_dump | Up to 24 hours of data loss | Minutes (restore + restart) |
Hourly pg_dump | Up to 1 hour of data loss | Minutes |
| WAL archiving (PITR) | Near-zero (seconds of data loss) | Minutes to restore to a point in time |
| Managed snapshots + WAL | Near-zero | Depends on cloud provider restore time |
For production deployments, WAL archiving combined with periodic base backups gives the best balance of RPO and operational simplicity.
Change Management
All changes to Mantle’s codebase and deployment follow a controlled process.
- Pull request review. All changes go through PR review on GitHub. At least one approval is required before merging to
main. - CI must pass. Every PR runs the full CI pipeline:
go test,go vet,golangci-lint,govulncheck, andgosec. PRs with failing checks are not merged. - Production deployments. Deploy via Helm with migration hooks. The Helm chart runs
mantle initas a pre-install/pre-upgrade hook to apply database migrations before the new binary starts serving traffic. - Rollback. If a deployment introduces a problem, roll back with
helm rollbackand verify the migration state:
helm rollback mantle <revision>
mantle migrate status
Mantle migrations are forward-only by design. Rolling back the binary to an older version is safe as long as the database schema is compatible. Check mantle migrate status to confirm.