Operations Runbook

This document captures the operational playbooks for the MerchantsOfHope Supply & Demand Portal. It is intended for on-call engineers and SREs maintaining the platform across Coolify environments.

1. Service Topology

  • Backend API (merchantsofhope-supplyanddemandportal-backend)
    • Node.js 18, Express server on port 3001.
    • The entrypoint waits for PostgreSQL, runs migrations, and optionally seeds data (RUN_SEED).
    • Health probe: GET /api/health.
    • Persistent uploads stored at /app/uploads/resumes (mounted volume required in production).
  • Frontend (merchantsofhope-supplyanddemandportal-frontend)
    • React 18 application served by the CRA dev server in development or as a static bundle in the production image.
    • Communicates with the backend via REACT_APP_API_URL (set to the internal service URL).
  • PostgreSQL (merchantsofhope-supplyanddemandportal-database)
    • PostgreSQL 15, health-checked via pg_isready.
    • Volume-backed data directory merchantsofhope-supplyanddemandportal-postgres-data.
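
The backend boot order described above (wait for PostgreSQL, run migrations, optionally seed) can be sketched as a polling loop with a deadline. This is an illustrative Python sketch, not the project's actual Node.js entrypoint: the names wait_until_ready and plan_boot_steps, the step labels, and the one-second polling interval are assumptions; the 60000 ms timeout mirrors the DB_WAIT_TIMEOUT_MS default.

```python
import time
from typing import Callable, List


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_ms: int = 60000,   # mirrors the DB_WAIT_TIMEOUT_MS default
    interval_ms: int = 1000,   # assumed polling interval, not from the runbook
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or the deadline passes."""
    deadline = clock() + timeout_ms / 1000
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_ms / 1000)


def plan_boot_steps(run_migrations: bool = True, run_seed: bool = False) -> List[str]:
    """Return the ordered boot steps implied by RUN_MIGRATIONS and RUN_SEED."""
    steps = ["wait_for_db"]
    if run_migrations:
        steps.append("migrate")
    if run_seed:
        steps.append("seed")
    steps.append("start_server")
    return steps
```

In practice the probe would wrap something like pg_isready; injecting the clock and sleep functions keeps the loop testable without real delays.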

2. Environment Variables

| Variable | Purpose | Default |
|---|---|---|
| POSTGRES_* | Database credentials used by backend and DB container | See .env.example |
| DATABASE_URL | Overrides assembled connection string | Derived automatically |
| JWT_SECRET | Required for signing auth tokens | none (must be supplied) |
| RATE_LIMIT_MAX, RATE_LIMIT_WINDOW_MS | Express rate limiter configuration | 100 req / 15 min |
| DB_POOL_MAX, DB_POOL_IDLE_MS, DB_POOL_CONNECTION_TIMEOUT_MS | pg connection pool tuning | 10 / 30000 / 5000 |
| DB_WAIT_TIMEOUT_MS | Maximum wait for database readiness in entrypoint | 60000 |
| RUN_MIGRATIONS | Run schema migrations on container boot | true |
| RUN_SEED | Run seed data on container boot | false |
| USE_DOCKER_TEST_DB | Jest helper flag (set to false in CI to reuse managed Postgres) | true locally |
| UPLOAD_DIR | Resume storage path | uploads/resumes |
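
A pre-deploy sanity check over these variables might look like the sketch below. It is a hypothetical operator-side helper, not code from the backend: load_settings is an invented name, the rule that only the string "true" enables a flag is an assumption, and the 900000 ms window is simply 15 minutes expressed in milliseconds.

```python
import os
from typing import Mapping

# Defaults copied from the table above. The parsing rules are assumptions.
INT_DEFAULTS = {
    "RATE_LIMIT_MAX": 100,
    "RATE_LIMIT_WINDOW_MS": 15 * 60 * 1000,  # 15 min in milliseconds
    "DB_POOL_MAX": 10,
    "DB_POOL_IDLE_MS": 30000,
    "DB_POOL_CONNECTION_TIMEOUT_MS": 5000,
    "DB_WAIT_TIMEOUT_MS": 60000,
}
BOOL_DEFAULTS = {"RUN_MIGRATIONS": True, "RUN_SEED": False}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve settings from `env`, falling back to the documented defaults."""
    if not env.get("JWT_SECRET"):
        raise ValueError("JWT_SECRET has no default and must be supplied")
    settings = {}
    for name, default in INT_DEFAULTS.items():
        # int() raises ValueError on non-numeric input, surfacing typos early.
        settings[name] = int(env.get(name, default))
    for name, default in BOOL_DEFAULTS.items():
        raw = env.get(name)
        settings[name] = default if raw is None else raw.strip().lower() == "true"
    return settings
```

Calling load_settings() with no arguments reads the current process environment; passing a plain dict makes it easy to check a candidate configuration before pasting it into Coolify.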

3. Deployments (Coolify)

  1. Ensure the Gitea pipeline has published new backend/frontend images (see workflow summary for SHA tags).
  2. In Coolify, update BACKEND_IMAGE / FRONTEND_IMAGE environment variables to the new tags.
  3. Trigger a deployment; Coolify will:
    • Bring up PostgreSQL (if not already running).
    • Start backend, wait for DB, run migrations, and expose /api/health.
    • Start frontend once backend healthcheck passes.
  4. Post-deploy checks:
    • curl https://<domain>/api/health returns 200 with JSON payload.
    • Frontend login screen reachable.
    • Review container logs for migration output (docker compose logs backend in Coolify shell).
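
The health check in step 4 can be scripted. The sketch below uses only the Python standard library and encodes exactly what the runbook states: an HTTP 200 whose body parses as JSON. The function names are hypothetical, and because the payload's fields are not documented here, nothing inside the JSON is inspected.

```python
import json
import urllib.error
import urllib.request


def is_healthy(status: int, body: bytes) -> bool:
    """Apply the runbook's check: HTTP 200 and a body that parses as JSON."""
    if status != 200:
        return False
    try:
        json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad encodings
        return False
    return True


def check_health(base_url: str, timeout_s: float = 5.0) -> bool:
    """GET <base_url>/api/health and evaluate the response."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return is_healthy(resp.status, resp.read())
    except (urllib.error.URLError, OSError):
        # Non-2xx responses, refused connections, and timeouts all count as unhealthy.
        return False
```

For example, check_health("https://<domain>") with the real domain substituted returns True only when the deployment answers as described above.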

4. Rollback Procedure

  1. Identify the previous known-good image tags (from Gitea workflow history or Coolify activity log).
  2. Update BACKEND_IMAGE / FRONTEND_IMAGE to the old tags.
  3. Redeploy in Coolify. Migrations are idempotent, so the entrypoint can re-run them on boot with no additional action.
  4. Validate health endpoints and smoke-test the UI.

5. Local Development

  • Run docker compose up --build to start the stack. The backend container waits for PostgreSQL, runs migrations automatically, and skips seeding by default. To seed once, run RUN_SEED=true docker compose up backend or execute docker compose exec ... npm run seed manually.
  • ./scripts/run-ci-tests.sh runs lint + unit tests with the same coverage thresholds as CI.
  • Backend tests rely on Docker; ensure Docker Desktop/Engine is running.

6. Backup & Restore

Database

  • Run docker compose exec merchantsofhope-supplyanddemandportal-database pg_dump -U <user> <db> and redirect its output to a file; pg_dump writes the dump to stdout.
  • Restore by piping the dump file into psql inside the running container.
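
The dump and restore commands above can be assembled programmatically, which helps when scheduling backups. This is a sketch under assumptions: the helper names are hypothetical, the user and database values are placeholders, and the -T flag (which disables pseudo-TTY allocation in docker compose exec so output can be redirected cleanly) is an addition not mentioned in this runbook.

```python
import shlex
import subprocess
from typing import List

DB_SERVICE = "merchantsofhope-supplyanddemandportal-database"  # name used in this runbook


def dump_command(user: str, db: str, service: str = DB_SERVICE) -> List[str]:
    """argv for a plain-SQL dump written to stdout."""
    return ["docker", "compose", "exec", "-T", service, "pg_dump", "-U", user, db]


def restore_command(user: str, db: str, service: str = DB_SERVICE) -> List[str]:
    """argv for psql reading the SQL dump from stdin."""
    return ["docker", "compose", "exec", "-T", service, "psql", "-U", user, "-d", db]


def as_shell(argv: List[str], redirect: str) -> str:
    """Render argv plus a redirection as a copy-pasteable shell line."""
    return shlex.join(argv) + " " + redirect


def run_dump(path: str, user: str, db: str) -> None:
    """Write the dump to `path` on the host; raises if the command fails."""
    with open(path, "wb") as out:
        subprocess.run(dump_command(user, db), stdout=out, check=True)


def run_restore(path: str, user: str, db: str) -> None:
    """Feed the dump at `path` into psql inside the container."""
    with open(path, "rb") as src:
        subprocess.run(restore_command(user, db), stdin=src, check=True)
```

Building the commands as argument lists avoids shell quoting issues; as_shell is only for printing an equivalent one-liner, such as the dump command followed by "> backup.sql".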

Uploads

  • Archive the merchantsofhope-supplyanddemandportal-uploads volume (Coolify: Settings → Backups → Volume Snapshot).

7. Monitoring & Alerting

  • Healthcheck endpoints should be wired into external monitoring (e.g., Uptime Kuma, Grafana Cloud).
  • Rate limiter defaults protect against bursts; adjust RATE_LIMIT_MAX / RATE_LIMIT_WINDOW_MS if legitimate traffic patterns trigger 429s.
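
To reason about when the defaults start returning 429s, it helps to model the limiter. The sketch below assumes a fixed-window counter keyed by client; whether the backend's limiter behaves exactly this way is an assumption, and FixedWindowLimiter is a hypothetical name. The defaults correspond to 100 requests per 900000 ms (15 minutes), or roughly one request every nine seconds on average.

```python
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `max_requests` per `window_ms` for each key."""

    def __init__(self, max_requests: int = 100, window_ms: int = 15 * 60 * 1000):
        self.max_requests = max_requests  # mirrors RATE_LIMIT_MAX
        self.window_ms = window_ms        # mirrors RATE_LIMIT_WINDOW_MS
        self._windows: Dict[str, Tuple[int, int]] = {}  # key -> (window start, count)

    def allow(self, key: str, now_ms: int) -> bool:
        start, count = self._windows.get(key, (now_ms, 0))
        if now_ms - start >= self.window_ms:
            # The previous window has expired: start a fresh one.
            start, count = now_ms, 0
        if count >= self.max_requests:
            self._windows[key] = (start, count)
            return False  # the caller would answer with HTTP 429
        self._windows[key] = (start, count + 1)
        return True
```

Under this model a client that sends its 100 requests in the first few seconds is blocked for the remainder of the window, which is the pattern to look for before raising RATE_LIMIT_MAX.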

8. Incident Response Checklist

  1. Validate Health: curl the backend health endpoint and inspect the Coolify container logs.
  2. Check Database: run docker compose exec ... pg_isready and docker compose exec ... psql -c 'SELECT NOW();'.
  3. Restart Services: in Coolify or locally, redeploy the backend/frontend containers (the entrypoint will re-run migrations safely).
  4. Rollback if Needed: follow the rollback steps above.
  5. Postmortem: capture the root cause and update this runbook with remediation notes.

9. Security Posture

  • JWT secrets must be at least 32 bytes and rotated regularly.
  • Uploaded files are sanitized and stored on disk; configure antivirus scanning if compliance requires it.
  • Rate limiting is enabled globally; consider pairing with IP allowlists at the reverse proxy if stricter controls are needed.
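
One way to produce a secret that meets the 32-byte minimum is Python's secrets module, sketched below. The helper names are hypothetical, and how the backend interprets JWT_SECRET (here assumed to be an opaque string) should be confirmed against the application code.

```python
import secrets

MIN_SECRET_BYTES = 32  # minimum stated in this runbook


def generate_jwt_secret(num_bytes: int = MIN_SECRET_BYTES) -> str:
    """Return a hex string carrying `num_bytes` of cryptographic randomness."""
    if num_bytes < MIN_SECRET_BYTES:
        raise ValueError("secret must carry at least %d bytes" % MIN_SECRET_BYTES)
    return secrets.token_hex(num_bytes)  # two hex characters per random byte


def is_long_enough(secret: str) -> bool:
    """Check an existing secret's encoded length against the minimum."""
    return len(secret.encode("utf-8")) >= MIN_SECRET_BYTES
```

The generated value can be pasted into the JWT_SECRET variable in Coolify; rotating it invalidates tokens signed with the previous secret.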

Keep this runbook updated as infrastructure evolves.