# Operations Runbook
This document captures the operational playbooks for the MerchantsOfHope Supply & Demand Portal. It is intended for on-call engineers and SREs maintaining the platform across Coolify environments.
## 1. Service Topology
- **Backend API (`merchantsofhope-supplyanddemandportal-backend`)**
  - Node.js 18, Express server on port 3001.
  - The entrypoint waits for PostgreSQL, runs migrations, and optionally seeds data (`RUN_SEED`).
  - Health probe: `GET /api/health`.
  - Persistent uploads are stored at `/app/uploads/resumes` (a mounted volume is required in production).
- **Frontend (`merchantsofhope-supplyanddemandportal-frontend`)**
  - React 18 application, served by the Create React App (CRA) dev server in development and as a static bundle in the production image.
  - Communicates with the backend via `REACT_APP_API_URL` (set to the internal service URL).
- **PostgreSQL (`merchantsofhope-supplyanddemandportal-database`)**
  - PostgreSQL 15, health-checked via `pg_isready`.
  - Volume-backed data directory `merchantsofhope-supplyanddemandportal-postgres-data`.
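
The backend boot sequence above (wait for PostgreSQL, migrate, optionally seed) can be pictured as a small entrypoint script. This is a hypothetical sketch, not the repository's actual entrypoint: `RUN_MIGRATIONS`, `RUN_SEED`, `DB_WAIT_TIMEOUT_MS` and `npm run seed` come from this runbook, while `POSTGRES_HOST`, `POSTGRES_PORT`, the `migrate` npm script and the `node server.js` start command are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint sketch. Assumed (not from this runbook): POSTGRES_HOST,
# POSTGRES_PORT, the "migrate" npm script and the "node server.js" start command.

# Poll pg_isready about once per second; give up after roughly DB_WAIT_TIMEOUT_MS.
wait_for_db() {
  local timeout_s=$(( ${DB_WAIT_TIMEOUT_MS:-60000} / 1000 ))
  local waited=0
  until pg_isready -h "${POSTGRES_HOST:-localhost}" -p "${POSTGRES_PORT:-5432}" >/dev/null 2>&1; do
    if (( waited >= timeout_s )); then
      echo "database not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# exec replaces the shell, so the Node process receives container stop signals directly.
start_server() {
  exec node server.js
}

# Boot order described above: wait for the DB, migrate, optionally seed, then start.
boot() {
  wait_for_db || return 1
  if [[ "${RUN_MIGRATIONS:-true}" == "true" ]]; then
    npm run migrate || return 1
  fi
  if [[ "${RUN_SEED:-false}" == "true" ]]; then
    npm run seed || return 1
  fi
  start_server
}

# A real entrypoint would end by calling: boot
```

Keeping each phase in its own function makes the order easy to audit, and `exec` matters in containers because the stop signal then reaches Node rather than a wrapping shell.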
## 2. Environment Variables
| Variable | Purpose | Default |
| --- | --- | --- |
| `POSTGRES_*` | Database credentials used by backend and DB container | See `.env.example` |
| `DATABASE_URL` | Overrides the connection string assembled from `POSTGRES_*` | Assembled from `POSTGRES_*` |
| `JWT_SECRET` | Required for signing auth tokens | none (must be supplied) |
| `RATE_LIMIT_MAX`, `RATE_LIMIT_WINDOW_MS` | Express rate limiter configuration | 100 / 900000 (100 requests per 15 min) |
| `DB_POOL_MAX`, `DB_POOL_IDLE_MS`, `DB_POOL_CONNECTION_TIMEOUT_MS` | pg connection pool tuning | 10 / 30000 / 5000 |
| `DB_WAIT_TIMEOUT_MS` | Maximum wait for database readiness in entrypoint | 60000 |
| `RUN_MIGRATIONS` | Run schema migrations on container boot | `true` |
| `RUN_SEED` | Run seed data on container boot | `false` |
| `USE_DOCKER_TEST_DB` | Jest helper flag (set to `false` in CI to reuse managed Postgres) | `true` locally |
| `UPLOAD_DIR` | Resume storage path | `uploads/resumes` |
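
One way to read the table is as a set of shell defaults. The sketch below, assuming a bash wrapper script, fills in only the values that are still unset. The individual `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT` and `POSTGRES_DB` names and the `postgres://` URL layout are assumptions, so compare them with `.env.example`. The rate-limit window is in milliseconds: 15 min × 60 s × 1000 ms = 900000.

```bash
# Sketch: resolve the documented defaults without overriding values already set.
# The individual POSTGRES_* names are assumptions; compare them with .env.example.
apply_defaults() {
  if [[ -z "${JWT_SECRET:-}" ]]; then
    echo "JWT_SECRET must be supplied" >&2
    return 1
  fi
  : "${RATE_LIMIT_MAX:=100}"
  : "${RATE_LIMIT_WINDOW_MS:=900000}"          # 15 min = 15 * 60 * 1000 ms
  : "${DB_POOL_MAX:=10}"
  : "${DB_POOL_IDLE_MS:=30000}"
  : "${DB_POOL_CONNECTION_TIMEOUT_MS:=5000}"
  : "${DB_WAIT_TIMEOUT_MS:=60000}"
  : "${RUN_MIGRATIONS:=true}"
  : "${RUN_SEED:=false}"
  : "${UPLOAD_DIR:=uploads/resumes}"
  # An explicit DATABASE_URL wins; otherwise assemble one from POSTGRES_*.
  # Passwords containing URL-reserved characters would need percent-encoding.
  : "${DATABASE_URL:=postgres://${POSTGRES_USER:-}:${POSTGRES_PASSWORD:-}@${POSTGRES_HOST:-localhost}:${POSTGRES_PORT:-5432}/${POSTGRES_DB:-}}"
  export RATE_LIMIT_MAX RATE_LIMIT_WINDOW_MS DB_POOL_MAX DB_POOL_IDLE_MS \
    DB_POOL_CONNECTION_TIMEOUT_MS DB_WAIT_TIMEOUT_MS RUN_MIGRATIONS RUN_SEED \
    UPLOAD_DIR DATABASE_URL
}
```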
## 3. Deployments (Coolify)
1. Ensure the Gitea pipeline has published new backend/frontend images (see workflow summary for SHA tags).
2. In Coolify, update `BACKEND_IMAGE` / `FRONTEND_IMAGE` environment variables to the new tags.
3. Trigger a deployment. Coolify will:
   - Bring up PostgreSQL (if not already running).
   - Start the backend, whose entrypoint waits for the database, runs migrations, and exposes `/api/health`.
   - Start the frontend once the backend healthcheck passes.
4. Post-deploy checks:
   - `curl https://<domain>/api/health` returns `200` with a JSON payload.
   - The frontend login screen is reachable.
   - The backend container logs show the migration output (`docker compose logs backend` in the Coolify shell).
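
The health check in step 4 can be scripted so a deploy is only treated as good once `/api/health` answers `200`. A minimal sketch using plain `curl`; the attempt count, delay and 5-second request timeout are arbitrary choices rather than values from the pipeline, and `<domain>` stays a placeholder.

```bash
# Sketch: poll the health endpoint until it returns HTTP 200 or attempts run out.
# Usage: wait_for_health "https://<domain>/api/health" 30 2
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i code
  for (( i = 1; i <= attempts; i++ )); do
    # -s hides progress, -o discards the body, -w prints only the status code
    # (curl prints 000 when no HTTP response arrives).
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)"
    if [[ "$code" == "200" ]]; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "unhealthy: last status ${code:-none}" >&2
  return 1
}
```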
## 4. Rollback Procedure
1. Identify the previous known-good image tags (from Gitea workflow history or Coolify activity log).
2. Update `BACKEND_IMAGE` / `FRONTEND_IMAGE` to the old tags.
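3. Redeploy in Coolify. Migrations are idempotent, so the older image can re-run them safely; they do not revert schema changes made by the newer release, so confirm the previous release works against the current schema.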
4. Validate health endpoints and smoke-test the UI.
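
Before smoke-testing a rollback, it can be worth confirming that the running containers actually use the previous tags. A minimal sketch built on `docker inspect`; the container name and image reference are placeholders, since Coolify generates its own container names.

```bash
# Sketch: confirm a running container uses the expected image reference.
# Usage (placeholders): verify_image <backend-container> "<registry>/<image>:<previous-tag>"
verify_image() {
  local container="$1" expected="$2" actual
  # .Config.Image is the image reference the container was created from.
  actual="$(docker inspect --format '{{.Config.Image}}' "$container")" || return 1
  if [[ "$actual" == "$expected" ]]; then
    echo "ok: $container runs $actual"
  else
    echo "mismatch: $container runs $actual, expected $expected" >&2
    return 1
  fi
}
```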
## 5. Local Development
- Run `docker compose up --build` to start the stack. The backend container waits for PostgreSQL, runs migrations automatically, and skips seeding by default. To seed once, run `RUN_SEED=true docker compose up backend` or execute `docker compose exec ... npm run seed` manually.
- `./scripts/run-ci-tests.sh` runs lint and unit tests with the same coverage thresholds as CI.
- Backend tests rely on Docker; ensure Docker Desktop/Engine is running.
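
Because the backend tests need a running Docker daemon, a preflight check gives a clearer error than a failed test run. A minimal sketch; `require_docker` is a hypothetical helper, not part of `scripts/run-ci-tests.sh`.

```bash
# Sketch: fail fast when the Docker CLI or daemon is unavailable.
require_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 1
  fi
  # docker info fails when the CLI cannot reach a running daemon.
  if ! docker info >/dev/null 2>&1; then
    echo "Docker daemon not reachable; start Docker Desktop/Engine" >&2
    return 1
  fi
  echo "docker ok"
}
```

Usage: `require_docker && ./scripts/run-ci-tests.sh`.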
## 6. Backup & Restore
### Database
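- Generate a dump file with `docker compose exec -T merchantsofhope-supplyanddemandportal-database pg_dump -U <user> <db> > backup.sql`. The `-T` flag disables pseudo-TTY allocation so the dump reaches the file unmodified.
- Restore by piping the dump into `psql` inside the running container: `docker compose exec -T merchantsofhope-supplyanddemandportal-database psql -U <user> -d <db> < backup.sql`.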
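
The dump and restore commands can be wrapped in two small helpers. A sketch assuming plain-SQL dumps (the `pg_dump` default) and the service name used above; the timestamped file name and the `DB_SERVICE` override are illustrative choices.

```bash
# Sketch of dump/restore helpers; override DB_SERVICE if your compose file differs.
DB_SERVICE="${DB_SERVICE:-merchantsofhope-supplyanddemandportal-database}"

# Write a plain-SQL dump to a timestamped file and print its name.
backup_db() {
  local user="$1" db="$2" out
  out="backup-${db}-$(date +%Y%m%d-%H%M%S).sql"
  # -T disables pseudo-TTY allocation so the dump reaches the file unmodified.
  if ! docker compose exec -T "$DB_SERVICE" pg_dump -U "$user" "$db" > "$out"; then
    rm -f "$out"   # do not leave a partial dump behind
    return 1
  fi
  echo "$out"
}

# Replay a plain-SQL dump; ON_ERROR_STOP makes psql exit non-zero at the first error.
restore_db() {
  local user="$1" db="$2" file="$3"
  docker compose exec -T "$DB_SERVICE" psql -U "$user" -d "$db" -v ON_ERROR_STOP=1 < "$file"
}
```

Without `ON_ERROR_STOP=1`, psql keeps executing after a failed statement, so a partially applied restore could still exit successfully.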
### Uploads
- Archive the `merchantsofhope-supplyanddemandportal-uploads` volume (Coolify: Settings → Backups → Volume Snapshot).
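
Where the Coolify snapshot feature is unavailable, a named volume can be archived from the Docker host with a throwaway container. A sketch assuming the `alpine` image can be pulled; Compose usually prefixes volume names with the project name, so check `docker volume ls` for the exact name first.

```bash
# Sketch: archive a named volume into a timestamped tarball and print its path.
# dest_dir must be an absolute path, because docker treats relative -v sources
# as named volumes rather than host directories.
archive_volume() {
  local volume="$1" dest_dir="${2:-$PWD}" name
  name="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"
  docker run --rm \
    -v "${volume}:/data:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar czf "/backup/${name}" -C /data . || return 1
  echo "${dest_dir}/${name}"
}
```

Restoring is the reverse: mount the volume read-write and extract the archive into it with `tar xzf`.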
## 7. Monitoring & Alerting
- Wire the backend health endpoint (`/api/health`) into external monitoring (e.g., Uptime Kuma, Grafana Cloud).
- Rate limiter defaults protect against bursts; adjust `RATE_LIMIT_MAX` / `RATE_LIMIT_WINDOW_MS` if legitimate traffic patterns trigger 429s.
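
For monitors that run a script rather than an HTTP check, a one-shot probe can separate rate limiting from outright failure. A minimal sketch: the 10-second timeout and the output strings are arbitrary, and whether the health route itself counts against the global limiter is an assumption worth verifying.

```bash
# Sketch: one-shot probe for cron or a monitor's script check.
# Exit codes: 0 healthy, 1 HTTP error (including 429), 2 no response.
probe() {
  local url="$1" result code secs
  # -w prints the status code and total time; 000 means no HTTP response at all.
  result="$(curl -s -o /dev/null -w '%{http_code} %{time_total}' --max-time 10 "$url" || true)"
  read -r code secs <<< "$result"
  case "$code" in
    200)    echo "OK ${secs}s"; return 0 ;;
    429)    echo "RATE_LIMITED: review RATE_LIMIT_MAX / RATE_LIMIT_WINDOW_MS"; return 1 ;;
    000|"") echo "DOWN: no response"; return 2 ;;
    *)      echo "UNHEALTHY: HTTP ${code}"; return 1 ;;
  esac
}
```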
## 8. Incident Response Checklist
1. **Validate health:** `curl` the backend health endpoint and inspect the Coolify container logs.
2. **Check the database:** `docker compose exec ... pg_isready` and `docker compose exec ... psql -c 'SELECT NOW();'`.
3. **Restart services:** in Coolify or locally, redeploy the backend/frontend containers; the entrypoint re-runs migrations safely.
4. **Roll back if needed:** follow the rollback procedure above.
5. **Postmortem:** capture the root cause and update this runbook with remediation notes.
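
Steps 1 and 2 can be bundled into one triage command so the first minutes of an incident look the same every time. A sketch only: `HEALTH_URL` and `DB_SERVICE` must be set by the operator, and the `postgres` defaults for `DB_USER` and `DB_NAME` are assumptions based on the official image's default superuser.

```bash
# Sketch of checklist steps 1 and 2. Exit codes: 0 all passed, 1 a check failed,
# 2 required variables missing.
triage() {
  local failed=0
  if [[ -z "${HEALTH_URL:-}" || -z "${DB_SERVICE:-}" ]]; then
    echo "set HEALTH_URL and DB_SERVICE first" >&2
    return 2
  fi
  echo "== 1. backend health =="
  # -f turns HTTP errors into a non-zero exit; -sS hides progress but shows errors.
  curl -fsS --max-time 5 "$HEALTH_URL" || { echo "health check failed" >&2; failed=1; }
  echo
  echo "== 2. database =="
  docker compose exec -T "$DB_SERVICE" pg_isready || { echo "pg_isready failed" >&2; failed=1; }
  docker compose exec -T "$DB_SERVICE" psql -U "${DB_USER:-postgres}" -d "${DB_NAME:-postgres}" \
    -c 'SELECT NOW();' || { echo "query failed" >&2; failed=1; }
  return "$failed"
}
```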
## 9. Security Posture
- JWT secrets must be at least 32 bytes and rotated regularly.
- Uploaded files are sanitized and stored on disk; configure antivirus scanning if compliance requires it.
- Rate limiting is enabled globally; consider pairing with IP allowlists at the reverse proxy if stricter controls are needed.
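
A secret that meets the 32-byte minimum can be generated with OpenSSL and checked before deploying. A minimal sketch; the helper names are hypothetical. Because `JWT_SECRET` signs the auth tokens, rotating it will likely invalidate tokens issued under the old value, so plan rotations for a quiet window.

```bash
# Sketch: generate and validate a JWT signing secret.
new_jwt_secret() {
  # 32 random bytes, hex-encoded (64 characters).
  openssl rand -hex 32
}

check_jwt_secret() {
  local secret="${1:-${JWT_SECRET:-}}"
  # ${#secret} counts characters, never more than the byte count,
  # so this check errs on the strict side for non-ASCII secrets.
  if (( ${#secret} < 32 )); then
    echo "JWT_SECRET is ${#secret} characters; at least 32 bytes required" >&2
    return 1
  fi
  echo "JWT_SECRET length ok (${#secret})"
}
```
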
Keep this runbook updated as infrastructure evolves.