feat: harden containers and ci

2025-10-16 22:56:33 -05:00
parent c51604fdb7
commit 8ca2756d7b
14 changed files with 293 additions and 17 deletions

@@ -0,0 +1,88 @@
# Operations Runbook
This document captures the operational playbooks for the MerchantsOfHope Supply & Demand Portal. It is intended for on-call engineers and SREs maintaining the platform across Coolify environments.
## 1. Service Topology
- **Backend API (`merchantsofhope-supplyanddemandportal-backend`)**
  - Node.js 18, Express server on port 3001.
  - Entry point waits for PostgreSQL, runs migrations, and optionally seeds data (`RUN_SEED`).
  - Health probe: `GET /api/health`.
  - Persistent uploads are stored at `/app/uploads/resumes` (mounted volume required in production).
- **Frontend (`merchantsofhope-supplyanddemandportal-frontend`)**
  - React 18 application served by the Create React App (CRA) dev server in development, or as a static bundle in the production image.
  - Communicates with the backend via `REACT_APP_API_URL` (set to the internal service URL).
- **PostgreSQL (`merchantsofhope-supplyanddemandportal-database`)**
  - PostgreSQL 15, health-checked via `pg_isready`.
  - Volume-backed data directory `merchantsofhope-supplyanddemandportal-postgres-data`.
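The backend boot order listed above (wait for PostgreSQL, run migrations, optionally seed) can be sketched roughly as follows. This is a minimal illustration and not the actual entrypoint script: `waitFor`, `planBootSteps`, and the step labels are hypothetical names, and the flag parsing assumes the variables hold the strings `true` or `false`.

```javascript
// Hypothetical sketch of the entrypoint's boot order; not the real script.

// Poll `probe` until it resolves to true or `timeoutMs` elapses.
async function waitFor(probe, { timeoutMs = 60000, intervalMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    try {
      if (await probe()) return true;
    } catch (_err) {
      // Treat probe errors as "not ready yet" and retry.
    }
    // Give up if another sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Decide which boot steps run, given the environment flags.
function planBootSteps(env = process.env) {
  const isTrue = (value, fallback) =>
    value === undefined ? fallback : String(value).toLowerCase() === "true";
  const steps = ["wait-for-db"];
  if (isTrue(env.RUN_MIGRATIONS, true)) steps.push("migrate"); // on by default
  if (isTrue(env.RUN_SEED, false)) steps.push("seed"); // off by default
  steps.push("start-server");
  return steps;
}
```

In a real entrypoint the `probe` passed to `waitFor` would attempt a database connection, and the timeout would come from `DB_WAIT_TIMEOUT_MS`.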
## 2. Environment Variables
| Variable | Purpose | Default |
| --- | --- | --- |
| `POSTGRES_*` | Database credentials used by backend and DB container | See `.env.example` |
| `DATABASE_URL` | Overrides assembled connection string | Derived automatically |
| `JWT_SECRET` | Required for signing auth tokens | none (must be supplied) |
| `RATE_LIMIT_MAX`, `RATE_LIMIT_WINDOW_MS` | Express rate limiter configuration | 100 req / 15 min |
| `DB_POOL_MAX`, `DB_POOL_IDLE_MS`, `DB_POOL_CONNECTION_TIMEOUT_MS` | pg connection pool tuning | 10 / 30000 ms / 5000 ms |
| `DB_WAIT_TIMEOUT_MS` | Maximum wait for database readiness in entrypoint | 60000 ms |
| `RUN_MIGRATIONS` | Run schema migrations on container boot | `true` |
| `RUN_SEED` | Run seed data on container boot | `false` |
| `UPLOAD_DIR` | Resume storage path | `uploads/resumes` |
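As a rough illustration of how the defaults in the table could be applied, here is a minimal config loader sketch. It is not the backend's actual code: `loadConfig` and `intFromEnv` are hypothetical helpers, and the individual `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT`, and `POSTGRES_DB` names are assumptions (the table only lists `POSTGRES_*`; the real names are in `.env.example`). The pool keys are named to line up with node-postgres pool options.

```javascript
// Hypothetical config loader mirroring the defaults in the table above.

// Parse an integer variable, falling back to a default when unset.
function intFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env = process.env) {
  // JWT_SECRET has no default and must be supplied.
  if (!env.JWT_SECRET) throw new Error("JWT_SECRET must be supplied");

  // DATABASE_URL wins when set; otherwise assemble it from the
  // assumed POSTGRES_* variables.
  const databaseUrl =
    env.DATABASE_URL ||
    `postgres://${encodeURIComponent(env.POSTGRES_USER)}` +
      `:${encodeURIComponent(env.POSTGRES_PASSWORD)}` +
      `@${env.POSTGRES_HOST}:${env.POSTGRES_PORT || 5432}/${env.POSTGRES_DB}`;

  return {
    databaseUrl,
    jwtSecret: env.JWT_SECRET,
    rateLimit: {
      max: intFromEnv(env, "RATE_LIMIT_MAX", 100),
      windowMs: intFromEnv(env, "RATE_LIMIT_WINDOW_MS", 15 * 60 * 1000),
    },
    pool: {
      max: intFromEnv(env, "DB_POOL_MAX", 10),
      idleTimeoutMillis: intFromEnv(env, "DB_POOL_IDLE_MS", 30000),
      connectionTimeoutMillis: intFromEnv(env, "DB_POOL_CONNECTION_TIMEOUT_MS", 5000),
    },
    dbWaitTimeoutMs: intFromEnv(env, "DB_WAIT_TIMEOUT_MS", 60000),
    uploadDir: env.UPLOAD_DIR || "uploads/resumes",
  };
}
```

Failing fast on a missing `JWT_SECRET` or a non-numeric tuning value surfaces misconfiguration at boot rather than at the first request.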
## 3. Deployments (Coolify)
1. Ensure the Gitea pipeline has published new backend/frontend images (see workflow summary for SHA tags).
2. In Coolify, update `BACKEND_IMAGE` / `FRONTEND_IMAGE` environment variables to the new tags.
3. Trigger a deployment; Coolify will:
   - Bring up PostgreSQL (if not already running).
   - Start the backend, wait for the DB, run migrations, and expose `/api/health`.
   - Start the frontend once the backend healthcheck passes.
4. Post-deploy checks:
   - `curl https://<domain>/api/health` returns `200` with a JSON payload.
   - The frontend login screen is reachable.
   - Review container logs for migration output (`docker compose logs backend` in Coolify shell).
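The first post-deploy check can be automated with a small probe. The sketch below is one possible shape, assuming Node.js 18's built-in `fetch`; `checkHealth` and its return shape are hypothetical, and only the `/api/health` path and the "200 with JSON" expectation come from this runbook.

```javascript
// Hypothetical post-deploy probe; `fetchImpl` is injectable for testing.
async function checkHealth(baseUrl, fetchImpl = globalThis.fetch) {
  const url = new URL("/api/health", baseUrl).toString();
  try {
    const response = await fetchImpl(url, {
      headers: { accept: "application/json" },
    });
    if (response.status !== 200) {
      return { healthy: false, reason: `unexpected status ${response.status}` };
    }
    // response.json() rejects if the body is not valid JSON.
    const payload = await response.json();
    return { healthy: true, payload };
  } catch (err) {
    // Network failures and JSON parse errors both land here.
    return { healthy: false, reason: String(err.message || err) };
  }
}
```

Calling `checkHealth` with the deployment's base URL and logging the result gives a pass/fail signal that can gate the remaining manual checks.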
## 4. Rollback Procedure
1. Identify the previous known-good image tags (from Gitea workflow history or Coolify activity log).
2. Update `BACKEND_IMAGE` / `FRONTEND_IMAGE` to the old tags.
3. Redeploy in Coolify. Migrations are idempotent; no additional action needed.
4. Validate health endpoints and smoke-test the UI.
## 5. Local Development
- Run `docker compose up --build` to start the stack. The backend container waits for PostgreSQL, runs migrations automatically, and skips seeding by default. To seed once, run `RUN_SEED=true docker compose up backend` or execute `docker compose exec ... npm run seed` manually.
- `./scripts/run-ci-tests.sh` runs lint + unit tests with the same coverage thresholds as CI.
- Backend tests rely on Docker; ensure Docker Desktop/Engine is running.
## 6. Backup & Restore
### Database
- Run `docker compose exec merchantsofhope-supplyanddemandportal-database pg_dump -U <user> <db>` and redirect its standard output to a file to generate a dump.
- Restore by piping the dump file into `psql` inside the running container.
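The dump and restore commands above can be assembled programmatically, for example inside a scheduled backup script. The helpers below are a hypothetical sketch: the function names are illustrative, and passing `-T` (which disables pseudo-TTY allocation in `docker compose exec`) is an assumption about how the commands would be run non-interactively so that redirection and piping behave cleanly.

```javascript
// Hypothetical helpers that assemble backup/restore commands as argv arrays.
const DB_SERVICE = "merchantsofhope-supplyanddemandportal-database";

// pg_dump writes the dump to standard output; the caller redirects it.
function pgDumpArgs(user, database, service = DB_SERVICE) {
  return ["compose", "exec", "-T", service, "pg_dump", "-U", user, database];
}

// psql reads the SQL dump from standard input.
function psqlRestoreArgs(user, database, service = DB_SERVICE) {
  return ["compose", "exec", "-T", service, "psql", "-U", user, "-d", database];
}

// Render an argv array as a shell line, with an optional redirect suffix.
function toShellLine(args, redirect = "") {
  const line = ["docker", ...args].join(" ");
  return redirect ? `${line} ${redirect}` : line;
}
```

For example, `toShellLine(pgDumpArgs("<user>", "<db>"), "> backup.sql")` renders the dump command with its output redirected, and `toShellLine(psqlRestoreArgs("<user>", "<db>"), "< backup.sql")` renders the matching restore.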
### Uploads
- Archive the `merchantsofhope-supplyanddemandportal-uploads` volume (Coolify: Settings → Backups → Volume Snapshot).
## 7. Monitoring & Alerting
- Healthcheck endpoints should be wired into external monitoring (e.g., Uptime Kuma, Grafana Cloud).
- Rate limiter defaults protect against bursts; adjust `RATE_LIMIT_MAX` / `RATE_LIMIT_WINDOW_MS` if legitimate traffic patterns trigger 429s.
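To reason about when legitimate traffic starts receiving 429s, it can help to model the limit as a fixed-window counter. The class below is an illustrative sketch only; the backend's rate limiter may use a different algorithm, and `FixedWindowLimiter` is a hypothetical name. Its defaults mirror the documented 100 requests per 15 minutes.

```javascript
// Illustrative fixed-window counter; the backend's limiter may differ.
class FixedWindowLimiter {
  constructor({ max = 100, windowMs = 15 * 60 * 1000 } = {}) {
    this.max = max;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> { count, windowStart }
  }

  // Returns the HTTP status the request would receive: 200 or 429.
  check(key, now = Date.now()) {
    let entry = this.hits.get(key);
    // Start a fresh window for new keys or once the old window expires.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      entry = { count: 0, windowStart: now };
      this.hits.set(key, entry);
    }
    entry.count += 1;
    return entry.count <= this.max ? 200 : 429;
  }
}
```

Under this model a client that sends its 101st request inside one 15-minute window is rejected until the window rolls over, which is the pattern to look for before raising `RATE_LIMIT_MAX`.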
## 8. Incident Response Checklist
1. **Validate Health:** `curl` the backend health endpoint and inspect the Coolify container logs.
2. **Check Database:** Run `docker compose exec ... pg_isready` and `docker compose exec ... psql -c 'SELECT NOW();'`.
3. **Restart Services:** In Coolify or locally, redeploy the backend/frontend containers (the entrypoint re-runs migrations safely).
4. **Rollback if Needed:** Follow the steps in section 4 (Rollback Procedure).
5. **Postmortem:** Capture the root cause and update this runbook with remediation notes.
## 9. Security Posture
- JWT secrets must be at least 32 bytes and rotated regularly.
- Uploaded files are sanitized and stored on disk; configure antivirus scanning if compliance requires it.
- Rate limiting is enabled globally; consider pairing with IP allowlists at the reverse proxy if stricter controls are needed.
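The 32-byte minimum for JWT secrets can be enforced at startup and satisfied at rotation time with Node's built-in `crypto` module. The sketch below is a minimal example under that assumption; `generateJwtSecret` and `isSecretLongEnough` are hypothetical helper names, not functions from the codebase.

```javascript
const crypto = require("node:crypto");

const MIN_SECRET_BYTES = 32;

// Generate a random secret; 32 random bytes encode to 64 hex characters.
function generateJwtSecret(bytes = MIN_SECRET_BYTES) {
  return crypto.randomBytes(bytes).toString("hex");
}

// Check the byte length (not the character count) of a configured secret.
function isSecretLongEnough(secret, minBytes = MIN_SECRET_BYTES) {
  return (
    typeof secret === "string" &&
    Buffer.byteLength(secret, "utf8") >= minBytes
  );
}
```

A length check only guards against obviously weak values; it says nothing about how random the secret is, so rotation should still use a generator like the one above.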

Keep this runbook updated as infrastructure evolves.