Operations and Observability
Runtime model (production)
- Run the API with multiple workers via gunicorn + uvicorn workers.
- Set `API_WORKERS` according to available CPU (start with `2 * cores`, cap based on DB pool).
- `run_services.sh` now traps `SIGTERM`/`SIGINT` and performs graceful child shutdown.
Example API startup:
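A minimal sketch of how the worker count and startup command might be derived from the guidance above. The module path `backend.main:app`, the bind address, and the per-worker connection figure are assumptions for illustration, not values taken from this project; the gunicorn flags (`--worker-class`, `--workers`, `--bind`) are standard.

```python
import os
import shlex


def suggested_workers(cores: int, db_max_connections: int, per_worker_connections: int) -> int:
    """Start from 2 * cores, then cap so all workers fit in the DB connection budget."""
    by_cpu = 2 * cores
    # Assumption: each worker process keeps its own pool of this many connections.
    by_db = db_max_connections // per_worker_connections
    return max(1, min(by_cpu, by_db))


def gunicorn_argv(app: str, workers: int, bind: str = "0.0.0.0:8000") -> list[str]:
    """Build a gunicorn command line that uses uvicorn's worker class."""
    return [
        "gunicorn", app,
        "--worker-class", "uvicorn.workers.UvicornWorker",
        "--workers", str(workers),
        "--bind", bind,
    ]


# An explicit API_WORKERS wins; otherwise fall back to the heuristic.
workers = int(os.environ.get("API_WORKERS") or suggested_workers(os.cpu_count() or 1, 100, 10))
# "backend.main:app" is a hypothetical module path; substitute the real ASGI app.
print(shlex.join(gunicorn_argv("backend.main:app", workers)))
```

The script only prints the command, so the result can be inspected before it is wired into a launcher.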
Database migration workflow (Alembic)
Alembic scaffolding is included (`alembic.ini`, `backend/alembic/`).
Create migration:
Apply migrations:
Rollback one migration:
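The three steps above map onto standard Alembic CLI subcommands. The sketch below wraps them in a small helper. It assumes migrations are drafted with `--autogenerate` and that the command runs from the directory containing `alembic.ini`; neither is confirmed by this document, and the helper names are hypothetical.

```python
import subprocess


def alembic_argv(action: str, message: str = "") -> list[str]:
    """Return the Alembic CLI invocation for one of the three workflow steps."""
    if action == "create":
        # --autogenerate drafts the migration by diffing models against the DB.
        return ["alembic", "revision", "--autogenerate", "-m", message]
    if action == "apply":
        return ["alembic", "upgrade", "head"]
    if action == "rollback":
        # "-1" steps back exactly one revision from the current one.
        return ["alembic", "downgrade", "-1"]
    raise ValueError(f"unknown action: {action}")


def run_alembic(action: str, message: str = "") -> None:
    """Run the command in the current directory; raise if Alembic exits non-zero."""
    subprocess.run(alembic_argv(action, message), check=True)
```

For example, `run_alembic("create", "add sessions table")` would draft a new revision, and `run_alembic("apply")` would bring the target database to `head`.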
Deploy order
- Build artifact/container.
- Run `alembic upgrade head` against target DB.
- Start/restart API workers.
- Verify `/api/health` and `/api/metrics`.
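The final verification step can be scripted. The following is a sketch that treats HTTP 200 as success for both endpoints; the response bodies are not inspected because their shape is not described here, and the function names are illustrative.

```python
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the status of a GET request (urllib raises on 4xx/5xx and on connection errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def verify_deploy(base_url: str, fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Check the two endpoints named in the deploy order; True means HTTP 200."""
    results = {}
    for path in ("/api/health", "/api/metrics"):
        try:
            results[path] = fetch(base_url.rstrip("/") + path) == 200
        except OSError:  # URLError and HTTPError are both OSError subclasses
            results[path] = False
    return results
```

Calling `verify_deploy("http://127.0.0.1:8000")` after a restart returns a per-endpoint pass/fail map; the `fetch` parameter exists so the check can be exercised without a running server.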
Logging
| Setting | Purpose |
|---|---|
| `LOGGING_LEVEL` | App log level (`INFO` recommended for production) |
| `LOGGING_FORMAT` | Include `%(request_id)s`; wired via `RequestIdFilter` on all handlers |
| `LOGGING_LOCATION` | Rotating file path when `LOG_TO_FILE=true` (default `backend_logs.log`) |
| `LOG_JSON` | When `true`, emit one JSON object per line (for aggregators) |
| `LOG_TO_FILE` | Enable rotating file handler (10 MB × 20 backups) |
Privacy: JWT payloads and full chat queries are not logged at `INFO`. Use `DEBUG` locally for verbose auth/RAG text. Access lines are emitted by the `app.access` logger in the form `METHOD path status duration`.
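To illustrate how a request-ID filter and one-object-per-line JSON output can fit together, here is a sketch built only on the standard `logging` module. It is not the project's `RequestIdFilter`; the context variable, the `JsonLineFormatter` class, the logger name, and the JSON field names are all assumptions.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variable; request middleware would set it per request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach request_id to every record so %(request_id)s always resolves."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


class JsonLineFormatter(logging.Formatter):
    """Emit one JSON object per line, in the spirit of LOG_JSON=true."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "request_id": getattr(record, "request_id", "-"),
            "message": record.getMessage(),
        })


def build_logger(stream, json_mode: bool) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())  # attached to the handler itself
    handler.setFormatter(
        JsonLineFormatter() if json_mode
        else logging.Formatter("%(levelname)s [%(request_id)s] %(message)s")
    )
    logger = logging.getLogger("app.access.sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer, json_mode=True)
request_id_var.set("req-123")
log.info("GET /api/health 200 0.004")
```

Attaching the filter to each handler, rather than to a single logger, means records that propagate up from child loggers also carry `request_id` by the time they are formatted.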
Local logs
`*.log` files (`app.log`, `backend/app.log`, etc.) are gitignored. After stopping uvicorn/Vite, remove them to reclaim disk:
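One way to script the cleanup, sketched with `pathlib`. The recursive search and the dry-run default are assumptions chosen for safety; a recursive search also matches `*.log` files inside dependency directories, so reviewing the dry-run output first is advisable.

```python
from pathlib import Path


def remove_log_files(root: Path, dry_run: bool = True) -> list[Path]:
    """Find *.log files under root; delete them only when dry_run is False."""
    found = sorted(p for p in root.rglob("*.log") if p.is_file())
    if not dry_run:
        for path in found:
            path.unlink()
    return found
```

`remove_log_files(Path("."))` lists candidates without touching them; passing `dry_run=False` performs the deletion.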
Security
See SECURITY.md for pip-audit, bandit, and production hardening.
Metrics baseline
Exposed endpoints:
- `GET /api/metrics` (Prometheus format)
- `GET /api/metrics/db-pool` (JSON snapshot)
Included metrics:
- HTTP request count/latency/errors
- Provider call latency and provider error reasons
- DB pool size/checkouts/overflow/usage ratio
Dashboard and alerts baseline
Suggested SLOs
- API availability: `>= 99.9%` monthly (`5xx` responses considered failures).
- Auth / session API (no LLM): `p95 < 1.2s`, `p99 < 2.5s` at steady load.
- Chat with RAG (live LLM + retrieval): use phase-aware targets; see LOAD_TESTING.md (`K6_LATENCY_PROFILE=live` allows chat `p95` up to ~45s under ramp; `mock` profile targets sub-second HTTP).
- SSE time-to-first-token: track `chatbot_chat_first_token_latency_seconds` (lower is better; dominated by condense + retrieve on live providers).
- Error budget: `<= 0.1%` failed requests over 30 days.
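The availability target and the error budget describe the same allowance. A 99.9% target over 30 days leaves 0.1% of 43,200 minutes, or about 43.2 minutes of full outage. The sketch below makes that arithmetic explicit; the function names are illustrative, and the request counts in the usage note are made-up inputs.

```python
def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Minutes of full outage allowed in the window at the given availability target."""
    return (1.0 - availability) * days * 24 * 60


def budget_remaining(total_requests: int, failed: int, max_failure_ratio: float = 0.001) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    budget = total_requests * max_failure_ratio
    if budget == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1.0 - failed / budget
```

With 1,000,000 requests in the window the budget is 1,000 failures, so 250 observed `5xx` responses leave roughly three quarters of the budget unspent.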
Alerts
- High 5xx rate: `rate(chatbot_http_requests_total{status_code=~"5.."}[5m]) > 0.02`.
- Slow auth/session: p95 of the `chatbot_http_request_latency_seconds` histogram on `/api/auth/*` and `/api/chat/sessions`, with a `> 1.2` threshold.
- Slow buffered chat: `histogram_quantile(0.95, sum(rate(chatbot_http_request_latency_seconds_bucket{path="/api/chat/chat"}[5m])) by (le)) > 45` for live RAG, or `> 1.2` when using mock providers.
- Slow first token: `histogram_quantile(0.95, rate(chatbot_chat_first_token_latency_seconds_bucket[5m])) > 30` (tune per provider).
- Provider failures spike: `increase(chatbot_provider_errors_total[10m]) > 20`.
- DB pool pressure: `avg_over_time(chatbot_db_pool_usage_ratio[5m]) > 0.85`.
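Several of these alerts rely on `histogram_quantile`, which estimates a percentile by interpolating linearly inside the bucket that contains the target rank. The sketch below is a simplified reimplementation for intuition only. It takes cumulative `(upper_bound, count)` pairs, whereas PromQL applies the same interpolation to per-bucket rates, and the bucket boundaries in the usage note are invented.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    The last bucket must be +Inf, as in a Prometheus histogram.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0 or not math.isinf(buckets[-1][0]):
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank lies beyond the last finite bucket: report that bucket's bound.
                return lower_bound
            span = count - lower_count
            if span == 0:
                return upper_bound
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / span
        lower_bound, lower_count = upper_bound, count
    return math.nan


def breaches(q: float, buckets: list[tuple[float, float]], threshold_seconds: float) -> bool:
    """True when the estimated quantile exceeds the alert threshold."""
    return histogram_quantile(q, buckets) > threshold_seconds
```

For buckets `[(0.5, 50), (1.0, 90), (2.0, 100), (inf, 100)]` the p95 rank is 95, which lands halfway through the 1.0 to 2.0 bucket, giving an estimate of 1.5 s. Because the estimate can never exceed the largest finite bucket bound, a `> 45` threshold only works if the histogram defines buckets above 45 s.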
Runbook
High latency
- Check `chatbot_http_request_latency_seconds` by path.
- Check `chatbot_provider_latency_seconds` for degraded providers.
- Increase `API_WORKERS` only if CPU has headroom.
- If DB bound, raise `SQLALCHEMY_POOL_SIZE` and validate DB connection limits.
Elevated 5xx
- Correlate with `chatbot_provider_errors_total` and application logs.
- If provider timeouts dominate, tune `PROVIDER_TIMEOUT_SECONDS` and retries.
- If DB pool saturation is high, increase pool and reduce worker count temporarily.
DB pool saturation
- Inspect `/api/metrics/db-pool` and dashboard for `usage_ratio` and overflow.
- Reduce worker count or per-worker concurrency to lower connection demand.
- Increase DB max connections and corresponding app pool settings.
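The trade-off between worker count and pool settings comes down to worst-case connection demand. The sketch below assumes each worker process holds its own SQLAlchemy-style pool (`pool_size` plus `max_overflow`) and that a few connections are reserved for migrations and admin sessions; the reserve size and function names are illustrative.

```python
def connection_demand(workers: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections if every worker fills its pool and its overflow."""
    return workers * (pool_size + max_overflow)


def max_safe_workers(db_max_connections: int, pool_size: int, max_overflow: int,
                     reserved: int = 5) -> int:
    """Largest worker count whose worst-case demand fits under the DB limit."""
    usable = db_max_connections - reserved
    return max(0, usable // (pool_size + max_overflow))
```

For instance, 8 workers with a pool of 10 and an overflow of 5 can open up to 120 connections, which exceeds a database limit of 100. Under the same settings `max_safe_workers(100, 10, 5)` yields 6.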
OAuth and local development
- Enable providers in `.env`: `OAUTH_ENABLED_PROVIDERS=github` (or `google,github`) plus client ID/secret vars (see `.env.example`).
- Local dev: OAuth runs on the API (`OAUTH_REDIRECT_BASE_URL=http://127.0.0.1:8000`); Vue uses `VITE_OAUTH_API_URL` and `/oauth/handoff`. Full checklist: PRODUCTION_TLS.md (Local OAuth).
- Verify setup: `./scripts/verify_oauth.py` (repo root, venv active).
- Production HTTPS, redirect URIs, and `AUTH_COOKIE_SECURE`: PRODUCTION_TLS.md.