Performance and scale

Operational tuning for latency, throughput, and cost.

Phase numbering: PRODUCT_ROADMAP.md tracks product delivery (RAG features on main). This doc and archive/PHASED_IMPROVEMENT_ROADMAP.md describe a separate campus production scale track (Redis HA, caching, EB hardening), so the two tracks number their phases independently. Product Phase 5 (retrieval stack: multi-query, rerank) is shipped on main; see LANGGRAPH.md.

Campus track Phase 0 — Shipped on main

| Change | Config / code | Docs |
| --- | --- | --- |
| Chat history window | CHAT_HISTORY_MAX_MESSAGES | .env.example, LOAD_TESTING.md |
| Optional stream demo delay | STREAM_ARTIFICIAL_DELAY_MS (default 0) | .env.example |
| SQLAlchemy pool | SQLALCHEMY_POOL_SIZE, SQLALCHEMY_MAX_OVERFLOW | .env.example, OPERATIONS.md |
| Multi-worker API (EB) | API_WORKERS in run_services.sh | OPERATIONS.md |
| SSE first-token metric | chatbot_chat_first_token_latency_seconds | OPERATIONS.md |

Runbooks: OPERATIONS.md. Load validation: LOAD_TESTING.md.
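
The pool and worker settings listed for Phase 0 interact: if each API worker process builds its own SQLAlchemy engine (an assumption about this codebase, though typical for multi-process servers), the database can see up to API_WORKERS × (SQLALCHEMY_POOL_SIZE + SQLALCHEMY_MAX_OVERFLOW) connections. The sketch below only does that arithmetic. `PoolSettings` and `load_pool_settings` are hypothetical names, and the fallback values are SQLAlchemy's library defaults for its queue pool, not necessarily this project's defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PoolSettings:
    """Hypothetical container for the three sizing knobs named in this doc."""

    api_workers: int
    pool_size: int
    max_overflow: int

    @property
    def max_connections_per_worker(self) -> int:
        # A SQLAlchemy QueuePool opens at most pool_size + max_overflow
        # connections (assuming max_overflow is not -1, i.e. unlimited).
        return self.pool_size + self.max_overflow

    @property
    def max_connections_total(self) -> int:
        # Assumption: every worker process owns a separate pool,
        # so the per-worker ceilings add up.
        return self.api_workers * self.max_connections_per_worker


def load_pool_settings(env: Optional[Mapping[str, str]] = None) -> PoolSettings:
    """Read the settings from the environment (or a mapping, for testing)."""
    env = os.environ if env is None else env
    return PoolSettings(
        api_workers=int(env.get("API_WORKERS", "1")),
        # Fallbacks are SQLAlchemy's own defaults, not project defaults.
        pool_size=int(env.get("SQLALCHEMY_POOL_SIZE", "5")),
        max_overflow=int(env.get("SQLALCHEMY_MAX_OVERFLOW", "10")),
    )


if __name__ == "__main__":
    settings = load_pool_settings()
    print(f"worst-case DB connections: {settings.max_connections_total}")
```

Comparing the printed ceiling with the database's own connection limit before raising API_WORKERS is one way to avoid exhausting connections under load.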


Documentation checklist — Campus Phase 1 (not implemented)

Goal: an exact-match Redis response cache, deeper observability, and a realistic k6 traffic mix.

| Doc | Update when Phase 1 lands |
| --- | --- |
| .env.example | RESPONSE_CACHE_ENABLED, RESPONSE_CACHE_TTL_SECONDS, CACHE_BYPASS_HEADER |
| OPERATIONS.md | Cache hit rate metric, invalidation on KB deploy |
| LOAD_TESTING.md | Mixed scenario; cache warm vs cold |
| ARCHITECTURE.md | Optional cache layer in chat flow |
| roadmap/archive/PHASED_IMPROVEMENT_ROADMAP.md | Mark 1a–1c complete |
| EVALUATION.md | Whether cached answers are excluded from RAGAS |
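
Since Phase 1 is not implemented, the following is only a sketch of how an exact response cache driven by the three planned settings might behave. It uses an in-memory dict as a stand-in for Redis. The key format, the `kb_version` component (one possible way to get invalidation on KB deploy), the header name `X-Cache-Bypass`, and the names `cache_key` and `ExactResponseCache` are all assumptions, not the project's design.

```python
import hashlib
import time
from typing import Callable, Dict, Mapping, Optional, Tuple


def cache_key(question: str, kb_version: str) -> str:
    """Build a deterministic key; hypothetical format."""
    # Normalize case and whitespace so trivially different spellings share a key.
    normalized = " ".join(question.lower().split())
    # Including a KB version means a new KB deploy naturally misses old entries.
    digest = hashlib.sha256(f"{kb_version}\n{normalized}".encode("utf-8")).hexdigest()
    return f"chat:response:{digest}"


class ExactResponseCache:
    """In-memory stand-in for a Redis cache with a per-entry TTL."""

    def __init__(
        self,
        enabled: bool,              # would come from RESPONSE_CACHE_ENABLED
        ttl_seconds: float,         # would come from RESPONSE_CACHE_TTL_SECONDS
        bypass_header: str,         # would come from CACHE_BYPASS_HEADER
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.enabled = enabled
        self.ttl_seconds = ttl_seconds
        self.bypass_header = bypass_header.lower()
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, headers: Mapping[str, str]) -> Optional[str]:
        # HTTP header names are case-insensitive, so compare lowercased names.
        bypass = any(name.lower() == self.bypass_header for name in headers)
        if not self.enabled or bypass:
            return None
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, answer = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return answer

    def set(self, key: str, answer: str) -> None:
        if self.enabled:
            self._store[key] = (self._clock() + self.ttl_seconds, answer)
```

Counting hits and misses in `get` would be a natural place to feed the cache hit rate metric listed for OPERATIONS.md, and the bypass header gives load tests a way to compare warm and cold paths.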

Documentation checklist — Campus Phase 2 (partially superseded)

Goal (campus track): retrieval quality at scale.

Phase 5 already shipped multi-query, metadata filters, and LangGraph rerank (FlashRank + keyword) on main. When implementing remaining campus Phase 2 items (e.g. semantic cache, ingestion pipeline), update:

| Doc | Update |
| --- | --- |
| .env.example | Any new cache / ingestion flags |
| EVALUATION.md | RAGAS comparison after changes |
| roadmap/archive/PHASED_IMPROVEMENT_ROADMAP.md | Mark completed slices |
| roadmap/LANGGRAPH.md | New graph nodes if any |

Documentation checklist — Campus Phase 3 (not implemented)

Goal: multi-instance reliability, idempotency, cost governance.

| Doc | Update when Phase 3 lands |
| --- | --- |
| .env.example | Production REDIS_URL, idempotency TTL, budget caps |
| OPERATIONS.md | Redis HA, idempotent chat retries |
| RELEASE.md | Promote + cache flush notes |
| LOAD_TESTING.md | Retry / duplicate POST scenarios |
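
Phase 3 is also not implemented, so this is a hedged sketch of what idempotent chat retries could look like: a retried POST carrying the same idempotency key gets the stored response replayed instead of triggering a second generation. The class `IdempotencyStore`, its `run_once` method, and the idea of a client-supplied key are assumptions for illustration. A dict stands in for the shared store that a multi-instance deployment would need.

```python
import time
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory stand-in for a shared store such as Redis (hypothetical)."""

    def __init__(
        self,
        ttl_seconds: float,  # the "idempotency TTL" the checklist mentions
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._results: Dict[str, Tuple[float, str]] = {}

    def run_once(self, key: str, handler: Callable[[], str]) -> Tuple[str, bool]:
        """Return (response, replayed).

        The handler runs only if no unexpired response is stored for the key.
        """
        now = self._clock()
        entry = self._results.get(key)
        if entry is not None and now < entry[0]:
            # Duplicate POST within the TTL: replay the stored response.
            return entry[1], True
        response = handler()
        self._results[key] = (now + self._ttl, response)
        return response, False


if __name__ == "__main__":
    store = IdempotencyStore(ttl_seconds=30)
    first = store.run_once("req-123", lambda: "generated answer")
    retry = store.run_once("req-123", lambda: "should not run")
    print(first, retry)
```

This sketch does not guard against two duplicates arriving at the same moment on different instances. With Redis, reserving the key atomically first (for example `SET` with the `NX` option and an expiry) is one common way to close that gap, and the retry / duplicate POST scenarios planned for LOAD_TESTING.md would be the place to exercise it.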