# Architecture
Campus RAG Assistant is a retrieval-augmented chat product: a FastAPI backend, a Vue 3 SPA (`frontend-vue/`), and an optional Streamlit client that uses the same REST API.
For design goals and decision rationale, see DESIGN.md.
Evolution from upstream ets-berkeley-edu/chabot: dual frontends, pluggable AWS / Azure / mock providers, Bedrock Knowledge Base retrieval over OpenSearch Serverless (replacing direct OpenSearch client calls from the app), SSE streaming, JWT cookie auth, Alembic migrations, and Prometheus metrics.
## System architecture
Diagrams live in `docs/assets/`. The v2 overview is shown in the root README. The sections below show the detailed v2 diagram, then the v1 diagram (upstream ets-berkeley-edu/chabot) for comparison.
### Detailed (v2)

### Upstream reference (v1)
Original upstream chabot architecture (Streamlit-only UI, LangChain → OpenSearch + Bedrock directly):

### Diagram notes
| Area | Upstream chabot (v1) | Campus RAG Assistant (v2) |
|---|---|---|
| UI | Streamlit only | Vue 3 SPA (primary); Streamlit optional, same API |
| API | Chat endpoints | SSE `POST /api/chat/stream`, sessions CRUD, feedback, sources |
| Auth | None | JWT in HTTP-only cookies (`/api/auth/*`) |
| Retrieval (AWS) | LangChain → OpenSearch directly | Bedrock Knowledge Base API (`AmazonKnowledgeBasesRetriever`); vector store: OpenSearch Serverless (vector/keyword/hybrid index) behind the KB |
| Retrieval (Azure) | None | Azure AI Search vector + keyword/hybrid index; Azure OpenAI embeddings |
| LLM | Bedrock only | Bedrock, Azure OpenAI, or mock via `LLM_PROVIDER` |
| DB | PostgreSQL | PostgreSQL + Alembic (no `create_all` in production) |
| Ops | LangSmith | LangSmith + Prometheus (`/api/metrics`, pool snapshot, first-token histogram); chat history capped via `CHAT_HISTORY_MAX_MESSAGES` (see PERFORMANCE.md) |
| Quality | None | RAGAS harness (`backend/tests/eval/`), k6 load tests |
| Deploy | EB + Nginx + Terraform | Same pattern; `run_services.sh` starts the API (+ Streamlit on EB); Vue is often hosted separately (CDN/static) with `FRONTEND_URL` / CORS |

| Asset | Description |
|---|---|
| `architecture_v2.png` | High-level overview (shown in README) |
| `architecture_detailed_v2.png` | Current architecture with component detail |
| `architecture_v1.png` | Upstream chabot (historical reference) |
## AWS retrieval: Bedrock Knowledge Base and OpenSearch
On AWS, the application calls the Bedrock Knowledge Base `Retrieve` API, not OpenSearch HTTP endpoints directly. A typical deployment looks like this:
```text
App (LangChain AmazonKnowledgeBasesRetriever)
  → Bedrock Knowledge Base (retrieve, metadata filters)
    → OpenSearch Serverless (vector index + chunk storage)
```

| Component | Role |
|---|---|
| Bedrock Knowledge Base | Managed RAG entry point: sync connectors, chunking, retrieve API, citation metadata |
| OpenSearch Serverless | Vector (and often hybrid) index backing the KB; ingestion and index lifecycle owned by AWS |
| ServiceNow / LMS corpus | Source content ingested into the KB (e.g. knowledge articles synced to the index) |
v1 (upstream chabot) called OpenSearch from application code. v2 keeps OpenSearch in the platform stack but routes retrieval through the KB API, which simplifies operations and keeps metadata filters consistent (`build_bedrock_vector_filter` in `backend/app/services/retrieval.py`).
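The sketch below illustrates metadata filtering on the KB path. It assumes the filter JSON shape that the Bedrock Agent Runtime `Retrieve` API accepts, with `equals`, `in`, and `andAll` clauses inside `vectorSearchConfiguration`. The helpers `build_kb_filter` and `retrieval_config` and the metadata keys `campus` and `source` are hypothetical. They are not the project's `build_bedrock_vector_filter`.

```python
from typing import Any, Optional


def build_kb_filter(metadata: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical helper (not build_bedrock_vector_filter): map {key: value} to a KB filter."""
    clauses: list[dict[str, Any]] = []
    for key, value in sorted(metadata.items()):
        if value is None:
            continue  # unset filter: do not constrain on this key
        if isinstance(value, list):
            # "in" matches when the attribute equals any of the listed values
            clauses.append({"in": {"key": key, "value": list(value)}})
        else:
            clauses.append({"equals": {"key": key, "value": value}})
    if not clauses:
        return None  # no filter: search the whole knowledge base
    if len(clauses) == 1:
        return clauses[0]  # andAll expects at least two members
    return {"andAll": clauses}


def retrieval_config(metadata: dict[str, Any], k: int = 5) -> dict[str, Any]:
    """Build a vectorSearchConfiguration, adding "filter" only when there is one."""
    vector_cfg: dict[str, Any] = {"numberOfResults": k}
    kb_filter = build_kb_filter(metadata)
    if kb_filter is not None:
        vector_cfg["filter"] = kb_filter
    return {"vectorSearchConfiguration": vector_cfg}
```

The returned dict is the kind of value you would pass as `retrievalConfiguration` to boto3's `bedrock-agent-runtime` `retrieve` call. Leaving out the `filter` key is the simplest way to express an unfiltered search.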
The Azure path uses Azure AI Search instead of OpenSearch: the same provider pattern with a different backing service.
## Chat request flow
```mermaid
sequenceDiagram
    participant UI as Vue SPA
    participant API as FastAPI /api/chat
    participant RAG as RAGService
    participant Graph as LangGraph (optional)
    participant KB as Provider retriever
    UI->>API: POST /stream (SSE) + research_mode
    API->>RAG: stream_query / query + history
    alt RAG_ENGINE=langgraph
        RAG->>Graph: condense / multi_query / retrieve / rerank
        Graph->>KB: retrieve (KB path)
        KB-->>Graph: documents + metadata
        Graph-->>RAG: answer + sources
    else RAG_ENGINE=chain
        RAG->>KB: retrieve context
        KB-->>RAG: documents + metadata
    end
    RAG-->>API: tokens + sources (+ source_kind)
    API-->>UI: SSE status/token/done
    UI->>UI: normalize markdown, render + sources panel
```

- Streaming (preferred): `POST /api/chat/stream` emits Server-Sent Events (`token`, then `done` with sources). The Vue store appends tokens live, then persists the final message.
- Buffered fallback: `POST /api/chat/chat` returns the full assistant message when streaming fails or is disabled.
- Sessions: Messages belong to a `ChatSession` per user; history is passed into the LangChain conversational chain for follow-up questions.
- Answer shape: The model is instructed via `backend/app/templates/prompt_prefix.txt` to use a consistent Markdown template (summary → `##` sections → bold lead-ins → bullets / numbered steps). Backend and frontend apply only light sanitization (dropping prompt leakage, optionally converting `**Title**` to `## Title`); they do not rewrite structure with topic-specific heuristics.
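The SSE exchange is easier to follow with the wire format in front of you. Each event is an `event:` line, a `data:` line, and a blank-line terminator. The generator below emits `token` events followed by one `done` event, as described above. The JSON payload shapes (`{"text": ...}` and `{"sources": [...]}`) and the function names are hypothetical, not the project's actual schema.

```python
import json
from typing import Any, Iterable, Iterator


def format_sse(event: str, data: Any) -> str:
    """Serialize one Server-Sent Event: event name, one JSON data line, blank-line terminator."""
    return f"event: {event}\ndata: {json.dumps(data, ensure_ascii=False)}\n\n"


def stream_reply(tokens: Iterable[str], sources: list[dict]) -> Iterator[str]:
    """Hypothetical generator: token events first, then a single done event carrying sources."""
    for token in tokens:
        yield format_sse("token", {"text": token})
    yield format_sse("done", {"sources": sources})


def parse_sse(raw: str) -> list[tuple[str, Any]]:
    """Minimal client-side parser (assumes \\n line endings; ignores id/retry and comments)."""
    events: list[tuple[str, Any]] = []
    for block in raw.split("\n\n"):
        name, data_lines = "message", []  # "message" is the SSE default event type
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].lstrip())
        if not data_lines:
            continue  # empty block or comment-only keep-alive
        events.append((name, json.loads("\n".join(data_lines))))
    return events
```

On the server, a generator like `stream_reply` could be wrapped in FastAPI's `StreamingResponse(..., media_type="text/event-stream")`. The browser's built-in `EventSource` only issues GET requests, so a POST endpoint such as `/api/chat/stream` is typically read with a streaming `fetch` and a parser along the lines of `parse_sse`.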
## Backend
- Entry: `backend/app/main.py` builds the FastAPI app, runs SQLAlchemy `create_all` only in dev/test (production uses Alembic), configures CORS, and mounts routers under `/api/auth` and `/api/chat`.
- Configuration: Pydantic settings in `backend/app/config/default.py`, loaded via `backend/app/core/config_manager.py` from layered `.env` files (`APP_ENV`, repo root `.env`, `.env.{APP_ENV}`).
- Auth: JWT plus HTTP-only cookies (`/api/auth/login-json`, register, and OAuth via `/api/auth/oauth/{provider}/…`). In dev, the OAuth callback targets the API port (`OAUTH_REDIRECT_BASE_URL` on `:8000`), which then performs a one-time redirect to the Vue `/oauth/handoff` route; see PRODUCTION_TLS.md. The cookie `Secure` and `SameSite` attributes follow the `AUTH_COOKIE_*` settings (see `.env.example` and PRODUCTION_TLS.md).
- RAG: `backend/app/services/rag.py`. `RAG_ENGINE=chain` (the default in tests, via conftest) uses a LangChain conversational retrieval chain. `RAG_ENGINE=langgraph` runs `backend/app/services/graph/`, whose KB path is condense → multi_query → retrieve → rerank → generate → format (the web path skips rerank; see LANGGRAPH.md and WEB_RESEARCH.md).
- LangGraph streaming: When `RAG_ENGINE=langgraph`, `/api/chat/stream` emits a `status` event, runs the graph in a worker thread, then streams the buffered answer in paced chunks (not token-level Bedrock streaming). Use `RAG_ENGINE=chain` for true token streaming via `astream_events` and a lower time to first token (TTFT).
- Research mode: Chat requests may include `research_mode=web` when `WEB_RESEARCH_ENABLED=true`; responses then include `source_kind` and, when applicable, a web disclaimer.
- Singleton: `get_rag_service()` returns one shared `RAGService` instance (thread-safe) for all chat handlers.
- Providers: `backend/app/services/providers/` registers LLM and retriever implementations (`aws`, `azure`, `mock`), selected by `LLM_PROVIDER`, `RETRIEVER_PROVIDER`, the optional `RAG_PROVIDER`, and `RAG_FORCE_MOCK`. When both `LLM_PROVIDER` and `RETRIEVER_PROVIDER` are set, they take precedence over `RAG_PROVIDER`.
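Provider precedence and the shared service can be sketched together. This is a minimal illustration built on three assumptions: `RAG_FORCE_MOCK` overrides everything, each of `LLM_PROVIDER` / `RETRIEVER_PROVIDER` falls back to `RAG_PROVIDER` individually when unset, and `mock` is the final default. `resolve_providers`, `ProviderChoice`, `RAGServiceSketch`, and `get_service` are hypothetical names. They are not the code in `backend/app/services/providers/` or `rag.py`.

```python
import os
import threading
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_PROVIDERS = {"aws", "azure", "mock"}


@dataclass(frozen=True)
class ProviderChoice:
    llm: str
    retriever: str


def resolve_providers(env: Mapping[str, str]) -> ProviderChoice:
    """Assumed precedence: RAG_FORCE_MOCK > per-component vars > RAG_PROVIDER > "mock"."""
    if env.get("RAG_FORCE_MOCK", "").strip().lower() in {"1", "true", "yes"}:
        return ProviderChoice("mock", "mock")
    fallback = env.get("RAG_PROVIDER") or "mock"
    llm = env.get("LLM_PROVIDER") or fallback
    retriever = env.get("RETRIEVER_PROVIDER") or fallback
    for name in (llm, retriever):
        if name not in VALID_PROVIDERS:
            raise ValueError(f"unknown provider: {name!r}")
    return ProviderChoice(llm, retriever)


class RAGServiceSketch:
    """Hypothetical stand-in for RAGService; just records the chosen providers."""

    def __init__(self, providers: ProviderChoice) -> None:
        self.providers = providers


_instance: Optional[RAGServiceSketch] = None
_lock = threading.Lock()


def get_service(env: Mapping[str, str] = os.environ) -> RAGServiceSketch:
    """Thread-safe lazy singleton using double-checked locking."""
    global _instance
    if _instance is None:  # fast path: no lock once initialized
        with _lock:
            if _instance is None:  # re-check: another thread may have built it first
                _instance = RAGServiceSketch(resolve_providers(env))
    return _instance
```

Double-checked locking keeps the common path lock-free while ensuring that only one thread constructs the service. One consequence: the environment seen by the first call is baked into the singleton. Configuration changes therefore need a process restart, or, in tests, a reset of the cached instance.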
### Chat API surface (summary)
| Endpoint | Purpose |
|---|---|
| `POST /api/chat/stream` | SSE streaming reply |
| `POST /api/chat/chat` | Buffered reply |
| `GET/POST/DELETE /api/chat/sessions` | Conversation CRUD |
| `POST /api/chat/feedback` | Thumbs up/down |
| `GET /api/auth/oauth/{provider}/start` | OAuth redirect (e.g. `github`) |
| `GET /api/auth/oauth/{provider}/callback` | OAuth callback on the API origin; in dev, hands off to Vue `/oauth/handoff` |
| `GET /api/chat/messages/{id}/sources` | Source metadata for a message |
## Frontend (`frontend-vue/`)
- Data flow: Axios client (`src/api/`) → Pinia stores (`src/stores/`) → views/components. Cookies are sent with `withCredentials`.
- Chat UI: `ChatView` plus a sidebar session list; `MessageBubble` (Markdown, user/assistant lanes, accessible accent); `SourcesPanel` / `SourcesSummary` below assistant replies; `MessageFeedback`; SSE streaming with a typing/status indicator. Dev server: `http://127.0.0.1:5173`.
- Routing: Vue Router guards call `fetchCurrentUser` for protected routes.
- Testing: Vitest + MSW (`src/mocks/`) for unit/integration tests; Playwright under `e2e/` (see E2E.md).
## Production-oriented behavior
When `APP_ENV` is `production` or `prod` (configurable via `.env`):

- `ENABLE_DEV_API_ROUTES` defaults to `false` (hides `/api/auth/debug-auth` and `/api/chat/test_langsmith`).
- `ENABLE_OPENAPI_DOCS` defaults to `false` (no Swagger UI, ReDoc, or OpenAPI JSON).
- `AUTH_COOKIE_SECURE` defaults to `true`.

Override any of these explicitly in `.env` when needed.
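These production defaults follow one rule: an explicit value wins, otherwise the default depends on the environment. The plain-Python sketch below shows that rule. The real project uses Pydantic settings, and the non-production defaults shown here (dev routes and docs on, cookie `Secure` off) are assumptions. The names `security_defaults` and `SecurityDefaults` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

PRODUCTION_ENVS = {"production", "prod"}


def _parse_bool(raw: Optional[str]) -> Optional[bool]:
    """Return None when the variable is unset so the environment default applies."""
    if raw is None:
        return None
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SecurityDefaults:
    enable_dev_api_routes: bool
    enable_openapi_docs: bool
    auth_cookie_secure: bool


def security_defaults(env: Mapping[str, str]) -> SecurityDefaults:
    is_prod = env.get("APP_ENV", "").strip().lower() in PRODUCTION_ENVS

    def pick(name: str, prod_default: bool, dev_default: bool) -> bool:
        explicit = _parse_bool(env.get(name))
        if explicit is not None:
            return explicit  # an explicit .env value always wins
        return prod_default if is_prod else dev_default  # dev defaults are assumed

    return SecurityDefaults(
        enable_dev_api_routes=pick("ENABLE_DEV_API_ROUTES", False, True),
        enable_openapi_docs=pick("ENABLE_OPENAPI_DOCS", False, True),
        auth_cookie_secure=pick("AUTH_COOKIE_SECURE", True, False),
    )
```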
## CORS
- If `BACKEND_CORS_ORIGINS` is `*`, the app allows a fixed list of local origins plus `FRONTEND_URL`.
- For production, set `BACKEND_CORS_ORIGINS` to an explicit comma-separated list of allowed origins (see `.env.example`).
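The origin-resolution rule can be sketched as follows. The contents of `LOCAL_DEV_ORIGINS` are hypothetical, since this document only names the Vue dev server at `http://127.0.0.1:5173`. `resolve_cors_origins` is an illustrative name, not the project's function.

```python
from typing import Mapping

# Hypothetical local origins; the real fixed list lives in the backend config.
LOCAL_DEV_ORIGINS = (
    "http://localhost:5173",
    "http://127.0.0.1:5173",
)


def resolve_cors_origins(env: Mapping[str, str]) -> list[str]:
    raw = env.get("BACKEND_CORS_ORIGINS", "").strip()
    if raw == "*":
        # Wildcard: expand to known local origins plus the configured frontend.
        origins = list(LOCAL_DEV_ORIGINS)
        frontend = env.get("FRONTEND_URL", "").strip().rstrip("/")
        if frontend:
            origins.append(frontend)
    else:
        # Explicit comma-separated list; browsers send Origin without a trailing slash.
        origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    return list(dict.fromkeys(origins))  # de-duplicate, preserving order
```

The resulting list would typically be passed as `allow_origins` to FastAPI's `CORSMiddleware`, together with `allow_credentials=True` because the SPA sends cookies.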
## Testing note
Integration tests mock RAG by patching `backend.app.api.chat.get_rag_service` (the name bound in the chat router module), not only `backend.app.services.rag.get_rag_service`. The router imports the function by name at load time, so it holds its own reference; patching only the defining module leaves the router calling the real service.
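The binding behavior behind this note can be reproduced in isolation. The sketch builds two throwaway modules, `demo_rag` and `demo_router`, which are hypothetical stand-ins for `backend.app.services.rag` and `backend.app.api.chat`. It uses `exec` to stay self-contained, then patches each module with `unittest.mock.patch`.

```python
import sys
import types
from unittest import mock

# Stand-in for backend.app.services.rag (hypothetical module name).
rag = types.ModuleType("demo_rag")
exec("def get_rag_service():\n    return 'real service'\n", rag.__dict__)
sys.modules["demo_rag"] = rag

# Stand-in for the chat router: `from ... import` binds a second name,
# demo_router.get_rag_service, to the same function object at import time.
router = types.ModuleType("demo_router")
exec(
    "from demo_rag import get_rag_service\n"
    "def handle_chat():\n"
    "    return get_rag_service()\n",
    router.__dict__,
)
sys.modules["demo_router"] = router


def patched_source_only() -> str:
    # Rebinds demo_rag.get_rag_service only; the router's own name is untouched.
    with mock.patch("demo_rag.get_rag_service", return_value="mock service"):
        return router.handle_chat()


def patched_where_used() -> str:
    # Rebinds the name handle_chat actually looks up in its module globals.
    with mock.patch("demo_router.get_rag_service", return_value="mock service"):
        return router.handle_chat()
```

Only the second patch changes what `handle_chat` sees, because the function resolves `get_rag_service` through its own module's globals at call time.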
## Rate limiting
`backend/app/core/rate_limit.py` applies process-local sliding-window limits to auth and chat endpoints (toggled by `RATE_LIMIT_ENABLED`). Because each process keeps its own counters, use Redis-backed limits for multi-instance production.
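One way to implement a process-local sliding window is to keep a per-key log of recent request timestamps. The class below is a minimal sketch of that variant, not the contents of `rate_limit.py`. The class name, the key format, and the injectable clock (there to make it testable) are assumptions.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` allowed hits per `window` seconds per key."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()  # guards _hits across worker threads

    def allow(self, key: str) -> bool:
        now = self.clock()
        with self._lock:
            hits = self._hits[key]
            # Drop timestamps that have slid out of the window.
            while hits and hits[0] <= now - self.window:
                hits.popleft()
            if len(hits) >= self.limit:
                return False  # rejected requests are not recorded
            hits.append(now)
            return True
```

A shared store such as a Redis sorted set is a common way to keep one window across processes: `ZADD` records timestamps, `ZREMRANGEBYSCORE` evicts old ones, and `ZCARD` counts. This sketch also never evicts idle keys, which a long-running service would need to do.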