Design and architecture decisions

This document records why the system is shaped the way it is. For component diagrams and request flows, see ARCHITECTURE.md. For graph node detail, see roadmap/LANGGRAPH.md.

Last updated: 2026-05-19


Product scope

Campus RAG Assistant is a retrieval-augmented chat application for campus teaching, learning, and education IT knowledge (for example Canvas LMS and LTI tooling, accessibility and inclusive teaching guidance, and ServiceNow IT knowledge articles). Users ask natural-language questions; the system retrieves grounded context, generates a structured answer with citations, and keeps per-user chat history.

It is an independent extension of the upstream chabot codebase: same problem domain (institutional knowledge), expanded platform surface (Vue SPA, provider registry, LangGraph pipeline, formal evaluation).


Product boundaries

In scope

  • Q&A over institutional knowledge — Canvas LMS, LTI tools, accessibility, inclusive teaching and learning, ServiceNow IT articles, and institutional policies.
  • Cited answers with expandable source excerpts in the UI.
  • Multi-turn chat with session history and thumbs-up/down feedback.
  • Per-tenant prompt and topic configuration (tenant.rag_config).
  • Operator controls: feature flags for retrieval tuning, web research, and RAG engine selection.

Out of scope (by design)

  • General-purpose chat without retrieval grounding (KB path always retrieves first).
  • Silent open-web answers — web mode requires an explicit user toggle and shows a disclaimer.
  • Unbounded agent tool loops — orchestration is a fixed LangGraph with optional bounded rewrite (Phase 6), not open-ended multi-agent autonomy.
  • Clinical or HIPAA-regulated use — this codebase targets education IT knowledge; do not deploy against PHI without a separate compliance program.

Success signals

| Signal | Mechanism |
|---|---|
| Answer usefulness | User feedback on messages; qualitative review of traces |
| Grounding | Source panel + RAGAS faithfulness on golden set |
| Retrieval coverage | RAGAS context_recall vs curated ground_truth |
| Operability | CI green on mock RAG; Prometheus metrics; LangSmith per-node spans on graph path |

Design goals

| Goal | How we approach it |
|---|---|
| Grounded answers | Retrieval before generation; sources returned to the client and shown in the UI |
| Operable in dev and prod | Mock providers for local/CI; AWS/Azure paths for live KB; health and metrics endpoints |
| Observable RAG | LangSmith traces (per-node with LangGraph); RAGAS golden-set regression; Prometheus on the API |
| Safe extension | Explicit graph nodes and feature flags; opt-in web research with disclaimer; topic scoping via config |
| Deployable incrementally | Alembic migrations; main → qa → release CD; optional strict eval gates on release |

Major decisions

Dual RAG engines (chain vs langgraph)

| | RAG_ENGINE=chain | RAG_ENGINE=langgraph |
|---|---|---|
| Implementation | LangChain ConversationalRetrievalChain | Compiled graph in backend/app/services/graph/ |
| Streaming | True token streaming via astream_events | Status event + paced chunks after graph.invoke() |
| Observability | Chain-level LangSmith runs | Per-node spans (condense, multi_query, retrieve, rerank, …) |
| Retrieval tuning | Chain retriever settings | Multi-query, metadata filters, rerank as explicit nodes |
| Default in tests | Yes (conftest forces chain so CI needs no AWS) | Local/live when configured in .env |

Rationale: The chain path preserves low-latency SSE and a simple mental model. LangGraph adds a testable orchestration layer and room for retrieval stages without growing a monolithic chain class. Both paths share the same provider registry and response shape so the API and UI stay engine-agnostic.

Code: backend/app/services/rag.py, backend/app/services/graph/.
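The graph path's "status event + paced chunks" behaviour can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `stream_graph_answer`, the `run_graph` callable (a stand-in for the blocking graph.invoke() call), the event dictionary shape, and the closing "done" event are not taken from the codebase.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List


def split_into_chunks(text: str, size: int) -> List[str]:
    """Cut a finished answer into fixed-size pieces for paced delivery."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


async def stream_graph_answer(
    run_graph: Callable[[], str],  # hypothetical stand-in for graph.invoke()
    size: int = 24,
    delay_s: float = 0.02,
) -> AsyncIterator[Dict[str, str]]:
    # Tell the client something is happening before the blocking call.
    yield {"event": "status", "data": "retrieving"}
    # Run the blocking graph call off the event loop; nothing streams until it returns.
    answer = await asyncio.to_thread(run_graph)
    # Replay the finished answer in small pieces so the UI still renders progressively.
    for piece in split_into_chunks(answer, size):
        yield {"event": "chunk", "data": piece}
        await asyncio.sleep(delay_s)
    yield {"event": "done", "data": ""}  # assumed terminal event
```

In this shape the first chunk cannot arrive before the whole graph has finished, which is the time-to-first-token cost the chain path avoids.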


Bedrock Knowledge Base with OpenSearch (AWS)

AWS stack: Bedrock Knowledge Base (retrieve API) + OpenSearch Serverless (typical vector store behind the KB). The app uses AmazonKnowledgeBasesRetriever—not direct OpenSearch client calls.

retrieve node → Bedrock KB API → OpenSearch Serverless index

| Piece | Responsibility |
|---|---|
| OpenSearch Serverless | Chunk embeddings, vector/hybrid search, index storage |
| Bedrock Knowledge Base | Connectors, sync, retrieve orchestration, result metadata for citations |
| This application | RETRIEVER_PROVIDER=aws, BEDROCK_KNOWLEDGE_BASE_ID, optional Bedrock metadata filters |

Azure stack: Azure AI Search fills the same role (no OpenSearch)—RETRIEVER_PROVIDER=azure.

Rationale: v1 (upstream chabot) coupled the app to OpenSearch queries. v2 keeps OpenSearch in the platform architecture but uses the KB API so ingestion, index policies, and retrieve semantics stay managed by AWS—one retriever interface in the provider registry for both clouds.

Code: backend/app/services/providers/retriever/aws.py, backend/app/services/retrieval.py (metadata filters).

Code (registry): backend/app/services/providers/ (AWS/Azure/mock).
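As a rough illustration of the optional Bedrock metadata filters, the sketch below builds a retrieval configuration in the shape accepted by the Bedrock Knowledge Base retrieve API (`vectorSearchConfiguration` with `numberOfResults` and a `filter` using `equals` / `andAll`). The helper name `build_retrieval_config` and the metadata keys in the usage note are hypothetical; the project's actual filter logic in retrieval.py may differ.

```python
from typing import Any, Dict, List, Mapping, Optional


def build_retrieval_config(
    k: int, metadata: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Build a Bedrock KB retrieval config with optional equality filters."""
    vector_cfg: Dict[str, Any] = {"numberOfResults": k}
    # One "equals" clause per metadata key; sorted for deterministic output.
    clauses: List[Dict[str, Any]] = [
        {"equals": {"key": key, "value": value}}
        for key, value in sorted((metadata or {}).items())
    ]
    if len(clauses) == 1:
        # andAll needs at least two clauses, so a single clause is passed bare.
        vector_cfg["filter"] = clauses[0]
    elif len(clauses) > 1:
        vector_cfg["filter"] = {"andAll": clauses}
    return {"vectorSearchConfiguration": vector_cfg}
```

A dictionary such as `build_retrieval_config(8, {"source_system": "canvas"})` could then be supplied as the `retrieval_config` argument of AmazonKnowledgeBasesRetriever alongside the knowledge base ID; `source_system` is an invented key used only for this example.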


Provider registry (LLM + retriever)

LLM_PROVIDER and RETRIEVER_PROVIDER select aws, azure, or mock implementations. RAG_FORCE_MOCK=true forces mock for demos and CI.

Rationale: Same API and UI across environments; tox and new contributors run without cloud credentials. Explicit env vars beat implicit “whatever is in .env” for support and docs.

Code: backend/app/services/providers/, backend/app/config/default.py, .env.example.
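The selection rule described above can be sketched as a small resolver. This is an illustration under assumptions, not the project's registry: the function name `resolve_provider`, the fallback to mock when a variable is unset, and the error on unknown names are all choices made for this example.

```python
from typing import Mapping

VALID_PROVIDERS = ("aws", "azure", "mock")


def resolve_provider(env: Mapping[str, str], var: str) -> str:
    """Pick the provider named by `var`, letting RAG_FORCE_MOCK override it."""
    # RAG_FORCE_MOCK wins over any per-provider setting (demos and CI).
    if env.get("RAG_FORCE_MOCK", "").strip().lower() == "true":
        return "mock"
    # Assumption: an unset variable falls back to mock rather than a cloud provider.
    name = env.get(var, "mock").strip().lower()
    if name not in VALID_PROVIDERS:
        raise ValueError(f"{var}={name!r} is not one of {VALID_PROVIDERS}")
    return name
```

Called as `resolve_provider(os.environ, "LLM_PROVIDER")` and `resolve_provider(os.environ, "RETRIEVER_PROVIDER")`, the two settings stay independent, so a deployment could in principle mix a live retriever with a mock LLM.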


LangGraph KB path: multi-query → retrieve → rerank

condense → multi_query → retrieve → rerank → generate → format

| Stage | Purpose |
|---|---|
| condense | Turn follow-up questions into a standalone retrieval query |
| multi_query | Expand queries; fuse results (RRF) for better recall |
| retrieve | Bedrock KB → OpenSearch Serverless or Azure AI Search (vector + keyword/hybrid); optional metadata filters; fetch RERANK_CANDIDATE_K docs when reranking |
| rerank | FlashRank or keyword backend to trim noise before generation |
| generate | LLM answer grounded on selected chunks |
| format | Normalize metadata (sources, source_kind, markdown shape) |

Rationale: Recall and precision are tuned in retrieval, not only in the prompt. Each stage is flag-gated (MULTI_QUERY_*, RERANK_*, METADATA_FILTER_*) so operators can compare profiles (see eval_baseline_2026-05-19.md and ./scripts/run_eval_phase5.sh).

Code: backend/app/services/graph/nodes.py, backend/app/services/retrieval.py, backend/app/services/rerank.py.

Web path intentionally skips rerank: condense → web_search → generate → format (WEB_RESEARCH.md).
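The multi_query stage fuses per-query result lists with reciprocal rank fusion (RRF). A minimal sketch of standard RRF follows; the function name, the use of plain document IDs, the conventional constant k=60, and the alphabetical tie-break are assumptions for illustration rather than details from retrieval.py.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def rrf_fuse(
    rankings: List[List[str]], k: int = 60, top_n: Optional[int] = None
) -> List[str]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; each appearance contributes 1 / (k + rank).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to keep output stable.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered if top_n is None else ordered[:top_n]
```

A document that appears in several query variants accumulates score from each list, which is why fusion tends to help recall: `rrf_fuse([["a", "b"], ["b", "c"]])` ranks `b` first because it is the only ID present in both lists.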


Opt-in web research

Web search is per message (research_mode=web), gated by WEB_RESEARCH_ENABLED, with a disclaimer in the UI and source_kind=web in metadata.

Rationale: Campus KB answers should default to governed corpus content. Open web is a deliberate user choice, not silent fallback when retrieval is weak.

Code: backend/app/services/tools/web_search.py, graph routing in nodes.py, Vue ChatInput / stores.
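The gating rule can be expressed as a tiny routing function. This is a sketch under assumptions: the name `next_node_after_condense` is hypothetical, and falling back to the KB path when web mode is requested but WEB_RESEARCH_ENABLED is off is one plausible policy, not confirmed behaviour.

```python
from typing import Optional


def next_node_after_condense(
    research_mode: Optional[str], web_research_enabled: bool
) -> str:
    """Return the name of the graph node to run after condense."""
    # Web search needs both the per-message toggle and the operator flag.
    if research_mode == "web" and web_research_enabled:
        return "web_search"
    # Assumption: every other case, including a disabled flag, stays on the KB path.
    return "multi_query"
```

Because the decision depends only on the explicit toggle and the flag, weak retrieval results never trigger web search on their own.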


Two evaluation layers (RAGAS + LangSmith)

| Tool | Role |
|---|---|
| RAGAS | Offline quality metrics on a fixed golden dataset (backend/tests/eval/); optional strict gates via RAGAS_QUALITY_GATE |
| LangSmith | Online trace inspection per session and per graph node |

Rationale: RAGAS answers “did we regress on known questions?” LangSmith answers “what happened on this slow or wrong turn?” CI runs unit tests with mock RAG; full RAGAS is slow and AWS-dependent, so it is optional locally and on release when configured (EVALUATION.md).
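A strict quality gate reduces to comparing aggregate scores against minimums. The sketch below shows one way that check could look; the function name `gate_failures`, the threshold values in the usage note, and the treatment of a missing metric as a failure are assumptions, not the behaviour of RAGAS_QUALITY_GATE as implemented.

```python
from typing import Dict, Mapping


def gate_failures(
    scores: Mapping[str, float], minimums: Mapping[str, float]
) -> Dict[str, str]:
    """Return a reason per metric that misses its minimum; empty means pass."""
    failures: Dict[str, str] = {}
    for metric, floor in minimums.items():
        value = scores.get(metric)
        if value is None:
            # Assumption: a metric that was not computed fails the gate.
            failures[metric] = "missing"
        elif value < floor:
            failures[metric] = f"{value:.2f} < {floor:.2f}"
    return failures
```

For example, with hypothetical minimums of 0.85 for faithfulness and 0.70 for context_recall, a run scoring 0.91 and 0.62 would fail only on context_recall, and a release script could exit non-zero whenever the returned dictionary is non-empty.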


API-port OAuth with SPA handoff

GitHub OAuth callback runs on the API origin (OAUTH_REDIRECT_BASE_URL, typically :8000), then redirects to Vue /oauth/handoff with a one-time code.

Rationale: OAuth state and cookies stay on one origin during the provider round-trip; avoids state_mismatch when the browser hits both Vite (:5173) and the API during login.

Code: backend/app/api/auth/oauth_handoff.py (or equivalent), PRODUCTION_TLS.md.
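The one-time code in the handoff can be sketched as a short-lived, single-use token store. Everything here is illustrative: the class name `HandoffCodes`, the in-memory dictionary, and the 60-second default lifetime are assumptions, and a multi-process deployment would need shared storage instead.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class HandoffCodes:
    """In-memory store of single-use codes exchanged by the SPA after OAuth."""

    def __init__(
        self, ttl_s: float = 60.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so expiry can be tested
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        """Create an unguessable code tied to the authenticated user."""
        code = secrets.token_urlsafe(32)
        self._pending[code] = (user_id, self._clock() + self._ttl_s)
        return code

    def redeem(self, code: str) -> Optional[str]:
        """Return the user ID once; unknown, reused, or expired codes give None."""
        # pop() removes the code on first use, so a replay finds nothing.
        entry = self._pending.pop(code, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self._clock() <= expires_at else None
```

The API callback would call `issue()` before redirecting to /oauth/handoff, and the SPA would exchange the code for a session through a call that ends in `redeem()`.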


Tenant-hydrated prompts

Per-tenant tenant.rag_config in Postgres can override topics, prompts, and related RAG settings.

Rationale: One deployment serving multiple logical tenants or campuses without separate builds. See TENANT_CONFIG.md.
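Hydrating a tenant's settings amounts to layering its overrides on top of deployment defaults. The sketch below assumes a shallow merge in which null values mean "not overridden"; the function name and the keys used in the usage note are hypothetical, and the real rag_config schema is described in TENANT_CONFIG.md.

```python
from typing import Any, Dict, Mapping, Optional


def effective_rag_config(
    defaults: Mapping[str, Any], tenant_overrides: Optional[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Overlay a tenant's rag_config on the deployment defaults."""
    merged: Dict[str, Any] = dict(defaults)  # copy so defaults are never mutated
    for key, value in (tenant_overrides or {}).items():
        # Assumption: None means "not set for this tenant", so the default stays.
        if value is not None:
            merged[key] = value
    return merged
```

With defaults of `{"topics": ["canvas"], "system_prompt": "base"}` and a tenant row containing `{"topics": ["canvas", "accessibility"], "system_prompt": None}`, only the topic list changes for that tenant.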


History and performance guardrails

Chat history is capped (CHAT_HISTORY_MAX_MESSAGES) to bound prompt size and cost. Prometheus exposes pool and first-token style metrics (PERFORMANCE.md).

Rationale: Long sessions should not silently blow context windows or latency SLOs.
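Capping history can be as simple as keeping the most recent messages. This is a minimal sketch assuming the cap counts individual messages and that the oldest are dropped first; the function name `cap_history` and the message dictionary shape are illustrative only.

```python
from typing import Dict, List


def cap_history(
    messages: List[Dict[str, str]], max_messages: int
) -> List[Dict[str, str]]:
    """Keep only the newest `max_messages` entries, preserving their order."""
    if max_messages <= 0:
        # Guard explicitly: messages[-0:] would return the whole list.
        return []
    return messages[-max_messages:]
```

In this sketch `max_messages` would be read from CHAT_HISTORY_MAX_MESSAGES, and applying the cap before the condense step bounds prompt size regardless of session length.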


Capability map (where to read more)

| Capability | Primary doc | Implementation |
|---|---|---|
| Chat + SSE | ARCHITECTURE.md | backend/app/api/chat.py, frontend-vue/src/stores/chat.ts |
| LangGraph pipeline | LANGGRAPH.md | backend/app/services/graph/ |
| Web research | WEB_RESEARCH.md | backend/app/services/tools/web_search.py |
| Auth / OAuth | PRODUCTION_TLS.md | backend/app/api/auth/ |
| Evaluation | EVALUATION.md | backend/tests/eval/, scripts/run_eval_phase5.sh |
| CI/CD | CI.md, RELEASE.md | .github/workflows/ |
| Operations | OPERATIONS.md | Alembic, metrics, run scripts |
| Delivery phases | roadmap/PRODUCT_ROADMAP.md | Shipped vs optional work |
Delivery phases roadmap/PRODUCT_ROADMAP.md Shipped vs optional work

Alternatives considered (short)

| Topic | Alternative | Why not (for this codebase) |
|---|---|---|
| Orchestration | Open-ended multi-agent (CrewAI, etc.) | Harder to test and observe; prefer explicit graph for production RAG |
| Retrieval | App-managed chunking + direct OpenSearch only | Bedrock KB + OpenSearch Serverless: managed sync and retrieve API; app avoids index client ops |
| Streaming | Only buffered responses | Chain path keeps true SSE; graph path trades TTFT for span clarity until Phase 6a |
| Web | Always-on web augmentation | Conflicts with KB trust model; opt-in + disclaimer is clearer |
| DB schema | create_all in production | Alembic-only in prod for repeatable deploys |

Extension points (planned or optional)

Documented in roadmap/PRODUCT_ROADMAP.md:

  • LangGraph-native SSE — stream from astream_events instead of post-invoke chunking
  • Bounded rewrite loop: RAG_AGENTIC_ENABLED (quality retry without open agents)
  • Campus scale — Redis rate limits, HA, EB hardening (archive/PHASED_IMPROVEMENT_ROADMAP.md)