Portfolio Case Study: Campus RAG Assistant
One-line summary
A production-style RAG platform for governed institutional knowledge, built to demonstrate AI product engineering and platform architecture.
Problem
Campus knowledge is scattered across LMS guides, ServiceNow articles, and policy documents. Staff and students need fast answers they can verify—not another generic chatbot that guesses from the open web.
My role
I owned the platform transformation work represented in this repository: the Vue product UI, provider registry, AWS / Azure / mock execution modes, LangGraph orchestration, RAGAS evaluation harness, LangSmith observability, CI/CD, load testing, and operational documentation.
The project builds from the public ets-berkeley-edu/chabot codebase, which established the campus chatbot domain. This repository extends that base into a source-reviewable AI platform architecture artifact for portfolio and educational review. It is not an official UC Berkeley or UC product.
Architecture

| Layer | Components |
|---|---|
| UI | Vue 3 SPA (primary); optional Streamlit on the same API |
| API | FastAPI — SSE streaming, sessions, feedback, JWT/OAuth |
| RAG | LangGraph (condense → multi_query → retrieve → rerank → generate) or LangChain chain (true token streaming) |
| Providers | AWS Bedrock KB, Azure AI Search + OpenAI, mock (CI/local) |
| Data | PostgreSQL + Alembic; per-tenant rag_config |
| Quality | RAGAS golden set, LangSmith traces, Prometheus, k6 |
Detailed diagrams and request flows: ARCHITECTURE.md.
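The RAG row above gives the LangGraph stage order. The sketch below mirrors that order with plain Python functions passing a shared state dict from stage to stage. It is not LangGraph's API, and everything in it is hypothetical: the toy corpus, the term-overlap scoring, the keyword "multi-query" step, and the stub that quotes context instead of calling an LLM. It only shows how each stage narrows or enriches the state before generation.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Toy corpus standing in for a governed knowledge base (hypothetical data).
CORPUS: Dict[str, str] = {
    "kb-1": "Reset your LMS password from the account settings page.",
    "kb-2": "ServiceNow tickets can be opened from the IT portal.",
    "kb-3": "The LMS supports file uploads up to 100 MB.",
}


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric terms, used as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def condense(state: dict) -> dict:
    # Fold chat history into a standalone question (naive: prepend last turn).
    history: List[str] = state["history"]
    q = state["question"]
    state["standalone"] = f"{history[-1]} {q}" if history else q
    return state


def multi_query(state: dict) -> dict:
    # Stand-in for LLM paraphrasing: the question plus its longer keywords.
    q = state["standalone"]
    state["queries"] = [q] + [w for w in re.findall(r"[A-Za-z]+", q) if len(w) > 4]
    return state


def retrieve(state: dict) -> dict:
    # Union of documents sharing at least one term with any query variant.
    hits: Set[str] = set()
    for query in state["queries"]:
        hits |= {d for d, text in CORPUS.items() if tokens(query) & tokens(text)}
    state["candidates"] = sorted(hits)
    return state


def rerank(state: dict, top_k: int = 1) -> dict:
    # Order candidates by overlap with the standalone question; keep top_k.
    q = tokens(state["standalone"])
    ranked = sorted(
        state["candidates"], key=lambda d: len(q & tokens(CORPUS[d])), reverse=True
    )
    state["context"] = ranked[:top_k]
    return state


def generate(state: dict) -> dict:
    # Stand-in for the LLM call: quote the context and attach citation ids.
    state["answer"] = " ".join(f"{CORPUS[d]} [{d}]" for d in state["context"])
    return state


PIPELINE: List[Callable[[dict], dict]] = [condense, multi_query, retrieve, rerank, generate]


def run(question: str, history: Optional[List[str]] = None) -> dict:
    state = {"question": question, "history": history or []}
    for stage in PIPELINE:
        state = stage(state)
    return state
```

For example, `run("How do I reset my LMS password?")["answer"]` retrieves two candidates, reranks them, and returns the `kb-1` passage with its citation id. Keeping every stage as a separate function over one state object is what makes per-stage tracing and tuning possible, which is the property the LangGraph path is described as adding.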
Key decisions
| Decision | Rationale | ADR |
|---|---|---|
| Provider registry (AWS / Azure / mock) | Same API and UI across environments; CI runs without cloud credentials | ADR-001 |
| Dual RAG engines (chain vs langgraph) | Chain preserves true token streaming over SSE; LangGraph adds observable stages and retrieval tuning | ADR-002 |
| Opt-in web research | Governed KB-first; open web is explicit per message with disclaimer | ADR-003 |
| RAGAS gates as release controls | Honest baselines on PR CI; strict gates on release milestones only | ADR-004 |
| Bedrock KB API (not direct OpenSearch) | Managed sync, retrieve, and citation metadata; simpler ops | DESIGN.md |
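The provider-registry decision keeps one API and UI across AWS, Azure, and mock modes, and the outcomes below note that CI sets `RAG_FORCE_MOCK=true`. A minimal sketch of how such a registry could resolve a provider, assuming a single `retrieve` interface; the class names, the registry keys, and the exact way the flag is read are hypothetical, and the cloud adapters are placeholders rather than real Bedrock or Azure calls.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional, Protocol


class Retriever(Protocol):
    """Minimal interface every provider adapter is assumed to implement."""

    def retrieve(self, query: str) -> List[str]: ...


class MockRetriever:
    """Deterministic stand-in for CI and local runs; needs no cloud credentials."""

    def retrieve(self, query: str) -> List[str]:
        return [f"mock passage for: {query}"]


class BedrockKBRetriever:
    """Placeholder: a real adapter would call the Bedrock Knowledge Base retrieve API."""

    def retrieve(self, query: str) -> List[str]:
        raise NotImplementedError("requires AWS credentials")


class AzureSearchRetriever:
    """Placeholder: a real adapter would query Azure AI Search."""

    def retrieve(self, query: str) -> List[str]:
        raise NotImplementedError("requires Azure credentials")


# Provider name -> zero-argument factory (hypothetical names and classes).
REGISTRY: Dict[str, Callable[[], Retriever]] = {
    "aws": BedrockKBRetriever,
    "azure": AzureSearchRetriever,
    "mock": MockRetriever,
}


def get_retriever(provider: str, env: Optional[Mapping[str, str]] = None) -> Retriever:
    env = os.environ if env is None else env
    # RAG_FORCE_MOCK=true overrides the configured provider, e.g. in GitHub Actions.
    if env.get("RAG_FORCE_MOCK", "").lower() == "true":
        provider = "mock"
    if provider not in REGISTRY:
        raise ValueError(f"unknown provider: {provider!r}")
    return REGISTRY[provider]()
```

Because callers depend only on the `retrieve` interface, a tenant configured for `"aws"` still runs end to end in CI: the override swaps in the mock before any cloud client is constructed.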
Measured outcomes
| Signal | Evidence |
|---|---|
| Test breadth | ~48 backend, frontend, e2e, and eval test files; tox -e lint,backend,frontend-vue on every PR |
| RAGAS baseline | 10-question golden set; AWS Phase 5 tuned profile: context_recall 0.80 (passes gate) |
| CI without cloud | Mock providers; RAG_FORCE_MOCK=true; no AWS credentials in GitHub Actions |
| Load profile | k6 validates auth, session CRUD, and chat under load — LOAD_TESTING.md |
| Observability | LangSmith per-node spans on LangGraph path; Prometheus /api/metrics |
Full score tables: eval_baseline_2026-05-19.md.
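The decisions table treats RAGAS scores as release controls: baselines on PR CI, strict gates only at release milestones. A minimal sketch of that two-mode check, assuming the aggregate scores of a RAGAS run have already been collected into a dict. The metric names `context_recall` and `context_precision` appear on this page; the threshold values, the `check_gates` function, and the mode names are hypothetical and are not the project's actual gate configuration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical minimum scores, for illustration only.
THRESHOLDS: Dict[str, float] = {"context_recall": 0.75, "context_precision": 0.60}


@dataclass
class GateResult:
    failures: List[str]
    blocking: bool


def check_gates(scores: Dict[str, float], mode: str) -> GateResult:
    """Compare aggregate RAGAS metric scores with THRESHOLDS.

    mode="baseline": record shortfalls but never block (PR CI).
    mode="strict": any shortfall or missing metric blocks (release milestones).
    """
    if mode not in ("baseline", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    failures = []
    for metric, minimum in THRESHOLDS.items():
        value = scores.get(metric)
        if value is None or value < minimum:
            failures.append(f"{metric}={value} < {minimum}")
    return GateResult(failures=failures, blocking=mode == "strict" and bool(failures))
```

With illustrative scores such as `{"context_recall": 0.82, "context_precision": 0.45}`, baseline mode reports the precision shortfall without blocking, while strict mode marks the same run as blocking; a CI wrapper could then exit non-zero only in the strict case.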
Known limits
- The eval set is small (10 rows) and corpus-specific: good for a regression baseline, not for production quality claims.
- Context precision (~0.50) is the main quality bottleneck; the next levers are ingestion/chunking and rerank tuning.
- The LangGraph path buffers its output and re-emits it as paced SSE chunks rather than streaming tokens as they are generated; true token streaming is an optional Phase 6a item.
- The UC license limits commercial reuse; treat this as a portfolio/educational fork, not a drop-in commercial product.
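The LangGraph streaming limit amounts to re-chunking an already complete answer into Server-Sent Events. A minimal sketch of that pacing step, assuming a JSON `delta` payload per frame and a terminal `done` event; both framing choices and the `paced_sse` name are hypothetical, since the project's actual event schema is not shown here.

```python
import json
import time
from typing import Iterator


def paced_sse(answer: str, chunk_words: int = 3, delay_s: float = 0.0) -> Iterator[str]:
    """Re-emit a fully buffered answer as Server-Sent Events frames.

    Each frame carries a JSON payload, so newlines inside the text are escaped
    and cannot break SSE framing. A final 'done' event signals completion.
    """
    words = answer.split(" ")
    for i in range(0, len(words), chunk_words):
        piece = " ".join(words[i:i + chunk_words])
        if i + chunk_words < len(words):
            piece += " "  # keep the separating space so the client can concatenate
        yield f"data: {json.dumps({'delta': piece})}\n\n"
        if delay_s:
            time.sleep(delay_s)  # pacing between frames
    yield "event: done\ndata: {}\n\n"
```

The client sees incremental output, but the first frame only leaves after the whole graph has finished generating, which is why time-to-first-token is worse than on the chain path's true token streaming.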
What this demonstrates
- Lead AI engineering — retrieval tuning, orchestration, citations, eval discipline, failure-mode thinking
- AI platform architecture — multicloud abstraction, tenant config, mock/live environments, CI/CD
- Product judgment — opt-in web research, topic scoping, source transparency, feedback loop
- Evaluation discipline — RAGAS + LangSmith as complementary tools; honest baselines
- Production-readiness thinking — metrics, rate limits, migrations, security notes, hardening backlog
Related
- README — quick start and portfolio highlights
- DESIGN.md — goals, boundaries, tradeoffs
- PRODUCTION_HARDENING.md — scale and ops backlog
- docs/adr/ — architecture decision records