Portfolio Case Study: Campus RAG Assistant

One-line summary

A production-style RAG platform for governed institutional knowledge, built to demonstrate AI product engineering and platform architecture.

Problem

Campus knowledge is scattered across LMS guides, ServiceNow articles, and policy documents. Staff and students need fast answers they can verify—not another generic chatbot that guesses from the open web.

My role

I owned the platform transformation work represented in this repository: the Vue product UI, provider registry, AWS / Azure / mock execution modes, LangGraph orchestration, RAGAS evaluation harness, LangSmith observability, CI/CD, load testing, and operational documentation.

The project builds on the public ets-berkeley-edu/chabot codebase, which established the campus chatbot domain. This repository extends that base into an AI platform architecture artifact whose source can be reviewed for portfolio and educational purposes. It is not an official UC Berkeley or UC product.

Architecture

High-level architecture

| Layer | Components |
| --- | --- |
| UI | Vue 3 SPA (primary); optional Streamlit on the same API |
| API | FastAPI: SSE streaming, sessions, feedback, JWT/OAuth |
| RAG | LangGraph (condense → multi_query → retrieve → rerank → generate) or LangChain chain (true token streaming) |
| Providers | AWS Bedrock KB, Azure AI Search + OpenAI, mock (CI/local) |
| Data | PostgreSQL + Alembic; per-tenant rag_config |
| Quality | RAGAS golden set, LangSmith traces, Prometheus, k6 |

Detailed diagrams and request flows: ARCHITECTURE.md.
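
The LangGraph stage order named in the RAG layer (condense, multi_query, retrieve, rerank, generate) can be pictured as a chain of functions that each read and extend a shared state. The sketch below is plain Python rather than the LangGraph API or this repository's code. The `State` alias, the toy `CORPUS`, the keyword-overlap scoring, and every function body are hypothetical stand-ins chosen only to make the flow runnable.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # hypothetical shared state handed from stage to stage

# Toy corpus standing in for a governed knowledge base (assumption).
CORPUS = [
    "Reset your LMS password from the login page.",
    "ServiceNow tickets are triaged within one business day.",
    "The travel policy requires pre-approval for international trips.",
]


def _overlap(query: str, doc: str) -> int:
    """Count lowercase words shared by a query and a document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def condense(state: State) -> State:
    # A real stage would fold chat history into a standalone question.
    return {**state, "question": state["question"].strip()}


def multi_query(state: State) -> State:
    # A real stage would ask an LLM for paraphrases; here we add one variant.
    q = state["question"]
    return {**state, "queries": [q, q.rstrip("?")]}


def retrieve(state: State) -> State:
    # Keyword-overlap retrieval, deduplicated across query variants.
    hits: List[str] = []
    for query in state["queries"]:
        for doc in CORPUS:
            if _overlap(query, doc) > 0 and doc not in hits:
                hits.append(doc)
    return {**state, "docs": hits}


def rerank(state: State) -> State:
    # Order candidates by overlap with the question and keep the top two.
    ranked = sorted(
        state["docs"],
        key=lambda doc: _overlap(state["question"], doc),
        reverse=True,
    )
    return {**state, "docs": ranked[:2]}


def generate(state: State) -> State:
    # Answer only from retrieved sources and return them as citations.
    docs = state["docs"]
    answer = docs[0] if docs else "No governed source found."
    return {**state, "answer": answer, "citations": list(docs)}


PIPELINE: List[Callable[[State], State]] = [
    condense, multi_query, retrieve, rerank, generate,
]


def run_pipeline(question: str) -> State:
    """Run every stage in order and record the stage names as a trace."""
    state: State = {"question": question, "trace": []}
    for stage in PIPELINE:
        state = stage(state)
        state["trace"] = state["trace"] + [stage.__name__]
    return state
```

Calling `run_pipeline("How do I reset my LMS password?")` returns a state whose `trace` lists the five stage names in order. That is a simplified picture of why a staged graph is easier to observe than a single opaque chain call.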

Key decisions

| Decision | Rationale | ADR |
| --- | --- | --- |
| Provider registry (AWS / Azure / mock) | Same API and UI across environments; CI runs without cloud credentials | ADR-001 |
| Dual RAG engines (chain vs langgraph) | Chain preserves true SSE; LangGraph adds observable stages and retrieval tuning | ADR-002 |
| Opt-in web research | Governed KB-first; open web is explicit per message with disclaimer | ADR-003 |
| RAGAS gates as release controls | Honest baselines on PR CI; strict gates on release milestones only | ADR-004 |
| Bedrock KB API (not direct OpenSearch) | Managed sync, retrieve, and citation metadata; simpler ops | DESIGN.md |
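
One way to picture the provider registry decision is a name-to-factory map with a mock override for CI. The following is a minimal sketch under stated assumptions, not the repository's implementation. `RagProvider`, `MockProvider`, `UnavailableProvider`, `register`, and `get_provider` are hypothetical names, and the way `RAG_FORCE_MOCK` is parsed here (case-insensitive match on `true`) is assumed rather than taken from the source.

```python
import os
from typing import Callable, Dict, List, Protocol


class RagProvider(Protocol):
    """Hypothetical interface that every provider satisfies."""

    name: str

    def retrieve(self, query: str) -> List[str]: ...


class MockProvider:
    """Deterministic provider that needs no cloud credentials."""

    name = "mock"

    def retrieve(self, query: str) -> List[str]:
        return [f"[mock] canned passage for: {query}"]


class UnavailableProvider:
    """Placeholder for a cloud provider; real code would call a cloud SDK."""

    def __init__(self, name: str) -> None:
        self.name = name

    def retrieve(self, query: str) -> List[str]:
        raise RuntimeError(f"{self.name} provider is not wired up in this sketch")


_REGISTRY: Dict[str, Callable[[], RagProvider]] = {}


def register(name: str, factory: Callable[[], RagProvider]) -> None:
    """Associate a provider name with a zero-argument factory."""
    _REGISTRY[name] = factory


def get_provider(requested: str) -> RagProvider:
    """Resolve a provider by name, forcing the mock when RAG_FORCE_MOCK is set."""
    # Assumed parsing of the flag: any casing of "true" forces the mock.
    if os.environ.get("RAG_FORCE_MOCK", "").lower() == "true":
        requested = "mock"
    try:
        factory = _REGISTRY[requested]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown provider {requested!r}; known: {known}") from None
    return factory()


register("mock", MockProvider)
register("aws", lambda: UnavailableProvider("aws"))
register("azure", lambda: UnavailableProvider("azure"))
```

Because callers only ever ask `get_provider(...)` for an object with a `retrieve` method, the API and UI layers stay the same whichever backend is selected, which is the property the rationale column describes.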

Measured outcomes

| Signal | Evidence |
| --- | --- |
| Test breadth | ~48 backend, frontend, e2e, and eval test files; tox -e lint,backend,frontend-vue on every PR |
| RAGAS baseline | 10-question golden set; AWS Phase 5 tuned profile: context_recall 0.80 (passes gate) |
| CI without cloud | Mock providers; RAG_FORCE_MOCK=true; no AWS credentials in GitHub Actions |
| Load profile | k6 validates auth, session CRUD, and chat under load (see LOAD_TESTING.md) |
| Observability | LangSmith per-node spans on LangGraph path; Prometheus /api/metrics |

Full score tables: eval_baseline_2026-05-19.md.
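
The split between honest baselines on PR CI and strict gates on release milestones can be illustrated with a small threshold check. This is a sketch only: `GateResult` and `check_gates` are hypothetical names, and the threshold values in the usage note are placeholders, since the actual gate thresholds are not stated here. It does not call the RAGAS library; it assumes metric scores have already been computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GateResult:
    passed: bool
    failures: List[str]


def check_gates(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    strict: bool,
) -> GateResult:
    """Compare metric scores with minimum thresholds.

    strict=False (PR mode, assumed): shortfalls are reported but do not fail.
    strict=True (release mode, assumed): any shortfall fails the gate.
    A metric missing from `scores` is treated as 0.0.
    """
    failures = [
        f"{metric}: {scores.get(metric, 0.0):.2f} < {minimum:.2f}"
        for metric, minimum in sorted(thresholds.items())
        if scores.get(metric, 0.0) < minimum
    ]
    passed = (not strict) or (not failures)
    return GateResult(passed=passed, failures=failures)
```

For example, with the reported scores `{"context_recall": 0.80, "context_precision": 0.50}` and placeholder thresholds `{"context_recall": 0.75, "context_precision": 0.60}`, the PR-mode call passes while still listing the context precision shortfall, and the release-mode call fails on it.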

Known limits

  • Eval set is small (10 rows) and corpus-specific: good as a regression baseline, not as evidence for production-quality claims.
  • Context precision (~0.50) is the main quality bottleneck; next levers are ingestion/chunking and rerank tuning.
  • LangGraph path buffers output into paced SSE chunks rather than true token streaming (Phase 6a optional).
  • UC license limits commercial reuse; treat as portfolio/educational fork, not a drop-in commercial product.
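
The paced-SSE limit noted for the LangGraph path can be sketched as an async generator that slices an already complete answer into fixed-size frames with a short delay between them. This illustrates the general technique and is not the repository's code. `paced_sse`, `collect`, the `delta` payload key, the chunk size, the delay, and the `[DONE]` sentinel are all assumptions; only the `data: ...` framing followed by a blank line comes from the Server-Sent Events format itself.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def paced_sse(
    answer: str,
    chunk_size: int = 24,
    delay_s: float = 0.02,
) -> AsyncIterator[str]:
    """Yield a finished answer as paced SSE frames.

    The whole answer already exists before the first frame is sent, so this
    imitates streaming for the client but is not true token streaming.
    """
    for start in range(0, len(answer), chunk_size):
        piece = answer[start:start + chunk_size]
        # JSON-encode the payload so newlines in the text cannot break framing.
        payload = json.dumps({"delta": piece})
        yield f"data: {payload}\n\n"
        await asyncio.sleep(delay_s)
    # Hypothetical end-of-stream marker so the client knows to stop reading.
    yield "data: [DONE]\n\n"


async def collect(answer: str, chunk_size: int = 5) -> List[str]:
    """Gather all frames into a list (handy for tests, not for production)."""
    return [
        frame
        async for frame in paced_sse(answer, chunk_size=chunk_size, delay_s=0)
    ]
```

In a FastAPI handler, a generator of this shape could be passed to a streaming response with the `text/event-stream` media type. The trade-off is that the first frame cannot be sent until generation has finished, which is the latency cost the limit above describes.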

What this demonstrates

  • Lead AI engineering — retrieval tuning, orchestration, citations, eval discipline, failure-mode thinking
  • AI platform architecture — multicloud abstraction, tenant config, mock/live environments, CI/CD
  • Product judgment — opt-in web research, topic scoping, source transparency, feedback loop
  • Evaluation discipline — RAGAS + LangSmith as complementary tools; honest baselines
  • Production-readiness thinking — metrics, rate limits, migrations, security notes, hardening backlog