# Changelog

Notable changes to Campus RAG Assistant (`campus-rag-assistant`), an independent extension of the UC Berkeley ETS Chabot platform. The format follows Keep a Changelog. Attribution and license: see the README.

Author and maintainer: sandeep-jay, sole implementation author of the upstream Berkeley ETS Chabot codebase (copyright UC Regents) and author of this independent extension. Distributed under the UC Regents LICENSE; see NOTICE. Not an official UC Berkeley product.

Convention: sections are dated by session (when the work happened). GitHub PR numbers are noted where the public merge history matters.

Edit `[Unreleased]` while you work. When a session is done, rename it to `## [YYYY-MM-DD] — short title` and open a new `[Unreleased]`.

## [Unreleased]

### Documentation
- Reposition README and MkDocs landing page to lead with ownership and architecture; move upstream attribution into a dedicated "Origin and Scope" section.
- Add `docs/REVIEWER_GUIDE.md` with a 90-second read, a senior-signal evidence map, and per-persona review paths; surfaced near the top of the MkDocs nav.
- Replace "Demo readiness" with "Review artifacts" framing in the README and docs site.
- Rewrite the "My role" section of `docs/PORTFOLIO_CASE_STUDY.md` so platform ownership leads and upstream context follows.
- Drop weak phrases ("not a weekend chatbot", "portfolio-grade") near the top of public-facing docs.

### Security
- Frontend dev-tool CVEs remediated — upgraded `frontend-vue` test/build tooling so `npm audit --audit-level=moderate` is clean: `vite` 6.4.2, `esbuild` 0.25.0, `vitest` / `@vitest/coverage-v8` 4.1.7, plus lockfile transitive fixes for `ws` 8.20.1 and `brace-expansion` 5.0.6. Verified with `npm run typecheck`, `npm test -- --run` (130 tests), `npm run build`, and `npm audit --audit-level=moderate`. The `langgraph` / `langgraph-checkpoint` alerts remain deferred because patched checkpoint/LangGraph combinations require `langchain-core` 1.x, which conflicts with the current LangChain 0.3 stack.
- Vulnerable Python pins bumped to patched releases (closes 7 Dependabot alerts on `main`):
    - `authlib==1.3.2` → `1.6.12` — patches 1 CRITICAL JWS injection (GHSA-9ggr-2464-2j32) and 5 HIGH advisories (OIDC bypass, padding oracle, DoS, account takeover).
    - `langchain==0.3.20` → `0.3.30` — patches a HIGH LangSmith deserialization advisory.
    - `langchain-community==0.3` → `0.3.27` — patches a HIGH advisory.
    - `streamlit==1.30.0` → `1.54.0` — patches 2 MEDIUM Windows-only path-traversal / SSRF advisories.

    Verified by running the full `backend` test suite (118 passed, 0 new regressions vs baseline) and restarting the local backend/frontend; OAuth import paths and the Streamlit demo were all clean. The riskier `langgraph` / `langgraph-checkpoint` major-version migration and the `vite` / `esbuild` (frontend dev tooling) bumps are tracked separately as follow-up work in the Dependabot alert queue.

### Fixed
- Frontend theme tokens compile correctly (delete-conversation dialog overlap) — `frontend-vue/src/assets/main.css` used Tailwind v4 (`@tailwindcss/vite` + `@import 'tailwindcss';`) without a `@theme` block, so design-token utilities (`bg-background`, `bg-card`, `bg-muted`, `border-border`, `text-foreground`, `text-muted-foreground`, etc.) were silent no-ops. Surfaces looked opaque only because `body` set `background-color: hsl(var(--background))`; inside the scrolling session list, the sidebar's sticky delete-confirmation panel had no background of its own and rendered transparently over the session items. Added an `@theme inline` mapping for the existing `--background`, `--foreground`, `--card`, `--popover`, `--primary`, `--secondary`, `--muted`, `--accent`, `--destructive`, `--border`, `--input`, and `--ring` CSS variables so the utilities compile. This fixes the delete-conversation dialog overlap and the `UserMenu` dropdown, which used the same pattern.
- Dependency review PR guard — added a `dependency review (new high/critical CVEs)` job to CI using `actions/dependency-review-action@v4`. It runs on pull requests and fails when a dependency diff introduces a new high or critical advisory. Dependabot alerts stay enabled; Dependabot security auto-update PRs stay disabled in favor of manual triage.
- Secret-leak defense in depth (gitleaks layers) — five independent guards now sit between a credential and a public push:
    - `.gitignore`: an `.env*` catch-all plus pattern blocks for keys, credentials, tfvars, and secrets directories;
    - `backend/tests/core/test_env_template.py`: every `Settings` field is documented, and `.env.example` holds no real-looking `SecretStr` values;
    - local `.githooks/pre-push` runs `gitleaks detect --log-opts="<remote>..HEAD"` and fails the push on any finding (`scripts/install-hooks.sh` wires it locally and, with `--global`, into `~/.config/git/hooks`);
    - a new `tox -e secrets` env and a `gitleaks (history + diff)` job in `.github/workflows/ci.yml` run `gitleaks detect --all --reflog --no-merges` on every PR and push;
    - GitHub repo settings now enable Secret Scanning, Push Protection, and Dependabot alerts, so even `git push --no-verify` is blocked at GitHub's edge for known-provider patterns. (Dependabot security updates / auto-PR bumps are intentionally left off because they kept breaking the build; vulnerabilities are triaged manually from the alert queue.)
- Tool attribution guard — `.githooks/commit-msg` delegates to `.githooks/tool_attribution_guard.py`, which strips AI-tool authorship/attribution lines (`Co-authored-by:`, `Signed-off-by:`, `Made with <tool> ...`, vendor URL footers) from commit messages. `scripts/install-hooks.sh --global` installs the same guard into `~/.config/git/hooks/` so every repo on the workstation gets the same protection.
- Secret hardening (Pydantic `SecretStr`) — `SECRET_KEY`, `AWS_SECRET_ACCESS_KEY`, `AZURE_OPENAI_API_KEY`, `AZURE_SEARCH_KEY`, `OAUTH_GOOGLE_CLIENT_SECRET`, `OAUTH_GITHUB_CLIENT_SECRET`, `LANGCHAIN_API_KEY`, and `TAVILY_API_KEY` in `backend/app/config/default.py` are now typed `SecretStr`, so they are redacted in `repr`, logs, exceptions, and `model_dump_json`. Cleartext is read only at the boundary (JWT codec, OAuth client, Azure SDK, Tavily, LangSmith). `simple_tracer.py` no longer logs the LangSmith key in env-var dumps.
- Pydantic settings tightening — `DefaultSettings.model_config` switched from `extra='allow'` to `extra='ignore'` with an explicit `SettingsConfigDict` (case-sensitive, UTF-8, optional `secrets_dir` for Docker/K8s secret mounts). The previously undeclared `AZURE_SEARCH_VECTOR_FIELD` is now declared (default `text_vector`) and documented in `.env.example`.
- CI guard for env template — `backend/tests/core/test_env_template.py` asserts that every `DefaultSettings` field is documented in `.env.example` (per field for `SecretStr` fields) and that no `SecretStr` field carries an uncommented, non-placeholder value.
- Secrets management doc — `docs/SECURITY.md` now describes the load precedence, the list of `SecretStr` fields, a production checklist (secret stores, instance roles, rotation), and the leak-response runbook.
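
A minimal sketch of the two checks the env-template guard performs, assuming `.env.example` uses simple `KEY=value` or `# KEY=value` lines and that field names are passed in as plain sets (the real test presumably derives them from `DefaultSettings`). The helper names and the placeholder heuristic are hypothetical, not the project's code.

```python
from __future__ import annotations

import re

# Hypothetical placeholder markers; the real test's heuristic may differ.
PLACEHOLDER_HINTS = ("changeme", "your-", "<", "xxx", "example")

# Matches "KEY=value" or "# KEY=value"; commented-out entries still count as documented.
ENTRY = re.compile(r"^\s*(#\s*)?([A-Z][A-Z0-9_]*)\s*=(.*)$")


def parse_template(text: str) -> dict[str, tuple[bool, str]]:
    """Map each documented key to (is_commented, value)."""
    entries: dict[str, tuple[bool, str]] = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            commented = match.group(1) is not None
            entries[match.group(2)] = (commented, match.group(3).strip())
    return entries


def check_template(text: str, fields: set[str], secret_fields: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the template passes."""
    entries = parse_template(text)
    documented = set(entries)
    problems = [f"undocumented field: {name}" for name in sorted(fields - documented)]
    for name in sorted(secret_fields & documented):
        commented, value = entries[name]
        looks_placeholder = value == "" or any(h in value.lower() for h in PLACEHOLDER_HINTS)
        # Only an uncommented, real-looking value counts as a leak.
        if not commented and not looks_placeholder:
            problems.append(f"non-placeholder secret value: {name}")
    return problems
```

Given a template that sets `TAVILY_API_KEY=tvly-abc123realkey` and omits `APP_ENV`, this sketch reports both problems, while `SECRET_KEY=changeme` passes as a placeholder.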

### Changed
- Environment identifier consolidated — dropped the legacy `ENVIRONMENT` field from `DefaultSettings` / `DevelopmentSettings` / `TestSettings`; `APP_ENV` is now the only source of truth. Updated `backend/app/main.py`, `backend/app/api/oauth_routes.py`, `backend/verify_configs.py`, `frontend-streamlit/app/{config,main}.py`, `scripts/verify_oauth.py`, `tox.ini` (removed three redundant `ENVIRONMENT = test` lines), `.env.example`, and `docs/ARCHITECTURE.md`.
- GitHub Pages documentation site — added a MkDocs Material scaffold, a docs CI build/deploy workflow, `tox -e docs`, a Pages landing page, and README/documentation positioning polish around the independent-extension framing.
- `.env.example` rewritten into 16 numbered sections with explicit `[REQUIRED]` / `[REQUIRED IF <cond>]` / `[OPTIONAL — default: <v>]` labels on every entry; duplicates removed and previously undocumented fields added.
- `.env.test` reorganized to mirror the section numbering of `.env.example`; `APP_ENV=test` set explicitly.

## [2026-05-19] — Portfolio polish (PR #24)

### Changed
- Docs (portfolio polish) — README restructured for progressive disclosure (pitch, role alignment, highlights, upstream delta, quality baseline); added PORTFOLIO_CASE_STUDY.md, docs/adr/ (4 ADRs), PRODUCTION_HARDENING.md; reframed EVALUATION.md baseline paragraph; updated docs/README.md index.
- README — Overview without `<details>` collapsibles (full architecture, design, screenshots, and LangSmith traces visible); CI status badge.
- Eval — expanded eval_baseline_2026-05-19.md (retained AWS scores, Azure sweep table, findings); eval respects LangGraph when `RAGAS_EVAL=1`; precision-balanced profile in the Phase 5 script; `RAGAS_DO_NOT_TRACK` in tox eval.
- Logging — `RequestIdFilter` on handlers; redact JWT payloads and chat queries at INFO; cap vendor loggers; optional `LOG_JSON`; access log via `app.access`; single idempotent `initialize_logger()`.
- Docs — documentation cohesion: DESIGN.md (product boundaries, decisions); README problem/quality sections; rename PRODUCT_ROADMAP.md; de-portfolio language; generic campus/Canvas LMS framing; document Bedrock KB + OpenSearch Serverless alongside Azure Search; expanded docs/README.md index; ARCHITECTURE.md LangGraph/OAuth/research_mode; WEB_RESEARCH.md KB path diagram; CI.md portfolio quick tox note; E2E.md OAuth note; RELEASE.md tag message; deduped CHANGELOG `[Unreleased]`.
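
The `RequestIdFilter` idea can be sketched with the standard library alone: a `contextvars.ContextVar` holds the current request's ID (bound per request, for example by middleware reading `X-Request-ID`), and a `logging.Filter` copies it onto every record so the formatter can print `%(request_id)s`. The names `request_id_var`, `begin_request`, and `build_logger` are hypothetical illustrations, not the project's `logger.py`.

```python
from __future__ import annotations

import contextvars
import io
import logging
import uuid
from typing import Optional

# Hypothetical: holds the request ID for whichever request the current task is serving.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through a handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True  # never drop records; only annotate them


def begin_request(incoming_header: Optional[str]) -> str:
    """Reuse a client-supplied X-Request-ID or mint a new one, then bind it."""
    request_id = incoming_header or uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id


def build_logger(stream: io.StringIO) -> logging.Logger:
    logger = logging.getLogger("app.demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # idempotent: repeated calls do not stack handlers
        handler = logging.StreamHandler(stream)
        handler.addFilter(RequestIdFilter())
        handler.setFormatter(logging.Formatter("%(levelname)s [%(request_id)s] %(message)s"))
        logger.addHandler(handler)
    return logger
```

Because the filter sits on the handler rather than on the logger, every record the handler emits carries the field, which is what lets a formatter reference it unconditionally.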

## [2026-05-19] — Portfolio features (PRs #13–#17)

### Added
- Phase 5 retrieval — multi-query expansion/fusion, optional Bedrock/client metadata filters, rerank node (condense → multi_query → retrieve → rerank → generate).
- Phase 5 rerank — LangGraph `rerank` node; FlashRank + keyword fallback; `RERANK_*` settings; candidate fetch via `RERANK_CANDIDATE_K`.
- Phase 3 lite — README "Quality & observability" section; LangSmith `chat-session-*` run names; curated golden `ground_truth`; `scripts/promote_golden_draft.py`.
- RAGAS golden bootstrap — `scripts/bootstrap_golden_dataset.py`, `backend/tests/eval/seed_questions.json`; golden set refreshed from the live AWS KB (10 rows).
- OAuth (dev) — API-port OAuth + one-time handoff to Vue (`/oauth/handoff`) fixes GitHub `state_mismatch` across Vite proxy ports.
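
One plausible shape for the one-time handoff is a short-lived, single-use code: the API-port callback stores the authenticated user under a random code and redirects the browser to the Vue app's `/oauth/handoff` route, which redeems the code exactly once. This in-memory sketch assumes a 60-second TTL and a hypothetical `HandoffStore` class; it is not the project's implementation, and a multi-worker deployment would need a shared store.

```python
from __future__ import annotations

import secrets
import time
from typing import Callable, Optional


class HandoffStore:
    """Hypothetical single-use, expiring codes that map to an authenticated user ID."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._codes: dict[str, tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        code = secrets.token_urlsafe(32)  # unguessable, safe to put in a redirect URL
        self._codes[code] = (user_id, self._clock() + self._ttl)
        return code

    def redeem(self, code: str) -> Optional[str]:
        # pop() makes the code single-use even when it turns out to be expired.
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self._clock() < expires_at else None
```

Injecting the clock keeps expiry testable without sleeping; removing the code before checking expiry means a replayed or late code can never succeed.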

### Changed
- Docs — README refresh (highlights, LangGraph/web/eval features, stack table); screenshot gallery under `docs/assets/{product,observability,auth}/`; doc index (`docs/README.md`), ARCHITECTURE.md, WEB_RESEARCH.md; consolidated LangSmith capture in EVALUATION.md; `.gitignore` entries for `.cursor/` and the golden draft.
- Phase 5 retrieval tuning — RRF document fusion, keyword prefilter before rerank; tuned eval profile in `scripts/run_eval_phase5.sh` (faithfulness/recall up vs initial Phase 5; precision still below gate).
- Phase 3 lite — portfolio RAGAS baseline policy; LangSmith trace screenshots in README; Phase 3 roadmap marked done (lite).
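
Reciprocal rank fusion (RRF) merges the ranked lists returned for each expanded query by summing `1 / (k + rank)` per document, so documents that rank well across several query variants rise to the top. The sketch below uses the commonly cited default `k = 60` and plain string document IDs; the project's actual constant and document type are assumptions here.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For the lists `["a", "b", "c"]`, `["b", "a", "d"]`, and `["b", "e"]`, document `b` wins because it appears near the top of all three, even though `a` ranks first in one list. RRF needs only ranks, not comparable scores, which suits fusing results from different query variants.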

### Fixed

## [2026-05-19] — GitHub Actions CI/CD

### Added
- CI/CD — GitHub Actions: `ci.yml` (tox on `main` + PRs), `cd.yml` (Vue build + optional EB deploy on `qa` / `release`); docs/CI.md. Removed `.travis.yml`.

### Fixed
- CI — pin `@rollup/rollup-*` platform packages in `frontend-vue` `optionalDependencies` (fixes the Linux `npm ci` optional-deps bug).
- CI — use `HUSKY=0 npm ci` (not `--ignore-scripts`) so Rollup native bindings install on Linux runners.
- CI — the `frontend-vue` tox env skips `nvm use` when `CI=true` (GHA) or nvm is absent; the CI workflow runs tox sequentially.

### Changed
- CI — `tox -e lint,backend,frontend-vue` green; ruff format/fix, LangGraph import fixes, ChatView `research-mode` binding.
- Docs — roadmap cleanup: PRODUCT_ROADMAP.md is the single index; removed `TODAY_SPRINT.md` and `roadmap/README.md`; campus scale track moved to archive/PHASED_IMPROVEMENT_ROADMAP.md.
- Tests — `conftest` forces `RAG_ENGINE=chain` so API stream tests stay isolated from the developer's `.env`.

## [2026-05-18] — Security dependency bumps

### Changed
- Runtime dependencies — FastAPI 0.115.x (Starlette CVE fixes), `python-multipart>=0.0.27`, `python-jose>=3.4`, `PyJWT>=2.12`, `requests` / `urllib3` / `httpx` upgrades, `gunicorn>=22`, `python-dotenv>=1.2.2`.
- LangGraph pins — exact `langgraph==0.2.76` + `langgraph-checkpoint==2.0.26` (resolves the `httpx` conflict with LangChain 0.3).

### Added
- docs/SECURITY.md — audit commands, production hardening checklist, dependency policy.

## [2026-05-18] — LangGraph live validation (AWS KB parity)

### Added
- LangGraph live path — `run_rag_graph` import fix; live AWS KB smoke validated with `RAG_ENGINE=langgraph` (sources + coherent answers).
- LangGraph SSE — status event while the graph runs in `asyncio.to_thread`; paced token chunks for progressive UI (simulated streaming until a graph-native stream exists).
- Web research (frontend) — `research_mode` on the API; Pinia + `ChatInput` toggle when `VITE_WEB_RESEARCH_ENABLED=true` (server-side `WEB_RESEARCH_ENABLED` also required).
- Docs — sprint checklist updates; LangGraph latency notes in LANGGRAPH.md.

### Changed
- Dependencies — pin `langgraph` 0.2.x + `langgraph-checkpoint` 2.x for LangChain 0.3 compatibility (see the upcoming security branch for broader bumps).
- `scripts/run-backend-venv.sh` — start via `./venv/bin/python -m uvicorn` so reload uses the project venv.

### Fixed
- Answer leakage — stronger `_strip_condensed_question_leakage` (backend + `normalizeAssistantContent`).
- Empty-state prompts — `{{ prompt }}` mustache in `MessageList.vue`.
- SSE burst render — `requestAnimationFrame` between tokens in `chat.ts`.
- Graph unit tests — mock patch fixture and KB-path LLM stub.

## [2026-05-18] — GitHub OAuth, LangGraph scaffold, and chat UI polish

### Added
- GitHub OAuth — `POST /api/auth/oauth/{provider}/start` and callback routes; `OAuthButtons.vue`; Alembic `0003` user OAuth fields; `scripts/verify_oauth.py`.
- `backend/app/core/auth_cookies.py` — shared HTTP-only JWT cookie helpers for login and OAuth.
- RAG streaming — `stream_query_async()` via LangChain `astream_events`; SSE `status` events; condensed-question leakage stripping in `rag.py`.
- LangGraph scaffold — `backend/app/services/graph/` (runner, nodes, state); opt-in `web_search` tool; `RAG_ENGINE=langgraph` config (default remains `chain`).
- Docs — PRODUCTION_TLS.md (HTTPS + OAuth redirects); SPRINT_2026-05-18_LANGGRAPH.md; WEB_RESEARCH.md.
- Vue chat UI — typography scale (`text-chat-*`, `.chat-prose`); wider layout; mobile sidebar overlay; accessible user-message accent; assistant sources stacked below replies; sticky composer.
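
The shared cookie helpers boil down to emitting one consistently configured `Set-Cookie` header for both password login and OAuth. A stdlib-only sketch using `http.cookies.SimpleCookie`; the cookie name `access_token`, the `Lax` SameSite policy, and the function name are assumptions, not values taken from `auth_cookies.py`.

```python
from http.cookies import SimpleCookie


def build_auth_cookie(token: str, max_age_seconds: int, secure: bool = True) -> str:
    """Return a Set-Cookie header value for a JWT that page JavaScript cannot read."""
    cookie = SimpleCookie()
    cookie["access_token"] = token  # hypothetical cookie name
    morsel = cookie["access_token"]
    morsel["httponly"] = True   # hidden from document.cookie (XSS mitigation)
    morsel["secure"] = secure   # HTTPS only; plain-HTTP local dev may need False
    morsel["samesite"] = "Lax"  # still sent on top-level GET navigations, e.g. after an OAuth redirect
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()
```

Centralizing these attributes in one helper means login and OAuth cannot drift apart on flags such as `HttpOnly` or `Secure`.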

### Changed
- Local dev defaults — Vite on `http://127.0.0.1:5173` (`strictPort`); `FRONTEND_URL` / `OAUTH_REDIRECT_BASE_URL` aligned to avoid `MismatchingStateError` (see PRODUCTION_TLS.md).
- Frontend — Pinia chat store appends stream tokens immediately; dedicated `/api/chat/stream` Vite proxy (no buffering).
- Auth API — login/register use the shared cookie helpers; OAuth links or creates users by provider subject.
- Roadmap docs — LangGraph and portfolio roadmap updated for sprint status.
- `requirements.txt` — LangGraph-related dependencies for the graph scaffold.

## [2026-05-18] — Generic tenant-hydrated RAG prompts

### Added
- `backend/app/services/tenant_rag_config.py` — load branding from env + `tenant.rag_config` (JSONB).
- Alembic `0002` — `tenant.rag_config` column.
- `docs/TENANT_CONFIG.md` — config shape and resolution order.
- `samples/berkeley/tenant_rag_config.json` — optional Berkeley RTL sample profile (not the default).

### Changed
- Prompt templates — generic `prompt_prefix.txt` / `few_shot_examples.json` with `{{placeholders}}`.
- Chat + RAG — hydrate prompts per request from the signed-in user's tenant.
- `PROJECT_NAME` / `.env.example` — `ASSISTANT_NAME`, `SUPPORTED_TOPICS`, `OUT_OF_SCOPE_MESSAGE`.
- README — bring-your-own KB + tenant config.
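
Per-request hydration can be sketched as a two-layer merge (env-level defaults, then the tenant's `rag_config` JSON overriding them) followed by `{{placeholder}}` substitution. The precedence, the lowercase key names, the fallback strings, and leaving unknown placeholders untouched are all assumptions in this sketch; `docs/TENANT_CONFIG.md` documents the real resolution order.

```python
from __future__ import annotations

import json
import re
from typing import Any, Mapping, Optional

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_branding(env: Mapping[str, str], tenant_rag_config: Optional[str]) -> dict[str, Any]:
    """Env-level defaults first, then tenant JSON overrides (assumed precedence)."""
    branding: dict[str, Any] = {
        # Hypothetical fallback values, used only when the env var is unset.
        "assistant_name": env.get("ASSISTANT_NAME", "Campus Assistant"),
        "supported_topics": env.get("SUPPORTED_TOPICS", ""),
        "out_of_scope_message": env.get("OUT_OF_SCOPE_MESSAGE", "I can't help with that."),
    }
    if tenant_rag_config:
        branding.update(json.loads(tenant_rag_config))
    return branding


def hydrate(template: str, values: Mapping[str, Any]) -> str:
    """Replace {{name}} with values[name]; unknown placeholders are left as-is."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)
```

Leaving unknown placeholders visible (rather than blanking them) makes a missing tenant key easy to spot in a rendered prompt during review.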

## [2026-05-18] — Performance Phase 0 quick wins

### Added
- `docs/PERFORMANCE.md` — Phase 0 shipped tuning; documentation checklists for Phases 1–3.
- Config: `CHAT_HISTORY_MAX_MESSAGES`, `STREAM_ARTIFICIAL_DELAY_MS`, `SQLALCHEMY_POOL_SIZE`, `SQLALCHEMY_MAX_OVERFLOW` (see `.env.example`).
- Prometheus: `chatbot_chat_first_token_latency_seconds` (SSE time-to-first-token).
- Test: `test_get_session_messages_respects_max_messages`.

### Changed
- Streaming: removed the fixed `time.sleep` on SSE tokens; optional demo delay via `STREAM_ARTIFICIAL_DELAY_MS` (RAG only).
- Chat API: `_load_chat_history()` caps the messages passed to LangChain.
- DB: the SQLAlchemy engine uses the configured pool + `pool_pre_ping`.
- `run_services.sh`: multi-worker uvicorn via `API_WORKERS` / `UVICORN_WORKERS` (default 2).
- `docs/OPERATIONS.md`: SLOs split for auth/session vs live RAG; first-token alert hint.
- `docs/roadmap/PHASED_IMPROVEMENT_ROADMAP.md`: Phase 0 perf shipped note; FlashRank marked Phase 2 / not in `rag.py` yet.
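
The streaming and history changes can be pictured as two small pieces: an async SSE generator that frames each token as a `data:` event and pauses only when a demo delay is configured, and the tail slice a cap like `CHAT_HISTORY_MAX_MESSAGES` implies. The JSON payload shape `{"token": ...}` and the function names are hypothetical; this is a sketch, not the project's `rag.py` or chat API.

```python
from __future__ import annotations

import asyncio
import json
from typing import AsyncIterator, Iterable, Sequence, TypeVar

T = TypeVar("T")


def cap_history(messages: Sequence[T], max_messages: int) -> list[T]:
    """Keep only the newest max_messages entries; assumes max_messages >= 1."""
    if max_messages < 1:
        # Guard: messages[-0:] would silently return the whole history.
        raise ValueError("max_messages must be >= 1")
    return list(messages[-max_messages:])


async def sse_token_stream(tokens: Iterable[str], artificial_delay_ms: int = 0) -> AsyncIterator[str]:
    """Yield SSE frames; sleep between tokens only if a demo delay is configured."""
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"  # blank line ends each SSE event
        if artificial_delay_ms > 0:
            await asyncio.sleep(artificial_delay_ms / 1000)
```

With the delay at its default of 0, tokens are forwarded as fast as the model produces them, which is what makes time-to-first-token worth measuring.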

## [2026-05-18] — Docs cleanup and Campus RAG Assistant rebrand

### Changed
- README: product-first Campus RAG Assistant opening; license/attribution under License.
- GitHub repo renamed to campus-rag-assistant; About description updated.
- changelog/CHANGELOG.md — single session-based log under `changelog/` (other files in the folder are gitignored).
- Trimmed ARCHITECTURE.md; clarified known gaps (buffered chat vs SSE).

### Removed
- `docs/PORTFOLIO.md`, `docs/EXECUTION_PLAN.md`, `docs/DOC_AUDIT.md`, `scripts/new-changelog.sh`.

### Added
- Full session history in changelog/CHANGELOG.md (2025 Berkeley baseline + 2026 fork sessions).

## [2026-05-17] — tox and Vue in CI
Merged as PR #9.

### Added
- tox `frontend-vue` env: `npm ci`, typecheck, ESLint, Vitest (Node 20 via `.nvmrc`).
- requirements.txt: `langchain-openai`, Azure SDKs; `bcrypt>=4.0.1,<4.1.0` for passlib in tox.
- Lazy Azure provider imports in `backend/app/services/providers/__init__.py`.

### Changed
- tox `backend`: `RATE_LIMIT_ENABLED=false`; excludes `slow` (RAGAS) tests by default.
- README Testing: `tox -e lint,backend,frontend-streamlit,frontend-vue`.
- Ruff/pytest marker cleanups.

## [2026-05-17] — Portfolio publish to GitHub

PRs #1–#8 → campus-rag-assistant `main`. These PRs package work from the May 2026 dev sessions into reviewable commits.

### PR #1 — Dev tooling
- `.githooks/pre-commit`, `scripts/install-hooks.sh`, `run-backend-venv.sh`, `run-frontend-vue.sh`, `kill-dev-servers.sh`, load-test helpers.
- `.gitignore` portfolio hygiene.

### PR #2 — Alembic
- `alembic.ini`, `backend/alembic/`, `0001_initial_schema.py`.

### PR #3 — Platform middleware
- `request_context.py`, `metrics.py`, `rate_limit.py`, `dev_routes.py`; wired in `main.py`, `auth.py`, `chat.py`.

### PR #4 — Providers, RAG, eval
- `backend/app/services/providers/` (AWS / Azure / mock); `rag.py` registry wiring.
- `backend/tests/eval/` RAGAS golden harness.

### PR #5 — Vue 3 SPA
- `frontend-vue/` scaffold, API/auth, chat UI, sessions, sources, Vitest + Playwright scaffolding.

### PR #6 — Streamlit client
- `frontend-streamlit/` (auth, chat services, UI components, pytest).

### PR #7 — Load tests
- `load-tests/` k6 smoke + auth-chat-session; user seed script.

### PR #8 — Docs and README
- `docs/` architecture, operations, E2E, evaluation, roadmaps, LangGraph design.
- Portfolio README, mock `.env.example`.

### Post-publish cleanup (PR #7–#8 follow-ups)
- Removed the duplicate `frontend/` tree; `run_services.sh` / tox → `frontend-streamlit/`.
- `main.py`: `create_all` only in dev/test; `requirements.txt`: `alembic`, `redis`.
- Removed root `root-open-k6.js` and the empty root `package-lock.json`.

## [2026-05-01] — RAG platform, Vue, providers (dev session)

Implementation work that landed on `main` via the 2026-05-17 PRs above. Some session notes describe features that were not merged.

### Added — on main
- Vue 3 SPA, Streamlit tree, provider registry, Redis rate limiter, Prometheus metrics, dev routes, RAGAS eval, core scripts.

### Added — session plan only (not on main)
- `POST /api/chat/stream` (SSE); FlashRank `RERANK_*`; extra tox envs (`eval`, `load-smoke`, …).

### Changed / fixed / security
- `rag.py`, `chat.py`, schemas, `.env.example`, requirements, ruff / pytest; password rules; `MessageBubble.vue`; generic chat 500s; EB health proxy.

### Follow-ups
- RAGAS golden Q&A; production `REDIS_URL`; LangGraph / SSE / rerank — docs/roadmap/LANGGRAPH.md.

## [2026-05-01] — Logging and request correlation (dev session)

### Summary
One request ID per HTTP request (`X-Request-ID`); optional JSON logs; quieter auth logs.

### Added
- `request_context.py`, `LOG_JSON`, tests, `kill-dev-servers.sh`, Vue `interceptors.ts` for `X-Request-ID`.

### Changed / removed / security
- `logger.py`, `main.py`, `security.py`, config; removed unused `LOGGING_PROPAGATION_LEVEL`; no JWT dumps at INFO.

## Berkeley ETS Chabot (baseline)

Chabot is a campus RAG chatbot for UC Berkeley ETS, built on AWS Bedrock. Upstream: ets-berkeley-edu/chabot. © The Regents of the University of California; see LICENSE.

sandeep-jay led the implementation (CBO-tracked PRs below). Regents headers remain on derived files in this fork.

## [2025-08-01] — Streamlit UX and frontend tests

### Added
- Streamlit refactor: chat interface, message display, feedback UI, and stylesheets ([CBO-86]).
- Frontend test suite covering auth, chat, and message modules ([CBO-89]).

## [2025-06-13] — Backend and API test suites

### Added
- Pytest suites for the RAG workflow, AWS/Bedrock/LangSmith, auth, and DB/models/services ([CBO-69], [CBO-71], [CBO-72], [CBO-84]).
- Chat API interaction tests: CRUD, feedback, sources, mocks ([CBO-70]).
- `pyproject.toml` tool config; tox/travis alignment ([CBO-72]).

## [2025-06-05] — Streamlit cleanup

### Changed
- Removed the basic Streamlit prototype in favor of the modular refactor ([CBO-99]).

## [2025-05-30] — Chat API and Streamlit auth

### Added
- Chat endpoints: sessions, messages, feedback, `test_langsmith` ([CBO-45]–[CBO-47]).
- Streamlit login, auth module, client services ([CBO-65], [CBO-66], [CBO-80], [CBO-81]).

## [2025-05-29] — JWT auth and advanced RAG

### Added
- JWT authentication module and auth endpoints ([CBO-74], [CBO-75]).
- Advanced RAG with Bedrock integration ([CBO-85]).

## [2025-05-28] — Elastic Beanstalk deploy sketch

### Added
- `.ebextensions` and Nginx config for FastAPI + Streamlit ([CBO-63]).

## [2025-05-13] — CI and README

### Added
- Travis CI linters ([CBO-82]).
- README instructions ([CBO-82]).

### Changed
- `.gitignore` for `.tox` and `.ruff*` ([NOJIRA]).

## [2025-05-12] — Bedrock RAG and first UI

### Added
- AWS, LangChain, Bedrock; simple RAG and `/chat` integration; prompt templates ([CBO-31], [CBO-34], [CBO-36], [CBO-41]).
- Basic Streamlit chat UI + LangSmith tracing ([CBO-36], [CBO-42]).
- ruff and tox ([CBO-67]).

## [2025-05-05] — FastAPI foundation

### Added
- FastAPI boilerplate (`/`, `/health`) ([CBO-30]).
- Pydantic-settings config manager ([CBO-32]).
- Modular logger ([CBO-35]).
- SQLAlchemy + chatbot table design ([CBO-49]).