Skip to content

Changelog

Notable changes to Campus RAG Assistant — independent extension of the UC Berkeley ETS Chabot platform (campus-rag-assistant).

Keep a Changelog format.
Attribution and license: README.

Author & maintainer: sandeep-jay — sole implementation author of the upstream Berkeley ETS Chabot codebase (copyright UC Regents) and author of this independent extension. Distributed under the UC Regents LICENSE; see NOTICE. Not an official UC Berkeley product.

Convention: sections use session dates (when work happened). GitHub PR numbers are noted where the public merge story matters.

Edit [Unreleased] while you work. When a session is done, rename it to ## [YYYY-MM-DD] — short title and open a new [Unreleased].

[Unreleased]

Documentation

  • Reposition README and MkDocs landing page to lead with ownership and architecture; move upstream attribution into a dedicated "Origin and Scope" section.
  • Add docs/REVIEWER_GUIDE.md with 90-second read, senior-signal evidence map, and per-persona review paths; surfaced in MkDocs nav near the top.
  • Replace "Demo readiness" with "Review artifacts" framing in README and docs site.
  • Rewrite docs/PORTFOLIO_CASE_STUDY.md "My role" section so platform ownership leads and upstream context follows.
  • Drop weak phrases ("not a weekend chatbot", "portfolio-grade") near the top of public-facing docs.

Security

  • Frontend dev-tool CVEs remediated — upgraded frontend-vue test/build tooling so npm audit --audit-level=moderate is clean: vite 6.4.2, esbuild 0.25.0, vitest / @vitest/coverage-v8 4.1.7, plus lockfile transitive fixes for ws 8.20.1 and brace-expansion 5.0.6. Verified with npm run typecheck, npm test -- --run (130 tests), npm run build, and npm audit --audit-level=moderate. langgraph / langgraph-checkpoint alerts remain deferred because patched checkpoint/LangGraph combinations require langchain-core 1.x and conflict with the current LangChain 0.3 stack.
  • Vulnerable Python pins bumped to patched releases (closes 7 Dependabot alerts on main):
  • authlib==1.3.21.6.12 — patches 1 CRITICAL JWS injection (GHSA-9ggr-2464-2j32) and 5 HIGH advisories (OIDC bypass, padding oracle, DoS, account takeover).
  • langchain==0.3.200.3.30 — patches HIGH LangSmith deserialize advisory.
  • langchain-community==0.30.3.27 — patches HIGH advisory.
  • streamlit==1.30.01.54.0 — patches 2 MEDIUM Windows-only path-traversal / SSRF advisories. Verified by running the full backend test suite (118 passed, 0 new regressions vs baseline) and restarting the local backend/frontend; OAuth import paths and Streamlit demo all clean. The riskier langgraph / langgraph-checkpoint major-version migration and the vite / esbuild (frontend dev tooling) bumps are tracked separately as follow-up work in the Dependabot alert queue.

Fixed

  • Frontend theme tokens compile correctly (delete-conversation dialog overlap)frontend-vue/src/assets/main.css was using Tailwind v4 (@tailwindcss/vite + @import 'tailwindcss';) without a @theme block, so design-token utilities (bg-background, bg-card, bg-muted, border-border, text-foreground, text-muted-foreground, etc.) were silent no-ops. Surfaces appeared opaque only because body set background-color: hsl(var(--background)); inside the scrolling session list, the sidebar's sticky delete-confirmation panel had no actual background and rendered transparently over the session items. Added an @theme inline mapping for the existing --background, --foreground, --card, --popover, --primary, --secondary, --muted, --accent, --destructive, --border, --input, and --ring CSS variables so the utilities compile. Fixes the delete-conversation dialog overlap and the UserMenu dropdown which used the same pattern.

  • Dependency review PR guard — added dependency review (new high/critical CVEs) to CI using actions/dependency-review-action@v4. It runs on pull requests and fails when a dependency diff introduces a new high/critical advisory. Dependabot alerts remain enabled, while Dependabot security auto-update PRs remain disabled for manual triage.

  • Secret-leak defense in depth (gitleaks layers) — five independent guards now sit between a credential and a public push:
  • .gitignore .env* catch-all plus pattern blocks for keys/credentials/tfvars/secrets dirs;
  • backend/tests/core/test_env_template.py (every Settings field documented; no real-looking SecretStr values in .env.example);
  • local .githooks/pre-push runs gitleaks detect --log-opts="<remote>..HEAD" and fails the push on any finding (scripts/install-hooks.sh wires it locally and, with --global, into ~/.config/git/hooks);
  • new tox -e secrets env and gitleaks (history + diff) job in .github/workflows/ci.yml run gitleaks detect --all --reflog --no-merges on every PR and push;
  • GitHub repo settings now have Secret Scanning, Push Protection, and Dependabot alerts enabled — even git push --no-verify is blocked at GitHub's edge for known-provider patterns. (Dependabot security updates / auto-PR bumps are intentionally left off — they kept breaking the build; vulnerabilities are triaged manually from the alert queue.)
  • Tool attribution guard.githooks/commit-msg delegates to .githooks/tool_attribution_guard.py, stripping AI-tool authorship/attribution lines (Co-authored-by:, Signed-off-by:, Made with <tool> ..., vendor URL footers) from commit messages. scripts/install-hooks.sh --global installs the same guard into ~/.config/git/hooks/ so every repo on the workstation gets the same protection.
  • Secret hardening (Pydantic SecretStr)SECRET_KEY, AWS_SECRET_ACCESS_KEY, AZURE_OPENAI_API_KEY, AZURE_SEARCH_KEY, OAUTH_GOOGLE_CLIENT_SECRET, OAUTH_GITHUB_CLIENT_SECRET, LANGCHAIN_API_KEY, and TAVILY_API_KEY in backend/app/config/default.py are now typed SecretStr, so they redact in repr, logs, exceptions, and model_dump_json. Cleartext is read only at the boundary (JWT codec, OAuth client, Azure SDK, Tavily, LangSmith). simple_tracer.py no longer logs the LangSmith key in env-var dumps.
  • Pydantic settings tighteningDefaultSettings.model_config switched from extra='allow' to extra='ignore' with explicit SettingsConfigDict (case-sensitive, UTF-8, optional secrets_dir for Docker/K8s secret mounts). Previously-undeclared AZURE_SEARCH_VECTOR_FIELD now declared (text_vector default) and documented in .env.example.
  • CI guard for env templatebackend/tests/core/test_env_template.py asserts every DefaultSettings field is documented in .env.example (per-field for SecretStr fields) and that no SecretStr field carries a non-placeholder uncommented value.
  • Secrets management docdocs/SECURITY.md now describes the load precedence, the list of SecretStr fields, a production checklist (secret stores, instance roles, rotation), and the leak-response runbook.

Changed

  • Environment identifier consolidated — dropped the legacy ENVIRONMENT field from DefaultSettings/DevelopmentSettings/TestSettings; APP_ENV is now the only source of truth. Updated backend/app/main.py, backend/app/api/oauth_routes.py, backend/verify_configs.py, frontend-streamlit/app/{config,main}.py, scripts/verify_oauth.py, tox.ini (removed three redundant ENVIRONMENT = test lines), .env.example, and docs/ARCHITECTURE.md.
  • GitHub Pages documentation site — added MkDocs Material scaffold, docs CI build/deploy workflow, tox -e docs, a Pages landing page, and README/documentation positioning polish around the independent extension framing.
  • .env.example rewritten into 16 numbered sections with explicit [REQUIRED] / [REQUIRED IF <cond>] / [OPTIONAL — default: <v>] labels on every entry; duplicates removed and previously-undocumented fields added.
  • .env.test reorganized to mirror the same section numbering as .env.example; APP_ENV=test set explicitly.

[2026-05-19] — Portfolio polish (PR #24)

Changed

  • Docs (portfolio polish) — README restructured for progressive disclosure (pitch, role alignment, highlights, upstream delta, quality baseline); added PORTFOLIO_CASE_STUDY.md, docs/adr/ (4 ADRs), PRODUCTION_HARDENING.md; reframed EVALUATION.md baseline paragraph; updated docs/README.md index.
  • README — Overview without <details> collapsibles (full architecture, design, screenshots, LangSmith traces visible); CI status badge.
  • Eval — expanded eval_baseline_2026-05-19.md (retained AWS scores, Azure sweep table, findings); eval respects LangGraph when RAGAS_EVAL=1; Phase 5 script precision-balanced profile; RAGAS_DO_NOT_TRACK in tox eval.
  • LoggingRequestIdFilter on handlers; redact JWT payloads and chat queries at INFO; cap vendor loggers; optional LOG_JSON; access log via app.access; single idempotent initialize_logger().
  • Docs — documentation cohesion: DESIGN.md (product boundaries, decisions); README problem/quality sections; rename PRODUCT_ROADMAP.md; de-portfolio language; generic campus/Canvas LMS framing; document Bedrock KB + OpenSearch Serverless alongside Azure Search; expanded docs/README.md index; ARCHITECTURE.md LangGraph/OAuth/research_mode; WEB_RESEARCH.md KB path diagram; CI.md portfolio quick tox note; E2E.md OAuth note; RELEASE.md tag message; deduped CHANGELOG [Unreleased].

[2026-05-19] — Portfolio features (PRs #13–#17)

Added

  • Phase 5 retrieval — multi-query expansion/fusion, optional Bedrock/client metadata filters, rerank node (condense → multi_query → retrieve → rerank → generate).
  • Phase 5 rerank — LangGraph rerank node; FlashRank + keyword fallback; RERANK_* settings; candidate fetch via RERANK_CANDIDATE_K.
  • Phase 3 lite — README Quality & observability; LangSmith chat-session-* run names; curated golden ground_truth; scripts/promote_golden_draft.py.
  • RAGAS golden bootstrapscripts/bootstrap_golden_dataset.py, backend/tests/eval/seed_questions.json; golden set refreshed from live AWS KB (10 rows).
  • OAuth (dev) — API-port OAuth + one-time handoff to Vue (/oauth/handoff) fixes GitHub state_mismatch across Vite proxy ports.

Changed

  • Docs — README refresh (highlights, LangGraph/web/eval features, stack table); screenshot gallery under docs/assets/{product,observability,auth}/; doc index (docs/README.md), ARCHITECTURE.md, WEB_RESEARCH.md; consolidated LangSmith capture in EVALUATION.md; .gitignore for .cursor/ and golden draft.
  • Phase 5 retrieval tuning — RRF document fusion, keyword prefilter before rerank; tuned eval profile in scripts/run_eval_phase5.sh (faithfulness/recall up vs initial Phase 5; precision still below gate).
  • Phase 3 lite — portfolio RAGAS baseline policy; LangSmith trace screenshots in README; Phase 3 roadmap marked done (lite).

Fixed


[2026-05-19] — GitHub Actions CI/CD

Added

  • CI/CD — GitHub Actions: ci.yml (tox on main + PRs), cd.yml (Vue build + optional EB deploy on qa/release); docs/CI.md. Removed .travis.yml.

Fixed

  • CI — pin @rollup/rollup-* platform packages in frontend-vue optionalDependencies (fixes Linux npm ci optional-deps bug).
  • CI — use HUSKY=0 npm ci (not --ignore-scripts) so Rollup native bindings install on Linux runners.
  • CIfrontend-vue tox env skips nvm use when CI=true (GHA) or nvm is absent; CI workflow runs tox sequentially.

Changed

  • CItox -e lint,backend,frontend-vue green; ruff format/fix, LangGraph import fixes, ChatView research-mode binding.
  • Docs — roadmap cleanup: PRODUCT_ROADMAP.md is the single index; removed TODAY_SPRINT.md and roadmap/README.md; campus scale track moved to archive/PHASED_IMPROVEMENT_ROADMAP.md.
  • Testsconftest forces RAG_ENGINE=chain so API stream tests stay isolated from developer .env.

[2026-05-18] — Security dependency bumps

Changed

  • Runtime dependencies — FastAPI 0.115.x (Starlette CVE fixes), python-multipart>=0.0.27, python-jose>=3.4, PyJWT>=2.12, requests/urllib3/httpx upgrades, gunicorn>=22, python-dotenv>=1.2.2.
  • LangGraph pins — exact langgraph==0.2.76 + langgraph-checkpoint==2.0.26 (resolves httpx conflict with LangChain 0.3).

Added

  • docs/SECURITY.md — audit commands, production hardening checklist, dependency policy.

[2026-05-18] — LangGraph live validation (AWS KB parity)

Added

  • LangGraph live pathrun_rag_graph import fix; live AWS KB smoke validated with RAG_ENGINE=langgraph (sources + coherent answers).
  • LangGraph SSE — status event while graph runs in asyncio.to_thread; paced token chunks for progressive UI (simulated streaming until graph-native stream).
  • Web research (frontend)research_mode on API; Pinia + ChatInput toggle when VITE_WEB_RESEARCH_ENABLED=true (server WEB_RESEARCH_ENABLED required).
  • Docs — sprint checklist updates; LangGraph latency notes in LANGGRAPH.md.

Changed

  • Dependencies — pin langgraph 0.2.x + langgraph-checkpoint 2.x for LangChain 0.3 compatibility (see upcoming security branch for broader bumps).
  • scripts/run-backend-venv.sh — start via ./venv/bin/python -m uvicorn so reload uses project venv.

Fixed

  • Answer leakage — stronger _strip_condensed_question_leakage (backend + normalizeAssistantContent).
  • Empty-state prompts{{ prompt }} mustache in MessageList.vue.
  • SSE burst renderrequestAnimationFrame between tokens in chat.ts.
  • Graph unit tests — mock patch fixture and KB-path LLM stub.

[2026-05-18] — GitHub OAuth, LangGraph scaffold, and chat UI polish

Added

  • GitHub OAuthPOST /api/auth/oauth/{provider}/start and callback routes; OAuthButtons.vue; Alembic 0003 user OAuth fields; scripts/verify_oauth.py.
  • backend/app/core/auth_cookies.py — shared HTTP-only JWT cookie helpers for login and OAuth.
  • RAG streamingstream_query_async() via LangChain astream_events; SSE status events; condensed-question leakage stripping in rag.py.
  • LangGraph scaffoldbackend/app/services/graph/ (runner, nodes, state); opt-in web_search tool; RAG_ENGINE=langgraph config (default remains chain).
  • DocsPRODUCTION_TLS.md (HTTPS + OAuth redirects); SPRINT_2026-05-18_LANGGRAPH.md; WEB_RESEARCH.md.
  • Vue chat UI — typography scale (text-chat-*, .chat-prose); wider layout; mobile sidebar overlay; accessible user message accent; assistant sources stacked below replies; sticky composer.

Changed

  • Local dev defaults — Vite on http://127.0.0.1:5173 (strictPort); FRONTEND_URL / OAUTH_REDIRECT_BASE_URL aligned to avoid MismatchingStateError (see PRODUCTION_TLS.md).
  • Frontend — Pinia chat store appends stream tokens immediately; dedicated /api/chat/stream Vite proxy (no buffering).
  • Auth API — login/register use shared cookie helpers; OAuth links or creates users by provider subject.
  • Roadmap docs — LangGraph and portfolio roadmap updated for sprint status.
  • requirements.txt — LangGraph-related dependencies for graph scaffold.

[2026-05-18] — Generic tenant-hydrated RAG prompts

Added

  • backend/app/services/tenant_rag_config.py — load branding from env + tenant.rag_config (JSONB).
  • Alembic 0002tenant.rag_config column.
  • docs/TENANT_CONFIG.md — config shape and resolution order.
  • samples/berkeley/tenant_rag_config.json — optional Berkeley RTL sample profile (not default).

Changed

  • Prompt templates — generic prompt_prefix.txt / few_shot_examples.json with {{placeholders}}.
  • Chat + RAG — hydrate prompts per request from the signed-in user's tenant.
  • PROJECT_NAME / .env.exampleASSISTANT_NAME, SUPPORTED_TOPICS, OUT_OF_SCOPE_MESSAGE.
  • README — bring-your-own KB + tenant config.

[2026-05-18] — Performance Phase 0 quick wins

Added

  • docs/PERFORMANCE.md — Phase 0 shipped tuning; documentation checklists for Phase 1–3.
  • Config: CHAT_HISTORY_MAX_MESSAGES, STREAM_ARTIFICIAL_DELAY_MS, SQLALCHEMY_POOL_SIZE, SQLALCHEMY_MAX_OVERFLOW (see .env.example).
  • Prometheus: chatbot_chat_first_token_latency_seconds (SSE time-to-first-token).
  • Test: test_get_session_messages_respects_max_messages.

Changed

  • Streaming: removed fixed time.sleep on SSE tokens; optional demo delay via STREAM_ARTIFICIAL_DELAY_MS in RAG only.
  • Chat API: _load_chat_history() caps messages passed to LangChain.
  • DB: SQLAlchemy engine uses configured pool + pool_pre_ping.
  • run_services.sh: multi-worker uvicorn via API_WORKERS / UVICORN_WORKERS (default 2).
  • docs/OPERATIONS.md: SLOs split for auth/session vs live RAG; first-token alert hint.
  • docs/roadmap/PHASED_IMPROVEMENT_ROADMAP.md: Phase 0 perf shipped note; FlashRank marked Phase 2 / not in rag.py yet.

[2026-05-18] — Docs cleanup and Campus RAG Assistant rebrand

Changed

  • README: product-first Campus RAG Assistant opening; license/attribution under License.
  • GitHub repo renamed to campus-rag-assistant; About description updated.
  • changelog/CHANGELOG.md — single session-based log under changelog/ (other files in folder gitignored).
  • Trimmed ARCHITECTURE.md; clarified known gaps (buffered chat vs SSE).

Removed

  • docs/PORTFOLIO.md, docs/EXECUTION_PLAN.md, docs/DOC_AUDIT.md, scripts/new-changelog.sh.

Added

  • Full session history in changelog/CHANGELOG.md (2025 Berkeley baseline + 2026 fork sessions).

[2026-05-17] — tox and Vue in CI

Merged as PR #9.

Added

  • tox frontend-vue env: npm ci, typecheck, ESLint, Vitest (Node 20 via .nvmrc).
  • requirements.txt: langchain-openai, Azure SDKs; bcrypt>=4.0.1,<4.1.0 for passlib in tox.
  • Lazy Azure provider imports in backend/app/services/providers/__init__.py.

Changed

  • tox backend: RATE_LIMIT_ENABLED=false, exclude slow (RAGAS) by default.
  • README Testing: tox -e lint,backend,frontend-streamlit,frontend-vue.
  • Ruff/pytest marker cleanups.

[2026-05-17] — Portfolio publish to GitHub

PRs #1–#8 → campus-rag-assistant main. Packages work from May 2026 dev sessions into reviewable commits.

PR #1 — Dev tooling

  • .githooks/pre-commit, scripts/install-hooks.sh, run-backend-venv.sh, run-frontend-vue.sh, kill-dev-servers.sh, load-test helpers.
  • .gitignore portfolio hygiene.

PR #2 — Alembic

  • alembic.ini, backend/alembic/, 0001_initial_schema.py.

PR #3 — Platform middleware

  • request_context.py, metrics.py, rate_limit.py, dev_routes.py; wired in main.py, auth.py, chat.py.

PR #4 — Providers, RAG, eval

  • backend/app/services/providers/ (AWS / Azure / mock); rag.py registry wiring.
  • backend/tests/eval/ RAGAS golden harness.

PR #5 — Vue 3 SPA

  • frontend-vue/ scaffold, API/auth, chat UI, sessions, sources, Vitest + Playwright scaffolding.

PR #6 — Streamlit client

  • frontend-streamlit/ (auth, chat services, UI components, pytest).

PR #7 — Load tests

  • load-tests/ k6 smoke + auth-chat-session; user seed script.

PR #8 — Docs and README

  • docs/ architecture, operations, E2E, evaluation, roadmaps, LangGraph design.
  • Portfolio README, mock .env.example.

Post-publish cleanup (PR #7–#8 follow-ups)

  • Removed duplicate frontend/ tree; run_services.sh / toxfrontend-streamlit/.
  • main.py: create_all only in dev/test; requirements.txt: alembic, redis.
  • Removed root root-open-k6.js, empty root package-lock.json.

[2026-05-01] — RAG platform, Vue, providers (dev session)

Implementation work; landed on main via 2026-05-17 PRs above. Some session notes describe features not merged.

Added — on main

  • Vue 3 SPA, Streamlit tree, provider registry, Redis rate limiter, Prometheus metrics, dev routes, RAGAS eval, core scripts.

Added — session plan only (not on main)

  • POST /api/chat/stream (SSE); FlashRank RERANK_*; extra tox envs (eval, load-smoke, …).

Changed / fixed / security

  • rag.py, chat.py, schemas, .env.example, requirements, ruff / pytest; password rules; MessageBubble.vue; generic chat 500s; EB health proxy.

Follow-ups


[2026-05-01] — Logging and request correlation (dev session)

Summary

One request id per HTTP request (X-Request-ID); optional JSON logs; quieter auth logs.

Added

  • request_context.py, LOG_JSON, tests, kill-dev-servers.sh, Vue interceptors.ts for X-Request-ID.

Changed / removed / security

  • logger.py, main.py, security.py, config; removed unused LOGGING_PROPAGATION_LEVEL; no JWT dumps at INFO.

Berkeley ETS Chabot (baseline)

Chabot — campus RAG chatbot for UC Berkeley ETS over AWS Bedrock.
Upstream: ets-berkeley-edu/chabot.
© The Regents of the University of California — LICENSE.

sandeep-jay led implementation (CBO-tracked PRs below). Regents headers remain on derived files in this fork.


[2025-08-01] — Streamlit UX and frontend tests

Added

  • Streamlit refactor: chat interface, message display, feedback UI and stylesheets ([CBO-86]).
  • Frontend test suite covering auth, chat, and message modules ([CBO-89]).

[2025-06-13] — Backend and API test suites

Added

  • Pytest for RAG workflow, AWS/Bedrock/LangSmith, auth, DB/models/services ([CBO-69], [CBO-71], [CBO-72], [CBO-84]).
  • Chat API interaction tests: CRUD, feedback, sources, mocks ([CBO-70]).
  • pyproject.toml tool config; tox/travis alignment ([CBO-72]).

[2025-06-05] — Streamlit cleanup

Changed

  • Removed basic Streamlit prototype in favor of modular refactor ([CBO-99]).

[2025-05-30] — Chat API and Streamlit auth

Added

  • Chat endpoints: sessions, messages, feedback, test_langsmith ([CBO-45]–[CBO-47]).
  • Streamlit login, auth module, client services ([CBO-65], [CBO-66], [CBO-80], [CBO-81]).

[2025-05-29] — JWT auth and advanced RAG

Added

  • JWT authentication module and auth endpoints ([CBO-74], [CBO-75]).
  • Advanced RAG with Bedrock integration ([CBO-85]).

[2025-05-28] — Elastic Beanstalk deploy sketch

Added

  • .ebextensions and Nginx config for FastAPI + Streamlit ([CBO-63]).

[2025-05-12] — Bedrock RAG and first UI

Added

  • AWS, LangChain, Bedrock; simple RAG and /chat integration; prompt templates ([CBO-31], [CBO-34], [CBO-36], [CBO-41]).
  • Basic Streamlit chat UI + LangSmith tracing ([CBO-36], [CBO-42]).
  • ruff and tox ([CBO-67]).

[2025-05-05] — FastAPI foundation

Added

  • FastAPI boilerplate (/, /health) ([CBO-30]).
  • Pydantic-settings config manager ([CBO-32]).
  • Modular logger ([CBO-35]).
  • SQLAlchemy + chatbot table design ([CBO-49]).

[2025-05-13] — CI and README

Added

  • Travis CI linters ([CBO-82]).
  • README instructions ([CBO-82]).

Changed

  • .gitignore for .tox and .ruff* ([NOJIRA]).