System overview¶

Scribe IQ is a governed clinical documentation AI prototype that turns an offline synthetic clinical corpus into a clinical product surface with auditability built in from the schema up: Postgres/pgvector serving, FastAPI APIs, provider-agnostic LLM workflows, a Next.js clinical UI, and Responsible AI review. LLM and embedding providers are configurable per deployment: the demo defaults to Groq for completions and OpenAI for embeddings, but the same code paths run against Azure OpenAI or Amazon Bedrock for institutional postures. AI interaction audit is modeled as first-class Postgres data in ai_interactions; admin routes and UI only control whether those rows are exposed for inspection.

This document is the architecture story: diagrams, capability flags, extension seams. For rationale and alternatives considered, see DESIGN_NOTES.md. For exact routes, schema, and flags, see docs/architecture/IMPLEMENTED_BASELINE.md. For the provider configuration matrix, see docs/guides/LLM_AND_EMBEDDING_PROVIDERS.md.

Runtime flow¶

flowchart TD
    Browser["Browser (Next.js)"]
    API["FastAPI (asyncpg pool)"]
    DB["Postgres 16 + pgvector"]
    LLM["LLM provider<br/>(Groq / Azure OpenAI / Bedrock)"]
    EMB["Embedding provider<br/>(OpenAI / Azure OpenAI / Bedrock)"]
    AuditTbl["ai_interactions append"]

    Browser -->|"REST + X-Request-ID"| API
    API --> DB
    API -->|"chat / note gen / meeting prep"| LLM
    API -->|"embed query / note"| EMB
    API -->|"append-only audit"| AuditTbl
    AuditTbl --> DB

The request path is intentionally short: the browser calls FastAPI, FastAPI queries Postgres directly, and only the AI-touching routes (chat, note generation, meeting prep) reach external LLM/embedding services. Every request carries an X-Request-ID that propagates through structured logs, making user-visible actions traceable end-to-end without logging PHI in bodies.

Corpus lifecycle¶

The corpus is built offline, not on demand. The application reads a stable, validated artifact; the data pipeline is a separate concern that can be re-run independently.

flowchart LR
    Synthea["Synthea JAR (seed 42)"]
    NotePool["Note pool (ACI-Bench + MTSamples + MedSynth)"]
    Pipeline["data_prep scripts 01-09"]
    JSONL["clinical_corpus_v2/ JSONL"]
    Loader["scribe-load-corpus"]
    Tables["patients + notes + embeddings (Postgres)"]

    Synthea --> Pipeline
    NotePool --> Pipeline
    Pipeline --> JSONL
    JSONL --> Loader
    Loader --> Tables

Synthea produces a deterministic synthetic patient population. The note pool contributes realistic clinical narrative from public datasets. The nine-step data_prep pipeline matches notes to synthetic patients, scores quality, selects a cohort, adapts notes via the configured LLM provider for consistency, and emits validated JSONL with a dataset card and audit report. The backend loader (scribe-load-corpus) upserts that JSONL into Postgres; with --embed, it generates embeddings via the configured embedding provider (OpenAI, Azure OpenAI, or Amazon Bedrock — see docs/guides/LLM_AND_EMBEDDING_PROVIDERS.md) into the notes.embedding vector column.

For pipeline detail, see data_prep/README.md and docs/reference/corpus_offline_pipeline_v2_brief.md.

Capability flags¶

Health (GET /health) reports which capabilities are configured; the UI surfaces degraded states instead of failing silently.

Flag	What it unlocks	Default
`LLM_PROVIDER`	Selects LLM provider (`groq`, `azure_openai`, `bedrock`) for chat, pre-meeting summaries, and note generation	`groq`
`GROQ_API_KEY`	Groq completions when `LLM_PROVIDER=groq`	unset
`AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` + `AZURE_OPENAI_CHAT_DEPLOYMENT`	Azure OpenAI completions when `LLM_PROVIDER=azure_openai`	unset
`AWS_REGION` + `AWS_BEDROCK_CHAT_MODEL_ID` (+ AWS credentials or role/profile)	Bedrock completions when `LLM_PROVIDER=bedrock`	unset
`EMBEDDING_PROVIDER` + provider credentials + `scribe-load-corpus --embed`	RAG chat over note embeddings; chat returns 503 until embeddings exist	`openai` (unset credentials)
`NOTE_GENERATION_ENABLED`	`POST /notes/generate` accepts writes	`false`
`MEETING_PREP_ENABLED`	`GET /patients/{id}/meeting-prep` produces LLM summaries	`true`
`RESPONSIBLE_AI_ADMIN_ENABLED` (backend)	`/admin/responsible-ai/*` admin routes mounted	`false`
`NEXT_PUBLIC_SCRIBE_ADMIN_UI` (frontend)	Responsible AI Control Center nav and pages	`false`
`BACKEND_API_KEY`	API key gate on all non-public routes	unset
`CORS_RELAX_LOCAL`	Local/LAN demo CORS regex	`false`

Provider boundary¶

The LLM and embedding providers are the only external network boundaries on the AI request path. Everything else — patient data, notes, encounter context, audit rows — stays inside the Postgres instance for the duration of a request.

What leaves the deployment: the selected prompt context (system instructions, retrieved note excerpts, user messages, transcripts) is sent to the configured LLM provider; the query and note text required for embeddings is sent to the configured embedding provider.
What does not leave: identifiers, raw chart rows, audit table contents, structured codes, request bodies. The provider sees only what the route handler explicitly serializes into the prompt or embedding payload.
Per-deployment, not per-request: LLM_PROVIDER and EMBEDDING_PROVIDER are settings, not knobs that change at runtime. Switching providers in production would be a deployment-level decision with an embedding rebuild step where applicable.

For the full provider matrix, environment variables, and rebuild workflow, see docs/guides/LLM_AND_EMBEDDING_PROVIDERS.md. For PHI posture and the explicit caveat that enterprise providers do not by themselves create compliance, see PRIVACY_AND_PROVIDER_BOUNDARIES.md.

Architecture themes (pointers)¶

These themes are intentional; alternatives considered, depth, and production deltas live in DESIGN_NOTES.md.

Colocation: vectors live in Postgres with relational rows; corpus is produced offline and loaded, not generated per request.
Grounding: chat is retrieval-first with a citation-shaped prompt contract ([note:uuid]), not a post-hoc verifier loop.
Governance: ai_interactions is a first-class table for completed and handled degraded AI paths; admin UI is optional exposure.

Extension points¶

Extension	Where it plugs in
Alternative LLM provider	`app/llm/` — `Settings.llm_provider`
Alternative embedding provider	`app/embeddings/` — `Settings.embedding_provider` (`openai` / `azure_openai` / `bedrock` / `none`)
Agentic tool loop for chat	`app/api/chat.py` — single-shot today; tool calls can extend without changing audit shape
Production authentication	`OptionalApiKeyMiddleware` — replace with SSO/RBAC at the middleware layer
Audio transcription	`docs/roadmap/SCRIBE_IQ_UI_ROADMAP.md` §12 — `POST /transcribe` before `POST /notes/generate`
Multi-tenant isolation	`domain` on `patients` / `notes` — row-level or pool work, not a new data model

Repository anchors¶

Concern	Location
Local Postgres + pgvector	`docker-compose.yml` (host port 5433)
Backend	`backend/`
Frontend	`frontend/`
Corpus pipeline	`data_prep/`
Generated corpus artifact	`data/clinical_corpus_v2/`
As-built API and schema	`docs/architecture/IMPLEMENTED_BASELINE.md`
Run instructions	`docs/guides/QUICKSTART.md`
Provider configuration	`docs/guides/LLM_AND_EMBEDDING_PROVIDERS.md`
Product framing	`docs/overview/PRODUCT_CONTEXT.md`
Design rationale	`docs/overview/DESIGN_NOTES.md`