Scribe IQ


Governed clinical documentation AI prototype: synthetic longitudinal records, clinical-note-grounded RAG, chart review, structured note generation, and Responsible AI auditability, with provider-agnostic LLM and embedding layers and explicit production boundaries.

Clinical AI is only useful if it is grounded in the patient record, clear about its limits, and auditable when it influences human work. Scribe IQ is built around that premise.

Built on synthetic data. Not for clinical decision-making.

Why this repo exists

Scribe IQ demonstrates how governed institutional data platform patterns translate into healthcare-shaped AI workflows. Built as an architecture review artifact for academic health, research, and education technology environments, it surfaces:

longitudinal records and sensitive notes as the core data product
offline corpus construction separated from runtime serving
retrieval-grounded AI workflows with visible source context
provider boundary awareness for LLMs and embeddings
audit-first AI design with redacted previews and prompt/model traceability
explicit production deltas for PHI, SSO/RBAC, tenancy, BAA-backed deployment, and observability

For role-fit interpretation across healthcare, university, research, and education-IT architecture reviews, see Target role alignment.
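The audit-first pattern listed above can be pictured as a record that pairs prompt/model traceability with redacted previews. The sketch below is a minimal Python illustration, not Scribe IQ's actual `ai_interactions` schema: the field names, the regex-based redaction, and the helpers `redact_preview` and `build_audit_record` are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical redaction rules. Regexes like these are NOT adequate
# de-identification; they only illustrate where a redaction step sits.
_REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
]


def redact_preview(text: str, max_chars: int = 200) -> str:
    """Mask obvious identifiers, then truncate to a short preview."""
    for pattern, token in _REDACTIONS:
        text = pattern.sub(token, text)
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."


@dataclass(frozen=True)  # frozen: a written record cannot be mutated in memory
class AIInteraction:
    """One audit row; illustrative fields, not the real table definition."""
    created_at: str
    feature: str
    provider: str
    model: str
    prompt_version: str
    prompt_sha256: str
    prompt_preview: str
    response_preview: str


def build_audit_record(feature: str, provider: str, model: str,
                       prompt_version: str, prompt: str, response: str) -> AIInteraction:
    return AIInteraction(
        created_at=datetime.now(timezone.utc).isoformat(),
        feature=feature,
        provider=provider,
        model=model,
        prompt_version=prompt_version,
        # Hash the full prompt so it can be matched later without storing it verbatim.
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        prompt_preview=redact_preview(prompt),
        response_preview=redact_preview(response),
    )
```

Keeping a hash of the full prompt rather than its text lets a reviewer confirm which prompt produced an output without the preview columns holding raw note content. At the database level, one common way to make such a table append-only is to grant the application role `INSERT` and `SELECT` but not `UPDATE` or `DELETE`.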

What this shows

Layer	What is demonstrated
Corpus / data product	Nine-step offline `data_prep/` pipeline over Synthea and public clinical note sources; generated corpus artifact with manifest, dataset card, validation checks, and audit report
Serving substrate	FastAPI, Postgres/pgvector, Alembic migrations, async database access, and one governed store for patient rows, notes, embeddings, and audit records
Clinical workflows	Patient chart, encounter viewer, care timeline, pre-meeting prep, structured note generation, and grounded RAG chat
Responsible AI	Citation contract, append-only `ai_interactions`, redacted previews, prompt/model traceability, source traces, and Responsible AI Control Center
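A citation contract can be read as a rule that a grounded answer must cite at least one source, and only sources that were actually retrieved for that request. A minimal sketch, assuming a hypothetical inline marker of the form `[note:<id>]`; the real citation format and the names `check_citations` and `CitationCheck` are not taken from the repo.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical citation marker, e.g. "... started lisinopril [note:enc-42]."
CITATION = re.compile(r"\[note:([A-Za-z0-9_-]+)\]")


@dataclass
class CitationCheck:
    cited: set[str]
    unknown: set[str]  # cited IDs that were never retrieved for this request
    ok: bool


def check_citations(answer: str, retrieved_ids: set[str]) -> CitationCheck:
    """Pass only if the answer cites something and every citation was retrieved."""
    cited = set(CITATION.findall(answer))
    unknown = cited - retrieved_ids
    return CitationCheck(cited=cited, unknown=unknown, ok=bool(cited) and not unknown)
```

A failed check could be shown in the UI or recorded alongside the audit row rather than silently dropped, which keeps ungrounded output visible to the reviewer.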

Corpus

Scribe IQ runs on a synthetic longitudinal patient and encounter dataset, containing no real PHI, assembled from the following open sources:

Synthea — synthetic patient spine: demographics, encounters, conditions, medications, observations, and longitudinal structure.
MTSamples — public outpatient-style clinical note examples.
MedSynth — synthetic SOAP-style clinical notes and dialogue/note pairs.
ACI-Bench — encounter dialogue examples used in showcase workflows.

data_prep/ matches public note examples to Synthea encounters, scores candidate fit, adapts notes for patient-level consistency, validates outputs, and emits clinical_corpus_v2/ with a manifest, dataset card, and audit report.
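The README does not say how `data_prep/` scores candidate fit. One simple possibility, shown here purely as a hypothetical heuristic and not as the pipeline's actual method, is term coverage: the fraction of an encounter's reason, condition, and medication terms that also appear in a candidate note. `Encounter`, `fit_score`, `best_match`, and the `min_score` threshold are illustrative names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

_TOKEN = re.compile(r"[a-z][a-z0-9]+")  # lowercase word tokens, 2+ characters


def terms(text: str) -> set[str]:
    return set(_TOKEN.findall(text.lower()))


@dataclass
class Encounter:
    encounter_id: str
    reason: str
    conditions: list[str]
    medications: list[str]


def fit_score(note_text: str, enc: Encounter) -> float:
    """Share of the encounter's clinical terms that the note mentions (0.0 to 1.0)."""
    enc_terms = terms(" ".join([enc.reason, *enc.conditions, *enc.medications]))
    if not enc_terms:
        return 0.0
    return len(enc_terms & terms(note_text)) / len(enc_terms)


def best_match(note_text: str, encounters: list[Encounter],
               min_score: float = 0.3) -> tuple[str, float] | None:
    """Return (encounter_id, score) for the best candidate, or None if nothing fits."""
    scored = sorted(((fit_score(note_text, e), e.encounter_id) for e in encounters), reverse=True)
    if scored and scored[0][0] >= min_score:
        return scored[0][1], scored[0][0]
    return None
```

Rejecting notes below a threshold, rather than always assigning the closest encounter, is what keeps a mismatched note from being attached to a patient it contradicts.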

Synthetic data only. No real PHI is used. The system is for demonstration and architecture review, not clinical decision-making.

Architecture

data_prep/ → clinical_corpus_v2/ artifact → scribe-load-corpus → Postgres/pgvector → FastAPI → Next.js → ai_interactions audit table.

flowchart TB
  subgraph Offline["Offline corpus pipeline"]
    Raw["Raw synthetic + public sources"] --> Staging["data_prep staging"]
    Staging --> Artifact["clinical_corpus_v2 artifact"]
    Artifact --> Loader["scribe-load-corpus"]
  end

  subgraph Runtime["Runtime app"]
    Next["Next.js UI"] --> API["FastAPI"]
    API --> LLM["LLM provider<br/>Groq · Azure OpenAI · Bedrock"]
    API --> Embed["Embedding provider<br/>OpenAI · Azure OpenAI · Bedrock"]
    API --> Audit["ai_interactions<br/>audit table"]
    Audit --> Admin["Responsible AI Control Center"]
  end

  Loader --> PG[("Postgres + pgvector")]
  API --> PG
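The diagram routes API calls to interchangeable LLM and embedding providers. One way to express that boundary in Python is a `typing.Protocol` plus a small registry keyed by configuration. This is a sketch under assumptions: `LLMProvider`, `EmbeddingProvider`, `register_llm`, `get_llm`, and the offline `EchoLLM` stand-in are hypothetical and not the repo's actual interfaces.

```python
from __future__ import annotations

from typing import Callable, Protocol


class LLMProvider(Protocol):
    name: str

    def complete(self, system: str, user: str) -> str: ...


class EmbeddingProvider(Protocol):
    name: str

    def embed(self, texts: list[str]) -> list[list[float]]: ...


# Hypothetical registry keyed by a config value such as "groq" or "bedrock".
_LLM_FACTORIES: dict[str, Callable[[], LLMProvider]] = {}


def register_llm(key: str):
    def deco(factory):
        _LLM_FACTORIES[key] = factory
        return factory
    return deco


def get_llm(key: str | None) -> LLMProvider | None:
    """Return None when unconfigured so callers can degrade instead of failing."""
    if not key:
        return None
    factory = _LLM_FACTORIES.get(key)
    if factory is None:
        raise ValueError(f"unknown LLM provider: {key!r}")
    return factory()


@register_llm("echo")
class EchoLLM:
    """Offline stand-in for tests; real adapters would wrap the Groq, Azure OpenAI, or Bedrock SDKs."""
    name = "echo"

    def complete(self, system: str, user: str) -> str:
        return f"[{self.name}] {user}"
```

Because workflow code depends only on the protocol, swapping Groq for a BAA-covered provider becomes a configuration change instead of a rewrite, which is the production seam the diagram points at.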

Documentation

This page mirrors the repository README for the live documentation site.

Visitor	Best entry point
New here	`docs/overview/REVIEWER_GUIDE.md`
Product / architecture reviewer	`docs/overview/PORTFOLIO_CASE_STUDY.md`
Role-fit reviewer	`docs/overview/TARGET_ROLE_ALIGNMENT.md`
Technical reviewer	`docs/overview/SYSTEM_OVERVIEW.md`
Data platform reviewer	`docs/guides/CORPUS_ARTIFACTS.md`
Local setup	`docs/guides/QUICKSTART.md`
Full docs	`docs/README.md`

Screenshots

The UI is backed by a synthetic Synthea cohort; on-screen labels make this explicit.

Patient list

Patient list with cohort stats and filters

Patient chart

Patient context, Synthea profile, and chart tabs

Pre-meeting summary

Pre-meeting summary with care timeline

Encounter viewer

Clinical dialogue alongside structured note

Responsible AI

Responsible AI control center

Stack

Layer	Technology
Frontend	Next.js (App Router), TypeScript
Backend	FastAPI, asyncpg, Pydantic
Data store	Postgres 16 with pgvector
LLM	Groq, Azure OpenAI, or Amazon Bedrock
Embeddings	OpenAI, Azure OpenAI, or Amazon Bedrock
Migrations	Alembic
Corpus pipeline	Python, Synthea, MTSamples, MedSynth, ACI-Bench, Hugging Face datasets, Groq
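RAG chat depends on nearest-neighbor search in Postgres/pgvector. As a hedged sketch: the SQL below uses pgvector's `<=>` cosine-distance operator, but the `note_chunks` table, its columns, and the per-patient filter are assumptions for illustration. `top_k` reproduces the same ordering in plain Python so the idea can be checked without a database.

```python
import math

# Hypothetical schema; `<=>` is pgvector's cosine-distance operator.
# $1..$3 are asyncpg-style positional parameters.
TOP_K_SQL = """
SELECT note_id, chunk_text
FROM note_chunks
WHERE patient_id = $1
ORDER BY embedding <=> $2
LIMIT $3
"""


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # treat zero vectors as unrelated rather than dividing by zero
    return 1.0 - dot / (na * nb)


def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 3):
    """In-memory equivalent of ORDER BY embedding <=> query LIMIT k."""
    return sorted(chunks, key=lambda c: cosine_distance(query, c[1]))[:k]
```

Scoping retrieval to one patient before ranking, as the assumed `WHERE` clause does, is what keeps chart chat from citing another patient's notes.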

Feature availability

Every LLM and embedding provider credential is optional. The system degrades gracefully and reports what is configured via `GET /health`.

Configuration	Features available
Without any keys	Patient list, charts, encounter viewer
LLM provider credentials	Pre-meeting summaries, structured note generation
Embedding provider credentials + `--embed`	RAG chat with citations
Admin flags	Responsible AI Control Center

For the full flag matrix, see docs/overview/SYSTEM_OVERVIEW.md.
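The graceful-degradation rule can be sketched as a pure function from configuration to enabled features, roughly the kind of summary a health endpoint could report. Everything here is hypothetical: `RuntimeConfig`, its fields, and the feature keys are illustrative, and the real `GET /health` response shape is documented in SYSTEM_OVERVIEW, not here. Requiring an LLM for RAG chat is an added assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical settings; the real app's configuration names may differ."""
    llm_provider: Optional[str] = None        # e.g. "groq", "azure_openai", "bedrock"
    embedding_provider: Optional[str] = None  # e.g. "openai", "azure_openai", "bedrock"
    embeddings_loaded: bool = False           # assumed: corpus was loaded with --embed
    admin_enabled: bool = False               # assumed: stands in for the admin flags


def available_features(cfg: RuntimeConfig) -> Dict[str, bool]:
    has_llm = cfg.llm_provider is not None
    has_embeddings = cfg.embedding_provider is not None and cfg.embeddings_loaded
    return {
        # Served straight from Postgres; no provider keys needed.
        "patient_list": True,
        "charts": True,
        "encounter_viewer": True,
        "pre_meeting_summary": has_llm,
        "note_generation": has_llm,
        # Assumption: chat needs an LLM to answer as well as embeddings to retrieve.
        "rag_chat": has_llm and has_embeddings,
        "responsible_ai_center": cfg.admin_enabled,
    }
```

Deriving the flags in one place means the UI, the health check, and the docs can all read the same answer to "what works right now?".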

Getting started

Local run requires a generated or restored corpus artifact at data/clinical_corpus_v2/. This artifact is produced by the offline data_prep/ pipeline and is not committed as application source. If the directory is empty after clone, see docs/guides/CORPUS_ARTIFACTS.md. Full prerequisites, optional capability paths, and troubleshooting: docs/guides/QUICKSTART.md.

docker compose up -d
cd backend && python -m venv .venv && source .venv/bin/activate && pip install -e .
alembic upgrade head
scribe-load-corpus
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
# in a second terminal
cd frontend && nvm use && npm install && npm run dev

Frontend: http://localhost:3000. Backend: http://127.0.0.1:8000/health.
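Because `scribe-load-corpus` needs the artifact at `data/clinical_corpus_v2/`, a quick preflight check can catch an empty clone before loading. This is a hypothetical helper, not part of the repo, and it assumes the manifest is a JSON file named `manifest.json`, which the README does not confirm.

```python
import json
from pathlib import Path
from typing import List

CORPUS_DIR = Path("data/clinical_corpus_v2")  # relative to the repository root


def check_corpus(corpus_dir: Path = CORPUS_DIR, manifest_name: str = "manifest.json") -> List[str]:
    """Return a list of problems; an empty list means the artifact looks loadable."""
    if not corpus_dir.is_dir():
        return [f"{corpus_dir} does not exist"]
    if not any(corpus_dir.iterdir()):
        return [f"{corpus_dir} is empty; generate or restore the artifact first"]
    problems: List[str] = []
    manifest = corpus_dir / manifest_name  # assumed file name
    if not manifest.is_file():
        problems.append(f"missing {manifest_name}")
    else:
        try:
            json.loads(manifest.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{manifest_name} is not valid JSON: {exc}")
    return problems
```

Calling `check_corpus()` from the repository root before `scribe-load-corpus` returns an empty list when the directory and manifest look usable; otherwise, follow docs/guides/CORPUS_ARTIFACTS.md to generate or restore the artifact.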

Demo readiness

Area	Status
Synthetic clinical corpus pipeline	Implemented
Runtime app: charts, encounters, meeting prep, RAG chat, note generation	Implemented
Responsible AI audit surfaces	Implemented
PHI readiness	Intentionally not claimed
SSO / multi-tenant isolation	Deferred production seam
Hosted demo URL	Planned / optional

License

This project is source-available for portfolio review and educational purposes only. Commercial use is prohibited without prior written permission. See LICENSE.