Skip to content

Scribe IQ

CI License Docs

View source on GitHub ->

Governed clinical documentation AI prototype. Synthetic longitudinal records, clinical-note-grounded RAG, chart review, structured note generation, and Responsible AI auditability — with provider-agnostic LLM and embedding layers and explicit production boundaries.

Clinical AI is only useful if it is grounded in the patient record, clear about its limits, and auditable when it influences human work. Scribe IQ is built around that premise.

Built on synthetic data. Not for clinical decision-making.

Why this repo exists

Scribe IQ demonstrates how governed institutional data platform patterns translate into healthcare-shaped AI workflows. Built as an architecture review artifact for academic health, research, and education technology environments, it surfaces:

  • longitudinal records and sensitive notes as the core data product
  • offline corpus construction separated from runtime serving
  • retrieval-grounded AI workflows with visible source context
  • provider boundary awareness for LLMs and embeddings
  • audit-first AI design with redacted previews and prompt/model traceability
  • explicit production deltas for PHI, SSO/RBAC, tenancy, BAA-backed deployment, and observability

For role-fit interpretation across healthcare, university, research, and education-IT architecture reviews, see Target role alignment.


What this shows

Layer What is demonstrated
Corpus / data product Nine-step offline data_prep/ pipeline over Synthea and public clinical note sources; generated corpus artifact with manifest, dataset card, validation checks, and audit report
Serving substrate FastAPI, Postgres/pgvector, Alembic migrations, async database access, and one governed store for patient rows, notes, embeddings, and audit records
Clinical workflows Patient chart, encounter viewer, care timeline, pre-meeting prep, structured note generation, and grounded RAG chat
Responsible AI Citation contract, append-only ai_interactions, redacted previews, prompt/model traceability, source traces, and Responsible AI Control Center

Corpus

Scribe IQ runs on a synthetic longitudinal patient and encounter dataset assembled from the following open sources — no real PHI is used:

  • Synthea — synthetic patient spine: demographics, encounters, conditions, medications, observations, and longitudinal structure.
  • MTSamples — public outpatient-style clinical note examples.
  • MedSynth — synthetic SOAP-style clinical notes and dialogue/note pairs.
  • ACI-Bench — encounter dialogue examples used in showcase workflows.

data_prep/ matches public note examples to Synthea encounters, scores candidate fit, adapts notes for patient-level consistency, validates outputs, and emits clinical_corpus_v2/ with a manifest, dataset card, and audit report.

Synthetic data only. No real PHI is used. The system is for demonstration and architecture review, not clinical decision-making.


Architecture

data_prep/clinical_corpus_v2/ artifact → scribe-load-corpus → Postgres/pgvector → FastAPI → Next.js → ai_interactions audit table.

flowchart TB
  subgraph Offline["Offline corpus pipeline"]
    Raw["Raw synthetic + public sources"] --> Staging["data_prep staging"]
    Staging --> Artifact["clinical_corpus_v2 artifact"]
    Artifact --> Loader["scribe-load-corpus"]
  end

  subgraph Runtime["Runtime app"]
    Next["Next.js UI"] --> API["FastAPI"]
    API --> LLM["LLM provider<br/>Groq · Azure OpenAI · Bedrock"]
    API --> Embed["Embedding provider<br/>OpenAI · Azure OpenAI · Bedrock"]
    API --> Audit["ai_interactions<br/>audit table"]
    Audit --> Admin["Responsible AI Control Center"]
  end

  Loader --> PG[("Postgres + pgvector")]
  API --> PG

Documentation

This page mirrors the repository README for the live documentation site.

Visitor Best entry point
New here docs/overview/REVIEWER_GUIDE.md
Product / architecture reviewer docs/overview/PORTFOLIO_CASE_STUDY.md
Role-fit reviewer docs/overview/TARGET_ROLE_ALIGNMENT.md
Technical reviewer docs/overview/SYSTEM_OVERVIEW.md
Data platform reviewer docs/guides/CORPUS_ARTIFACTS.md
Local setup docs/guides/QUICKSTART.md
Full docs docs/README.md

Screenshots

The UI is backed by a synthetic Synthea cohort; on-screen labels make this explicit.

Patient list

Patient list with cohort stats and filters

Patient chart

Patient context, Synthea profile, and chart tabs

Pre-meeting summary

Pre-meeting summary with care timeline

Encounter viewer

Clinical dialogue alongside structured note

Responsible AI

Responsible AI control center


Stack

Layer Technology
Frontend Next.js (App Router), TypeScript
Backend FastAPI, asyncpg, Pydantic
Data store Postgres 16 with pgvector
LLM Groq, Azure OpenAI, or Amazon Bedrock
Embeddings OpenAI, Azure OpenAI, or Amazon Bedrock
Migrations Alembic
Corpus pipeline Python, Synthea, MTSamples, MedSynth, ACI-Bench, Hugging Face datasets, Groq

Feature availability

Every external dependency is optional. The system degrades gracefully and reports what is configured via GET /health.

Without any keys LLM provider credentials Embedding provider credentials + --embed Admin flags
Patient list, charts, encounter viewer Pre-meeting summaries, structured note generation RAG chat with citations Responsible AI Control Center

For the full flag matrix, see docs/overview/SYSTEM_OVERVIEW.md.

Getting started

Local run requires a generated or restored corpus artifact at data/clinical_corpus_v2/. This artifact is produced by the offline data_prep/ pipeline and is not committed as application source. If the directory is empty after clone, see docs/guides/CORPUS_ARTIFACTS.md. Full prerequisites, optional capability paths, and troubleshooting: docs/guides/QUICKSTART.md.

docker compose up -d
cd backend && python -m venv .venv && source .venv/bin/activate && pip install -e .
alembic upgrade head
scribe-load-corpus
uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
# in a second terminal
cd frontend && nvm use && npm install && npm run dev

Frontend: http://localhost:3000. Backend: http://127.0.0.1:8000/health.


Demo readiness

Area Status
Synthetic clinical corpus pipeline Implemented
Runtime app: charts, encounters, meeting prep, RAG chat, note generation Implemented
Responsible AI audit surfaces Implemented
PHI readiness Intentionally not claimed
SSO / multi-tenant isolation Deferred production seam
Hosted demo URL Planned / optional

License

This project is source-available for portfolio review and educational purposes only. Commercial use is prohibited without prior written permission. See LICENSE.