About & Notice
What this is
scribe-iq-lakehouse is a portfolio engineering artifact — a production-pattern healthcare
data lakehouse built to be reviewed, run, and reasoned about. It is the data-platform layer
beneath a small family of clinical-AI projects (see Downstream & Portfolio).
Data & privacy notice
Built entirely on Synthea Coherent synthetic data (AWS Open Data, no credentials required). It contains no real patient information — no PHI. Genomic content is simulated inheritance, not clinical variants (Responsible Data). Not for clinical decision-making.
License
MIT. The synthetic source dataset is published by the Synthea project under its own open terms.
Source & companion work
- Code: github.com/sandeep-jay/scribe-iq-lakehouse
- Downstream consumer: scribe-iq (clinical-documentation AI)
- Companion repo: fabric-lakehouse-hls-readmission, a Databricks demo migrated to Fabric (different narrative, no code dependency)
How the docs are maintained
- The Data Dictionary and the Gold JSON Schema are generated from code and verified by a CI gate, so they cannot silently drift from the code (ADR-011).
- Decisions are recorded as ADRs; the Architecture page tracks the as-built system; the Changelog records what changed.
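The "generated from code, verified by a CI gate" idea can be illustrated with a minimal sketch. It is not the repository's actual gate: the `GoldEncounter` model, its fields, and the `render_schema` and `check_drift` helpers are hypothetical names invented here. The assumption is that the gate regenerates the artifact from code and fails when the committed copy differs byte for byte.

```python
import json
import tempfile
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class GoldEncounter:
    """Hypothetical Gold-layer record; the real model lives in the repo."""
    encounter_id: str
    patient_id: str
    readmitted_30d: bool


# Minimal Python-type to JSON-Schema-type mapping (illustrative only).
_JSON_TYPES = {str: "string", bool: "boolean", int: "integer", float: "number"}


def render_schema(model) -> str:
    """Render a deterministic JSON Schema document from a dataclass."""
    names = [f.name for f in fields(model)]
    schema = {
        "title": model.__name__,
        "type": "object",
        "properties": {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(model)},
        "required": names,
    }
    # sort_keys plus a fixed indent keeps the output stable across runs.
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def check_drift(committed: Path, model) -> bool:
    """Return True when the committed file matches what the code generates."""
    return committed.exists() and committed.read_text(encoding="utf-8") == render_schema(model)


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "gold.schema.json"
    path.write_text(render_schema(GoldEncounter), encoding="utf-8")
    # A CI job would exit non-zero when this check fails.
    raise SystemExit(0 if check_drift(path, GoldEncounter) else 1)
```

In a setup like this, the generated file is committed alongside the code, and the CI job only compares; a hand edit to the schema or a model change without regeneration both show up as a failed check.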
Contact
Sandeep Jayaprakash — github.com/sandeep-jay.