ADR-006: DICOM stop_before_pixels metadata extraction
Date: 2026-05-27
Status: Accepted
Deciders: Sandeep Jayaprakash
Context
Synthea Coherent includes DICOM MRI files for a subset of patients. Doing anything meaningful with the pixel data (for example, segmentation) requires GPU infrastructure and would add significant complexity and runtime to the pipeline. However, the DICOM header metadata (modality, body part, study description, manufacturer, field strength) provides clinically useful context for Gold encounter_summary without any pixel processing.
Decision
Extract DICOM metadata using pydicom.dcmread(io.BytesIO(data), stop_before_pixels=True)
in every case. Never load pixel data in this pipeline phase. Preserve dicom_binary_id
in silver.imaging_study as a forward pointer for Phase 4 pixel extraction. Document
this decision explicitly in PRODUCTION_NOTES.md and in every DICOM-related code comment.
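The header-only read described above can be sketched as follows. This is a minimal illustration, not the project's actual _extract_dicom_headers(): the function names, the output column names, and the mapping from DICOM keywords to columns (for example BodyPartExamined to body_site) are assumptions made for the example. Only the pydicom.dcmread(io.BytesIO(data), stop_before_pixels=True) call comes from this ADR.

```python
import io
from typing import Any, Dict, Optional

# Assumed mapping from output column to standard DICOM keyword.
# The column names are illustrative; the keywords are standard DICOM attributes.
HEADER_KEYWORDS = {
    "modality": "Modality",
    "body_site": "BodyPartExamined",
    "study_description": "StudyDescription",
    "manufacturer": "Manufacturer",
    "field_strength": "MagneticFieldStrength",  # tesla, per the DICOM data dictionary
}


def headers_from_dataset(ds: Any) -> Dict[str, Optional[str]]:
    """Flatten selected header attributes; attributes that are absent become None."""
    row: Dict[str, Optional[str]] = {}
    for column, keyword in HEADER_KEYWORDS.items():
        value = getattr(ds, keyword, None)
        row[column] = None if value is None else str(value)
    return row


def extract_dicom_headers(data: bytes) -> Dict[str, Optional[str]]:
    """Read DICOM headers from raw bytes without loading pixel data."""
    # Imported lazily so the mapping logic above can be exercised without pydicom.
    import pydicom

    # stop_before_pixels=True stops parsing before the Pixel Data element.
    ds = pydicom.dcmread(io.BytesIO(data), stop_before_pixels=True)
    return headers_from_dataset(ds)
```

Keeping the attribute mapping separate from the read call means the mapping can be unit-tested with a stand-in object, while the single dcmread call stays the one place where stop_before_pixels=True is enforced.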
Alternatives considered
| Option | Pros | Cons | Why rejected |
|---|---|---|---|
| Full pixel extraction | Complete imaging pipeline | Requires GPU, MONAI, significant complexity | Out of scope for Phase 1 |
| Skip DICOM entirely | Simplest | Loses imaging metadata from Gold | Reduces corpus richness |
| Metadata only (current) | Fast, no GPU, clinically useful | Pixel ML deferred to Phase 4 | Not rejected: chosen as the right trade-off for portfolio scope |
Consequences
Positive:
- Imaging metadata enriches Gold encounter_summary (modality, body_site, study_description)
- Fast extraction: header read only, no pixel loading
- dicom_binary_id preserved for Phase 4 without re-ingestion
Negative:
- Pixel-level ML (segmentation, radiomics) is explicitly deferred to Phase 4
- Phase 4 requires a GPU environment and MONAI, which is a separate workstream
Neutral:
- Matches production pattern: metadata extraction often precedes pixel processing in real pipelines
Implementation notes
- local/transforms/fhir_parser.py: _extract_dicom_headers() with stop_before_pixels=True
- local/transforms/silver_imaging.py: Silver table writer
- fabric/notebooks/07_silver_imaging.ipynb: Fabric extraction
- Phase 4 path: pydicom.dcmread(path) without stop_before_pixels + MONAI segmentation
- dicom_binary_id in silver.imaging_study is the Phase 4 entry point
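As a rough illustration of how dicom_binary_id could serve as the Phase 4 entry point, the sketch below resolves the pointer back to the stored bytes and performs a full read. Everything except dicom_binary_id and the plain pydicom.dcmread call is hypothetical: the ImagingStudyRow shape, the fetch_binary callback, and the injectable read_dicom parameter are assumptions for the example, and the MONAI segmentation step is left out.

```python
import io
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ImagingStudyRow:
    """Hypothetical subset of the silver.imaging_study columns."""

    dicom_binary_id: str
    modality: Optional[str] = None


def read_full_dataset(
    row: ImagingStudyRow,
    fetch_binary: Callable[[str], bytes],
    read_dicom: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Resolve dicom_binary_id to raw bytes and read the whole file, pixels included."""
    if read_dicom is None:
        # Imported lazily; Phase 4 reads without stop_before_pixels.
        import pydicom

        read_dicom = pydicom.dcmread
    data = fetch_binary(row.dicom_binary_id)
    return read_dicom(io.BytesIO(data))
```

Because the pointer is kept in Silver, a Phase 4 job would only need a lookup from dicom_binary_id to the stored binary (here the assumed fetch_binary callback) rather than a second ingestion pass.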