# ADR-008: Dict-based FHIR parsing (not `fhir.resources` models)
- Date: 2026-05-27
- Status: Accepted
- Deciders: Sandeep Jayaprakash
## Context
The spec (section 5.3) sketches `FHIRBundleParser` using the `fhir.resources` library
(`from fhir.resources.bundle import Bundle`) with attribute access on typed Pydantic
models (e.g. `binary_resource.data`). While implementing Session 1, two project rules
constrained that approach:
- `.claude/rules/transforms.md` and the healthcare-data skill mandate `.get()` with a default for every FHIR field, never direct key or attribute access, because Synthea bundles have optional fields everywhere.
- Transforms in `local/transforms/` must stay pure and dependency-light (ADR-002). They are imported into Fabric Spark UDFs, where a heavy Pydantic dependency and strict model validation add friction.
Inspecting a real Coherent bundle (`Al123_Medhurst46_*.json`, 815 KB, 333 entries)
surfaced concrete mismatches with strict typed models:
- References use the `urn:uuid:<id>` form, and practitioners are referenced by identifier query (`Practitioner?identifier=...|9999999799`), not by resource id.
- SOAP notes are Base64-encoded inline in `DocumentReference.content[].attachment.data`. Most bundles have no separate `Binary` resource, which contradicts the spec's `extract_soap_note(binary_resource, doc_ref)` signature.
- `ImagingStudy.series.modality` and `.bodySite` are bare `Coding` dicts, not `CodeableConcept` wrappers.
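Dict access can absorb the `Coding`-versus-`CodeableConcept` mismatch cheaply. The sketch below uses a hypothetical helper, `first_coding()`, which is not part of the module this ADR describes. It accepts either shape, or a list of either, and returns the first usable `Coding` dict.

```python
from typing import Any, Optional


def first_coding(value: Any) -> Optional[dict]:
    """Return the first Coding dict from a bare Coding, a CodeableConcept, or a list of either.

    Hypothetical helper, for illustration only.
    """
    if isinstance(value, list):
        # Try each element in order and return the first usable Coding
        for item in value:
            found = first_coding(item)
            if found is not None:
                return found
        return None
    if not isinstance(value, dict):
        return None
    codings = value.get("coding")
    if isinstance(codings, list):
        # CodeableConcept shape: {"coding": [...], "text": "..."}
        return next((c for c in codings if isinstance(c, dict)), None)
    if "code" in value or "system" in value:
        # Bare Coding shape: {"system": "...", "code": "...", "display": "..."}
        return value
    # Neither shape (e.g. a CodeableConcept carrying only free text)
    return None


# The same modality in both shapes resolves to the same Coding
bare = {"system": "http://dicom.nema.org/resources/ontology/DCM", "code": "DX"}
wrapped = {"coding": [bare], "text": "Digital Radiography"}
series = {"modality": bare}
modality_code = (first_coding(series.get("modality")) or {}).get("code")
```

A strict typed model would reject whichever shape it was not generated for. The dict helper treats both as equivalent, and the Silver layer then validates the resulting flat record.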
`fhir.resources` strict validation routinely rejects real-world Synthea quirks. Working around that would require per-field error handling anyway, which erodes the benefit of typed models.
## Decision
Parse FHIR bundles as plain Python dicts using `.get()` access throughout, with no
`fhir.resources` dependency. `FHIRBundleParser.parse_bundle()` indexes bundle entries by
`resourceType` and by id (covering both `resource.id` and `entry.fullUrl`), then
dispatches to `extract_*` methods that return flat record dicts aligned with the Silver
schemas. A `strip_reference()` helper normalizes `urn:uuid:`, `Type/id`, and bare-id
references. `extract_soap_note(doc_ref, binary_index=None)` resolves both inline
attachments and `Binary`-by-url references.
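The sketch below shows one way the pieces of the decision could fit together. It is a minimal sketch under stated assumptions, not the code in `local/transforms/fhir_parser.py`:

- `index_bundle()` is a hypothetical free-function stand-in for the indexing step inside `parse_bundle()`.
- `extract_soap_note()` is written as a free function rather than a method.
- `binary_index` is assumed to map a `Binary` id to its resource dict.
- Passing conditional references through `strip_reference()` unchanged is also an assumption.

```python
import base64
from collections import defaultdict
from typing import Optional


def strip_reference(ref: Optional[str]) -> Optional[str]:
    """Normalize 'urn:uuid:<id>', 'Type/<id>', or a bare '<id>' to '<id>'."""
    if not ref:
        return None
    if ref.startswith("urn:uuid:"):
        return ref[len("urn:uuid:"):]
    if "?" in ref:
        # Assumption: conditional references such as 'Practitioner?identifier=...'
        # carry no resource id, so they are returned unchanged for identifier lookup
        return ref
    # 'Type/<id>' or a longer URL: keep the last path segment; a bare id passes through
    return ref.rsplit("/", 1)[-1]


def index_bundle(bundle: dict) -> tuple[dict, dict]:
    """Hypothetical: index entries by resourceType (lists) and by id (resource.id and fullUrl)."""
    by_type: dict = defaultdict(list)
    by_id: dict = {}
    for entry in bundle.get("entry") or []:
        resource = entry.get("resource") or {}
        rtype = resource.get("resourceType")
        if not rtype:
            continue
        by_type[rtype].append(resource)
        # Register both keys so urn:uuid: and Type/id references resolve after stripping
        for key in (resource.get("id"), strip_reference(entry.get("fullUrl"))):
            if key:
                by_id[key] = resource
    return dict(by_type), by_id


def extract_soap_note(doc_ref: dict, binary_index: Optional[dict] = None) -> Optional[str]:
    """Decode the first note found: inline Base64 data first, then a Binary referenced by url."""
    binary_index = binary_index or {}
    for content in doc_ref.get("content") or []:
        attachment = content.get("attachment") or {}
        data = attachment.get("data")
        if not data:
            # Fallback: attachment.url points at a Binary resource in the bundle
            binary_id = strip_reference(attachment.get("url"))
            data = (binary_index.get(binary_id) or {}).get("data") if binary_id else None
        if data:
            return base64.b64decode(data).decode("utf-8", errors="replace")
    return None


# Tiny Coherent-style bundle: urn:uuid: references and an inline Base64 note
note_b64 = base64.b64encode(b"# Chief Complaint\nCough for 3 days").decode("ascii")
bundle = {
    "entry": [
        {"fullUrl": "urn:uuid:p1", "resource": {"resourceType": "Patient", "id": "p1"}},
        {
            "fullUrl": "urn:uuid:d1",
            "resource": {
                "resourceType": "DocumentReference",
                "id": "d1",
                "subject": {"reference": "urn:uuid:p1"},
                "content": [{"attachment": {"contentType": "text/plain", "data": note_b64}}],
            },
        },
    ]
}
by_type, by_id = index_bundle(bundle)
doc = by_type["DocumentReference"][0]
patient = by_id.get(strip_reference((doc.get("subject") or {}).get("reference")))
soap_text = extract_soap_note(doc)
```

The design choice is to index each resource under both `resource.id` and the stripped `fullUrl`. With both keys in place, one `by_id.get(strip_reference(ref))` lookup works whichever reference form the bundle uses.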
`pydicom` (ADR-006) is imported lazily inside `_extract_dicom_headers`, so the module
loads and tests run without it installed.
## Alternatives considered
| Option | Pros | Cons | Why rejected |
|---|---|---|---|
| `fhir.resources` typed models (spec sketch) | Standards-validated, IDE autocomplete | Heavy dependency in Spark UDFs; strict validation fails on Synthea quirks; attribute access conflicts with the `.get()` rule | Friction outweighs the typing benefit on synthetic data |
| `fhirclient` / other FHIR SDK | Similar typing | Same dependency-weight and validation issues | Same as above |
| Dict-based `.get()` parsing (chosen) | Tolerant of optional/odd fields, zero heavy deps, matches transform rules, trivially testable | No compile-time schema guarantees | Not rejected; schema is enforced downstream in Silver validation (Session 2) instead |
## Consequences
Positive:
- Transforms stay pure and import-light, so they drop straight into Fabric Spark UDFs.
- Tolerant of real Synthea quirks (`urn:uuid:` references, inline attachments, bare Codings).
- Easy to unit test with small inline dicts and one compact fixture.
Negative:
- No compile-time FHIR schema validation, so field-name typos surface only at runtime or in tests. Mitigated by full test coverage and downstream Silver schema validation (Session 2).
Neutral:
- Diverges from the spec's literal `fhir.resources` sketch. The spec's intent is fully preserved: extract all
resource types, decode SOAP notes, and keep DICOM/genomics metadata-only.
## Implementation notes
- `local/transforms/fhir_parser.py`: `FHIRBundleParser`, `strip_reference`, helpers.
- Section detection maps both Markdown clinical headers ("# Chief Complaint", "# Assessment and Plan") and classic "SUBJECTIVE:"/"OBJECTIVE:" markers to S/O/A/P flags (ADR-005). Coherent notes lack an Objective section, so `has_objective` is frequently `False`. This is honest, documented behavior, not a bug.
- `has_pathogenic_variant` uses negation-aware text detection, so "No pathogenic variants detected" → `False` (ADR-007).
- `tests/test_fhir_parser.py`, `tests/test_silver_soap_notes.py`: full coverage on the synthetic `tests/fixtures/sample_bundle.json`.
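The two text heuristics above can be approximated with regular expressions. This is a minimal sketch under stated assumptions. The header patterns, the negation cue list and the sentence splitting are hypothetical choices, not the rules ADR-005 and ADR-007 actually specify.

```python
import re

# Hypothetical header patterns: Markdown "# ..." headers or classic "LABEL:" markers at line start
_FLAGS = re.IGNORECASE | re.MULTILINE
SECTION_PATTERNS = {
    "has_subjective": re.compile(r"^\s*(#+\s*(chief complaint|history of present illness)|subjective:)", _FLAGS),
    "has_objective": re.compile(r"^\s*(#+\s*(physical exam|objective)|objective:)", _FLAGS),
    "has_assessment": re.compile(r"^\s*(#+\s*assessment|assessment:)", _FLAGS),
    # "# Assessment and Plan" sets both the A and P flags
    "has_plan": re.compile(r"^\s*(#+\s*(assessment and plan|plan)|plan:)", _FLAGS),
}


def detect_sections(note: str) -> dict[str, bool]:
    """Map a SOAP note to S/O/A/P presence flags."""
    return {flag: bool(pattern.search(note)) for flag, pattern in SECTION_PATTERNS.items()}


# Hypothetical negation cues: a negating word shortly before "pathogenic", or "non-pathogenic"
_NEGATED = re.compile(
    r"\b(?:no|not|without|negative for)\b[^.\n]{0,40}?\bpathogenic\b|\bnon-?pathogenic\b",
    re.IGNORECASE,
)
_MENTION = re.compile(r"\bpathogenic variant", re.IGNORECASE)


def has_pathogenic_variant(text: str) -> bool:
    """True if some sentence mentions a pathogenic variant without a negation cue."""
    for sentence in re.split(r"[.\n]", text):
        if _MENTION.search(sentence) and not _NEGATED.search(sentence):
            return True
    return False


coherent_style = "# Chief Complaint\nCough\n# Assessment and Plan\nRest and fluids"
flags = detect_sections(coherent_style)  # has_objective stays False: no Objective section
```

The check runs per sentence, so a report containing one negated sentence and one affirmative sentence still returns `True`.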