This story draft by @escholar has not been reviewed by an editor, YET.

Enhancing Health Data Interoperability with Large Language Models: A FHIR Study: Methods

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture

This paper is available on arxiv under CC 4.0 license.


(1) Yikuan Li, MS, Northwestern University Feinberg School of Medicine & Siemens Medical Solutions;

(2) Hanyin Wang, BMed, Northwestern University Feinberg School of Medicine;

(3) Halid Z. Yerebakan, PhD, Siemens Medical Solutions;

(4) Yoshihisa Shinagawa, PhD, Siemens Medical Solutions;

(5) Yuan Luo, PhD, FAMIA, Northwestern University Feinberg School of Medicine.

Table of Links



Results and Discussions

Conclusion & References


Data Annotation To the best of our knowledge, there is no largely publicly available dataset in the FHIR standard that is generated from contextual data. Therefore, we have chosen to annotate a dataset containing both free-text input and structured output in FHIR formats. The free-text input was derived from the discharge summaries of the MIMICIII datase. 3 Thanks to the 2018 n2c2 medication extraction challenge 4 , which essentially involves named entity recognition tasks, elements in medication statements have been identified. Our annotations built upon these n2c2 annotations and standardized the free text into multiple clinical terminology coding systems, such as NDC, RxNorm, and SNOMED. We organized the contexts and codes into FHIR medicationStatement resources. The converted FHIR resources underwent validation by the official FHIR validator ( to ensure compliance with FHIR standards, including structure, datatype, code sets, display names, and more. These validated results were considered the gold standard transformation results and could be used to test against the LLMs. No ethical concerns exist regarding data usage, as both the MIMIC and n2c2 datasets are publicly available to authorized users.

Large Language Model We used OpenAI's GPT-4 model as the LLM for FHIR format transformation. We used five separate prompts to instruct the LLM to transform input free text into medication (including medicationCode, strength, and form), route, schedule, dosage, and, reason, respectively. All prompts adhered to a template with the following strucuture: task instructions, expected output FHIR templates in .JSON format, 4-5 conversion examples, a comprehensive list of codes from which the model can make selections, and then the input text. As there was no finetuning or domain-specific adaptation in our experiments, we initially had the LLM generate a small subset (N=100). Then, we manually reviewed the discrepancies between the LLM-generated FHIR output and our human annotations. Common mistakes were identified and used to refine the prompts. It's important to note that we did not have the access to the whole lists of NDC, RxNorm, and SNOMED Medication codes for drug names, as well as SNOMED Finding codes for reasons. Additionally, even if we had such comprehensive lists, they would have exceeded the token limits for LLMs. Thus, we did not task LLMs with coding these entities; instead, we instructed them to identify the contexts mentioned in the input text. For other elements, e.g. drug routes and forms, numbering in the hundreds, we allowed LLMs to directly code them. When evaluating the LLM-generated output, our primary criterion was the exact match rate, which necessitates precise alignment with human annotations in all aspects, including codes, structures, and more. Additionally, we reported precision, recall, and F1 scores for specific element occurrences. We accessed the GPT-4 APIs through the Azure OpenAI service, aligning with responsible use guidelines for MIMIC data. The specific model we used was 'gpt-4-32k' in its '2023-05-15' version. Each text input was individually transformed into a MedicationStatement resource. To optimize efficiency, we made multiple asynchronous API calls.

. . . comments & more!

About Author

EScholar: Electronic Academic Papers for Scholars HackerNoon profile picture
EScholar: Electronic Academic Papers for Scholars@escholar
We publish the best academic work (that's too often lost to peer reviews & the TA's desk) to the global tech community