
Enhancing Health Data Interoperability with Large Language Models: Results and Discussions


This paper is available on arXiv under a CC 4.0 license.


(1) Yikuan Li, MS, Northwestern University Feinberg School of Medicine & Siemens Medical Solutions;

(2) Hanyin Wang, BMed, Northwestern University Feinberg School of Medicine;

(3) Halid Z. Yerebakan, PhD, Siemens Medical Solutions;

(4) Yoshihisa Shinagawa, PhD, Siemens Medical Solutions;

(5) Yuan Luo, PhD, FAMIA, Northwestern University Feinberg School of Medicine.

Results and Discussions

The results of annotation and FHIR generation are presented in Table 1. In summary, we annotated 3,671 medication resources, covering over 625 distinct medications and associated with 354 reasons. The Large Language Model (LLM) achieved an accuracy above 90% and an F1 score above 0.96 across all elements. In prior studies, F1 scores reached 0.750 in timing.repeat, 0.878 in timing.route, and 0.899 in timing dosage [1]. The LLM improved these F1 scores by at least 8%. Notably, the previous studies used a smaller private dataset, did not apply the strictest evaluation metrics such as exact match rate, skipped terminology coding, and required extensive training. On further investigation, we also found high accuracy in terminology coding (essentially a classification task with more than 100 classes), mathematical conversion (e.g., inferring a duration of 10 days when the input mentions 'TID, dispense 30 tablets'), format conformity (less than a 0.3% chance that the results cannot be parsed as JSON), and cardinality (the LLM can handle both 1:N and 1:1 relationships).
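Two of the checks mentioned above, mathematical conversion and format conformity, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the small frequency table, and the assumption of one tablet per dose are hypothetical and chosen only to reproduce the 'TID, dispense 30 tablets' example.

```python
import json

# Doses per day for a few common frequency abbreviations.
# Illustrative subset only; a real system would need a fuller mapping.
DOSES_PER_DAY = {"QD": 1, "BID": 2, "TID": 3, "QID": 4}


def infer_duration_days(frequency: str, quantity_dispensed: int,
                        units_per_dose: int = 1) -> int:
    """Return the number of whole days the dispensed quantity covers.

    Assumes a fixed number of units per dose (default: one tablet).
    """
    key = frequency.upper()
    if key not in DOSES_PER_DAY:
        raise ValueError(f"Unknown frequency abbreviation: {frequency}")
    units_per_day = DOSES_PER_DAY[key] * units_per_dose
    return quantity_dispensed // units_per_day


def is_parseable_json(text: str) -> bool:
    """Return True if the model output can be parsed as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# 'TID, dispense 30 tablets': 30 tablets / 3 per day = 10 days
print(infer_duration_days("TID", 30))
print(is_parseable_json('{"resourceType": "MedicationStatement"}'))
```

A check like `is_parseable_json` only confirms that the output is syntactically valid JSON. It does not confirm that the output is a valid FHIR resource, which would need a separate schema or profile validation step.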

The accuracy of the output is highly dependent on the instruction prompts used. Based on extensive trial and error, we have the following recommendations: i) provide diverse conversion examples that encompass a wide range of heterogeneous edge cases; ii) use strong language, such as “MUST”, to ensure that the output adheres to the expected formats and rules; iii) continuously update and refine the prompts by reviewing results from a small subset, which can help identify common mistakes and enhance overall accuracy; iv) be cautious about out-of-vocabulary codings, because LLMs may try to please users by inventing codes that do not exist when they cannot find a close match.
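One way to guard against the invented codes described in recommendation iv) is to compare every code in the generated resource against a known vocabulary. The sketch below is an assumption about how such a check could look, not a step the paper describes. The helper names and the placeholder codes (`CODE-A`, `CODE-Z`) are hypothetical, and the walker simply looks for any `coding` list in the parsed JSON.

```python
def iter_codings(node):
    """Yield every entry found in any 'coding' list inside a parsed JSON value."""
    if isinstance(node, dict):
        coding = node.get("coding")
        if isinstance(coding, list):
            for entry in coding:
                if isinstance(entry, dict):
                    yield entry
        for value in node.values():
            yield from iter_codings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_codings(item)


def find_unknown_codes(resource: dict, known_codes: set) -> list:
    """Return the codes in the resource that are not in the known vocabulary."""
    return [
        entry.get("code")
        for entry in iter_codings(resource)
        if entry.get("code") not in known_codes
    ]


# Hypothetical generated output with placeholder codes.
generated = {
    "resourceType": "MedicationStatement",
    "medicationCodeableConcept": {"coding": [{"code": "CODE-A"}]},
    "reasonCode": [{"coding": [{"code": "CODE-Z"}]}],
}
known_vocabulary = {"CODE-A", "CODE-B"}

print(find_unknown_codes(generated, known_vocabulary))
```

Any code this check reports can then be routed to manual review instead of being accepted as generated.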
