Why EHR Data Doesn't Fit Neat ML Tables

Written by aimodels44 | Published 2026/04/05
Tech Story Tags: artificial-intelligence | software-architecture | data-science | privacy | hospital-data-machine-learning | ehr-event-streams | irregular-clinical-data | sparse-medical-data

TL;DR: Hospital data is sparse, irregular, and time-sensitive. Here's why standard machine learning struggles and event stream models work better.

The problem nobody talks about: why hospital data breaks standard machine learning

Imagine trying to understand a patient by filling out a form every hour, even if nothing happened. Or worse, trying to force continuous monitoring data into rigid hourly buckets when the crucial events—a medication given, a test result returned, a vital sign spike—happen at random times. Hospital data doesn't cooperate with neat grids, and pretending it does means throwing away the precise timing information that often matters most clinically.

Most machine learning practitioners learn with data that comes in orderly tables, rectangular matrices where every row is an observation and every column is a feature. Electronic health records violently reject this assumption. Real hospitals generate events asynchronously, with vast stretches of nothing followed by bursts of activity. A patient might have no recorded events for six hours, then receive three medications and a lab test within fifteen minutes. This isn't an edge case—it's how medicine actually works.

Yet the research addressing this reality has been fragmented. Different papers use different terminology. Models get compared using inconsistent evaluation protocols. Some work silently assumes data can be regularized into fixed time intervals; others acknowledge the sparsity but handle it differently. Without a unified framework, progress slows, and the real problem—that standard multivariate time series approaches force clinical reality into shapes it doesn't naturally fit—remains in the background, never quite surfaced as the core challenge.

This fragmentation is what this review addresses directly. It establishes a unified framework for thinking about event streams in health data, showing that when we stop fighting the sparsity and irregularity of real medicine, we unlock better AI models.

What actually happens in a patient's medical journey

Before building better models, we need a shared language for what we're actually modeling. Think of a patient's medical record as a continuous timeline with annotations marking every clinical event. Events are the basic units: a vital sign measured, a medication given, a lab test ordered, a diagnosis recorded. Each event has a precise timestamp, a type or category, and sometimes a numerical value. The order matters profoundly, and the gaps between events carry information too.

An event stream is fundamentally a continuous sequence of timestamped events. Unlike regular time series where something gets measured at fixed intervals—say, every hour or every day—an event stream captures exactly when things happen in the clinical workflow. A medication given at 9:47 AM followed by a lab test at 10:15 AM creates a specific causal and temporal narrative. That ten-minute gap is information. It tells you about the timing of clinical decisions and their effects.

The key structural feature is sparsity. Most patients most of the time don't have new events happening. This is unlike many standard time series datasets where something is being measured every time step. Hospital data lives in a sparse, irregular regime where the events themselves carry most of the information. Long stretches of nothing punctuate brief periods of intense activity. Traditional time series methods struggle here because they're designed to extract patterns from dense, regularly sampled data.

Consider how this differs from the standard approach. In multivariate time series modeling, practitioners typically represent patient data as a matrix where rows are time points and columns are clinical variables. This requires deciding on a time resolution—should each row represent one hour, one day, one week? Then for every time point where no measurement exists, you either leave it blank (missing data) or fill it in with interpolation or imputation. Both choices distort the underlying reality.
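A toy sketch of the gridding problem, with made-up `(hour, value)` pairs: binning the same irregular events into hourly buckets collapses near-simultaneous events and turns quiet hours into missing-data cells.

```python
# Force irregular events onto an hourly grid: each bucket keeps only
# the last value seen, and empty hours become explicit gaps (None).
events = [(3.0, 118.0), (9.78, 500.0), (9.92, 250.0), (10.25, 6.2)]  # (hour, value)

n_hours = 12
grid = [None] * n_hours
for hour, value in events:
    grid[int(hour)] = value  # 9.78 and 9.92 collapse into the same bucket

# Two medications given 8 minutes apart are now indistinguishable,
# and hours 0-2 and 4-8 become imputation problems rather than facts.
```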


Illustration showing raw EHR data with heterogeneous event types (vital signs, lab tests, medications, procedures, diagnoses) arriving at irregular times, contrasted with how multivariate time series forces this into regular time-grid representation

The figure above makes this contrast vivid. Raw EHR data illustrates how heterogeneous clinical events don't arrive on a regular schedule. The multivariate time series representation shows what happens when you force that irregular data into regular intervals, filling in gaps where no events occurred. Information is lost—not just the precise timing, but the fact that certain time periods had no clinical activity at all.

Event streams preserve this precision. Each event retains its exact timestamp, its type or category, and any numerical values attached to it. The representation remains fundamentally sparse, matching how hospitals actually work. There's no need to pretend measurements happened at times they didn't.

The three dimensions that matter: a taxonomy for event streams

Models can fail in different ways. Some ignore when events happen and treat the sequence as unordered. Others ignore what type of event it is, treating a medication the same as a lab result. Some struggle with the numerical values attached to events, discarding potentially important information like the magnitude of a blood pressure reading or dosage. A useful taxonomy separates models by which of these three dimensions they actually handle well. It's asking: does this model care about time, does it care about what kind of event, and does it care about how much?

This taxonomy transforms the landscape from chaotic to navigable. By sorting models into categories based on what they preserve, you can understand why certain models are better for certain tasks and what trade-offs each approach makes.
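One way to make the taxonomy operational is a small capability record per model. The model names below are hypothetical placeholders, not entries from the paper; the three boolean axes mirror the time/type/value dimensions discussed next.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capabilities:
    time: bool   # uses actual time gaps, not just event order
    type: bool   # distinguishes event categories
    value: bool  # uses numeric payloads

# Hypothetical models placed on the three axes.
taxonomy = {
    "bag_of_events":  Capabilities(time=False, type=True, value=False),
    "plain_rnn":      Capabilities(time=False, type=True, value=True),
    "temporal_model": Capabilities(time=True,  type=True, value=True),
}

def loses_timing(name: str) -> bool:
    """True if the model discards time-gap information entirely."""
    return not taxonomy[name].time
```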

Handling event time: the when

Time carries clinical meaning in multiple ways. Some models treat an event sequence as simply ordered, caring only that event A came before event B. Others incorporate the actual time gaps between events, recognizing that ten medications given within an hour carries entirely different clinical meaning than those same ten medications spread over a month. Still others work with relative timing (time since admission) or duration information (how long a patient stayed on a medication).

Where a model sits on this spectrum determines whether it can capture clinically meaningful temporal patterns. A model that ignores time gaps might learn that certain medication sequences are common without understanding that one sequence happens over days while another happens in minutes. That distinction often matters for prognosis.

Handling event type: the what

Events have categories that matter for medical reasoning. A vital sign reading, a medication, and a diagnosis are fundamentally different kinds of information. Simple models might ignore these types entirely, treating all events as generic signals. More sophisticated models use type embeddings or maintain type-specific neural pathways, allowing the model to reason differently about different event categories.

The distinction matters because clinical logic often reasons within event types (all medications together, all lab results together) and across them (does this medication cause this lab abnormality?). A model insensitive to event type loses this structure.
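A minimal sketch of type embeddings, assuming a lookup table of one vector per event category. The vectors are randomly initialised here in plain Python; a real model would learn them during training.

```python
import random

random.seed(0)
EVENT_TYPES = ["vital", "lab", "med", "procedure", "diagnosis"]
DIM = 4

# One vector per event category (random stand-ins for learned weights).
embeddings = {t: [random.gauss(0, 1) for _ in range(DIM)] for t in EVENT_TYPES}

def embed(stream: list[str]) -> list[list[float]]:
    """Map a sequence of event types to their embedding vectors."""
    return [embeddings[t] for t in stream]

vecs = embed(["med", "lab", "med"])
```

Two medications map to the same vector; a medication and a lab test map to different ones, so downstream layers can reason within and across categories.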

Handling event value: the how much

Many events include numerical values: blood pressure of 140/90, glucose of 250 mg/dL, a medication dosage of 500mg. Some models discard this information, treating events as binary (happened or didn't). Others integrate values directly into their representations. The choice affects what the model can learn, especially for prognostic tasks where the magnitude of a measurement often carries clinical weight. An extremely elevated glucose reading carries different prognostic significance than a slightly elevated reading, and a model that ignores this distinction loses predictive power.
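The difference between the two choices fits in two lines. The glucose value and the scaling statistics below are toy numbers: a presence-only encoding keeps a single bit, while a value-aware encoding preserves magnitude.

```python
glucose = 250.0  # mg/dL, toy reading

binary_feature = 1.0                      # "a glucose test happened"
value_feature = (glucose - 100.0) / 50.0  # toy standardisation; keeps magnitude
```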


Overview of event stream modelling showing the taxonomy categorizing models by how they handle event time, type, and value, with different architectural approaches arranged by these three dimensions

The taxonomy organizes the entire field by showing how different models map onto these three core dimensions. This becomes the foundational organizational chart for understanding what separates one approach from another.

Teaching models to learn from messy, sparse data

Now that we understand what event streams are and how to categorize them, the question becomes: how do we actually teach these models? This reveals that the training approach is as important as the architecture itself. The same model trained differently can behave like a fundamentally different model.

Supervised learning foundation

Supervised learning in healthcare is straightforward in concept: give the model examples of past patient trajectories and the outcomes that followed, then ask it to predict outcomes for new patients. The catch is defining what counts as training signal. Do you predict the very next event? The outcome in the next 30 days? Whether a patient will readmit?

In supervised event stream modeling, common prediction tasks include next event prediction (given all previous events, predict the next event's type, timestamp, or value), outcome prediction (given a patient's history up to some point, predict a clinical outcome like mortality or readmission), and multi-task learning where the model jointly predicts multiple outcomes or multiple next events. Each framing teaches the model to attend to different patterns.
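Next event prediction, the first of these framings, amounts to slicing one patient's stream into (history, target) pairs. A sketch with hypothetical event labels:

```python
# One patient's stream, simplified to event labels only.
stream = ["admit", "vital", "med", "lab", "discharge"]

# At each step, the history so far is the input, the next event the target.
pairs = [(stream[:i], stream[i]) for i in range(1, len(stream))]
# e.g. (["admit"], "vital"), (["admit", "vital"], "med"), ...
```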

For sparse data, standard approaches become inefficient because most predictions will be "nothing happens," providing little gradient signal. Some approaches use weighted losses or contrastive objectives to handle this imbalance, ensuring that the rare but important events still drive learning.
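A weighted loss of the kind described can be sketched as a per-class weight multiplying the negative log-likelihood. The weights and probabilities below are illustrative; in practice they would come from class frequencies and model outputs.

```python
import math

def weighted_nll(probs: dict[str, float], target: str,
                 weights: dict[str, float]) -> float:
    """Cross-entropy for one prediction, up-weighted for rare classes."""
    return -weights[target] * math.log(probs[target])

# "nothing happens" dominates, so it gets a small weight; rare events a large one.
weights = {"no_event": 0.1, "med_given": 5.0}
probs = {"no_event": 0.9, "med_given": 0.1}

common = weighted_nll(probs, "no_event", weights)   # cheap to get right-ish
rare = weighted_nll(probs, "med_given", weights)    # expensive to miss
```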

Self-supervised learning and pretraining

Here's the key tension in healthcare AI: hospitals have vast amounts of unlabeled data (every patient record) but relatively little labeled data (patients with specific, carefully validated outcomes). What if models could learn something useful about how patients change over time without needing labels? Self-supervised learning asks the model to predict parts of its own input, using the data itself as the supervision signal.

This represents the frontier of practical healthcare AI. Labels are expensive and often unavailable, but raw patient data is abundant. Self-supervised pretraining allows models to learn general patterns of how patients change, which downstream models can then adapt to specific clinical questions. The practical advantage is significant: pretraining on unlabeled data from a large hospital system, then fine-tuning on labeled data from a specific outcome or condition, often outperforms training from scratch on only labeled data.

The paper identifies four main self-supervised approaches. Next-token prediction works like predicting the next word in a sentence—the model observes events up to time T and learns to predict what happens next. This teaches the model about natural event sequences and causal progressions. It's surprisingly powerful because it forces the model to internalize disease progression, typical medication sequences, and clinical workflows.

Masked event reconstruction randomly hides some events in a patient's history, then asks the model to reconstruct them from context. This teaches the model to understand dependencies among events, recognizing patterns like "if this medication is present, which lab test might be abnormal?"
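A minimal masking routine under these assumptions: hide a random fraction of events and record the originals as reconstruction targets. The mask token and 30% rate are arbitrary choices, not the paper's.

```python
import random

random.seed(1)
MASK = "[MASK]"

def mask_events(stream: list[str], rate: float = 0.3):
    """Hide a random subset of events; return corrupted input and targets."""
    corrupted, targets = [], {}
    for i, ev in enumerate(stream):
        if random.random() < rate:
            corrupted.append(MASK)
            targets[i] = ev  # the model must reconstruct these from context
        else:
            corrupted.append(ev)
    return corrupted, targets

corrupted, targets = mask_events(["vital", "med", "lab", "med", "vital"])
```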

Contrastive learning creates two views of the same patient's data and teaches the model that these views should have similar representations, while different patients have dissimilar representations. This forces the model to learn meaningful structure in patient similarity. Temporal prediction goes further, asking the model to predict the entire future trajectory of a patient rather than just the next event.
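The contrastive objective reduces to pulling same-patient view embeddings together and pushing different-patient embeddings apart. A toy check with hand-picked three-dimensional embeddings (a real model would learn these from augmented views):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings: two augmented views of patient A, one view of patient B.
view_a1 = [1.0, 0.9, 0.1]
view_a2 = [0.9, 1.0, 0.2]
view_b = [-1.0, 0.1, 0.9]

# The objective pushes same-patient similarity above cross-patient similarity.
same = cosine(view_a1, view_a2)
diff = cosine(view_a1, view_b)
```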


Illustration of self-supervised learning methods showing next-token prediction, masked event reconstruction, contrastive learning, and temporal prediction approaches

The four main self-supervised approaches visualized, showing what each method hides or compares. The conceptual differences become immediately clear.

Practical training considerations

Beyond the conceptual approach, several practical decisions shape whether event stream models actually work. Handling variable sequence lengths matters because some models require fixed-length inputs, forcing a choice between padding, truncation, or architectures like Transformers that natively handle variable lengths. These choices affect what information is preserved and which patients are represented fairly.
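The padding/truncation choice can be sketched in a few lines. The pad token and the keep-most-recent truncation policy below are one reasonable convention, not the only one:

```python
PAD = "<pad>"

def pad_or_truncate(seq: list[str], length: int) -> list[str]:
    """Fixed-length input: keep the most recent events, pad short histories."""
    if len(seq) >= length:
        return seq[-length:]  # truncation drops the oldest events
    return seq + [PAD] * (length - len(seq))
```

Note the asymmetry: truncation silently discards early history for long-stay patients, which is exactly the kind of fairness question the choice raises.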

Dealing with missing values involves assumptions about how data were generated. Should a medication that could affect a lab value but wasn't recorded be treated as "definitely not given" or as "unknown"? Different approaches encode different assumptions with real consequences for what the model learns.

Temporal normalization requires deciding whether to use absolute timestamps, relative time since admission, or time-to-event. Raw absolute times often contain non-clinical information (seasonal effects, hospital operational changes) that confuses the model. Class imbalance arises in outcome prediction where positive events like death or readmission are often rarer than negative events. Standard methods like weighted sampling or focal loss help, but the choice should match the clinical use case.
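The three temporal encodings mentioned can all be computed from the same raw timestamps. All numbers below are toy values in hours:

```python
# Three temporal encodings of the same event times.
admission = 100.0                    # absolute clock hour of admission
event_times = [100.0, 106.5, 131.0]  # absolute clock hours of events
outcome_time = 150.0                 # absolute clock hour of the outcome

absolute = event_times                               # leaks calendar effects
relative = [t - admission for t in event_times]      # time since admission
to_event = [outcome_time - t for t in event_times]   # time until the outcome
```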

Where event stream models actually solve problems

An abstract framework only matters if it solves real clinical problems. Event streams shine in scenarios where irregular timing and precise sequencing matter. They're less obviously beneficial if you only care about "did this patient ever receive this medication" but incredibly powerful when you ask "what happens when we give this medication, and how quickly does the lab value change?"

Clinical outcome prediction represents the most direct application. Readmission, mortality, and length of stay predictions benefit from event streams because the trajectory leading to adverse outcomes is naturally encoded in the precise sequence and timing of events. A patient's final hospital days as an event stream—medications given, tests ordered, results received—often predicts whether they'll readmit better than static summaries. Hospitals use these predictions to decide which patients need more intensive discharge planning, making accuracy clinically valuable.

Disease progression and trajectory modeling exploits the temporal nature of event streams. Conditions follow typical patterns: some patients get diagnosed, start medication, and improve steadily; others' conditions worsen despite treatment. Event streams can learn these typical progressions and flag when a patient deviates from expected trajectories. Early detection of atypical progression could trigger intervention before complications occur. Progression is fundamentally temporal and sequential in ways that static models can't capture.

Treatment effect estimation benefits from the precision event streams provide. In a sparse event stream, you can see exactly when a medication was given and exactly when subsequent lab tests or clinical events occurred. This precision allows more accurate causal inference than data where timing is coarse. Some patients improve quickly after a medication, others don't, and the sequence and timing of events might explain why. Personalized medicine requires understanding which treatments work for which patients, and event stream models can integrate detailed evidence in ways that respect the temporal reality of clinical practice.

Early warning systems and anomaly detection leverage the fact that you can learn what a typical patient trajectory looks like and spot deviations in real time. A patient whose event sequence suddenly diverges from their usual pattern might be deteriorating. The precision of event timing means these changes can be detected quickly. The faster a clinician is alerted to patient deterioration, the more time they have to intervene.

Phenotyping and subgroup discovery reveal hidden patient groupings through event streams and learned embeddings. Patients with the same diagnosis can be very different, but event stream models might uncover more meaningful subgroups by recognizing that some trajectories are fundamentally similar even if their surface-level diagnoses differ. Clinical practice is often built around diagnoses, but these models could reveal which subgroups respond differently to treatments or progress at different rates.

Related work on benchmarking clinical time series summarization shows how event stream understanding can improve how we communicate patient information, and research on learning temporal embeddings from electronic health records directly applies these taxonomies to create representations that capture temporal structure.

The frontier: what we still don't know

Science progresses by identifying frontiers, the unsolved questions that guide future work. Event stream modeling in healthcare remains an active, developing field.

Interpretability and clinical trust represents a critical gap. Event stream models, especially those using deep learning, are often black boxes. When a model predicts that a patient will readmit, a clinician needs to understand why. Which events drove that prediction? Did the model pick up on clinically meaningful patterns, or is it spurious correlation? Methods for explaining event stream models remain underdeveloped. Creating interpretable models that doctors can trust and understand is essential before these systems can be widely deployed in real clinical settings.

Handling irregular, missing, and noisy data at scale remains challenging. Real hospital data is messier than the sanitized datasets used in research. Records have data entry errors, missing values, and artifacts from how the EHR system works rather than how medicine works. Current methods make assumptions (like data missing completely at random) that often don't hold in practice.

Temporal generalization asks whether models trained on historical data work when clinical practice changes. Hospitals update protocols, new medications enter use, populations shift. A model trained on 2020 data might fail on 2024 patients because clinical workflows evolved. Understanding how event stream models generalize across time is largely unexplored.

Multimodal integration is becoming increasingly important as hospitals collect more diverse data—not just structured EHR events but also clinical notes, imaging studies, and physiological waveforms. Work on platform-agnostic multimodal digital human modelling points toward integrating these modalities, but methods for combining event streams with unstructured or high-dimensional data remain immature.

Handling distribution shift matters when models face patient populations or clinical conditions different from training data. Research on learning clinical representations under systematic distribution shift highlights how models degrade when data distributions change, but event stream approaches to robustness are underdeveloped.

Causality and treatment effect heterogeneity require moving beyond prediction to understanding. Event streams encode what happened, but healthcare ultimately cares about what would happen under different interventions. Learning not just to predict outcomes but to estimate heterogeneous treatment effects for individual patients remains largely open.

Privacy and federation become essential at scale. Patient data is sensitive, and hospitals can't share raw records across institutions. Federated learning approaches for event streams exist but are early. Methods for privacy-preserving event stream modeling while maintaining predictive power would unlock collaboration across hospital systems.

The research landscape has shifted from "how do we fit healthcare data into standard time series methods" to "what are the right methods for inherently temporal, sparse, irregular healthcare data?" Event stream modeling provides the conceptual framework for that shift. But the frontier remains: turning this framework into systems that are interpretable, robust, private, and actually deployable in real hospitals. The foundation is in place. The hard work of building on it has just begun.


This is a Plain English Papers summary of a research paper called The Taxonomies, Training, and Applications of Event Stream Modelling for Electronic Health Records. If you like these kinds of analyses, join AIModels.fyi or follow us on Twitter.

