~/arun-k
Completed · Built 2025-04

ICU Mortality Prediction

A machine learning approach to catching ICU deterioration early, using what the body signals before clinicians intervene.

R · XGBoost · NLP · SHAP · ML

The Brief

ICU patients don't deteriorate all at once. Risk accumulates quietly, in a creatinine reading that keeps climbing, in a blood pressure that refuses to stabilise. The question this project tried to answer: can a model surface that signal early enough to be useful, using only data that already exists in the patient record?

The Data

The MIMIC-III clinical dataset. Rich, messy, and built for exactly this kind of problem. Patient timelines were reconstructed from shifted dates to recover true ages, a non-trivial step, since age is one of the strongest mortality predictors in the ICU.
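That reconstruction can be sketched as follows. This is an illustrative Python helper, not the project's actual code (the project tags mention R); it relies on two documented properties of MIMIC-III: dates are shifted consistently within each patient, so admission minus date of birth still yields true age, and patients over 89 have their DOB shifted roughly 300 years back for de-identification, conventionally imputed with the cohort's median true age of about 91.4.

```python
from datetime import datetime

def true_age(dob, admit_time, masked_age=91.4):
    """Recover age from per-patient shifted dates. Ages that come out
    absurdly large (~300 years) mark the de-identified over-89 group,
    which gets the conventional median imputation."""
    age = (admit_time - dob).days / 365.25
    return masked_age if age > 150 else age

print(round(true_age(datetime(2100, 3, 1), datetime(2165, 6, 1)), 1))  # 65.3
```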

The more interesting decision was how to treat physiological vitals. Averages are deceptive; a patient whose heart rate swings between 48 and 140 reads as normal at 94 bpm. I used minimum and maximum values per observation window instead, capturing the instability that means something clinically, not the smooth summary that hides it.
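The min/max aggregation amounts to something like the sketch below (a pure-Python illustration with invented readings; the project's actual pipeline and column names may differ):

```python
from collections import defaultdict

# Hypothetical heart-rate readings: (patient_id, window_index, bpm).
readings = [
    ("p1", 0, 48), ("p1", 0, 140), ("p1", 0, 94),
    ("p2", 0, 88), ("p2", 0, 96),
]

def window_min_max(rows):
    """Aggregate each (patient, window) to (min, max) rather than a mean,
    so a 48-140 bpm swing is not hidden behind an 'average' 94."""
    buckets = defaultdict(list)
    for pid, window, value in rows:
        buckets[(pid, window)].append(value)
    return {key: (min(vals), max(vals)) for key, vals in buckets.items()}

features = window_min_max(readings)
print(features[("p1", 0)])  # (48, 140) — the instability a mean would mask
```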

The Approach

ICD-9 diagnosis codes are typically one-hot encoded and treated as independent flags. But diagnoses aren't independent; comorbidities follow patterns, and conditions cascade. I applied GloVe embeddings to represent each code as a vector, placing co-occurring conditions closer together in space. The model could then read a patient's diagnostic history as a connected narrative rather than a flat checklist.
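One minimal way to turn trained code embeddings into a patient-level feature is to average the vectors of a patient's diagnosis list. The sketch below uses toy 2-dimensional vectors in place of the trained 50-dimensional GloVe output; the codes, values, and helper name are illustrative, not taken from the project:

```python
# Toy lookup standing in for trained 50-d GloVe vectors over ICD-9 codes.
EMBED = {
    "428.0": [0.9, 0.1],    # congestive heart failure
    "584.9": [0.7, 0.3],    # acute kidney failure — frequently co-occurs
    "V58.61": [-0.5, 0.8],  # long-term anticoagulant use
}

def patient_vector(codes, dim=2):
    """Mean of the embeddings of a patient's diagnosis codes.
    Co-occurring conditions sit close in the space, so related
    comorbidity profiles yield similar patient vectors."""
    vecs = [EMBED[c] for c in codes if c in EMBED]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

print([round(x, 2) for x in patient_vector(["428.0", "584.9"])])  # [0.8, 0.2]
```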

For the classifier, XGBoost with scale_pos_weight adjusted for the class imbalance; ICU mortality sits around 11%, which means a naïve model can hit 89% accuracy by predicting everyone survives. Recall was the metric that mattered: how many high-risk patients did we fail to flag?
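The reweighting itself is a one-line calculation: XGBoost's `scale_pos_weight` is the ratio of negative to positive examples. A small sketch with a synthetic label vector (the ~11.2% mortality rate below is chosen to land near the 7.91 used in this project):

```python
def scale_pos_weight(labels):
    """XGBoost's class-imbalance knob: ratio of negatives to positives,
    so each missed death costs roughly as much as the survivors combined."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos

labels = [1] * 112 + [0] * 888  # ~11.2% positive class
print(round(scale_pos_weight(labels), 2))  # 7.93
# Passed to the model as: XGBClassifier(scale_pos_weight=..., ...)
```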

The Result

AUC of 0.95. Recall improved from 0.40 to 0.62 after reweighting. The accuracy dropped slightly, which was intentional. In a clinical context, a missed high-risk patient is a categorically worse outcome than a false alarm.

SHAP values were added to explain individual predictions: which vitals were pushing risk up, which diagnoses were contributing, and by how much. A model that can't answer "why" is a model a clinician won't use.
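What makes SHAP usable at the bedside is its additivity: a baseline risk plus per-feature contributions reconstructs the prediction exactly, so each feature's push can be reported in risk units. The values below are invented for illustration, not real model output:

```python
# SHAP's core contract for one patient: base value + contributions = output.
base_risk = 0.11  # population mortality rate as the baseline
contributions = {
    "creatinine_max": +0.21,  # climbing creatinine pushes risk up most
    "sbp_min": +0.09,         # low blood-pressure floor adds risk
    "age": +0.05,
    "dx_embedding_3": -0.02,  # one diagnostic dimension pulls risk down
}

predicted_risk = base_risk + sum(contributions.values())
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
print(f"risk={predicted_risk:.2f}, top driver={ranked[0][0]}")
# risk=0.44, top driver=creatinine_max
```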

What I Learned

Optimising for the right metric matters more than optimising well. Most of the meaningful decisions in this project weren't about model architecture; they were about feature engineering, class weighting, and whether the output could be trusted by someone who wasn't a data scientist.

Clinical AI lives or dies on explainability. That was the real lesson.

Honest Limitations

This was built on a public dataset and hasn't been validated on a local patient population or near a real clinical workflow. Real deployment would require prospective validation, calibration, and regulatory considerations well beyond the scope of this project. What this demonstrates is the thinking: the feature choices, the metric priorities, and the interpretability layer. That framing transfers even when the model itself doesn't ship.

WHY IT EXISTS

To move beyond simple classification and provide clinicians with early warning signs, giving them more time to intervene and save lives.

WHAT WAS HARD

Clinical data reality: reconstructing patient timelines from shifted dates and capturing physiological instability through min/max vitals instead of misleading averages.

WHAT I'D DO DIFFERENTLY

I would use Recurrent Neural Networks (RNNs) or Transformers to capture the temporal sequence of diagnoses, rather than just static co-occurrence patterns.

Technical Notes
01 · Engineered 50-dimensional GloVe embeddings for ICD-9 codes to capture medical context.
02 · Balanced class weights (scale_pos_weight = 7.91) to improve recall from 0.40 to 0.62 for high-risk patients.
03 · Integrated SHAP values to provide transparent, patient-level explanations for clinical trust.
04 · Achieved a ROC-AUC of 0.95 under 5-fold cross-validation.