Smarter Sepsis Care: Deep Learning Model Guides Steroid Treatment Decisions

ICU patient monitoring forms the foundation for AI models to predict which sepsis patients will benefit from corticosteroid therapy, enabling personalized treatment decisions. Courtesy of mikemacmarketing from Openverse via Amsterdam UMC

Sepsis remains one of the most challenging conditions in critical care medicine, marked by a dysregulated host response to infection that can rapidly progress to multi-organ failure. Despite advances in intensive care, the 30-day survival rate for patients with septic shock is only 60% to 70% in developed countries. One of the most debated treatment strategies is corticosteroid therapy, whose effect varies considerably from patient to patient. Clinical evidence is inconsistent: some patients benefit, while others may face increased mortality risk due to immunosuppression. Traditional clinical trials, which report average effects, struggle to resolve this heterogeneity in individual treatment effects (ITE).

To address this, a research team from the Department of Intensive Care Medicine, Amsterdam UMC, employed causal deep learning to precisely identify subgroups of intensive care unit (ICU) patients with sepsis who could benefit from corticosteroid treatment. The team developed a predictive model based on the treatment-agnostic representation network (TARNet), focusing on 28-day mortality as the primary outcome. They trained the model using the public and freely available AmsterdamUMCdb database, which included 2,920 patients who met the Sepsis-3 diagnostic criteria (1,378 in the treatment group and 1,542 in the control group). Each patient’s profile included 19 clinical variables collected within 24 hours of admission, such as lactate levels, pH, and the PaO₂/FiO₂ ratio. To ensure the model’s reliability and generalizability, the team conducted external validation using the US MIMIC-IV v2.2 database, comprising a much larger cohort of 30,639 Sepsis-3 patients. The study was published in the Journal of Intensive Medicine on Sept. 23, 2025.

The model showed good discrimination and calibration: internal validation achieved an area under the receiver operating characteristic curve (AUROC) of 0.79 and a Brier score of 0.14, while external validation yielded an AUROC of 0.71 and a Brier score of 0.14. Calibration curves indicated good agreement between predicted and actual outcomes. Furthermore, TARNet achieved near-perfect covariate balance (Wasserstein distance: 3.6 × 10⁻⁷ internal, 4.2 × 10⁻⁷ external), significantly outperforming traditional propensity score matching (PSM).
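For readers unfamiliar with the two metrics, the following is a minimal sketch of how AUROC and the Brier score can be computed from predicted probabilities and observed 28-day outcomes. The function names and the toy data are illustrative assumptions, not the study's code or data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome.
    Lower is better; 0 is a perfect score."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a randomly chosen patient who died is assigned a higher
    predicted risk than a randomly chosen survivor (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = died within 28 days, 0 = survived.
toy_outcomes = [0, 0, 1, 1]
toy_risks = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_outcomes, toy_risks), brier_score(toy_outcomes, toy_risks))
```

AUROC measures how well the model ranks patients by risk, while the Brier score also penalizes probabilities that are poorly calibrated, which is why the two are reported together.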

Patients were then classified into three groups based on a clinically meaningful threshold of a 10% change in predicted 28-day mortality: treatment responders (245 patients, >10% mortality decrease), non-responders (2,098 patients), and harmed individuals (577 patients, >10% mortality increase). The analysis revealed that patients with severe metabolic acidosis (characterized by low pH and low bicarbonate) and circulatory dysfunction (elevated lactate and creatinine levels) benefited the most from corticosteroid therapy. This aligns with the pathophysiological understanding of sepsis, where hemodynamic instability is a major driver of poor outcomes.
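The three-way grouping can be sketched as a simple thresholding rule on the predicted treatment effect. This sketch assumes the 10% threshold is an absolute difference in predicted mortality probability (treated minus untreated); the article does not spell out that detail, and the function name is hypothetical.

```python
def classify_response(ite, threshold=0.10):
    """Label a patient from the predicted effect of treatment on 28-day mortality.

    ite: predicted mortality with corticosteroids minus predicted mortality
         without them (assumed here to be an absolute probability difference).
    """
    if ite < -threshold:
        return "responder"      # predicted mortality falls by more than 10%
    if ite > threshold:
        return "harmed"         # predicted mortality rises by more than 10%
    return "non-responder"      # change within +/- 10%


# Hypothetical predicted effects for four patients.
for effect in (-0.15, 0.02, 0.12, -0.10):
    print(effect, classify_response(effect))
```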

By learning a shared representation before branching into separate potential outcome predictors, TARNet calculates the ITE for each patient, accurately distinguishing between "benefit," "no benefit," and "harm" subgroups. Its superior covariate balancing capability (e.g., TARNet Wasserstein distance 3.6 × 10⁻⁷ vs. PSM 0.28 in the internal database) effectively mitigates confounding bias, providing a robust foundation for causal inference.
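The shared-representation-then-branch idea can be illustrated with a deliberately tiny forward pass. This is a sketch under stated assumptions, not the study's model: the class name, layer sizes, and random untrained weights are all hypothetical, and a real TARNet would be trained so that each outcome head is fitted only on patients who actually received that treatment arm.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TinyTARNet:
    """Shared representation followed by two outcome heads (control, treated)."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_features, rng)
        self.head_control = init_layer(1, n_hidden, rng)
        self.head_treated = init_layer(1, n_hidden, rng)

    def predict(self, x):
        # Both heads read the same learned representation of the patient.
        phi = relu(dense(x, *self.shared))
        y0 = sigmoid(dense(phi, *self.head_control)[0])  # mortality risk, untreated
        y1 = sigmoid(dense(phi, *self.head_treated)[0])  # mortality risk, treated
        return y0, y1, y1 - y0  # last value is the estimated ITE


# 19 input variables, as in the study; the values here are placeholders.
model = TinyTARNet(n_features=19)
risk_untreated, risk_treated, ite = model.predict([0.5] * 19)
print(risk_untreated, risk_treated, ite)
```

Because both potential outcomes are predicted for every patient, the difference between the two heads gives a per-patient effect estimate even though each patient was only ever observed under one treatment.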

Another strength of the research is its rigorous validation framework. The model was developed on the European AmsterdamUMCdb (2,920 patients) and externally validated on the American MIMIC-IV (30,639 patients), both large public datasets. Notably, even though patients in the MIMIC-IV database were generally less severely ill and had lower mortality rates, the model’s calibration was unchanged and its discrimination decreased only moderately (AUROC: 0.71 and Brier score: 0.14), supporting the generalizability of the findings across diverse populations and clinical settings.

The study also prioritized clinical relevance in its design. The 19 variables selected are routinely available in ICUs worldwide and overlap with those used in widely accepted severity scoring systems such as APACHE IV and SOFA. The 10% mortality change threshold used to classify treatment response reflects real-world clinical decision-making, ensuring that the findings are directly translatable to practice.

“Our study addresses a critical gap in current practice. Causal deep learning enables the estimation of individualized treatment effects, overcoming the limitations of traditional population level studies. This is the first application of the TARNet model in the context of corticosteroid therapy for sepsis,” says Ameet Jagesar.

Overall, the findings suggest the need to refine patient assessment in sepsis by integrating multidimensional, objective indicators into decision-making, thereby reducing reliance on empirical judgment and strengthening the scientific basis of therapeutic strategies.

Source: Amsterdam UMC