Offline Reinforcement Learning for Sepsis Management
A Dueling Double Deep Q-Network Approach
DOI:
https://doi.org/10.3991/ijoe.v22i05.60003
Keywords:
Sepsis, Intensive care, Offline reinforcement learning, Dueling Double Deep Q-Network, Intravenous fluids, Vasopressors, Clinical decision support
Abstract
Sepsis requires rapid and individualized management of intravenous (IV) fluids and vasopressors, yet treatment strategies vary widely in clinical practice. This study develops an offline reinforcement learning (RL) framework based on a dueling double deep Q-network (DDQN) to model dosing policies for adult sepsis patients in the intensive care unit (ICU). Following preprocessing of a large ICU cohort under Sepsis-3 criteria, patient states were represented using 48 clinical variables, and actions were defined as 25 discrete IV fluid–vasopressor combinations. Reward estimation incorporated changes in SOFA score and lactate, with terminal rewards reflecting survival outcomes. The agent was trained using KL-regularized offline RL and evaluated using weighted importance sampling (WIS), fitted Q-evaluation (FQE), and weighted doubly robust (WDR) estimators. The selected model (β_KL = 0.0005) achieved higher estimated returns than the historical clinician policy across all OPE metrics on the held-out test set (WIS 9.18 vs. 8.60; FQE 18.42 vs. 17.76; WDR 17.80 vs. 17.50). Policy distribution analysis indicated differences in treatment allocation patterns across dosing bins. Permutation feature importance (PFI) identified systolic blood pressure, arterial pH, sodium, and INR among the most influential variables. These findings support the feasibility of offline RL for modeling treatment policies in sepsis management and motivate further validation in prospective or simulation-based settings.
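As an illustration of one of the off-policy evaluation estimators named in the abstract, the following is a minimal sketch of trajectory-level weighted importance sampling (WIS). It is not the authors' implementation: the function names, the input layout (one list of `(pi_e_prob, pi_b_prob, reward)` tuples per ICU stay), and the discount factor are assumptions made here for illustration, and the study may have used a different WIS variant (for example, per-decision weighting or weight clipping).

```python
from math import prod


def discounted_return(rewards, gamma=0.99):
    """Sum of rewards discounted by gamma**t over one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def wis_estimate(trajectories, gamma=0.99):
    """Trajectory-level weighted importance sampling estimate.

    trajectories: list of trajectories; each trajectory is a list of
        (pi_e_prob, pi_b_prob, reward) tuples (hypothetical layout), where
        pi_e_prob is the evaluated policy's probability of the logged action
        and pi_b_prob is the behaviour (clinician) policy's probability.
        Assumes pi_b_prob > 0 for every logged action.
    """
    weights = []
    returns = []
    for traj in trajectories:
        # Cumulative importance ratio over the whole trajectory.
        weights.append(prod(pe / pb for pe, pb, _ in traj))
        returns.append(discounted_return([r for _, _, r in traj], gamma))

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("All importance weights are zero; WIS is undefined.")

    # Self-normalised average: weights are divided by their sum, not by N.
    return sum(w * g for w, g in zip(weights, returns)) / total_weight
```

For example, calling `wis_estimate(trajs, gamma=1.0)` on a list in which every step has `pi_e_prob == pi_b_prob` gives every trajectory a weight of 1, so the estimate reduces to the plain average of the observed returns. Trajectories whose logged actions the evaluated policy favours more than the clinicians did contribute more to the estimate.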
License
Copyright (c) 2026 Endah Purwanti, Fatima Hasya Puspa Kasih, Franky Chandra Satria Arisgraha

This work is licensed under a Creative Commons Attribution 4.0 International License.

