Offline Reinforcement Learning for Sepsis Management: A Dueling Double Deep Q-Network Approach

Endah Purwanti; Fatima Hasya Puspa Kasih; Franky Chandra Satria Arisgraha

doi:10.3991/ijoe.v22i05.60003

Authors

Endah Purwanti, Biomedical Engineering, Universitas Airlangga, Surabaya, Indonesia
Fatima Hasya Puspa Kasih, Biomedical Engineering, Universitas Airlangga, Surabaya, Indonesia
Franky Chandra Satria Arisgraha, Biomedical Engineering, Universitas Airlangga, Surabaya, Indonesia

DOI:

https://doi.org/10.3991/ijoe.v22i05.60003

Keywords:

Sepsis, Intensive care, Offline reinforcement learning, Dueling Double Deep Q-Network, Intravenous fluids, Vasopressors, Clinical decision support

Abstract

Sepsis requires rapid and individualized management of intravenous (IV) fluids and vasopressors, yet treatment strategies vary widely in clinical practice. This study develops an offline reinforcement learning (RL) framework based on a dueling double deep Q-network (DDQN) to model dosing policies for adult sepsis patients in the intensive care unit (ICU). Following preprocessing of a large ICU cohort under Sepsis-3 criteria, patient states were represented using 48 clinical variables, and actions were defined as 25 discrete IV fluid–vasopressor combinations. The reward signal incorporated changes in Sequential Organ Failure Assessment (SOFA) score and lactate, with terminal rewards reflecting survival outcomes. The agent was trained using KL-regularized offline RL and evaluated with three off-policy evaluation (OPE) estimators: weighted importance sampling (WIS), fitted Q-evaluation (FQE), and weighted doubly robust (WDR). The selected model (β_KL = 0.0005) achieved higher estimated returns than the historical clinician policy across all three OPE metrics on the held-out test set (WIS 9.18 vs. 8.60; FQE 18.42 vs. 17.76; WDR 17.80 vs. 17.50). Policy distribution analysis indicated differences in treatment allocation patterns across dosing bins. Permutation feature importance (PFI) identified systolic blood pressure, arterial pH, sodium, and international normalized ratio (INR) among the most influential variables. These findings support the feasibility of offline RL for modeling treatment policies in sepsis management and motivate further validation in prospective or simulation-based settings.
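For readers unfamiliar with the WIS estimator named in the abstract, the following is a minimal sketch of trajectory-wise weighted importance sampling in its standard textbook form. It is not the authors' implementation. The function names, the trajectory layout (a list of `(pi_e_prob, pi_b_prob, reward)` tuples), and the discount factor are assumptions made for illustration, and the toy numbers in the usage note are unrelated to the study's data.

```python
from math import prod


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def wis_estimate(trajectories, gamma=0.99):
    """Trajectory-wise weighted importance sampling (WIS).

    Each trajectory is a list of (pi_e_prob, pi_b_prob, reward) tuples, where
    pi_e_prob is the evaluation policy's probability of the logged action and
    pi_b_prob is the behavior (clinician) policy's probability of that action.
    This layout is a hypothetical convention chosen for the sketch.
    """
    weights = []
    returns = []
    for traj in trajectories:
        # Cumulative importance ratio over the whole trajectory.
        weights.append(prod(pe / pb for pe, pb, _ in traj))
        returns.append(discounted_return([r for _, _, r in traj], gamma))

    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all importance weights are zero; WIS is undefined")

    # Normalizing by the sum of weights (rather than the trajectory count)
    # is what distinguishes WIS from ordinary importance sampling.
    return sum(w * g for w, g in zip(weights, returns)) / total_weight
```

As a sanity check, when the evaluation policy matches the behavior policy on every logged action, all weights equal 1 and the estimate reduces to the plain average return, for example `wis_estimate([[(0.5, 0.5, 1.0), (0.5, 0.5, 2.0)], [(0.2, 0.2, 0.0), (0.2, 0.2, 5.0)]], gamma=1.0)`.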
