Offline Reinforcement Learning for Sepsis Management

A Dueling Double Deep Q-Network Approach

Authors

  • Endah Purwanti, Biomedical Engineering, Universitas Airlangga, Surabaya, Indonesia
  • Fatima Hasya Puspa Kasih, Biomedical Engineering, Universitas Airlangga, Surabaya, Indonesia
  • Franky Chandra Satria Arisgraha, Biomedical Engineering, Universitas Airlangga, Surabaya, Indonesia

DOI:

https://doi.org/10.3991/ijoe.v22i05.60003

Keywords:

Sepsis, Intensive care, Offline reinforcement learning, Dueling Double Deep Q-Network, Intravenous fluids, Vasopressors, Clinical decision support

Abstract


Sepsis requires rapid and individualized management of intravenous (IV) fluids and vasopressors, yet treatment strategies vary widely in clinical practice. This study develops an offline reinforcement learning (RL) framework based on a dueling double deep Q-network (DDQN) to model dosing policies for adult sepsis patients in the intensive care unit (ICU). Following preprocessing of a large ICU cohort under Sepsis-3 criteria, patient states were represented using 48 clinical variables, and actions were defined as 25 discrete IV fluid–vasopressor combinations. Reward estimation incorporated changes in SOFA score and lactate, with terminal rewards reflecting survival outcomes. The agent was trained using KL-regularized offline RL and evaluated using weighted importance sampling (WIS), fitted Q-evaluation (FQE), and weighted doubly robust (WDR) estimators. The selected model (β_KL = 0.0005) achieved higher estimated returns than the historical clinician policy across all OPE metrics on the held-out test set (WIS 9.18 vs. 8.60; FQE 18.42 vs. 17.76; WDR 17.80 vs. 17.50). Policy distribution analysis indicated that the learned policy allocated treatments differently from the clinician policy across dosing bins. Permutation feature importance (PFI) identified systolic blood pressure, arterial pH, sodium, and INR among the most influential variables. These findings support the feasibility of offline RL for modeling treatment policies in sepsis management and motivate further validation in prospective or simulation-based settings.
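To illustrate the off-policy evaluation step described in the abstract, the following is a minimal sketch of a weighted importance sampling (WIS) estimator. The trajectory layout and the policy interfaces (callables returning per-action probabilities) are assumptions made for illustration only, not the paper's implementation; the paper additionally uses FQE and WDR estimators, which are not shown here.

```python
import numpy as np

def wis_estimate(trajectories, target_policy, behavior_policy, gamma=0.99):
    """Weighted importance sampling estimate of the target policy's return.

    trajectories: list of trajectories, each a list of (state, action, reward)
        tuples observed under the behavior (clinician) policy.
    target_policy, behavior_policy: callables mapping a state to an array of
        action probabilities (hypothetical interfaces for this sketch).
    gamma: discount factor for the per-trajectory return.
    """
    weights, returns = [], []
    for traj in trajectories:
        w, g, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            # Cumulative importance ratio pi(a|s) / mu(a|s) along the trajectory.
            w *= target_policy(state)[action] / behavior_policy(state)[action]
            g += discount * reward
            discount *= gamma
        weights.append(w)
        returns.append(g)
    weights = np.asarray(weights)
    returns = np.asarray(returns)
    # Self-normalized (weighted) estimator: lower variance than ordinary IS.
    return float(np.sum(weights * returns) / np.sum(weights))
```

The self-normalization (dividing by the summed weights rather than the number of trajectories) is what distinguishes WIS from ordinary importance sampling; it introduces a small bias but typically reduces variance substantially on clinical datasets where importance ratios can be extreme.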

Published

2026-05-11

How to Cite

Purwanti, E., Hasya Puspa Kasih, F., & Chandra Satria Arisgraha, F. (2026). Offline Reinforcement Learning for Sepsis Management: A Dueling Double Deep Q-Network Approach. International Journal of Online and Biomedical Engineering (iJOE), 22(05), pp. 156–170. https://doi.org/10.3991/ijoe.v22i05.60003

Section

Papers