Reinforcement Learning for Personalised Critical Care Treatment using Scalable Parallel Computing

Authors

  • Chandra Prasetyo Utomo, Universitas YARSI, Indonesia
  • Kohei Ichikawa, Nara Institute of Science and Technology, Japan
  • Nashuha Insani, Universitas YARSI, Indonesia
  • Kundjanasith Thonglek, Kasetsart University, Thailand
  • Kang Xingyuan, Nara Institute of Science and Technology, Japan
  • Chaerita Maulani, Universitas YARSI, Indonesia
  • Ummi Azizah Rachmawati, Universitas YARSI, Indonesia

DOI:

https://doi.org/10.26877/asset.v8i2.2080

Keywords:

reinforcement learning, off-policy evaluation, parallel computing, MIMIC-III dataset, sepsis management, ICU decision support, personalised treatment recommendation

Abstract

Sepsis is one of the leading causes of death in intensive care units. Many patients do not receive timely or effective treatment, which lowers their chances of survival. We developed a reinforcement learning–based framework that provides personalised treatment recommendations for sepsis patients. The model builds simple patient representations from treatment responses, groups patients with similar response patterns, and learns the best treatment policy for each group. To reduce the long training time, we train with parallel and distributed computing. Using the MIMIC-III database and off-policy evaluation with weighted importance sampling, our method achieves a policy value of 79.933, higher than the clinician policy (47.654) and a general AI policy (57.658). A higher policy value indicates a lower mortality risk. These results suggest that our method can support faster, more accurate, and more effective treatment decisions in the ICU.
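For readers unfamiliar with the evaluation metric, the following is a minimal sketch of a trajectory-level weighted importance sampling (WIS) estimator, the general technique named in the abstract. It is not the authors' implementation: the function name `wis_policy_value`, the `Step` data layout, the discount factor, and the toy reward values are all assumptions made for illustration.

```python
from typing import List, Tuple

# One recorded step of a patient trajectory (hypothetical layout):
# (probability of the logged action under the evaluated policy,
#  probability of the same action under the behaviour/clinician policy,
#  reward observed at that step)
Step = Tuple[float, float, float]


def wis_policy_value(trajectories: List[List[Step]], gamma: float = 0.99) -> float:
    """Estimate the value of an evaluated policy from logged trajectories.

    Each trajectory's discounted return is weighted by the product of its
    per-step probability ratios, and the weighted sum is normalised by the
    sum of the weights (this normalisation is what distinguishes WIS from
    ordinary importance sampling).
    """
    weighted_returns = 0.0
    total_weight = 0.0
    for trajectory in trajectories:
        rho = 1.0   # cumulative importance ratio for this trajectory
        ret = 0.0   # discounted return of this trajectory
        for t, (p_eval, p_behaviour, reward) in enumerate(trajectory):
            rho *= p_eval / p_behaviour
            ret += (gamma ** t) * reward
        weighted_returns += rho * ret
        total_weight += rho
    if total_weight == 0.0:
        raise ValueError("evaluated policy has no support on the logged data")
    return weighted_returns / total_weight


# Toy data (assumed): a terminal reward of +100 or -100 per stay.
toy = [
    [(0.5, 0.5, 0.0), (1.0, 0.5, 100.0)],  # ratio 2.0, return +100
    [(0.25, 0.5, -100.0)],                 # ratio 0.5, return -100
]
value = wis_policy_value(toy, gamma=1.0)
```

In this toy case the first trajectory, which the evaluated policy is more likely to produce than the behaviour policy, dominates the estimate. When the two policies assign identical probabilities, every ratio equals 1 and the estimator reduces to the plain mean of the returns.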

Author Biographies

  • Chandra Prasetyo Utomo, Universitas YARSI

    Department of Informatics, Universitas YARSI, Jakarta 10510, Indonesia

  • Kohei Ichikawa, Nara Institute of Science and Technology

    Division of Information Science, Nara Institute of Science and Technology, Ikoma 630-0192, Japan

    Faculty of Business Data Science, Kansai University, Suita 565-8585, Japan

  • Nashuha Insani, Universitas YARSI

    Department of Informatics, Universitas YARSI, Jakarta 10510, Indonesia

  • Kundjanasith Thonglek, Kasetsart University

    Department of Computer Engineering, Kasetsart University, Bangkok 10900, Thailand

  • Kang Xingyuan, Nara Institute of Science and Technology

    Division of Information Science, Nara Institute of Science and Technology, Ikoma 630-0192, Japan

  • Chaerita Maulani, Universitas YARSI

    Department of Periodontics, Universitas YARSI, Jakarta 10510, Indonesia

  • Ummi Azizah Rachmawati, Universitas YARSI

    Department of Informatics, Universitas YARSI, Jakarta 10510, Indonesia

Published

2026-04-30