Optimization of Medical Units for Mobile Shelter Hospitals via Graphical Games and Reinforcement Learning Approach

Ertai E; Jinyuan Liu; Haoran Xu; Xiaoshuai Hao; Jinglin He; Daomeng Cai; Jinhui Pang


31 Jan 2024 (modified: 15 Feb 2024) | ICML 2024 Conference Submission | Conference, Senior Area Chairs, Area Chairs, Reviewers, Authors | CC BY 4.0

Verify Author List: I have double-checked the author list and understand that additions and removals will not be allowed after the submission deadline.

Keywords: Mobile Shelter Hospitals, Multi-armed Bandit Algorithm, Simulated Virtual Environment, Graph Models, Nash Equilibrium, Hospital Management

TL;DR: This paper presents an innovative approach to optimizing resource allocation in mobile shelter hospitals, employing graph models, game theory, and reinforcement learning, significantly reducing treatment times and improving hospital efficiency.

Abstract:

This paper models and solves the medical unit allocation problem for mobile shelter hospitals, which involves randomness, network equilibrium, and layout planning. To simulate this medical activity, we propose a simulation environment with a directed graph and random patient arrival times and routes. The hospital's total service time is to be minimized while patients follow the user-optimizing principle to obtain earlier service. We show that a Nash equilibrium exists in patient strategies under the overall goal of minimizing hospital service time. The multi-armed bandit (MAB) algorithm from reinforcement learning is used to identify optimal medical unit allocation strategies for both adaptive and non-adaptive patient scenarios. The study presents an allocation strategy for scenarios with 3000 patients, 9 resources, and 7 hospital areas, achieving a total treatment time of 54,190, an average room utilization rate of approximately 0.8, and an average number of lingering patients below 1% of the total number of patients.

Primary Area: Applications (computational biology, crowdsourcing, healthcare, neuroscience, social good, climate science, etc.)

Position Paper Track: No

Paper Checklist Guidelines: I certify that all co-authors of this work have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.

Submission Number: 3989

Official Review of Submission 3989 by Reviewer r7yk

Official Review | Reviewer r7yk | 20 Mar 2024, 13:19 (modified: 21 Mar 2024, 20:20) | Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors, Reviewer r7yk

Summary:

This paper aims to optimize the placement/allocation of mobile shelter hospitals in anticipation of demand, for example during mass emergencies. It introduces a network model and graphical games, studies the Nash equilibrium of the game, and uses multi-armed bandit learning algorithms to find it.

Strengths And Weaknesses:

(+) Optimal placement/allocation of mobile hospitals is an important problem. It is not too different from the ambulance location problem, which is very well studied in the OR literature.

(-) There are a number of problems with this paper. The first is the modeling: it is unrealistic and doesn't really make much sense. A lot of concepts are thrown around (queues, graphs, games, Nash equilibria, multi-armed bandits, etc.), and it is hard to understand their relevance in the mish-mash.

(-) There are some experimental results. One would expect at least some aspect of them to come from a real-world setting or real data, but they are all synthetic.

Overall, a poor paper that should have received a desk reject instead of wasting reviewer time.

Questions:

Please clarify why the strategic aspect is being considered, why Nash equilibrium is a suitable notion for predicting the outcome, and how Algorithm 1 is relevant.

Limitations:

N/A

Ethics Flag: No

Details Of Ethics Concerns: N/A

Soundness: 1: poor

Presentation: 1: poor

Contribution: 1: poor

Rating: 2: Strong Reject: For instance, a paper with major technical flaws, and/or poor evaluation, limited impact, poor reproducibility and mostly unaddressed ethical considerations.

Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Code Of Conduct: Yes

Official Review of Submission 3989 by Reviewer EZTk

Official Review | Reviewer EZTk | 19 Mar 2024, 11:40 (modified: 21 Mar 2024, 20:20) | Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors, Reviewer EZTk

Summary:

This paper presents an interesting approach to optimizing the layout and allocation of medical resources in mobile shelter hospitals during emergency situations. The authors model the hospital network as a directed hypergraph and formulate the optimization problem of minimizing total service time while accounting for patient flows, queues, priorities, and randomness in arrivals and routes.

Strengths And Weaknesses:

First of all, this is a pure application paper with no new methodology or theory. The problem is well motivated and the introduction of the real-world application is clear.

All the experiments are synthetic. Medical unit allocation is an important problem, but without any real data it is very hard to know whether the simulator is realistic. I do not think this simple simulator can model real-world patient-hospital interaction. The claims about utilization rate, treatment time, or x% improvement are therefore meaningless, since the simulator parameters can easily be tuned. As this is a purely applied paper, I think working with real data is required.

I do not see the necessity of a multi-armed bandit approach. Is the goal to solve optimization problem (1)? Is that a pure optimization problem? Bandits address the exploration problem. Why does this application need exploration, and exploration of what? Bandit problems have two objectives: cumulative regret minimization and best-arm identification. Please clarify which formulation you are using, since different formulations call for different metrics.

I do not think this paper has anything to do with reinforcement learning. There is no notion of state at all. Bandits do not belong to RL.
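To make the distinction between the two bandit formulations concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a generic epsilon-greedy learner on hypothetical Bernoulli arms (in the paper's setting an arm would presumably be a candidate allocation and the reward a function of treatment time, but that mapping is an assumption). The function name `run_bandit` and all parameters are illustrative. The same run yields two different quantities: the cumulative pseudo-regret accumulated while learning, and the single arm recommended at the end.

```python
import random


def run_bandit(means, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms with the given (hypothetical) means.

    Returns (cumulative pseudo-regret, index of the recommended arm).
    """
    k = len(means)
    if horizon < k:
        raise ValueError("horizon must be at least the number of arms")
    rng = random.Random(seed)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best_mean = max(means)
    regret = 0.0

    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so each empirical mean is defined
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore uniformly at random
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a])  # exploit
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Regret minimization metric: expected loss of every pull counts.
        regret += best_mean - means[arm]

    # Best-arm identification metric: only the final recommendation counts.
    recommended = max(range(k), key=lambda a: sums[a] / counts[a])
    return regret, recommended
```

Under regret minimization the learner is judged on the first return value, so exploration is costly; under best-arm identification it is judged only on whether the second return value is the true best arm, so exploration during the run is free. Which of the two applies determines the metric that should be reported.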

Questions:

See above.

Limitations:

See above.

Ethics Flag: No

Soundness: 3: good

Presentation: 3: good

Contribution: 3: good

Rating: 5: Borderline accept: Technically solid paper where reasons to accept outweigh reasons to reject, e.g., limited evaluation. Please use sparingly.

Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Code Of Conduct: Yes

Official Review of Submission 3989 by Reviewer rAVT

Official Review | Reviewer rAVT | 16 Mar 2024, 15:53 (modified: 21 Mar 2024, 20:20) | Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors, Reviewer rAVT

Summary:

This paper proposes an approach to optimizing medical unit allocation in mobile shelter hospitals by integrating graphical game theory and reinforcement learning, specifically through the use of Nash equilibrium and the Multi-Armed Bandit (MAB) algorithm. It models the patient flow within a hospital as a directed graph, considering randomness in patient arrivals and movements, to improve operational metrics such as total treatment time, room utilization rate, and the reduction of lingering patient numbers. The approach is validated via a simulation involving scenarios with 3,000 patients, demonstrating its potential to enhance the efficiency and responsiveness of mobile shelter hospitals in complex medical scenarios.

Strengths And Weaknesses:

Strengths: The study addresses a critical and timely issue, offering the potential for a significant impact on emergency medical response. The findings could be beneficial for the planning and operation of mobile shelter hospitals, especially in disaster response scenarios. The use of Nash equilibrium to model patient strategies in seeking medical services is novel in the context of mobile shelter hospitals.

Weaknesses: The approach builds upon existing theories and algorithms. The uniqueness lies more in the application than in the development of new methodologies.

Some parts of the optimization objective and the implementation details could be elaborated further for clarity.

The connection between the proposed method and the resolution of the defined optimization problem and game structure is not convincingly established. It remains unclear how the method directly addresses the optimization of medical unit allocation in the context of the game theoretical model presented and how it contributes to finding an equilibrium in patient strategies.

Details of the problem space, including the constraints and assumptions made in the model, are not stated explicitly.

Questions:

What is t_v in the second row of Eq.(1)? Is it the servicing time t_v^s? And what is this objective maximizing over? Is it the servicing time of a particular clinic room v, the minimum across all clinic rooms, or the sum of all?
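For reference, the three candidate readings named in this question can be written out as follows. This is only a restatement of the alternatives, not the paper's Eq. (1); the symbol V for the set of clinic rooms is an assumption introduced here.

```latex
% V: set of clinic rooms (assumed symbol); t_v^s: servicing time of room v
\begin{align*}
\text{(a) a particular room } v:\quad & t_v^{s} \\
\text{(b) minimum across all rooms}:\quad & \min_{v \in V} t_v^{s} \\
\text{(c) sum over all rooms}:\quad & \sum_{v \in V} t_v^{s}
\end{align*}
```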

In lines 256 and 257, why is [O, X_6, X_1, X_3, D] not an option? Should it be X_1 instead of X_2 in line 253?

How is the ascending order of all patients defined? Does it depend on both arrival time and priority? Do these need to be known in advance? Would a patient who arrives later but has higher priority break the equilibrium? And how can the order of all patients be obtained if their arrival times follow a random distribution?
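The ordering question can be made concrete with a small sketch. Everything in it is an assumption for illustration rather than the paper's definition: the `Patient` record, its field names, and the convention that a smaller priority number means more urgent. It compares ordering by arrival time alone with ordering by priority first and arrival time second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    name: str
    arrival: float   # arrival time (hypothetical units)
    priority: int    # assumed convention: smaller value = more urgent


def order_by_arrival(patients):
    """Ascending order by arrival time only."""
    return sorted(patients, key=lambda p: p.arrival)


def order_by_priority_then_arrival(patients):
    """Ascending order by priority, ties broken by arrival time."""
    return sorted(patients, key=lambda p: (p.priority, p.arrival))


patients = [
    Patient("a", arrival=0.0, priority=2),
    Patient("b", arrival=1.0, priority=2),
    Patient("c", arrival=2.0, priority=1),  # arrives last, most urgent
]

fifo = [p.name for p in order_by_arrival(patients)]
prio = [p.name for p in order_by_priority_then_arrival(patients)]
print(fifo)  # ['a', 'b', 'c']
print(prio)  # ['c', 'a', 'b']
```

In this toy case the late, high-priority patient moves ahead of both earlier arrivals under the second rule, so the two definitions give different orders. Note also that either sort requires all arrival times to be known when the order is computed, which is the crux of the question about randomized arrivals.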

What are the potential challenges in implementing this model in a live hospital setting, and how might they be addressed?

Limitations:

While the paper acknowledges the simulation-based nature of its findings, a discussion on the limitations regarding real-world applicability and potential negative societal impacts is somewhat lacking. The paper might benefit from additional validation, such as real-world testing or comparison with other optimization techniques.

Ethics Flag: No

Soundness: 2: fair

Presentation: 2: fair

Contribution: 2: fair

Rating: 3: Reject: For instance, a paper with technical flaws, weak evaluation, inadequate reproducibility and incompletely addressed ethical considerations.

Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Code Of Conduct: Yes