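SafeRPlan: Safe Deep Reinforcement Learning for Intraoperative Planning of Pedicle Screw Placement | Literature Express: Generative Models and Transformers in Medical Imaging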

Title

SafeRPlan: Safe deep reinforcement learning for intraoperative planning of pedicle screw placement

Introduction

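Pedicle screws are indispensable instrumentation in spine surgery, serving as mechanical anchors that provide primary stability in fusion procedures. Because the surgical field of view is limited, safe pedicle screw placement (PSP) remains challenging, especially near vital structures such as the spinal cord and the aorta, where conventional techniques are particularly difficult. Various computer-assisted and robot-assisted techniques have therefore been developed to improve PSP accuracy (Lieberman et al., 2006, 2020; Farber et al., 2021; D'Souza et al., 2019b). These systems also facilitate minimally invasive surgery (MIS), in which pedicle screws are placed without opening up the spine. The general workflow consists of preoperative planning of screw trajectories on preoperatively acquired volumetric images (e.g., CT scans), intraoperative registration of the preoperative plan to the intraoperative anatomy, and real-time guidance through positioning of the robot end effector.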

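Intraoperative CT has been widely used and thoroughly studied as an intraoperative means of patient registration (Kim et al., 2014; Du et al., 2018; Yanni et al., 2021; Baldwin et al., 2022; Conrads et al., 2023). However, apart from logistical and economic concerns (D'Souza et al., 2019a), a major drawback of intraoperative CT is the additional radiation exposure for both the patient and the surgical team (Costa et al., 2011; Mendelsohn et al., 2016). Moreover, most of the planning methods mentioned above are registration-based "one-shot planning" methods, because their output contains only the final pose of the implanted screw. The registration process itself remains error-prone and can lead to unsafe plans (Zhang et al., 2019; Scherer et al., 2023). In addition, complex minimally invasive scenarios require precise guidance not only for the optimal screw position but also for the entire trajectory of the surgical tool, from its initial position to the entry point on the target anatomy.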

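Minimally invasive path planning has been studied extensively for cranial surgery and flexible needle insertion (Hamzé et al., 2016a,b; Li et al., 2017; Tan et al., 2020). For spine surgery, however, only limited work exists (Zhang et al., 2018), and in particular no literature discusses path planning methods for pedicle screw placement. Our study therefore investigates safe, continuous path planning for pedicle screw placement. Instead of using CT as the intraoperative imaging modality, we focus on planning based on ultrasound (US) imaging, which is receiving growing attention in robotic spine surgery (Li et al., 2023a; Barratt et al., 2008; Nagpal et al., 2014; Ma et al., 2017; Liu et al., 2022; Tu et al., 2022). The motivation lies in the advantages of ultrasound: it is radiation-free, provides real-time feedback, and supports planning based on image feedback.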

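Nevertheless, sequential decision-making with ultrasound images remains difficult because of inherent challenges such as a small field of view, shallow penetration depth, high dimensionality and sensor noise. Recent robotics research has shown that deep reinforcement learning (DRL) is effective at addressing some of these challenges (Lee et al., 2020; Miki et al., 2022; Kalashnikov et al., 2018; Scheikl et al., 2022). So far, however, the application of DRL in intraoperative clinical settings remains limited, mainly because of safety concerns: a learned policy may still take high-risk actions (Pore et al., 2021; Lee et al., 2023). Existing DRL methods address safety through reward shaping (Pore et al., 2021) or through offline demonstrations from human experts (Thananjeyan et al., 2021, 2020). Reward shaping, however, still lacks safety guarantees, and expert demonstrations are not always available in our setting.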

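In this work, we propose SafeRPlan, a safe DRL pipeline that uses the spine surface partially reconstructed from ultrasound images for continuous intraoperative planning of pedicle screw placement (Fig. 1). We propose a novel way of using preoperative data to pre-train the DRL agent. This differs from classical approaches (Nagpal et al., 2014; Winter et al., 2008; Barratt et al., 2008), which use preoperative data only for rigid registration (Fig. 2). To achieve automated, continuous minimally invasive path planning, we propose a safe DRL agent that incorporates a distance-based safety filter (DSF) so that it acts safely, inspired by Selim et al. (2022), Shao et al. (2021), Alshiekh et al. (2018) and As et al. (2022). Whereas conventional systems plan surgery based only on the positions of the tool and the target, our method can update the plan from partial and noisy observations (e.g., reconstructed point clouds), which makes it suitable for emerging intraoperative imaging modalities such as ultrasound. We further use domain randomization and teacher–student learning to improve generalization across the domain gap between features of preoperative images and intraoperative modalities. We validated our method on high-fidelity human model datasets as well as synthetic and real ultrasound reconstruction datasets; the results indicate its potential for real robotic surgery systems.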

Abstract

Spinal fusion surgery requires highly accurate implantation of pedicle screw implants, which must be conducted in critical proximity to vital structures with a limited view of the anatomy. Robotic surgery systems have been proposed to improve placement accuracy. Despite remarkable advances, current robotic systems still lack advanced mechanisms for continuous updating of surgical plans during procedures, which hinders attaining higher levels of robotic autonomy. These systems adhere to conventional rigid registration concepts, relying on the alignment of preoperative planning to the intraoperative anatomy. In this paper, we propose a safe deep reinforcement learning (DRL) planning approach (SafeRPlan) for robotic spine surgery that leverages intraoperative observation for continuous path planning of pedicle screw placement. The main contributions of our method are (1) the capability to ensure safe actions by introducing an uncertainty-aware distance-based safety filter; (2) the ability to compensate for incomplete intraoperative anatomical information, by encoding a-priori knowledge of anatomical structures with neural networks pre-trained on pre-operative images; and (3) the capability to generalize over unseen observation noise thanks to the novel domain randomization techniques. Planning quality was assessed by quantitative comparison with the baseline approaches, gold standard (GS) and qualitative evaluation by expert surgeons. In experiments with human model datasets, our approach was capable of achieving over 5% higher safety rates compared to baseline approaches, even under realistic observation noise. To the best of our knowledge, SafeRPlan is the first safety-aware DRL planning approach specifically designed for robotic spine surgery.

Method

In this work, we present SafeRPlan, a framework for automatic, continuous and safe intraoperative planning of PSP procedures based on intraoperative surface reconstruction from US imaging, which is to be integrated into a robotic system for ultrasonic-based PSP drilling. Instead of performing rigid registration between preoperative and intraoperative data, we use preoperative data to pre-train a DRL agent for safe intraoperative planning. Section 3.2 first describes the clinical application of the envisioned robotic system and defines the task to be solved by SafeRPlan. The training simulation is described in Section 3.3, in which we generate randomized synthetic US reconstruction (RsUS) data to condition the agent for the intraoperative application where only the partial reconstruction of the spine surface is available. Then we model the task as a state-wise constrained Markov decision process (SCMDP), which is detailed in Section 3.4. Section 3.5 introduces the safe DRL agent with a distance-based safety filter (DSF), as well as the teacher–student (TS) learning technique leveraged to improve the generalizability of the agent.
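To make the idea of a distance-based safety filter more concrete, the following is a minimal sketch and not the authors' implementation. This digest does not define the filter's decision rule, so the sketch assumes that each candidate action comes with a predicted distance to the nearest unsafe state plus an uncertainty estimate, that `d_min` is a distance threshold, and that `lam` scales a pessimistic uncertainty margin. The function name `filter_action` and all parameters are hypothetical.

```python
import numpy as np

def filter_action(action_logits, predicted_distances, distance_std, d_min=1.0, lam=1.0):
    """Pick the best-scoring action whose pessimistic distance estimate is at least d_min.

    All inputs are per-action arrays. The rule (distance - lam * std >= d_min)
    is an assumption made for illustration, not the rule from the paper.
    Returns (action_index, is_certified_safe).
    """
    logits = np.asarray(action_logits, dtype=float)
    pessimistic = (np.asarray(predicted_distances, dtype=float)
                   - lam * np.asarray(distance_std, dtype=float))
    safe = pessimistic >= d_min
    if not safe.any():
        # No action passes the filter: fall back to the action that stays
        # farthest from the unsafe region and report that it is not certified.
        return int(np.argmax(pessimistic)), False
    # Mask out unsafe actions, then take the policy's preferred safe action.
    masked_logits = np.where(safe, logits, -np.inf)
    return int(np.argmax(masked_logits)), True

# The policy prefers action 0, but its predicted distance is below the threshold.
choice, certified = filter_action([2.0, 1.0, 0.5], [0.5, 3.0, 4.0], [0.1, 0.1, 0.1])
```

In this toy call the filter overrides the policy's first choice and returns the next-best action that keeps the assumed margin. This mirrors the trade-off reported in the results, where filtering raises the safety rate at some cost in insertion ratio.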

Conclusion

In this study, we have introduced SafeRPlan, the first safe DRL agent for intraoperative planning of orthopedic surgery of the spine. A particular distance-based safety filter was designed, and a synthetic surface reconstruction method was developed to train the DRL agent for intraoperative planning. Validation based on high-fidelity human model datasets and corresponding synthetic and real intraoperative US reconstruction datasets demonstrates the potential of our approaches for generating more acceptable intraoperative plans compared with rigid registration techniques. By addressing the challenges of partial observation, domain shift and safety, our method could provide the basis for intraoperative decision-making and robotic surgery with a higher level of automation.

Results

5.1. Validation with synthetic intraoperative US reconstruction

Training performance. We use an RTX 3090 to train each policy for more than 5e6 steps at 100 fps; the total time is around 10-15 h for each training. Our episode lengths are set to 256 steps, although in most cases the insertions are completed within 150 steps. Training curves for each training experiment are shown in Fig. 9(a)–(e), showing the total reward gained by the agents during the training phase. PPO and A2C have similar convergence trends on average; however, PPO was generally more stable than A2C, especially for the Jeduk dataset. TS learning improved both the learning speed and the final training performance of the policies trained with RsUS. For some datasets, student policies achieved performance similar to that of the teacher policies, although they were trained with more variance in the states.

Validation results. The comparison of insertion ratio (IR) and safety rates (SR) between different methods is detailed in Table 3. Our integrated approach achieved 98%–100% safety rates with more than 0.9 insertion ratio. Comparing DRL-based methods, it can be seen that using SF helped in improving the SR at the cost of reducing IR. Besides, combining SF, DR and TS improved the final validation performance by a large margin compared with the original DRL approach. Compared to the registration approaches, our approach achieved higher SR with slightly lower IR. This is because our agent was trained with safety as the primary goal. On the contrary, registration-based approaches aim at reducing registration error, which is not necessarily an indication of safety. The categorized averaged safety performances of the trajectories are detailed in Table 4. Our approach also had the highest safety in particular unsafe scenarios (BR, RR and GR).

As is also shown in Table 4, RL-based approaches have greater deviation (DE and TD) from GT trajectories than registration methods because acceptable placements can be achieved even without alignment with GS trajectories. Besides, for some vertebrae, the pedicle region is narrow in the Y axis but wide in the Z axis of the world frame, which gives higher freedom for acceptable insertion. In summary, our approach enables a more desirable balance between stability of placement and safety compared to other methods, as is depicted in Fig. 11.

Figure

Fig. 1. The overview of our proposed SafeRPlan framework. During preoperative training, we leveraged segmented preoperative MRI data to construct a training simulation environment with randomized synthetic ultrasound reconstruction (RsUS) to train the safe reinforcement learning (RL) agent. During the surgery, we assume a dorsal surface reconstruction from the robotic US (Li et al., 2023b) is first performed before the operation. Then the trained agent directly uses this intraoperative US reconstruction to plan the sequence of motion commands to drill the path for subsequent pedicle screw placement (PSP). The real-time pose of the surgical tool is tracked by an optical tracking system and used to update the state input to the agent.

Fig. 2. A planning framework with registration techniques based on surface reconstruction. (a) Extracted anatomical structures from segmented preoperative image; (b) bone model (green) and preoperative planning (yellow cylinder); (c) extracted surface features and planning; (d) intraoperative robotic US sweep; (e) surface reconstruction (yellow points) from segmented intraoperative data, overlaid on real anatomies of the patient; (f) registration and resulting planned trajectory in the real world (yellow cylinder); (g) planning of robotic drilling or guidance; (h) an illustration of possible unacceptable planning cases with noisy surface reconstruction. Although the registration error can be low, the estimated trajectory (yellow cylinder) overlaps with the spinal cord, which is unsafe.

Fig. 3. Training simulation environment for PSP. (a) Simulation environment and domain randomization; (b) construction of the 3D imaging state. $V_{free}$, $V_{preserve}$, $V_{cort}$, $V_{canc}$ and $V_{no}$ are colored orange, pink, light gray, dark gray and red, respectively. Potential effects (highlighted by orange circles) from real US reconstruction data are used to generate randomly disturbed synthetic US reconstruction (RsUS). 3D image states are created based on RsUS and only contain 3 labels: drill (blue), reconstruction (green) and empty (yellow).

Fig. 4. Random selection of regions on the surface to add observation disturbances. Centers of polygons (red) are first randomly sampled from the GT surface point cloud (green). Then random 2D generation vectors (blue arrows, $\{h_{ij}\}_{j=1}^{J_i}$) are sampled in the Y-Z plane, which serve as the normals of hyperplanes (purple lines, $h_{ij}^T (p - p_i^c) = \|h_{ij}\|^2$) to construct convex sets in the Y-Z plane. Points with Y-Z coordinates within the convex sets (orange, $\{p : H_i (p - p_i^c) \le b_i\}$) are the selected regions to apply disturbances.
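The region selection described in Fig. 4 can be sketched with a few lines of NumPy. This is an illustrative reading of the caption rather than the authors' code: it assumes the right-hand side of each hyperplane equation is the squared norm of the generation vector (so each hyperplane passes through the tip of that vector), and the sampling ranges, the default counts and the function name `select_disturbance_regions` are hypothetical.

```python
import numpy as np

def select_disturbance_regions(points, n_regions=3, n_planes=5, max_radius=5.0, rng=None):
    """Return a boolean mask of surface points inside randomly generated convex regions.

    points: (N, 3) array of surface points in (X, Y, Z) order.
    Each region is an intersection of half-planes in the Y-Z plane built around
    a center sampled from the point cloud itself.
    """
    rng = np.random.default_rng(rng)
    yz = points[:, 1:3]
    mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_regions):
        center = yz[rng.integers(len(yz))]          # polygon center p_i^c
        # Random 2D generation vectors h_ij (direction and length are assumed ranges).
        angles = rng.uniform(0.0, 2.0 * np.pi, n_planes)
        lengths = rng.uniform(0.2 * max_radius, max_radius, n_planes)
        H = lengths[:, None] * np.stack([np.cos(angles), np.sin(angles)], axis=1)
        b = np.sum(H ** 2, axis=1)                  # b_ij = ||h_ij||^2 (assumed)
        # A point is selected when it satisfies every half-plane constraint
        # H_i (p - p_i^c) <= b_i. With few random normals the set may be unbounded.
        inside = np.all((yz - center) @ H.T <= b, axis=1)
        mask |= inside
    return mask

# Toy point cloud standing in for a vertebra surface.
cloud = np.random.default_rng(0).uniform(-20.0, 20.0, size=(1000, 3))
mask = select_disturbance_regions(cloud, rng=1)
```

Because every center is itself a point of the cloud and satisfies all constraints, each region contains at least one point. The resulting mask could then drive disturbances such as the regional reduction, addition or height changes examined in the ablation study.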

Fig. 5. Relevant terms to evaluate the performance of PSP. The insertion depth ($b_t$), gold standard, damage length ($m_t^{dam}$) and safe distance ($l_t^{safe}$) are visualized in blue, green, orange and red, respectively. Here the gold standard is illustrated by a cylinder determined by the trajectory center ($\mathbf{p}_{center}$), direction ($\mathbf{v}_{drct}$), pedicle width (PW) and trajectory length.

Fig. 6. Teacher–student learning for safe RL with randomly disturbed 3D image states. Our safe DRL agent contains an actor-critic network and a learning-based safety filter, both with 3D convolutional neural network (CNN) feature encoders. The safety filter predicts the distance to unsafe states during policy deployment to ensure safe behavior. In the first stage, we train a teacher agent in a simulation with the ground truth (GT) vertebra upper surface to construct states. Then in the second stage, we train a student agent to plan based on randomized synthetic ultrasound reconstruction (RsUS). To improve the training performance, we use the teacher agent's feature encoder to guide the student agent's feature extraction.
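One common way to let a teacher encoder guide a student encoder is to add a feature-matching term to the student's training loss. The sketch below illustrates that general idea only. The exact loss used in SafeRPlan is not given in this digest, so the mean-squared-error form, the function name `student_loss` and the default `feature_weight` are assumptions.

```python
import numpy as np

def student_loss(rl_loss, student_features, teacher_features, feature_weight=0.5):
    """Combine the student's RL loss with a feature-matching penalty.

    student_features: encoder output on the disturbed (RsUS-like) state.
    teacher_features: frozen teacher encoder output on the matching GT state.
    The MSE form and the weight are illustrative assumptions.
    """
    s = np.asarray(student_features, dtype=float)
    t = np.asarray(teacher_features, dtype=float)
    feature_loss = float(np.mean((s - t) ** 2))
    return rl_loss + feature_weight * feature_loss

# Toy numbers: the feature vectors differ in one component only.
total = student_loss(1.0, [1.0, 2.0], [1.0, 4.0], feature_weight=0.5)
```

The weight on the feature term plays the role of the "feature loss weight" whose sensitivity is examined in Fig. 18(f): a weight of zero reduces to training the student alone, while a very large weight makes the student mostly imitate the teacher's features.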

Fig. 7. Real volunteer dataset samples and effectiveness of domain randomization. (a) Examples of the first group of real human datasets used for both training and validation. (b) Examples of the second group of real human datasets used exclusively for validation. Only vertebrae with sufficient surface reconstructions are selected for experimental purposes. The bone surface reconstruction and segmented CT of bones are colored green and gray, respectively. (c) Demonstration of the effectiveness of domain randomization. For each of the two example vertebrae shown, we can generate one synthetic reconstruction (DR) with point distributions similar to the real reconstructions (real) within 100 random trials.

Fig. 8. Training and validation pipelines for our approaches. (a): Training and validation pipelines for generalization over unseen observation noises. Red arrows denote the pipeline for training an RL policy for each human model. Green arrows illustrate the pipeline to construct the corresponding validation environment based on unseen reconstructions. For the validation environment, the reconstructions are aligned with the GT bone model. (b): Training and validation pipeline for generalization over unseen patients. Red arrows denote the pipeline for training an RL policy for all human models. Yellow arrows show the validation pipeline based on the unseen US and CT datasets. In the validation environments, the real US reconstruction is colored green.

Fig. 9. Training curves for the experiments. (a)–(e): Training curves for each individual human model from the ITIS virtual population. (f): Training curve of the policy learning from all 5 human models from the entire ITIS foundation dataset. Solid and dashed curves are for training with and without domain randomization (DR) for the observation, respectively. Therefore, the final value of the dashed yellow curves should be the upper bound for the solid curves. The lines display the exponential moving average of the metrics, and the shadows are the corresponding variances.

Fig. 10. Unacceptable scenarios in our task. (a) Damage to safety-critical anatomies. (b) Breakthrough of the vertebra. (c) Partial breakthrough too far into lateral soft tissue. The widest part outside the pedicle region is marked in yellow, which is used for Gertzbein-Robbins (GR) classification. This metric is only used for validation.

Fig. 11. Scatter plot for IR and SR values of different methods for the 5 human model datasets. Our approach (blue stars) achieves the highest safety rates with sufficient insertion ratios.

Fig. 12. Qualitative evaluation results from an expert spine surgeon. (a): Qualitative evaluation results for planned trajectories with synthetic US reconstruction. (b): Qualitative evaluation results for planned trajectories with the real water model dataset.

Fig. 13. Qualitative evaluation results from an expert spine surgeon for the results on the real volunteer dataset. (a): Qualitative evaluation results for planned trajectories of group 1. (b): Qualitative evaluation results for planned trajectories of group 2.

Fig. 14. Performance of each approach on vertebrae with different reconstruction quality. (a) Insertion ratios for the first group of datasets, with average safety rates of 99%, 96% and 92% for DR + SF + TS, DR + SF and SF, respectively. (b) Insertion ratios for the second group of datasets using policies trained on the first group, with average safety rates of 88%, 92% and 71%, respectively. For each approach, linear regression is used to illustrate the correlation between performance and reconstruction quality.

Fig. 15. Averaged sensitivity analysis of the safety filters for 5 human model datasets. (a) SR under different $\lambda$ and $d$. (b) IR under different $\lambda$ and $d$.

Fig. 16. Example sensitivity analysis of the performance of the trajectory w.r.t. initial position for a trained policy. The values of safety rates (SR) and insertion ratio (IR) for different 3D initial positions are illustrated on the longitudinal and frontal planes, respectively. The scalar value at each pixel is the performance (IR or SR) of the planned trajectory if the drill is initialized at the position of that pixel (for IR the anatomies are omitted).

Fig. 17. Forward and side views of example trajectories planned by our policies with both synthetic and real US reconstruction data. The planned trajectories for various vertebral levels, initial poses and human models are selected to demonstrate the effectiveness of our method. The synthetic intraoperative ultrasound reconstructions (IUS) are colored yellow, and real US reconstructions are colored green. For the first 2 rows, the green area inside the vertebra denotes the annotated GS trajectories from human planners.

Fig. 18. Ablation study of domain randomization and teacher–student learning. (a): Sensitivity of training w.r.t. regional reduction rates; (b): sensitivity of training w.r.t. regional addition; (c), (d): sensitivity of training w.r.t. regional height changes; (e): sensitivity of training w.r.t. density of the point cloud, which is controlled by randomly keeping a certain ratio of the points; (f): sensitivity of training w.r.t. feature loss weight.

Fig. 19. Ablation study for reward weights. (a), (d): unsafe penalty ($w_3$); (b), (e): damage length penalty ($w_2$); (c), (f): following GS reward ($w_4$). For damage length, the influence of reward weight $w_2$ is not significant compared with other weights, except that too large a weight will also stop the agent from exploring. This may be because a small $w_2$ already encourages the agent to move the drill straight inside the soft tissue.

Fig. 20. CNN saliency map of the maximum action logit, value function and distance to unsafe region prediction. (a) State as 3D image with vertebra surface reconstruction and drill. (b) CNN saliency map of the maximum action logit. (c) CNN saliency map of the value function. (d) CNN saliency map of the distance to unsafe region prediction. The area surrounding the bone surface has higher gradient norms compared to other image positions.

Fig. 21. Forward and side views of example trajectories planned by our policies for the real volunteer dataset. The first row is the planning results for the first group with the policy trained on the same group. The second row is the planning results for the second group with the same policy.

Table

Table 1. Scales (S) and noise (N) of motions in our kinematic simulation. "+" and "−" mean motions along the positive and negative directions of the axis. "Soft", "Cortical" and "Cancellous" mean the drill tip is inside the soft tissues, cortical and cancellous bones, respectively. The noise value denotes both the standard deviation for Gaussian noise and the range of uniform noise.

Table 2. Human model datasets with various ages and BMIs. Pedicle widths (PWs) are only averaged over the left and right sides of L1-L5 vertebrae.

Table 3. Insertion ratio (IR) and safety rate (SR) performance in validation environments with IUS as inputs. The SR is the ratio of acceptable trajectories within the 200 trajectories. The bold numbers are the best column-wise values. The underlined values for insertion ratios are the highest column-wise values given an SR higher than 98%. We highlight this value since we consider safety to be more important than insertion and 98% is the highest safety rate that is achievable for all datasets.
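The selection rule described for Table 3 (compute SR as the share of acceptable trajectories, then prefer the highest IR among methods that meet the safety bar) can be sketched as follows. The method names and numbers in the usage example are made up for illustration and are not values from the paper. The helper names are hypothetical, and treating an SR of exactly 98% as passing is an assumption based on the caption's remark that 98% is reachable for all datasets.

```python
def safety_rate(acceptable_flags):
    """Percentage of trajectories judged acceptable (e.g. out of 200 rollouts)."""
    return 100.0 * sum(acceptable_flags) / len(acceptable_flags)

def best_ir_given_sr(results, min_sr=98.0):
    """Return the method with the highest IR among those with SR >= min_sr.

    results: mapping of method name -> {"sr": percent, "ir": ratio}.
    Returns None when no method reaches the safety threshold.
    """
    eligible = {name: r for name, r in results.items() if r["sr"] >= min_sr}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["ir"])

# Hypothetical numbers, for illustration only.
sr_example = safety_rate([True] * 196 + [False] * 4)
toy_results = {
    "method_a": {"sr": 99.0, "ir": 0.91},
    "method_b": {"sr": 98.0, "ir": 0.94},
    "method_c": {"sr": 90.0, "ir": 0.99},
}
winner = best_ir_given_sr(toy_results)
```

Ranking this way encodes the priority stated in the caption: a method with a very high insertion ratio is not considered at all unless it first clears the safety threshold.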

Table 4. Summary of evaluation results on the synthetic US reconstruction dataset. The values are averaged over the 5 human datasets from the ITIS virtual population. Bold numbers are the best column-wise values.

Table 5. Evaluation results on the water model ultrasound (US) reconstruction dataset. Bold numbers are the best column-wise values. RR is not available because the real US + CT dataset contains no soft tissues.
