License: CC BY-NC-SA 4.0
arXiv:2409.11047v1 [cs.RO] 17 Sep 2024

TacDiffusion: Force-domain Diffusion Policy for Precise Tactile Manipulation

Yansong Wu*, Zongxie Chen*, Fan Wu, Lingyun Chen, Liding Zhang,
Zhenshan Bing, Abdalla Swikir, Alois Knoll, Sami Haddadin
The authors are with the Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Germany. Authors marked with * contributed equally to this work. Corresponding author: Fan Wu (f.wu@tum.de). The authors acknowledge the financial support of the Bavarian State Ministry for Economic Affairs, Regional Development and Energy (StMWi) for the Lighthouse Initiative KI.FABRIK (Phase 1: Infrastructure as well as the research and development program, grant no. DIK0249). Code available: https://github.com/popnut123/TacDiffusion
Abstract

Assembly is a crucial skill for robots in both modern manufacturing and service robotics. However, mastering transferable insertion skills that can handle a variety of high-precision assembly tasks remains a significant challenge. This paper presents a novel framework that utilizes diffusion models to generate 6D wrenches for high-precision tactile robotic insertion tasks. It learns from demonstrations performed on a single task and achieves a zero-shot transfer success rate of 95.7% across various novel high-precision tasks. Our method effectively inherits the self-adaptability demonstrated by our previous work. In this framework, we address the frequency misalignment between the diffusion policy and the real-time control loop with a dynamic system-based filter, which improves the task success rate by 9.15%. Furthermore, we provide a practical guideline regarding the trade-off between diffusion models’ inference ability and speed.

I Introduction

Assembly tasks are crucial in robotics, serving as the backbone of modern manufacturing and service applications [1]. As the demand for flexible manufacturing grows, robotic assembly increasingly takes place in dynamic environments, where objects are not precisely positioned at known locations and part holders are often not viable [2]. Achieving both broad transferability and precise control capabilities in these conditions remains a significant challenge. Human workers, on the other hand, demonstrate exceptional dexterity in assembling diverse objects with tight-clearance components, primarily by leveraging tactile feedback from their fingertips throughout the process [3, 4]. Similarly, a versatile high-precision robotic assembly system must exhibit both task-level transferability—generalizing across a wide range of objects and parts—and control-level self-adaptability, enabling it to respond to environmental changes often sensed through tactile feedback [5, 6].

Throughout the history of robotics research, the importance of tactile feedback and force control for high-precision assembly has been consistently recognized [7, 8, 9, 10, 11, 12, 5, 13]. However, several challenges persist in precise force control, including the difficulty of accessing appropriate robot hardware and expensive force sensors, the complexity of ensuring stability and safety while regulating force, the sensitivity of force control to environmental changes, the difficulty of estimating environment constraints and contact dynamics in dynamic settings, and the challenge of collecting high-quality tactile data for learning force control. Due to these barriers, the robot learning community often favors simpler motion-domain action spaces, with impedance control as an indirect force control method. Nevertheless, the increasing diversity of contact-rich manipulation tasks highlights the equal importance of simultaneously regulating motion, compliance, and force, so that agents can autonomously perform a wide range of tasks stably and robustly, without the need for explicit controller switching [14]. Despite the recent successes of transformer-based [15, 16, 17, 18, 19, 20] and/or diffusion-based [21, 22, 23, 24, 25, 26, 27, 28, 29, 30] policies for robot manipulation, which exhibit excellent generalization capability, it remains unexplored how to integrate force control with these models for high-precision tactile manipulation, so that the benefits of these generative models for multi-modal modelling and prediction can be fully exploited.

To address this gap, and aiming to achieve both task-level transferability and control-level self-adaptability, we propose TacDiffusion, a novel framework that leverages a diffusion policy for high-precision tactile manipulation. To the authors’ knowledge, it is the first framework to employ diffusion models in generating force-domain actions for tactile-based robotic manipulation in tight-clearance insertion tasks. TacDiffusion learns from demonstrations performed by expert policies on a single task and achieves an overall 95.7% zero-shot transfer success rate across various novel high-precision, sub-millimeter-level peg-in-hole tasks. By imitating the expert policies, which are based on a behavior tree-based skill proposed in our previous work [31], TacDiffusion successfully inherits its self-adaptability, characterized by the ability to switch skill primitives based on real-time tactile sensing. Importantly, compared to the expert policy, TacDiffusion also outperforms in execution time and robustness on these novel tasks in a zero-shot transfer manner.

To further enhance real-time performance, we investigate how model size affects the trade-off between accuracy and inference speed, providing practical guidelines for optimal model selection. Moreover, to handle the frequency misalignment between the diffusion policy’s inference process and the low-level controller, a dynamic system-based filter is designed to smooth the output of the diffusion model for high-frequency force-impedance control, significantly improving the task success rate by 9.15%.

In summary, our main contributions are: (i) a novel diffusion-based policy that outputs 6D wrenches for tactile manipulation; (ii) learning from a behavior tree-based expert policy to inherit its tactile-based self-adaptability; (iii) a dynamic system-based filter that smooths the low-frequency outputs of the diffusion model and aligns them with high-frequency control, with experimental evidence showing a significant effect on task performance; (iv) an investigation of the trade-off between accuracy and inference speed, yielding insights for optimal model selection in practice.

II Related Works

In this section, we focus our review on (i) High-Precision Assembly Tasks, (ii) Transferability and (iii) Diffusion model in robotics.

II-A High-Precision Assembly Tasks

Due to the robot’s accuracy limitation, position-based control methods are insufficient for high-precision assembly tasks that require accuracy exceeding the robot’s precision [10]. To address this issue, recent studies have shifted to designing actions in the force domain rather than the position domain to perform high-precision robotic assembly tasks. According to the control strategy, these methods span four main categories: force controllers [9], admittance controllers [8], hybrid position/force controllers [10, 13], and impedance controllers with feed-forward force [11, 12]. Nevertheless, these works normally focus on a specific tight-clearance task and do not investigate the method’s transferability and adaptability to novel tasks [5].

II-B Enhancing Transferability in Robotic Assembly

Over the last decade, an extensive literature has emerged on generating robotic assembly policies with broad generalization. Deep Reinforcement Learning-based methods, for instance, typically achieve generalization by training with multiple objects [32, 33]. Another noteworthy case is meta-learning, which pre-trains a model using online or offline data from a diverse and comprehensive set of tasks, enabling domain adaptation through fine-tuning [34, 35]. Furthermore, sim-to-real approaches have gained attention for their cost-effective data collection in simulation environments, and zero-shot sim-to-real transfer for perception-initialized assembly has only recently been demonstrated [36, 37]. In addition, to tackle precise manipulation, RVT-2 [20] trained a transformer-based multi-task policy. Despite improving performance on multi-task learning benchmarks, its success rate on high-precision (millimeter-level) insertion tasks, roughly 50%, is far from satisfactory for deployment in real assembly production. Aside from these approaches, evolutionary algorithms with parameterized robot skills have shown transferability across tasks via fine-tuning [31]. However, achieving zero-shot transfer on high-precision tasks with a satisfactory success rate in the real world remains an open challenge.

II-C Diffusion Model in Robotics

Meanwhile, in other areas of robotics, diffusion models [38] have made significant progress. Compared to traditional discriminative models, diffusion models excel in generalization, achieving superior performance on unseen tasks and scenarios by establishing a stochastic transport map between an empirically observed target distribution and a known prior [30]. Recent works have typically used scene images as input to solve planning problems [23, 24, 29] and perform manipulation tasks [25, 26, 27, 28] in robotics. However, the application of diffusion models with other input modalities remains relatively underexplored in robotics, with only a few studies addressing this area [22]. In addition, considering the applications of diffusion models in sequential behavior imitation [21] and time series processing [39], there is great potential for adapting diffusion models to force-domain actions in robotics.

In summary, although significant progress has been made in insertion tasks, achieving zero-shot transfer in high-precision assembly tasks remains an ongoing challenge. Additionally, the application of diffusion models to force-domain actions has not yet been explored. To bridge these gaps, we propose a novel framework that leverages diffusion models to enable more efficient zero-shot transfer in high-precision insertion tasks.

III Methods

To solve the aforementioned issues, we develop a framework that adapts the diffusion model to force-domain actions for high-precision tactile assembly tasks. In the following subsections, we first provide an overview of the framework, followed by a detailed explanation of the concrete modules, i.e., the diffusion model, the impedance control with feed-forward force, and the dynamic system-based filter.

III-A Framework Overview

Our framework comprises two key functional modules: an action generation module based on the diffusion policy and an execution module based on impedance control with feed-forward force (see Footnote 1). As illustrated in the framework overview figure, the diffusion-based policy is integrated into the behavior tree (BT) based Insertion skill by replacing the original sub-tree, which contained two primitives and a state estimator. The resultant behavior tree is simplified into a sequence of skill primitives, with “approach” and “contact” as two preceding primitives. As the BT is simplified into a sequence and the diffusion model handles primitive-switching, the discussion of the preceding skill primitives for contact initialization is beyond the scope of this work. For more details, we refer readers to our previous work [31].

Footnote 1: A practical consideration here is the compatibility issue between the real-time kernel and the NVIDIA CUDA Toolkit.

During the assembly process, the interaction between the robot and the environment is captured as the observation $\bm{o}$, which includes the external wrench, the internal wrench, and the end-effector’s speed. The diffusion model then predicts the force-domain action ($\bm{a} := \bm{F}_{df}$) based on both the current observation $\bm{o}_{curr}$ and the previous observation $\bm{o}_{prev}$. Due to limited computational resources, the diffusion model’s inference frequency typically ranges from 50 Hz to 500 Hz (Table I), which is misaligned with the robot’s 1000 Hz real-time control loop. To mitigate this, we designed a dynamic system-based filter to interpolate the diffusion model’s predictions $\bm{F}_{df}$. The filtered action is then transmitted to the impedance controller with feed-forward force. Based on the desired goal $\bm{x}_{d}$ (the insertion hole’s pose) and the force command, the controller regulates the robot’s motion and force behavior simultaneously.
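To make the rate mismatch concrete, the following sketch (Python with NumPy, an illustration rather than the authors’ code) counts how many 1000 Hz control cycles elapse per prediction and applies a naive zero-order hold. The zero-order hold is only a baseline for comparison and is not the dynamic system-based filter; the function names and the sample wrench values are hypothetical.

```python
import numpy as np

def control_ticks_per_inference(control_hz, policy_hz):
    """Number of control cycles that elapse between two policy predictions."""
    return control_hz / policy_hz

def zero_order_hold(predictions, control_hz, policy_hz):
    """Naive baseline (not the paper's filter): repeat each prediction until the next arrives."""
    ticks = int(round(control_hz / policy_hz))
    return np.repeat(np.asarray(predictions, dtype=float), ticks, axis=0)

# Three illustrative 6D wrench predictions arriving at 100 Hz.
predictions = np.array([[0.0, 0.0, -2.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, -4.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, -6.0, 0.0, 0.0, 0.0]])
held = zero_order_hold(predictions, control_hz=1000, policy_hz=100)

# Largest step in the commanded z-force between consecutive 1 kHz cycles.
largest_jump = np.abs(np.diff(held[:, 2])).max()
```

At the two ends of the reported range, a new prediction arrives every 20 control cycles (50 Hz) or every 2 cycles (500 Hz). With a plain hold, the commanded wrench changes in discrete steps whenever a new prediction arrives, which is the kind of discontinuity a smoothing filter is meant to remove.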

III-B Diffusion Model

Denoising diffusion probabilistic model (DDPM) [38, 40, 41] is a specific type of diffusion model designed to generate data by learning to reverse a noise injection process. DDPM consists of two processes: diffusion and denoising. The diffusion process systematically transforms the data into noise, while the denoising process is responsible for converting this noise back into data.

Figure 1: Network architecture of the noise estimator.

III-B1 Diffusion Process

The diffusion process is a forward process that progressively corrupts data with noise over a series of steps. By progressively injecting noise into a “clean” initial action $\bm{a}_{0}$, a sequence of “polluted” actions $\bm{a}_{1}, \bm{a}_{2}, \cdots, \bm{a}_{T}$ converging to a Gaussian distribution is obtained, according to the diffusion rule [21]:

\begin{align}
\alpha_{\tau} &= 1 - \beta_{\tau}, \tag{1}\\
\bm{a}_{\tau} &= \sqrt{\alpha_{\tau}}\ \bm{a}_{\tau-1} + \sqrt{\beta_{\tau}}\ \bm{\epsilon}_{\tau}, \tag{2}
\end{align}

where $\tau \in [1, T]$ denotes the diffusion step, with $T$ referring to the total number of denoising steps (not to be confused with the environment time step that is common in time series). $\bm{a}_{\tau}$ and $\bm{\epsilon}_{\tau} \sim \mathcal{N}(\bm{0}, \bm{I})$ represent the diffused action and the corresponding noise in the $\tau$-th diffusion step. $\alpha_{\tau}$ and $\beta_{\tau}$ are variance schedule parameters that regulate the noise mixed in at each diffusion step.
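As a minimal sketch of Eqs. (1) and (2), the following NumPy code applies the forward diffusion step by step to a single 6D action. The linear variance schedule, its end points, and the sample wrench are assumptions chosen for illustration; the text above does not state which schedule is used.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear variance schedule (an assumption, not taken from the paper)."""
    return np.linspace(beta_start, beta_end, T)

def diffuse(a0, betas, rng):
    """Apply Eqs. (1)-(2) step by step and return [a_0, a_1, ..., a_T]."""
    actions = [np.asarray(a0, dtype=float)]
    for beta in betas:
        alpha = 1.0 - beta                               # Eq. (1)
        eps = rng.standard_normal(actions[-1].shape)     # eps_tau ~ N(0, I)
        actions.append(np.sqrt(alpha) * actions[-1] + np.sqrt(beta) * eps)  # Eq. (2)
    return actions

rng = np.random.default_rng(0)
betas = linear_beta_schedule(T=1000)
a0 = np.array([5.0, 0.0, -10.0, 0.1, 0.0, 0.0])  # illustrative 6D wrench, not from the paper
trajectory = diffuse(a0, betas, rng)

# Unrolling Eq. (2) shows that a_T contains a_0 scaled by sqrt(prod(alpha_tau)).
signal_scale = np.sqrt(np.prod(1.0 - betas))
```

With this schedule the remaining scale of $\bm{a}_{0}$ in $\bm{a}_{T}$ is below one percent, which is why the sequence can be treated as converging to Gaussian noise.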

Furthermore, the noise $\bm{\epsilon}_{\tau}$ also plays a crucial role in the subsequent denoising process. To account for this, we construct the noise estimator $\hat{\bm{\epsilon}}(\cdot)$ using a residual neural network [42], as illustrated in Fig. 1, and train it by minimizing the following loss function:

\begin{equation}
\bm{\mathcal{L}}_{DDPM} = \mathbb{E}\left[\left\| \hat{\bm{\epsilon}}_{\tau}(\bm{o}, \bm{a}_{\tau}, \tau) - \bm{\epsilon}_{\tau} \right\|_{2}^{2}\right], \tag{3}
\end{equation}

where $\bm{o}$ includes both the current and previous observations, as incorporating historical information helps identify trends and enhances the accuracy of predicting future actions. The diffusion step $\tau$ serves as positional information, enabling the network to recognize the current diffusion stage effectively [43].
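The loss in Eq. (3) can be sketched as follows. The closed-form sampling $\bm{a}_{\tau} = \sqrt{\bar{\alpha}_{\tau}}\,\bm{a}_{0} + \sqrt{1-\bar{\alpha}_{\tau}}\,\bm{\epsilon}$, with $\bar{\alpha}_{\tau}$ the cumulative product of the $\alpha_{i}$, is standard DDPM practice and is assumed here rather than taken from the text. The batch of expert actions is random placeholder data, and the network $\hat{\bm{\epsilon}}(\bm{o}, \bm{a}_{\tau}, \tau)$ is replaced by stand-in predictions.

```python
import numpy as np

def make_training_batch(a0, betas, rng):
    """Sample one diffusion step and one noised action per example (closed-form DDPM sampling)."""
    alpha_bar = np.cumprod(1.0 - betas)                        # cumulative product of alpha_tau
    tau = rng.integers(1, len(betas) + 1, size=a0.shape[0])    # tau drawn uniformly from [1, T]
    eps = rng.standard_normal(a0.shape)                        # target noise
    ab = alpha_bar[tau - 1][:, None]
    a_tau = np.sqrt(ab) * a0 + np.sqrt(1.0 - ab) * eps
    return a_tau, tau, eps

def ddpm_loss(eps_hat, eps):
    """Eq. (3): batch mean of the squared L2 norm of the noise prediction error."""
    return float(np.mean(np.sum((eps_hat - eps) ** 2, axis=1)))

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 2e-2, 100)       # hypothetical schedule
a0 = rng.standard_normal((32, 6))          # placeholder batch of expert 6D wrenches
a_tau, tau, eps = make_training_batch(a0, betas, rng)

# In training, eps_hat would be the network output for (o, a_tau, tau).
loss_perfect = ddpm_loss(eps, eps)                  # an ideal estimator gives zero loss
loss_zero = ddpm_loss(np.zeros_like(eps), eps)      # always predicting zero noise
```

Predicting zero noise gives a loss whose expected value is 6 for 6D unit Gaussian noise, a useful sanity baseline when monitoring training.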

III-B2 Denoising Process

In contrast to the diffusion process, the denoising process reconstructs data from noise in reverse, as illustrated by the linen-colored block in the framework overview figure. Leveraging the previously trained noise estimator $\hat{\bm{\epsilon}}(\cdot)$, the model progressively removes the noise from a random sample $\bm{a}_{T} \sim \mathcal{N}(\bm{0}, \bm{I})$, following the denoising rule:

\begin{align}
\sigma_{\tau} &= \sqrt{\beta_{\tau}}, \tag{4}\\
\bar{\alpha}_{\tau} &= \prod_{i=1}^{\tau} \alpha_{i}, \tag{5}\\
\bm{a}_{\tau-1} &= \frac{1}{\sqrt{\alpha_{\tau}}}\ \left[\bm{a}_{\tau} - \frac{1-\alpha_{\tau}}{\sqrt{1-\bar{\alpha}_{\tau}}}\ \hat{\bm{\epsilon}}_{\tau}(\bm{o}, \bm{a}_{\tau}, \tau)\right] + \sigma_{\tau}\ \bm{\epsilon}_{\tau}, \tag{6}
\end{align}

where the variance schedule parameters $\bar{\alpha}_{\tau}$ and $\sigma_{\tau}$ modulate the noise subtracted in each step. After $T$ iterations (the diffusion horizon), we obtain a probabilistically reconstructed action $\bm{a}_{0}$. An illustrative example is provided in Sec. IV-B3.
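The denoising rule in Eqs. (4) to (6) can be sketched as a short sampling loop. This is an illustration under stated assumptions: the trained network is replaced by a hypothetical oracle estimator that is exact when the data distribution is a single target action, the variance schedule is made up, and no noise is added at the final step (common DDPM practice, whereas Eq. (6) as written adds noise at every step).

```python
import numpy as np

def denoise(eps_hat_fn, o, betas, rng, action_dim=6):
    """Run Eqs. (4)-(6) from tau = T down to 1 and return the reconstructed action a_0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)                      # Eq. (5)
    a = rng.standard_normal(action_dim)                  # a_T ~ N(0, I)
    for tau in range(len(betas), 0, -1):
        alpha, alpha_bar = alphas[tau - 1], alpha_bars[tau - 1]
        sigma = np.sqrt(betas[tau - 1])                  # Eq. (4)
        # Assumption: skip the noise injection at the last step (tau = 1).
        eps = rng.standard_normal(action_dim) if tau > 1 else np.zeros(action_dim)
        mean = (a - (1.0 - alpha) / np.sqrt(1.0 - alpha_bar) * eps_hat_fn(o, a, tau)) / np.sqrt(alpha)
        a = mean + sigma * eps                           # Eq. (6)
    return a

betas = np.linspace(1e-4, 2e-2, 50)                      # hypothetical schedule
alpha_bars = np.cumprod(1.0 - betas)
a_target = np.array([5.0, 0.0, -10.0, 0.1, 0.0, 0.0])    # illustrative 6D wrench

def oracle_eps_hat(o, a_tau, tau):
    """Stand-in for the trained network: exact noise when all data equal a_target."""
    ab = alpha_bars[tau - 1]
    return (a_tau - np.sqrt(ab) * a_target) / np.sqrt(1.0 - ab)

a0_hat = denoise(oracle_eps_hat, o=None, betas=betas, rng=np.random.default_rng(0))
```

With the oracle estimator, the final step of Eq. (6) cancels the remaining noise and recovers `a_target` up to floating-point error, which gives a quick check that the update is implemented consistently with the forward process.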

III-C Impedance Control with Feed-forward Force

Consider a torque-controlled robot with $n$ degrees of freedom. Its second-order rigid-body dynamics are written as:

\begin{equation}
\bm{M}(\bm{q})\ddot{\bm{q}} + \bm{C}(\bm{q},\dot{\bm{q}})\dot{\bm{q}} + \bm{g}(\bm{q}) = \bm{\tau}_{m} + \bm{\tau}_{ext}, \tag{7}
\end{equation}

where $\bm{q} \in \mathbb{R}^{n}$ is the joint state, $\bm{M}(\bm{q}) \in \mathbb{R}^{n \times n}$ is the mass matrix, $\bm{C}(\bm{q},\dot{\bm{q}}) \in \mathbb{R}^{n \times n}$ is the Coriolis matrix, and $\bm{g}(\bm{q}) \in \mathbb{R}^{n}$ is the gravity vector. The motor torque (control input) and external torque are denoted by $\bm{\tau}_{m} \in \mathbb{R}^{n}$ and $\bm{\tau}_{ext} \in \mathbb{R}^{n}$, respectively. The impedance control law with a feed-forward force profile is defined as [44]:

\begin{equation}
\bm{\tau}_{m}(t) = \bm{J}(\bm{q})^{\mathsf{T}} \left[\bm{F}_{ff}(t) + \bm{K}(t)\bm{e} + \bm{D}\dot{\bm{e}} + \bm{M}(\bm{q})\ddot{\bm{x}}_{d}\right] + \bm{C}(\bm{q},\dot{\bm{q}})\dot{\bm{q}} + \bm{g}(\bm{q}), \tag{8}
\end{equation}

where $\bm{F}_{ff}(t)$ denotes the feed-forward wrench, $\bm{x}_{d}$ is the desired trajectory, and $\bm{x}$ is the robot’s current position. $\bm{e} = \bm{x}_{d} - \bm{x}$ and $\dot{\bm{e}} = \dot{\bm{x}}_{d} - \dot{\bm{x}}$ are the position and velocity errors, respectively. $\bm{K}(t)$ and $\bm{D}$ are the stiffness and damping matrices in Cartesian space. $\bm{J}(\bm{q})$ represents the robot Jacobian matrix. The internal wrench $\bm{F}_{in}$ applied by the robot on objects is calculated as:

$$\bm{J}_{binv}=\bm{J}_{body}^{\dagger}, \tag{9}$$
$$\bm{F}_{in}=\bm{J}_{binv}^{\mathsf{T}}\left(\bm{\tau}_{m}-\bm{C}(\bm{q},\dot{\bm{q}})\dot{\bm{q}}-\bm{g}(\bm{q})\right), \tag{10}$$

where $\bm{J}_{binv}$ is the pseudo-inverse of the body Jacobian $\bm{J}_{body}$, which relates joint velocities to the End-Effector (EE) twist expressed in the body frame (a frame at the EE).
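The internal wrench computation in Eqs. (9) and (10) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `internal_wrench`, the use of NumPy, and the convention that the Coriolis product $\bm{C}(\bm{q},\dot{\bm{q}})\dot{\bm{q}}$ and gravity torque $\bm{g}(\bm{q})$ arrive as precomputed vectors are all assumptions. The Jacobian in the usage lines is random and only serves as a consistency check.

```python
import numpy as np


def internal_wrench(J_body, tau_m, coriolis, gravity):
    """Estimate F_in = (J_body^+)^T (tau_m - C(q, dq) dq - g(q)), Eqs. (9)-(10).

    J_body   : (6, n) body Jacobian
    tau_m    : (n,) joint torques
    coriolis : (n,) the product C(q, dq) dq, already evaluated (assumed input)
    gravity  : (n,) gravity torque g(q) (assumed input)
    """
    J_binv = np.linalg.pinv(J_body)        # Eq. (9): (n, 6) pseudo-inverse
    residual = tau_m - coriolis - gravity  # torque not explained by the dynamics terms
    return J_binv.T @ residual             # Eq. (10): 6D wrench in the body frame


# Consistency check with a random, hypothetical 7-joint Jacobian:
# torques generated by a known wrench should map back to that wrench.
rng = np.random.default_rng(0)
J = rng.standard_normal((6, 7))
F_true = np.array([1.0, -2.0, 0.5, 0.1, 0.0, -0.3])
tau = J.T @ F_true
F_est = internal_wrench(J, tau, np.zeros(7), np.zeros(7))
```

The check works because a full-row-rank Jacobian satisfies $\bm{J}\bm{J}^{\dagger}=\bm{I}$, so the estimate reproduces the wrench that generated the torques.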

III-D Dynamic System based Filter

To resolve the frequency misalignment between the diffusion model and the impedance controller with feed-forward force, we interpolate the diffusion model's output $\bm{F}_{df}$ with a dynamic system-based filter, according to the equation:

$$\ddot{\bm{F}}_{ff}=\alpha\left(\beta(\bm{F}_{df}-\bm{F}_{ff})-\dot{\bm{F}}_{ff}\right), \tag{11}$$

where $\bm{F}_{df}$ is the raw output of the diffusion model and $\bm{F}_{ff}$ is the filtered and interpolated 1000 Hz feed-forward force executed by the controller. The first and second derivatives of $\bm{F}_{ff}$ are initialized as zero vectors. $\alpha$ and $\beta$ are two constant scaling factors. In this work, $\alpha$ and $\beta$ are fixed at 0.9 and 0.3, respectively, based on several trials that demonstrated their effectiveness.
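A discrete-time version of the filter in Eq. (11) might look like the sketch below. The paper does not state the integration scheme, the time unit of the step, or the initial value of $\bm{F}_{ff}$, so the semi-implicit Euler update, the step size `dt` (here one control tick), and the zero initial force are assumptions. The names `WrenchFilter` and `run` are hypothetical.

```python
class WrenchFilter:
    """Second-order filter of Eq. (11), integrated with semi-implicit Euler."""

    def __init__(self, alpha=0.9, beta=0.3, dt=1.0, dim=6):
        self.alpha = alpha
        self.beta = beta
        self.dt = dt              # ASSUMPTION: step size measured in control ticks
        self.f_ff = [0.0] * dim   # ASSUMPTION: F_ff itself starts at zero
        self.df_ff = [0.0] * dim  # first derivative, zero-initialized as in the text

    def step(self, f_df):
        """Advance one control tick toward the latest diffusion output f_df."""
        for i, target in enumerate(f_df):
            ddf = self.alpha * (self.beta * (target - self.f_ff[i]) - self.df_ff[i])
            self.df_ff[i] += self.dt * ddf
            self.f_ff[i] += self.dt * self.df_ff[i]
        return list(self.f_ff)


def run(filter_, diffusion_outputs, ticks_per_output):
    """Hold each diffusion output for several ticks and collect the filtered commands."""
    commands = []
    for f_df in diffusion_outputs:
        for _ in range(ticks_per_output):
            commands.append(filter_.step(f_df))
    return commands


# A constant target wrench: the filtered command should settle on it smoothly.
target = [1.0, -2.0, 3.0, 0.0, 0.5, -0.5]
commands = run(WrenchFilter(), [target], ticks_per_output=200)
```

Holding the most recent diffusion output between inferences lets the controller receive a new command on every tick, while the second-order dynamics avoid step changes whenever a new $\bm{F}_{df}$ arrives.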

IV Experiment

To evaluate our proposed method, we designed a set of experiments to: (i) demonstrate the effectiveness of our framework and validate its capability to generalize to novel tasks, (ii) provide a practical guideline for balancing inference ability and speed by evaluating models of varying sizes, and (iii) show that our dynamic system-based filter mitigates the frequency misalignment between the diffusion model and the real-time controller.

Figure 2: Experiment Setup. The object grasped by the robot in the left figure is the training object: (a) Cuboid: a 35 mm × 25 mm × 60 mm cuboid (0.1 mm clearance). The four objects on the right are used to validate transferability: (b) Key: a 37 mm long key; (c) Cyl-S: a 50 mm long cylinder with a diameter of 20 mm (0.02 mm clearance); (d) Cyl-L: a 50 mm long cylinder with a diameter of 30 mm (0.025 mm clearance); (e) Prism: a 50 mm long octagonal prism with a side length of 11 mm (0.05 mm clearance).

IV-A Experiment Setup

The experiment setup shown in Fig. 2 consists of a Franka Emika Panda robot and five tight-clearance insertion objects. The robot is controlled by a PC running Ubuntu 20.04 with an Intel i9-10900K CPU and a real-time kernel. The diffusion module is implemented in PyTorch. Training and inference are performed on a separate PC with an NVIDIA RTX 3090 GPU and the CUDA Toolkit.

IV-B Data Collection & Training

Figure 3: An example view of observations in the dataset.
Figure 4: Training loss (a) and validation loss (b). Validation is conducted every 5 epochs throughout the training process.

IV-B1 Data Collection

To train the diffusion model, we collect a comprehensive dataset of 1500 expert demonstrations of the assembly task, using the setup shown in Fig. 2. Demonstrations are generated by executing our previous method [31] on the insertion task (Cuboid) from various initial poses. The data is recorded at 1000 Hz, resulting in a 24-dimensional sequence: an 18-dimensional observation $\bm{o}$, which includes the external wrench, the internal wrench, and the EE's speed (Fig. 3), paired with the corresponding 6-dimensional action $\bm{F}_{ff}$.
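To make the 24-dimensional record concrete, the sketch below splits one recorded sample into its observation and action parts. The field order, the assumption that each observation component is 6-dimensional, and the names `Sample` and `split_record` are illustrative. The text only states the 18 + 6 split and the three observation components.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    external_wrench: List[float]  # assumed 6 values
    internal_wrench: List[float]  # assumed 6 values
    ee_velocity: List[float]      # assumed 6 values (EE speed)
    action: List[float]           # 6 values: feed-forward wrench F_ff

    @property
    def observation(self) -> List[float]:
        """Concatenate the three components into the 18-dimensional observation."""
        return self.external_wrench + self.internal_wrench + self.ee_velocity


def split_record(record: List[float]) -> Sample:
    """Split one 24-dimensional record; the slice order is an assumption."""
    if len(record) != 24:
        raise ValueError("expected a 24-dimensional record")
    return Sample(record[0:6], record[6:12], record[12:18], record[18:24])


# Placeholder record, used only to show the slicing.
sample = split_record([float(i) for i in range(24)])
```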

TABLE I: Hyperparameters for Training Diffusion Models
| Hyperparameter | Value |
| Epochs | 1500 |
| Batch Size | 4096 |
| Learning Rate | $10^{-3}$ |
| Diffusion Horizon ($T$) | 50 |
| Diffusion Weight ($\beta_{\tau}$) | increased from $10^{-4}$ to $10^{-2}$ |
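For orientation, the diffusion weights in Table I can be expanded into a full schedule as sketched below. Table I only says that $\beta_{\tau}$ increases from $10^{-4}$ to $10^{-2}$ over $T=50$ steps, so the linear spacing used here is an assumption, as are the function names. The cumulative product $\bar{\alpha}_{\tau}=\prod_{s\le\tau}(1-\beta_{s})$ is the standard DDPM quantity [38].

```python
def linear_beta_schedule(T=50, beta_start=1e-4, beta_end=1e-2):
    """Return beta_1..beta_T, evenly spaced (linear spacing is an assumption)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_alpha(betas):
    """Return alpha_bar_tau = prod_{s <= tau} (1 - beta_s) for every step tau."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


betas = linear_beta_schedule()
alpha_bars = cumulative_alpha(betas)
```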

IV-B2 Training

Selecting the optimal model involves a trade-off. Larger models offer stronger inference capabilities, but smaller models provide faster inference speeds that are better suited to our controller. An appropriate model size is therefore crucial for balancing performance and real-time control requirements, especially in our scenario, where computational efficiency is critical.

TABLE II: Details of Four Diffusion Models
| Model | Neurons ($N$) | Final Loss | Inference Frequency |
| $DF_{1}$ | 128 | 0.2751 | 503.8 Hz |
| $DF_{2}$ | 256 | 0.1653 | 297.5 Hz |
| $DF_{3}$ | 512 | 0.0716 | 141.8 Hz |
| $DF_{4}$ | 1024 | 0.0288 | 51.2 Hz |

To address this problem, we train diffusion models with varying neuron numbers $N$ (highlighted in red in Fig. 1) to provide a practical guideline. 80% of the data is used for training. The hyperparameters employed in this process are detailed in Table I. All trained models were exported to the ONNX format to optimize inference speed, and Table II provides the details of each model. As shown by the learning curves in Fig. 4(a), all candidate models converge within 1,000,000 iteration steps. As the model size increases, accuracy on the training dataset clearly improves, as evidenced by the decreasing final loss. However, larger models also require more computational resources, so the inference frequency drops from 503.8 Hz to 51.2 Hz.

Figure 5: Denoising process with model $DF_{3}$. From the top down, the red curves indicate the change in the diffused actions during the denoising process. The black curves show the corresponding ground truth.

IV-B3 Validation

The remaining 20% of the data is used for validation. The validation losses in Fig. 4(b) indicate that the models have converged without overfitting. Fig. 5 provides an intuitive example of the denoising process, in which the diffusion model reconstructs actions by progressively removing noise from a random Gaussian sample ($\tau=50$). After 25 backward diffusion steps ($\tau=25$), the model's output already tends towards the ground truth. By the final step ($\tau=1$), the model's prediction closely matches the ground truth.

It is noteworthy that the diffusion model inherits the self-adaptability of our previous method, selecting appropriate primitives based on the assembly state. The model performs a wiggle motion to align the object with the hole before 0.9 s, and again to resolve a stuck state from 1.2 s to 4.2 s. Once the object is properly aligned, it applies a force to push the object into the insertion hole.

IV-C Real-World Experiment Performance

IV-C1 Performance Test

In this section, we validate the efficacy of our diffusion models using the experimental setup depicted in Fig. 2. Among all demonstrated policies, we select the best-performing one as our baseline. We evaluate not only the performance of the candidates on the training object but also emphasize their zero-shot transferability to four novel objects.

As shown in Table III, a total of 25 test cases are created by combining the five policies (four diffusion models and the baseline) with the five objects. For each case, the policy is evaluated on the corresponding task from 50 random initial poses. At each pose, the robot performs two insertion trials, i.e., 100 trials per case, to account for variability and reduce the influence of random occurrences. The resulting success rates and execution times are reported in Table III and Fig. 6, respectively.

TABLE III: Success Rate [%]
| Model | Cuboid (trained) | Key (novel) | Cyl-S (novel) | Cyl-L (novel) | Prism (novel) | Average (novel, zero-shot transfer) |
| $DF_{1}$ | 90.0 | **99.0** | 86.0 | 85.0 | 40.0 | 77.5 |
| $DF_{2}$ | 79.0 | 94.0 | 87.0 | 90.0 | 79.0 | 87.5 |
| $DF_{3}$ | **98.0** | **99.0** | **97.0** | **96.0** | 91.0 | **95.7** |
| $DF_{4}$ | 73.0 | 85.0 | 90.0 | 66.0 | 85.0 | 81.5 |
| Baseline | 92.0 | 94.0 | 61.0 | 82.0 | **96.0** | 83.3 |
*The highest success rate for each task is highlighted in bold font. The detailed configuration of models $DF_{1}$ to $DF_{4}$ is provided in Table II.
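The Average column of Table III can be reproduced from the per-object success rates, which confirms that it is the mean over the four novel objects and excludes the trained Cuboid. The sketch below uses only the values printed in Table III. The dictionary layout and function name are illustrative.

```python
# Success rates [%] copied from Table III.
success = {
    "DF1": {"Cuboid": 90.0, "Key": 99.0, "Cyl-S": 86.0, "Cyl-L": 85.0, "Prism": 40.0},
    "DF2": {"Cuboid": 79.0, "Key": 94.0, "Cyl-S": 87.0, "Cyl-L": 90.0, "Prism": 79.0},
    "DF3": {"Cuboid": 98.0, "Key": 99.0, "Cyl-S": 97.0, "Cyl-L": 96.0, "Prism": 91.0},
    "DF4": {"Cuboid": 73.0, "Key": 85.0, "Cyl-S": 90.0, "Cyl-L": 66.0, "Prism": 85.0},
    "Baseline": {"Cuboid": 92.0, "Key": 94.0, "Cyl-S": 61.0, "Cyl-L": 82.0, "Prism": 96.0},
}

NOVEL_OBJECTS = ("Key", "Cyl-S", "Cyl-L", "Prism")


def novel_average(row):
    """Mean success rate over the four zero-shot objects."""
    return sum(row[name] for name in NOVEL_OBJECTS) / len(NOVEL_OBJECTS)


averages = {model: novel_average(row) for model, row in success.items()}
```

The unrounded means are 77.5, 87.5, 95.75, 81.5, and 83.25, which the table reports to one decimal place.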
Figure 6: Execution time. The colored bars represent the median execution time for each model, and the black lines denote their 25th and 75th percentiles.

According to the Common Industry Format for Usability Test Reports (ISO/IEC 25062:2006), the “core measure of efficiency” is the ratio of the task completion rate to the mean time per task [45]. We use this ratio as the performance metric for comparing the models. The results, illustrated by the radar plots in Fig. 7, show that $DF_{3}$ outperforms the baseline on demonstrated tasks in terms of efficiency.
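The efficiency metric described above is a simple ratio, sketched here for a list of trials. The trial outcomes and times in the example are hypothetical placeholders, not measurements from the paper. Whether failed trials count toward the mean time is not stated in the text, so including them is an assumption.

```python
def efficiency(successes, times_s):
    """Task completion rate divided by mean time per task.

    successes : list of bools, one per trial
    times_s   : list of execution times in seconds, one per trial
                (ASSUMPTION: failed trials are included in the mean)
    """
    if not successes or len(successes) != len(times_s):
        raise ValueError("need one time per trial and at least one trial")
    completion_rate = sum(successes) / len(successes)
    mean_time = sum(times_s) / len(times_s)
    return completion_rate / mean_time


# Hypothetical example: 3 of 4 trials succeed, mean time 4.0 s.
value = efficiency([True, True, False, True], [2.0, 4.0, 6.0, 4.0])
```

A policy therefore scores higher either by succeeding more often or by finishing faster, which is why the metric can rank models differently from the success rate alone.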

Notably, for novel tasks, all diffusion models achieve over a 10% improvement in efficiency, showcasing excellent zero-shot transferability. Among these models, $DF_{3}$ stands out with the best overall performance on novel tasks, achieving an average success rate of 95.7%.

Figure 7: Radar plots for efficiency across different models

IV-C2 Trade-off between model accuracy and inference speed

As the model size increases, the model better captures latent relationships within the data, which is reflected in the increasing overall success rate from $DF_{1}$ to $DF_{3}$, as shown in Table III. However, larger models also suffer a significant reduction in inference frequency, which exacerbates the misalignment with the 1000 Hz control loop. As shown in Table II, $DF_{3}$ maintains an acceptable frequency of 141.8 Hz, whereas $DF_{4}$ drops to only 51.2 Hz. This low output frequency limits the model's deployment potential despite its strong inference capability, resulting in a significant overall performance drop. Consequently, $DF_{3}$ (with $N=512$) is the only model that outperforms the baseline on both demonstrated and novel tasks. It exhibits the most balanced and highest performance across all insertion tasks, achieving a 129.5% improvement in overall performance compared to the baseline.
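One way to read the misalignment is as the number of 1000 Hz control ticks that elapse between two consecutive diffusion outputs. The short sketch below computes this from the inference frequencies in Table II. The variable names are illustrative, and the calculation assumes inference runs back to back at the reported rate.

```python
CONTROL_HZ = 1000.0  # control loop frequency stated in the text

# Inference frequencies [Hz] copied from Table II.
inference_hz = {"DF1": 503.8, "DF2": 297.5, "DF3": 141.8, "DF4": 51.2}

# Average number of control ticks that must reuse (or interpolate) one diffusion output.
ticks_per_output = {model: CONTROL_HZ / hz for model, hz in inference_hz.items()}
```

Under this reading, $DF_{1}$ refreshes its output roughly every 2 ticks, $DF_{3}$ roughly every 7 ticks, and $DF_{4}$ only about every 20 ticks.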

IV-C3 Dynamic system-based filter

Our dynamic system-based filter is designed to address the frequency misalignment issue. To validate its effectiveness, we repeat the experiments of Sec. IV-C1 for the diffusion models with the filter disabled. To distinguish them from the previous models ($DF_{x}$), these models are denoted $DF_{xN}$. For ease of comparison, the results are presented in the same figure. As illustrated in Fig. 8, the models with the filter achieve higher success rates in 16 out of 20 scenarios, with three unchanged and one decreasing by 6%. Overall, our dynamic system-based filter mitigates the effects of frequency misalignment, leading to a 9.15% increase in success rates.

Figure 8: Impact of the dynamical system-based filter on the success rate of high-precision assembly tasks

Moreover, we compare the models' performance on both demonstrated and novel objects, as illustrated in Fig. 10. Including the filter improves performance in both categories. A more concrete example is provided in Fig. 9, which illustrates the effect of our filter on the diffusion model's outputs. The raw diffusion output, depicted by the black curves, exhibits higher variability and fluctuations in the force and torque components. In contrast, the filtered feed-forward force commands, indicated by the red curves, present a smoother profile at 1000 Hz. These results confirm that the filtering process mitigates the frequency misalignment issue.

Figure 9: Impact of the filter on the diffusion model's predictions. The red curves represent the filtered feed-forward wrench, while the black curves correspond to the raw outputs from the diffusion model $DF_{3}$.
Figure 10: Model efficiency sorted in descending order: (a) demonstrated object; (b) novel objects.

V Conclusion

In this work, we present a novel framework that leverages diffusion models to generate 6D wrenches for tactile manipulation in high-precision robotic assembly tasks. Our approach, the first force-domain diffusion policy, demonstrates improved zero-shot transferability compared to prior work, achieving an overall 95.7% success rate in zero-shot transfer in experimental evaluations. Additionally, we investigate the trade-off between accuracy and inference speed and provide a practical guideline for model selection. Further, we address the frequency misalignment between the diffusion policy and the real-time control loop with a dynamic system-based filter, improving the task success rate by 9.15%. The extensive experimental studies in our work underscore the effectiveness of our framework in real-world settings, showcasing a promising approach to high-precision tactile manipulation: learning diffusion-based transferable skills from expert policies that contain primitive-switching logic. In future work, we will focus on extending the framework's applicability to a broader range of high-precision assembly tasks and on integrating additional sensing modalities to enhance system adaptability and robustness in real-time environments.

References

  • [1] D. E. Whitney, Mechanical assemblies: their design, manufacture, and role in product development.   Oxford university press New York, 2004, vol. 1.
  • [2] K. Nottensteiner, A. Sachtler, and A. Albu-Schäffer, “Towards Autonomous Robotic Assembly: Using Combined Visual and Tactile Sensing for Adaptive Task Execution,” Journal of Intelligent & Robotic Systems, vol. 101, no. 3, p. 49, Mar. 2021.
  • [3] R. S. Johansson and Å. B. Vallbo, “Tactile sensory coding in the glabrous skin of the human hand,” Trends in neurosciences, vol. 6, pp. 27–32, 1983.
  • [4] I. Birznieks, P. Jenmalm, A. W. Goodwin, and R. S. Johansson, “Encoding of direction of fingertip forces by human tactile afferents,” Journal of Neuroscience, vol. 21, no. 20, pp. 8222–8237, 2001.
  • [5] R. Li and H. Qiao, “A survey of methods and strategies for high-precision robotic grasping and assembly tasks—some new trends,” IEEE/ASME Transactions on Mechatronics, vol. 24, no. 6, pp. 2718–2732, 2019.
  • [6] K. Nottensteiner, A. Sachtler, and A. Albu-Schäffer, “Towards autonomous robotic assembly: Using combined visual and tactile sensing for adaptive task execution,” Journal of Intelligent & Robotic Systems, vol. 101, no. 3, p. 49, 2021.
  • [7] I. Hirochika, “Force Feedback in Precise Assembly Tasks,” in AI Memos.   MIT, 1974.
  • [8] H. Chen, G. Zhang, H. Zhang, and T. A. Fuhlbrigge, “Integrated robotic system for high precision assembly in a semi-structured environment,” Assembly Automation, vol. 27, no. 3, pp. 247–252, 2007.
  • [9] H. Chen, J. Wang, G. Zhang, T. Fuhlbrigge, and S. Kock, “High-precision assembly automation based on robot compliance,” The International Journal of Advanced Manufacturing Technology, vol. 45, no. 9, pp. 999–1006, 2009.
  • [10] T. Inoue, G. De Magistris, A. Munawar, T. Yokoya, and R. Tachibana, “Deep reinforcement learning for high precision assembly tasks,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2017, pp. 819–825.
  • [11] L. Johannsmeier, M. Gerchow, and S. Haddadin, “A framework for robot manipulation: Skill formalism, meta learning and adaptive control,” in 2019 International Conference on Robotics and Automation (ICRA).   IEEE, 2019, pp. 5844–5850.
  • [12] J. Luo, E. Solowjow, C. Wen, J. A. Ojea, A. M. Agogino, A. Tamar, and P. Abbeel, “Reinforcement learning on variable impedance controller for high-precision robotic assembly,” in 2019 International Conference on Robotics and Automation (ICRA).   IEEE, 2019, pp. 3080–3087.
  • [13] C. C. Beltran-Hernandez, D. Petit, I. G. Ramirez-Alpizar, and K. Harada, “Variable compliance control for robotic peg-in-hole assembly: A deep-reinforcement-learning approach,” Applied Sciences, vol. 10, no. 19, p. 6923, 2020.
  • [14] S. Haddadin and E. Shahriari, “Unified force-impedance control,” The International Journal of Robotics Research, p. 02783649241249194, Jul. 2024.
  • [15] D. Driess, F. Xia, M. S. Sajjadi, C. Lynch, A. Chowdhery, B. Ichter, A. Wahid, J. Tompson, Q. Vuong, T. Yu et al., “Palm-e: An embodied multimodal language model,” in International Conference on Machine Learning.   PMLR, 2023, pp. 8469–8488.
  • [16] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn et al., “Rt-1: Robotics transformer for real-world control at scale,” arXiv preprint arXiv:2212.06817, 2022.
  • [17] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen et al., “Rt-2: Vision-language-action models transfer web knowledge to robotic control,” arXiv preprint arXiv:2307.15818, 2023.
  • [18] M. Shridhar, L. Manuelli, and D. Fox, “Perceiver-actor: A multi-task transformer for robotic manipulation,” in Conference on Robot Learning.   PMLR, 2023, pp. 785–799.
  • [19] A. O’Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain et al., “Open x-embodiment: Robotic learning datasets and rt-x models: Open x-embodiment collaboration 0,” in 2024 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2024, pp. 6892–6903.
  • [20] A. Goyal, V. Blukis, J. Xu, Y. Guo, Y.-W. Chao, and D. Fox, “RVT-2: Learning Precise Manipulation from Few Demonstrations,” in Robotics: Science and Systems, Jun. 2024.
  • [21] T. Pearce, T. Rashid, A. Kanervisto, D. Bignell, M. Sun, R. Georgescu, S. V. Macua, S. Z. Tan, I. Momennejad, K. Hofmann et al., “Imitating human behaviour with diffusion models,” in Deep Reinforcement Learning Workshop NeurIPS, 2022.
  • [22] J. Carvalho, A. T. Le, M. Baierl, D. Koert, and J. Peters, “Motion planning diffusion: Learning and planning of robot motions with diffusion models,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2023, pp. 1916–1923.
  • [23] I. Kapelyukh, V. Vosylius, and E. Johns, “Dall-e-bot: Introducing web-scale diffusion models to robotics,” IEEE Robotics and Automation Letters, vol. 8, no. 7, pp. 3956–3963, 2023.
  • [24] U. A. Mishra, S. Xue, Y. Chen, and D. Xu, “Generative skill chaining: Long-horizon skill planning with diffusion models,” in Conference on Robot Learning.   PMLR, 2023, pp. 2905–2925.
  • [25] K. Black, M. Nakamoto, P. Atreya, H. Walke, C. Finn, A. Kumar, and S. Levine, “Zero-shot robotic manipulation with pretrained image-editing diffusion models,” arXiv preprint arXiv:2310.10639, 2023.
  • [26] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,” Robotics: Science and Systems, 2023.
  • [27] M. Reuss, M. Li, X. Jia, and R. Lioutikov, “Goal-conditioned imitation learning using score-based diffusion policies,” Robotics: Science and Systems, 2023.
  • [28] U. A. Mishra and Y. Chen, “Reorientdiff: Diffusion model based reorientation for object manipulation,” in 2024 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2024, pp. 10 867–10 873.
  • [29] A. Sridhar, D. Shah, C. Glossop, and S. Levine, “Nomad: Goal masked diffusion policies for navigation and exploration,” in 2024 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2024, pp. 63–70.
  • [30] P. Li, Z. Li, H. Zhang, and J. Bian, “On the generalization properties of diffusion models,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  • [31] Y. Wu, F. Wu, L. Chen, K. Chen, S. Schneider, L. Johannsmeier, Z. Bing, F. J. Abu-Dakka, A. Knoll, and S. Haddadin, “1 khz behavior tree for self-adaptable tactile insertion,” in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 16 002–16 008.
  • [32] O. Spector and D. Di Castro, “Insertionnet-a scalable solution for insertion,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 5509–5516, 2021.
  • [33] O. Spector, V. Tchuiev, and D. Di Castro, “Insertionnet 2.0: Minimal contact multi-step insertion using multimodal multiview sensory input,” in 2022 International Conference on Robotics and Automation (ICRA).   IEEE, 2022, pp. 6330–6336.
  • [34] G. Schoettler, A. Nair, J. A. Ojea, S. Levine, and E. Solowjow, “Meta-reinforcement learning for robotic industrial insertion tasks,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2020, pp. 9728–9735.
  • [35] T. Z. Zhao, J. Luo, O. Sushkov, R. Pevceviciute, N. Heess, J. Scholz, S. Schaal, and S. Levine, “Offline meta-reinforcement learning for industrial insertion,” in 2022 international conference on robotics and automation (ICRA).   IEEE, 2022, pp. 6386–6393.
  • [36] B. Tang, M. A. Lin, I. Akinola, A. Handa, G. S. Sukhatme, F. Ramos, D. Fox, and Y. Narang, “Industreal: Transferring contact-rich assembly tasks from simulation to reality,” arXiv preprint arXiv:2305.17110, 2023.
  • [37] B. Tang, I. Akinola, J. Xu, B. Wen, A. Handa, K. Van Wyk, D. Fox, G. S. Sukhatme, F. Ramos, and Y. Narang, “Automate: Specialist and generalist assembly policies over diverse geometries,” arXiv preprint arXiv:2407.08028, 2024.
  • [38] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  • [39] M. Kollovieh, A. F. Ansari, M. Bohlke-Schneider, J. Zschiegner, H. Wang, and Y. B. Wang, “Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  • [40] A. Q. Nichol and P. Dhariwal, “Improved denoising diffusion probabilistic models,” in International conference on machine learning.   PMLR, 2021, pp. 8162–8171.
  • [41] Y. Yang, M. Jin, H. Wen, C. Zhang, Y. Liang, L. Ma, Y. Wang, C. Liu, B. Yang, Z. Xu et al., “A survey on diffusion models for time series and spatio-temporal data,” arXiv preprint arXiv:2404.18886, 2024.
  • [42] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14.   Springer, 2016, pp. 630–645.
  • [43] Y. Li, N. Miao, L. Ma, F. Shuang, and X. Huang, “Transformer for object detection: Review and benchmark,” Engineering Applications of Artificial Intelligence, vol. 126, p. 107021, 2023.
  • [44] C. Yang, G. Ganesh, S. Haddadin, S. Parusel, A. Albu-Schaeffer, and E. Burdet, “Human-like adaptation of force and impedance in stable and unstable interactions,” IEEE transactions on robotics, vol. 27, no. 5, pp. 918–930, 2011.
  • [45] B. Albert and T. Tullis, Measuring the user experience: collecting, analyzing, and presenting usability metrics.   Newnes, 2013.