Human Movement Science
Volume 30, Issue 5, October 2011, Pages 846-868
Neuro-cognitive mechanisms of decision making in joint action: A human–robot interaction study
PsycINFO Classification
2340
3040
4140
4160
Keywords
Motor planning
Decision making
Dynamic neural fields
Joint action
1. Introduction
As an exquisitely social species, humans are experts in cooperating with others when trying to achieve the goals of a common task (Sebanz, Bekkering, & Knoblich, 2006). In our everyday social interactions, we continuously monitor the actions of our partners, interpret them in terms of their outcomes, and adapt our own motor behavior accordingly. Imagine, for instance, the joint action task of preparing a dinner table. The way a co-actor grasps a certain object (e.g., a coffee cup) or the context in which the motor act is executed (e.g., the cup may be empty or full) transmits to the observer important information about the co-actor’s intention. Depending on the grip type, for instance, she/he may want to place the cup on the table or, alternatively, may intend to hand it over. Knowing what the other is going to do should facilitate motor programmes in the observer that serve the achievement of shared goals. Fluent and efficient coordination of actions among co-actors in a familiar task requires that the preparation of an adequate complementary action be a largely automatic and unconscious process. Since several possible complementary behaviors may exist in sufficiently complex situations, this process necessarily includes a decision-making operation.
The long-term goal of our research group is to build robots that are able to interact with users in the same way as humans interact with each other in common tasks. Our research strategy to achieve this challenging objective is to develop and test control architectures that are strongly inspired by the neuro-cognitive mechanisms underlying human joint action. We believe that implementing a human-like interaction model in an autonomous robot will greatly increase the user’s willingness to work with an artificial agent, since the co-actors will become more predictable for each other (for a survey of challenges for socially interactive robots see Fong, Nourbakhsh, and Dautenhahn (2003)). Such an interdisciplinary approach not only constitutes a promising line of research towards human-centered robots but also offers unique possibilities for researchers from neuroscience and cognitive science. Synthesizing cooperative behavior in an artificial but naturally inspired cognitive system allows them, in principle, to test their theories and hypotheses about the mechanisms supporting social interactions (Dominey & Warneken, in press).
The focus of this paper is on flexible action planning and decision formation in cooperative human–robot interactions that take into account the inferred goal of the co-actor and other task constraints. An impressive range of experimental evidence accumulated over the last two decades supports the notion that a close perception-action linkage provides a basic mechanism for real-time social interactions (Newman-Norlund, Noordzij et al., 2007; Wilson & Knoblich, 2005). A key idea is that action observation leads to an automatic activation of motor representations that are associated with the execution of the observed action. It has been suggested that this resonance of motor structures supports an action understanding capacity (Blakemore & Decety, 2001; Fogassi et al., 2005; Rizzolatti et al., 2001). By internally simulating action consequences using his own motor repertoire, the observer may predict the consequences of others’ actions. Direct physiological evidence for such a perception-action matching system came with the discovery of mirror neurons, first described in the premotor cortex of the macaque monkey (for a review see Rizzolatti and Craighero (2004)). Mirror neurons are a particular class of visuomotor neurons that are active both during the observation of goal-directed actions such as reaching, grasping, holding or placing an object and during the execution of the same class of actions. Although action understanding is the dominant hypothesis about the functional role of the motor resonance mechanism, it has been suggested that it may also contribute to motor planning and action preparation. Typically it is assumed that a direct activation of the corresponding motor program explains the evidence, found in many behavioral experiments, for a tendency to automatically imitate observed actions (e.g., Brass, Bekkering, & Prinz, 2001; for a review see Wilson & Knoblich, 2005). Such a tendency is of course not beneficial for cooperative joint action, which normally requires the facilitation of a complementary motor behavior. Recent findings in neuroimaging and behavioral studies provide evidence, however, that goal and context representations may link an observed action to a different but functionally related motor response (Newman-Norlund, van Schie et al., 2007; van Schie et al., 2008). These studies clearly demonstrate that the mapping between action observation and action execution is much more flexible than previously thought.
Here we present a dynamic model that implements such a flexible perception-action linkage as a means to achieve an efficient coordination of actions and decisions between co-actors in a joint action task. We report results of our ongoing evaluation of the model as part of the control architecture of a humanoid robot that, together with a human user, assembles a toy object from its components (Bicho et al., 2008; Bicho et al., 2009).
The model is based on the theoretical framework of Dynamic Neural Fields (DNFs), which was originally proposed to explain the firing patterns of neural populations in the cortex (Amari, 1977; Wilson & Cowan, 1973; see also Grossberg (1973) for a related approach). The architecture of this model family reflects the hypothesis that strong recurrent interactions in local pools of neurons form a basic mechanism of cortical information processing. DNFs were first introduced into the motor domain as neuro-inspired models of sensorimotor decisions in simple reaching and saccadic eye movement tasks (Erlhagen & Schöner, 2002; Schöner et al., 1997; Wilimzig et al., 2006). In these applications, the dynamic fields represent parameters of the movement such as extent and direction. The neural activation patterns encoding these parameters evolve continuously in time under the influence of inputs representing sensory evidence and prior task knowledge, but they are mainly shaped by the interplay of local excitatory and long-range inhibitory interactions within the population. Due to the recurrent interactions, the patterns may become self-stabilized in the absence of any external input. Such stable states of the field dynamics reflect decisions between multiple movement alternatives, since a competition process mediated by lateral inhibition suppresses activation in neural pools that are less supported by input from external sources.
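The competition and self-stabilization described above can be illustrated with a minimal sketch of a one-dimensional field of the Amari type, integrated with the Euler method. This is not the implementation used in the robot: the kernel shape (Gaussian excitation plus a constant inhibition), the step-function output, and all names and parameter values (`N`, `TAU`, `H`, `A_EXC`, `SIGMA`, `G_INH`, `kernel`, `gaussian_input`, `step`, `simulate`) are assumptions chosen only to make the behavior visible.

```python
import math

# All values below are illustrative assumptions, not taken from the paper.
N = 60            # number of field sites (e.g. a discretized movement parameter)
TAU = 10.0        # time constant, in units of the Euler step
H = -1.0          # resting level; a site counts as active only when u > 0
A_EXC, SIGMA, G_INH = 1.0, 2.0, 0.3   # local excitation, constant inhibition


def kernel(d):
    """Interaction strength between two sites at distance d."""
    return A_EXC * math.exp(-d * d / (2.0 * SIGMA ** 2)) - G_INH


def gaussian_input(center, amplitude, width=3.0):
    """External input: a localized bump of evidence for one alternative."""
    return [amplitude * math.exp(-(x - center) ** 2 / (2.0 * width ** 2))
            for x in range(N)]


def step(u, s):
    """One Euler step of tau * du/dt = -u + h + s + sum_y w(x - y) f(u(y))."""
    active = [y for y in range(N) if u[y] > 0.0]   # step-function output f
    new_u = []
    for x in range(N):
        lateral = sum(kernel(x - y) for y in active)
        new_u.append(u[x] + (-u[x] + H + s[x] + lateral) / TAU)
    return new_u


def simulate(u, s, steps):
    for _ in range(steps):
        u = step(u, s)
    return u


# Two competing alternatives: stronger evidence at site 15, weaker at site 45.
strong = gaussian_input(15, 1.5)
weak = gaussian_input(45, 1.2)
both = [a + b for a, b in zip(strong, weak)]

u_decided = simulate([H] * N, both, 300)          # inputs present
u_memory = simulate(u_decided, [0.0] * N, 300)    # inputs removed

print("site 15 with input:", round(u_decided[15], 2))
print("site 45 with input:", round(u_decided[45], 2))
print("site 15 after input removal:", round(u_memory[15], 2))
```

In this sketch both inputs would be strong enough on their own to drive their site above threshold. Because the better supported site becomes active first, its inhibition keeps the alternative below threshold, and the winning peak persists after the input is switched off.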
Here we apply the basic concepts of DNF-models to address the problem of selecting a specific action among a set of possible complementary behaviors. The competing neural populations in the model thus encode entire object-directed motor acts like reaching, grasping, placing or combinations of these motor primitives. The high level of abstraction of the neural representations fits well with the fundamental properties of most mirror neurons, which encode the goal of an action (e.g., the effector interacts with the object in an efficient way) independent of the fine details of the movement kinematics (Rizzolatti & Craighero, 2004). Of particular interest is a class of mirror neurons that reveals a broad matching between action observation and execution (e.g., involving different effectors and/or postures), which could in principle support a flexible perception-action coupling in cooperative settings (Newman-Norlund, van Schie et al., 2007).
The DNF-model of joint action extends our previous modeling work on action understanding and goal-directed imitation (Erlhagen et al., 2006a; Erlhagen et al., 2006b; Erlhagen et al., 2007). It consists of a multi-layered network of reciprocally connected neural populations whose activation patterns represent specific task-relevant information. Decision making in the joint assembly task is implemented as a dynamic process that continuously integrates, over time, information about the inferred goal of the co-actor (obtained through motor simulation), shared knowledge about what the two actors should do (construction plan), and contextual information (e.g., the spatial distribution of objects in the working area). For generating the overt motor behavior of the robot in the joint construction scenario, we apply a posture-based motor planning and execution model. It translates the abstract decision about a complementary action (e.g., grasping an object to hold it out for the partner) into a realistic, collision-free trajectory.
The paper is organized as follows: Section 2 introduces the joint construction task and briefly describes the robotic platform used for the human–robot interaction experiments. Section 3 gives an overview of the motor planning model that we have used to generate the overt behavior of the robotic arm and hand. Section 4 gives an overview of the model for decision making in joint action. Section 5 presents the basic concepts of the Dynamic Neural Field framework and summarizes the DNF-based implementation of the joint action model. The selection of appropriate complementary actions in different joint action contexts is described in the results section (Section 6). The paper ends with a discussion of results, concepts and future research.
2. Joint construction task
To validate the dynamic field model of joint action we have chosen a task in which a robot collaborates with a human in constructing a toy object from components that are initially distributed on a table (Fig. 1).
The task requires only a limited number of different motor behaviors to be performed by the team but is complex enough to show the impact of goal inference, shared task knowledge and context on action selection. The components that have to be manipulated by the robot were designed to limit the workload for the vision and the motor system of the robot. The toy object consists of a round platform with an axle on which two wheels have to be attached and fixed with a nut. Subsequently, 4 columns have to be plugged into holes in the platform. Placing another round object on top of the columns finishes the task. It is assumed that each teammate is responsible for assembling one side of the toy. Since the working areas of the human and the robot do not overlap, the spatial distribution of components on the table obliges the team also to coordinate handing-over sequences. It is further assumed that both partners know the construction plan and keep track of the subtasks that have already been completed by the team. As part of the dynamic control architecture, the plan is implemented based on the concepts of dynamic field theory. Hand-designed connections between populations encoding subsequent subtasks define the logical order of the assembly work (for more details see Section 4). Since the desired end state does not uniquely define this order, at each stage of the construction the execution of several subtasks may be simultaneously possible. The main challenge for the team is thus to efficiently coordinate in space and time the decisions about the actions to be performed by each of the co-actors.
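The role of the construction plan can be illustrated with a small sketch in which the hand-designed links between subtasks are written as an explicit precedence table. This replaces the neural field implementation with a plain dictionary for readability. The subtask names and the exact precedence relations (for example, that each wheel has its own nut and that all columns require both wheels to be fixed) are assumptions made for the example and are not taken from the paper.

```python
# Hypothetical subtask names; each entry lists the subtasks that must be
# completed before the subtask becomes available.
PLAN = {
    "left_wheel": set(),
    "right_wheel": set(),
    "left_nut": {"left_wheel"},
    "right_nut": {"right_wheel"},
    "column_1": {"left_nut", "right_nut"},
    "column_2": {"left_nut", "right_nut"},
    "column_3": {"left_nut", "right_nut"},
    "column_4": {"left_nut", "right_nut"},
    "top": {"column_1", "column_2", "column_3", "column_4"},
}


def available_subgoals(done):
    """Subtasks not yet completed whose prerequisites are all completed."""
    return sorted(task for task, needs in PLAN.items()
                  if task not in done and needs <= done)


print(available_subgoals(set()))
print(available_subgoals({"right_wheel"}))
print(available_subgoals({"left_wheel", "right_wheel", "left_nut", "right_nut"}))
```

At most stages more than one subtask is available, which is why the plan alone does not determine who should do what next.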
For the experiments we used a robot built in our lab (Silva, Bicho, & Erlhagen, 2008). It consists of a stationary torus on which an arm with 7 degrees of freedom and a 3-fingered hand, as well as a stereo camera head, are mounted. A speech synthesizer allows the robot to communicate the result of its reasoning and decision processes to the human user. The information about object type, position and pose, as well as about the state of the construction, is provided by the camera system. The object recognition combines color-based segmentation with template matching derived from earlier learning examples (Westphal, von der Malsburg, & Würtz, 2008). The same technique is also used for the classification of object-directed, static hand postures such as grasping and of communicative gestures such as pointing or demanding an object.
3. Movement planning
For the human–robot experiments, a decision of the robot to execute a specific complementary action has to be translated into a fluent, smooth and collision-free arm trajectory. A complementary behavior may consist of a simple pointing towards a target object, but may also involve more complex movements such as grasping a component with a specific grip or attaching different components of the toy object to each other. These goal-directed movements define the motor repertoire of the robot in the construction task. To generate complete temporal motor behaviors of the robotic arm and hand we use an approach that is inspired by the posture model of Rosenbaum and colleagues (Meulenbroek et al., 2001; Rosenbaum et al., 2001). This model has been shown to generate different types of realistic movements such as reaching, grasping and manipulation of objects, and it provides an elegant obstacle avoidance mechanism. The posture model was first introduced for planar movements and was recently extended to planning in a 3D workspace (Vaughan, Rosenbaum, & Meulenbroek, 2006). A key assumption is that the planning of movements in joint space can be divided into two sub-problems: end posture selection and trajectory selection. Here we give an overview of the implementation of this two-step planning process in the robot. The model has been described in more technical detail elsewhere (Costa e Silva, Bicho, Erlhagen, & Meulenbroek, submitted). The planning system first selects a goal posture from the set of all postures that (1) allows an object to be grasped without collision with any obstacle, and (2) minimizes the displacement costs from the beginning to the end of the movement. Mathematically, the selection process can be formalized as a nonlinear constrained optimization problem. 
It is solved numerically, taking into account the information about object type, position and orientation (provided by the vision system), as well as the information represented by the activation patterns in the dynamic field model about grip type and hand orientation relative to the object. Subsequently, the trajectory is selected by computing for each of the 10 joints of the robotic arm and hand its trajectory, i.e., the time history of position, velocity and acceleration, from initial to end posture. Since the minimum jerk principle is applied (Flash & Hogan, 1985), the movement of each joint follows a bell-shaped velocity profile, resulting in a smooth straight-line movement in joint space. This joint trajectory defines the direct movement without checking whether an obstacle blocks a certain area in posture space. To detect a potential collision with an intermediate obstacle, the object to be grasped or a target object, the planning system uses direct (forward) kinematics to internally simulate movement execution from start to end. If no collision is anticipated, the movement is executed; otherwise the system searches for a feasible movement. To find this alternative, a suitable bounce posture is selected. This bounce posture serves as a subgoal for a back-and-forth movement, which is superimposed on the direct movement. The end posture that is finally reached is the same as for the direct movement; only the selected path differs, to guarantee collision avoidance. The bounce posture is found by solving a constrained optimization problem similar to the one applied for the end posture. It minimizes the displacement of the joints. To generate the movement, the desired joint position and time interval, given by the planning model, are sent to the low-level arm and hand controllers, using the high-level interface functions provided by the manufacturer (AMTEC/SCHUNK and BARRETT Technology, respectively). 
They guarantee that the planned trajectory is executed in the desired time interval. The real-time interaction experiments with human users show that the movements of the robot are perceived as smooth and goal-directed but slower compared to human motion. Moreover, a direct comparison with human data in reach to grasp tasks reveals that the generated arm and hand trajectories reflect several characteristics observed in biological motion such as for instance a biphasic tangential velocity profile or a maximum grip aperture that occurs during the second half of the movement (Lommertzen, Costa e Silva, Cuijpers, & Meulenbroek, 2008).
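The trajectory selection step can be sketched as follows, using the standard minimum-jerk position profile for a point-to-point movement (Flash & Hogan, 1985) applied independently to each joint. The shape of the back-and-forth component is an assumption: here it is built from two minimum-jerk halves that move out by the offset between bounce and start posture and return. The functions `mj`, `mj_rate`, `back_and_forth` and `plan`, the two-joint example and all numerical values are hypothetical; end posture selection, bounce posture selection and collision checking are not included.

```python
def mj(s):
    """Minimum-jerk position profile: 0 at s = 0, 1 at s = 1 (s = t / T)."""
    return 10 * s**3 - 15 * s**4 + 6 * s**5


def mj_rate(s):
    """Derivative of mj with respect to s: bell-shaped, peak at s = 0.5."""
    return 30 * s**2 - 60 * s**3 + 30 * s**4


def back_and_forth(s):
    """Assumed bounce profile: out during the first half, back in the second."""
    return mj(2 * s) if s <= 0.5 else mj(2 * (1 - s))


def plan(start, end, duration, steps, bounce=None):
    """Return (time, joint_angles) samples from the start to the end posture."""
    samples = []
    for i in range(steps + 1):
        s = i / steps
        posture = []
        for j in range(len(start)):
            # Direct movement: a straight line in joint space.
            q = start[j] + (end[j] - start[j]) * mj(s)
            if bounce is not None:
                # Superimposed detour; it vanishes at s = 0 and s = 1,
                # so the end posture is unchanged.
                q += (bounce[j] - start[j]) * back_and_forth(s)
            posture.append(q)
        samples.append((s * duration, posture))
    return samples


start, end = [0.0, 1.0], [1.0, -1.0]      # two illustrative joints (radians)
direct = plan(start, end, duration=2.0, steps=100)
detour = plan(start, end, duration=2.0, steps=100, bounce=[0.5, 1.5])

print("direct end posture:", direct[-1][1])
print("detour end posture:", detour[-1][1])
print("detour mid posture:", detour[50][1])
```

The joint velocity of the direct movement is `(end[j] - start[j]) / duration * mj_rate(s)`, which gives the bell-shaped profile mentioned above. Both trajectories end in the same posture and differ only in the path taken.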
4. Model overview
Fig. 2 presents a sketch of the DNF-based architecture for decision making in cooperative joint action. It consists of various layers each containing one or more neural populations encoding information specific to the construction task (a detailed description of the labels in each layer is given in the Supplemental material). The lines indicate the connectivity between individual populations in the network. Basically, the architecture implements a flexible mapping from an observed action of the co-actor onto a complementary motor behavior.
The multi-layered architecture extends a previous DNF-model of the STS-PF-F5 mirror circuit of the monkey (Erlhagen et al., 2006a) that is believed to represent the neural basis for a matching between the visual description of an action in area STS and its motor representation in area F5 (Rizzolatti & Craighero, 2004). This circuit supports a direct and automatic imitation of an action performed by another individual. Importantly for joint action, however, the model also allows for a flexible perception-action coupling by exploiting the existence of object-directed action chains in the middle layer PF (Fogassi et al., 2005) that are linked to the representations of their final goals or outcomes in prefrontal cortex (PFC). The automatic activation of a particular chain during action observation (e.g., reaching-grasping-placing) drives the connected goal representation, which in turn may bias the decision processes in layer F5 towards the selection of a complementary rather than an imitative action. Consistent with this model prediction, a class of mirror neurons has been reported in F5 for which the effective observed and effective executed actions are logically related (e.g., implementing a matching between placing an object on the table and bringing the object to the mouth (di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992)). For the robotics work we refer to the three layers of the matching system as the action observation (AOL), action simulation (ASL) and action execution (AEL) layers, respectively. An observed object-related hand movement that is recognized by the vision system as a particular primitive is represented in AOL. In the action simulation layer (ASL), populations encode entire chains of action primitives that are in the motor repertoire of the robot (e.g., reaching-grasping-placing/attaching a particular part) or communicative hand gestures (e.g., pointing towards or requesting a part). 
They are connected to population representations of the associated end states or goals in the intention layer IL (e.g., right wheel attached). The activation of a particular chain during action observation thus allows the robot to predict the co-actor’s motor intentions by internally simulating the action outcomes. Very often, however, the observation of a particular motor act alone (e.g., grasping) is not sufficient to make this prediction since the motor act may be part of several chains. To solve this ambiguity, the neural populations in ASL get additional inputs from connected populations representing situational context and/or prior task knowledge about what the co-actor should do in a particular situation. An important contextual cue is the spatial distribution in the workspace of parts necessary for the assembly work. The object memory layer (OML) encodes memorized information about the position of these parts in each of the two working areas, separately for each object type. The common subgoals layer (CSGL) encodes the currently available subgoals as well as the subtasks that have been already accomplished by the team. The available subgoals are continuously updated based on feedback from the vision system in accordance with the construction plan. The information about the sequential order in which subtasks have to be accomplished (e.g., attach right wheel first and subsequently fix it with a nut) is encoded in the synaptic links between populations representing these subgoals in two different neural fields (indicated with labels ‘present’ and ‘next’ in layer CSGL, see Fig. 2). Input from the vision system signaling the achievement of a certain subtask activates the respective population representation in the first layer which in turn drives through the connections the populations representing the next possible assembly steps in the second layer. 
To guarantee pro-active behavior of the robot, the model implements the possibility to update the current subgoals also based on input from IL representing the predicted motor intention of the co-actor. This allows the robot to start preparing an action serving a subsequent goal (e.g., transferring a nut to the co-actor for fixing the wheel) ahead of the realization of the preceding subtask (e.g., co-actor is going to attach right wheel). In the action execution layer (AEL), populations that encode the same action sequences and communicative gestures as the ASL compete for expression in overt behavior. They integrate input from the IL, OML and CSGL.
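The competition in the action execution layer can be caricatured with one activation variable per candidate action. Each candidate sums the input it receives from IL, CSGL and OML and inhibits its competitors once it is active. This is a strongly reduced sketch, not the field model of the architecture: the candidate actions, the binary support pattern, the input strengths in `WEIGHTS` and the function `select_action` are all hypothetical, and self-excitation is omitted. The scenario assumes that the co-actor is predicted to attach the right wheel and that a nut lies in the robot's working area.

```python
TAU, H = 10.0, -1.0      # time constant (in Euler steps) and resting level
W_INH = 1.0              # inhibition received from each active competitor
WEIGHTS = {"IL": 0.6, "CSGL": 0.5, "OML": 0.4}   # assumed input strengths

# Assumed support pattern: 1 means the layer currently supports the action.
CANDIDATES = {
    "hand over nut":      {"IL": 1, "CSGL": 1, "OML": 1},
    "attach left wheel":  {"IL": 0, "CSGL": 1, "OML": 0},
    "request left wheel": {"IL": 0, "CSGL": 1, "OML": 1},
}


def select_action(candidates, steps=200):
    """Integrate the inputs over time and return (winner or None, activations)."""
    names = list(candidates)
    drive = {n: sum(WEIGHTS[layer] * on for layer, on in candidates[n].items())
             for n in names}
    u = {n: H for n in names}
    for _ in range(steps):
        active = [n for n in names if u[n] > 0.0]   # fixed for this time step
        for n in names:
            inhibition = W_INH * sum(1 for m in active if m != n)
            u[n] += (-u[n] + H + drive[n] - inhibition) / TAU
    winners = [n for n in names if u[n] > 0.0]
    return (winners[0] if len(winners) == 1 else None), u


winner, activation = select_action(CANDIDATES)
print("selected complementary action:", winner)
```

Only the candidate supported by all three layers crosses the threshold in this example. If no candidate receives sufficient input, the function returns `None`, which corresponds to no action being selected.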
To give an example of the dynamic decision making process implemented in the field architecture, consider the situation in which the co-actor reaches towards a wheel in his working area. The wheel on his construction side has already been attached, but not the wheel on the side of the observer. The available information about active and already accomplished subtasks together with the observed hand motion automatically activates the chain representation of a ‘reach–grasp wheel–handover’ behavior in the ASL, which subsequently activates the motor intention ‘handover wheel’ in the IL. As a consequence, the robot may prepare to receive the wheel already at the time of the reaching. Now imagine that the same motor act is observed at the start of the construction, that is, the wheel on the co-actor’s side has not been attached yet. Consequently, specific input from CSGL in support of the object transfer hypothesis is missing. In this case, input from the AOL representing the type of the observed grasping behavior (top versus side grip) may decide which of the two possible chains associated with different motor intentions becomes activated. A top grip (i.e., a grip from above) is usually used for directly attaching the wheel, whereas grasping from the side is the most secure way to hand over the wheel to the partner. In the latter case, the robot will prepare a complementary grasping behavior to receive the wheel. In the former case, an adequate complementary behavior of the robot might be to reach for a wheel in its workspace to attach it on its construction side.
5. Model details: Basic concepts of dynamic neural field theory
Each layer of the model is formalized by one or more Dynamic Neural Fields (DNFs). The DNFs implement the idea that task-relevant information about action goals, action primitives or context is encoded by means of activation patterns of local populations of neurons. In the action observation layer, for instance, a high level of activation of a neural pool representing a certain grasping behavior means that the specific grip type has been detected and classified by the vision system, whereas a low activation level indicates that information about the specific grip type is currently not processed (see Fig. 3, top).
As shown in the bottom panel of Fig. 3, the activation patterns are initially triggered by input from connected populations and/or sources external to the network, such as the vision system. In the example, the input to the population representing an above grip is stronger than the input to the population representing a side grip.
The activation patterns may become self-sustained in the absence of any input due to the recurrent interactions within the local populations. To guarantee the existence of self-stabilized solutions of the field dynamics, the pattern of recurrent interactions between cells must be spatially structured with excitation dominating at small and inhibition at larger distances. The distance between neurons may be defined in anatomical space (as in the original work by Wilson and Cowan (1973) and Amari (1977)) or in a more abstract space given by some feature dimension that the neurons encode (e.g., movement direction or direction in visual space). For high-dimensional spaces representing for instance different grasping behaviors the metric distance is not directly observable. However, it may still be defined operationally by the degree of overlap between their neural representations. For functionally distinct behaviors associated with the achievement of different goals, we assume that they are represented by separate neural sub-populations (all able to self-sustain high levels of activation) located at a distance that guarantees a purely inhibitory interaction between the neurons of these pools.
For the modeling we employed a particular form of a DNF first analyzed by Amari (1977). In each model layer i, the activity $u_i(x,t)$ at time t of a neuron at field location x is described by the following integro-differential equation (for an overview of analytical results see Erlhagen and Bicho (2006)):

$$\tau_i \frac{\partial u_i(x,t)}{\partial t} = -u_i(x,t) + S_i(x,t) + \int w_i(x-x')\, f_i\big(u_i(x',t)\big)\, dx' - h_i \qquad (1)$$

where $\tau_i$ and $h_i$ define the time scale and the resting level of the field dynamics, respectively, and $S_i(x,t)$ denotes the summed input to the field. The integral term describes the intra-field interactions. It is assumed that the interaction strength, $w_i(x-x')$, between any two neurons x and x' depends only on the distance between field locations. For the present implementation we used the following weight function of lateral-inhibition type (Fig. 4b):

$$w_i(x) = A_i \exp\!\left(-\frac{x^2}{2\sigma_i^2}\right) - w_{\mathrm{inhib},i} \qquad (2)$$

where $A_i$ and $\sigma_i$ describe the amplitude and the standard deviation of a Gaussian, respectively. For simplicity, the inhibition is assumed to be constant, $w_{\mathrm{inhib},i}$, meaning that integration of inhibitory input does not change with distance between field sites. Only sufficiently activated neurons contribute to interaction. The threshold function $f_i(u)$ is chosen of sigmoidal shape with slope parameter β and threshold $u_0$ (Fig. 4a):

$$f_i(u) = \frac{1}{1 + \exp\!\big(-\beta\,(u - u_0)\big)} \qquad (3)$$
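The field equation (1), the weight function (2) and the threshold function (3) can be integrated numerically in a straightforward way. The following is a minimal sketch, assuming an explicit Euler scheme and a Riemann-sum approximation of the interaction integral; the function names, the argument order and the discretization are illustrative choices and are not taken from the original implementation.

```python
import math

def sigmoid(u, beta, u0):
    """Sigmoidal threshold function with slope beta and threshold u0."""
    return 1.0 / (1.0 + math.exp(-beta * (u - u0)))

def kernel(dist, amp, sigma, w_inhib):
    """Gaussian excitation (amplitude amp, width sigma) minus constant inhibition."""
    return amp * math.exp(-dist * dist / (2.0 * sigma * sigma)) - w_inhib

def euler_step(u, inp, dt, dx, tau, h, beta, u0, amp, sigma, w_inhib):
    """Advance the field activation u (a list of floats) by one time step dt.

    inp holds the summed input at each field site, dx is the site spacing,
    and -h is the resting level of the field."""
    n = len(u)
    rates = [sigmoid(v, beta, u0) for v in u]
    new_u = []
    for i in range(n):
        # Riemann-sum approximation of the interaction integral.
        interaction = dx * sum(
            kernel((i - j) * dx, amp, sigma, w_inhib) * rates[j] for j in range(n)
        )
        du = (-u[i] + inp[i] + interaction - h) / tau
        new_u.append(u[i] + dt * du)
    return new_u
```

The double loop costs O(n²) per step, which is acceptable for small fields; a convolution routine would be the natural replacement for larger ones.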
As illustrated in Fig. 5, the model parameters are adjusted to guarantee that the field dynamics is bi-stable (Amari, 1977), that is, the attractor state of a self-stabilized activation pattern (dashed-dotted line) coexists with a stable homogeneous activation distribution (solid line) that represents the absence of specific information (resting level). If the summed input to a local population is sufficiently strong to drive the field activation beyond a certain threshold, the homogeneous state loses stability and a localized pattern in the dynamic field evolves. Weaker external signals lead to a subthreshold, input-driven activation pattern (dashed line) in which the contribution of the interactions is negligible.
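Bi-stability can be illustrated with a deliberately reduced model: a single self-excitatory population instead of a spatially extended field. This reduction, the function name `simulate` and all parameter values are assumptions made for illustration only. A transient input is applied and then switched off; the sketch reports where the activation settles.

```python
import math

def simulate(input_strength, input_duration=5.0, total_time=20.0, dt=0.05,
             tau=1.0, h=1.0, w_self=2.0, beta=10.0, u0=0.0):
    """Single self-excitatory population driven by a transient input.

    Returns the activation at total_time, long after the input has ended."""
    u = -h  # start at resting level
    steps = int(round(total_time / dt))
    for n in range(steps):
        t = n * dt
        s = input_strength if t < input_duration else 0.0
        rate = 1.0 / (1.0 + math.exp(-beta * (u - u0)))
        u += dt * (-u - h + w_self * rate + s) / tau
    return u

weak = simulate(0.3)    # subthreshold input: activation falls back to rest
strong = simulate(1.5)  # suprathreshold input: activation sustains itself
print("after weak input:  ", round(weak, 3))
print("after strong input:", round(strong, 3))
```

With these values the weak input leaves no trace, whereas the strong input switches the population into a state that persists without input, mirroring the coexistence of the resting state and the self-stabilized pattern described above.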
Normally, a constant input from a single population alone does not drive directly connected populations. It may nevertheless play an important role for the processing in the joint action circuit. The preshaping by weak input brings populations closer to the threshold for triggering the self-sustaining interactions and thus biases the decision processes linked to behavior. Much like prior distributions in the Bayesian sense, multi-modal patterns of subthreshold activation in, for instance, the action execution layer (AEL) may represent the probability of different complementary actions (Erlhagen & Schöner, 2002).
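The effect of preshaping can be made explicit in the subthreshold regime, where the lateral interactions are negligible. The following worked equation is a sketch under that assumption. The notation is chosen here for illustration: $-h$ is the resting level, $u_0$ the threshold, $S_{\mathrm{pre}}$ the weak preshaping input and $S$ an additional input.

```latex
% Subthreshold dynamics of a population without lateral interactions
\tau \frac{du}{dt} = -u - h + S_{\mathrm{pre}} + S
\quad\Longrightarrow\quad
u^{*} = -h + S_{\mathrm{pre}} + S

% The steady state u^* exceeds the threshold u_0 only if
S > u_0 + h - S_{\mathrm{pre}}
```

Without preshaping the additional input has to exceed $u_0 + h$; with preshaping this requirement is lowered by exactly $S_{\mathrm{pre}}$, which is one way to read the statement that weak input biases the decision.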
The input to a field is the sum of the contributions from all connected fields, scaled by a parameter k. This parameter scales the total input relative to the threshold for triggering a self-sustained pattern, which guarantees that the inter-field coupling is weak and that the field dynamics is dominated by the recurrent interactions. The input from each connected field is modeled by Gaussian functions. As shown in Fig. 6, the input from a connected population j in a source layer to a target population m in the receiving layer is a Gaussian, and it is applied whenever the activation in population j is beyond the threshold for a self-stabilized activation peak. The total input from all subpopulations of the source field to the target field is given by Eq. (4), which combines two factors for each pair (j, m): a function signaling the existence or evolution of a self-sustained activation pattern centered at the position of subpopulation j (i.e., signaling that subpopulation j is active), and the strength of the inter-field synaptic connection from subpopulation j in the source field to subpopulation m in the target field.
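The gated Gaussian coupling described in this paragraph can be written down compactly. The sketch below is one plausible reading of that description, not a transcription of Eq. (4): the function name `interfield_input`, the binary activity flags and the use of a single width `sigma` for all Gaussians are assumptions.

```python
import math

def interfield_input(x, source_active, weights, target_centers, sigma, k):
    """Input at position x of the target field from one connected field.

    source_active[j] is 1 if subpopulation j of the source field carries a
    self-stabilized peak, else 0. weights[j][m] is the connection strength
    from source subpopulation j to target subpopulation m, whose Gaussian
    input profile is centered at target_centers[m]. k scales the total."""
    total = 0.0
    for j, active in enumerate(source_active):
        for m, center in enumerate(target_centers):
            gauss = math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
            total += active * weights[j][m] * gauss
    return k * total

# Two source subpopulations, two target subpopulations centered at 0 and 10.
weights = [[1.0, 0.0], [0.0, 2.0]]
print(interfield_input(0.0, [1, 0], weights, [0.0, 10.0], sigma=1.0, k=0.5))
```

Keeping k small relative to the ignition threshold is what lets such input preshape a target population without triggering a peak on its own.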
The existence of a self-stabilized activation peak in a dynamic field is closely linked to decision making. In layers ASL, IL and AEL subpopulations encoding different chains (ASL), goals (IL) and complementary actions (AEL), respectively, interact through lateral inhibition. Fig. 7 shows the temporal evolution of two competing populations encoding two different actions. The inhibitory interactions cause the suppression of activity below resting level in competing neural pools whenever a certain subpopulation becomes activated above threshold in response to external input.
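The competition between two populations can be illustrated with a reduced two-unit model in which each unit excites itself and inhibits the other. This reduction, the function name `compete` and all parameter values are assumptions chosen for illustration. Both inputs would be strong enough to ignite a peak in isolation, so the outcome is decided by the interaction.

```python
import math

def compete(s1, s2, total_time=20.0, dt=0.05, tau=1.0, h=1.0,
            w_exc=2.0, w_inh=2.0, beta=10.0, u0=0.0):
    """Two populations with self-excitation and mutual inhibition.

    s1 and s2 are constant inputs. Returns the final activations (u1, u2)."""
    def rate(u):
        return 1.0 / (1.0 + math.exp(-beta * (u - u0)))

    u1 = u2 = -h  # both start at resting level
    for _ in range(int(round(total_time / dt))):
        f1, f2 = rate(u1), rate(u2)
        du1 = (-u1 - h + w_exc * f1 - w_inh * f2 + s1) / tau
        du2 = (-u2 - h + w_exc * f2 - w_inh * f1 + s2) / tau
        u1, u2 = u1 + dt * du1, u2 + dt * du2
    return u1, u2

u1, u2 = compete(1.2, 1.0)
print("population with stronger input:", round(u1, 3))
print("population with weaker input:  ", round(u2, 3))
```

The population receiving the slightly stronger input ends above threshold, while its competitor is pushed below the resting level of -h despite its own input, which is the suppression effect shown in Fig. 7.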
The attractor state of a self-sustained activation peak can be used to implement a working memory function, which requires sufficiently small values of the global inhibitory parameter. Conversely, for sufficiently large values of this parameter an existing suprathreshold self-sustained activation becomes unstable and the field activation decays back to resting level (the field becomes mono-stable). To implement these working memory versus forgetting mechanisms we defined a suitable dynamics for the global inhibitory parameter (Bicho, Mallet, & Schöner, 2000), given in Eq. (5).
This dynamics increases the global inhibitory parameter toward an upper value (destabilizing memory) on a slow time scale while a suprathreshold pattern of activation exists. It restores the parameter to a lower value (enabling working memory) on a second time scale. The restoring process is much faster than the destabilizing process, so that quickly after forgetting, the field dynamics is again able to self-sustain (i.e., memorize) a suprathreshold pattern of activation. The presence of a self-sustained peak is signaled by an indicator function (Eq. 6) built from a Heaviside step function H(·) applied to the total suprathreshold activation in the field (Eq. 7) and to the total external input activation to the field (Eq. 8).
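The forgetting mechanism can be sketched as a two-rate relaxation of the inhibitory parameter, gated by a peak indicator. The exact form of Eqs. (5) and (6) is not reproduced here. The sketch assumes that a self-sustained peak is flagged when suprathreshold activation is present while external input is absent, and all names (`peak_indicator`, `inhibition_step`, `h_low`, `h_high`, `tau_destab`, `tau_restore`, `eps`) are hypothetical.

```python
def heaviside(v):
    """Heaviside step function: 1 for positive arguments, else 0."""
    return 1.0 if v > 0.0 else 0.0

def peak_indicator(total_activation, total_input, eps=1e-6):
    """Assumed indicator: 1 if the field is active without external input."""
    return heaviside(total_activation - eps) * (1.0 - heaviside(total_input - eps))

def inhibition_step(h, peak, dt, h_low, h_high, tau_destab, tau_restore):
    """One Euler step of the assumed dynamics of the inhibitory parameter h.

    While a self-sustained peak exists (peak == 1), h drifts slowly toward
    h_high; otherwise it relaxes quickly back to h_low."""
    dh = peak * (h_high - h) / tau_destab + (1.0 - peak) * (h_low - h) / tau_restore
    return h + dt * dh

# A memorized peak slowly raises h; once the peak is gone, h resets quickly.
h = 1.0
for _ in range(100):
    h = inhibition_step(h, peak_indicator(5.0, 0.0), 0.1, 1.0, 3.0, 10.0, 1.0)
print("h while a peak is memorized:", round(h, 3))
```

Choosing `tau_restore` much smaller than `tau_destab` reproduces the asymmetry stated above: forgetting is gradual, while the ability to memorize returns quickly.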
In the OML and the CSGL, self-sustained peaks encode memorized information about the location of the relevant parts in the two working areas and about the already achieved and currently available subgoals, respectively. Since multiple potential target objects and subgoals may exist simultaneously, the field dynamics must support multi-peak solutions. Although specific lateral interaction functions exist that support such solutions (Laing, Troy, Gutkin, & Ermentrout, 2002), for simplicity we implemented in these layers (i = CSGL, OML) synaptic weight functions with limited spatial ranges of lateral inhibition to avoid competition (Eq. 9).
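One simple way to limit the spatial range of lateral inhibition is to cut the weight function off beyond a fixed distance. The sketch below illustrates that idea only; it is not a transcription of Eq. (9), and the function name `limited_range_kernel` and the parameter `inhib_range` are assumptions.

```python
import math

def limited_range_kernel(dist, amp, sigma, w_inhib, inhib_range):
    """Gaussian excitation minus constant inhibition, zero beyond inhib_range."""
    if abs(dist) > inhib_range:
        return 0.0  # distant sites do not interact, so peaks can coexist
    return amp * math.exp(-dist * dist / (2.0 * sigma * sigma)) - w_inhib

# Nearby sites excite or inhibit each other; sites farther apart are decoupled.
for d in (0.0, 2.0, 5.0):
    print(d, limited_range_kernel(d, amp=2.0, sigma=1.0, w_inhib=0.5, inhib_range=3.0))
```

Because populations separated by more than `inhib_range` exert no inhibition on each other, several peaks can be memorized at the same time, as required for multiple objects and subgoals.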
6. Results
In the following we present results of real-time human–robot interactions in the joint construction scenario. It is assumed that both actors know the construction plan and are able to perceive the state of the construction. Each actor thus knows in principle what assembly steps the team still has to perform. The examples are chosen to illustrate the impact of action observation and context on action selection from the perspective of the robot. In each case, the initial spatial distribution of objects in the two working areas obliges the robot to continuously monitor and interpret the actions of its human partner, since actions such as handing-over or demanding objects are necessarily involved. A detailed discussion of the goal inference capacity is not within the scope of the present paper. We therefore focus on examples at the beginning of the joint assembly work for which action understanding is straightforward (for details of the motor simulation mechanism see Erlhagen et al. (2006a) and Erlhagen et al. (2007)).
The connections between the neural populations in the various fields are hand-coded, meaning that the different inputs that may bias the selection of a particular complementary action are pre-defined. The robot nevertheless shows flexibility in its behavior, since in the complex dynamical system of interacting populations the decision for a certain action depends on both the informational content of the various inputs to populations in AEL and their timing. Changes in the time course of activity in a connected field, due for instance to competition between neural pools or to noisy input data, may thus affect which complementary behavior the robot selects.
As summarized in Table 1, the total number of goal-directed sequences and communicative gestures that represent relevant complementary behaviors in AEL is restricted to 9 alternatives. At any point of time of the human–robot interaction only a few of these alternatives are simultaneously possible, that is, are supported by input from connected layers. It is important to stress, however, that the dynamic decision making implemented in AEL also works for more complex situations with a larger set of possible response alternatives (e.g., at later stages of the construction process). In line with the classical Hick–Hyman law, the number of alternatives only affects the time it takes to stabilize a peak solution representing a decision in the dynamic neural field (Erlhagen & Schöner, 2002).
| Action | Sequence of motor primitives | Short description |
|---|---|---|
| | Reach wheel → grasp → attach | Attach wheel |
| | Reach wheel → grasp → handover | Give wheel |
| | Reach hand → grasp wheel → attach | Receive wheel to attach |
| | Reach nut → grasp → attach | Attach nut |
| | Reach nut → grasp → handover | Give nut |
| | Reach hand → grasp nut → attach | Receive nut to attach |
| | Hold out hand | Request piece |
| | Point to wheel | Point to wheel |
| | Point to nut | Point to nut |
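For reference, the Hick–Hyman law mentioned above is commonly written as a linear relation between mean reaction time and the information content of the choice. The form below is the usual textbook version for equally likely alternatives; the constants a and b are empirical and are not taken from the source.

```latex
% Mean reaction time for a choice among n equally likely alternatives
\mathrm{RT} = a + b \log_2 n

% Example: from n = 2 to n = 9 alternatives
\log_2 2 = 1 \ \text{bit}, \qquad \log_2 9 \approx 3.17 \ \text{bits}
```

Under this form, going from two to all nine alternatives of Table 1 would lengthen the decision time by roughly 2.17 b, while the decision mechanism itself stays the same.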
Videos of the human–robot experiments can be found at http://dei-s1.dei.uminho.pt/pessoas/estela/JASTvideos.htm. The robot uses speech to communicate the outcome of the goal inference and decision making processes implemented in the dynamic field model to the human co-actor. As our studies with naive users show, this basic form of verbal communication facilitates natural and fluent interaction with the robot (Bicho, Louro, & Erlhagen, 2010).
Numerical values for the dynamic field parameters and synaptic inter-field connections may be found in the Supplemental material.
6.1. Same observed action, different complementary behaviors
According to the plan, the construction starts with attaching the two wheels and subsequently fixing each of them with a nut. Fig. 8 (video snapshots), Fig. 9 (DNF in AEL) and Fig. 10 (DNF in IL) show that the same observed gesture may have a different interpretation depending on the context in which it occurs, and thus leads to a different complementary action.
In this experiment the two wheels and one nut are initially located in the workspace of the robot. The robot may thus decide to choose an action that serves one of two possible subtasks: attaching the wheel on its side or transferring a wheel to the partner so that he can attach it on his side. As shown in snapshots S2-S3, the co-actor first reaches his empty hand towards the robot.
The robot interprets this gesture as a request for a wheel since attaching a wheel on the side of the co-actor is still a valid subtask. As a consequence, both possible complementary actions in AEL, attaching the wheel on its own side and handing a wheel over, are initially supported by excitatory input from the connected layers OML, IL and CSGL. As can be seen from the suprathreshold activation peak at the corresponding field position, the robot decides to first serve the human by grasping the wheel for handing it over (snapshots S4-S5). Later (see snapshots S6-S7) the robot selects and performs the action sequence associated with the achievement of its own subtask. The reason for this ‘social’ attitude of the robot is that the synaptic connections from the intention layer IL to AEL are slightly stronger than the synaptic links between representations in CSGL and AEL. As a consequence, the decision process is biased by the stronger input from the representation of the request. After both wheels have been attached to the platform, the human partner again demands an object (at time ≈78 s). This time the robot interprets the hand gesture as demanding a nut since the associated subtask representation in CSGL (‘fix wheel with nut’) has become active (see snapshot S8). Fig. 10 illustrates the evolution of an activation peak in IL representing the inferred goal of the co-actor. However, fulfilling the request would not support efficient joint task execution since both co-actors have exactly one nut in their working areas. The information about the spatial distribution of the nuts represented in OML inhibits the representation of the complementary action that would lead to the transfer of the nut and triggers instead the selection of a pointing movement as the most adequate motor behavior. Pointing towards the nut in the partner’s workspace is an efficient way to attract attention and communicate the error to the co-actor (snapshots S9-S10). As illustrated by snapshots S11-S12 and the field activation after time 99 s, the robot subsequently decides to reach and grasp a nut with the purpose of fixing the wheel on its construction side.
In this example pointing is an appropriate complementary behavior since the human partner could reach the nut that he had seemingly overlooked probably because of the presence of an obstacle. The situation would be different if the obstacle not only reduces the visibility of the nut but also makes it impossible for the user to grasp it. In this case, removing the obstacle or grasping the object to hold it out for the co-actor would be appropriate complementary actions (if the object could be reached by the robot). In the model, additional input from a population encoding the presence of an obstacle to the AEL could bias the decision process towards the selection of one of these behaviors. Interestingly, a class of mirror neurons has been recently described that are differently modulated by the location in space of observed motor acts relative to the monkey (Caggiano, Fogassi, Rizzolatti, Thier, & Casile, 2009). These are “grasping” neurons that become active when the experimenter places an object in the monkey’s peripersonal space. In the experiments an obstacle was introduced that changed the properties of the mirror neurons according to the possibility that the monkey was able to interact with the object. The authors interpret their findings as further evidence for the hypothesis that mirror neurons encode observed actions for subsequent different types of behavior either direct grasping or intermediate steps like approaching the observed agent or removing the obstacle. This interpretation fits nicely to the highly context-sensitive mapping of observed actions onto executed actions implemented by the dynamic field model.
6.2. Anticipatory action selection
In the second experiment the robot has two nuts and one wheel in its workspace. The human partner has a wheel in his workspace and thus decides to start the assembly work by reaching and grasping the wheel on his side to directly attach it (Fig. 11). As can be seen in Fig. 11a, before the co-actor starts the movement the input to the decision field of the robot from the task representations in CSGL supports the selection of the action sequence associated with a subgoal that the robot can achieve alone. However, immediately after motion onset the observed reaching behavior triggers a motor simulation process in ASL that anticipates that the co-actor most likely is going to attach the wheel on his side. Fig. 11a shows that a new input to the decision field appears in the period between 10 and 14 s that is associated with the achievement of a future goal of the co-actor (‘fix wheel with nut’) represented in CSGL. As in the preceding example, the robot is supposed to serve first the observed or anticipated needs of the human user. Since the input supporting a handing-over sequence is stronger than the input supporting the goal-directed sequence that the robot could perform alone, the robot decides at about time 18 s to reach and grasp a nut to hold it out for the co-actor (compare Fig. 11b and the snapshots in Fig. 11c). As discussed in Section 4, the predictive rather than reactive updating of common subgoals in CSGL is automatically triggered by the input from the representation of the inferred goal in IL.
6.3. Timing of actions matters
The last experiment highlights that the timing of actions is important for the coordination of decisions and intentions among the actors. Normally, the faster actor takes the lead in the joint decision process and the slower actor follows by choosing actions that complement the observed ones. In the example shown in Fig. 12 the wheel on the co-actor’s construction side has already been attached. The robot has two nuts in its workspace, whereas the co-actor has a wheel within his reach. The trial starts with the co-actor requesting an object (see snapshot S1). The robot infers that he wants to fix the wheel with a nut and decides to hand over the nut first (at time ≈14 s). Subsequently, the robot requests a wheel from the co-actor (at time ≈50 s). The first handing-over sequence thus appears to be initiated by the human, whereas the robot leads the transfer of the second object. The snapshots in Fig. 12c illustrate this sequence of human–robot interactions. Concerning the state of the construction and the initial distribution of objects in the two working areas, the situation illustrated in Fig. 13c is exactly the same as in Fig. 12c. The only difference is that the co-actor now quickly tries to serve the needs of the robot after having fixed the wheel on his side with a nut. As shown in snapshots S4 and S5, the co-actor reaches for and grasps a wheel with the intention of holding it out for the robot. Since motor simulation enables the robot to predict the consequences of the ongoing action at the time of the grasp, it may prepare for receiving the wheel. The input associated with the respective action sequence appears at about 45 s (compare Fig. 12a). It is stronger than the input supporting the request of a wheel and thus biases the decision process of the robot.
7. Discussion
Decision making refers to the process of selecting a particular action from a set of alternatives. When acting alone an individual may choose a motor behavior that best serves a certain task based on the integration of sensory evidence and prior task knowledge. In a social context, this process is more complex since the outcome of one’s decisions can be influenced by the decisions of others. A fundamental building block of social interaction is thus the capacity to predict and understand actions of others. This allows an individual to select and prepare an appropriate motor response which may range from cooperative to competitive (Sebanz et al., 2006).
Here we have presented a dynamic neural field model of decision making in a joint action task. In its multi-layered architecture the model reflects mechanisms that are believed to support the remarkably efficient and fluent interaction of humans in cooperative tasks. Most importantly, the model implements a highly context-sensitive mapping of observed actions onto to-be-executed complementary actions. As our real-world joint construction experiments show, the robot responds differently to the same observed behavior depending on the context in which it occurs. This is in line with a growing body of experimental evidence supporting the notion that the matching of observed and executed actions in the mirror circuit is much more flexible than previously thought (Newman-Norlund, van Schie et al., 2007, van Schie et al., 2008). As the representations of context, goals and goal-directed action sequences are interconnected, the dynamic model explains how the observation of a motor act together with situational cues may directly activate the self-sustained population representations of the associated goal and the most appropriate complementary action. This ‘automatic’ process suggests that in known task settings the coordination of actions, goals and intentions in space and time between co-actors may occur rather effortlessly and does not require a fully developed human capacity for conscious control (Hassin et al., 2005, Ferguson and Bargh, 2004).
The theoretical framework of dynamic neural fields was first introduced to the motor domain to model metric and dynamic aspects of motor planning and decision making found in neurophysiological and behavioral studies (for a recent review see Schöner (2008)). DNF-based models implement two basic ideas: (1) movement plans evolve continuously in time and are updated at any time during movement preparation as a function of new sensory evidence, and (2) the brain performs action selection and motor planning in an integrated manner (Gold & Shadlen, 2002). The activity in a neural population representing a certain decision variable increases continuously in time as a result of accumulated evidence represented by input from connected populations. Once a certain activation threshold is reached, the integration process is over and the system is committed to a decision. In the model this commitment is paralleled by a transition from an input-driven to a self-stabilized regime of the field dynamics. The decision variable may represent simple movement parameters such as direction and extent (e.g., Erlhagen & Schöner (2002)) or, as in the present application, complete temporal behaviors such as grasping or entire action sequences composed of action primitives. The discovery of mirror neurons in premotor cortex suggests that neural populations encoding very different levels of abstraction coexist in motor planning areas (Rizzolatti & Craighero, 2004).
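The accumulation-to-threshold idea and the switch from an input-driven to a self-stabilized regime can be illustrated with a minimal sketch. It is not the architecture used on the robot: it reduces a field to two discrete populations with self-excitation and mutual inhibition, and every parameter value (resting level, weights, sigmoid steepness, input strengths, the threshold at 0) is an assumption chosen only for illustration.

```python
import math

# All values below are illustrative assumptions, not parameters from the paper.
TAU = 1.0     # time constant (arbitrary units)
DT = 0.01     # Euler integration step
H = -2.0      # resting level, below the decision threshold of 0
W_EXC = 3.0   # self-excitation of each population
W_INH = 3.0   # mutual inhibition between the two populations
BETA = 5.0    # steepness of the sigmoid output function


def f(u):
    """Sigmoid output function of a population."""
    return 1.0 / (1.0 + math.exp(-BETA * u))


def simulate(s1, s2, t_off=10.0, t_end=20.0):
    """Integrate two competing populations with forward Euler.

    Inputs s1 and s2 are applied for t < t_off and then removed.
    Returns (u1, u2, winner, decision_time), where winner is the index
    of the population that first reaches the threshold 0.
    """
    u = [H, H]
    winner, decision_time = None, None
    for k in range(int(round(t_end / DT))):
        t = k * DT
        s = (s1, s2) if t < t_off else (0.0, 0.0)
        out = (f(u[0]), f(u[1]))
        du0 = (-u[0] + H + s[0] + W_EXC * out[0] - W_INH * out[1]) / TAU
        du1 = (-u[1] + H + s[1] + W_EXC * out[1] - W_INH * out[0]) / TAU
        u = [u[0] + DT * du0, u[1] + DT * du1]
        if winner is None and max(u) >= 0.0:
            winner = 0 if u[0] >= u[1] else 1
            decision_time = t + DT
    return u[0], u[1], winner, decision_time


if __name__ == "__main__":
    u1, u2, winner, t_dec = simulate(2.5, 2.2)
    print("winner:", winner, "decision time:", round(t_dec, 2))
    print("activation after input removal:", round(u1, 2), round(u2, 2))
```

With these assumed values the population receiving the stronger input reaches threshold first, suppresses its competitor, and stays active after both inputs are switched off, which is the self-stabilized regime described above.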
Bayesian models represent a popular alternative approach for modeling decision and integration processes in the face of uncertainty (Kersten and Yuille, 2003, Körding and Wolpert, 2006). More recently, Bayesian inference and belief propagation have also been used as theoretical tools to model aspects of joint action coordination (e.g., Cuijpers et al., 2006, Hoffman and Breazeal, 2007). It is important to note that the dynamic field framework is compatible with central aspects of probabilistic models. For instance, the pre-activation below threshold of several populations in the action execution layer due to prior task knowledge and contextual information may be interpreted in the sense of a probability density function for different complementary actions. This prior information has to be combined with evidence about the inferred goal of the co-actor. In fact, it can be shown that in the input-driven regime the field dynamics may implement Bayes’ rule (Cuijpers & Erlhagen, 2008). In our view, there are two major advantages of the dynamic field approach. First, it stabilizes decisions against noise and fluctuations in the input stream, which is of particular importance in cases of high conflict between alternative complementary actions. Second, as an example of the dynamical approach to cognition (Schöner, 2008), a DNF-based model allows us to address the important temporal dimension of goal coordination in joint action (Sebanz et al., 2006). The decision process linked to complementary actions unfolds over time under multiple influences which are themselves modeled as dynamic representations with proper time scales. As our experiments show, the absence or delay of information from layer IL will automatically lead to a decision that does not include the co-actor’s behavior (or its interpretation). This may cause a change of roles in the joint task execution. Normally, the teammate with the faster decision process takes the lead in the cooperative task and the observer follows by choosing an action which complements the inferred goal (e.g., grasping the object with a complementary grip in the handing-over sequence). This flexibility in joint task execution greatly contributes to an efficient team performance.
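The probabilistic reading of pre-activation can be made concrete with a small worked example of Bayes' rule. The sketch below only shows the combination of a prior over complementary actions with evidence about the inferred goal; it does not show how the field dynamics would carry out this computation. The action names and all probabilities are hypothetical.

```python
def posterior(prior, likelihood):
    """Combine a prior over actions with evidence via Bayes' rule.

    prior[a]      -- P(a), e.g. from task knowledge and context
    likelihood[a] -- P(evidence | a), e.g. from the inferred goal
    Returns the normalized posterior P(a | evidence).
    """
    unnormalized = {a: prior[a] * likelihood[a] for a in prior}
    z = sum(unnormalized.values())
    return {a: v / z for a, v in unnormalized.items()}


# Hypothetical numbers: the task context slightly favors working alone,
# but the observed reach makes a hand-over much more plausible.
prior = {"hand_over_nut": 0.4, "attach_own_wheel": 0.6}
likelihood = {"hand_over_nut": 0.9, "attach_own_wheel": 0.2}

if __name__ == "__main__":
    print(posterior(prior, likelihood))
```

In this made-up case the evidence outweighs the prior: the unnormalized products are 0.36 and 0.12, so the hand-over action ends up with a posterior of 0.75.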
Although the focus of the present study was on testing the dynamic model in real-time human–robot interactions, it is worth mentioning that the model also makes predictions for human joint action coordination that could be further investigated in experiments. The reaction time study by van Schie and colleagues (2008), which showed evidence for an automatic facilitation of a complementary response in a cooperative setting, used a one-to-one mapping between observed and to-be-executed actions. It would be interesting to extend this study to a more realistic situation in which a single observed action is compatible with several complementary behaviors, as in the present assembly task. Due to the lateral inhibition in the action execution layer, the level of pre-activation of populations will decrease whenever several response alternatives are simultaneously supported by contextual and/or task information. Since the level of pre-activation affects the rate at which the population activity rises, the model predicts a dependence of reaction time on the number and probability of choices in the cooperative task (for a dynamic field approach to the classical Hick–Hyman law see Erlhagen and Schöner (2002)).
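The predicted link between the number of alternatives and reaction time can be sketched in a toy simulation. Here n populations receive the same sub-threshold input and inhibit each other through a soft sigmoid; after they settle, one of them receives an additional input and the time it needs to reach threshold is measured. The discrete populations, the parameter values and the threshold at 0 are all assumptions made for illustration, and the sketch is only meant to show the direction of the effect, not the logarithmic form of the Hick–Hyman law.

```python
import math

# Illustrative assumptions, not parameters from the paper.
TAU, DT = 1.0, 0.001
H = -3.0      # resting level
PRE = 2.0     # sub-threshold task/context input to every alternative
GO = 3.0      # extra input to the alternative that is finally specified
W_INH = 1.0   # strength of lateral inhibition
BETA = 2.0    # soft sigmoid, so sub-threshold populations still inhibit


def f(u):
    """Sigmoid output function of a population."""
    return 1.0 / (1.0 + math.exp(-BETA * u))


def step(u, inputs):
    """One Euler step; each population is inhibited by all the others."""
    out = [f(x) for x in u]
    total = sum(out)
    return [x + DT * (-x + H + s - W_INH * (total - o)) / TAU
            for x, s, o in zip(u, inputs, out)]


def reaction_time(n, t_pre=20.0, t_max=10.0):
    """Time for population 0 to reach threshold 0 among n alternatives."""
    u = [H] * n
    # Phase 1: all alternatives are pre-activated and settle below threshold.
    for _ in range(int(round(t_pre / DT))):
        u = step(u, [PRE] * n)
    # Phase 2: alternative 0 receives the additional specific input.
    go_inputs = [PRE + GO] + [PRE] * (n - 1)
    for k in range(int(round(t_max / DT))):
        u = step(u, go_inputs)
        if u[0] >= 0.0:
            return (k + 1) * DT
    return None  # threshold not reached within t_max


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "alternatives -> reaction time", reaction_time(n))
```

Under these assumptions more alternatives mean a lower pre-activation level and more inhibition during the rise, so the simulated reaction time grows with n.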
Among the many questions about joint action coordination that have not been directly addressed in the present experiments, an important one concerns how the model could be extended to deal with the coordination of multiple tasks. We have started to explore this challenge in a joint construction task in which the human–robot team has to assemble several distinct toy objects from a fixed set of components. Unlike in the present study, the robot does not directly participate in the construction work but serves as an intelligent assistant that pro-actively hands over components and informs the user about detected errors. By reducing the complexity of the action selection and execution process, this choice allowed us to focus on the high-level planning aspects of the multi-object construction task. The dynamic field model implements the idea, supported by many behavioral and neurophysiological studies, that people encode goal-directed behaviors, such as assembling an object, by segmenting them into discrete actions organized as goal-subgoal hierarchies (e.g., Hamilton and Grafton, 2008, Hard et al., 2006). In the model, a population encoding a particular object as the final goal of the assembly work pre-activates, through synaptic links, the various populations encoding all associated subgoals. In the CSGL, the representations of the currently available subgoals for the team become activated above threshold due to the input from connected pools representing already achieved subtasks. Since different objects may share a set of subtasks (e.g., attaching two parts in a specific manner), synaptic links may exist to representations of subgoals belonging to objects other than the one currently under construction. These representations lack, however, the additional input from a population encoding the final goal. Consequently, their activation level remains below threshold. Using the same neural representations as part of action plans belonging to different tasks is attractive from an engineering point of view since it allows us to optimize the computational resources. It is important to stress, however, that joint action coordination may also benefit from special-purpose representations (e.g., the grasping populations described in Fogassi et al. (2005)). The activation of intentional action chains in ASL during action observation supports the capacity of the robot to react in anticipation of the co-actor’s motor intentions.
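The gating of subgoal representations by the final goal can be sketched as a simple steady-state computation. The sketch below replaces the field dynamics with a sum of inputs compared against a threshold of 0. The object names, subgoal names, weights and the inhibition of already achieved subgoals are all hypothetical choices made for illustration.

```python
# Illustrative assumptions: weights, names and threshold are hypothetical.
RESTING = -1.5    # resting level of every subgoal population
W_GOAL = 1.0      # input from the population encoding the final goal
W_DONE = 1.0      # input from the pool encoding the achieved prerequisite
W_ACHIEVED = 2.0  # inhibition of a subgoal that is already accomplished

# (object, subgoal) -> subtask that must be achieved first.
# "attach_wheel" is shared by both objects, as subtasks may be in the text.
SUBGOALS = {
    ("vehicle", "attach_wheel"): "start",
    ("vehicle", "fix_wheel_with_nut"): "attach_wheel",
    ("crane", "attach_wheel"): "start",
    ("crane", "mount_arm"): "attach_wheel",
}


def activation(obj, name, goal, done):
    """Steady-state activation of one subgoal population."""
    u = RESTING
    if obj == goal:
        u += W_GOAL          # pre-activation from the final goal
    if SUBGOALS[(obj, name)] in done:
        u += W_DONE          # input from the achieved prerequisite
    if name in done:
        u -= W_ACHIEVED      # already accomplished, no longer available
    return u


def active_subgoals(goal, done):
    """Subgoals whose activation exceeds the threshold of 0."""
    return [key for key in SUBGOALS if activation(*key, goal, done) > 0.0]


if __name__ == "__main__":
    print(active_subgoals("vehicle", {"start"}))
    print(active_subgoals("vehicle", {"start", "attach_wheel"}))
```

In this sketch the crane's subgoals receive input from the shared achieved subtask but stay below threshold, because they lack the additional input from the final goal.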
Even in routine tasks, errors in joint action coordination may occur. The user may for instance select a part associated with a subgoal that has already been accomplished by the team or that represents an assembly step to be performed only later. It is thus important that the robot is able to cope with erroneous situations and unexpected events. In the present experiments, the robot points to a part in the co-actor’s workspace that he has seemingly overlooked. The observed request gesture is thus associated not only with the complementary behavior of handing over the required part to the user but also with a hand movement that aims at attracting the co-actor’s attention. The additional information about whether or not the part is located in the user’s workspace (represented in the OML) biases the selection in the AEL. A more sophisticated action monitoring system that would allow the robot to deal with errors on the intention level should be able to detect a mismatch between predicted outcomes of observed actions and the possible subgoals for the team. Within the DNF framework this can be achieved by postulating the existence of neural representations that integrate the activity from populations in the IL and the CSGL. For instance, the co-actor’s request of a part that is not compatible with the current state of the construction will automatically activate a population representing this mismatch. Through synaptic links to neural pools in the AEL, the suprathreshold activity of this population may in turn bias the selection of an adequate corrective response (Bicho, Louro, Hipolito, & Erlhagen, 2009).
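The postulated mismatch population can be sketched as a unit that is excited by an inferred goal in the IL and inhibited when that goal is among the currently possible subgoals in the CSGL. This is a steady-state caricature under assumed weights, not the mechanism reported in the cited work. The response labels and goal names are hypothetical.

```python
# Illustrative assumptions: weights, threshold and labels are hypothetical.
RESTING = -0.5   # resting level of the mismatch population
W_IL = 1.0       # excitation from an active inferred-goal population (IL)
W_CSGL = 1.0     # inhibition when the goal matches an available subgoal


def mismatch_activation(inferred_goal, available_subgoals):
    """Steady-state activation of the population signalling a mismatch."""
    u = RESTING
    if inferred_goal is not None:
        u += W_IL
        if inferred_goal in available_subgoals:
            u -= W_CSGL
    return u


def select_response(inferred_goal, available_subgoals):
    """Bias action selection with the mismatch signal (threshold at 0)."""
    if mismatch_activation(inferred_goal, available_subgoals) > 0.0:
        return "corrective_response"
    return "complementary_action"


if __name__ == "__main__":
    possible = {"fix_wheel_with_nut"}
    print(select_response("fix_wheel_with_nut", possible))
    print(select_response("attach_wheel", possible))
```

A request that matches a possible subgoal leaves the mismatch unit below threshold, whereas a request for an already completed or premature step drives it above threshold and favors a corrective response.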
For the present robot experiments all inter-field connections were tailored by the designer. Consequently, testing the dynamic field model of decision making in joint action was restricted to the specific assembly task. An important long-term goal of our research is to endow the robot with learning and adaptation capacities that will ultimately allow the artificial agent to autonomously develop, from a minimal set of in-built representations, the cognitive skills necessary for efficient joint action in new tasks. We adopt here a socially guided machine learning paradigm in which a human trainer teaches a robot through demonstration and verbal or gestural commands in much the same way as parents teach their children (e.g., Otero et al., 2008, Thomaz and Breazeal, 2008). First experimental results of our attempt to apply a learning dynamics for establishing inter-field connections show the feasibility of the approach. Using correlation-based learning rules with a gating that signals the success of behavior, we have shown for instance how mirror-like representations that support an action understanding capacity may develop during learning and practice (Erlhagen et al., 2006a, Erlhagen et al., 2006b). Importantly, the developmental process goes beyond a simple modification of parameters of pre-defined representations. It may explain the emergence of new task-specific populations which have not been introduced to the architecture by the human designer (Erlhagen et al., 2007).
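A correlation-based learning rule with a success gate can be sketched as a Hebbian update that is applied only on successful trials. The exact rule used in the cited studies is not given in this text, so the form below (weight change equal to learning rate times gate times pre-synaptic times post-synaptic activity) and all numbers are assumptions for illustration.

```python
def gated_hebbian_update(w, pre, post, success, lr=0.1):
    """Update connection weights w[i][j] from population i to population j.

    The correlation term pre[i] * post[j] only changes the weights
    when the gate is open, i.e. when the behavior was successful.
    """
    gate = 1.0 if success else 0.0
    for i, a in enumerate(pre):
        for j, b in enumerate(post):
            w[i][j] += lr * gate * a * b
    return w


# Hypothetical training set: (observed-action activity, goal activity, success)
TRIALS = (
    [([1.0, 0.0], [1.0, 0.0], True)] * 3     # action 0 paired with goal 0
    + [([1.0, 0.0], [0.0, 1.0], False)] * 2  # wrong pairing, not rewarded
    + [([0.0, 1.0], [0.0, 1.0], True)] * 3   # action 1 paired with goal 1
)


def train(trials):
    """Start from zero weights and apply the gated rule trial by trial."""
    w = [[0.0, 0.0], [0.0, 0.0]]
    for pre, post, success in trials:
        gated_hebbian_update(w, pre, post, success)
    return w


if __name__ == "__main__":
    print(train(TRIALS))
```

Only co-activations that occur on successful trials strengthen a connection, so the unsuccessful pairing of action 0 with goal 1 leaves the corresponding weight at zero.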
A major goal of our group is to advance towards a new generation of robots able to interact with humans in a more natural and efficient manner. We believe that taking inspiration from biology to make both the observable trajectories and the cognitive processes supporting joint action more human-like is a promising approach since it makes the artificial agent more predictable for the human user. While many technical aspects of robotics (e.g., vision, sensorimotor coordination) have been simplified, we also believe that robotics work based on the unifying framework of dynamic neural fields is potentially very interesting for researchers from the cognitive and neuroscience domains. Implementing the dynamics of cooperative joint action in an embodied cognitive system allows them to directly test their theories and hypotheses about joint action coordination.
Acknowledgments
The present research was conducted in the context of the fp6-IST2 EU-IP Project JAST (Project No. 003747) and partly financed by the FCT grants POCI/V.5/A0119/2005 and CONC-REEQ/17/2001. We would like to thank Profs. Harold Bekkering and Ruud Meulenbroek for the numerous discussions, and Emanuel Sousa, Flora Ferreira, Nzoji Hipolito, Rui Silva and Toni Machado for their help during the robotic experiments. We also thank the anonymous reviewers for their insightful comments and feedback on how to improve the manuscript.
Appendix A. Supplementary data
References
- Amari, 1977. Dynamics of pattern formation in lateral-inhibitory type neural fields. Biological Cybernetics, 27 (1977), pp. 77-87
- Bicho et al., 2010. Integrating verbal and nonverbal communication in a dynamic neural field architecture for human–robot interaction
- Bicho et al., 2008. A dynamic neural field architecture for flexible and fluent human–robot interaction. Proceedings of the 2008 international conference on cognitive systems, University of Karlsruhe, Germany (2008), pp. 179-185
- Bicho et al., 2009. Bicho, E., Louro, L., Hipolito, N., & Erlhagen, W. (2009). A dynamic field approach to goal inference and error monitoring for human–robot interaction. In K. Dautenhahn (Ed.), Proceedings of the 2009 international symposium on new frontiers in human–robot interaction, AISB 2009 Convention (pp. 31–37). Edinburgh: Heriot-Watt University.
- Bicho et al., 2000. Target representation on an autonomous vehicle with low-level sensors. The International Journal of Robotics Research, 19 (2000), pp. 424-447
- Blakemore and Decety, 2001. From the perception of action to the understanding of intention. Nature Reviews Neuroscience, 2 (2001), pp. 561-567
- Brass et al., 2001. Movement observation affects movement execution in a simple response task. Acta Psychologica, 106 (2001), pp. 3-22
- Caggiano et al., 2009. Mirror neurons differentially encode the peripersonal and the extrapersonal space in monkey. Science, 324 (2009), pp. 403-406