License: CC BY-NC-ND 4.0
arXiv:2402.12393v2 [cs.HC] 02 Apr 2024

On Automating Video Game Regression Testing by Planning and Learning

Tomáš Balyo, G. Michael Youngblood, Filip Dvořák, Lukáš Chrpa, and Roman Barták
Abstract

In this paper, we propose a method and workflow for automating the regression testing of certain video game aspects using automated planning and incremental action model learning techniques. The basic idea is to use detailed game logs and incremental action model learning techniques to maintain a formal model of the gameplay mechanics in the planning domain description language (PDDL). The workflow enables efficient cooperation between game developers without any experience with PDDL or other formal systems and a person experienced in PDDL modeling but without game development skills. We describe the method and workflow in general and then demonstrate it on a concrete proof-of-concept example: a simple role-playing game provided as one of the tutorial projects in the popular game development engine Unity. This paper presents the first step towards minimizing or even eliminating the need for a modeling expert in the workflow, thus making automated planning accessible to a broader audience.

Introduction

Game testing is a vital yet complex component of video game development, focusing on identifying and resolving bugs, performance issues, and ensuring a high-quality user experience. This multifaceted process involves testing diverse game scenarios, enhancing user interface and playability, and maintaining stability and efficiency. It also includes compliance with standards and regulations, especially for games with online components. Modern game testing combines automated and manual methods, often engaging professional testers and the gaming community, to not only fix issues but also elevate overall player satisfaction and game quality. However, automation of testing methods still has a long way to go, especially for techniques that do not significantly disrupt developer workflow (Politowski, Guéhéneuc, and Petrillo 2022). This paper focuses on a contribution to those automated methods by leveraging planning.

Planning is a skill that game developers rarely utilize, but it can be a powerful tool for game testing. One key aspect of that difficulty is the authorial burden of writing domain and problem descriptions in PDDL (Planning Domain Definition Language) (Ghallab et al. 1998). Good PDDL modeling is a rare skill that develops over time, so enabling the available modelers to work more effectively, and thus using them efficiently, is another key to adoption.

Our work addresses these critical issues to help game developers utilize planners for generating test scripts driven by PDDL through automation support. We accomplish this by taking the game logs produced during software development, providing easy-to-implement guidance to augment the logging for action model capture, and performing automated domain synthesis to generate PDDL models that the game developer can easily validate, verify, and adjust. These good PDDL examples also improve the developers' understanding of PDDL and their authorial abilities in it along the way. Goal templates are then used to formulate domain problems of interest for testing. We demonstrate our approach and process on a simple RPG game in the ubiquitous Unity game engine (Unity Technologies 2023).

Motivation

We will focus on regression testing during game development, i.e., running frequent tests to ensure that recent changes did not introduce unintended breaks and that the game still performs as expected. This is a particularly repetitive and dull task for a human tester but crucial for efficient development. It is often the case that the longer the time between the introduction and the discovery of a bug, the harder it is to fix.

What can be tested with Planning

The most obvious and important usage is to test whether the game can be completed, i.e., whether each intended objective can be achieved from the starting state of each scenario. Next, we can generate a large number of random test scenarios. This approach is beneficial if we can execute and evaluate test scenarios automatically. Furthermore, we can search for dead ends in the game, i.e., states that can be reached but from which it is impossible to continue or to win. In modern game design, such states are most undesirable and annoying for the players. If the PDDL model is detailed enough, one can detect such dead ends by running planners from random reachable states. Last but not least, one can prove that undesired shortcuts are impossible through regular gameplay by using optimal planners.
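Such checks can be phrased as ordinary PDDL problems. The following is a minimal hypothetical sketch (the domain name, predicates, and objects are placeholders and do not correspond to any concrete game): a completability test uses the scenario's start state as the initial state, while a dead-end check keeps the same goal but replaces the initial state with a randomly sampled reachable state and flags that state as a candidate dead end if the planner proves the task unsolvable.

    (define (problem completability-check)
        (:domain some-game) ; placeholder domain name
        (:objects hero1 - hero start-room - location objective-1 - objective)
        (:init
            ; start state of the tested scenario, exported from the game
            ; (for a dead-end check, a randomly sampled reachable state instead)
            (at hero1 start-room)
            (objective-open objective-1)
        )
        (:goal (and
            ; every intended objective must still be achievable
            (objective-done objective-1)
        ))
    )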

Applicability

The classical-planning-based testing approach is most suitable for games with many discrete causal interactions, such as role-playing games (RPGs), point-and-click adventures, strategy games, visual novels, or puzzle games. It is less suitable for fast-paced, action-oriented games such as shooters or driving games. For such games, reinforcement-learning-based approaches, which have been studied in previous work, seem better suited (see the Related Work section of this paper). Overall, we observe that reinforcement learning and planning-based approaches can complement each other rather well.

Preliminaries

The planning problem is to find a plan — a sequence of grounded actions that transform the world from an initial state to a goal state, i.e., a state where all goal conditions are satisfied. A planning problem instance consists of a domain definition and a task definition. The domain definition describes possible actions with their preconditions and effects. In STRIPS planning (Fikes and Nilsson 1971), both preconditions and effects are sets of predicates and negated predicates connected by a conjunction. The task definition contains the descriptions of the initial state and the goal conditions.
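As a minimal illustration, a STRIPS action written in PDDL lists these conjunctions of literals explicitly; the names below are invented for this sketch and are not taken from any particular domain.

    (:action unlock-door
        :parameters (?d - door ?k - key)
        ; precondition: a conjunction of (possibly negated) predicates
        :precondition (and (holding ?k) (opens ?k ?d) (not (unlocked ?d)))
        ; effect: predicates that become true (or false) after the action
        :effect (and (unlocked ?d))
    )

Grounding such an action means substituting concrete objects of the matching types for the parameters ?d and ?k.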

Planning action model learning (AML) involves synthesizing a domain definition from logs with state sequences. Many algorithms have been published for this task (Wang 1996; Aineto, Celorrio, and Onaindia 2019; Zhuo et al. 2010; Yang, Wu, and Jiang 2007; Juba, Le, and Stern 2021); a recent overview and a new implementation of multiple techniques are provided in the MacQ project (Callanan et al. 2022). For the evaluation in this paper, we use our own new action model learning method (Balyo et al. 2024).

Figure 1: The overview of the workflow for acquiring PDDL domain models based on logs from the game execution. The objects with a white background represent the tasks best done by the game developer(s). The tasks and decisions with a blue background should be done by the PDDL modeller. The light green items can be done automatically.
...
2023-08-15 19:31:20 697 ;NEXT-STATE
2023-08-15 19:31:21 916 (not (questState apples-quest ready));start_quest
2023-08-15 19:31:21 916 (questState apples-quest started);start_quest
2023-08-15 19:31:21 917 ;NEXT-STATE
2023-08-15 19:31:24 315 (not (location player 8,-4));move
2023-08-15 19:31:24 315 (location player 9,-4);move
2023-08-15 19:31:24 316 ;NEXT-STATE
2023-08-15 19:31:24 574 (not (location player 9,-4));move
2023-08-15 19:31:24 574 (location player 10,-4);move
2023-08-15 19:31:24 574 ;NEXT-STATE
2023-08-15 19:31:24 598 (not (collected apples 0));pickup
2023-08-15 19:31:24 598 (collected apples 1);pickup
2023-08-15 19:31:24 598 (not (location golden-apple0 10,-4));pickup
2023-08-15 19:31:24 598 (location golden-apple0 none);pickup
2023-08-15 19:31:24 598 ;NEXT-STATE
...
2023-08-15 19:31:29 816 ;NEXT-STATE
2023-08-15 19:31:30 055 (not (questState apples-quest started));complete_quest
2023-08-15 19:31:30 056 (questState apples-quest complete);complete_quest
2023-08-15 19:31:30 056 ;NEXT-STATE
...
Figure 2: A fragment of the log file created by playing the game and used for action model acquisition.

Related Work

Planning is rooted in search and, as such, has had a long history with games. At the beginning of AI, the focus was on games like chess (Newell, Simon et al. 1972) and checkers (Samuel 1959), which initially relied on search for solutions. Planning continued to dominate in the 80s and 90s in checkers and chess with Deep Blue (Hsu et al. 1990) and Chinook (Schaeffer et al. 1996). Agent architectures and production systems added value, and soon planning started to add value in games like bridge (Smith, Nau, and Throop 1998) and the class of Real-Time Strategy (RTS) games (Chung, Buro, and Schaeffer 2005). Duarte et al. (2020) survey the history of planning and learning in games, covering the spectrum as well as diving into the lineage of planning from search, minimax and alpha-beta pruning, hierarchical task networks, and Monte Carlo Tree Search, through classical planning, rapidly-exploring random trees, case-based planning, and behavior trees. Most of the work is focused on creating AI-driven opponents (Wurman et al. 2022), which are sometimes used to play both sides for evaluation, AI training, and testing.

Automated testing with AI has more recently become a rising research focus, with work on agent-based approaches that include navigation mesh pathfinding (Shirzadehhajimahmood et al. 2021), reinforcement learning agents for finding design and environmental defects (Ariyurek, Betin-Can, and Surer 2019; Ferdous et al. 2022), reinforcement learning for load testing (Tufano et al. 2022), modeling of user interaction for boundary testing (Owen, Anton, and Baker 2016), search for test case generation (Ferdous et al. 2021b), and search for automated play testing (Ferdous et al. 2021a). Despite the use of planning in game AI, we do not see it used more broadly in game testing beyond its roots in search. However, as evidenced by Bram Ridder's (AI Programmer for Rebellion) keynote talk at the 2021 AIIDE Conference on “Improved Automated Game Testing Using Domain-Independent AI Planning” (Riddler 2021) and his 2021 GDC AI Summit talk “Automated Game Testing Using a Numeric Domain Independent AI Planner,” planning techniques for game testing are beginning to be used in the games industry, alongside calls for more AI automation of testing (Fray 2023).

The General Workflow

The general workflow is visualized in a flowchart in Figure 1. Each task requires a person (or a group) having one of the following two skill sets:

  • A game developer or a game designer (Developer for short). This is someone familiar with the rules of the game who is able to modify the source code of the game to add log-generation functionality. Also, they need to be able to play the game in such a way that all the available game mechanics are utilized. Lastly, they must be capable of (automatically) evaluating whether a given test script is consistent with the game's rules.
  • A person with PDDL modeling skills (Modeller for short). This is someone who knows PDDL and has some experience with modeling planning scenarios in this language.

We start by writing an informal but structured description of all the legal player moves/actions. For each player action, we need to provide a name, a list of preconditions (what conditions need to hold in order to allow the player to play this action), and a list of effects (what changes after the action is played out), written in natural language. This task should be easily doable by the developer. It requires no PDDL modeling experience, even though the required structure of the action is very similar to that of a STRIPS action.

Next, the informal action descriptions are handed to a modeler. The modeler’s task at this point is only to identify a set of predicates required to formally describe all the properties mentioned in the preconditions and effects sections of the action descriptions.

Then, the set of predicates is given to a developer who needs to extend the game’s source code to facilitate the logging of changes to those predicates. Now, we are ready to play the game while collecting logs. The logs are passed to an action model learning (AML) tool, and a PDDL model is automatically generated.

Many games already produce logs, so why can we not just use the existing logs instead of modifying the code and creating new ones? In principle, we could do that. However, in our experience, the existing logs (if they are available at all) usually only contain method calls and error messages and do not record changes of state variables. Such logs unfortunately cannot be used to learn action models because the required information (the state changes) is not present in them.
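To make the difference concrete, compare a typical pre-existing log entry with state-change entries in the format of Figure 2 (the first line below is a hypothetical example of a conventional log message, not output of the demo project):

    2023-08-15 19:31:21 916 [INFO] QuestManager.StartQuest("apples-quest") called
    2023-08-15 19:31:21 916 (not (questState apples-quest ready));start_quest
    2023-08-15 19:31:21 916 (questState apples-quest started);start_quest

Only the latter two lines record which predicates changed and which action caused the change, which is exactly the information the action model learner needs.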

Now, it is the modeler’s turn again, as they need to examine and evaluate the generated PDDL model manually. Due to the imprecise nature of AML algorithms, the generated model often needs minor corrections and adjustments. After such adjustments, it is possible to verify automatically whether the new PDDL model is still consistent with the logs. Note that the first model coming from the AML is always consistent with the logs (this is a property of the used action model learning algorithm (Balyo et al. 2024)). If the model is consistent with the logs, the modeler can either continue editing or decide that it is good enough and can be passed to the planning step. On the other hand, if the model is inconsistent with the logs, we need to either change the PDDL model or the logs. Changing the logs may be required because they are inconsistent due to bugs in the implementation of the logging process or because we are not logging the proper predicates. In the latter case, we may need to go back all the way to the beginning of the process and adjust the informal description, update the set of required predicates, and repeat all the follow-up steps (see Figure 1).

If the logs are consistent with the current domain model and the modeler believes it is accurate, we can use a planner to create a collection of plans for problems formulated from some simple goal templates. These plans are then handed over to the developers, who can check whether they are consistent with the game's rules. This process can be done manually, or automatically if the game can execute test scripts. If no problems are discovered with the generated plans/test scripts, we are finished, and we have obtained a PDDL model representing the game's rules. If we discover any problems, we handle the situation the same way as in the case of having a domain inconsistent with the logs (see Figure 1).
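As an illustration, a goal template can be as simple as requiring every quest of the tested scenario to reach its final state. The sketch below instantiates such a template for two placeholder quests; the questState predicate matches the model learned for our demo in the next section, while the quest names are stand-ins for whatever quests the tested scenario contains.

    (:goal (and
        (questState quest-1 done) ; one such condition per quest in the scenario
        (questState quest-2 done)
    ))

The same template is reused for every scenario by substituting the scenario's actual quest objects.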

Proof of concept: A Simple RPG Game

We will demonstrate our workflow on the tutorial project of “Creator Kit: RPG” (Unity Technologies 2019), available in the popular video game engine Unity (Unity Technologies 2023) versions 2019.1 – 2021.3.

It is a basic single-player RPG with a 2D environment viewed from above. The player controls a hero character using the arrow keys. The hero can talk to non-player characters (NPCs) on the map. Some NPCs can provide a “quest” (a task for the hero to complete) when spoken to. The quests all follow the pattern of: “collect n items of type T”. After a quest is activated, the relevant items appear on the map. To pick up an item, the hero only needs to walk near it. When the required amount of items is collected, the player must visit the NPC who provided the quest to complete it. There are two quests in the demo: “collect 3 golden apples” and “collect 10 chickens”.

Following the workflow defined above (see Figure 1), we need to start by providing an informal but structured description of the possible actions. For our simple example, we only require the four actions described in Table 1.

Action: Move hero
    Preconditions: Hero is at the start tile; the start and goal tiles are neighbours
    Effects: Hero is at the goal tile
Action: Pick up quest item
    Preconditions: Hero is at the same location as the item; the quest related to this item is active
    Effects: The item is gone; the hero has one more of this kind of item
Action: Start quest
    Preconditions: The quest is ready; hero is at the same location as the quest giver
    Effects: The quest is active
Action: Complete quest
    Preconditions: The quest is active; the hero has enough items of the required type; hero is at the same location as the quest giver
    Effects: The quest is done
Table 1: Structured informal description of the player actions, listing the preconditions and effects of each action.
(define (domain rpg)
    (:requirements :strips :typing :negative-preconditions)
    (:types
        number state location locatable typename - object
        quest item hero - locatable
    )
    (:constants
      active ready done - state
    )
    (:predicates
        (next ?n1 - number ?n2 - number)
        (connected ?l1 - location ?l2 - location)
        (questState ?q - quest ?s - state)
        (itemType ?i - item ?t - typename)
        (questItemType ?q - quest ?t - typename)
        (location ?l - locatable ?loc - location)
        (collected ?t - typename ?n - number)
        (questItemCount ?q - quest ?n - number)
    )
    (:action start_quest
        :parameters(?quest - quest ?hero - hero ?location - location)
        :precondition (and
            (questState ?quest ready)
            (location ?hero ?location)
            (location ?quest ?location)
        )
        :effect (and
            (not (questState ?quest ready))
            (questState ?quest active)
        )
    )
    (:action complete_quest
        :parameters(?quest - quest ?hero - hero ?typename - typename
                    ?number - number ?location - location)
        :precondition (and
            (questState ?quest active)
            (location ?hero ?location)
            (location ?quest ?location)
            (collected ?typename ?number)
            (questItemType ?quest ?typename)
            (questItemCount ?quest ?number)
        )
        :effect (and
            (not (questState ?quest active))
            (questState ?quest done)
        )
    )
Figure 3: First half of the RPG domain PDDL description.
    (:action pickup
        :parameters(?hero - hero ?item - item ?location - location
                    ?number-1 ?number-2 - number ?typename - typename ?quest - quest)
        :precondition (and
            (collected ?typename ?number-1)
            (location ?item ?location)
            (location ?hero ?location)
            (itemType ?item ?typename)
            (questItemType ?quest ?typename)
            (questState ?quest active)
            (next ?number-1 ?number-2)
        )
        :effect (and
            (not (collected ?typename ?number-1))
            (not (location ?item ?location))
            (collected ?typename ?number-2)
        )
    )
    (:action move
        :parameters(?hero - hero ?location-1 ?location-2 - location)
        :precondition (and
            (location ?hero ?location-1)
            (connected ?location-1 ?location-2)
        )
        :effect (and
            (not (location ?hero ?location-1))
            (location ?hero ?location-2)
        )
    )
)
Figure 4: Second half of the RPG domain PDDL description.
(define (problem level1)
    (:domain rpg)
    (:objects
        player - hero
        ready - state
        apples chicken - typename
        apples-quest chicken-quest - quest
        chicken0 ... chicken9 - item
        golden-apple0 ... golden-apple2 - item
        n-1x-7 ... n9x2 - location
        n0 ... n9 - number
    )
    (:init
        (collected apples n0)
        (collected chicken n0)
        (questItemCount apples-quest n3)
        (questItemCount chicken-quest n10)
        (questItemType apples-quest apples)
        (questItemType chicken-quest chicken)
        (questState apples-quest ready)
        (questState chicken-quest ready)
        (location player n4x10)
        (location apples-quest n8x-4)
        (location chicken-quest n-6x7)
        (connected n-1x-7 n-2x-7)
        ...
        (itemType chicken0 chicken)
        (location chicken0 n-1x3)
        ...
        (itemType golden-apple0 apples)
        (location golden-apple0 n10x-4)
        ...
        (next n0 n1)
        ...
    )
    (:goal (and
        (questState apples-quest done)
        (questState chicken-quest done)
    ))
)
Figure 5: The problem file for the RPG demo. Some of the objects and initial state predicates are redacted to shorten the listing.
...
22: MOVE PLAYER N10X-4 N9X-4
23: MOVE PLAYER N9X-4 N8X-4
24: START_QUEST APPLES-QUEST PLAYER N8X-4
25: MOVE PLAYER N8X-4 N9X-4
26: MOVE PLAYER N9X-4 N10X-4
27: PICKUP PLAYER GOLDEN-APPLE0 N10X-4 N0 N1 APPLES APPLES-QUEST
28: MOVE PLAYER N10X-4 N10X-3
29: MOVE PLAYER N10X-3 N10X-2
30: MOVE PLAYER N10X-2 N10X-1
31: MOVE PLAYER N10X-1 N10X0
32: PICKUP PLAYER GOLDEN-APPLE1 N10X0 N1 N2 APPLES APPLES-QUEST
33: MOVE PLAYER N10X0 N11X0
34: MOVE PLAYER N11X0 N11X-1
35: PICKUP PLAYER GOLDEN-APPLE2 N11X-1 N2 N3 APPLES APPLES-QUEST
36: MOVE PLAYER N11X-1 N11X-2
37: MOVE PLAYER N11X-2 N10X-2
...
Figure 6: A fragment of the plan to solve the demo level of the game.

In this example, the modeller decided to use a grid-based (i.e., via tiles) abstraction of locations and came up with the list of properties below that should be logged. (Using the grid abstraction turned out to be rather inconvenient: it is not very robust if the items are not aligned to the grid, and it is hard to determine which tiles are connected. In a game with free movement such as our example, it would be better to use a navigation-mesh-based abstraction (Shirzadehhajimahmood et al. 2021).)

  • location of the hero (tile),
  • locations of each item (tile or none),
  • the state of each quest (ready, active, done),
  • the location of each quest-giving NPC (tile),
  • number of items of each kind the hero has.

Additionally, we will need to know the following static properties of the environment and quests:

  • initial locations of each item (tile),
  • type of each item (apple, chicken, …),
  • type of item required for a quest (apple, chicken, …),
  • number of items required to fulfill the quest,
  • which tiles are neighbours.

For the concrete predicate declarations corresponding to these properties, see Figure 3.

The developers then implemented logging and provided logs for learning (see Figure 2). The modeller refined the PDDL domain synthesized from the logs (see Figures 3 and 4) and used a planner to generate a plan to win the game (see Figure 6). We used the FF planner (Hoffmann and Nebel 2001), which worked very well for our problem and found a plan in 0.01 seconds.

In order to test whether a generated plan (test script) is valid (a test script is valid if it is executable in a bug-free implementation of the game; if a test script is valid but not executable, then a bug in the game is indicated), we implemented automatic test script execution. In this case, it was very easy, since the game is controlled just by moving the hero. Quests are activated/completed automatically when their requirements are fulfilled and the hero comes near their NPC. Also, items are picked up automatically just by walking up to them. Therefore, to execute the test scripts we only need to execute the Move actions. We did that by simulating the pressing of the arrow buttons in the direction of the next goal tile until we are close enough to the center of that tile. (A recording of the plan execution within the game is available here: https://www.youtube.com/watch?v=BASKvQAbG04)

Usage During Development

The process we described should be applied at the beginning of the development process, when the game is simple and only contains a few game mechanics, like our RPG demo. As the game is developed and new features are added, the PDDL model should be developed together with the game. Using incremental action model learning can aid the developers in maintaining the PDDL model.

Challenges with Fully Automated Logging for Learning

Deciding which properties/predicates should be logged and then implementing the logging into the game code can be complex and time-consuming. Therefore, automating this part of the workflow would be very beneficial. This section will discuss some examples of issues and challenges that need to be solved to achieve full automation.

In general, logging must be detailed enough to capture all the game mechanics precisely; therefore, it is necessary to log the values and changes of all variables, arrays, and other data structures of the selected gameplay-related classes in the code. On the other hand, for automated planning to be efficient, we must work with an abstraction of the game rules. For this, we need high-level (i.e., abstract) predicates describing world states. The level of abstraction also cannot be too high; otherwise, the learned domain would be useless since it could only generate nonspecific and broad test scenarios, which would be very difficult to execute automatically.

More precisely, we must map all numeric variables, arrays, strings, and custom data structures to predicates. This step is required because we cannot automatically decide what is necessary to fully describe all the gameplay rules. Interestingly, mapping all available variables is both too much detail and not necessarily sufficient at the same time. To completely model the behavior of a general computer program, we would also need to represent “hidden” data like stack traces, the instruction pointer, the variables in the game engine and the operating system, and properties that are implied by the level design or by interactions in the physics engine that are not represented in the code, for example, whether it is possible to jump from one location to another.

Furthermore, we would like to map planning actions one-to-one to certain significant functions and methods in the code that represent player actions. The reason is that we want to call these methods from the plans obtained with the learned model. This step is problematic since STRIPS actions have a very limited structure. Functions in code, on the other hand, are usually much more complex. They contain branching, loops, other function calls, recursion, and other elements that a STRIPS action cannot express. Therefore, a one-to-one mapping of most functions in the program code to planning actions is impossible. One solution would be to break up game-code functions into simpler elements that can be mapped to planning actions. We would also need to use additional helper predicates to express the stack traces, program pointers, etc. The disadvantage is that the generated plans would then contain these elementary actions instead of the high-level ones we desired.

Alternatively, we could settle for a “one-to-many” mapping, where one function from the program code gets mapped to several planning actions, each representing the whole original function. This step can be done easily and automatically, but in our experience, many very specific planning actions are learned, and the model does not generalize well. It is somewhat similar to the situation known as overfitting in machine learning.
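To illustrate the problem, a one-to-many mapping could produce several overly specific variants of the single pickup function of our demo. The following action is an invented example of such a variant (it is not actual output of our learning tool); it only applies when the hero has collected exactly zero apples, and the names apples, n0, and n1 would have to be declared as domain constants.

    (:action pickup_apple_when_zero_collected
        :parameters (?hero - hero ?item - item ?location - location)
        :precondition (and
            (location ?hero ?location)
            (location ?item ?location)
            (itemType ?item apples)
            (collected apples n0)
        )
        :effect (and
            (not (location ?item ?location))
            (not (collected apples n0))
            (collected apples n1)
        )
    )

Analogous variants would be learned for the counts n1, n2, and so on, instead of the single general pickup action of Figure 4, and none of them transfers to quests with different item counts.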

We could mitigate these problems by supporting additional PDDL features like conditional effects and universal quantifiers, so that more code functions can each be mapped to a single planning action (or at least to fewer actions than before). This is complicated, since most action model learning methods only support STRIPS-like action definitions.
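As a sketch of what such richer actions could look like, conditional effects and universal quantification allow a single action to cover behavior that would otherwise require several STRIPS actions. The following action is illustrative only and is not part of our learned model; it completes a quest and, via a quantified conditional effect, removes all remaining items of the quest's item type from the map.

    (:action complete_quest_and_remove_items
        :parameters (?quest - quest ?hero - hero ?location - location ?typename - typename)
        :precondition (and
            (questState ?quest active)
            (location ?hero ?location)
            (location ?quest ?location)
            (questItemType ?quest ?typename)
        )
        :effect (and
            (not (questState ?quest active))
            (questState ?quest done)
            ; conditional effect: every remaining item of the matching type disappears
            (forall (?item - item ?l - location)
                (when (and (itemType ?item ?typename) (location ?item ?l))
                      (not (location ?item ?l))))
        )
    )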

Conclusion

In this paper, we presented a workflow that applies research from the areas of automated planning and action model learning in order to help with the automation of regression testing during video game development. We demonstrated it with a proof-of-concept project and discussed what challenges need to be addressed to further automate its steps.

Future Work

We plan to evaluate and refine the workflow on larger and more complex games. We are currently experimenting with applying it to a real-time strategy game. We also want to run proper user studies to quantify the usefulness of the proposed workflow compared to just creating the domains manually. Furthermore, we want to address the challenges, discussed in the previous section, related to further automating the synthesis of PDDL domains.

References

  • Aineto, Celorrio, and Onaindia (2019) Aineto, D.; Celorrio, S. J.; and Onaindia, E. 2019. Learning action models with minimal observability. Artificial Intelligence, 275: 104–137.
  • Ariyurek, Betin-Can, and Surer (2019) Ariyurek, S.; Betin-Can, A.; and Surer, E. 2019. Automated video game testing using synthetic and humanlike agents. IEEE Transactions on Games, 13(1): 50–67.
  • Balyo et al. (2024) Balyo, T.; Suda, M.; Chrpa, L.; Šafránek, D.; Dvořák, F.; Barták, R.; and Youngblood, G. M. 2024. Learning Planning Action Models from State Traces. arXiv:2402.10726.
  • Callanan et al. (2022) Callanan, E.; De Venezia, R.; Armstrong, V.; Paredes, A.; Chakraborti, T.; and Muise, C. 2022. MACQ: a holistic view of model acquisition techniques. arXiv preprint arXiv:2206.06530.
  • Chung, Buro, and Schaeffer (2005) Chung, M.; Buro, M.; and Schaeffer, J. 2005. Monte Carlo Planning in RTS Games. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence and Games (CIG05). IEEE.
  • Duarte et al. (2020) Duarte, F. F.; Lau, N.; Pereira, A.; and Reis, L. P. 2020. A survey of planning and learning in games. Applied Sciences, 10(13): 4529.
  • Ferdous et al. (2021a) Ferdous, R.; Kifetew, F.; Prandi, D.; Prasetya, I.; Shirzadehhajimahmood, S.; and Susi, A. 2021a. Search-based automated play testing of computer games: A model-based approach. In International Symposium on Search Based Software Engineering, 56–71. Springer.
  • Ferdous et al. (2021b) Ferdous, R.; Kifetew, F.; Prandi, D.; Prasetya, I. S. W. B.; Shirzadehhajimahmood, S.; and Susi, A. 2021b. Search-Based Automated Play Testing of Computer Games: A Model-Based Approach. In Search-Based Software Engineering: 13th International Symposium, SSBSE 2021, Bari, Italy, October 11–12, 2021, Proceedings, 56–71. Berlin, Heidelberg: Springer-Verlag. ISBN 978-3-030-88105-4.
  • Ferdous et al. (2022) Ferdous, R.; Kifetew, F.; Prandi, D.; and Susi, A. 2022. Towards Agent-Based Testing of 3D Games Using Reinforcement Learning. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1–8.
  • Fikes and Nilsson (1971) Fikes, R. E.; and Nilsson, N. J. 1971. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial intelligence, 2(3-4): 189–208.
  • Fray (2023) Fray, A. 2023. Automated Testing Roundtables GDC 2023. https://autotestingroundtable.com/. (Accessed on 12/12/2023).
  • Ghallab et al. (1998) Ghallab, M.; Howe, A.; Knoblock, C.; Mcdermott, D.; Ram, A.; Veloso, M.; Weld, D.; and Wilkins, D. 1998. PDDL—The Planning Domain Definition Language.
  • Hoffmann and Nebel (2001) Hoffmann, J.; and Nebel, B. 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14: 253–302.
  • Hsu et al. (1990) Hsu, F.-h.; Anantharaman, T. S.; Campbell, M. S.; and Nowatzyk, A. 1990. Deep thought. In Computers, Chess, and Cognition, 55–78. Springer.
  • Juba, Le, and Stern (2021) Juba, B.; Le, H. S.; and Stern, R. 2021. Safe Learning of Lifted Action Models. In Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, 379–389.
  • Newell, Simon et al. (1972) Newell, A.; Simon, H. A.; et al. 1972. Human problem solving, volume 104:9. Prentice-hall Englewood Cliffs, NJ.
  • Owen, Anton, and Baker (2016) Owen, V. E.; Anton, G.; and Baker, R. 2016. Modeling user exploration and boundary testing in digital learning games. In Proceedings of the 2016 conference on user modeling adaptation and personalization, 301–302.
  • Politowski, Guéhéneuc, and Petrillo (2022) Politowski, C.; Guéhéneuc, Y.-G.; and Petrillo, F. 2022. Towards automated video game testing: still a long way to go. In Proceedings of the 6th International ICSE Workshop on Games and Software Engineering: Engineering Fun, Inspiration, and Motivation, 37–43.
  • Riddler (2021) Riddler, B. 2021. Improve Automated Game Testing Using Domain Independent AI Planning - YouTube. https://www.youtube.com/watch?v=2KXmxuCjjCw. (Accessed on 12/12/2023).
  • Samuel (1959) Samuel, A. L. 1959. Some studies in machine learning using the game of checkers. IBM Journal of research and development, 3(3): 210–229.
  • Schaeffer et al. (1996) Schaeffer, J.; Lake, R.; Lu, P.; and Bryant, M. 1996. Chinook the world man-machine checkers champion. AI magazine, 17(1): 21–21.
  • Shirzadehhajimahmood et al. (2021) Shirzadehhajimahmood, S.; Prasetya, I.; Dignum, F.; Dastani, M.; and Keller, G. 2021. Using an agent-based approach for robust automated testing of computer games. In Proceedings of the 12th International Workshop on Automating TEST Case Design, Selection, and Evaluation, 1–8.
  • Smith, Nau, and Throop (1998) Smith, S. J.; Nau, D.; and Throop, T. 1998. Computer bridge: A big win for AI planning. AI magazine, 19(2): 93–93.
  • Tufano et al. (2022) Tufano, R.; Scalabrino, S.; Pascarella, L.; Aghajani, E.; Oliveto, R.; and Bavota, G. 2022. Using reinforcement learning for load testing of video games. In Proceedings of the 44th International Conference on Software Engineering, 2303–2314.
  • Unity Technologies (2019) Unity Technologies. 2019. Creator Kit: RPG - Unity Learn. https://learn.unity.com/project/creator-kit-rpg. (Accessed on 12/09/2023).
  • Unity Technologies (2023) Unity Technologies. 2023. Unity Real-Time Development Platform — 3D, 2D, VR & AR Engine. https://unity.com/. (Accessed on 12/09/2023).
  • Wang (1996) Wang, X. 1996. Learning planning operators by observation and practice. Ph.D. thesis, Citeseer.
  • Wurman et al. (2022) Wurman, P. R.; Barrett, S.; Kawamoto, K.; MacGlashan, J.; Subramanian, K.; Walsh, T. J.; Capobianco, R.; Devlic, A.; Eckert, F.; Fuchs, F.; Gilpin, L.; Khandelwal, P.; Kompella, V.; Lin, H.; MacAlpine, P.; Oller, D.; Seno, T.; Sherstan, C.; Thomure, M. D.; Aghabozorgi, H.; Barrett, L.; Douglas, R.; Whitehead, D.; Dürr, P.; Stone, P.; Spranger, M.; and Kitano, H. 2022. Outracing champion Gran Turismo drivers with deep reinforcement learning. Nat., 602(7896): 223–228.
  • Yang, Wu, and Jiang (2007) Yang, Q.; Wu, K.; and Jiang, Y. 2007. Learning action models from plan examples using weighted MAX-SAT. Artificial Intelligence, 171(2-3): 107–143.
  • Zhuo et al. (2010) Zhuo, H. H.; Yang, Q.; Hu, D. H.; and Li, L. 2010. Learning complex action models with quantifiers and logical implications. Artificial Intelligence, 174(18): 1540–1569.