
Query2box: Reasoning over Knowledge
Graphs in Vector Space using Box Embeddings

Hongyu Ren, Weihua Hu, Jure Leskovec

Department of Computer Science, Stanford University
{hyren,weihuahu,jure}@cs.stanford.edu
Equal contributions. Project website with data and code: http://snap.stanford.edu/query2box
Abstract

Answering complex logical queries on large-scale incomplete knowledge graphs (KGs) is a fundamental yet challenging task. Recently, a promising approach to this problem has been to embed KG entities as well as the query into a vector space such that entities that answer the query are embedded close to the query. However, prior work models queries as single points in the vector space, which is problematic because a complex query represents a potentially large set of its answer entities, but it is unclear how such a set can be represented as a single point. Furthermore, prior work can only handle queries that use conjunctions ($\wedge$) and existential quantifiers ($\exists$). Handling queries with logical disjunctions ($\vee$) remains an open problem. Here we propose query2box, an embedding-based framework for reasoning over arbitrary queries with $\wedge$, $\vee$, and $\exists$ operators in massive and incomplete KGs. Our main insight is that queries can be embedded as boxes (i.e., hyper-rectangles), where a set of points inside the box corresponds to a set of answer entities of the query. We show that conjunctions can be naturally represented as intersections of boxes, and we also prove a negative result that handling disjunctions would require embedding with dimension proportional to the number of KG entities. However, we show that by transforming queries into a Disjunctive Normal Form, query2box is capable of handling arbitrary logical queries with $\wedge$, $\vee$, $\exists$ in a scalable manner. We demonstrate the effectiveness of query2box on three large KGs and show that query2box achieves up to 25% relative improvement over the state of the art.

1 Introduction

Knowledge graphs (KGs) capture different types of relationships between entities, e.g., Canada $\xrightarrow{citizen}$ Hinton. Answering arbitrary logical queries, such as "where did Canadian citizens with Turing Award graduate?", over such KGs is a fundamental task in question answering, knowledge base reasoning, as well as AI more broadly.

First-order logical queries can be represented as Directed Acyclic Graphs (DAGs) (Fig. 1(A)) and reasoned over by traversing the DAG to obtain a set of answers (Fig. 1(C)). While simple and intuitive, such an approach has many drawbacks: (1) The computational complexity of subgraph matching is exponential in the query size, and thus it cannot scale to modern KGs; (2) Subgraph matching is very sensitive to missing information, as it cannot correctly answer queries when relations are missing from the KG. To remedy (2) one could impute missing relations (Koller et al., 2007; Džeroski, 2009; De Raedt, 2008; Nickel et al., 2016), but that would only make the KG denser, which would further exacerbate issue (1) (Dalvi & Suciu, 2007; Krompaß et al., 2014).

Figure 1: Query2Box reasoning framework. (A) A given conjunctive query “Where did Canadian citizens with Turing Award graduate?” can be represented with a dependency graph. (B) Computation graph specifies the reasoning procedure to obtain a set of answers for the query in (A). (C) Example knowledge graph, where green nodes/entities denote answers to the query. Bold arrows indicate subgraphs that match the query graph in (A). (D) In query2box, nodes of the KG are embedded as points in the vector space. We then obtain query embedding according to the computation graph (B) as a sequence of box operations: start with two nodes TuringAward and Canada and apply Win and Citizen projection operators, followed by an intersection operator (denoted as a shaded intersection of yellow and orange boxes) and another projection operator. The final embedding of the query is a green box and query’s answers are the entities inside the box.

Recently, a promising alternative approach has emerged, where logical queries as well as KG entities are embedded into a low-dimensional vector space such that entities that answer the query are embedded close to the query (Guu et al., 2015; Hamilton et al., 2018; Das et al., 2017). Such approach robustly handles missing relations (Hamilton et al., 2018) and is also orders of magnitude faster, as answering an arbitrary logical query is reduced to simply identifying entities nearest to the embedding of the query in the vector space.

However, prior work embeds a query into a single point in the vector space. This is problematic because answering a logical query requires modeling a set of active entities while traversing the KG (Fig. 1(C)), and how to effectively model a set with a single point is unclear. Furthermore, it is also unnatural to define logical operators (e.g., set intersection) over two points in the vector space. Another fundamental limitation of prior work is that it can only handle conjunctive queries, a subset of first-order logic that only involves conjunction ($\wedge$) and the existential quantifier ($\exists$), but not disjunction ($\vee$). It remains an open question how to handle disjunction effectively in the vector space.

Here we present query2box, an embedding-based framework for reasoning over KGs that is capable of handling arbitrary Existential Positive First-order (EPFO) logical queries (i.e., queries that include any set of $\wedge$, $\vee$, and $\exists$) in a scalable manner. First, to accurately model a set of entities, our key idea is to use a closed region rather than a single point in the vector space. Specifically, we use a box (axis-aligned hyper-rectangle) to represent a query (Fig. 1(D)). This provides three important benefits: (1) Boxes naturally model sets of entities they enclose; (2) Logical operators (e.g., set intersection) can naturally be defined over boxes similarly as in Venn diagrams (Venn, 1880); (3) Executing logical operators over boxes results in new boxes, which means that the operations are closed; thus, logical reasoning can be efficiently performed in query2box by iteratively updating boxes according to the query computation graph (Fig. 1(B)(D)).

We show that query2box can naturally handle conjunctive queries. We first prove a negative result that embedding EPFO queries to only single points or boxes is intractable as it would require embedding dimension proportional to the number of KG entities. However, we provide an elegant solution, where we transform a given EPFO logical query into a Disjunctive Normal Form (DNF) (Davey & Priestley, 2002), i.e., disjunction of conjunctive queries. Given any EPFO query, query2box represents it as a set of individual boxes, where each box is obtained for each conjunctive query in the DNF. We then return nearest neighbor entities to any of the boxes as the answers to the query. This means that to answer any EPFO query we first answer individual conjunctive queries and then take the union of the answer entities.
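To make the DNF strategy concrete, here is a minimal 1-D sketch (not the authors' code): each conjunctive branch of the DNF becomes its own box (here, an interval), an entity's distance to the whole query is its minimum distance over the branches, and the answer set is therefore the union of the per-branch answers. The intervals and entity embeddings below are made up for illustration.

```python
def dist_to_interval(v, lo, hi):
    """Distance of a 1-D entity embedding v to the interval [lo, hi]; 0 if inside."""
    return max(lo - v, 0.0) + max(v - hi, 0.0)

def dist_to_union(v, boxes):
    """Distance to a DNF query: the minimum distance over its branch boxes."""
    return min(dist_to_interval(v, lo, hi) for lo, hi in boxes)

# Two conjunctive branches of a toy EPFO query, embedded as intervals.
branches = [(0.0, 1.0), (3.0, 4.0)]
entities = [0.5, 2.0, 3.5]
# Answers are entities at distance 0 to at least one branch box.
answers = [v for v in entities if dist_to_union(v, branches) == 0.0]
```

Answering the union this way is exactly "first answer each conjunctive query, then take the union of the answer entities," as the paragraph above describes.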

We evaluate query2box on three standard KG benchmarks and show: (1) query2box provides strong generalization as it can answer complex queries; (2) query2box can generalize to new logical query structures that it has never seen during training; (3) query2box is able to implicitly impute missing relations, as it can answer any EPFO query with high accuracy even when relations involved in answering the query are missing from the KG; (4) query2box provides up to 25% relative improvement in accuracy of answering EPFO queries over state-of-the-art baselines.

2 Further Related Work

Most related to our work are embedding approaches for multi-hop reasoning over KGs (Bordes et al., 2013; Das et al., 2017; Guu et al., 2015; Hamilton et al., 2018). The crucial difference is that we provide a way to tractably handle a larger subset of first-order logic (EPFO queries vs. conjunctive queries) and that we embed queries as boxes, which provides better accuracy and generalization.

Second line of related work is on structured embeddings, which associate images, words, sentences, or knowledge base concepts with geometric objects such as regions (Erk, 2009; Vilnis et al., 2018; Li et al., 2019), densities (Vilnis & McCallum, 2014; He et al., 2015; Athiwaratkun & Wilson, 2018), and orderings (Vendrov et al., 2016; Lai & Hockenmaier, 2017; Li et al., 2017). While the above work uses geometric objects to model individual entities and their pairwise relations, we use the geometric objects to model sets of entities and reason over those sets. In this sense our work is also related to classical Venn Diagrams (Venn, 1880), where boxes are essentially the Venn Diagrams in vector space, but our boxes and entity embeddings are jointly learned, which allows us to reason over incomplete KGs.

Box embeddings have also been used to model the hierarchical nature of concepts in an ontology with uncertainty (Vilnis et al., 2018; Li et al., 2019). While our work is also based on box embeddings, we employ them for logical reasoning over massive heterogeneous knowledge graphs.

3 Query2Box: Logical Reasoning over KGs in Vector Space

Here we present query2box, where we will define an objective function that allows us to learn embeddings of entities in the KG, and at the same time also learn parameterized geometric logical operators over boxes. Then, given an arbitrary EPFO query $q$ (Fig. 1(A)), we will identify its computation graph (Fig. 1(B)), and embed the query by executing a set of geometric operators over boxes (Fig. 1(D)). Entities that are enclosed in the final box embedding are returned as answers to the query (Fig. 1(D)).

In order to train our system, we generate a set of queries together with their answers at training time and then learn entity embeddings and geometric operators such that queries can be accurately answered. We show in the following sections that our approach is able to generalize to queries and logical structures never seen during training. Furthermore, as we show in experiments, our approach is able to implicitly impute missing relations and answer queries that would be impossible to answer with traditional graph traversal methods.

In the following we first only consider conjunctive queries (conjunction and existential operator) and then we extend our method to also include disjunction.

3.1 Knowledge Graphs and Conjunctive Queries

We denote a KG as $\mathcal{G}=(\mathcal{V},\mathcal{R})$, where $v\in\mathcal{V}$ represents an entity, and $r\in\mathcal{R}$ is a binary function $r:\mathcal{V}\times\mathcal{V}\to\{\text{True},\text{False}\}$, indicating whether the relation $r$ holds between a pair of entities or not. In the KG, such binary output indicates the existence of the directed edge between a pair of entities, i.e., $v\xrightarrow{r}v'$ iff $r(v,v')=\text{True}$.

Conjunctive queries are a subclass of first-order logical queries that use existential quantification ($\exists$) and conjunction ($\wedge$) operations. They are formally defined as follows:

$$q[V_?]=V_?\,.\,\exists V_1,\ldots,V_k : e_1\land e_2\land\ldots\land e_n, \qquad (1)$$
$$\text{where } e_i=r(v_a,V),\ V\in\{V_?,V_1,\ldots,V_k\},\ v_a\in\mathcal{V},\ r\in\mathcal{R},$$
$$\text{or } e_i=r(V,V'),\ V,V'\in\{V_?,V_1,\ldots,V_k\},\ V\neq V',\ r\in\mathcal{R},$$

where $v_a$ represents a non-variable anchor entity, $V_1,\ldots,V_k$ are existentially quantified bound variables, and $V_?$ is the target variable. The goal of answering the logical query $q$ is to find a set of entities $\llbracket q\rrbracket\subseteq\mathcal{V}$ such that $v\in\llbracket q\rrbracket$ iff $q[v]=\text{True}$. We call $\llbracket q\rrbracket$ the denotation set (i.e., answer set) of query $q$.

As shown in Fig. 1(A), the dependency graph is a graphical representation of a conjunctive query $q$, where nodes correspond to variable or non-variable entities in $q$ and edges correspond to relations in $q$. In order for the query to be valid, the corresponding dependency graph needs to be a Directed Acyclic Graph (DAG), with the anchor entities as the source nodes of the DAG and the query target $V_?$ as the unique sink node (Hamilton et al., 2018).

From the dependency graph of query $q$, one can also derive the computation graph, which consists of two types of directed edges that represent operators over sets of entities:

  • Projection: Given a set of entities $S\subseteq\mathcal{V}$ and a relation $r\in\mathcal{R}$, this operator obtains $\cup_{v\in S}A_r(v)$, where $A_r(v)\equiv\{v'\in\mathcal{V}:\ r(v,v')=\text{True}\}$.

  • Intersection: Given a set of entity sets $\{S_1,S_2,\ldots,S_n\}$, this operator obtains $\cap_{i=1}^{n}S_i$.

For a given query $q$, the computation graph specifies the procedure of reasoning to obtain a set of answer entities: starting from a set of anchor nodes, the above two operators are applied iteratively until the unique sink target node is reached. The entire procedure is analogous to traversing the KG following the computation graph (Guu et al., 2015).
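As a concrete illustration, the two operators can be executed as plain set operations over KG triples, and chaining them follows the query's computation graph. This is a toy sketch under made-up entity and relation names (mirroring Fig. 1), not the authors' implementation:

```python
# A tiny KG stored as (head, relation, tail) triples.
edges = {
    ("TuringAward", "win", "Hinton"),
    ("TuringAward", "win", "Bengio"),
    ("Canada", "citizen", "Hinton"),
    ("Canada", "citizen", "Trudeau"),
    ("Hinton", "graduate", "Edinburgh"),
}

def project(S, r):
    """Projection: the union of r-neighbors of every entity in S."""
    return {t for (h, rel, t) in edges if h in S and rel == r}

def intersect(*sets):
    """Intersection of entity sets."""
    return set.intersection(*sets)

# "Where did Canadian citizens with the Turing Award graduate?"
winners = project({"TuringAward"}, "win")
citizens = project({"Canada"}, "citizen")
answers = project(intersect(winners, citizens), "graduate")
```

Starting from the anchor sets {TuringAward} and {Canada}, the operators are applied until the sink node is reached, yielding the answer set.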

3.2 Reasoning over Sets of Entities Using Box Embeddings

So far we have defined conjunctive queries as computation graphs that can be executed directly over the nodes and edges in the KG. Now, we define logical reasoning in the vector space. Our intuition follows Fig. 1: Given a complex query, we shall decompose it into a sequence of logical operations, and then execute these operations in the vector space. This way we will obtain the embedding of the query, and answers to the query will be entities that are enclosed in the final query embedding box.

In the following, we detail our two methodological advances: (1) the use of box embeddings to efficiently model and reason over sets of entities in the vector space, and (2) how to tractably handle the disjunction operator ($\vee$), expanding the class of first-order logic queries that can be modeled in the vector space (Section 3.3).

Box embeddings. To efficiently model a set of entities in the vector space, we use boxes (i.e., axis-aligned hyper-rectangles). The benefit is that, unlike a single point, a box has an interior; thus, if an entity is in a set, it is natural to model the entity embedding as a point inside the box. Formally, we operate on $\mathbb{R}^d$, and define a box in $\mathbb{R}^d$ by $\mathbf{p}=(\text{Cen}(\mathbf{p}),\text{Off}(\mathbf{p}))\in\mathbb{R}^{2d}$ as:

$$\text{Box}_{\mathbf{p}}\equiv\{\mathbf{v}\in\mathbb{R}^d:\ \text{Cen}(\mathbf{p})-\text{Off}(\mathbf{p})\preceq\mathbf{v}\preceq\text{Cen}(\mathbf{p})+\text{Off}(\mathbf{p})\}, \qquad (2)$$

where $\preceq$ is element-wise inequality, $\text{Cen}(\mathbf{p})\in\mathbb{R}^d$ is the center of the box, and $\text{Off}(\mathbf{p})\in\mathbb{R}^d_{\geq 0}$ is the positive offset of the box, modeling the size of the box. Each entity $v\in\mathcal{V}$ in the KG is assigned a single vector $\mathbf{v}\in\mathbb{R}^d$ (i.e., a zero-size box), and the box embedding $\mathbf{p}$ models $\{v\in\mathcal{V}:\mathbf{v}\in\text{Box}_{\mathbf{p}}\}$, i.e., the set of entities whose vectors are inside the box. For the rest of the paper, we use boldface to denote embeddings, e.g., the embedding of $v$ is denoted by $\mathbf{v}$.
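A minimal sketch of the box-membership test of Eq. (2), with a hypothetical 2-dimensional box; the element-wise inequalities reduce to a per-dimension interval check:

```python
def in_box(v, cen, off):
    """True iff Cen(p) - Off(p) <= v <= Cen(p) + Off(p) element-wise (Eq. 2)."""
    return all(c - o <= x <= c + o for x, c, o in zip(v, cen, off))

cen, off = [0.0, 0.0], [1.0, 2.0]     # the box [-1, 1] x [-2, 2]
inside = in_box([0.5, -1.5], cen, off)
outside = in_box([1.5, 0.0], cen, off)  # violates the first dimension
```

An entity embedding with `off = [0.0, 0.0]` is exactly the zero-size box mentioned above: it is "inside" a query box iff its point lies within the box.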

Our framework reasons over KGs in the vector space following the computation graph of the query, as shown in Fig. 1(D): we start from the initial box embeddings of the source nodes (anchor entities) and sequentially update the embeddings according to the logical operators. Below, we describe how we set initial box embeddings for the source nodes, as well as how we model projection and intersection operators (defined in Sec. 3.1) as geometric operators that operate over boxes. After that, we describe our entity-to-box distance function and the overall objective that learns embeddings as well as the geometric operators.

Initial boxes for source nodes. Each source node represents an anchor entity $v\in\mathcal{V}$, which we can regard as a set that contains only that single entity. Such a single-element set can be naturally modeled by a box of size/offset zero centered at $\mathbf{v}$. Formally, we set the initial box embedding to $(\mathbf{v},\mathbf{0})$, where $\mathbf{v}\in\mathbb{R}^d$ is the anchor entity vector and $\mathbf{0}$ is a $d$-dimensional all-zero vector.

Figure 2: The geometric intuition of the two operations and the distance function in query2box. (A) Projection generates a larger box with a translated center. (B) Intersection generates a smaller box lying inside the given set of boxes. (C) Distance $\text{dist}_{\rm box}$ is the weighted sum of $\text{dist}_{\rm outside}$ and $\text{dist}_{\rm inside}$, where the latter is weighted less.

Geometric projection operator. We associate each relation $r\in\mathcal{R}$ with a relation embedding $\mathbf{r}=(\text{Cen}(\mathbf{r}),\text{Off}(\mathbf{r}))\in\mathbb{R}^{2d}$ with $\text{Off}(\mathbf{r})\succeq\mathbf{0}$. Given an input box embedding $\mathbf{p}$, we model the projection by $\mathbf{p}+\mathbf{r}$, where we sum the centers and sum the offsets. This gives us a new box with a translated center and a larger offset because $\text{Off}(\mathbf{r})\succeq\mathbf{0}$, as illustrated in Fig. 2(A). The adaptive box size effectively models a different number of entities/vectors in the set.
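The projection step can be sketched as element-wise addition of the relation's center and (non-negative) offset to the input box, starting from an anchor entity's zero-offset box; the embedding values below are made up for illustration:

```python
def project_box(cen, off, r_cen, r_off):
    """Projection p + r: translate the center and grow the offset (r_off >= 0)."""
    new_cen = [c + rc for c, rc in zip(cen, r_cen)]
    new_off = [o + ro for o, ro in zip(off, r_off)]
    return new_cen, new_off

# An anchor entity starts as a zero-offset box centered at its embedding.
anchor = ([0.5, -0.5], [0.0, 0.0])
cen, off = project_box(*anchor, r_cen=[1.0, 1.0], r_off=[0.2, 0.1])
```

Because `r_off` is non-negative, the projected box can only keep or grow its size, matching the intuition that following a relation can only expand the set of reachable entities.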

Geometric intersection operator. We model the intersection of a set of box embeddings $\{\mathbf{p}_1,\ldots,\mathbf{p}_n\}$ as $\mathbf{p}_{\rm inter}=(\text{Cen}(\mathbf{p}_{\rm inter}),\text{Off}(\mathbf{p}_{\rm inter}))$, which is calculated by performing attention over the box centers (Bahdanau et al., 2015) and shrinking the box offset using the sigmoid function:

$$\text{Cen}(\mathbf{p}_{\rm inter})=\sum_i\mathbf{a}_i\odot\text{Cen}(\mathbf{p}_i),\quad \mathbf{a}_i=\frac{\exp(\text{MLP}(\mathbf{p}_i))}{\sum_j\exp(\text{MLP}(\mathbf{p}_j))},$$
$$\text{Off}(\mathbf{p}_{\rm inter})=\text{Min}(\{\text{Off}(\mathbf{p}_1),\ldots,\text{Off}(\mathbf{p}_n)\})\odot\sigma(\text{DeepSets}(\{\mathbf{p}_1,\ldots,\mathbf{p}_n\})),$$

where $\odot$ is the dimension-wise product, $\text{MLP}(\cdot):\mathbb{R}^{2d}\to\mathbb{R}^d$ is a Multi-Layer Perceptron, $\sigma(\cdot)$ is the sigmoid function, $\text{DeepSets}(\cdot)$ is the permutation-invariant deep architecture (Zaheer et al., 2017), and both $\text{Min}(\cdot)$ and $\exp(\cdot)$ are applied in a dimension-wise manner. Following Hamilton et al. (2018), we model all the deep sets by $\text{DeepSets}(\{\mathbf{x}_1,\ldots,\mathbf{x}_N\})=\text{MLP}((1/N)\cdot\sum_{i=1}^{N}\text{MLP}(\mathbf{x}_i))$, where all the hidden dimensionalities of the two MLPs are the same as the input dimensionality. The intuition behind our geometric intersection is to generate a smaller box that lies inside a set of boxes, as illustrated in Fig. 2(B).¹

¹One possible choice here would be to directly use the raw box intersection; however, we find that our richer learnable parameterization is more expressive and robust.

Different from using generic deep sets to model the intersection (Hamilton et al., 2018), our geometric intersection operator effectively constrains the center position and models the shrinking size of the set.
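For intuition, here is a toy sketch of the intersection operator with the learned components stubbed out: the per-box attention scores (in the paper, produced by an MLP) are passed in directly, and the DeepSets output feeding the sigmoid gate is a fixed scalar. What survives the simplification is the structure: the center is a softmax-weighted combination of the input centers, and the offset is at most the smallest input offset.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def intersect_boxes(boxes, scores=None, gate=0.0):
    """boxes: list of (cen, off) pairs of equal-length lists; returns (cen, off)."""
    n, d = len(boxes), len(boxes[0][0])
    scores = scores if scores is not None else [0.0] * n  # stand-in for MLP scores
    w = [math.exp(s) for s in scores]
    total = sum(w)
    w = [x / total for x in w]                            # softmax attention weights
    cen = [sum(w[i] * boxes[i][0][k] for i in range(n)) for k in range(d)]
    off = [min(boxes[i][1][k] for i in range(n)) * sigmoid(gate)
           for k in range(d)]                             # shrink the min offset
    return cen, off

# Intersecting a 1-D box centered at 0 (offset 2) with one at 1 (offset 1).
cen, off = intersect_boxes([([0.0], [2.0]), ([1.0], [1.0])])
```

With uniform scores the center lands midway between the two input centers, and since the sigmoid is strictly below 1, the result is strictly smaller than the smallest input box, as in Fig. 2(B).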

Entity-to-box distance. Given a query box $\mathbf{q}\in\mathbb{R}^{2d}$ and an entity vector $\mathbf{v}\in\mathbb{R}^d$, we define their distance as

$$\text{dist}_{\rm box}(\mathbf{v};\mathbf{q})=\text{dist}_{\rm outside}(\mathbf{v};\mathbf{q})+\alpha\cdot\text{dist}_{\rm inside}(\mathbf{v};\mathbf{q}), \qquad (3)$$

where $\mathbf{q}_{\rm max}=\text{Cen}(\mathbf{q})+\text{Off}(\mathbf{q})\in\mathbb{R}^d$, $\mathbf{q}_{\rm min}=\text{Cen}(\mathbf{q})-\text{Off}(\mathbf{q})\in\mathbb{R}^d$, $0<\alpha<1$ is a fixed scalar, and

$$\text{dist}_{\rm outside}(\mathbf{v};\mathbf{q})=\|\text{Max}(\mathbf{v}-\mathbf{q}_{\rm max},\mathbf{0})+\text{Max}(\mathbf{q}_{\rm min}-\mathbf{v},\mathbf{0})\|_1,$$
$$\text{dist}_{\rm inside}(\mathbf{v};\mathbf{q})=\|\text{Cen}(\mathbf{q})-\text{Min}(\mathbf{q}_{\rm max},\text{Max}(\mathbf{q}_{\rm min},\mathbf{v}))\|_1.$$

As illustrated in Fig. 2(C), $\text{dist}_{\rm outside}$ corresponds to the distance between the entity and the closest corner/side of the box. Analogously, $\text{dist}_{\rm inside}$ corresponds to the distance between the center of the box and its side/corner (or the entity itself if the entity is inside the box).

The key here is to downweight the distance inside the box by using $0<\alpha<1$. This means that as long as entity vectors are inside the box, we regard them as "close enough" to the query center (i.e., ${\rm dist}_{\rm outside}$ is 0, and ${\rm dist}_{\rm inside}$ is scaled by $\alpha$). When $\alpha=1$, ${\rm dist}_{\rm box}$ reduces to the ordinary $L_1$ distance, i.e., $\|\text{Cen}(\mathbf{q})-\mathbf{v}\|_{1}$, which is used by the conventional TransE (Bordes et al., 2013) as well as prior query embedding methods (Guu et al., 2015; Hamilton et al., 2018).
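As a concrete illustration, the entity-to-box distance can be sketched in a few lines of plain Python. The value of $\alpha$ here (0.2) is an arbitrary illustrative choice, not the paper's tuned hyper-parameter:

```python
def dist_box(v, center, offset, alpha=0.2):
    """Eq. 3: dist_box(v; q) = dist_outside + alpha * dist_inside (L1 norms).

    `center` = Cen(q) and `offset` = Off(q) >= 0, so the box spans
    [center - offset, center + offset] in each dimension.
    """
    outside = inside = 0.0
    for vi, ci, oi in zip(v, center, offset):
        q_max, q_min = ci + oi, ci - oi
        # distance from v to the closest corner/side (0 if v is in the box)
        outside += max(vi - q_max, 0.0) + max(q_min - vi, 0.0)
        # distance from the box center to v projected onto the box
        inside += abs(ci - min(q_max, max(q_min, vi)))
    return outside + alpha * inside

# unit box centered at the origin: [-1, 1] x [-1, 1]
center, offset = [0.0, 0.0], [1.0, 1.0]
d_in = dist_box([0.5, 0.0], center, offset)   # inside: only the alpha-scaled term
d_out = dist_box([2.0, 0.0], center, offset)  # outside: full L1 penalty applies
```

Note how the entity inside the box pays only the down-weighted inside term, while the entity outside pays the full $L_1$ penalty plus the scaled center offset, exactly as described above.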

Training objective. Our next goal is to learn entity embeddings as well as geometric projection and intersection operators.

Given a training set of queries and their answers, we optimize a negative sampling loss (Mikolov et al., 2013) to effectively optimize our distance-based model (Sun et al., 2019):

$$L=-\log\sigma\left(\gamma-{\rm dist}_{\rm box}(\mathbf{v};\mathbf{q})\right)-\sum_{i=1}^{k}\frac{1}{k}\log\sigma\left({\rm dist}_{\rm box}(\mathbf{v}_{i}^{\prime};\mathbf{q})-\gamma\right),\qquad(4)$$

where $\gamma$ represents a fixed scalar margin, $v\in\llbracket q\rrbracket$ is a positive entity (i.e., an answer to the query $q$), $v_{i}^{\prime}\notin\llbracket q\rrbracket$ is the $i$-th negative entity (a non-answer to the query $q$), and $k$ is the number of negative entities.
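A minimal sketch of this loss for a single query; the margin and the sampled distances below are illustrative values, not the paper's tuned hyper-parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def negative_sampling_loss(pos_dist, neg_dists, gamma=24.0):
    """Eq. 4: -log sig(gamma - d(v;q)) - (1/k) * sum_i log sig(d(v_i';q) - gamma).

    pos_dist: box distance of an answer entity v; neg_dists: box distances
    of the k sampled negative (non-answer) entities; gamma: scalar margin.
    """
    k = len(neg_dists)
    loss = -math.log(sigmoid(gamma - pos_dist))
    loss -= sum(math.log(sigmoid(d - gamma)) for d in neg_dists) / k
    return loss

# the loss shrinks as the answer moves inside the box (smaller pos_dist)
# and the negatives move farther away from it (larger neg_dists)
l_bad = negative_sampling_loss(3.0, [2.5, 3.5], gamma=2.0)
l_good = negative_sampling_loss(0.1, [8.0, 9.0], gamma=2.0)
```

Minimizing this objective pushes answer entities within margin $\gamma$ of the query box while pushing the $k$ sampled negatives beyond it.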

3.3 Tractable Handling of Disjunction Using Disjunctive Normal Form

So far we have focused on conjunctive queries. Our aim here is to tractably handle, in the vector space, a wider class of logical queries, called Existential Positive First-order (EPFO) queries (Dalvi & Suciu, 2012), which involve $\vee$ in addition to $\exists$ and $\wedge$. We specifically focus on EPFO queries whose computation graphs are DAGs, the same as for conjunctive queries (Section 3.1), except that we now have an additional type of directed edge, called union, defined as follows:

  • Union: Given a set of entity sets $\{S_{1},S_{2},\ldots,S_{n}\}$, this operator obtains $\cup_{i=1}^{n}S_{i}$.

A straightforward approach here would be to define another geometric operator for union and embed the query as we did in the previous sections. An immediate challenge for our box embeddings is that boxes can be located anywhere in the vector space, so their union would no longer be a simple box. In other words, union operation over boxes is not closed.

Theoretically, we prove a general negative result that holds for any embedding-based method that embeds a query $q$ into $\mathbf{q}$ and uses some distance function to retrieve entities, i.e., ${\rm dist}(\mathbf{v};\mathbf{q})\leq\beta$ iff $v\in\llbracket q\rrbracket$. Here, ${\rm dist}(\mathbf{v};\mathbf{q})$ is the distance between entity and query embeddings, e.g., ${\rm dist}_{\rm box}(\mathbf{v};\mathbf{q})$ or $\|\mathbf{v}-\mathbf{q}\|_{1}$, and $\beta$ is a fixed threshold.

Theorem 1. Consider any $M$ conjunctive queries $q_{1},\ldots,q_{M}$ whose denotation sets $\llbracket q_{1}\rrbracket,\ldots,\llbracket q_{M}\rrbracket$ are pairwise disjoint: $\forall i\neq j,\ \llbracket q_{i}\rrbracket\cap\llbracket q_{j}\rrbracket=\varnothing$. Let $D$ be the VC dimension of the function class $\{{\rm sign}(\beta-{\rm dist}(\cdot;\mathbf{q})):\mathbf{q}\in\Xi\}$, where $\Xi$ represents the query embedding space and ${\rm sign}(\cdot)$ is the sign function. Then, we need $D\geq M$ to model any EPFO query, i.e., ${\rm dist}(\mathbf{v};\mathbf{q})\leq\beta\Leftrightarrow v\in\llbracket q\rrbracket$ is satisfied for every EPFO query $q$.

The proof is provided in Appendix A. The key is that with the introduction of the union operation, any subset of the denotation sets can be an answer, which forces us to model the powerset $\{\cup_{q_{i}\in S}\llbracket q_{i}\rrbracket:S\subseteq\{q_{1},\ldots,q_{M}\}\}$ in the vector space.
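The counting step can be made concrete with a toy enumeration: given $M$ pairwise disjoint denotation sets, the union operator already generates $2^{M}-1$ distinct non-empty answer sets, each of which a thresholded-distance retrieval would have to realize. The singleton sets below are purely illustrative:

```python
from itertools import combinations

M = 4
denotations = [{i} for i in range(M)]  # M pairwise disjoint answer sets

# every non-empty subset S of {q_1, ..., q_M} yields a distinct EPFO answer
# set  U_{q_i in S} [[q_i]],  so 2^M - 1 answer sets arise in total
answer_sets = set()
for r in range(1, M + 1):
    for S in combinations(range(M), r):
        answer_sets.add(frozenset().union(*(denotations[i] for i in S)))
```

Since disjointness makes all these unions distinct, a distance-based classifier must shatter the $M$ sets, which is the source of the $D\geq M$ requirement.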

For a real-world KG, there are $M\approx|\mathcal{V}|$ conjunctive queries with non-overlapping answers. For example, in the commonly-used FB15k dataset (Bordes et al., 2013), derived from Freebase (Bollacker et al., 2008), we find $M=13{,}365$, while $|\mathcal{V}|=14{,}951$ (see Appendix B for details).

Theorem 1 shows that in order to accurately model any EPFO query with the existing framework, the complexity of the distance function, measured by the VC dimension, needs to be as large as the number of KG entities. This implies that if we use common distance functions based on a hyper-plane, a Euclidean sphere, or an axis-aligned rectangle,² their parameter dimensionality needs to be $\Theta(M)$, which is $\Theta(|\mathcal{V}|)$ for the real KGs we are interested in. In other words, the dimensionality of the logical query embeddings needs to be $\Theta(|\mathcal{V}|)$, which is not low-dimensional; thus it is not scalable to large KGs and does not generalize in the presence of unobserved KG edges.

²For the detailed VC dimensions of these function classes, see Vapnik (2013). Crucially, their VC dimensions are all linear in the number of parameters $d$.

To rectify this issue, our key idea is to transform a given EPFO query into a Disjunctive Normal Form (DNF) (Davey & Priestley, 2002), i.e., disjunction of conjunctive queries, so that union operation only appears in the last step. Each of the conjunctive queries can then be reasoned in the low-dimensional space, after which we can aggregate the results by a simple and intuitive procedure. In the following, we describe the transformation to DNF and the aggregation procedure.

Figure 3: Illustration of converting a computation graph of an EPFO query into an equivalent computation graph of the Disjunctive Normal Form.

Transformation to DNF. Any first-order logic formula can be transformed into an equivalent DNF (Davey & Priestley, 2002). We perform this transformation directly in the space of the computation graph, i.e., we move all the edges of type "union" to the last step of the computation graph. Let $G_{q}=(V_{q},E_{q})$ be the computation graph for a given EPFO query $q$, and let $V_{\rm union}\subset V_{q}$ be the set of nodes whose incoming edges are of type "union". For each $v\in V_{\rm union}$, define $P_{v}\subset V_{q}$ as the set of its parent nodes. We first generate $N=\prod_{v\in V_{\rm union}}|P_{v}|$ different computation graphs $G_{q^{(1)}},\ldots,G_{q^{(N)}}$ as follows, each with a different choice of $v_{\rm parent}$ in the first step.

  1. For every $v\in V_{\rm union}$, select one parent node $v_{\rm parent}\in P_{v}$.
  2. Remove all the edges of type "union".
  3. Merge $v$ and $v_{\rm parent}$, while retaining all other edge connections.

We then combine the obtained computation graphs $G_{q^{(1)}},\ldots,G_{q^{(N)}}$ as follows to give the final equivalent computation graph.

  1. Convert the target sink nodes of all the obtained computation graphs into existentially quantified bound variable nodes.
  2. Create a new target sink node $V_{?}$, and draw directed edges of type "union" from all the above variable nodes to the new target node.

An example of the entire transformation procedure is illustrated in Fig. 3. By the definition of the union operation, our procedure gives a computation graph equivalent to the original one. Furthermore, as all the union operators are removed from $G_{q^{(1)}},\ldots,G_{q^{(N)}}$, all of these computation graphs represent conjunctive queries, which we denote as $q^{(1)},\ldots,q^{(N)}$. We can then apply the existing framework to obtain a set of embeddings for these conjunctive queries as $\mathbf{q}^{(1)},\ldots,\mathbf{q}^{(N)}$.
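The branching step of this transformation amounts to taking a Cartesian product over the parent choices of the union nodes. A minimal sketch, where the node and parent names are illustrative rather than taken from the paper:

```python
from itertools import product

# parents P_v of each union node v in the computation graph G_q
union_parents = {"v1": ["a", "b"], "v2": ["c", "d", "e"]}

# one conjunctive branch per combination of parent choices:
# N = prod_v |P_v| = 2 * 3 = 6 computation graphs G_{q^(1)}, ..., G_{q^(N)}
branches = [dict(zip(union_parents.keys(), choice))
            for choice in product(*union_parents.values())]
```

Each branch fixes one parent per union node; removing the union edges and merging each union node with its chosen parent then yields one conjunctive computation graph per branch.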

Aggregation. Next we define the distance function between a given EPFO query $q$ and an entity $v\in\mathcal{V}$. Since $q$ is logically equivalent to $q^{(1)}\vee\cdots\vee q^{(N)}$, we can naturally define the aggregated distance function using the box distance ${\rm dist}_{\rm box}$:

$${\rm dist}_{\rm agg}(\mathbf{v};q)={\rm Min}(\{{\rm dist}_{\rm box}(\mathbf{v};\mathbf{q}^{(1)}),\ldots,{\rm dist}_{\rm box}(\mathbf{v};\mathbf{q}^{(N)})\}),\qquad(5)$$

where ${\rm dist}_{\rm agg}$ is parameterized by the EPFO query $q$. When $q$ is a conjunctive query, i.e., $N=1$, ${\rm dist}_{\rm agg}(\mathbf{v};q)={\rm dist}_{\rm box}(\mathbf{v};\mathbf{q})$. For $N>1$, ${\rm dist}_{\rm agg}$ takes the minimum distance to the closest box as the distance to an entity. This modeling aligns well with the union operation: an entity is inside the union of sets as long as the entity is in one of the sets. Note that our DNF-query rewriting scheme is general and is able to extend any method that works for conjunctive queries (e.g., Hamilton et al. (2018)) to handle the more general class of EPFO queries.
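A sketch of the aggregated distance, reusing the per-branch box distance of Eq. 3; the value of $\alpha$ and the example boxes are illustrative only:

```python
def dist_box(v, center, offset, alpha=0.2):
    # per-branch box distance of Eq. 3: L1 outside term + alpha-scaled inside term
    outside = inside = 0.0
    for vi, ci, oi in zip(v, center, offset):
        outside += max(vi - (ci + oi), 0.0) + max((ci - oi) - vi, 0.0)
        inside += abs(ci - min(ci + oi, max(ci - oi, vi)))
    return outside + alpha * inside

def dist_agg(v, branch_boxes):
    # Eq. 5: the distance to an EPFO query is the minimum box distance over
    # the N conjunctive branches q^(1), ..., q^(N) of its DNF
    return min(dist_box(v, c, o) for c, o in branch_boxes)

# two disjoint branch boxes; an entity inside either box has distance 0
boxes = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
```

Taking the minimum mirrors set union directly: an entity counts as an answer as soon as it lies in (or near) any one of the branch boxes.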

Computational complexity. The computational complexity of answering an EPFO query with our framework is equal to that of answering the $N$ conjunctive queries. In practice, $N$ may not be large, and all $N$ computations can be parallelized. Furthermore, answering each conjunctive query is very fast, as it requires executing a sequence of simple box operations (each of which takes constant time) and then performing a range search (Bentley & Friedman, 1979) in the embedding space, which can also be done in constant time using techniques based on Locality Sensitive Hashing (Indyk & Motwani, 1998).

4 Experiments

Our goal in the experiment section is to evaluate the performance of query2box on discovering answers to complex logical queries that cannot be obtained by traversing the incomplete KG. This means, we will focus on answering queries where one or more missing edges in the KG have to be successfully predicted in order to obtain the additional answers.

Figure 4: Query structures considered in the experiments, where anchor entities and relations are to be specified to instantiate logical queries. Naming for each query structure is provided under each subfigure, where 'p', 'i', and 'u' stand for 'projection', 'intersection', and 'union', respectively. Models are trained on the first 5 query structures and evaluated on all 9 query structures. For example, "3p" is a path query of length three, and "2i" is an intersection of cardinality two.
Dataset 1p 2p 3p 2i 3i ip pi 2u up
FB15k 10.8 255.6 250.0 90.3 64.1 593.8 190.1 27.8 227.0
FB15k-237 13.3 131.4 215.3 69.0 48.9 593.8 257.7 35.6 127.7
NELL995 8.5 56.6 65.3 30.3 15.9 310.0 144.9 14.4 62.5
Table 1: Average number of answer entities of test queries with missing edges grouped by different query structures (for a KG with 10% edges missing).

4.1 Knowledge Graphs and Query Generation

We perform experiments on three standard KG benchmarks, FB15k (Bordes et al., 2013), FB15k-237 (Toutanova & Chen, 2015), and NELL995 (Xiong et al., 2017) (see Appendix E for NELL995 pre-processing details). Dataset statistics are summarized in Table 5 in Appendix F.

We follow the standard evaluation protocol in the KG literature: Given the standard split of edges into training, validation, and test sets, we first augment the KG to also include inverse relations, effectively doubling the number of edges in the graph. We then create three graphs: $\mathcal{G}_{\text{train}}$, which contains only the training edges and is used to train node embeddings as well as the box operators; and two bigger graphs: $\mathcal{G}_{\text{valid}}$, which contains $\mathcal{G}_{\text{train}}$ plus the validation edges, and $\mathcal{G}_{\text{test}}$, which includes $\mathcal{G}_{\text{valid}}$ as well as the test edges.

We consider 9 kinds of diverse query structures, shown and named in Fig. 4. We use 5 query structures for training and then evaluate on all 9 query structures. We refer the reader to Appendix D for full details on query generation and to Table 6 in Appendix F for statistics of the generated logical queries. Given a query $q$, let $\llbracket q\rrbracket_{\text{train}}$, $\llbracket q\rrbracket_{\text{val}}$, and $\llbracket q\rrbracket_{\text{test}}$ denote the sets of answer entities obtained by running subgraph matching of $q$ on $\mathcal{G}_{\text{train}}$, $\mathcal{G}_{\text{valid}}$, and $\mathcal{G}_{\text{test}}$, respectively. At training time, we use $\llbracket q\rrbracket_{\text{train}}$ as positive examples for the query and other random entities as negative examples. However, at test/validation time we proceed differently. Note that we focus on answering queries where generalization performance is crucial and at least one edge needs to be imputed in order to answer the query. Thus, rather than evaluating a given query on the full validation (or test) set $\llbracket q\rrbracket_{\text{val}}$ ($\llbracket q\rrbracket_{\text{test}}$) of answers, we validate the method only on answers that involve missing relations. Given how we constructed $\mathcal{G}_{\text{train}}\subseteq\mathcal{G}_{\text{valid}}\subseteq\mathcal{G}_{\text{test}}$, we have $\llbracket q\rrbracket_{\text{train}}\subseteq\llbracket q\rrbracket_{\text{val}}\subseteq\llbracket q\rrbracket_{\text{test}}$, and thus we evaluate the method on $\llbracket q\rrbracket_{\text{val}}\backslash\llbracket q\rrbracket_{\text{train}}$ to tune hyper-parameters and then report results identifying answer entities in $\llbracket q\rrbracket_{\text{test}}\backslash\llbracket q\rrbracket_{\text{val}}$. This means we always evaluate on queries/entities that were not part of the training set and that the method has not seen before. Furthermore, for these queries, traditional graph traversal techniques would not be able to find the answers (due to missing relations).

Table 1 shows the average number of answer entities for different query structures. We observe that complex logical queries (especially 2p, 3p, ip, pi, up) indeed require modeling a much larger number of answer entities (often more than 10 times as many) than the simple 1p queries do. Therefore, we expect our box embeddings to work particularly well in handling complex queries with many answer entities.³

³On the simple link prediction (1p query) task, box embeddings provide a minor empirical performance improvement over TransE, possibly because simple link prediction does not require modeling large sets of entities, as shown in Table 1. See Appendix C for full experimental results on link prediction.

4.2 Evaluation Protocol

Given a test query $q$, for each of its non-trivial answers $v\in\llbracket q\rrbracket_{\text{test}}\backslash\llbracket q\rrbracket_{\text{val}}$, we use ${\rm dist}_{\rm box}$ in Eq. 3 to rank $v$ among $\mathcal{V}\backslash\llbracket q\rrbracket_{\text{test}}$. Denoting the rank of $v$ by $\text{Rank}(v)$, we then calculate evaluation metrics for answering query $q$, such as Mean Reciprocal Rank (MRR) and Hits at $K$ (H@$K$):

$$\text{Metrics}(q)=\frac{1}{|\llbracket q\rrbracket_{\text{test}}\backslash\llbracket q\rrbracket_{\text{val}}|}\sum_{v\in\llbracket q\rrbracket_{\text{test}}\backslash\llbracket q\rrbracket_{\text{val}}}f_{\text{metrics}}(\text{Rank}(v)),\qquad(6)$$

where $f_{\text{metrics}}(x)=\frac{1}{x}$ for MRR, and $f_{\text{metrics}}(x)=\mathbf{1}[x\leq K]$ for H@$K$.
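For a single query, Eq. 6 can be sketched as follows; the ranks below are illustrative:

```python
def query_metrics(ranks, K=3):
    r"""Eq. 6 for one test query: `ranks` holds Rank(v) for each non-trivial
    answer v in [[q]]_test \ [[q]]_val (1 = top-ranked).  Returns (MRR, H@K)."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= K) / len(ranks)
    return mrr, hits

mrr, h3 = query_metrics([1, 4, 10], K=3)  # MRR = (1 + 1/4 + 1/10) / 3 = 0.45
```

Averaging inside each query first, and only then across queries of the same structure (as described next), keeps queries with many answers from dominating the reported numbers.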

We then average Eq. 6 over all the queries within the same query structure,⁴ and report the results separately for different query structures. The same evaluation protocol is applied at the validation stage, except that we evaluate on $\llbracket q\rrbracket_{\text{val}}\backslash\llbracket q\rrbracket_{\text{train}}$ rather than $\llbracket q\rrbracket_{\text{test}}\backslash\llbracket q\rrbracket_{\text{val}}$.

⁴Note that our evaluation metric is slightly different from the conventional metric (Nickel et al., 2016; Hamilton et al., 2018; Guu et al., 2015), where the average is taken over query-answer pairs. The conventional metric is problematic, as it can be significantly biased toward correctly answering generic queries with a huge number of answers, while dismissing fine-grained queries with few answers. Here, to treat queries equally regardless of the number of answers they have, we take the average over queries.

4.3 Baseline and Model Variants

We compare our framework query2box against the state-of-the-art gqe (Hamilton et al., 2018). gqe embeds a query into a single vector, and models the projection and intersection operators as translation and deep sets (Zaheer et al., 2017), respectively. The $L_1$ distance is used as the distance between query and entity vectors. For a fair comparison, we also compare with gqe-double (gqe with doubled embedding dimensionality), so that query2box and gqe-double have the same number of parameters. Refer to Appendix G for the model hyper-parameters used in our experiments. Although the original gqe cannot handle EPFO queries, we apply our DNF-query rewriting strategy and in our evaluation extend gqe to handle the general class of EPFO queries as well. Furthermore, we perform an extensive ablation study by considering several variants of query2box (abbreviated as q2b). We list our method as well as its variants below.

  • q2b (our method): The box embeddings are used to model queries, and the attention mechanism is used for the intersection operator.
  • q2b-avg: The attention mechanism for intersection is replaced with averaging.
  • q2b-deepsets: The attention mechanism for intersection is replaced with deep sets.
  • q2b-avg-1p: The variant of q2b-avg that is trained with only 1p queries (see Fig. 4); thus, the logical operators are not explicitly trained.
  • q2b-sharedoffset: The box offset is shared across all queries (every query is represented by a box with the same trainable size).
Method Avg 1p 2p 3p 2i 3i ip pi 2u up
FB15k
q2b 0.484 0.786 0.413 0.303 0.593 0.712 0.211 0.397 0.608 0.33
gqe 0.386 0.636 0.345 0.248 0.515 0.624 0.151 0.310 0.376 0.273
gqe-double 0.384 0.630 0.346 0.250 0.515 0.611 0.153 0.320 0.362 0.271
FB15k-237
q2b 0.268 0.467 0.24 0.186 0.324 0.453 0.108 0.205 0.239 0.193
gqe 0.228 0.402 0.213 0.155 0.292 0.406 0.083 0.17 0.169 0.163
gqe-double 0.23 0.405 0.213 0.153 0.298 0.411 0.085 0.182 0.167 0.16
NELL995
q2b 0.306 0.555 0.266 0.233 0.343 0.48 0.132 0.212 0.369 0.163
gqe 0.247 0.418 0.228 0.205 0.316 0.447 0.081 0.186 0.199 0.139
gqe-double 0.248 0.417 0.231 0.203 0.318 0.454 0.081 0.188 0.2 0.139
Table 2: H@3 results of query2box vs. gqe on FB15k, FB15k-237 and NELL995.
Method Avg 1p 2p 3p 2i 3i ip pi 2u up
FB15k
q2b 0.484 0.786 0.413 0.303 0.593 0.712 0.211 0.397 0.608 0.330
q2b-avg 0.468 0.779 0.407 0.300 0.577 0.673 0.199 0.345 0.607 0.326
q2b-deepsets 0.467 0.755 0.407 0.294 0.588 0.699 0.197 0.378 0.562 0.324
q2b-avg-1p 0.385 0.812 0.262 0.173 0.463 0.529 0.126 0.263 0.653 0.187
q2b-sharedoffset 0.372 0.684 0.335 0.232 0.442 0.559 0.144 0.282 0.417 0.252
FB15k-237
q2b 0.268 0.467 0.24 0.186 0.324 0.453 0.108 0.205 0.239 0.193
q2b-avg 0.249 0.462 0.242 0.182 0.278 0.391 0.101 0.158 0.236 0.189
q2b-deepsets 0.259 0.458 0.243 0.186 0.303 0.432 0.104 0.187 0.231 0.190
q2b-avg-1p 0.219 0.457 0.193 0.132 0.251 0.319 0.083 0.142 0.241 0.152
q2b-sharedoffset 0.207 0.391 0.199 0.139 0.251 0.354 0.082 0.154 0.15 0.142
NELL995
q2b 0.306 0.555 0.266 0.233 0.343 0.480 0.132 0.212 0.369 0.163
q2b-avg 0.283 0.543 0.250 0.228 0.300 0.403 0.116 0.188 0.36 0.161
q2b-deepsets 0.293 0.539 0.26 0.231 0.317 0.467 0.11 0.202 0.349 0.16
q2b-avg-1p 0.274 0.607 0.229 0.182 0.277 0.315 0.097 0.18 0.443 0.133
q2b-sharedoffset 0.237 0.436 0.219 0.201 0.278 0.379 0.096 0.174 0.217 0.137
Table 3: H@3 results of query2box vs. several variants on FB15k, FB15k-237 and NELL995.

4.4 Main Results

We start by comparing our q2b with the state-of-the-art query embedding method gqe (Hamilton et al., 2018) on FB15k, FB15k-237, and NELL995. As listed in Table 2, our method significantly and consistently outperforms the state-of-the-art baseline across all query structures, including those not seen during training and those with union operations. On average, we obtain 9.8% (25% relative), 3.8% (15% relative), and 5.9% (24% relative) higher H@3 than the best baselines on FB15k, FB15k-237, and NELL995, respectively. Notice that naïvely increasing the embedding dimensionality of gqe yields only limited improvement. Our q2b effectively models a large set of entities with its box embedding, achieving a significant gain over gqe-double (which has the same number of parameters) that represents queries as point vectors. Also notice that q2b performs well both on new queries with the same structure as the training queries and on new query structures never seen during training, which demonstrates that q2b generalizes well within and beyond query structures.
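For reference, the H@3 numbers compared here are Hits@3: the fraction of test answer entities whose rank among all candidate entities is at most 3 (under the standard filtered ranking protocol). A minimal sketch; the function name and the sample ranks below are illustrative, not from the paper:

```python
import numpy as np

def hits_at_k(ranks, k=3):
    """Fraction of test answers whose (filtered) rank is at most k.

    `ranks` holds, for each ground-truth answer entity, its rank among
    all candidate entities after filtering out other known answers.
    """
    ranks = np.asarray(ranks)
    return float(np.mean(ranks <= k))

# Hypothetical ranks for five test answers.
ranks = [1, 2, 5, 3, 10]
print(hits_at_k(ranks, k=3))  # 3 of 5 ranks are <= 3 -> 0.6
```

Averaging this quantity over the test queries of one structure (1p, 2i, ...) gives one cell of Tables 2-3.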

We also conduct extensive ablation studies (Table 3). We summarize the results as follows:

Importance of attention mechanism. First, we show that modeling intersection with the attention mechanism is important. Given a set of box embeddings $\{\mathbf{p}_1,\ldots,\mathbf{p}_n\}$, q2b-avg is the most naïve way to calculate the center of the resulting box embedding $\mathbf{p}_{\rm inter}$, while q2b-deepsets is too flexible and neglects the fact that the center should be a weighted average of $\text{Cen}(\mathbf{p}_1),\ldots,\text{Cen}(\mathbf{p}_n)$. Compared with these two methods, q2b achieves better performance in answering queries that involve the intersection operation, e.g., 2i, 3i, pi, ip. Specifically, on FB15k-237, q2b obtains more than 4% and 2% absolute gains in H@3 over q2b-avg and q2b-deepsets, respectively.
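The three variants differ only in how they aggregate the input boxes. The sketch below illustrates the attention-style aggregation: the new center is a dimension-wise attention average of the input centers (so it always stays inside their convex hull), and the offset shrinks. This is a simplification of the paper's operator: the learned MLP is replaced by a stand-in score function, and the offset update is reduced to an element-wise minimum rather than the paper's DeepSets-modulated shrinking:

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def intersect_boxes(centers, offsets, score_fn):
    """Attention-weighted intersection of n box embeddings.

    centers, offsets: arrays of shape (n, d). Returns the center and
    offset of the intersected box: the center is a per-dimension
    attention average of the input centers; the offset is shrunk to
    the element-wise minimum (simplified from the paper).
    """
    scores = np.stack([score_fn(c) for c in centers])  # (n, d)
    attn = softmax(scores, axis=0)  # weights over the n boxes, per dimension
    center = (attn * centers).sum(axis=0)
    offset = offsets.min(axis=0)
    return center, offset

# Toy example: two 2-d boxes, with an identity map standing in for the MLP.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
offsets = np.array([[0.5, 0.5], [0.3, 0.8]])
c, o = intersect_boxes(centers, offsets, score_fn=lambda c: c)
```

Because the attention weights are a convex combination, the resulting center cannot drift outside the spread of the input centers, which is exactly the constraint q2b-deepsets fails to enforce.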

Necessity of training on complex queries. Second, we observe that explicitly training on complex logical queries, beyond one-hop path queries (1p in Fig. 4), improves reasoning performance. Although q2b-avg-1p achieves strong performance on 1p and 2u, where answering 2u is essentially answering two 1p queries with an additional minimum operation (see Eq. 5 in Section 3.3), it fails badly on the other types of queries that involve logical operators. In contrast, the methods that are explicitly trained on logical queries (q2b, q2b-avg, and q2b-deepsets) achieve much higher accuracy, with up to 10% absolute improvement in average H@3 on FB15k.
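The "two 1p queries plus a minimum" observation is the DNF scoring rule: the distance of an entity to a union query is the minimum of its distances to the boxes of the conjunctive branches. A minimal sketch using only the outside-the-box L1 distance (the paper's distance additionally includes a down-weighted inside-the-box term, omitted here; the example boxes are illustrative):

```python
import numpy as np

def dist_to_box(v, center, offset):
    """L1 distance from point v to the box [center - offset, center + offset].

    Zero everywhere inside the box (the inside term of the paper's
    distance is omitted in this sketch).
    """
    lo, hi = center - offset, center + offset
    outside = np.maximum(lo - v, 0.0) + np.maximum(v - hi, 0.0)
    return float(outside.sum())

def dist_to_union(v, boxes):
    """DNF scoring: distance to a union query is the minimum distance
    over the boxes embedding its conjunctive branches."""
    return min(dist_to_box(v, c, o) for c, o in boxes)

# A 2u query embedded as two boxes; the entity lies inside the second one.
boxes = [(np.array([0.0, 0.0]), np.array([1.0, 1.0])),
         (np.array([5.0, 5.0]), np.array([1.0, 1.0]))]
v = np.array([4.5, 5.0])
print(dist_to_union(v, boxes))  # 0.0
```

Ranking all entities by this min-distance answers the union query with one box per conjunct, which is why q2b-avg-1p can handle 2u despite never training on intersections.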

Adaptive box size for different queries. Third, we investigate the importance of learning adaptive offsets (box sizes) for different queries. q2b-sharedoffset is a variant of q2b in which all box embeddings share the same learnable offset; it performs poorly on all types of queries. This is most likely because different queries have different numbers of answer entities, and an adaptive box size lets us model this variation. In fact, we find that box offsets vary significantly across relations, and one-to-many relations tend to have larger offset embeddings (see Appendix H for details).

5 Conclusion

In this paper we proposed a reasoning framework, query2box, that can effectively model and reason over sets of entities and handle EPFO queries in a vector space. Given a logical query, we first transform it into DNF, embed each conjunctive query into a box, and rank entities by their distance to the nearest box. Our approach handles all types of EPFO queries scalably and accurately. Experimental results on standard KGs demonstrate that query2box significantly outperforms existing work in answering diverse logical queries.

Acknowledgments

We thank William Hamilton, Rex Ying, and Jiaxuan You for their helpful discussion. W.H is supported by Funai Overseas Scholarship and Masason Foundation Fellowship. J.L is a Chan Zuckerberg Biohub investigator. We gratefully acknowledge the support of DARPA under Nos. FA865018C7880 (ASED), N660011924033 (MCS); ARO under Nos. W911NF-16-1-0342 (MURI), W911NF-16-1-0171 (DURIP); NSF under Nos. OAC-1835598 (CINES), OAC-1934578 (HDR); Stanford Data Science Initiative, Wu Tsai Neurosciences Institute, Chan Zuckerberg Biohub, JD.com, Amazon, Boeing, Docomo, Huawei, Hitachi, Observe, Siemens, UST Global.

The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of DARPA, NIH, ARO, or the U.S. Government.

References

  • Carl Allen, Ivana Balazevic, and Timothy M Hospedales. On understanding knowledge graph representation. arXiv preprint arXiv:1909.11611, 2019.
  • Ben Athiwaratkun and Andrew Gordon Wilson. Hierarchical density order embeddings. In International Conference on Learning Representations (ICLR), 2018.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR), 2015.
  • Jon Louis Bentley and Jerome H Friedman. Data structures for range searching. ACM Computing Surveys (CSUR), 11(4):397–409, 1979.
  • Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 1247–1250. ACM, 2008.
  • Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2787–2795, 2013.
  • Nilesh Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. VLDB, 16(4):523–544, 2007.
  • Nilesh Dalvi and Dan Suciu. The dichotomy of probabilistic inference for unions of conjunctive queries. Journal of the ACM (JACM), 59(6):30, 2012.
  • Rajarshi Das, Arvind Neelakantan, David Belanger, and Andrew McCallum. Chains of reasoning over entities, relations, and text using recurrent neural networks. In European Chapter of the Association for Computational Linguistics (EACL), pp. 132–141, 2017.
  • Brian A Davey and Hilary A Priestley. Introduction to lattices and order. Cambridge University Press, 2002.
  • Luc De Raedt. Logical and relational learning. Springer Science & Business Media, 2008.
  • Sašo Džeroski. Relational data mining. In Data Mining and Knowledge Discovery Handbook, pp. 887–911. Springer, 2009.
  • Katrin Erk. Representing words as regions in vector space. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning, pp. 57–65. Annual Meeting of the Association for Computational Linguistics (ACL), 2009.
  • Kelvin Guu, John Miller, and Percy Liang. Traversing knowledge graphs in vector space. In Empirical Methods in Natural Language Processing (EMNLP), pp. 318–327, 2015.
  • Will Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, and Jure Leskovec. Embedding logical queries on knowledge graphs. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2027–2038, 2018.
  • Shizhu He, Kang Liu, Guoliang Ji, and Jun Zhao. Learning to represent knowledge graphs with gaussian embedding. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pp. 623–632. ACM, 2015.
  • Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (STOC), pp. 604–613. ACM, 1998.
  • Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  • Daphne Koller, Nir Friedman, Sašo Džeroski, Charles Sutton, Andrew McCallum, Avi Pfeffer, Pieter Abbeel, Ming-Fai Wong, David Heckerman, Chris Meek, et al. Introduction to statistical relational learning. MIT Press, 2007.
  • Denis Krompaß, Maximilian Nickel, and Volker Tresp. Querying factorized probabilistic triple databases. In International Semantic Web Conference (ISWC), pp. 114–129. Springer, 2014.
  • Alice Lai and Julia Hockenmaier. Learning to predict denotational probabilities for modeling entailment. In Annual Meeting of the Association for Computational Linguistics (ACL), pp. 721–730, 2017.
  • Xiang Li, Luke Vilnis, and Andrew McCallum. Improved representation learning for predicting commonsense ontologies. arXiv preprint arXiv:1708.00549, 2017.
  • Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, and Andrew McCallum. Smoothing the geometry of probabilistic box embeddings. In International Conference on Learning Representations (ICLR), 2019.
  • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations (ICLR), 2013.
  • Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104(1):11–33, 2016.
  • Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations (ICLR), 2019.
  • Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66, 2015.
  • Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 2013.
  • Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. Order-embeddings of images and language. In International Conference on Learning Representations (ICLR), 2016.
  • John Venn. I. On the diagrammatic and mechanical representation of propositions and reasonings. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 10(59):1–18, 1880.
  • Luke Vilnis and Andrew McCallum. Word representations via gaussian embedding. In International Conference on Learning Representations (ICLR), 2014.
  • Luke Vilnis, Xiang Li, Shikhar Murty, and Andrew McCallum. Probabilistic embedding of knowledge graphs with box lattice measures. In Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
  • Wenhan Xiong, Thien Hoang, and William Yang Wang. DeepPath: A reinforcement learning method for knowledge graph reasoning. In Empirical Methods in Natural Language Processing (EMNLP), 2017.
  • Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3391–3401, 2017.

Appendix A Proof of Theorem 1

Proof.

To model any EPFO query, we need to at least model the subset of EPFO queries $\mathcal{Q}=\{\vee_{q_i\in S}\, q_i : S\subseteq\{q_1,\ldots,q_M\}\}$, whose corresponding denotation sets are $\{\cup_{q_i\in S}\llbracket q_i\rrbracket : S\subseteq\{q_1,\ldots,q_M\}\}$. For the sake of modeling $\mathcal{Q}$, without loss of generality, we consider assigning a single entity embedding $\mathbf{v}_{q_i}$ to all $v\in\llbracket q_i\rrbracket$, so there are $M$ kinds of entity vectors, $\mathbf{v}_{q_1},\ldots,\mathbf{v}_{q_M}$. To model all queries in $\mathcal{Q}$, it is necessary to satisfy the following:

$$\exists\,\mathbf{v}_{q_1},\ldots,\exists\,\mathbf{v}_{q_M},\ \forall S\subseteq\{q_1,\ldots,q_M\},\ \exists\,\mathbf{q}_S\in\Xi,\ \text{such that }{\rm dist}(\mathbf{v}_{q_i};\mathbf{q}_S)\begin{cases}\leq\beta & \text{if } q_i\in S,\\ >\beta & \text{if } q_i\notin S,\end{cases} \qquad (7)$$

where $\mathbf{q}_S$ is the embedding of the query $\vee_{q_i\in S}\, q_i$. Eq. 7 means that we can learn the $M$ kinds of entity vectors such that for every query in $\mathcal{Q}$, we can obtain its embedding to model the corresponding set using the distance function. Notice that this is agnostic to the specific algorithm that embeds the query $\vee_{q_i\in S}\, q_i$ into $\mathbf{q}_S$; thus, our result applies to any method that embeds a query into a single vector.

Crucially, satisfying Eq. 7 is equivalent to the function class $\{{\rm sign}(\beta-{\rm dist}(\cdot\,;\mathbf{q})) : \mathbf{q}\in\Xi\}$ being able to shatter $\{\mathbf{v}_{q_1},\ldots,\mathbf{v}_{q_M}\}$, i.e., any binary labeling of the points can be perfectly fit by some classifier in the function class. To sum up, in order to model any EPFO query, we need to at least model any query in $\mathcal{Q}$, which requires the VC dimension of the distance function to be larger than or equal to $M$. ∎
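As a concrete illustration of the shattering requirement, when the embedding dimension equals $M$, axis-aligned boxes can shatter $M$ one-hot entity embeddings, which is consistent with the dimension growing in proportion to $M$. The one-hot placement and the subset-to-box construction below are our own illustrative choices, not taken from the proof:

```python
import numpy as np
from itertools import chain, combinations

def box_for_subset(S, M):
    """Axis-aligned box containing exactly the one-hot points e_i with i in S.

    Dimension j spans [0, 1] if j is in S and [0, 0.5] otherwise, so
    e_i (which equals 1 at coordinate i, 0 elsewhere) lies inside the
    box iff i is in S.
    """
    lo = np.zeros(M)
    hi = np.where(np.isin(np.arange(M), list(S)), 1.0, 0.5)
    return lo, hi

def inside(v, lo, hi):
    return bool(np.all(lo <= v) and np.all(v <= hi))

M = 4
points = np.eye(M)  # one one-hot embedding per entity
subsets = chain.from_iterable(combinations(range(M), r) for r in range(M + 1))
ok = all(
    inside(points[i], *box_for_subset(S, M)) == (i in S)
    for S in map(set, subsets) for i in range(M)
)
print(ok)  # True: every subset of the M points is realized by some box
```

Every one of the $2^M$ binary labelings is realized by some box, so the box family shatters the $M$ points when the dimension is $M$; the theorem says no family over a lower-dimensional embedding can do so.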