Semantic Search on Text and Knowledge Bases

Hannah Bast
University of Freiburg
bast@cs.uni-freiburg.de

Björn Buchhold
University of Freiburg
buchhold@cs.uni-freiburg.de

Elmar Haussmann
University of Freiburg
haussmann@cs.uni-freiburg.de

Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339
United States
Tel. +1-781-985-4510
www.nowpublishers.com
sales@nowpublishers.com

Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274
The preferred citation for this publication is

H. Bast, B. Buchhold, E. Haussmann. Semantic Search on Text and Knowledge Bases. Foundations and Trends® in Information Retrieval, vol. 10, no. 2-3, pp. 119-271, 2016.

This Foundations and Trends® issue was typeset in LaTeX using a class file designed by Neal Parikh. Printed on acid-free paper.

ISBN: 978-1-68083-165-8
© 2016 H. Bast, B. Buchhold, E. Haussmann
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on the internet at: www.copyright.com

For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1 781 871 0245; www.nowpublishers.com; sales@nowpublishers.com

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail: sales@nowpublishers.com

Editors-in-Chief

Douglas W. Oard
University of Maryland
United States
Maarten de Rijke
University of Amsterdam
The Netherlands

Editors

Ben Carterette
University of Delaware
Charles L.A. Clarke
University of Waterloo
ChengXiang Zhai
UIUC
Diane Kelly
University of North Carolina
Fabrizio Sebastiani
Qatar Computing Research Institute
Ian Ruthven
University of Strathclyde
Ian Ruthven
University of Amsterdam
James Allan
University of Massachusetts, Amherst
Jamie Callan
Carnegie Mellon University
Jian-Yun Nie
University of Montreal
Mark Sanderson
Royal Melbourne Institute of Technology
Australia
Jimmy Lin
University of Maryland
Leif Azzopardi
University of Glasgow
Luo Si
Purdue University
Marie-Francine Moens
Catholic University of Leuven
Mark D. Smucker
University of Waterloo
Rodrygo Luis Teodoro Santos
Federal University of Minas Gerais
Ryen White
Microsoft Research
Soumen Chakrabarti
Indian Institute of Technology Bombay
Tat-Seng Chua
National University of Singapore
William W. Cohen
Carnegie Mellon University

Editorial Scope

Topics

Foundations and Trends® in Information Retrieval publishes survey and tutorial articles in the following topics:
  • Applications of IR
  • Architectures for IR
  • Collaborative filtering and recommender systems
  • Cross-lingual and multilingual IR
  • Distributed IR and federated search
  • Evaluation issues and test collections for IR
  • Formal models and language models for IR
  • IR on mobile platforms
  • Indexing and retrieval of structured documents
  • Information categorization and clustering
  • Information extraction
  • Information filtering and routing
  • Metasearch, rank aggregation, and data fusion
  • Natural language processing for IR
  • Performance issues for IR systems, including algorithms, data structures, optimization techniques, and scalability
  • Question answering
  • Summarization of single documents, multiple documents, and corpora
  • Text mining
  • Topic detection and tracking
  • Usability, interactivity, and visualization issues in IR
  • User modelling and user studies for IR
  • Web search

Information for Librarians

Foundations and Trends® in Information Retrieval, 2016, Volume 10, 5 issues. ISSN paper version 1554-0669. ISSN online version 1554-0677. Also available as a combined paper and online subscription.

Contents

1 Introduction
  1.1 Motivation for this Survey
  1.2 Scope of this Survey
  1.3 Overview of this Survey
  1.4 Glossary
2 Classification by Data Type and Search Paradigm
  2.1 Data Types and Common Datasets
  2.2 Search Paradigms
  2.3 Other Aspects
3 Basic NLP Tasks in Semantic Search
  3.1 Part-of-Speech Tagging and Chunking
  3.2 Named-Entity Recognition and Disambiguation
  3.3 Sentence Parsing
  3.4 Word Vectors
4 Approaches and Systems for Semantic Search
  4.1 Keyword Search in Text
  4.2 Structured Search in Knowledge Bases
  4.3 Structured Data Extraction from Text
  4.4 Keyword Search on Knowledge Bases
  4.5 Keyword Search on Combined Data
  4.6 Semi-Structured Search on Combined Data
  4.7 Question Answering on Text
  4.8 Question Answering on Knowledge Bases
  4.9 Question Answering on Combined Data
5 Advanced Techniques used for Semantic Search
  5.1 Ranking
  5.2 Indexing
  5.3 Ontology Matching and Merging
  5.4 Inference
6 The Future of Semantic Search
  6.1 The Present
  6.2 The Near Future
  6.3 The Not So Near Future
Acknowledgements
Appendices
  A Datasets
  B Standards
References

Abstract

This article provides a comprehensive overview of the broad area of semantic search on text and knowledge bases. In a nutshell, semantic search is “search with meaning”. This “meaning” can refer to various parts of the search process: understanding the query (instead of just finding matches of its components in the data), understanding the data (instead of just searching it for such matches), or representing knowledge in a way suitable for meaningful retrieval.
Semantic search is studied in a variety of different communities with a variety of different views of the problem. In this survey, we classify this work according to two dimensions: the type of data (text, knowledge bases, combinations of these) and the kind of search (keyword, structured, natural language). We consider all nine combinations. The focus is on fundamental techniques, concrete systems, and benchmarks. The survey also considers advanced issues: ranking, indexing, ontology matching and merging, and inference. It also provides a succinct overview of natural language processing techniques that are useful for semantic search: POS tagging, named-entity recognition and disambiguation, sentence parsing, and word vectors.
The survey is as self-contained as possible, and should thus also serve as a good tutorial for newcomers to this fascinating and highly topical field.
DOI: 10.1561/1500000032.

1 Introduction

1.1 Motivation for this Survey

This is a survey about the broad field of semantic search. Semantics is the study of meaning.¹ In a nutshell, therefore, it could be said that semantic search is search with meaning.
Let us first understand this by looking at the opposite. Only a decade ago, search engines, including the big web search engines, were still mostly lexical. By lexical, we here mean that the search engine looks for literal matches of the query words typed by the user or variants of them, without making an effort to understand what the whole query actually means.
Consider the query university freiburg issued to a web search engine. Clearly, the homepage of the University of Freiburg is a good match for this query. To identify this page as a match, the search engine does not need to understand what the two query words university and freiburg actually mean, nor what they mean together. In fact, the university homepage contains these two words in its title (and, as a matter of fact, no other words except the frequent word of). Further, the page is at the top level of its domain, as can be seen from its URL: http://www.uni-freiburg.de. Moreover, the URL consists of parts of the query words. All these criteria are easy to check, and they alone make this page a very good candidate for the top hit of this query. No deeper understanding of what the query actually “meant” or what the homepage is actually “about” was needed.²
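
As a rough illustration of how cheap these checks are, the following Python sketch scores a page using only the three surface criteria just listed. The function name `lexical_score`, the equal weighting of the criteria, the three-letter minimum for URL fragments, and the page title used in the usage note are assumptions made for this illustration; they are not taken from any actual search engine.

```python
from urllib.parse import urlparse


def lexical_score(query: str, title: str, url: str) -> int:
    """Count how many of three purely lexical criteria a page satisfies."""
    words = query.lower().split()
    title_words = set(title.lower().split())
    parsed = urlparse(url)
    host = parsed.netloc.lower()

    score = 0
    # Criterion 1: every query word occurs literally in the page title.
    if all(w in title_words for w in words):
        score += 1
    # Criterion 2: the page sits at the top level of its domain.
    if parsed.path in ("", "/"):
        score += 1
    # Criterion 3: the host name contains a prefix (at least three letters,
    # an assumed threshold) of some query word.
    if any(w[:n] in host for w in words for n in range(3, len(w) + 1)):
        score += 1
    return score
```

With an assumed title of "University of Freiburg", the call `lexical_score("university freiburg", "University of Freiburg", "http://www.uni-freiburg.de")` returns 3: all three criteria hold, although the code knows nothing about universities or about Freiburg.
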
Modern search engines go more and more in the direction of accepting a broader variety of queries, actually trying to “understand” them, and providing the most appropriate answer in the most appropriate form, instead of just a list of (excerpts from) matching documents.
For example, consider the two queries computer scientists and female computer scientists working on semantic search. The first query is short and simple, the second query is longer and more complex. Both are good examples of what we would call semantic search. The following discussion is independent of the exact form of these queries. They could be formulated as keyword queries as above. They could be formulated in the form of complete natural language queries. Or they could be formulated in an abstract query language. The point here is what the queries are asking for.
To a human, the intention of both of these queries is quite clear: the user is (most likely) looking for scientists of a certain kind. Probably a list of them would be nice, with some basic information on each (for instance, a picture and a link to their homepage). For the query computer scientists, Wikipedia happens to provide a page with a corresponding list and matching query words.³ Correspondingly, the list is also contained in DBpedia, a database containing the structured knowledge from Wikipedia. But in both cases it is a manually compiled list, limited to relatively few better-known computer scientists. For the second query (female computer scientists working on semantic search), there is no single web page or other document with a corresponding list, let alone one matching the query words. Given the specificity of the query, it is also unlikely that someone will ever manually compile such a list (in whatever format) and maintain it. Note that both lists are constantly changing over time, since new researchers may join at any time.
In fact, even individual web pages matching the query are unlikely to contain most of the query words. A computer scientist does not typically put the words computer scientist on his or her homepage. A female computer scientist is unlikely to put the word female on her homepage. The homepage probably has a section on that particular scientist’s research interests, but this section does not necessarily contain the word working (maybe it contains a similar word, or maybe no such word at all, but just a list of topics). The topic semantic search will probably be stated on a matching web page, though possibly in a different formulation, for example, intelligent search or knowledge retrieval.
Both queries are thus good examples of cases where search needs to go beyond mere lexical matching of query words in order to provide a satisfactory result to the user. Also, both queries (in particular, the second one) require that information from several different sources be brought together to answer the query satisfactorily. Those information sources might be of different kinds: (unstructured) text as well as (structured) knowledge bases.
There is no exact definition of what semantic search is. In fact, semantic search means a lot of different things to different people. And researchers from many different communities are working on a large variety of problems related to semantic search, often without being aware of related work in other communities. This is the main motivation behind this survey.
When writing the survey, we had two audiences in mind: (i) newcomers to the field, and (ii) researchers already working on semantic search. Both audiences should get a comprehensive overview of which approaches are currently pursued in which communities, and what the current state of the art is. Both audiences should get pointers for further reading wherever the scope of this survey (defined in Section 1.2 right next) ends. But we also provide explanations of the underlying concepts and technologies that are necessary to understand the various approaches. Thus, this survey should also make a good tutorial for a researcher previously unfamiliar with semantic search.

1.2 Scope of this Survey

1.2.1 Kinds of Data

This survey focuses on semantic search on text (in natural language) or knowledge bases (consisting of structured records). The two may also be combined. For example, a natural language text may be enriched with semantic markup that identifies mentions of entities from a knowledge base. Or several knowledge bases with different schemata may be combined, as in the Semantic Web. The types of data considered in this survey are explained in detail in Section 2.1 on Data Types and Common Datasets.
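
One way to picture such combined data is sketched below in Python: a sentence of plain text, a list of marked-up mentions that point into a knowledge base, and the knowledge base itself as a set of structured records. All names here (`Mention`, `KB`, `facts_about_mentions`), the `ex:` identifiers, and the sample records are hypothetical and chosen only for illustration; actual datasets and their formats are the subject of Section 2.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int      # character offset where the mention begins
    end: int        # character offset just past the end of the mention
    entity_id: str  # identifier of the mentioned entity in the knowledge base


# Hypothetical knowledge base: structured (subject, predicate, object) records.
KB = {
    ("ex:Hannah_Bast", "ex:worksAt", "ex:University_of_Freiburg"),
    ("ex:University_of_Freiburg", "ex:locatedIn", "ex:Freiburg"),
}

# Natural language text plus semantic markup linking mentions to KB entities.
TEXT = "Hannah Bast works at the University of Freiburg."
MENTIONS = [
    Mention(0, 11, "ex:Hannah_Bast"),
    Mention(25, 47, "ex:University_of_Freiburg"),
]


def facts_about_mentions(text, mentions, kb):
    """Map each marked-up surface form to the KB records about its entity."""
    result = {}
    for m in mentions:
        surface = text[m.start:m.end]
        result[surface] = sorted(t for t in kb if t[0] == m.entity_id)
    return result
```

The design choice worth noting is that the text itself stays untouched; the markup lives in separate offset annotations, so the same sentence can be searched lexically or via the entities it mentions.
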
This survey does not cover search on images, audio, video, and other objects that have an inherently non-textual representation. This is not to say that semantic search is not relevant for this kind of data; quite the opposite is true. For example, consider a user looking for a picture of a particular person. Almost surely, the user is not interested in the precise arrangements of pixels that are used to represent the picture. She might not even be interested in the particular angle, selection, or lighting conditions of the picture, but only in the object shown. This is very much “semantic search”, but on a different kind of data. There is some overlap with search in textual data, including attempts to map non-textual to textual features and the use of text that accompanies the non-textual object (e.g., the caption of an image). But mostly, search in non-textual data is a different world that requires quite different techniques and tools.
Special cases of image and audio data are scans of text documents and speech. The underlying data is also textual⁴ and can be extracted using optical character recognition (OCR) and automatic speech recognition (ASR) techniques. We do not consider these techniques in this survey. However, we acknowledge that “semantic techniques”, as described in this survey, can be helpful in the text recognition process. For example, in both OCR and ASR, a semantic understanding of the possible textual interpretations can help to decide which interpretation is the most appropriate.

1.2.2 Kinds of Search

There are three types of queries prevailing in semantic search: keyword, structured, and natural language. We cover the whole spectrum in this survey; see Section 2.2 on Search Paradigms.
Concerning the kind of results returned, we take a narrower view: we focus on techniques and systems that are extractive in the sense that they return elements or excerpts from the original data. Think of the result screen from a typical web search engine. The results are nicely arranged and partly reformatted, so that we can digest them properly. But it’s all excerpts and elements from the web pages and knowledge bases being searched in the background.
We only barely touch upon the analysis of query logs (queries asked) and clickthrough data (results clicked). Such data can be used to derive information on what users found relevant for a particular query. Modern web search engines leverage such information to a significant extent. This topic is out of scope for this survey, since an explicit “understanding” of the query or the data is not necessary. We refer the reader to the seminal paper of Joachims [2002] and the recent survey of Silvestri [2010].
There is also a large body of research that involves the complex synthesis of new information, in particular, text. For example, in automatic summarization, the goal is to summarize a given (long) text document, preserving the main content and a consistent style. In multidocument summarization, this task is extended to multiple documents on a particular topic or question. For example, compile a report on drug trafficking in the United States over the past decade. Apart from collecting the various bits and pieces of text and knowledge required to answer these questions, the main challenge becomes to compile these into a compact and coherent text that is easily comprehensible to humans. Such non-trivial automatic content synthesis is out of scope for this survey.

1.2.3 Further Inclusion Criteria

As just explained, we focus on semantic search on text and knowledge bases that retrieves elements and excerpts from the original data. But even there we cannot possibly cover all existing research in depth.
Our inclusion criteria for this survey are very practically oriented, with a focus on fundamental techniques, datasets, benchmarks, and systems. Systems were selected with a strong preference for those evaluated on one of the prevailing benchmarks or that come with working software or a demo. We provide quantitative information (on the benchmarks and the performance and effectiveness of the various systems) wherever possible.
We omit most of the history and mostly focus on the state of the art. The historical perspective is interesting and worthwhile in its own right, but the survey is already long and worthwhile without this. However, we usually mention the first system of a particular kind. Also, for each of our nine categories (explained right next, in Section 1.3), we describe systems in chronological order and make sure to clarify the improvements of the newer systems over the older ones.

1.2.4 Further Reading

The survey provides pointers for further reading at many places. Additionally, we provide here a list of well-known conferences and journals, grouped by research community, which are generally good sources for published research on the topic of this survey and beyond. In particular, the bibliography of this survey contains (many) references from each of these venues. This list is by no means complete, and there are many good papers that are right on topic but published in other venues.
Information Retrieval: SIGIR, CIKM, TREC, TAC, FNTIR.
Web and Semantic Web: WWW, ISWC, ESWC, AAAI, JWS.
Computational Linguistics: ACL, EMNLP, HLT-NAACL.
Databases / Data Mining: VLDB, KDD, SIGMOD, TKDE.

1.3 Overview of this Survey
1.3 本次调查概述

Section 1.4 provides a Glossary of terms that are strongly related to semantic search. For each of these, we provide a brief description together with a pointer to the relevant passages in the survey. This is useful for readers who specifically look for material on a particular problem or aspect.
第 1.4 节提供了与语义搜索密切相关的术语表。对于其中的每一个术语,我们都提供了简要说明,并指出了调查报告中的相关段落。这对专门查找特定问题或方面资料的读者很有帮助。
Section 2 on Classification by Data Type and Search Paradigm describes the two main dimensions that we use for categorizing research on semantic search:
第 2 节 "按数据类型和搜索范式分类 "介绍了我们用来对语义搜索研究进行分类的两个主要维度:

Data type: text, knowledge bases, and combined data.
数据类型:文本、知识库和组合数据。

Search paradigm: keyword, structured, and natural language search.
搜索范式:关键词搜索、结构化搜索和自然语言搜索。

For each data type, we provide a brief characterization and a list of frequently used datasets. For each search paradigm, we provide a brief characterization and one or two examples.
对于每种数据类型,我们都提供了简要说明和常用数据集列表。对于每种搜索范式,我们都会提供简要说明和一两个示例。
Section 3 on Basic NLP Tasks in Semantic Search gives an overview of: part-of-speech (POS) tagging, named-entity recognition and disambiguation (NER+NED), parsing the grammatical structure of sentences, and word vectors / embeddings. These are used as basic building blocks by various (though not all) of the approaches described in our main Section 4. We give a brief tutorial on each of these tasks, as well as a succinct summary of the state of the art.
第3节 "语义搜索中的基本NLP任务 "概述了以下内容:语音部分(POS)标记、命名实体识别和消歧(NER+NED)、句子语法结构解析以及词向量/嵌入。这些都是第 4 节中介绍的各种(但不是全部)方法的基本构件。我们将对这些任务逐一进行简要介绍,并对相关技术的现状进行简明扼要的总结。
Section 4 on Approaches and Systems for Semantic Search is the core section of this survey. We group the many approaches and systems that exist in the literature by data type (three categories, see above) and search paradigm (three categories, see above). The resulting nine combinations are shown in Figure 1.1. In a sense, this figure is the main signpost for this survey. Note that we use Natural Language Search and Question Answering synonymously in this survey. All nine subsections share the same sub-structure:
第4节 "语义搜索的方法和系统 "是本调查报告的核心部分。我们按照数据类型(三类,见上文)和搜索范式(三类,见上文)对文献中存在的众多方法和系统进行了分组。由此产生的九种组合如图 1.1 所示。从某种意义上说,这张图就是本次调查的主要路标。请注意,我们在本调查中将自然语言搜索和问题解答作为同义词使用。所有九个小节都具有相同的子结构:
Profile … a short characterization of this line of research
简介......对这一研究方向的简短描述

Techniques … what are the basic techniques used
技术......使用哪些基本技术

Systems … a concise description of milestone systems or software Benchmarks … existing benchmarks and the best results on them
系统......里程碑式系统或软件的简明描述 基准......现有基准及其最佳结果
 关键词搜索
Keyword
Search
Keyword Search| Keyword | | :--- | | Search |
 结构化搜索
Structured
Search
Structured Search| Structured | | :--- | | Search |
 自然语言搜索
Natural Lang.
Search
Natural Lang. Search| Natural Lang. | | :--- | | Search |
Text 文本

第 4.1 节 文本关键词搜索
Section 4.1
Keyword Search
on Text
Section 4.1 Keyword Search on Text| Section 4.1 | | :--- | | Keyword Search | | on Text |

第 4.3 节 从文本中提取结构化数据
Section 4.3
Structured Data
Extraction from Text
Section 4.3 Structured Data Extraction from Text| Section 4.3 | | :--- | | Structured Data | | Extraction from Text |

第 4.7 节 文本问题解答
Section 4.7
Question Answering
on Text
Section 4.7 Question Answering on Text| Section 4.7 | | :--- | | Question Answering | | on Text |
 知识库
Knowledge
Bases
Knowledge Bases| Knowledge | | :--- | | Bases |

第 4.4 节 知识库关键词搜索
Section 4.4
Keyword Search on
Knowledge Bases
Section 4.4 Keyword Search on Knowledge Bases| Section 4.4 | | :--- | | Keyword Search on | | Knowledge Bases |

第 4.2 节 在知识库中进行结构化搜索
Section 4.2
Structured Search
on Knowledge Bases
Section 4.2 Structured Search on Knowledge Bases| Section 4.2 | | :--- | | Structured Search | | on Knowledge Bases |

第 4.8 节 知识库中的问题解答
Section 4.8
Question Answering
on Knowledge Bases
Section 4.8 Question Answering on Knowledge Bases| Section 4.8 | | :--- | | Question Answering | | on Knowledge Bases |
 合并数据
Combined
Data
Combined Data| Combined | | :--- | | Data |

第 4.5 节 综合数据关键词搜索
Section 4.5
Keyword Search
on Combined Data
Section 4.5 Keyword Search on Combined Data| Section 4.5 | | :--- | | Keyword Search | | on Combined Data |

第 4.6 节 半结构化数据搜索综合数据搜索
Section 4.6
Semi-Struct. Search
on Combined Data
Section 4.6 Semi-Struct. Search on Combined Data| Section 4.6 | | :--- | | Semi-Struct. Search | | on Combined Data |

第 4.9 节 综合数据的问题解答
Section 4.9
Question Answering
on Combined Data
Section 4.9 Question Answering on Combined Data| Section 4.9 | | :--- | | Question Answering | | on Combined Data |
"Keyword Search" "Structured Search" "Natural Lang. Search" Text "Section 4.1 Keyword Search on Text" "Section 4.3 Structured Data Extraction from Text" "Section 4.7 Question Answering on Text" "Knowledge Bases" "Section 4.4 Keyword Search on Knowledge Bases" "Section 4.2 Structured Search on Knowledge Bases" "Section 4.8 Question Answering on Knowledge Bases" "Combined Data" "Section 4.5 Keyword Search on Combined Data" "Section 4.6 Semi-Struct. Search on Combined Data" "Section 4.9 Question Answering on Combined Data"| | Keyword <br> Search | Structured <br> Search | Natural Lang. <br> Search | | :---: | :---: | :---: | :---: | | Text | Section 4.1 <br> Keyword Search <br> on Text | Section 4.3 <br> Structured Data <br> Extraction from Text | Section 4.7 <br> Question Answering <br> on Text | | Knowledge <br> Bases | Section 4.4 <br> Keyword Search on <br> Knowledge Bases | Section 4.2 <br> Structured Search <br> on Knowledge Bases | Section 4.8 <br> Question Answering <br> on Knowledge Bases | | Combined <br> Data | Section 4.5 <br> Keyword Search <br> on Combined Data | Section 4.6 <br> Semi-Struct. Search <br> on Combined Data | Section 4.9 <br> Question Answering <br> on Combined Data |
Figure 1.1: Our basic classification of research on semantic search by underlying data (rows) and search paradigm (columns). The three data types are explained in Section 2.1, the three search paradigms are explained in Section 2.2. Each of the nine groups is discussed in the indicated subsection of our main Section 4.
Section 5 on Advanced Techniques for Semantic Search deals with: ranking (in semantic entity search), indexing (getting not only good results but getting them fast), ontology matching and merging (dealing with multiple knowledge bases), and inference (deriving information that is not directly contained in the data but can be inferred from it). These techniques provide a deeper understanding of the aspects that are critical for high-quality results and/or high performance.
Section 6 on The Future of Semantic Search provides a very brief summary of the state of the art in semantic search, as described in the main sections of this survey, and then dares to take a look into the near and the not so near future.
The article closes with a long list of 218 references. Datasets and standards are not listed as part of the References but separately in the Appendices. In the PDF of this article, all citations in the text are clickable (leading to the respective entry in the References), and so are most of the titles in the References (leading to the respective article on the Web). In most PDF readers, Alt+Left brings you back to the place of the citation.
The reader may wonder about possible reading orders and which sections depend upon which. In fact, each of the six sections of this survey is relatively self-contained and readable on its own. This is true even for each of the nine subsections (one for each kind of semantic search, according to our basic classification) of the main Section 4. However, when reading such a subsection individually, it is a good idea to prepend a quick read of those subsections from Section 2 that deal with the respective data type and search paradigm: they are short and easy to read, with instructive examples. Readers looking for specific information may find the glossary, which comes right next, useful.

1.4 Glossary

This glossary provides a list of techniques or aspects that are strongly related to semantic search but non-trivial to find using our basic classification. For each item, we provide a very short description and a pointer to the relevant section(s) of the survey.
Deep learning for NLP: natural language processing using (deep) neural networks; used for the word vectors in Section 3.4; some of the systems in Section 4.8 on Question Answering on Knowledge Bases use deep learning or word vectors; apart from that, deep NLP is still used very little in actual systems for semantic search, but see Section 6 on The Future of Semantic Search.
Distant supervision: technique to derive labeled training data using heuristics in order to learn a (supervised) classifier; the basic principle and significance for semantic search is explained in Section 4.3.2 on Systems for Relationship Extraction from Text.
Entity resolution: identifying that two different strings refer to the same entity; this is used in Section 4.3.4 on Knowledge Base Construction and discussed more generally in Section 5.3 on Ontology Matching and Merging.
Entity search/retrieval: search on text or combined data that aims at a particular entity or list of entities as opposed to a list of documents; this applies to almost all the systems in Section 4 that work with combined data or natural language queries⁵; see also Section 5.1, which is all about ranking techniques for entity search.

Knowledge base construction: constructing or enriching a knowledge base from a given text corpus; basic techniques are explained in Section 4.3.1; systems are described in Section 4.3.4.
Learning to rank for semantic search: supervised learning of good ranking functions; several applications in the context of semantic search are described in Section 5.1.
Ontology merging and matching: reconciling and aligning naming schemes and contents of different knowledge bases; this is the topic of Section 5.3.
Paraphrasing or synonyms: identifying whether two words, phrases or sentences are synonymous; systems in Section 4.8 on Question Answering on Knowledge Bases make use of this; three datasets that are used by systems described in this survey are: Patty [2013] (paraphrases extracted in an unsupervised fashion), Paralex [2013] (question paraphrases), and CrossWikis [2012] (Wikipedia entity anchors in multiple languages).

Question answering: synonymous with natural language search in this survey; see Section 2.2.3 for a definition; see Sections 4.7, 4.8, and 4.9 for research on question answering on each of our three data types.

Reasoning/Inference: using reasoning to infer new triples from a given knowledge base; this is the topic of Section 5.4.

Semantic parsing: finding the logical structure of a natural language query; this is described in Section 4.8 on Question Answering on Knowledge Bases and used by many of the systems there.

Semantic web: a framework for explicit semantic data on the web; this kind of data is described in Section 2.1.3; the systems described in Section 4.5 deal with this kind of data; it is important to note that many papers and systems that claim to be about semantic web data actually deal with only a single knowledge base (like DBpedia, see Table 2.2), and are hence described in the sections dealing with search on knowledge bases.
Information extraction: extracting structured information from text; this is exactly what Section 4.3 on Structured Data Extraction from Text is about.
XML retrieval: search in nested semi-structured data (text with tag pairs, which can be arbitrarily nested); the relevance for semantic search is discussed in Section 4.5.3 in the context of the INEX series of benchmarks.
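
To make the glossary entry on distant supervision more concrete, the following is a minimal sketch of the basic idea, not the method of any particular system described in this survey. It assumes the simplest possible heuristic: every sentence that mentions both entities of a known triple is labeled with that triple's relation. The toy knowledge base `KB`, the sentences, and the function name `distant_labels` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]            # (subject, relation, object)
Example = Tuple[str, str, str, str]      # (sentence, subject, object, relation)

# Hypothetical toy knowledge base, for illustration only.
KB: List[Triple] = [
    ("Barack Obama", "born_in", "Honolulu"),
    ("Angela Merkel", "born_in", "Hamburg"),
]

# Hypothetical toy text corpus.
SENTENCES: List[str] = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last year.",
    "Angela Merkel grew up in Templin.",
]


def distant_labels(kb: List[Triple], sentences: List[str]) -> List[Example]:
    """Heuristically label sentences using plain substring matching.

    A sentence that contains both the subject and the object of a triple
    is assumed (possibly wrongly) to express the triple's relation.
    """
    labeled: List[Example] = []
    for sentence in sentences:
        for subj, rel, obj in kb:
            if subj in sentence and obj in sentence:
                labeled.append((sentence, subj, obj, rel))
    return labeled


examples = distant_labels(KB, SENTENCES)
for example in examples:
    print(example)
```

Note that the second sentence is labeled `born_in` although it does not express that relation. This kind of label noise is inherent to the heuristic, and a classifier trained on such data has to tolerate it.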

References

Abadi, D., P. Boncz, S. Harizopoulos, S. Idreos, and S. Madden (2013). The design and implementation of modern column-oriented database systems. In: Foundations and Trends in Databases 5.3, pp. 197-280.

Agarwal, A., S. Chakrabarti, and S. Aggarwal (2006). Learning to rank networked entities. In: KDD, pp. 14-23.
Agrawal, S., S. Chaudhuri, and G. Das (2002). DBXplorer: enabling keyword search over relational databases. In: SIGMOD, p. 627.

Angeli, G., S. Gupta, M. Jose, C. D. Manning, C. Ré, J. Tibshirani, J. Y. Wu, S. Wu, and C. Zhang (2014). Stanford’s 2014 slot filling systems. In: TAC-KBP.
Arasu, A. and H. Garcia-Molina (2003). Extracting structured data from web pages. In: SIGMOD, pp. 337-348.
Armstrong, T. G., A. Moffat, W. Webber, and J. Zobel (2009a). Has adhoc retrieval improved since 1994? In: SIGIR, pp. 692-693.

Armstrong, T. G., A. Moffat, W. Webber, and J. Zobel (2009b). Improvements that don’t add up: ad-hoc retrieval results since 1998. In: CIKM, pp. 601-610.

Auer, S., C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives (2007). DBpedia: a nucleus for a web of open data. In: ISWC/ASWC, pp. 722-735.

Balakrishnan, S., A. Halevy, B. Harb, H. Lee, J. Madhavan, A. Rostamizadeh, W. Shen, K. Wilder, F. Wu, and C. Yu (2015). Applying WebTables in practice. In: CIDR.

Balmin, A., V. Hristidis, and Y. Papakonstantinou (2004). ObjectRank: authority-based keyword search in databases. In: VLDB, pp. 564-575.

Balog, K. and R. Neumayer (2013). A test collection for entity search in DBpedia. In: SIGIR, pp. 737-740.

Balog, K., P. Serdyukov, and A. P. de Vries (2010). Overview of the TREC 2010 Entity Track. In: TREC.

Balog, K., P. Serdyukov, and A. P. de Vries (2011). Overview of the TREC 2011 Entity Track. In: TREC.
Balog, K., A. P. de Vries, P. Serdyukov, P. Thomas, and T. Westerveld (2009). Overview of the TREC 2009 Entity Track. In: TREC.

Balog, K., Y. Fang, M. de Rijke, P. Serdyukov, and L. Si (2012). Expertise retrieval. In: Foundations and Trends in Information Retrieval 6.2-3, pp. 127-256.

Banko, M., M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni (2007). Open information extraction from the Web. In: IJCAI, pp. 2670-2676.
Bast, H. and I. Weber (2006). Type less, find more: fast autocompletion search with a succinct index. In: SIGIR, pp. 364-371.

Bast, H., A. Chitea, F. M. Suchanek, and I. Weber (2007). ESTER: efficient search on text, entities, and relations. In: SIGIR, pp. 671-678.
Bast, H. and B. Buchhold (2013). An index for efficient semantic full-text search. In: CIKM, pp. 369-378.
Bast, H., B. Buchhold, and E. Haussmann (2015). Relevance scores for triples from type-like relations. In: SIGIR, pp. 243-252.
Bast, H. and E. Haussmann (2013). Open information extraction via contextual sentence decomposition. In: ICSC, pp. 154-159.

Bast, H. and E. Haussmann (2014). More informative open information extraction via simple inference. In: ECIR, pp. 585-590.

Bast, H. and E. Haussmann (2015). More accurate question answering on Freebase. In: CIKM, pp. 1431-1440.

Bast, H., F. Bäurle, B. Buchhold, and E. Haussmann (2012). Broccoli: semantic full-text search at your fingertips. In: CoRR abs/1207.2615.

Bast, H., F. Bäurle, B. Buchhold, and E. Haußmann (2014a). Easy access to the Freebase dataset. In: WWW, pp. 95-98.

Bast, H., F. Bäurle, B. Buchhold, and E. Haußmann (2014b). Semantic fulltext search with Broccoli. In: SIGIR, pp. 1265-1266.
Bastings, J. and K. Sima’an (2014). All fragments count in parser evaluation. In: LREC, pp. 78-82.

Berant, J. and P. Liang (2014). Semantic parsing via paraphrasing. In: ACL, pp. 1415-1425.

Berant, J., A. Chou, R. Frostig, and P. Liang (2013a). Semantic parsing on freebase from question-answer pairs. In: EMNLP, pp. 1533-1544.

Berant, J., A. Chou, R. Frostig, and P. Liang (2013b). The WebQuestions Benchmark. In: Introduced by [Berant, Chou, Frostig, and Liang, 2013a].

Bhalotia, G., A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan (2002). Keyword searching and browsing in databases using BANKS. In: ICDE, pp. 431-440.

Bizer, C. and A. Schultz (2009). The Berlin SPARQL Benchmark. In: IJSWIS 5.2, pp. 1-24.

Blanco, R., P. Mika, and S. Vigna (2011). Effective and efficient entity search in RDF data. In: ISWC, pp. 83-97.

Blanco, R., H. Halpin, D. M. Herzig, P. Mika, J. Pound, H. S. Thompson, and D. T. Tran (2011). Entity search evaluation over structured web data. In: SIGIR-EOS. Vol. 2011.
Bleiholder, J. and F. Naumann (2008). Data fusion. In: ACM Comput. Surv. 41.1, 1:1-1:41.
Boldi, P. and S. Vigna (2005). MG4J at TREC 2005. In: TREC.

Bollacker, K. D., C. Evans, P. Paritosh, T. Sturge, and J. Taylor (2008). Freebase: a collaboratively created graph database for structuring human knowledge. In: SIGMOD, pp. 1247-1250.
Bordes, A., S. Chopra, and J. Weston (2014). Question answering with subgraph embeddings. In: CoRR abs/1406.3676.

Bordes, A. and E. Gabrilovich (2015). Constructing and mining web-scale knowledge graphs: WWW 2015 tutorial. In: WWW, p. 1523.

Broekstra, J., A. Kampman, and F. van Harmelen (2002). Sesame: a generic architecture for storing and querying RDF and RDF schema. In: ISWC, pp. 54-68.
Bruijn, J. de, M. Ehrig, C. Feier, F. Martín-Recuerda, F. Scharffe, and M. Weiten (2006). Ontology mediation, merging and aligning. In: Semantic Web Technologies, pp. 95-113.
Bruni, E., N. Tran, and M. Baroni (2014). Multimodal Distributional Semantics. In: JAIR 49, pp. 1-47.
Cafarella, M., A. Halevy, D. Wang, E. Wu, and Y. Zhang (2008). WebTables: exploring the power of tables on the web. In: PVLDB 1.1, pp. 538-549.

Cai, Q. and A. Yates (2013). Large-scale semantic parsing via schema matching and lexicon extension. In: ACL, pp. 423-433.

Carlson, A., J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell (2010). Toward an architecture for never-ending language learning. In: AAAI, pp. 1306-1313.

Carmel, D., M.-W. Chang, E. Gabrilovich, B.-J. P. Hsu, and K. Wang (2014). ERD’14: entity recognition and disambiguation challenge. In: SIGIR, p. 1292.
Castano, S., A. Ferrara, S. Montanelli, and G. Varese (2011). Ontology and instance matching. In: Knowledge-Driven Multimedia Information Extraction and Ontology Evolution, pp. 167-195.
Charniak, E. (2000). A maximum-entropy-inspired parser. In: ANLP, pp. 132-139.

Chen, D. and C. D. Manning (2014). A fast and accurate dependency parser using neural networks. In: ACL, pp. 740-750.

Cheng, G., W. Ge, and Y. Qu (2008). Falcons: searching and browsing entities on the semantic web. In: WWW, pp. 1101-1102.

Choi, J. D., J. R. Tetreault, and A. Stent (2015). It depends: dependency parser comparison using a web-based evaluation tool. In: ACL, pp. 387-396.

Cimiano, P., V. Lopez, C. Unger, E. Cabrio, A.-C. N. Ngomo, and S. Walter (2013). Multilingual question answering over linked data (QALD-3): lab overview. In: CLEF, pp. 321-332.
Coffman, J. and A. C. Weaver (2010). A framework for evaluating database keyword search strategies. In: CIKM, pp. 729-738.
Coffman, J. and A. C. Weaver (2014). An empirical performance evaluation of relational keyword search techniques. In: TKDE 26.1, pp. 30-42.

Cornolti, M., P. Ferragina, M. Ciaramita, H. Schütze, and S. Rüd (2014). The SMAPH system for query entity recognition and disambiguation. In: ERD, pp. 25-30.

Corro, L. D. and R. Gemulla (2013). ClausIE: clause-based open information extraction. In: WWW, pp. 355-366.

Craven, M., D. DiPasquo, D. Freitag, A. McCallum, T. M. Mitchell, K. Nigam, and S. Slattery (1998). Learning to extract symbolic knowledge from the world wide web. In: AAAI, pp. 509-516.
Cucerzan, S. (2012). The MSR system for entity linking at TAC 2012. In: TAC.
Cucerzan, S. (2007). Large-scale named entity disambiguation based on Wikipedia data. In: EMNLP-CoNLL, pp. 708-716.

Cucerzan, S. (2014). Name entities made obvious: the participation in the ERD 2014 evaluation. In: ERD, pp. 95-100.

Dang, H. T., D. Kelly, and J. J. Lin (2007). Overview of the TREC 2007 Question Answering Track. In: TREC.

Dang, H. T., J. J. Lin, and D. Kelly (2006). Overview of the TREC 2006 Question Answering Track. In: TREC.
Delbru, R., S. Campinas, and G. Tummarello (2012). Searching web data: An entity retrieval and high-performance indexing model. In: J. Web Sem. 10, pp. 33-58.
Dill, S., N. Eiron, D. Gibson, D. Gruhl, R. V. Guha, A. Jhingran, T. Kanungo, K. S. McCurley, S. Rajagopalan, A. Tomkins, J. A. Tomlin, and J. Y. Zien (2003). A case for automated large-scale semantic annotation. In: J. Web Sem. 1.1, pp. 115-132.
Ding, L., T. W. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari, V. Doshi, and J. Sachs (2004). Swoogle: a search and metadata engine for the semantic web. In: CIKM, pp. 652-659.
Doan, A. and A. Y. Halevy (2005). Semantic integration research in the database community: a brief survey. In: AI Magazine, pp. 83-94.

Dong, X., E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang (2014). Knowledge Vault: a web-scale approach to probabilistic knowledge fusion. In: KDD, pp. 601-610.

Elbassuoni, S., M. Ramanath, R. Schenkel, M. Sydow, and G. Weikum (2009). Language-model-based ranking for queries on RDF-graphs. In: CIKM, pp. 977-986.

Elliott, B., E. Cheng, C. Thomas-Ogbuji, and Z. M. Özsoyoglu (2009). A complete translation from SPARQL into efficient SQL. In: IDEAS, pp. 31-42.

Elmagarmid, A. K., P. G. Ipeirotis, and V. S. Verykios (2007). Duplicate record detection: a survey. In: TKDE, pp. 1-16.

Etzioni, O., A. Fader, J. Christensen, S. Soderland, and Mausam (2011). Open information extraction: the second generation. In: IJCAI, pp. 3-10.
Euzenat, J., A. Ferrara, C. Meilicke, J. Pane, F. Scharffe, P. Shvaiko, H. Stuckenschmidt, O. Sváb-Zamazal, V. Svátek, and C. dos Santos (2010). Results of the ontology alignment evaluation initiative 2010. In: OM, pp. 85-117.

Euzenat, J., C. Meilicke, H. Stuckenschmidt, P. Shvaiko, and C. dos Santos (2011a). Ontology alignment evaluation initiative: six years of experience. In: J. Data Semantics 15, pp. 158-192.
Euzenat, J., A. Ferrara, W. R. van Hage, L. Hollink, C. Meilicke, A. Nikolov, D. Ritze, F. Scharffe, P. Shvaiko, H. Stuckenschmidt, O. Sváb-Zamazal, and C. dos Santos (2011b). Results of the ontology alignment evaluation initiative 2011. In: OM, pp. 158-192.

Fader, A., S. Soderland, and O. Etzioni (2011). Identifying relations for open information extraction. In: EMNLP, pp. 1535-1545.
Fader, A., L. S. Zettlemoyer, and O. Etzioni (2013). Paraphrase-driven learning for open question answering. In: ACL, pp. 1608-1618.

Fang, Y., L. Si, Z. Yu, Y. Xian, and Y. Xu (2009). Entity retrieval with hierarchical relevance model, exploiting the structure of tables and learning homepage classifiers. In: TREC.
Ferrucci, D. A., E. W. Brown, J. Chu-Carroll, J. Fan, D. Gondek, A. Kalyanpur, A. Lally, J. W. Murdock, E. Nyberg, J. M. Prager, N. Schlaefer, and C. A. Welty (2010). Building Watson: An Overview of the DeepQA Project. In: AI Magazine 31.3, pp. 59-79.

Ferrucci, D. A., A. Levas, S. Bagchi, D. Gondek, and E. T. Mueller (2013). Watson: Beyond Jeopardy! In: Artif. Intell. 199, pp. 93-105.
Finkel, J. R., T. Grenager, and C. D. Manning (2005). Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL, pp. 363-370.