The AI Stack: A blueprint for developing and deploying Artificial Intelligence
人工智慧系統：開發和部署人工智慧的藍圖

ANDREW W. MOORE, MARTIAL HEBERT, SHANE SHANEMAN
安德魯·W·摩爾、馬修·赫伯特、夏恩·沙納曼Carnegie Mellon University, School of Computer Science
卡內基梅隆大學，計算機科學學院5000 Forbes Ave, Pittsburgh PA 15213
5000 福布斯大道,匹茲堡,PA 15213

Abstract 摘要

This paper provides an abstract technology model called the AI Stack for the development and deployment of Artificial Intelligence, and the strategic investment in research, technology, and organizational resources required to achieve asymmetric capability. Over the past five years, there has been a drastic acceleration in the development of artificial intelligence fueled by exponential increases in computational power and machine learning. This has resulted in corporations, institutions, and nation-states vastly accelerating their investment in AI to (a) perceive and synthesize massive amounts of data, (b) understand the contextual importance of the data and potential tactical/strategic impacts, © accelerate and optimize decision-making, and (d) enable human augmentation and deploy autonomous systems. From a national security and defense perspective, AI is a crucial technology to enhance situational awareness and accelerate the realization of timely and actionable intelligence that can save lives. For many current defense applications, this often requires the processing of visual data, images, or full motion video from legacy platforms and sensors designed decades before recent advances in machine learning, computer vision, and AI. The AI Stack - and the fusion of the interdependent technology layers contained within it - provides a streamlined approach to visualize, plan, and prioritize strategic investments in commercial technologies and transformational research to leverage and continuously advance AI across operational domains, and achieve asymmetric capability through human augmentation and autonomous systems. One application of AI for the Department of Defense is to provide automation and human augmentation for analyzing full motion video to drastically enhance the safety of our deployed soldiers by enhancing their situational awareness and enabling them to make faster decisions on more timely information to save lives.
本文提供了一個名為「AI 堆棧」的抽象技術模型,用於人工智能的開發和部署,以及實現不對稱能力所需的研究、技術和組織資源的戰略投資。在過去五年里,由於計算能力和機器學習的指數級增長,人工智能的發展出現了急劇加速。這導致了企業、機構和國家大幅加速了對人工智能的投資,以(a)感知和綜合大量數據、(b)理解數據的上下文重要性及潛在的戰術/戰略影響、(c)加速和優化決策制定,以及(d)實現人機協作和部署自主系統。從國家安全和國防的角度來看,人工智能是增強形勢感知和加快實現及時有效情報的關鍵技術,可以拯救生命。對於許多現有的國防應用,這通常需要處理來自數十年前設計的傳統平台和傳感器的視覺數據、圖像或全動態視頻。「AI 堆棧」及其內部相互依賴的技術層融合,提供了一種簡化的方法來可視化、規劃和優先考慮對商業技術和變革性研究的戰略投資,以便跨操作領域利用和不斷推進人工智能,並通過人機協作和自主系統實現不對稱能力。國防部門人工智能的一個應用是為全動態視頻分析提供自動化和人機協作,大幅提升我們部署的士兵的安全性,增強他們的形勢感知,並使他們能夠根據更及時的信息做出更快的決策以拯救生命。

Keywords: Artificial Intelligence, Human Augmentation, Autonomous Systems, Computer Vision, Machine Learning, Human Machine Teaming
人工智能、人類增強、自主系統、電腦視覺、機器學習、人機協作

1. INTRODUCTION 一、前言

The pace of technological innovation and advances in computational power and machine learning is at an unprecedented level - and it continues to accelerate each year. The combination of availability and access to large data sets through cloud computing and high performance computing has enabled a rapid transformation in machine learning, computer vision, and deep neural networks. Over the past five years, this has enabled breakthroughs in speech recognition by reducing error rates by

30 %

[1, 2, 3], improving image recognition by over

20 %

[4, 5, 6], and enabling an Artificial Intelligence to consistently beat the reigning human champions of the board game Go from around the world [7]. Over the past decade, GPU-based compute has propelled the performance of high performance computing systems, machine learning and data science in large-scale cloud installations [8, 9, 10]. In 2015, Google deployed a custom ASIC-called a Tensor Processing Unit (TPU) - in datacenters that drastically accelerates the inference phase of deep neural networks and enhances computational power even further [1].
科技創新的步伐以及計算能力和機器學習的進步已達到前所未有的水平,且每年持續加速。透過雲端運算和高效能運算取得大量資料集的可用性和存取權,已使機器學習、電腦視覺和深度神經網路經歷了快速的轉變。在過去 5 年內,這使得語音辨識的錯誤率降低了

30 %

%[1, 2, 3]、影像辨識的准確度提升了

20 %

%[4, 5, 6],並使人工智慧能夠一直擊敗來自全球的圍棋人類冠軍[7]。在過去十年間,基於 GPU 的運算已推動了大規模雲端部署中高性能運算系統、機器學習和數據科學的性能[8, 9, 10]。2015 年,Google 在數據中心部署了一款名為張量處理單元(TPU)的定制 ASIC,大幅加速了深度神經網路的推理階段,進一步提升了計算能力[1]。

By leveraging parallelization and distributed computing, machine learning can now be applied to massive models (up to 100s of billions of parameters) on big data (up to terabytes or petabytes) on modestly sized compute clusters that less than 4 years ago would have been unachievable [11]. The impact of parallelization on neural networks and deep learning has been profound - enabling what used to take weeks or months of processing time to be completed in minutes, and at a fraction of the cost [12]. These innovations in computing capability have fueled tremendous advances in Artificial Intelligence and accelerated investment in AI research and development by industry, academia, and nation-state governments around the world [13].
藉由利用並行計算和分散式運算,機器學習現已可應用於巨大的模型(高達數百億的參數)以及大量數據(高達數千兆位元組或數百兆位元組)之上,而這在不到 4 年前是無法實現的[11]。並行計算對神經網路和深度學習的影響是深遠的,它使原本需要數週或數月的處理時間,如今只需數分鐘即可完成,且成本大幅降低[12]。這些計算能力的創新推動了人工智慧的巨大進步,並加速了產業、學術界以及各國政府在人工智慧研發方面的投資[13]。

2. INTRODUCING THE AI STACK $^{1}$
2. 介紹 AI 堆疊

Artificial Intelligence has been defined many different ways by many different people. The term “artificial intelligence” was first devised as part of a proposal in 1955 for a " 2 month, 10 man study of artificial intelligence" submitted by John McCarthy (Dartmouth College), Marvin Minsky (Harvard University), Nathaniel Rochester (IBM), and Claude Shannon (Bell Telephone Laboratories) [14]. The workshop, which took place a year later, in July and August 1956, is generally recognized as the official birthdate of AI [15]. The proposal defined Artificial Intelligence through the authors intent for the study, “An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. … For the present purpose the artificial intelligence problem is taken to be that of making a machine behave in ways that would be called intelligent if a human were so behaving.” In 1958, Herb Simon (Carnegie Mellon) expounded, “It is not my aim to surprise or shock you - but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover, their ability to do these things is going to increase rapidly until - in a visible future - the range of problems they can handle will be coextensive with the range to which the human mind has been applied” [16]
人工智能已被許多不同的人定義為許多不同的方式。 "人工智能"這一術語是由 John McCarthy (達特茅斯學院)、Marvin Minsky (哈佛大學)、Nathaniel Rochester (IBM)和 Claude Shannon (貝爾電話實驗室)在 1955 年提交的一項"為期 2 個月、10 人研究人工智能"的提案中首次設計的[14]。這個工作坊於 1956 年 7 月和 8 月舉行,通常被認為是人工智能的正式誕生日[15]。該提案通過作者對研究的意圖來定義人工智能,"將試圖找出如何使機器使用語言、形成抽象和概念、解決目前僅供人類處理的各種問題,並提高自己。 ... 就目前目的而言,人工智能問題被認為是使機器表現出如果人類如此行為會被稱為智能的方式。"1958 年,赫伯特·西蒙(卡內基梅隆大學)闡述說:"我的目的不是讓您感到驚訝或震驚 - 但我概括起來最簡單的方式是說,現在世界上有能思考、學習和創造的機器。而且,它們執行這些事情的能力將迅速提高,直到 - 在可見的未來 - 它們能夠處理的問題範圍將與人類頭腦已經應用的範圍一樣廣泛"[16]。

Even with all of the complex advances and technological innovations over the past 60 years, defining artificial intelligence is fairly simple: AI must understand the world, and it must make smart decisions based on what its understanding is by leveraging what it has learned [17]. This definition can apply to AI being leveraged for human augmentation or AI being integrated into an autonomous system. For an AI system to be truly useful, it must do more than passively observe or collect data; it must either make decisions or help humans make decisions [18]. While this is a straightforward definition, AI isn’t just one thing or a single piece of software; it is a massive collection of interrelated technology blocks called the AI Stack. The AI Stack provides a model to visualize and organize all of the technologies that comprise an AI system - and how those technologies fit and work together.
即使在過去 60 年中經歷了許多複雜的進步和技術創新，定義人工智能仍相當簡單：AI 必須理解世界,並且必須根據其獲得的理解做出明智的決策[17]。此定義可適用於人類增強的人工智能,或集成到自主系統中的人工智能。要使人工智能系統真正有用,它必須做的不只是被動地觀察或收集數據;它必須做出決策或幫助人類做出決策[18]。儘管這是一個簡單的定義,但人工智能並非單一的事物或單一的軟體;它是一個被稱為 AI 棧的相互關聯的技術塊的大集合。AI 棧提供了一個模型,可以將組成 AI 系統的所有技術可視化和組織起來,以及這些技術如何銜接和協作。

Figure 1. The AI Stack as envisioned and defined by Andrew Moore, Dean of the School of Computer Science, Carnegie Mellon University.
圖 1. 由卡內基梅隆大學計算機科學學院院長 Andrew Moore 構想和定義的人工智慧技術棧。

The AI Stack is similar in concept to the Open Systems Interconnection model (OSI model) established by ISO in 1984 [19]. The OSI Model established an abstract model of networking where each of the seven abstraction layers served the layer above it, and is served by the layer below it [20]. Similarly, the AI Stack establishes a model of an AI system wherein each layer uses results from the layers beneath it, and passes its results up to the layers above it to build for other things needed to achieve AI [18]. Whereas the OSI model was developed primarily focused on interfaces and protocols [19], the AI Stack focuses heavily on the technologies and resulting capabilities in each layer that contribute and work together to make AI possible.
人工智慧棧與 1984 年由 ISO 建立的開放系統互連參考模型(OSI 模型)概念類似[19]。OSI 模型建立了網路的抽象模型,其中每個七個抽象層都為上層服務,同時也接受下層的服務[20]。同樣地,人工智慧棧建立了一個人工智慧系統的模型,其中每一層都利用下層的成果,並將其結果傳遞給上層,以構建實現人工智慧所需的其他東西[18]。而 OSI 模型主要關注接口和協議[19],人工智慧棧則大量關注每一層的技術和所產生的能力,它們共同作用以實現人工智慧。

3. BUILDING THE AI STACK
3. 搭建人工智慧架構

The AI Stack is made up of several technology layers that are interdependent upon one another, and synergistically contribute to AI. At Carnegie Mellon, we view it as a toolbox - each layer houses a set of technologies that scientists and researchers can reach for as they work on new initiatives. Expertise is not required in all areas. Instead, each layer depends on the others for support. AI systems or endeavors that ignore parts of the stack simply won’t succeed and synergistically contribute to AI:
人工智能技術棧由多個相互依賴且協同為人工智能做出貢獻的技術層組成。在卡內基梅隆大學,我們將其視為一個工具箱 - 每一層都包含有科學家和研究人員在新計劃中可以使用的一組技術。並非需要在所有領域都具備專業知識。相反,每一層都依賴於其他層的支持。忽視技術棧部分的人工智能系統或嘗試將無法成功,也無法協同為人工智能做出貢獻。

Computing Layer: All artificial intelligence is built on the computer systems that came before it. This includes the systems, networks, programming languages, operating systems and interactions between devices that make computing possible.
計算層：所有人工智慧都建基於先前的電腦系統之上。這包括系統、網路、程式設計語言、作業系統以及使設計運算成為可能的設備之間的互動。
Device Layer: The device layer is all of the sensors and components needed for machines to perceive the world around them. Traffic lights, for example, can observe traffic levels and negotiate with each other to improve traffic flow. Facial recognition systems can detect and match a contact from 600 meters away.
裝置層：裝置層包含了機器感知周圍環境所需的所有感測器和元件。舉例來說，交通信號燈可以觀察交通量並彼此協調以改善車流。人臉辨識系統可以從 600 公尺外偵測和比對聯絡人。
Massive Data Management Layer: There’s so much data in the world, and it continues to grow at an explosive rate [21]. At this layer, experts work to ensure that good information is accessible, and they develop ways to use that data to locate valuable information in giant datasets. Some exciting work at this level includes CMU spinoff Petuum, which created data management software that aims to democratize AI and put practical AI tools into the hands of organizations that need them.
海量資料管理層:世界上有如此之多的資料,而且持續以驚人的速度增長[21]。在這一層中,專家努力確保可以訪問優質信息,並開發利用該數據找到巨型數據集中寶貴信息的方法。在這一層面上的一些令人興奮的工作包括來自 CMU 的 Petuum 公司,該公司創造了數據管理軟件,旨在使 AI 民主化,並將實用的 AI 工具置於需要它們的組織手中。
Machine Learning Layer: Machine learning focuses on creating programs that learn from experience. It advances computing through exposure to new scenarios, testing and adaptation, while using pattern- and trenddetection to help the computer make better decisions in similar, subsequent situations. A relevant example of work in machine learning is using speech recognition technologies to identify the age, sex and location of the hoax callers that plague the U.S. Coast Guard.
機器學習層: 機器學習專注於創造能夠從經驗中學習的程式。它透過接觸新的情境、測試和適應,而且利用模式和趨勢偵測來幫助電腦在類似的後續情況下做出更好的決策,從而推動了電腦科學的進步。機器學習的一個相關案例是使用語音辨識技術來識別困擾美國海岸警衛隊的惡作劇電話的年齡、性別和位置。
Modeling Layer: AI systems at the top of the stack rely on computer modeling to understand information. Models use computers to construct and manipulate abstract representations of situations and natural phenomenon in the world. For example, new research has allowed scientist to analyze photos of people to track their facial features and recognize their emotional states.
模型層：AI 系統位於堆棧的頂層,依賴計算機建模來瞭解信息。模型使用計算機構建和操縱對現實世界中的事物和自然現象的抽象表示。例如,新研究使科學家能夠分析人們的照片,追蹤他們的面部特徵並識別他們的情感狀態。
Decision Support Layer: This layer includes technologies that help humans make decisions. Where should 500 Lyft drivers be deployed, based on information we know about events and demand? How should emergency services be distributed after a disaster? Exciting examples of research in this area include work to identify instances of human trafficking, help locate victims, and collect and synthesize enough information that trends and patterns can be discovered and used to combat the problem.
決策支持層：這一層包括有助於人類做出決策的技術。根據我們所知的事件和需求信息，500 輛 Lyft 車應該如何部署？災難後應如何分配緊急服務？這個領域的研究包括工作來識別人口販賣的實例、幫助找到受害者,並收集和合成足夠的信息,以發現和利用趨勢和模式來打擊這一問題。
Planning and Acting Layer: Systems in this part of the stack rely on optimization, safety, the knowledge network and strategic reasoning to make the best possible decision available and learn from the information researchers give them. Though slightly less sophisticated than systems employing the blocks at the very top of the stack, planning and acting technologies still rely on advanced systems and algorithms to positively impact the world. One great example of technology that falls into this category is the national kidney exchange - a sophisticated algorithm that matches potential kidney donors with people who need transplants.
規劃和行動層:此堆棧部分的系統依賴於優化、安全性、知識網絡和戰略推理,以做出可用的最佳決策,並從研究人員提供的信息中學習。雖然不如最頂層系統那樣複雜,但規劃和行動技術仍依賴先進的系統和算法來對世界產生積極影響。此類技術的一個很好的例子是全國腎臟交換計劃 - 一種複雜的算法,可將潛在的腎臟捐獻者與需要移植手術的人相匹配。
Human-AI Interaction Layer: When we create artificial intelligence in this part of the stack, we’re augmenting what humans can do. These technologies make our lives easier or allow us to make faster and better decisions. One good example of work in this area is robotic arms attached to motorized wheelchairs that people with spinal cord injuries can direct with their gaze. Another exciting example is research that allows a computer system to interview a patient remotely to determine if they’re depressed and need human intervention.
人工智慧與人類互動層：當我們在堆疊的這個部分中創造人工智慧時，我們正在增強人類的能力。這些技術使我們的生活更輕鬆，或允許我們做出更快和更好的決策。這個領域的一個好例子是連接到電動輪椅上的機械手臂，可以讓脊髓受傷的人用目光來操控。另一個令人興奮的例子是研究允許電腦系統遠程接見病人，以判斷他們是否有抑鬱症需要人工干預。
Autonomy Layer: AI technologies at this level focus on creating systems that make their own decisions without human intervention. These systems solve problems when humans cannot. For example, robots can search through rubble for disaster survivors, and sensors in self-driving cars can respond more quickly to impending accidents than human drivers.
自主層：此級別的人工智慧技術專注於創造自行作出決策、無需人類干預的系統。這些系統可解決人類無法處理的問題。例如機器人可在災難現場的瓦礫中搜索幸存者,自駕車的感應器可比人類司機更快對即將發生的事故做出反應。

Ethics: Ethics permeates the entire AI Stack. The decisions people make as they build AI systems involve serious ethical questions that we can’t ignore. A vital component of AI is giving tomorrow’s scientists the tools they need to perform ethical reasoning and the skills to create AI for good.
倫理學:倫理學滲透於人工智能整個技術棧。人們在構建人工智能系統時所做的決定涉及嚴重的倫理問題,我們不能忽視。人工智能的重要組成部分是為未來的科學家提供工具,讓他們能夠進行倫理推理,並創造有益的人工智能。

The AI Stack provides a means to visualize and evaluate the impact of technological deficiencies and advances as it relates to the development of AI. Deficiencies in one or multiple layer(s) of the AI Stack can be offset or compensated for by strengths or technological advances in other layer(s). For example, many biometrics systems have traditionally relied upon high-end or proprietary sensors, cameras, and enhanced lighting (aka the Device Layer) to achieve improvements in facial detection and matching performance [22]. By leveraging enhanced leading research in computer vision algorithms, deep learning, and GPU compute (aka Compute Layer and Machine Learning Layer), Biometrics systems today can exponentially increase their performance using much less expensive, commercial off-the-shelf cameras and sensors [23].
人工智能堆棧提供了一種可視化和評估技術缺陷和進步對人工智能發展的影響的方法。人工智能堆棧的一個或多個層面的缺陷可以通過其他層面的優勢或技術進步來彌補或補償。例如,許多生物特徵識別系統傳統上依賴於高端或專有的傳感器、攝像頭和增強照明(即設備層)來實現面部檢測和匹配性能的改進。通過利用對計算機視覺算法、深度學習和 GPU 計算(即計算層和機器學習層)的先進研究成果,生物特徵識別系統今天可以使用更加便宜的商用現成攝像頭和傳感器,大幅提升其性能。

The AI Stack also provides a structured approach to analyze, organize, and strategically plan for the development and integration of AI for new or existing systems, programs, and even organizations. The structure of the AI Stack can easily be leveraged as a common blueprint and lexicon in any agency or organization to (a) analyze current technological capabilities in each layer and its impact on the layers above and below it, (b) plan for development and investment in each layer over time to achieve AI, © assess which layers over time have (or will have) technical debt that the agency or organization will have to ‘pay’ in order to enable AI, and (d) plan and prioritize investment in research and acquisition of new technologies for key layers and capabilities across the AI Stack. As part of the acquisition of technology, organizations and agencies can use the AI Stack to assess which specific layers the technology addresses or fits into, and how it will enhance or impact the layers above and below it.
AI 堆棧還提供了一種結構化的方法來分析、組織和戰略規劃新或現有系統、計畫和組織的 AI 開發和集成。AI 堆棧的結構可以很容易地作為任何機構或組織的通用藍圖和詞彙被利用來(a)分析每一層的當前技術能力及其對上下層的影響;(b)隨時間規劃每一層的開發和投資以實現 AI;(c)評估隨時間產生的、組織必須'支付'的技術負債的各層;以及(d)規劃和優先考慮在 AI 堆棧的關鍵層和能力上的研究和新技術的獲取。作為技術獲取的一部分,組織和機構可以使用 AI 堆棧來評估技術解決或符合哪些特定層,以及它將如何增強或影響上下層。

Finally, the AI Stack can also be leveraged for organizational development and workforce training and acquisition. By assessing where an organization’s current and planned technology sophistication level is on the various layers across the AI Stack, they can also assess where they need to add to or augment their workforce and with what specific skills or technical expertise.
最後,AI 棧也可用於組織發展、工作培訓和招聘。通過評估組織目前和計劃的技術成熟度在各層面的 AI 棧上的狀況,他們也可以評估需要添加或增強哪些具體技能或技術專長的人員。

4. AI AND NATIONAL SECURITY
4.人工智慧與國家安全

Artificial Intelligence will have a transformational impact on every aspect of our lives - from education and healthcare, to energy and finance, to engineering and manufacturing, to public safety and defense [24]. Nation-states around the world recognize the disruptive impact that AI will have on their economies and national security, and have announced plans to invest billions of dollars in new research and development and technology transfer programs to accelerate the benefits that AI and to achieve a competitive advantage on the global stage - both economically and militarily [25]. In the near-term, AI will most significantly impact defense and security through human augmentation in key domains such as intelligence and cyber operations where the data being generated on a daily basis exponentially exceeds the human analytical workforce available to process and analyze it. This is also what most current defense-related R&D efforts as well as the first applications of AI in the military realm are focused on to enhance situational awareness and increase the speed of decision-making [26].
人工智能將對我們生活的每個方面產生變革性的影響-從教育和醫療保健,到能源和金融,到工程和製造,到公共安全和國防[24]。世界各國都認識到人工智能將對其經濟和國家安全產生 disrupting 影響,並宣布投入數十億美元用於新的研究和開發以及技術轉移計劃,以加速人工智能帶來的效益,並在全球舞台上獲得經濟和軍事上的競爭優勢[25]。在短期內,人工智能將通過增強情報和網絡作戰等關鍵領域中的人力,最大程度地影響國防和安全,因為每天產生的數據呈指數級增長,遠超人類分析人員的能力。這也是當前大多數國防相關的研究和開發工作以及軍事領域人工智能的首批應用重點,旨在增強情況感知能力並提高決策速度[26]。

With the increased utilization and sensorization of unmanned aerial vehicles (UAVs) for intelligence, surveillance and reconnaissance operations, defense agencies are drowning in data - especially full motion video (FMV) that requires human analysts to manually review and annotate in order to process, assimilate, and disseminate contextual intelligence contained within it [37]. As a result of the current manpower-intensive process required to process, exploit, and disseminate (PED) intelligence from the FMV data, it could take days, weeks, or even months to produce potential intelligence - that is no longer timely or actionable [38]. The ability to leverage AI to augment human analysts, automate the analysis of FMV data, and to drastically accelerate the speed of the PED process is a critical focus for national security.
隨著無人駕駛航空器(UAV)在情報、監視和偵察行動中的使用和傳感器增加,國防機構正淹沒在數據中 - 特別是需要人類分析師手動檢查和註解以處理、吸收和傳播其中包含的情報內容的全動態視頻(FMV)。由於處理、利用和傳播(PED)來自 FMV 數據的情報所需的人力密集型流程,可能需要數天、數周甚至數月才能產生潛在的情報 - 這已經不再及時或可採取行動。利用人工智能來輔助人類分析師、自動分析 FMV 數據,並大大加快 PED 流程的速度,是國家安全的關鍵重點。

5. CONCEPT OF OPERATIONS USE CASE: LEVERAGING THE AI STACK TO ENHANCE SITUATIONAL AWARENESS FROM FULL MOTION VIDEO
利用 AI 技術提升全動態影像的情況意識

An important scenario for national security is the analysis of airborne full motion video to enhance situational awareness, replacing many of the analysis functions that are currently performed manually. This scenario encapsulate many of the components of the AI Stack. At the lowest level it requires basic perception capabilities in the form of object detectors, for which efficient data acquisition tools need to be developed for their training. Next, it requires new learning tools to enable practical systems that can be trained with minimal supervision. At the “autonomy” level, the video interpretation scenario demands systematic strategies for automatically assessing performance in the context of mission goals. Finally, at the highest level, beyond recognition and scene understanding, i.e., labeling and tracking objects, it requires understanding of activities, including predictive models. In the remainder of this section we describe key development challenges in these four areas, focusing on those that are most critical to bring video analysis tools the levels of performance required in application scenarios.
對國家安全來說一個重要的情境,是利用空中全動態視頻分析來增強局勢意識,取代目前手動執行的許多分析功能。這一情境包含人工智能系統的多個組件。在最底層需要物體偵測等基本感知能力,因此需要開發用於訓練的高效數據獲取工具。接下來需要新的學習工具,使能夠以最少監督就能實際操作的系統。在"自主性"層面上,視頻解釋情境需要有系統的策略,自動評估在任務目標背景下的性能。最後,在最高層面上,除了識別和理解場景,即標註和跟蹤物體,還需要理解活動,包括預測模型。在本節的其餘部分,我們將描述這四個領域的關鍵開發挑戰,重點關注使視頻分析工具達到應用情境所需性能水平最關鍵的那些挑戰。

5.1 Training object detectors
5.1 物件偵測器的訓練

Acquiring labeled data is central to the success of current technology for training object detectors. This is a time consuming, error prone process which must be optimized for systems to be practically used, especially for rapid retraining and adaptation. A first direction is to better use metadata, in particular geometry. Information about viewing angles, altitude and even approximate 3D terrain can be used to quickly guide the annotation. A second direction is to use temporal information by exploiting the fact that objects move smoothly from frame to frame. The challenge in temporal approaches is to properly tradeoff between reducing the number of frames manually handled and avoiding drift in tracking in order to maintain localization accuracy. These approaches provide some reduction in the amount of labeling, but they do not optimize the labels to be as efficient as possible for training. This can be achieved through active learning techniques in which, essentially, the utility of new samples is predicted based on the current performance of the model, e.g., an object detector.
獲取標記數據對於訓練物體檢測器的成功至關重要。這是一個耗時且容易出錯的過程,必須優化該過程,才能真正應用於系統,尤其是對於快速重新訓練和適應。一種方向是更好地利用元數據,特別是幾何信息。可以使用關於視角、高度甚至大約 3D 地形的信息,快速指導註釋。第二種方向是利用時間信息,利用物體從一幀到另一幀平滑移動的事實。時間方法的挑戰在於在減少手動處理的幀數和避免跟蹤漂移之間達到適當的平衡,以維持定位精度。這些方法可以減少標記的數量,但它們並沒有優化標記以使其對於訓練盡可能有效。可以通過主動學習技術來實現這一目標,即基於模型當前的性能(如物體檢測器)來預測新樣本的效用。

5.2 Minimizing supervision
5.2 減少監督

While considerable advances have been achieved in learning-based techniques for recognition and scene understanding, they depend critically on supervision in the form of large amount of annotated data. Acquiring such data is difficult, time consuming, and simply does not scale as more and more concepts, e.g., types of objects or object categories are incorporated in the use cases. Further, even if the resources in manpower and time could be scaled, a large number of rare concepts would still suffer from the sparsity of training data. Finally, once would anticipate that the analysis systems would be deployed in different conditions, e.g., different geolocations, observation conditions, sensors, etc. Even considering only common concepts, it is unpractical to re-label large data sets every time the system needs to be adapted to new conditions. All of these limitations point to the need to minimizing direct supervision in training but requiring a smaller number of labeled samples to train a system for new concepts, or to retrain to adapt to new conditions, that is critical to operational feasibility. The research community has investigated many aspects of this challenge, which we now briefly describe.
儘管基於學習的技術在物品識別和場景理解方面取得了重大進展,但它們仍然嚴重依賴於大量標註數據的監督。獲取這種數據是困難和耗時的,隨著更多概念(如物體類型或物品類別)被納入使用案例,這個問題也無法擴展。此外,即使人力和時間資源可以擴展,稀有概念仍然會受到訓練數據稀缺的影響。最後,分析系統通常需要部署在不同的條件下,如不同的地理位置、觀測條件和傳感器等。即使只考慮常見的概念,每次系統需要適應新條件時重新標註大量數據集也是不切實際的。所有這些限制都指向了在訓練時減少直接監督的需求,同時只需要少量標註樣本就能訓練一個新概念的系統,或者重新訓練以適應新條件,這對於實際操作是至關重要的。研究界已經研究了這一挑戰的許多方面,現在我們將簡要描述一下。

A natural approach is the use synthetic data instead of, or in addition to real data [29, 30, 31, 32]. This is an attractive approach since it enables, in principle, generating arbitrary amounts of data and, importantly, it allows generation of rare concepts that would be hard to observe in real data. The challenge here is to ensure the synthetic data indeed matches the real data used when the system is used, both in the raw data, e.g., realistic videos, and on the its distribution to avoid training bias. Accordingly, current focus is on maximizing the level of matching and on developing transfer approaches to transfer the models learned on synthetic data to real data.
使用合成數據而非或者輔以真實數據[29, 30, 31, 32]是一種自然的方法。這種方法很有吸引力,因為它可以原則上生成任意數量的數據,並且更重要的是,它可以生成在真實數據中很難觀察到的稀有概念。這裡的挑戰是確保合成數據確實與系統使用時所用的真實數據匹配,無論是原始數據(例如,逼真的視頻)還是其分佈,以避免訓練偏差。因此,當前的重點是最大程度地提高匹配度,並開發將在合成數據上學習的模型轉移到真實數據上的方法。

A complementary set of approaches is based on the observation that it is often possible to generate annotation signals for free, i.e., which do not require direct manual supervision [33, 34, 35, 36, 37, 38]. For example, metadata is readily available from the video sources, annotations at the frame or at the video segment level may be available, textual descriptions for other intelligence sources may be available as well. Each of these signals is much weaker than the type of annotation that is normally used for training, e.g., object outline in each frame, but, together, they provide powerful “free” information at training. Effective use of this type of weakly annotated training is crucial not only to reduce supervision, but also to ensure that all of the available information is used in the system.
基於觀察的一種互補的方法是能夠免費產生註解訊號，即不需要直接的人工監督[33, 34, 35, 36, 37, 38]。例如,視頻源的元數據是容易獲取的,在每幀或視頻片段級別的註解也可能可用,並且其他情報來源的文字描述也可能可用。這些訊號比通常用於訓練的註解類型(如每幀的物體輪廓)要弱得多,但是它們一起提供了強大的"免費"信息,用於訓練。有效利用這種弱註解訓練,不僅可以減少監督,還可以確保系統使用所有可用的信息。

Domain adaptation techniques address specifically the scenarios in which the concepts in the system remain the same but the operation conditions change [39, 40, 41]. In that setting, recognition models trained in a source domain are adapted to a target domain. Current results show that adaption can be achieved with minimal, or, in some cases no supervised data on the target domain. These techniques are critical to the operation of systems in rapidly changing domains. Finally, an increasing body work attempts to construct generic strategies for learning new models from limited data. These approaches, under headings such as meta-learning and learning to learn, assume a set of tasks, for example in this case a set of concepts, objects or actions, and a set of observations conditions, for which strongly supervised data is available [42, 43, 44, 45]. They then automatically develop a model of how the learning algorithm used in the system behaves on that corpus of tasks, enabling then to adapt it to a new task with limited supervision. For example, using these ideas, it is possible to learn how to generate for a small dataset the classification model that would have been learned from a large labeled data set

[46, 47, 48]

.
領域適應技術特別針對系統中概念保持不變但操作條件發生變化的情況[39, 40, 41]。在這種情況下,在源域中訓練的識別模型被適應到目標域。目前的結果顯示,在目標域上僅需要少量或甚至無監督數據就可以實現適應。這些技術對於在快速變化的領域中系統的運行至關重要。最後,越來越多的工作嘗試構建從有限數據學習新模型的通用策略。這些方法,如元學習和學習學習,假設存在一組任務(例如在本例中為一組概念、對象或動作)和一組觀察條件,對此強監督數據可用[42, 43, 44, 45]。然後,它們自動開發一個模型來描述系統中所使用的學習算法在該任務集合上的行為,從而使其能夠適應有限監督的新任務。例如,使用這些思想,可以學習如何從小數據集生成將從大型標記數據集學習到的分類模型

[46, 47, 48]

。

5.3 Evaluating performance
5.3 評估績效

Object detection is the key building block of the video analysis systems. Much progress has been made in learning-based techniques for object detection, owing in part to extensive testing and evaluation on common datasets. A key issue is now to bring these techniques to the level of performance that is necessary for real world operation, in particular designing evaluation methodologies tuned to the end task, and understanding and characterizing system performance, including especially false positives.
物體偵測是視頻分析系統的關鍵基礎。基於學習的物體偵測技術已取得長足進步,這在一定程度上歸功於對常用數據集的廣泛測試和評估。當前的關鍵是將這些技術提升到足以應對實際應用需求的性能水平,特別是在設計針對最終任務的評估方法、理解和描述系統性能(尤其是假陽性)方面。

For the first part, new scoring procedures are needed to characterize system-level performance, for example, based on the relative importance of different false positive false negative rates (and/or ranking rates or confusion matrix rates) according to specific mission scenarios, rather than a fixed loss function. New scoring procedures are also needed to evaluate the stability of a classification system as to how robust it is and how good at falling back when the distribution of data in the world changes (e.g. testing what happens to the algorithms when they are all trained on morning imagery but are tested on dusk imagery. This is essential for the system to be operational in conditions different than the ones under which it was originally trained. In addition, the individual vision elements, detection, tracking, etc. are used in decision making tasks. This requires new approaches for the design vision modules based on objectives tuned to the target task, beyond the standard detection tools.
對於第一部分,需要新的評分程序來表徵系統層面的性能,例如,根據特定任務場景下不同假陽性假陰性率（及／或排名率或混淆矩陣率）的相對重要性,而非固定的損失函數。還需要新的評分程序來評估分類系統的穩定性,即它的強韌性如何,以及當世界數據分佈發生變化時的後備能力如何（例如,測試它們在早晨影像上訓練,但在黃昏影像上進行測試時會發生什麼情況）。這對於該系統在與其最初訓練條件不同的條件下運行至關重要。此外,個別視覺元素、檢測、跟蹤等都用於決策任務。這需要根據針對目標任務調整的目標來設計視覺模塊的新方法,超越標準的檢測工具。

For the second part, formal tools for failure analysis need to the developed to automatically generate the attributes associated with each failure mode - Hoeim’s work is a good example of this [49]. New approaches for introspection of vision system, which generate a quality score of the input relative to a learned detection model, are essential in order to predict conditions under which the system performs poorly [50]. These approaches have been used in robotics, in assessing the performance of vision models for autonomous systems and for selecting across different algorithms for autonomous control based on the input data [51].
對於第二部分,需要開發正式的失敗分析工具,以自動生成與每種失敗模式相關的屬性 - Hoeim 的工作就是一個很好的例子 [49]。對視覺系統內省的新方法,可以對輸入相對於學習的檢測模型生成質量分數,這對於預測系統在哪些條件下表現不佳是至關重要的 [50]。這些方法已經在機器人領域、評估自主系統視覺模型的性能,以及基於輸入資料選擇不同自主控制算法中使用 [51]。

5.4 Understanding activities
了解活動

Beyond detecting and tracking objects, a higher level understanding of video input involve modeling of behavior and interactions. The challenge here is to model a vocabulary of actions that is most relevant to the end task, and to train models for this vocabulary, implying even more complex scalability and data acquisition issues. In particular, this requires new tools for labeling that are tuned to detecting human activities.
除了檢測和跟蹤物體之外，視頻輸入的更高級理解涉及行為和互動的建模。這裡的挑戰是要建立與最終任務最相關的動作詞彙模型,並培訓這個詞彙的模型,意味著更複雜的擴展性和數據獲取問題。特別是這需要針對檢測人類活動進行調整的新標記工具。

In addition to the basic problem of detecting individual actions or events in videos, one is faced with the problem of describing scenes at varying levels of resolutions. More precisely, while individual, discrete object concepts can be defined, there is no single temporal scale defining actions. They can be described at a coarse level, e.g., “a car is being loaded with luggage”, or a fine-grained level, e.g., “a man is lifting a bag from the ground”. This adds a level of complexity in requiring hierarchical representations of actions and correspondingly structured training.
除了檢測影片中個別動作或事件的基本問題外，還面臨著以不同解析度描述場景的問題。更確切地說，雖然個別的離散物體概念可以定義，但沒有單一的時間尺度來定義動作。它們可以被粗略地描述，例如「一輛汽車正在裝載行李」，或者被細緻地描述，例如「一個男人正在從地上提起一個袋子」。這增加了需要對動作進行層次性表示以及相應的結構化訓練的複雜度。

Finally, while static descriptions of current actions is immensely valuable, the real payoff comes through predictive models in which the system is able to generate explicit predictions of future actions and activities. Such predictive models would enable far greater robustness, e.g., by allowing tracking through obscuring occlusions, they would reduce the overall amount of computation by generating prediction of attention regions on which to focus computation in space and time, and they would provide additional information to the analyst. Powerful techniques have been develop to work with such predictive models, in particular based on inverse reinforcement learning, and their migration to use cases will eventually have a significant impact [52].
最後,儘管當前行動的靜態描述非常有價值,但真正的回報來自於預測模型,系統能夠產生未來行動和活動的明確預測。這種預測模型將使系統更加穩健,例如通過允許跟蹤遮擋的情況,從而減少整體計算量,生成注意力區域的預測並集中計算在時空上,並為分析師提供額外的信息。已經開發了強大的技術來處理這種預測模型,特別是基於逆強化學習,它們將逐步應用於實際案例中,並將產生重大影響。

6. CONCLUSION 結論

Even with the tremendous advances in networking, computational power, and machine learning that we have seen over the past decade, technology and innovations are not slowing down - they are accelerating [53]. This is fueling a tremendous increase in the focus and development of Artificial Intelligence. The AI Stack provides an abstract model to visualize and organize all of the technologies that comprise an AI system - and how those technologies fit together. The AI Stack - and the fusion of the interdependent technology layers contained within it - provides a streamlined approach to analyze, plan, and prioritize strategic investments in commercial technologies and transformational research to leverage and continuously advance AI in any operational domain.
即便在過去十年中,我們見證了網路、計算能力和機器學習方面的巨大進步,技術和創新也絕未放緩 - 反而正不斷加速[53]。這推動了人工智慧的焦點與發展驚人地增加。人工智慧架構提供了一個抽象模型,可以將構成人工智慧系統的所有技術可視化和組織起來,以及這些技術如何融合在一起。人工智慧架構 - 以及其中包含的相互依賴的技術層融合 - 為分析、規劃和優先考慮在商業技術和轉型性研究方面的策略性投資提供了一種簡化的方法,以利用和不斷推進任何運營領域的人工智慧。

7. REFERENCES 7. 參考文獻

Jouppi, N. P., Young, C. and Patil, N., “In-Datacenter Performance Analysis of a Tensor Processing Unit $^{TM}$ ,” Proc. IEEE International Symposium on Computer Architecture, (2017).
朱皮、楊誠及帕蒂爾,「資料中心張量處理器的效能分析」,IEEE 國際電腦架構研討會論文集,(2017)。
Lavin, A. and Gray, S., “Fast Algorithms for Convolutional Neural Networks,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, 4013-4021. (2016).
拉文, A. and 格雷, S., "卷積神經網絡的快速算法," 電機與電子工程師學會計算機視覺與模式識別學術會議論文集, 4013-4021. (2016).
Dean, J. “Large-Scale Deep Learning with TensorFlow for Building Intelligent Systems,” ACM Webinar, (2017).
傑,J."使用 TensorFlow 進行大規模深度學習以建立智能系統,"ACM 網路研討會,(2017)。
Krizhevsky, A., Sutskever, I., and Hinton, G., “Imagenet classification with deep convolutional neural networks,” Proc. Neural Information Processing Systems, (2012).
克里日夫斯基、蘇茨克夫和漢頓,〈使用深度卷積神經網路進行 Imagenet 分類〉,《神經信息處理系統學術論文集》,(2012 年)。
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., “Going deeper with convolutions,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, (2015).
施金傑、劉威、賈郁、塞爾馬內特、里德、安圭洛夫、艾哈邁德、凡豪克、拉比諾維奇,「深度卷積神經網絡」,IEEE 電腦視覺與模式識別會議論文集,(2015)。
He, K., Zhang, X., Ren, S. and Sun, J., “Identity mappings in deep residual networks,” (2016)
何恺明、张学政、任赛、孙剑."深度残差网络中的恒等映射".(2016)
Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D., “Mastering the game of Go with deep neural networks and tree search,” Nature. 529. 484-489. 10.1038/nature16961, (2016).
銀, D., 黃, A., 馬丁森, C., 古茲, A., 斯弗雷, L., 范登德里斯切, G., 施里特維塞, J., 安東格盧, I., 潘尼爾希爾瓦姆, V., 藍特科特, M., 迪勒曼, S., 格雷維, D., 南, J., 卡爾布倫納, N., 斯特凱維尤爾, I., 利利克拉普, T., 利奇, M., 卡武克喬盧, K., 格雷珀, T., 哈薩比斯, D., "用深度神經網絡和樹搜索掌握圍棋遊戲," 《自然》. 529. 484-489. 10.1038/nature16961, (2016).
Arunkumar, A. and Bolotin, E., “MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability,” Proc. IEEE International Symposium on Computer Architecture, (2017).
阿倫庫馬爾、A. 和布洛汀、E.：「MCM-GPU: 多芯片模組 GPU 以持續的性能擴充性」，IEEE 國際電腦體系結構研討會論文集 (2017)。
Feldman, M., Willard, C. G. and Snell, A., “HPC Application Support for GPU Computing,” (2015). http://www.intersect360.com/industry/reports.php?id=131
費爾德曼、威拉德和斯奈爾,《GPU 計算的 HPC 應用支持》(2015 年)。http://www.intersect360.com/industry/reports.php?id=131
NVIDIA cuDNN, GPU Accelerated Deep Learning. (2016). https://developer.nvidia.com/cudnn
NVIDIA cuDNN，GPU 加速深度學習。(2016)。https://developer.nvidia.com/cudnn
Xing, E., “Petuum: A New Platform for Distributed Machine Learning on Big Data,” (2015)
邢恩,"Petuum:一個新的大數據分布式機器學習平台,"(2015)
McKenna, B., “Artificial Intelligence could be about to start a processing chip arms race,” Computer Weekly, (2017). http://www.computerweekly.com/blog/Data-Matters/Artificial-Intelligence-could-be-about-to-start-a-processing-chip-arms-race-and-why-thats-a-good-t
麥肯納, B.,"人工智能可能會引發處理器晶片軍備競賽," Computer Weekly, (2017)。http://www.computerweekly.com/blog/Data-Matters/Artificial-Intelligence-could-be-about-to-start-a-processing-chip-arms-race-and-why-thats-a-good-t
Allen, G., and Chan, T., “Artificial Intelligence and National Security,” Belfer Center Study, (2017).
艾倫, G., 與陳, T., 「人工智能與國家安全」, 貝爾福中心研究, (2017)。
Potember, R., “Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DoD,” MITRE, (2017).
波特姆博士, R.。"國防部相關人工智慧和人工通用智慧研究的洞察"。MITRE, (2017)。
“One Hundred Year Study on Artificial Intelligence (AI100),” Stanford University, (2018). https://ai100.stanford.edu.
「百年人工智能研究(AI100)」，史丹福大學，(2018)。https://ai100.stanford.edu.
Creier, D. [AI: The Tumultuous History of the Search for Artificial Intelligence], Basic Books Publishing, Hachette Book Group, (1993).
克雷伊爾, D. [人工智能:尋求人工智能的波瀾壯闊歷史], Basic Books Publishing, Hachette Book Group, (1993)。
Moore, A. W. “Verifying and Validating Machine Intelligence”, World Economic Forum Annual Meeting 2016 - Ideas Lab
摩爾，A. W.。"驗證與確認機器智能"，2016 年世界經濟論壇年會 - 創意實驗室。
Laskowski, N., “Trying to wrap your brain around AI? CMU has an AI stack for that,” TechTarget, (2018). http://searchcio.techtarget.com/podcast/Trying-to-wrap-your-brain-around-AI-CMU-has-an-AI-stack-for-that
拉斯科夫斯基, N.,"試圖理解人工智慧?卡內基梅隆大學有一個人工智慧棧可供參考," TechTarget, (2018). http://searchcio.techtarget.com/podcast/Trying-to-wrap-your-brain-around-AI-CMU-has-an-AI-stack-for-that
“OSI model” Wikipedia, Accessed April 1, (2018). https://en.wikipedia.org/wiki/OSI model
開放系統互連模型
Zimmermann, H., “OSI Reference Model — The ISO Model of Architecture for Open Systems Interconnection,” Proc. IEEE Transactions on Communications, (1980).
齊默曼（H. Zimmermann)，"OSI 參考模型 — 開放系統互連的 ISO 架構模型"，IEEE 通信交易處理集（1980 年）。
Coates, A., “Deep learning with COTS HPC systems,” Proc. International Conference on Machine Learning, (2013).
柯茨,A.,"使用商用高性能計算系統進行深度學習,"機器學習國際會議論文集,(2013)。
Kawulok, M., Celebi, M. E. and Smolka, B., “Advances in Face Detection and Facial Image Analysis,” 10.1007/978-3-319-25958-1, (2016).
卡武洛克、塞利比、斯莫爾卡，《人臉檢測與臉部影像分析的最新進展》，10.1007/978-3-319-25958-1, (2016)。
Luu, K., Zhu, C., Bhagavatula, C., Le, T. H. N. and Savvides, M.,“A Deep Learning Approach to Joint Face Detection and Segmentation.” 1-12. 10.1007/978-3-319-25958-1_1, (2016).
呂敬德、朱剛、Bhagavatula, C.、黎天華和 Savvides, M.，"深度學習應用於人臉檢測及分割"。1-12。10.1007/978-3-319-25958-1_1，(2016)。
“Artificial Intelligence: Emerging Opportunities, Challenges, and Implications,” GAO report, GAO-18-142SP, (2018).
人工智能：新興機會、挑戰和影響，GAO 報告，GAO-18-142SP，(2018)。
Rosemain, M.and Rose M. “France to spend $1.8-billion on AI to compete with U.S., China,” Reuters, (2018).
法國將斥資 18 億美元推動人工智能發展,以與美國和中國競爭。
Spiegeleire, S., Maas, M., and Sweijs, T. “Artificial Intelligence and the Future of Defense,” The Hague Center for Strategic Studies, (2017).
斯皮格勒伊斯、馬斯和施韋斯。《人工智能與防務未來》,海牙戰略研究中心(2017 年)。
Magnuson, S., “Military 'Swimming In Sensors and Drowning in Data,” National Defense Maganzine, (2010).
馬格努森,S.,"軍隊'沉浸於感測器並淹沒於數據之中'," National Defense 雜誌, (2010)。
Pomerleau, M., “DoD stands up team to take on PED/intel problem,” C4ISR Magazine, (2017).
波梅洛, M.,"美國國防部成立團隊來解決感知-分析-解讀/情報問題," C4ISR Magazine, (2017)。
Chen, Q. and Koltun, V., “Photographic image synthesis with cascaded refinement networks,” Proc. International Conference on Computer Vision, (2017).
陳, Q. and 柯爾頓, V., "使用級聯精煉網絡進行攝影影像合成," 計算機視覺國際會議論文集, (2017)。
Karras, T., Aila, T, Laine, S., and Lehtinen, J., “Progressive growing of gans for improved quality, stability, and variation,” Proc. International Conference on Learning Representations, (2018).
卡拉斯, T.、艾拉, T.、萊恩, S. 和雷提寧, J.，「漸進式 GAN 的成長為改善品質、穩定性和變異性」，國際學習表徵會議論文集 (2018)。
Zhu, J., Park, T., Isola, P., and Efros, A. A., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” Proc. International Conference on Computer Vision, (2017).
朱、朴、伊索拉和埃夫羅斯,「使用循環一致對抗網絡進行未配對圖像到圖像翻譯」,國際計算機視覺大會論文集,(2017 年)。
Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., and Webb, R., “Learning from simulated and unsupervised images through adversarial training,” Proc. Computer Vision and Pattern Recognition, (2017).
史瑞瓦斯塔瓦、佩夫斯特、図渓、素斯金德、王、韋伯,「通過對抗性訓練從仿真和無監督的影像中學習」,《計算機視覺與模式識別》,(2017)。
Hu, R., Dollár, P., He, K., Darrell, T., and Girshick,R., "Learning to Segment Every Thing,"Proc. Computer Vision and Pattern Recognition, (2018).
胡銳、多爾、何凱、達雷爾、吉爾希克.「學習分割每一個物體」,電腦視覺與模式識別學術會議論文,(2018 年).
Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K., “Reinforcement learning with unsupervised auxiliary tasks,” Proc. International Conference on Learning Representations, (2017).
賈德伯格,M.,米尼,V.,察羅基,W. M.,沙烏爾,T.,萊博,J. Z.,西爾弗,D.,卡武克珂盧,K.,"無監督輔助任務的強化學習",國際表徵學習會議論文集,(2017)。
Papadopoulos, D. P., Uijlings, J. R. R., Keller, F., and Ferrari, V., “Training object class detectors with click supervision,” Proc. Computer Vision and Pattern Recognition, (2017).
帕帕多普洛斯、Uijlings、凱勒和費拉里，"使用點擊監督訓練物體類別檢測器"，電腦視覺與模式識別國際會議論文集，(2017 年)。
Pinto, L., Gandhi, D., Han, Y., Park, Y. L., and Gupta, A., “The curious robot: Learning visual representations via physical interactions,” Proc. European Conference on Computer Vision, (2016).
柏桐奥、甘地、韩宇、朴赞沂和古普塔,"好奇的机器人:通过物理交互学习视觉表征",欧洲计算机视觉大会论文集,(2016 年)。
Tulsiani, S., Efros, A. A., and Malik, J., “Multi-view Consistency as Supervisory Signal for Learning Shape and Pose Prediction,” Proc. Computer Vision and Pattern Recognition, (2018).
杜爾西尼, S., 艾佛斯, A. A., 和馬立克, J., "多視角一致性作為學習形狀和姿態預測的監督信號," 2018 年計算機視覺與模式識別學術會議論文集。
Doersch, C. and Zisserman, A., “Multi-task self-supervised visual learning,” Proc. International Conference on Computer Vision, (2017).
多爾奇, C.和齊斯曼, A., "多任務自監督式視覺學習," 國際電腦視覺會議論文集, (2017)。
Tzeng, E., Hoffman, J., Darrell, T., and Saenko, K., “Adversarial discriminative domain adaptation,” Proc. Computer Vision and Pattern Recognition, (2017).
曾, E.、Hoffman, J.、Darrell, T.和 Saenko, K.，「對抗性的判別式域自適應」，計算機視覺與模式識別學術論文集，(2017)。
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M. and Lempitsky, V., “Domain-adversarial training of neural networks,” The Journal of Machine Learning Research 17(1): 2096-2030, (2016).
甘宁、尤斯蒂诺娃、阿亚肯、热尔曼、拉罗谢、拉维奥莱特、马尔尚和列姆皮茨基,"神经网络对抗性域训练,"《机器学习研究杂志》17(1): 2096-2030, (2016)。
Rebuffi, S., Bilen, H., and Vedaldi, A., “Learning multiple visual domains with residual adapters,” Proc. Advances in Neural Information Processing Systems, (2017).
雷普弗，S.、畢連，H.、與韋達爾迪，A.，「利用剩餘適配器學習多個視覺領域」，神經資訊處理系統進展論文集，（2017）。
Finn, C., Abbeel, P. and Levine, S., “Model-agnostic meta-learning for fast adaptation of deep networks,” Proc. International Conference on Machine Learning, (2017).
芬恩、阿比爾和列維 n，"用於快速適應深度網絡的模型不可知元學習"，機器學習國際會議論文集，(2017)。
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., and de Freitas, N., “Learning to learn by gradient descent by gradient descent,” In Advances in Neural Information Processing Systems, (2016).
安德烈喬維奇、德尼爾、戈麥斯、霍夫曼、普福、香鐘、德佛里塔斯, "通過梯度下降學習 ", 神經信息處理系統進展, (2016)。
Vinyals, O., Blundell, C., Lillicrap, T., and Wierstra, D., “Matching networks for one shot learning,” Proc. Advances in Neural Information Processing Systems, (2016).
溫奇亞爾斯,O.、布倫德爾,C.、利卡普,T.和韋爾斯特拉,D.，「一次學習的匹配網路」，神經資訊處理系統進展研討會論文集 (2016)。
Ravi, S. and Larochelle, H., “Optimization as a model for few-shot learning,” Proc. International Conference on Learning Representations, (2017).
拉維,S.和拉羅舍勒,H.,"少樣本學習的模型:優化",國際學習表徵大會論文集,(2017 年)。
Wang, Y. and Hebert, M., “Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs,” Proc. Neural Information Processing Systems, (2016).
王, Y. 和 Hebert, M., "通过结合无监督元训练和 CNNs 从小样本集中学习," 神经信息处理系统论文集, (2016)。
Wang, Y. and Hebert, M., “Learning to Learn: Model Regression Networks for Easy Small Sample Learning,” Proc. European Conference on Computer Vision, (2016).
王毓清和赫伯特, "學習去學習: 簡單小樣本學習的模型迴歸網路," 歐洲電腦視覺會議論文集, (2016).
Wang, Y., Ramanan, D. K. and Hebert, M., “Learning to model the tail,” Proc. Neural Information Processing Systems, (2017).
王, 揚, 拉曼, D. K. 和赫伯特, M., "學習建模尾部," 神經資訊處理系統程序, (2017)。
Hoiem, D., Chodpathumwan, Y., and Dai, Q., “Diagnosing Error in Object Detectors,” Proc. European Conference on Computer Vision, (2012).
霍伊姆、楚帕圖蒙和戴,「診斷物體偵測器的錯誤」,歐洲計算機視覺會議論文集,(2012)。
Zhang, P., Wang, J., Farhadi, A., Hebert, M. and Parikh, D., “Predicting Failures of Vision Systems,” Proc. IEEE Conference on Computer Vision and Pattern Recognition, (2014).
張, P., 王, J., Farhadi, A., Hebert, M. 和 Parikh, D., "預測視覺系統的失敗," 在 IEEE 計算機視覺和模式識別會議上論文集, (2014)。
Saxena, D. M., Kurtz, V. and Hebert, M., “Learning robust failure response for autonomous vision based flight,” Conference Paper, IEEE International Conference on Robotics and Automation5824-5829, (2017).
薩克賽納, D. M.、柯茨, V. 和赫伯特, M.,"學習自主視覺飛行的強健故障響應"，會議論文，IEEE 國際機器人與自動化大會 5824-5829, (2017)。
Ma, W. C., Huang, D. A., Lee, N., and Kitani, K. M., “Forecasting Interactive Dynamics of Pedestrians with Fictitious Play,” Proc. Conference on Computer Vision and Pattern Recognition, (2017).
馬文晴、黃冬安、李楠和北谷光昭。「使用虛擬遊戲預測行人的互動動態」。計算機視覺與模式識別大會論文集 (2017)。
“Nvidia’s Huang Sees AI 'Cambrian Explosion”, Datanami.com, https://www.datanami.com/2017/05/24/nvidias-huang, (2018).
輝達 Huang 看到 AI 的"寒武紀大爆發"
The New NVIDIA Pascal Architecture. (2016). http://www.nvidia.com/object/gpu-architecture.html
新 NVIDIA Pascal 架構。(2016)。http://www.nvidia.com/object/gpu-architecture.html
Laskowski, N., “How could an AI stack benefit your company?” Transcript from “Schooled in AI” podcast, (2017)
拉斯科夫斯基，"人工智慧堆疊如何為您的企業帶來好處？"來自"Schooled in AI"播客的文字記錄，(2017)
Alderman, D. and Ray, J., “Best Frenemies Forever: Artificial Intelligence, Emerging Technologies, and China-US Strategic Competition,” University of California-San Diego Institute on Global Conflict and Cooperation SITC Research Briefs, (2017).
艾德曼, D.和雷, J.,"最好的仇敵永遠:人工智能,新興技術和中美戰略競爭,"加州大學聖地亞哥分校全球衝突與合作研究所 SITC 研究簡報,(2017 年)。
Chen, T., Goodfellow, I., and Shlens, J., "Net2net: Accelerating learning via knowledge transfer,"Proc. International Conference on Learning Representations, (2016).
陳, T., Goodfellow, I., 及 Shlens, J., "Net2net: 透過知識轉移加速學習", 國際表徵學習會議論文集, (2016)。
Gupta, S., Hoffman, J., and Malik, J., “Cross modal distillation for supervision transfer,” Proc. Computer Vision and Pattern Recognition, (2016).
古普塔、霍夫曼和馬利克,「跨模態蒸餾以傳遞監督」,計算機視覺與模式識別會議論文,(2016)。
Li, Z. and Hoiem, D., “Learning without forgetting,” Proc. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2017).
李志和 Hoiem,D.,"Learning without forgetting,"IEEE 模式分析和機器智能論文集,(2017)。

Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR IX, edited by Michael A. Kolodny, Dietrich M. Wiegmann, Tien Pham, Proc. of SPIE Vol. 10635,
地/空多傳感器互操作性、整合和持續 ISR IX 網絡,由 Michael A. Kolodny、Dietrich M. Wiegmann、Tien Pham 編輯,SPIE 論文集第 10635 卷。

106350C $\cdot$ © 2018 SPIE $\cdot$ CCC code: 0277-786X/18/$18 $\cdot$ doi: 10.1117/12.2309483
$^{1}$ The following sections of this paper summarize a new framework developed by Carnegie Mellon University’s School of Computer Science - specifically Dean Andrew Moore - called the AI Stack, and discuss its ability to act as a ‘blueprint’ for the development and deployment of AI. As a result, citations will be diminished.
$^{1}$ 本文以下部分總結了卡內基梅隆大學計算機科學學院（由安德魯·摩爾院長專門研發）所提出的一個新框架，稱為「AI 堆棧」，並討論了它作為 AI 開發和部署的「藍圖」的能力。因此，引用將相對減少。

The AI Stack: A blueprint for developing and deploying Artificial Intelligence 人工智慧系統：開發和部署人工智慧的藍圖

Abstract 摘要

1. INTRODUCTION 一、前言

2. INTRODUCING THE AI STACK 1 1 ^(1){ }^{1}2. 介紹 AI 堆疊

3. BUILDING THE AI STACK3. 搭建人工智慧架構

4. AI AND NATIONAL SECURITY4.人工智慧與國家安全

5. CONCEPT OF OPERATIONS USE CASE: LEVERAGING THE AI STACK TO ENHANCE SITUATIONAL AWARENESS FROM FULL MOTION VIDEO利用 AI 技術提升全動態影像的情況意識

5.1 Training object detectors5.1 物件偵測器的訓練

5.2 Minimizing supervision5.2 減少監督

5.3 Evaluating performance5.3 評估績效

5.4 Understanding activities了解活動