
Graph Neural Network for Traffic Forecasting: A Survey

Weiwei Jiang, Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Jiayun Luo, School of Computer Science and Engineering, Nanyang Technological University, 639798, Singapore
Abstract

Traffic forecasting is important for the success of intelligent transportation systems. Deep learning models, including convolutional neural networks and recurrent neural networks, have been extensively applied in traffic forecasting problems to model spatial and temporal dependencies. In recent years, to model the graph structures in transportation systems as well as contextual information, graph neural networks have been introduced and have achieved state-of-the-art performance in a series of traffic forecasting problems. In this survey, we review the rapidly growing body of research using different graph neural networks, e.g. graph convolutional and graph attention networks, in various traffic forecasting problems, e.g. road traffic flow and speed forecasting, passenger flow forecasting in urban rail transit systems, and demand forecasting in ride-hailing platforms. We also present a comprehensive list of open data and source code for each problem and identify future research directions. To the best of our knowledge, this paper is the first comprehensive survey that explores the application of graph neural networks for traffic forecasting problems. We have also created a public GitHub repository where the latest papers, open data, and source code will be updated.

Keywords: Traffic Forecasting, Graph Neural Networks, Graph Convolution Network, Graph Attention Network, Deep Learning

1 Introduction

Transportation systems are among the most important infrastructure in modern cities, supporting the daily commuting and traveling of millions of people. With rapid urbanization and population growth, transportation systems have become more complex. Modern transportation systems encompass road vehicles, rail transport, and various shared travel modes that have emerged in recent years, including online ride-hailing, bike-sharing, and e-scooter sharing. Expanding cities face many transportation-related problems, including air pollution and traffic congestion. Early intervention based on traffic forecasting is seen as the key to improving the efficiency of a transportation system and alleviating transportation-related problems. In the development and operation of smart cities and intelligent transportation systems (ITSs), traffic states are detected by sensors (e.g. loop detectors) installed on roads, subway and bus system transaction records, traffic surveillance videos, and even smartphone GPS (Global Positioning System) data collected in a crowd-sourced fashion. Traffic forecasting is typically based on historical traffic state data, together with the external factors that affect traffic states, e.g. weather and holidays.

Both short-term and long-term traffic forecasting problems for various transport modes are considered in the literature. This survey focuses on the data-driven approach, which involves forecasting based on historical data. The traffic forecasting problem is more challenging than other time series forecasting problems because it involves large data volumes with high dimensionality, as well as multiple dynamics including emergency situations, e.g. traffic accidents. The traffic state in a specific location has both spatial dependency, which may extend beyond nearby areas, and temporal dependency, which may be seasonal. Traditional linear time series models, e.g. autoregressive integrated moving average (ARIMA) models, cannot handle such spatiotemporal forecasting problems. Machine learning (ML) and deep learning techniques have been introduced in this area to improve forecasting accuracy, for example, by modeling the whole city as a grid and applying a convolutional neural network (CNN) as demonstrated by Jiang & Zhang [2018]. However, the CNN-based approach is not optimal for traffic forecasting problems that have a graph-based form, e.g. road networks.

In recent years, graph neural networks (GNNs) have become the frontier of deep learning research, showing state-of-the-art performance in various applications [Wu et al., 2020b]. GNNs are ideally suited to traffic forecasting problems because of their ability to capture spatial dependency, which is represented using non-Euclidean graph structures. For example, a road network is naturally a graph, with road intersections as the nodes and road connections as the edges. With graphs as the input, several GNN-based models have demonstrated superior performance to previous approaches on tasks including road traffic flow and speed forecasting problems. These include, for example, the diffusion convolutional recurrent neural network (DCRNN) [Li et al., 2018b] and Graph WaveNet [Wu et al., 2019] models. The GNN-based approach has also been extended to other transportation modes, utilizing various graph formulations and models.
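As a minimal sketch of the road-network-as-graph formulation described above, the snippet below encodes intersections as node indices and road connections as edges in a binary adjacency matrix. The function name `build_adjacency` and the four-intersection toy network are hypothetical and chosen only for illustration; the surveyed models may use weighted or directed variants.

```python
def build_adjacency(num_nodes, edges, directed=False):
    """Return a binary adjacency matrix (list of lists) for a road graph.

    Nodes are road intersections; each edge (u, v) is a road connection.
    """
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        adj[u][v] = 1
        if not directed:
            # A two-way road connects both intersections symmetrically.
            adj[v][u] = 1
    return adj


# Hypothetical toy network: 4 intersections joined by 4 two-way roads.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
adjacency = build_adjacency(4, edges)
```

Passing `directed=True` would keep one-way streets as asymmetric entries, which is one reason a graph can represent a road network more faithfully than a regular grid.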

To the best of the authors’ knowledge, this paper presents the first comprehensive literature survey of GNN-related approaches to traffic forecasting problems. While several relevant traffic forecasting surveys exist [Shi & Yeung, 2018, Pavlyuk, 2019, Yin et al., 2021, Luca et al., 2020, Fan et al., 2020, Boukerche & Wang, 2020a, Manibardo et al., 2021, Ye et al., 2020a, Lee et al., 2021, Xie et al., 2020a, George & Santra, 2020, Haghighat et al., 2020, Boukerche et al., 2020, Tedjopurnomo et al., 2020, Varghese et al., 2020], most of them are not GNN-focused with only one exception [Ye et al., 2020a]. For this survey, we reviewed 212 papers published in the years 2018 to 2020. Additionally, because this is a very rapidly developing research field, we also included preprints that have not yet gone through the traditional peer review process (e.g., arXiv papers) to present the latest progress. Based on these studies, we identify the most frequently considered problems, graph formulations, and models. We also investigate and summarize publicly available useful resources, including datasets, software, and open-sourced code, for GNN-based traffic forecasting research and application. Lastly, we identify the challenges and future directions of applying GNNs to the traffic forecasting problem.

Instead of giving a whole picture of traffic forecasting, our aim is to provide a comprehensive summary of GNN-based solutions. This paper is useful both for new researchers in this field who want to catch up with the progress of applying GNNs and for experienced researchers who are not familiar with these latest graph-based solutions. In addition to this paper, we have created an open GitHub repository on this topic (https://github.com/jwwthu/GNN4Traffic), where relevant content will be updated continuously.

Our contributions are summarized as follows:

1) Comprehensive Review: We present the most comprehensive review of graph-based solutions for traffic forecasting problems in the past three years (2018-2020).

2) Resource Collection: We provide the latest comprehensive list of open datasets and code resources for replication and comparison of GNNs in future work.

3) Future Directions: We discuss several challenges and potential future directions for researchers in this field, when using GNNs for traffic forecasting problems.

The remainder of this paper is organized as follows. In Section 2, we compare our work with other relevant research surveys. In Section 3, we categorize the traffic forecasting problems that are involved with GNN-based models. In Section 4.1, we summarize the graphs and GNNs used in the reviewed studies. In Section 5, we outline the open resources. Finally, in Section 6, we point out challenges and future directions.

2 Related Research Surveys

In this section, we introduce the most recent relevant research surveys (most of which were published in 2020). The differences between our study and these existing surveys are pointed out when appropriate. We start with the surveys addressing wider ITS topics, followed by those focusing on traffic prediction problems and GNN application in particular.

Besides traffic forecasting, machine learning and deep learning methods have been widely used in ITSs as discussed in Haghighat et al. [2020], Fan et al. [2020], Luca et al. [2020]. In Haghighat et al. [2020], GNNs are only mentioned in the task of traffic characteristics prediction. Among the major milestones of deep-learning driven traffic prediction (summarized in Figure 2 of Fan et al. [2020]), the state-of-the-art models after 2019 are all based on GNNs, indicating that GNNs are indeed the frontier of deep learning-based traffic prediction research.

Roughly speaking, five different types of traffic prediction methods are identified and categorized in previous surveys [Xie et al., 2020a, George & Santra, 2020], namely, statistics-based methods, traditional machine learning methods, deep learning-based methods, reinforcement learning-based methods, and transfer learning-based methods. Some comparisons between different categories have been considered, e.g., statistics-based models have better model interpretability, whereas ML-based models are more flexible as discussed in Boukerche et al. [2020]. Machine learning models for traffic prediction are further categorized in Boukerche & Wang [2020a], which include the regression model, example-based models (e.g., k-nearest neighbors), kernel-based models (e.g. support vector machine and radial basis function), neural network models, and hybrid models. Deep learning models are further categorized into five different generations in Lee et al. [2021], in which GCNs are classified as the fourth generation and other advanced techniques that have been considered but are not yet widely applied are merged into the fifth generation. These include transfer learning, meta learning, reinforcement learning, and the attention mechanism. Before these advanced techniques become mature in traffic prediction tasks, GNNs remain the state-of-the-art technique.

Some of the relevant surveys only focus on the progress of deep learning-based methods [Tedjopurnomo et al., 2020], while the others prefer to compare them with the statistics-based and machine learning methods [Yin et al., 2021, Manibardo et al., 2021]. In Tedjopurnomo et al. [2020], 37 deep neural networks for traffic prediction are reviewed, categorized, and discussed. The authors conclude that encoder-decoder long short-term memory (LSTM) combined with graph-based methods is the state-of-the-art prediction technique. A detailed explanation of various data types and popular deep neural network architectures is also provided, along with challenges and future directions for traffic prediction. Conversely, it is found that deep learning is not always the best modeling technique in practical applications, where linear models and machine learning techniques with less computational complexity can sometimes be preferable [Manibardo et al., 2021].

Additional research surveys consider aspects other than model selection. In Pavlyuk [2019], spatiotemporal feature selection and extraction pre-processing methods, which may also be embedded as internal model processes, are reviewed. A meta-analysis of prediction accuracy when applying deep learning methods to transport studies is given in Varghese et al. [2020]. In this study, apart from the models themselves, additional factors including sample size and prediction time horizon are shown to have a significant influence on prediction accuracy.

To the authors’ best knowledge, there are no existing surveys focusing on the application of GNNs for traffic forecasting. Graph-based deep learning architectures are reviewed in Ye et al. [2020a], for a series of traffic applications, namely, traffic congestion, travel demand, transportation safety, traffic surveillance, and autonomous driving. Specific and practical guidance for constructing graphs in these applications is provided. The advantages and disadvantages of both GNNs and other deep learning models, e.g. recurrent neural network (RNN), temporal convolutional network (TCN), Seq2Seq, and generative adversarial network (GAN), are examined. While the focus is not limited to traffic prediction problems, the graph construction process is universal in the traffic domain when GNNs are involved.

3 Problems

In this section, we discuss and categorize the different types of traffic forecasting problems considered in the literature. Problems are first categorized by the traffic state to be predicted. Traffic flow, speed, and demand problems are considered separately while the remaining types are grouped together under “other problems”. Then, the problem-types are further broken down into levels according to where the traffic states are defined. These include road-level, region-level, and station-level categories.

Different problem types have different modelling requirements for representing spatial dependency. For the road-level problems, the traffic data are usually collected from sensors, which are associated with specific road segments, or GPS trajectory data, which are also mapped into the road network with map matching techniques. In this case, the road network topology can be seen as the graph to use, which may contain hundreds or thousands of road segments potentially. The spatial dependency may be described by the road network connectivity or spatial proximity. For the station-level problems, the metro or bus station topology can be taken as the graph to use, which may contain tens or hundreds of stations potentially. The spatial dependency may be described by the metro lines or bus routes. For the region-level problem, the regular or irregular regions are used as the nodes in a graph. The spatial dependency between different regions can be extracted from the land use purposes, e.g., from the points-of-interest data.
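To illustrate how spatial proximity can define graph edges for road-level problems, the sketch below weights each sensor pair with a thresholded Gaussian kernel of their distance. This kernel is an assumption made for illustration (it is one common construction, not one prescribed by the text above), and the names `proximity_adjacency`, `sigma`, and `threshold`, along with the sensor coordinates, are hypothetical.

```python
import math


def proximity_adjacency(coords, sigma, threshold):
    """Weighted adjacency from pairwise distances between sensor locations.

    weight(i, j) = exp(-d(i, j)^2 / sigma^2), kept only if >= threshold.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this sketch
            d = math.dist(coords[i], coords[j])  # Euclidean distance
            w = math.exp(-(d * d) / (sigma * sigma))
            if w >= threshold:
                weights[i][j] = w
    return weights


# Hypothetical sensor positions (in km) along a single road.
sensor_coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
proximity = proximity_adjacency(sensor_coords, sigma=2.0, threshold=0.1)
```

With these values, the two sensors 1 km apart are connected, while the sensor 4 to 5 km away falls below the threshold and stays disconnected. In practice, road-network distance could replace the straight-line distance used here.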

A full list of the traffic forecasting problems considered in the surveyed studies is shown in Table 1. Rather than giving the whole picture of traffic forecasting research, Table 1 lists only those problems with GNN-based solutions in the literature.

Table 1: Traffic forecasting problems in the surveyed studies.
Problem | Relevant Studies
Road Traffic Flow | Zhang et al. [2018b], Wei et al. [2019], Xu et al. [2020a], Guo et al. [2020a], Zheng et al. [2020b], Pan et al. [2020, 2019], Lu et al. [2019a], Mallick et al. [2020], Zhang et al. [2020j, l], Bai et al. [2020], Fang et al. [2019], Huang et al. [2020a], Wang et al. [2018b], Zhang et al. [2020e], Song et al. [2020a], Xu et al. [2020b], Wang et al. [2020g], Chen et al. [2020e], Lv et al. [2020], Kong et al. [2020], Fukuda et al. [2020], Zhang & Guo [2020], Boukerche & Wang [2020b], Tang et al. [2020b], Kang et al. [2019], Guo et al. [2019c], Li et al. [2019b], Xu et al. [2019], Zhang et al. [2019d], Wu et al. [2018a], Sun et al. [2020], Wei & Sheng [2020], Li et al. [2020f], Cao et al. [2020], Yu et al. [2018, 2019b], Li et al. [2020b], Yin et al. [2020], Chen et al. [2020g], Zhang et al. [2020a], Wang et al. [2020a], Xin et al. [2020], Qu et al. [2020], Wang et al. [2020b], Xie et al. [2020d], Huang et al. [2020b], Guo et al. [2020b], Zhang et al. [2020h], Fang et al. [2020a], Li & Zhu [2021], Tian et al. [2020], Xu et al. [2020c], Chen et al. [2020c]
Road OD Flow | Xiong et al. [2020], Ramadan et al. [2020]
Intersection Traffic Throughput | Sánchez et al. [2020]
Regional Taxi Flow | Zhou et al. [2020d], Sun et al. [2020], Chen et al. [2020d], Wang et al. [2018a], Peng et al. [2020], Zhou et al. [2019], Wang et al. [2020e], Qiu et al. [2020]
Regional Bike Flow | Zhou et al. [2020d], Sun et al. [2020], Wang et al. [2018a, 2020e]
Regional Ride-hailing Flow | Zhou et al. [2019]
Regional Dockless E-Scooter Flow | He & Shin [2020a]
Regional OD Taxi Flow | Wang et al. [2020e], Yeghikyan et al. [2020]
Regional OD Bike Flow | Wang et al. [2020e]
Regional OD Ride-hailing Flow | Shi et al. [2020], Wang et al. [2020h, 2019]
Station-level Subway Passenger Flow | Fang et al. [2019, 2020a], Peng et al. [2020], Ren & Xie [2019], Li et al. [2018a], Zhao et al. [2020a], Han et al. [2019], Zhang et al. [2020b, c], Li et al. [2020e], Liu et al. [2020b], Ye et al. [2020b], Ou et al. [2020]
Station-level Bus Passenger Flow | Fang et al. [2019, 2020a], Peng et al. [2020]
Station-level Shared Vehicle Flow | Zhu et al. [2019]
Station-level Bike Flow | He & Shin [2020b], Chai et al. [2018]
Station-level Railway Passenger Flow | He et al. [2020]
Road Traffic Speed | Li et al. [2018b], Wu et al. [2019], Zhang et al. [2018b], Wei et al. [2019], Xu et al. [2020a], Guo et al. [2020a], Zheng et al. [2020b], Pan et al. [2020, 2019], Lu et al. [2019a], Mallick et al. [2020], Zhang et al. [2020j], Lv et al. [2020], Li et al. [2020f], Yin et al. [2020], Guo et al. [2020b], Li & Zhu [2021], Chen et al. [2020d], Zhao et al. [2020a], Bai et al. [2021], Tang et al. [2020a], James [2020], Shin & Yoon [2020], Liu et al. [2020a], Zhang et al. [2018a, 2019f], Yu & Gu [2019], Xie et al. [2019], Zhang et al. [2019a], Guo et al. [2019a], Diao et al. [2019], Cirstea et al. [2019], Lu et al. [2019b], Zhang et al. [2019c], James [2019], Ge et al. [2019a, b], Zhang et al. [2019b], Lee & Rhee [2022], Shleifer et al. [2019], Yu et al. [2020a], Ge et al. [2020], Lu et al. [2020b], Yang et al. [2020], Zhao et al. [2019], Cui et al. [2019], Chen et al. [2019], Zhang et al. [2019e], Yu et al. [2019a], Lee & Rhee [2019], Bogaerts et al. [2020], Wang et al. [2020f], Cui et al. [2020b, a], Guo et al. [2020c], Zhou et al. [2020a], Cai et al. [2020], Zhou et al. [2020b], Wu et al. [2020c], Chen et al. [2020f], Opolka et al. [2019], Mallick et al. [2021], Oreshkin et al. [2021], Jia et al. [2020], Sun et al. [2021], Guo & Yuan [2020], Xie et al. [2020b], Zhang et al. [2020i], Zhu et al. [2021], Feng et al. [2020], Zhu et al. [2020], Fu et al. [2020], Zhang et al. [2020d], Xie et al. [2020c], Park et al. [2020], Agafonov [2020], Chen et al. [2020a], Lu et al. [2020a], Jepsen et al. [2019, 2020], Bing et al. [2020], Lewenfus et al. [2020], Zhu et al. [2022], Liao et al. [2018], Maas & Bloem [2020], Li et al. [2020d], Song et al. [2020b], Zhao et al. [2020b], Guopeng et al. [2020], Kim et al. [2020]
Road Travel Time | Guo et al. [2020a], Hasanzadeh et al. [2019], Fang et al. [2020b], Shao et al. [2020], Shen et al. [2020]
Traffic Congestion | Dai et al. [2020], Mohanty & Pozdnukhov [2018], Mohanty et al. [2020], Qin et al. [2020a], Han et al. [2020]
Time of Arrival | Hong et al. [2020]
Regional OD Taxi Speed | Hu et al. [2018]
Ride-hailing Demand | Pian & Wu [2020], Jin et al. [2020b], Li & Axhausen [2020], Jin et al. [2020a], Geng et al. [2019b], Lee et al. [2019], Bai et al. [2019b], Geng et al. [2019a], Bai et al. [2019a], Ke et al. [2021a], Li et al. [2020c]
Taxi Demand | Lee et al. [2019], Bai et al. [2019b, a], Ke et al. [2021b], Hu et al. [2020], Zheng et al. [2020a], Xu & Li [2019], Davis et al. [2020], Chen et al. [2020h], Du et al. [2020], Li & Moura [2020], Wu et al. [2020a], Ye et al. [2021]
Shared Vehicle Demand | Luo et al. [2020]
Bike Demand | Lee et al. [2019], Bai et al. [2019b, a], Du et al. [2020], Ye et al. [2021], Chen et al. [2020b], Wang et al. [2020d], Qin et al. [2020b], Xiao et al. [2020], Yoshida et al. [2019], Guo et al. [2019b], Kim et al. [2019], Lin et al. [2018]
Traffic Accident | Zhou et al. [2020e], Yu et al. [2020b], Zhang et al. [2020k], Zhou et al. [2020f]
Traffic Anomaly | Liu et al. [2020c]
Parking Availability | Zhang et al. [2020g], Yang et al. [2019], Zhang et al. [2020f]
Transportation Resilience | Wang et al. [2020c]
Urban Vehicle Emission | Xu et al. [2020d]
Railway Delay | Heglund et al. [2020]
Lane Occupancy | Wright et al. [2019]

Generally speaking, traffic forecasting problems are challenging, not only because of the complex temporal dependency, but also because of the complex spatial dependency. While many solutions have been proposed for dealing with the temporal dependency, e.g., recurrent neural networks and temporal convolutional networks, the problem of capturing and modeling the spatial dependency has not been fully solved. The spatial dependency refers to the complex and nonlinear relationship between the traffic state in one particular location and that in other locations. This location could be a road intersection, a subway station, or a city region. The spatial dependency may not be local, e.g., the traffic state may be affected not only by nearby areas, but also by those that are far away in space but connected by a fast transportation tool. Graphs are necessary to capture this kind of spatial information, as we discuss in the next section.

Before the use of graph theory and GNNs, spatial information was usually extracted by multivariate time series models or CNNs. Within a multivariate time series model, e.g., vector autoregression, the traffic states collected in different locations or regions are combined as a multivariate time series. However, multivariate time series models can only extract the linear relationship among different states, which is not enough for modeling the complex and nonlinear spatial dependency. CNNs take a step further by modeling the local spatial information, e.g., the whole spatial range is divided into regular grids in a two-dimensional image format and the convolution operation is performed over neighboring grids. However, the CNN-based approach is limited to Euclidean-structured data and cannot model the topological structure of the subway network or the road network.

Graph neural networks bring new opportunities for solving traffic forecasting problems, because of their strong learning ability to capture the spatial information hidden in the non-Euclidean structure data, which are frequently seen in the traffic domain. Based on graph theories, both nodes and edges have their own attributes, which can be used further in the convolution or aggregation operations. These attributes describe different traffic states, e.g., volume, speed, lane numbers, road level, etc. For the dynamic spatial dependency, dynamic graphs can be learned from the data automatically. For the case of hierarchical traffic problems, the concepts of super-graphs and sub-graphs can be defined and further used.
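The aggregation operation mentioned above can be sketched in a few lines. The example below performs one round of mean aggregation, in which each node averages its own attributes with those of its graph neighbors. It is a deliberately simplified stand-in with no learnable weights or nonlinearity, not the layer of any particular surveyed model; the function name `mean_aggregate` and the speed values are hypothetical.

```python
def mean_aggregate(adj, features):
    """One round of neighbor averaging over a graph.

    adj:      adjacency matrix, nonzero entry means an edge
    features: one attribute vector per node (e.g. speed, volume)
    """
    n = len(adj)
    aggregated = []
    for i in range(n):
        # Neighbors of node i plus node i itself.
        group = [j for j in range(n) if adj[i][j] and j != i] + [i]
        dim = len(features[i])
        aggregated.append(
            [sum(features[j][k] for j in group) / len(group) for k in range(dim)]
        )
    return aggregated


# Hypothetical path graph 0 - 1 - 2 with one attribute (speed in km/h).
path_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
speeds = [[60.0], [30.0], [0.0]]
smoothed = mean_aggregate(path_adj, speeds)
```

After one round, node 0 only reflects nodes 0 and 1; stacking further rounds would let information from node 2 reach node 0, which is how graph-based layers propagate non-local spatial information.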

3.1 Traffic Flow

Traffic flow is defined as the number of vehicles passing through a spatial unit, such as a road segment or traffic sensor point, in a given time period. An accurate traffic flow prediction is beneficial for a variety of applications, e.g., traffic congestion control, traffic light control, and vehicular cloud [Boukerche & Wang, 2020a]. For example, traffic light control can reduce vehicle waiting time at road intersections, optimize the traffic flow, and reduce traffic congestion and vehicle emissions.

We consider three levels of traffic flow problems in this survey, namely, road-level flow, region-level flow, and station-level flow.

Road-level flow problems are concerned with traffic volumes on a road and include road traffic flow, road origin-destination (OD) flow, and intersection traffic throughput. In road traffic flow problems, the prediction target is the traffic volume that passes a road sensor or a specific location along the road within a certain time period (e.g. five minutes). In the road OD flow problem, the target is the volume between one location (the origin) and another (the destination) at a single point in time. The intersection traffic throughput problem considers the volume of traffic moving through an intersection.
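The two road-level targets defined above can be made concrete with a short sketch that derives them from raw records. It assumes, for illustration only, that sensor data arrive as vehicle passage timestamps in seconds and that trips arrive as (origin, destination) index pairs; the function names and sample values are hypothetical.

```python
from collections import Counter


def flow_per_interval(passage_times, interval_seconds=300):
    """Count vehicles passing a sensor in consecutive fixed-length windows.

    passage_times: timestamps in seconds since the start of observation.
    The default window of 300 s matches the five-minute example above.
    """
    counts = Counter(int(t // interval_seconds) for t in passage_times)
    if not counts:
        return []
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def od_flow(trips, num_locations):
    """Build an OD matrix: entry [o][d] counts trips from o to d."""
    matrix = [[0] * num_locations for _ in range(num_locations)]
    for origin, destination in trips:
        matrix[origin][destination] += 1
    return matrix


# Hypothetical sensor log: five vehicles over fifteen minutes.
sensor_flow = flow_per_interval([10, 200, 299, 300, 900])
# Hypothetical trips among three locations within one time step.
od_matrix = od_flow([(0, 1), (0, 1), (2, 0)], num_locations=3)
```

The flow series yields one value per sensor per window, whereas the OD matrix yields one value per location pair, which is why OD problems scale quadratically with the number of locations.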
道路级流量问题与道路上的交通量有关,包括道路交通流量、道路起点-目的地 (OD) 流量和交叉路口交通吞吐量。在道路交通流问题中,预测目标是在一定时间段(例如五分钟)内通过道路传感器或道路沿线特定位置的交通量。在道路 OD 流量问题中,目标是单个时间点一个位置(起点)和另一位置(目的地)之间的流量。交叉口交通吞吐量问题考虑通过交叉口的交通量。

Region-level flow problems consider traffic volume in a region. A city may be divided into regular regions (where the partitioning is grid-based) or irregular regions (e.g. road-based or zip-code-based partitions). These problems are classified by transport mode into regional taxi flow, regional bike flow, regional ride-hailing flow, regional dockless e-scooter flow, regional OD taxi flow, regional OD bike flow, and regional OD ride-hailing flow problems.

Station-level flow problems relate to the traffic volume measured at a physical station, for example, a subway or bus station. These problems are divided by station type into station-level subway passenger flow, station-level bus passenger flow, station-level shared vehicle flow, station-level bike flow, and station-level railway passenger flow problems.

Road-level traffic flow problems are further divided into cases of unidirectional and bidirectional traffic flow, whereas region-level and station-level traffic flow problems are further divided into the cases of inflow and outflow, based on different problem formulations.

While traffic sensors have been successfully used, data collection for traffic flow information is still a challenge when considering the high costs in deployment and maintenance of traffic sensors. Another potential approach is using the pervasive mobile and IoT devices, which have a lower cost generally, e.g., GPS sensors. However, challenges still exist when considering the data quality problems frequently seen in GPS data, e.g., missing data caused by unstable communication links.

The traffic light is another source of challenges for various traffic prediction tasks. Short-term traffic flow fluctuation and the spatial relation change between two road segments can be caused by the traffic light. The way of controlling the traffic light may be different in different time periods, causing an inconsistent traffic flow pattern.

3.2 Traffic Speed

Traffic speed, defined as the average speed of vehicles passing through a spatial unit in a given time period, is another important indicator of traffic state with potential applications in intelligent transportation systems (ITS). The speed value on an urban road can reflect the crowdedness level of road traffic. For example, Google Maps visualizes this crowdedness level from crowd-sourced data collected from individual mobile devices and in-vehicle sensors. Better traffic speed prediction is also useful for route navigation and estimated-time-of-arrival applications.

We consider two levels of traffic speed problems in this survey, namely, road-level and region-level problems. We also include travel time and congestion predictions in this category because they are closely correlated with traffic speed. Travel time prediction helps passengers plan their commuting time and helps drivers select fast routes. Traffic congestion is one of the most important and urgent transportation problems in cities, and it causes significant time loss, air pollution, and energy waste. Congestion prediction results can be used to control road conditions and optimize vehicle flow, e.g., with traffic signal control. In several studies, traffic congestion is judged by a threshold-based speed inference. The specific road-level speed problem categories considered are road traffic speed, road travel time, traffic congestion, and time of arrival problems, while the region-level speed problem considered is regional OD taxi speed.
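
The threshold-based speed inference mentioned above can be sketched as follows. This is only an illustration under an assumed rule: a segment is labeled congested when its speed falls below a fixed fraction of its free-flow speed. The function name, the `ratio` parameter, and the example values are hypothetical; individual studies define their own thresholds.

```python
import numpy as np

def congestion_labels(speed_kmh, free_flow_kmh, ratio=0.5):
    """Label each road segment as congested (True) or not (False).

    Assumed rule (not from the survey): congested when the observed speed
    is below `ratio` times the segment's free-flow speed.
    """
    speed = np.asarray(speed_kmh, dtype=float)
    free_flow = np.asarray(free_flow_kmh, dtype=float)
    return speed < ratio * free_flow

# Hypothetical observed and free-flow speeds for three segments.
labels = congestion_labels([55.0, 20.0, 38.0], [60.0, 60.0, 80.0])
```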

Traffic speed is of interest on both urban roads and freeways, but the challenges differ between the two scenarios. Freeways have few traffic signals or on/off-ramps, making prediction easier than in the urban case, and the challenge mainly comes from the complex temporal dependency. Urban roads form more complex traffic networks, with more complicated connection patterns and abrupt changes. For example, different road segments may have different speed limits and allowed vehicle types. Besides the complex temporal dependency, modeling the spatial dependency becomes a bigger challenge for urban traffic speed forecasting.

3.3 Traffic Demand

Traffic demand prediction is a key component for the success of taxi and ride-hailing services, because it helps service providers allocate limited transportation resources to the urban areas with higher demand. For passengers, traffic demand prediction encourages the consideration of various transportation forms, e.g., taking the public transit service when taxi or ride-hailing services are in short supply.

Traffic demand refers to the potential demand for travel, which may or may not be fulfilled completely. For example, on an online ride-hailing platform, the ride requests sent by passengers represent the demand, whereas only a subset of these requests may be served depending on the supply of drivers and vehicles, especially during rush hours. Accurate prediction of travel demand is a key element of vehicle scheduling systems (e.g. online ride-hailing or taxi dispatch platforms). However, in some cases, it is difficult to collect the potential travel demand from passengers and a compromise method using transaction records as an indication of the traffic demand is used. In such cases the real demand may be underestimated. Based on transport mode, the traffic demand problems considered include ride-hailing demand, taxi demand, shared vehicle demand, and bike demand.

3.4 Other Problems

In addition to the above three categories of traffic forecasting problems, GNNs are also being applied to the following problems.

Traffic accident and traffic anomaly: the target is to predict the number of traffic accidents reported to the police system. Traffic anomaly is the major cause of traffic delay, and timely detection and prediction help administrators identify the situation and return traffic to normal as quickly as possible. A traffic accident is usually an accident in road traffic involving different vehicles, which may cause significant loss of life and property. A traffic anomaly is defined more broadly as any deviation from the normal traffic state, e.g., a traffic jam caused by a traffic accident or a public procession.

Parking availability: the target is to predict the availability of vacant parking space for cars in the streets or in a car parking lot.

Urban vehicle emission: while not directly related to traffic states, the prediction of urban vehicle emission is considered in Xu et al. [2020d]. Urban vehicle emission refers to the emission produced by motor vehicles, e.g., those that use internal combustion engines. It is a major source of air pollutants, and its amount is affected by different traffic states, e.g., excess emission is produced in traffic congestion.

Railway delay: the delay time of specific routes in the railway system is considered in Heglund et al. [2020].

Lane occupancy: With simulated traffic data, lane occupancy has been measured and predicted [Wright et al., 2019].

4 Graphs and Graph Neural Networks

In this section, we summarize the types of graphs and GNNs used in the surveyed studies, focusing on GNNs that are frequently used for traffic forecasting problems. The contributions of this section include an organized approach for classifying the different traffic graphs based on domain knowledge, and a summary of the common ways of constructing adjacency matrices, which readers may not have encountered in other neural networks and which should be helpful for those who would like to use graph neural networks. The different GNN structures already used for traffic forecasting problems are also briefly introduced in this section. For a wider and deeper discussion of GNNs, refer to Wu et al. [2020b], Zhou et al. [2020c], Zhang et al. [2020m].

4.1 Traffic Graphs

4.1.1 Graph Construction

A graph is the basic structure used in GNNs. It is defined as $G=(V,E,A)$, where $V$ is the set of vertices or nodes, $E$ is the set of edges between the nodes, and $A$ is the adjacency matrix. Both nodes and edges can be associated with different attributes in different GNN problems. Element $a_{ij}$ of $A$ represents the “edge weight” between nodes $i$ and $j$. For a binary connection matrix $A$, $a_{ij}=1$ if there is an edge between nodes $i$ and $j$ in $E$, and $a_{ij}=0$ otherwise. If $A$ is symmetric, the corresponding graph $G$ is defined as undirected. Otherwise, $G$ is directed, when the edge only exists in one direction between a node pair.
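
As a small illustration of the definition above, the sketch below builds a binary connection matrix from an edge list and checks whether the resulting graph is undirected by testing the symmetry of $A$. The helper names and the three-node example are hypothetical.

```python
import numpy as np

def adjacency_from_edges(num_nodes, edges, directed=True):
    """Build a binary adjacency matrix A with a_ij = 1 for each edge (i, j)."""
    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        if not directed:
            A[j, i] = 1  # mirror the edge so that A is symmetric
    return A

def is_undirected(A):
    """A graph is undirected exactly when its adjacency matrix is symmetric."""
    return bool(np.array_equal(A, A.T))

# Hypothetical three-node road graph: 0 -> 1 -> 2.
edges = [(0, 1), (1, 2)]
A_directed = adjacency_from_edges(3, edges)
A_undirected = adjacency_from_edges(3, edges, directed=False)
```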

For simplicity, we assume that the traffic state is associated with the nodes. The other case with edges can be derived similarly. In practice, the traffic state is collected or aggregated in discrete time steps, e.g. five minutes or one hour, depending on the specific scenario.

For a single time step $t$, we denote the node feature matrix as $\chi_{t}\in\mathbb{R}^{N\times d}$, where $N$ is the number of nodes and $d$ is the dimension of the node features, i.e., the number of traffic state variables. Now we are ready to give a formal definition of traffic graph.

Definition 4.1 (Traffic Graph).

A traffic graph (with node features) is defined as a specific type of graph $G=(V,E,A)$, where $V$ is the node set, $E$ is the edge set, and $A$ is the adjacency matrix. For a single time step $t$, the node feature matrix $\chi_{t}\in\mathbb{R}^{N\times d}$ for $G$ contains specific traffic states, where $N$ is the number of nodes and $d$ is the number of traffic state variables.

We first give a formal definition of the graph-based traffic forecasting problem without external factors.

Definition 4.2 (Graph-based Traffic Forecasting).

Graph-based traffic forecasting (without external factors) is defined as follows: find a function $f$ which generates $y=f(\mathbf{\chi};G)$, where $y$ is the traffic state to be predicted, $\mathbf{\chi}=\{\chi_{1},\chi_{2},...,\chi_{T}\}$ is the historical traffic state defined on graph $G$, and $T$ is the number of time steps in the historical window.

In single step forecasting, only the traffic state in the next time step is predicted, whereas in multiple step forecasting the traffic state several time steps later is the prediction target. As mentioned in Section 1, traffic states can be highly affected by external factors, e.g. weather and holidays. The forecasting problem formulation, extended to incorporate these external factors, takes the form $y=f(\mathbf{\chi},\varepsilon;G)$, where $\varepsilon$ represents the external factors. Figure 1 demonstrates the graph-based traffic forecasting problem, where different color patches represent different traffic variables.
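
To make the formulation concrete, the following sketch slices a traffic state series into (history, target) pairs: each input is a window of $T$ node feature matrices and each target is the state `horizon` steps after the window, so `horizon=1` corresponds to single step forecasting. The function name, the array layout `(num_steps, N, d)`, and the synthetic data are assumptions made for illustration; external factors are omitted.

```python
import numpy as np

def make_samples(series, T, horizon=1):
    """Split a series of shape (num_steps, N, d) into supervised samples.

    Returns X of shape (S, T, N, d) holding historical windows and
    y of shape (S, N, d) holding the state `horizon` steps after each window.
    """
    series = np.asarray(series)
    S = series.shape[0] - T - horizon + 1
    if S <= 0:
        raise ValueError("series is too short for the requested T and horizon")
    X = np.stack([series[s:s + T] for s in range(S)])
    y = np.stack([series[s + T + horizon - 1] for s in range(S)])
    return X, y

# Synthetic data: 10 time steps, N = 3 nodes, d = 2 traffic variables.
series = np.arange(10 * 3 * 2).reshape(10, 3, 2)
X, y = make_samples(series, T=4)               # target is the next step
X3, y3 = make_samples(series, T=4, horizon=3)  # target is three steps ahead
```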

Figure 1: The single-step graph-based traffic forecasting problem. Adapted from Ye et al. [2020a] with external factors added.

Various graph structures are used to model traffic forecasting problems depending on both the forecasting problem-type and the traffic datasets available. These graphs can be pre-defined static graphs, or dynamic graphs continuously learned from the data. The static graphs can be divided into two types, namely, natural graphs and similarity graphs. Natural graphs are based on a real-world transportation system, e.g. the road network or subway system; whereas similarity graphs are based solely on the similarity between different node attributes where nodes may be virtual stations or regions.

We categorize the existing traffic graphs into the same three levels used in Section 3, namely, road-level, region-level and station-level graphs.

Road-level graphs. These include sensor graphs, road segment graphs, road intersection graphs, and road lane graphs. Sensor graphs are based on traffic sensor data (e.g. the PeMS dataset) where each sensor is a node, and the edges are road connections. The other three graphs are based on road networks with the nodes formed by road segments, road intersections, and road lanes, respectively. The real-world case and example of road-level graphs are shown in Figure 2. In some cases, road-level graphs are the most suitable format, e.g., when vehicles can move only through pre-defined roads.

Figure 2: The real-world case and example of road-level graphs. (a) The road network in the Performance Measurement System (PeMS) where each sensor is a node. Source: http://pems.dot.ca.gov/. (b) The road-level graph examples. Adapted from Ye et al. [2020a].

Region-level graphs. These include irregular region graphs, regular region graphs, and OD graphs. In both irregular and regular region graphs the nodes are regions of the city. Regular region graphs, which have grid-based partitioning, are listed separately because of their natural connection to the previously widely used grid-based forecasting with CNNs, in which the grids may be seen as image pixels. Irregular region graphs include all other partitioning approaches, e.g. road-based or zip-code-based partitions [Ke et al., 2021b]. In the OD graph, the nodes are pairs of an origin region and a destination region. In these graphs, the edges are usually defined with a spatial neighborhood or other similarities, e.g., functional similarity derived from point-of-interest (PoI) data. The real-world case and example of region-level graphs are shown in Figure 3.

Figure 3: The real-world case and example of region-level graphs. (a) The zip codes of Manhattan where each zip code zone is a node. Source: https://maps-manhattan.com/manhattan-zip-code-map. (b) The region-level graph example.

Station-level graphs. These include subway station graphs, bus station graphs, bike station graphs, railway station graphs, car-sharing station graphs, parking lot graphs, and parking block graphs. Usually, there are natural links between stations that are used to define the edges, e.g. subway or railway lines, or the road network. The real-world case and example of station-level graphs are shown in Figure 4.

Figure 4: The real-world case and example of station-level graphs. (a) The Beijing subway system where each subway station is a node. Source: https://www.travelchinaguide.com/cityguides/beijing/transportation/subway.htm. (b) The station-level graph example.

A full list of the traffic graphs used in the surveyed studies is shown in Table 2. Sensor graphs and road segment graphs are most frequently used because they are compatible with the available public datasets as discussed in Section 5. It is noted that in some studies multiple graphs are used as simultaneous inputs and then fused to improve the forecasting performance [Lv et al., 2020, Zhu et al., 2019].

Table 2: Traffic graphs in the surveyed studies.
Graph | Node | Edge | Relevant Studies
Sensor Graph | Traffic Sensors | Road Links | Li et al. [2018b], Wu et al. [2019], Xu et al. [2020a], Zheng et al. [2020b], Pan et al. [2020, 2019], Lu et al. [2019a], Mallick et al. [2020], Zhang et al. [2020j], Bai et al. [2020], Huang et al. [2020a], Zhang et al. [2020e], Song et al. [2020a], Xu et al. [2020b], Wang et al. [2020g], Chen et al. [2020e], Lv et al. [2020], Kong et al. [2020], Fukuda et al. [2020], Zhang & Guo [2020], Boukerche & Wang [2020b], Tang et al. [2020b], Kang et al. [2019], Guo et al. [2019c], Li et al. [2019b], Sun et al. [2020], Wei & Sheng [2020], Li et al. [2020f], Cao et al. [2020], Yu et al. [2018, 2019b], Li et al. [2020b], Yin et al. [2020], Chen et al. [2020g], Zhang et al. [2020a], Wang et al. [2020a], Xin et al. [2020], Xie et al. [2020d], Huang et al. [2020b], Li & Zhu [2021], Tian et al. [2020], Xu et al. [2020c], Chen et al. [2020c], Xiong et al. [2020], Chen et al. [2020d], Tang et al. [2020a], Zhang et al. [2018a, 2019a], Cirstea et al. [2019], Ge et al. [2019a, b], Shleifer et al. [2019], Ge et al. [2020], Yang et al. [2020], Zhao et al. [2019], Cui et al. [2019], Chen et al. [2019], Yu et al. [2019a], Wang et al. [2020f], Cui et al. [2020b, a], Zhou et al. [2020a], Cai et al. [2020], Zhou et al. [2020b], Wu et al. [2020c], Chen et al. [2020f], Opolka et al. [2019], Mallick et al. [2021], Oreshkin et al. [2021], Jia et al. [2020], Sun et al. [2021], Guo & Yuan [2020], Zhang et al. [2020i], Feng et al. [2020], Xie et al. [2020c], Park et al. [2020], Chen et al. [2020a], Lewenfus et al. [2020], Maas & Bloem [2020], Li et al. [2020d], Song et al. [2020b], Zhao et al. [2020b], Wang et al. [2020c]
Road Segment Graph | Road Segments | Road Intersections | Zhang et al. [2018b], Guo et al. [2020a], Pan et al. [2019], Zhang et al. [2020j, l], Wang et al. [2018b], Zhang et al. [2020e], Lv et al. [2020], Zhang et al. [2019d, 2020a], Qu et al. [2020], Guo et al. [2020b], Ramadan et al. [2020], Zhao et al. [2020a], Bai et al. [2021], Shin & Yoon [2020], Liu et al. [2020a], Yu & Gu [2019], Xie et al. [2019], Guo et al. [2019a], Diao et al. [2019], Lu et al. [2019b], Zhang et al. [2019c], James [2019], Zhang et al. [2019b], Lee & Rhee [2022], Yu et al. [2020a], Lu et al. [2020b], Zhao et al. [2019], Cui et al. [2019], Zhang et al. [2019e], Lee & Rhee [2019], Cui et al. [2020b, a], Guo et al. [2020c], Xie et al. [2020b], Zhu et al. [2021, 2020], Fu et al. [2020], Zhang et al. [2020d], Agafonov [2020], Lu et al. [2020a], Jepsen et al. [2019, 2020], Zhu et al. [2022], Liao et al. [2018], Guopeng et al. [2020], Kim et al. [2020], Hasanzadeh et al. [2019], Fang et al. [2020b], Dai et al. [2020], Han et al. [2020], Hong et al. [2020], Chen et al. [2020h], Yu et al. [2020b]
Road Intersection Graph | Road Intersections | Road Segments | Zhang et al. [2018b], Wei et al. [2019], Fang et al. [2019], Zhang et al. [2020e], Xu et al. [2019], Wu et al. [2018a], Sánchez et al. [2020], James [2020], Zhang et al. [2019f], Lu et al. [2019b], Zhang et al. [2019c], Bogaerts et al. [2020], Shao et al. [2020], Qin et al. [2020a]
Road Lane Graph | Road Lanes | Road Line Adjacency | Wright et al. [2019]
Irregular Region Graph | Irregular Regions | Regional Adjacency or Virtual Edges | Zhou et al. [2020d], Sun et al. [2020], Chen et al. [2020d], Bing et al. [2020], Mohanty & Pozdnukhov [2018], Mohanty et al. [2020], Hu et al. [2018], Li & Axhausen [2020], Bai et al. [2019b, a], Ke et al. [2021a], Hu et al. [2020], Zheng et al. [2020a], Davis et al. [2020], Du et al. [2020], Li & Moura [2020], Ye et al. [2021], Zhang et al. [2020k], Liu et al. [2020c]
Regular Region Graph | Regular Regions | Regional Adjacency or Virtual Edges | Pan et al. [2020], Wang et al. [2020b], Zhang et al. [2020h], Wang et al. [2018a], Zhou et al. [2019], Wang et al. [2020e], Qiu et al. [2020], He & Shin [2020a], Yeghikyan et al. [2020], Shi et al. [2020], Wang et al. [2019], Shen et al. [2020], Pian & Wu [2020], Jin et al. [2020b, a], Geng et al. [2019b], Lee et al. [2019], Geng et al. [2019a], Li et al. [2020c], Xu & Li [2019], Davis et al. [2020], Wu et al. [2020a], Zhou et al. [2020e, f], Xu et al. [2020d]
OD Graph | OD Pair | Virtual Edges | Wang et al. [2020h], Ke et al. [2021b]
Subway Station Graph | Subway Stations | Subway Lines | Fang et al. [2019, 2020a], Ren & Xie [2019], Li et al. [2018a], Zhao et al. [2020a], Han et al. [2019], Zhang et al. [2020b, c], Li et al. [2020e], Liu et al. [2020b], Ye et al. [2020b], Ou et al. [2020]
Bus Station Graph | Bus Stations | Bus Lines | Fang et al. [2019, 2020a]
Bike Station Graph | Bike Stations | Road Links | He & Shin [2020b], Chai et al. [2018], Du et al. [2020], Chen et al. [2020b], Wang et al. [2020d], Qin et al. [2020b], Xiao et al. [2020], Yoshida et al. [2019], Guo et al. [2019b], Kim et al. [2019], Lin et al. [2018]
Railway Station Graph | Railway Stations | Railway Lines | He et al. [2020], Heglund et al. [2020]
Car-sharing Station Graph | Car-sharing Stations | Road Links | Zhu et al. [2019], Luo et al. [2020]
Parking Lot Graph | Parking Lots | Road Links | Zhang et al. [2020g, f]
Parking Block Graph | Parking Blocks | Road Links | Yang et al. [2019]

4.1.2 Adjacency Matrix Construction

Adjacency matrices are seen as the key to capturing spatial dependency in traffic forecasting [Ye et al., 2020a]. While nodes may be fixed by physical constraints, the user typically has control over the design of the adjacency matrix, which can even be dynamically trained from continuously evolving data. We extend the categories of adjacency matrices used in previous studies [Ye et al., 2020a] and divide them into four types, namely, road-based, distance-based, similarity-based, and dynamic matrices.

Road-based Matrix. This type of adjacency matrix relates to the road network and includes connection matrices, transportation connectivity matrices, and direction matrices. A connection matrix is a common way of representing the connectivity between nodes. It has a binary format, with an element value of 1 if connected and 0 otherwise. The transportation connectivity matrix is used where two regions are geographically distant but conveniently reachable by motorway, highway, or subway [Ye et al., 2020a]. It also includes cases where the connection is measured by travel time between different nodes, e.g. if a vehicle can travel between two intersections in less than 5 minutes then there is an edge between the two intersections [Wu et al., 2018a]. The less commonly used direction matrix takes the angle between road links into consideration.
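
The travel-time variant of the transportation connectivity matrix can be sketched as below: two nodes are connected when the travel time between them is under a threshold (five minutes in the example cited above). The function name, the pairwise travel-time input, and the removal of self-loops are assumptions made for this illustration.

```python
import numpy as np

def travel_time_adjacency(travel_time_min, threshold_min=5.0):
    """Binary matrix with a_ij = 1 when travel time from i to j is below the threshold.

    travel_time_min: (N, N) array of travel times in minutes (may be asymmetric).
    Self-loops are removed here, which is an illustrative choice.
    """
    tt = np.asarray(travel_time_min, dtype=float)
    A = (tt < threshold_min).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Hypothetical travel times (minutes) between three intersections.
tt = [[0, 3, 9],
      [4, 0, 6],
      [9, 2, 0]]
A = travel_time_adjacency(tt)
```

Because travel times need not be symmetric, the resulting matrix can describe a directed graph, as in the example where node 2 reaches node 1 within the threshold but not the reverse.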

Distance-based Matrix. This widely used matrix type represents the spatial closeness between nodes. It contains two sub-types, namely, neighbor and distance matrices. In neighbor matrices, the element values are determined by whether the two regions share a common boundary (if connected the value is set to 1, generally, or 1/4 for grids, and 0 otherwise). In distance matrices, the element values are a function of geometrical distance between nodes. This distance may be calculated in various ways, e.g. the driving distance between two sensors, the shortest path length along the road [Kang et al., 2019, Lee & Rhee, 2022], or the proximity between locations calculated by the random walk with restart (RWR) algorithm [Zhang et al., 2019e]. One flaw of distance-based matrices is that they fail to take into account the similarity of traffic states between long-distance nodes, and the constructed adjacency matrix is static in most cases.
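
One way to turn pairwise distances into edge weights is a thresholded Gaussian kernel, $a_{ij}=\exp(-d_{ij}^{2}/\sigma^{2})$ when this value is at least $\epsilon$ and $0$ otherwise. The text above only states that the weights are a function of distance, so the specific kernel, the parameters `sigma` and `epsilon`, and the example distances below are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, sigma=None, epsilon=0.1):
    """Weighted adjacency from an (N, N) distance matrix.

    Weights decay with distance; weights below `epsilon` are set to zero
    so that far-apart nodes are not connected.
    """
    dist = np.asarray(dist, dtype=float)
    if sigma is None:
        sigma = dist.std()  # a simple data-driven default (assumption)
    W = np.exp(-(dist ** 2) / (sigma ** 2))
    W[W < epsilon] = 0.0
    return W

# Hypothetical road distances (km) between three sensors.
dist = [[0.0, 1.0, 10.0],
        [1.0, 0.0, 2.0],
        [10.0, 2.0, 0.0]]
W = gaussian_kernel_adjacency(dist, sigma=2.0)
```

With a symmetric distance matrix the result is symmetric, and the threshold keeps the matrix sparse, which also illustrates the flaw noted above: distant nodes with similar traffic states receive no edge.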

Similarity-based Matrix. This type of matrix is divided into two sub-types, namely, traffic pattern and functional similarity matrices. Traffic pattern similarity matrices represent the correlations between traffic states, e.g. similarities of flow patterns, mutual dependencies between different locations, and traffic demand correlation in different regions. Functional similarity matrices represent, for example, the distribution of different types of PoIs in different regions.
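
A traffic pattern similarity matrix can be sketched with the Pearson correlation between the historical series of each pair of nodes, keeping only strongly correlated pairs. Pearson correlation, the `threshold` value, and the removal of self-loops are assumptions for this example; the surveyed studies use a range of similarity measures.

```python
import numpy as np

def pattern_similarity_adjacency(series, threshold=0.8):
    """Adjacency from the correlation of node time series.

    series: (num_steps, N) array with one traffic state series per node.
    Entries below `threshold` are set to zero; the diagonal is cleared.
    """
    corr = np.corrcoef(np.asarray(series, dtype=float), rowvar=False)
    A = np.where(corr >= threshold, corr, 0.0)
    np.fill_diagonal(A, 0.0)
    return A

# Hypothetical flow series: node 1 follows node 0, node 2 does not.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = 2.0 * a + 1.0
c = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
A = pattern_similarity_adjacency(np.stack([a, b, c], axis=1))
```

Unlike a distance-based matrix, this construction can connect nodes that are far apart geographically, provided their traffic patterns are similar.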

Dynamic Matrix. This type of matrix is used when no pre-defined static matrices are used. Many studies have demonstrated the advantages of using dynamic matrices, instead of a pre-defined adjacency matrix, for various traffic forecasting problems.
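
As a rough sketch of how an adjacency matrix can be derived from learnable parameters instead of being pre-defined, the example below computes $A=\mathrm{softmax}(\mathrm{ReLU}(E_{1}E_{2}^{\top}))$ from two node embedding matrices. This is one formulation found in the literature and is an assumption here, not a description of every dynamic matrix in the surveyed studies. The embeddings are random placeholders; in practice they would be trained together with the forecasting model.

```python
import numpy as np

def adaptive_adjacency(E1, E2):
    """Row-normalized adjacency from source and target node embeddings.

    E1, E2: (N, k) arrays. Returns an (N, N) matrix whose rows sum to 1.
    """
    scores = np.maximum(E1 @ E2.T, 0.0)                  # ReLU
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)          # row-wise softmax

# Placeholder embeddings for N = 4 nodes with k = 8 dimensions.
rng = np.random.default_rng(0)
E1 = rng.normal(size=(4, 8))
E2 = rng.normal(size=(4, 8))
A = adaptive_adjacency(E1, E2)
```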

A full list of the adjacency matrices applied in the surveyed studies is shown in Table 3. Dynamic matrices are listed at the bottom of the table, with no further subdivisions. The connection and distance matrices are the most frequently used types, because of their simple definition and representation of spatial dependency.

Table 3: Adjacency matrices in the surveyed studies.
Adjacency Matrix | Formula | Relevant Studies
Connection Matrix | $a_{ij}=1$ when nodes $i$ and $j$ are connected and $a_{ij}=0$ otherwise | Zhang et al. [2018b], Wei et al. [2019], Xu et al. [2020a], Guo et al. [2020a], Zhang et al. [2020l], Wang et al. [2018b], Song et al. [2020a], Zhang & Guo [2020], Xu et al. [2019], Cao et al. [2020], Yu et al. [2019b], Chen et al. [2020g], Zhang et al. [2020a], Qu et al. [2020], Wang et al. [2020b], Huang et al. [2020b], Xiong et al. [2020], Sánchez et al. [2020], Wang et al. [2020h], Zhang et al. [2020c], Li et al. [2020e], Liu et al. [2020b], Ou et al. [2020], He et al. [2020], Bai et al. [2021], Liu et al. [2020a], Zhang et al. [2019f], Yu & Gu [2019], Xie et al. [2019], Guo et al. [2019a], Lu et al. [2019b], Zhang et al. [2019c], James [2019], Zhang et al. [2019b], Zhao et al. [2019], Cui et al. [2019, 2020b, 2020a], Wu et al. [2020c], Opolka et al. [2019], Sun et al. [2021], Guo & Yuan [2020], Xie et al. [2020b], Zhu et al. [2021, 2020], Zhang et al. [2020d], Agafonov [2020], Chen et al. [2020a], Lu et al. [2020a], Bing et al. [2020], Zhu et al. [2022], Fang et al. [2020b], Shao et al. [2020], Shen et al. [2020], Qin et al. [2020a], Hong et al. [2020], Xu & Li [2019], Davis et al. [2020], Chen et al. [2020h], Wang et al. [2020d], Zhou et al. [2020e], Yu et al. [2020b], Liu et al. [2020c], Zhang et al. [2020g, f], Heglund et al. [2020], Yin et al. [2020], Zhang et al. [2020b]
Transportation Connectivity Matrix | $a_{ij}=1$ when one can travel from node $i$ to node $j$ and $a_{ij}=0$ otherwise | Pan et al. [2020, 2019], Lv et al. [2020], Wu et al. [2018a], Ye et al. [2020b], Geng et al. [2019b, a], Luo et al. [2020], Wright et al. [2019]
Direction Matrix | $a_{ij}=$ the angle between two road segments | Shin & Yoon [2020], Lee & Rhee [2022, 2019]
Neighbor Matrix | $a_{ij}=1$ when nodes $i$ and $j$ are neighbors and $a_{ij}=0$ otherwise | Wang et al. [2018a], Yeghikyan et al. [2020], Shi et al. [2020], Wang et al. [2019], Hu et al. [2018], Geng et al. [2019b], Lee et al. [2019], Ke et al. [2021a, b], Hu et al. [2020], Zheng et al. [2020a], Yoshida et al. [2019]
Distance Matrix | $a_{ij}=d_{ij}$, where $d_{ij}$ is some distance between nodes $i$ and $j$ | Li et al. [2018b], Zheng et al. [2020b], Pan et al. [2020, 2019], Lu et al. [2019a], Mallick et al. [2020], Huang et al. [2020a], Xu et al. [2020b], Wang et al. [2020g], Boukerche & Wang [2020b], Kang et al. [2019], Sun et al. [2020], Wei & Sheng [2020], Yu et al. [2018], Li et al. [2020b], Chen et al. [2020g], Wang et al. [2020a], Xin et al. [2020], Xie et al. [2020d], Li & Zhu [2021], Tian et al. [2020], Xu et al. [2020c], Chen et al. [2020c], Zhou et al. [2020d], Chen et al. [2020d], He & Shin [2020a], Ren & Xie [2019], Zhu et al. [2019], He & Shin [2020b], Chai et al. [2018], Shin & Yoon [2020], Zhang et al. [2018a], Ge et al. [2019a, b], Lee & Rhee [2022], Shleifer et al. [2019], Ge et al. [2020], Yang et al. [2020], Chen et al. [2019], Zhang et al. [2019e], Lee & Rhee [2019], Bogaerts et al. [2020], Wang et al. [2020f], Guo et al. [2020c], Zhou et al. [2020a], Cai et al. [2020], Zhou et al. [2020b], Chen et al. [2020f], Mallick et al. [2021], Jia et al. [2020], Zhang et al. [2020i], Feng et al. [2020], Xie et al. [2020c], Li et al. [2020d], Song et al. [2020b], Zhao et al. [2020b], Kim et al. [2020], Mohanty & Pozdnukhov [2018], Mohanty et al. [2020], Jin et al. [2020b], Li & Axhausen [2020], Jin et al. [2020a], Geng et al. [2019a], Ke et al. [2021a], Li et al. [2020c], Ke et al. [2021b], Luo et al. [2020], Chen et al. [2020b], Xiao et al. [2020], Guo et al. [2019b], Kim et al. [2019], Lin et al. [2018], Yang et al. [2019], Wang et al. [2020c], Xu et al. [2020d]
Traffic Pattern Similarity Matrix | $a_{ij}=$ the correlation coefficient of historical traffic states of nodes $i$ and $j$ | Lv et al. [2020], Li & Zhu [2021], Xu et al. [2020c], Zhou et al. [2020d], Sun et al. [2020], Wang et al. [2020e], He & Shin [2020a], Ren & Xie [2019], Han et al. [2019], Liu et al. [2020b], He & Shin [2020b], Chai et al. [2018], Lu et al. [2020a], Lewenfus et al. [2020], Dai et al. [2020], Han et al. [2020], Jin et al. [2020b], Li & Axhausen [2020], Jin et al. [2020a], Bai et al. [2019b, a], Li et al. [2020c], Ke et al. [2021b], Chen et al. [2020b], Wang et al. [2020d], Yoshida et al. [2019], Kim et al. [2019], Lin et al. [2018], Zhou et al. [2020f]
Functional Similarity Matrix | $a_{ij}=$ the correlation coefficient of POI distributions in regions $i$ and $j$ | Lv et al. [2020], He & Shin [2020a], Shi et al. [2020], Zhu et al. [2019], Ge et al. [2019a, b, 2020], Jin et al. [2020b], Geng et al. [2019b, a], Ke et al. [2021b], Luo et al. [2020], Zhang et al. [2020k]
Dynamic Matrix | N/A | Wu et al. [2019], Bai et al. [2020], Fang et al. [2019], Zhang et al. [2020e], Chen et al. [2020e], Kong et al. [2020], Tang et al. [2020b], Guo et al. [2019c], Li et al. [2019b], Zhang et al. [2019d], Li et al. [2020f], Guo et al. [2020b], Zhang et al. [2020h], Peng et al. [2020], Zhou et al. [2019], Shi et al. [2020], Li et al. [2018a], Tang et al. [2020a], Zhang et al. [2019a], Diao et al. [2019], Yu et al. [2020a], Fu et al. [2020], Maas & Bloem [2020], Li & Axhausen [2020], Du et al. [2020], Li & Moura [2020], Wu et al. [2020a], Ye et al. [2021]

4.2 Graph Neural Networks

Previous neural networks, e.g. fully-connected neural networks (FNNs), CNNs, and RNNs, could only be applied to Euclidean data (e.g. images, text, and videos). As a type of neural network that operates directly on a graph structure, GNNs can capture complex relationships between objects and make inferences based on data described by graphs. GNNs have been proven effective in various node-level, edge-level, and graph-level prediction tasks [Jiang, 2022]. As mentioned in Section 2, GNNs are currently considered the state-of-the-art technique for traffic forecasting problems. GNNs can be roughly divided into four types, namely, recurrent GNNs, convolutional GNNs, graph autoencoders, and spatiotemporal GNNs [Wu et al., 2020b]. Because traffic forecasting is a spatiotemporal problem, the GNNs used in this field can all be categorized as spatiotemporal GNNs. However, certain components of the other types of GNNs have also been applied in the surveyed traffic forecasting studies.

To give the mathematical formulation of GCN, we further introduce some notation. Given a graph $G=(V,E,A)$, $\mathcal{N}(v_{i})$ is defined as the neighbor node set of a single node $v_{i}$. $\mathbf{D}$ is defined as the degree matrix, of which each diagonal element is $\mathbf{D}_{ii}=\|\mathcal{N}(v_{i})\|$. $\mathbf{L}=\mathbf{D}-\mathbf{A}$ is defined as the Laplacian matrix of an undirected graph and $\tilde{\mathbf{L}}=\mathbf{I}_{N}-\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}}$ is defined as the normalized Laplacian matrix, where $\mathbf{I}_{N}$ is the identity matrix with size $N$. Without considering the time step index, the node feature matrix of a graph is simplified as $\mathbf{X}\in R^{N\times d}$, where $N$ is the node number and $d$ is the dimension of the node feature vector as before. The basic notations used in this survey are summarized in Table 4.
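The definitions above can be checked on a small example. The sketch below computes $\mathbf{D}$, $\mathbf{L}$, and $\tilde{\mathbf{L}}$ for a three-node path graph, assuming a binary adjacency matrix without self-loops so that the row sum equals the number of neighbors. The function names are chosen for this illustration only.

```python
import math

def degree_matrix(adj):
    """Diagonal matrix D with D_ii = number of neighbors of node i (row sum of A)."""
    n = len(adj)
    return [[float(sum(adj[i])) if i == j else 0.0 for j in range(n)] for i in range(n)]

def laplacian(adj):
    """L = D - A."""
    deg = degree_matrix(adj)
    n = len(adj)
    return [[deg[i][j] - adj[i][j] for j in range(n)] for i in range(n)]

def normalized_laplacian(adj):
    """L~ = I - D^{-1/2} A D^{-1/2}; isolated nodes contribute a zero scaling factor."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

# Path graph 0 - 1 - 2 (undirected, unweighted).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(A)
L_norm = normalized_laplacian(A)
```

For this graph the middle node has degree 2, every row of $\mathbf{L}$ sums to zero, and the off-diagonal entries of $\tilde{\mathbf{L}}$ between connected nodes equal $-1/\sqrt{2}$.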

Table 4: Basic notations used in this study.
Symbol | Description
$G$ | Graph
$V$ | Node set
$E$ | Edge set
$A$ | Adjacency matrix
$\chi_{t}$ or $\mathbf{X}$ | Node feature matrix with or without time step index $t$
$N$ | Node number
$d$ | Node feature dimension
$\mathcal{N}(v_{i})$ | Neighbor node set of a single node $v_{i}$
$\mathbf{D}$ | Degree matrix
$\mathbf{L}$ | Laplacian matrix
$\tilde{\mathbf{L}}$ | Normalized Laplacian matrix
$\mathbf{I}_{N}$ | Identity matrix with size $N$

When extending the convolution operation from Euclidean data to non-Euclidean data, the basic idea of GNNs is to learn a function mapping by which a node aggregates its own features and the features of its neighbors to generate a new representation. GCNs are spectral-based convolutional GNNs, in which the graph convolutions are defined by introducing filters from graph signal processing in the spectral domain, e.g., the Fourier domain. The graph Fourier transform is first used to transform the graph signal to the spectral domain, and the inverse graph Fourier transform is then used to transform the result of the convolution operation back. Several spectral-based GCNs have been introduced in the literature. The spectral convolutional neural network [Bruna et al., 2014] assumes that the filter is a set of learnable parameters and considers graph signals with multiple channels. The model of Henaff et al. [2015] introduces a parameterization with smooth coefficients and makes the spectral filters spatially localized. Chebyshev's spectral CNN (ChebNet) [Defferrard et al., 2016] leverages a truncated expansion in terms of Chebyshev polynomials up to $K$-th order to approximate the diagonal filter matrix.
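The truncated Chebyshev expansion can be sketched without any eigendecomposition by using the recurrence $T_{k}(x)=2xT_{k-1}(x)-T_{k-2}(x)$. The example below is a simplified single-channel illustration: it assumes that the largest eigenvalue of the normalized Laplacian is approximated by 2, so that the rescaled Laplacian is $\tilde{\mathbf{L}}-\mathbf{I}_{N}$, and the coefficients `theta` are fixed by hand rather than learned.

```python
import math

def matvec(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def chebyshev_filter(l_scaled, x, theta):
    """y = sum_k theta[k] * T_k(l_scaled) x, with T_k = 2 * l_scaled * T_{k-1} - T_{k-2}."""
    t_prev = list(x)                                  # T_0 x = x
    y = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return y
    t_curr = matvec(l_scaled, x)                      # T_1 x = l_scaled x
    y = [a + theta[1] * b for a, b in zip(y, t_curr)]
    for k in range(2, len(theta)):
        t_next = [2.0 * a - b for a, b in zip(matvec(l_scaled, t_curr), t_prev)]
        y = [a + theta[k] * b for a, b in zip(y, t_next)]
        t_prev, t_curr = t_curr, t_next
    return y

# Rescaled Laplacian of the path graph 0 - 1 - 2, assuming lambda_max = 2:
# l_scaled = L~ - I = -D^{-1/2} A D^{-1/2}.
s = 1.0 / math.sqrt(2.0)
l_scaled = [[0.0, -s, 0.0],
            [-s, 0.0, -s],
            [0.0, -s, 0.0]]
signal = [1.0, 0.0, 0.0]                              # unit impulse on node 0
filtered = chebyshev_filter(l_scaled, signal, theta=[1.0, 1.0, 1.0])
```

With $K=2$, the impulse placed on node 0 reaches node 2, which is two hops away. This illustrates why a $K$-th order polynomial filter is localized to the $K$-hop neighborhood.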

GCN [Kipf & Welling, 2017] is a first-order approximation of ChebNet, which approximates the filter using the Chebyshev polynomials of the diagonal matrix of eigenvalues. To avoid overfitting, $K=1$ is used in GCN. Formally, the graph convolution operation $*G$ in GCN is defined as follows:

$$\mathbf{X}_{*G}=\mathbf{W}\left(\mathbf{I}_{N}+\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}}\right)\mathbf{X} \qquad (1)$$

where $\mathbf{W}$ is a learnable weight matrix, i.e., the model parameters. In practice, the graph convolution operation is further modified to alleviate the potential gradient explosion problem, as follows:

$$\mathbf{X}_{*G}=\mathbf{W}\left(\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}\right)\mathbf{X} \qquad (2)$$

where $\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}_{N}$ and $\tilde{\mathbf{D}}_{ii}=\sum_{j}{\tilde{\mathbf{A}}_{ij}}$.
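A minimal sketch of the propagation rule in Eq. (2) is given below. It assumes that the weight matrix acts on the feature dimension, i.e., the output is computed as $\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}\mathbf{X}\mathbf{W}$ with $\mathbf{W}$ of size $d\times d'$, which is the convention used in common implementations. No activation function is applied, and the inputs are illustrative.

```python
import math

def matmul(a, b):
    """Matrix product for plain nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def renormalized_adjacency(adj):
    """D~^{-1/2} (A + I) D~^{-1/2}, where D~ holds the row sums of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d_tilde = [sum(row) for row in a_tilde]   # always >= 1 because of the self-loops
    return [[a_tilde[i][j] / math.sqrt(d_tilde[i] * d_tilde[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, x, w):
    """One linear graph convolution: normalized adjacency times X times W."""
    return matmul(matmul(renormalized_adjacency(adj), x), w)

# Path graph 0 - 1 - 2 with one feature per node and a 1 x 1 weight matrix.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0], [2.0], [3.0]]
w = [[1.0]]
out = gcn_layer(adj, x, w)
```

Each output row mixes a node's own feature with those of its neighbors, weighted by the inverse square roots of the two degrees. The self-loops added by $\tilde{\mathbf{A}}$ keep every normalized entry at or below 1.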

The alternative approach is spatial-based convolutional GNNs, in which the graph convolutions are defined by information propagation. Diffusion graph convolution (DGC) [Atwood & Towsley, 2016], message passing neural network (MPNN) [Gilmer et al., 2017], GraphSAGE [Hamilton et al., 2017], and graph attention network (GAT) [Veličković et al., 2018] all follow this approach. In DGC, the graph convolution is modeled as a diffusion process with a transition probability from one node to a neighboring node, and an equilibrium is expected to be obtained after several rounds of information transition. The general framework followed is a message passing network, which models the graph convolutions as an information-passing process from one node directly to another connected node. In GraphSAGE, sampling is used to obtain a fixed number of neighbors, which alleviates the computation problems caused by a large number of neighbors. Lastly, in GAT, the attention mechanism is used to learn the relative weights between two connected nodes, without using a predetermined adjacency matrix.

MPNN uses message passing functions to unify different spatial-based variants. MPNN operates in two stages, namely, a message passing phase and a readout phase. The message passing phase is defined as follows:

$$\mathbf{m}_{v_{i}}^{(t)}=\sum_{v_{j}\in\mathcal{N}(v_{i})}\mathcal{M}^{(t)}\left(\mathbf{X}_{i}^{(t-1)},\mathbf{X}_{j}^{(t-1)},\mathbf{e}_{ij}\right) \qquad (3)$$

where $\mathbf{m}_{v_{i}}^{(t)}$ is the message aggregated from the neighbors of node $v_{i}$, $\mathcal{M}^{(t)}(\cdot)$ is the aggregation function in the $t$-th iteration, $\mathbf{X}_{i}^{(t)}$ is the hidden state of node $v_{i}$ in the $t$-th iteration, and $\mathbf{e}_{ij}$ is the edge feature vector between node $v_{i}$ and node $v_{j}$.

The readout phase is defined as follows:

$$\mathbf{X}_{i}^{(t)}=\mathcal{U}^{(t)}\left(\mathbf{X}_{i}^{(t-1)},\mathbf{m}_{v_{i}}^{(t)}\right) \qquad (4)$$

where $\mathcal{U}^{(t)}(\cdot)$ is the readout function in the $t$-th iteration.
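One iteration of Eqs. (3) and (4) can be sketched as follows. The concrete choices of $\mathcal{M}$ (the neighbor state scaled by a scalar edge weight) and $\mathcal{U}$ (an equal-weight average of the old state and the message) are hypothetical placeholders. In practice, both are learnable functions.

```python
def mpnn_step(x, neighbors, edge_feat, message_fn, update_fn):
    """Apply Eq. (3) and then Eq. (4) once to every node."""
    new_x = []
    for i in range(len(x)):
        msgs = [message_fn(x[i], x[j], edge_feat[(i, j)]) for j in neighbors[i]]
        # Eq. (3): element-wise sum of the messages from N(v_i).
        # An isolated node gets a zero message of the same size as its state (assumption).
        m_i = [sum(vals) for vals in zip(*msgs)] if msgs else [0.0] * len(x[i])
        new_x.append(update_fn(x[i], m_i))            # Eq. (4)
    return new_x

def message_fn(x_i, x_j, e_ij):
    """Hypothetical M: the neighbor state scaled by a scalar edge weight."""
    return [e_ij * v for v in x_j]

def update_fn(x_i, m_i):
    """Hypothetical U: average of the previous state and the aggregated message."""
    return [0.5 * a + 0.5 * b for a, b in zip(x_i, m_i)]

# Path graph 0 - 1 - 2 with one hidden feature per node and scalar edge features.
x0 = [[1.0], [2.0], [4.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
edge_feat = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 0.5, (2, 1): 0.5}
x1 = mpnn_step(x0, neighbors, edge_feat, message_fn, update_fn)
```

Passing the two functions as arguments mirrors the role of MPNN as a unifying framework: different spatial-based variants correspond to different choices of `message_fn` and `update_fn`.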

In GAT [Veličković et al., 2018], the attention mechanism [Vaswani et al., 2017] is incorporated into the propagation step and the multi-head attention mechanism is further utilized with the aim of stabilizing the learning process. The specific operation is defined as follows:

$$\mathbf{X}_{i}^{(t)}=\Big\|_{k}\,\sigma\left(\sum_{j\in\mathcal{N}(v_{i})}\alpha^{k}\left(\mathbf{X}_{i}^{(t-1)},\mathbf{X}_{j}^{(t-1)}\right)\mathbf{W}^{(t-1)}\mathbf{X}_{j}^{(t-1)}\right) \qquad (5)$$

where $\|$ is the concatenation operation, $\sigma$ is the activation function, and $\alpha^{k}(\cdot)$ is the $k$-th attention mechanism.
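A simplified sketch of Eq. (5) with multiple heads is shown below. The scoring function follows the form used by Veličković et al. [2018], i.e., a LeakyReLU applied to a learned vector `a` times the concatenated transformed states, normalized by a softmax over the neighbors. Several details are assumptions of this example: each head has its own weight matrix, tanh is used as $\sigma$, every node is assumed to have at least one neighbor, and all parameter values are set by hand.

```python
import math

def matvec(w, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def gat_head(x, neighbors, w, a):
    """One attention head: softmax-normalized scores over N(v_i), then a weighted sum."""
    h = [matvec(w, x_i) for x_i in x]                         # W X_j for every node
    out = []
    for i in range(len(x)):
        scores = [leaky_relu(sum(p * q for p, q in zip(a, h[i] + h[j])))
                  for j in neighbors[i]]
        top = max(scores)                                     # for numerical stability
        exps = [math.exp(s - top) for s in scores]
        alpha = [e / sum(exps) for e in exps]                 # attention weights
        agg = [sum(al * h[j][d] for al, j in zip(alpha, neighbors[i]))
               for d in range(len(h[i]))]
        out.append([math.tanh(v) for v in agg])               # sigma (assumed tanh)
    return out

def gat_layer(x, neighbors, heads):
    """Concatenate the outputs of all heads for every node."""
    per_head = [gat_head(x, neighbors, w, a) for w, a in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(x))]

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}              # self-loops included
heads = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0, 0.0]),         # uniform attention
    ([[0.0, 1.0], [1.0, 0.0]], [0.1, 0.2, 0.3, 0.4]),
]
out = gat_layer(x, neighbors, heads)
```

Because the first head uses a zero scoring vector, its attention weights are uniform and it reduces to a plain neighborhood average. Non-zero scoring vectors let a head weight its neighbors unequally.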

A general spatiotemporal GNN structure is shown in Figure 5, in which GCN is used to capture the spatial dependency and 1D-CNN is used to capture the temporal dependency. Both the GCN and 1D-CNN components can be replaced with other structures to obtain other spatiotemporal GNNs. A multilayer perceptron (MLP) component is used to generate the desired output. For comparison, a two-layer GCN, which captures only the spatial dependency, is also shown in Figure 5.

Figure 5: A comparison between a two-layer GCN model and a typical spatiotemporal GNN structure (1D-CNN+GCN as an example). Adapted from Wang et al. [2021]. (a) A two-layer GCN model; (b) a typical spatiotemporal GNN structure.
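The structure in Figure 5 can be illustrated with a heavily simplified forward pass: a 1D convolution along the time axis, a graph convolution on each remaining time step, and a linear output layer that stands in for the MLP. The ordering of the components, the scalar node feature, the ReLU activation, and all parameter values are assumptions of this sketch rather than a description of any surveyed model.

```python
import math

def renormalized_adjacency(adj):
    """D~^{-1/2} (A + I) D~^{-1/2}, as in the GCN propagation rule."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def temporal_conv(series, kernel):
    """1D convolution along time for every node independently (no padding)."""
    k, n = len(kernel), len(series[0])
    return [[sum(kernel[s] * series[t + s][node] for s in range(k)) for node in range(n)]
            for t in range(len(series) - k + 1)]

def graph_conv(a_hat, frame, weight):
    """Graph convolution on one time step with a scalar feature, followed by ReLU."""
    n = len(frame)
    return [max(0.0, weight * sum(a_hat[i][j] * frame[j] for j in range(n)))
            for i in range(n)]

def forecast(series, adj, kernel, gcn_weight, out_weights):
    """Temporal convolution, then graph convolution, then a linear output layer."""
    a_hat = renormalized_adjacency(adj)
    hidden = [graph_conv(a_hat, frame, gcn_weight)
              for frame in temporal_conv(series, kernel)]
    return [sum(w * hidden[t][node] for t, w in enumerate(out_weights))
            for node in range(len(adj))]

# Hypothetical input: four time steps of a constant signal on the path graph 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
series = [[1.0, 1.0, 1.0] for _ in range(4)]
prediction = forecast(series, adj, kernel=[0.5, 0.5], gcn_weight=1.0,
                      out_weights=[1.0 / 3.0] * 3)
```

Replacing `temporal_conv` with a recurrent or attention-based component, or `graph_conv` with another spatial module, gives the other variants mentioned in the text without changing the overall structure.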

Spatiotemporal GNNs can be further categorized based on the approach used to capture the temporal dependency in particular. Most of the relevant studies in the literature can be split into two types, namely, RNN-based and CNN-based spatiotemporal GNNs [Wu et al., 2020b]. The RNN-based approach is used in Li et al. [2018b], Guo et al. [2020a], Pan et al. [2020, 2019], Lu et al. [2019a], Mallick et al. [2020], Zhang et al. [2020j, l], Bai et al. [2020], Huang et al. [2020a], Wang et al. [2018b, 2020g], Lv et al. [2020], Fukuda et al. [2020], Zhang & Guo [2020], Boukerche & Wang [2020b], Kang et al. [2019], Li et al. [2019b], Xu et al. [2019], Wu et al. [2018a], Wei & Sheng [2020], Li et al. [2020f], Yu et al. [2019b], Yin et al. [2020], Xin et al. [2020], Qu et al. [2020], Huang et al. [2020b], Guo et al. [2020b], Fang et al. [2020a], Li & Zhu [2021], Chen et al. [2020c], Ramadan et al. [2020], Zhou et al. [2020d], Wang et al. [2018a], Peng et al. [2020], Zhou et al. [2019], Wang et al. [2020e], Qiu et al. [2020], Shi et al. [2020], Wang et al. [2020h, 2019], Zhang et al. [2020b], Liu et al. [2020b], Ye et al. [2020b], Zhu et al. [2019], Chai et al. [2018], He et al. [2020], Bai et al. [2021], Zhang et al. [2018a, 2019f], Xie et al. [2019], Zhang et al. [2019a], Guo et al. [2019a], Cirstea et al. [2019], Lu et al. [2019b], Zhang et al. [2019b], Lu et al. [2020b], Zhao et al. [2019], Cui et al. [2019], Chen et al. [2019], Zhang et al. [2019e], Bogaerts et al. [2020], Cui et al. [2020a], Zhou et al. [2020a], Mallick et al. [2021], Sun et al. [2021], Xie et al. [2020b], Zhu et al. [2021, 2020], Fu et al. [2020], Chen et al. [2020a], Lewenfus et al. [2020], Zhu et al. [2022], Liao et al. [2018], Zhao et al. [2020b], Guopeng et al. [2020], Shao et al. [2020], Shen et al. [2020], Mohanty & Pozdnukhov [2018], Mohanty et al. [2020], Hu et al. [2018], Pian & Wu [2020], Jin et al. [2020a], Geng et al. [2019a], Bai et al. [2019a], Li et al. [2020c], Ke et al. [2021b], Hu et al. 
[2020], Xu & Li [2019], Davis et al. [2020], Chen et al. [2020h], Du et al. [2020], Wu et al. [2020a], Ye et al. [2021], Luo et al. [2020], Chen et al. [2020b], Wang et al. [2020d], Xiao et al. [2020], Guo et al. [2019b], Lin et al. [2018], Zhou et al. [2020f], Liu et al. [2020c], Zhang et al. [2020g], Yang et al. [2019], Zhang et al. [2020f], Wang et al. [2020c], Wright et al. [2019]; while the CNN-based approach is used in Wu et al. [2019], Fang et al. [2019], Zhang et al. [2020e], Xu et al. [2020b], Chen et al. [2020e], Kong et al. [2020], Tang et al. [2020b], Guo et al. [2019c], Sun et al. [2020], Yu et al. [2018], Li et al. [2020b], Wang et al. [2020a], Tian et al. [2020], Chen et al. [2020d], Zhao et al. [2020a], Zhang et al. [2020c], Ou et al. [2020], Tang et al. [2020a], Diao et al. [2019], Lee & Rhee [2022, 2019], Wang et al. [2020f], Wu et al. [2020c], Guo & Yuan [2020], Zhang et al. [2020i], Feng et al. [2020], Zhang et al. [2020d], Xie et al. [2020c], Lu et al. [2020a], Maas & Bloem [2020], Li et al. [2020d], Song et al. [2020b], Dai et al. [2020], Hong et al. [2020], Zheng et al. [2020a], Zhou et al. [2020e], Yu et al. [2020b], Xu et al. [2020d], Heglund et al. [2020].

With the recent expansion of relevant studies, we add two sub-types of spatiotemporal GNNs in this survey, namely, attention-based and FNN-based. The attention mechanism was first proposed to memorize long source sentences in neural machine translation [Vaswani et al., 2017] and was later applied to temporal forecasting problems. As a special case, the Transformer is built entirely upon attention mechanisms, which makes it possible to access any part of a sequence regardless of its distance to the target [Xie et al., 2020d, Cai et al., 2020, Jin et al., 2020b, Li & Moura, 2020]. The attention-based approaches are used in Zheng et al. [2020b], Zhang et al. [2020a], Wang et al. [2020b], Xie et al. [2020d], Cai et al. [2020], Zhou et al. [2020b], Chen et al. [2020f], Park et al. [2020], Fang et al. [2020b], Jin et al. [2020b], Bai et al. [2019b], Li & Moura [2020], Zhang et al. [2020k], while the simpler FNN-based approach is used in Zhang et al. [2018b], Wei et al. [2019], Song et al. [2020a], Cao et al. [2020], Chen et al. [2020g], Zhang et al. [2020h], Sun et al. [2020], He & Shin [2020a], Yeghikyan et al. [2020], Ren & Xie [2019], Li et al. [2018a], Han et al. [2019], He & Shin [2020b], Zhang et al. [2019c], Ge et al. [2019a, b], Yu et al. [2020a], Ge et al. [2020], Yu et al. [2019a], Guo et al. [2020c], Agafonov [2020], Geng et al. [2019b], Qin et al. [2020b], Kim et al. [2019]. Apart from using neural networks to capture temporal dependency, other techniques that have also been combined with GNNs include autoregression [Lee et al., 2019], Markov processes [Cui et al., 2020b], and Kalman filters [Xiong et al., 2020].

Among the different approaches to temporal modeling, RNNs suffer from time-consuming iterations and from the gradient vanishing or explosion problem with long sequences. CNNs demonstrate their superiority in terms of simple structure, parallel computing, and stable gradients. In traffic problems, the spatial and temporal dependencies are closely intertwined in reality. For example, it is argued that the historical observations in different locations at different times have varying impacts on the central region in the future [Guo et al., 2019c]. Some efforts have been made to jointly model the potential interaction between spatial and temporal features, and one promising direction is the incorporation of graph convolution operations into RNNs to capture spatial-temporal correlations [Yu et al., 2019b, Zhou et al., 2019, Chen et al., 2019, Liu et al., 2020b, Chen et al., 2020f, Guo et al., 2020a]. For example, in Song et al. [2020a], the localized spatio-temporal correlation information is extracted simultaneously with the adjacency matrix of a localized spatio-temporal graph: a localized spatio-temporal graph that includes both temporal and spatial attributes is constructed first, and a spatial-based GCN method is then applied.

Of the additional GNN components adopted in the surveyed studies, convolutional GNNs are the most popular, while recurrent GNN [Scarselli et al., 2008] and Graph Auto-Encoder (GAE) [Kipf & Welling, 2016] are used less frequently. We further categorize convolutional GNNs into the following five types: (1) GCN [Kipf & Welling, 2017], (2) DGC [Atwood & Towsley, 2016], (3) MPNN [Gilmer et al., 2017], (4) GraphSAGE [Hamilton et al., 2017], and (5) GAT [Veličković et al., 2018]. These relevant graph neural networks are listed chronologically in Figure 6. While different GNNs can be used for traffic forecasting, a general design pipeline is proposed in [Zhou et al., 2020c] and suggested for future studies as follows:

  1. Find graph structure. As discussed in Section IV, different traffic graphs are available.

  2. Specify graph type and scale. The graphs can be further classified into different types if needed, e.g., directed/undirected graphs, homogeneous/heterogeneous graphs, static/dynamic graphs. In most traffic forecasting studies, graphs of a single type are used throughout. As for scale, graphs in the traffic domain are not as large as social or academic networks, which can have millions of nodes and edges.

  3. Design loss function. The training setting usually follows the supervised approach, which means the GNN-based models are first trained on a labeled training set and then evaluated on a test set. The forecasting task is usually designed as a node-level regression problem. Based on these considerations, a proper loss function and evaluation metrics can be chosen, e.g., root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).

  4. Build model using computational modules. The GNNs discussed in this section are exactly those which have already been used as computational modules to build forecasting models in the surveyed studies.
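
The three metrics named in step 3 can be sketched in a few lines. This is only a minimal illustration: the function names, the sample speed values, and the choice to skip zero-valued ground truth in MAPE are assumptions made here, not conventions prescribed by the surveyed studies.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over paired observations."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, in percent.

    Assumption: entries whose ground truth is (near) zero are skipped
    to avoid division by zero, e.g., sensors reporting no traffic.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if abs(t) > eps]
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)

# Hypothetical node-level speed readings (km/h) and forecasts.
speeds_true = [60.0, 50.0, 40.0, 0.0]
speeds_pred = [57.0, 54.0, 40.0, 5.0]

print(rmse(speeds_true, speeds_pred))
print(mae(speeds_true, speeds_pred))
print(mape(speeds_true, speeds_pred))
```

Because MAPE divides by the ground truth, how zero readings are handled changes the reported number, so the masking rule should be stated when results are compared across models.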

Figure 6: The relevant graph neural networks in this survey.

A full list of the GNN components used in the surveyed studies is shown in Table 5. Currently, the most widely used GNN is the GCN. However, we also notice a growing trend in the use of GAT in traffic forecasting.

Table 5: GNNs in the surveyed studies.
| GNN | Relevant Studies |
|---|---|
| Recurrent GNN | Wang et al. [2018b, a], Lu et al. [2019b, 2020b] |
| GAE | Xu et al. [2020a, 2019], Opolka et al. [2019], Shen et al. [2020] |
| GCN | Wu et al. [2019], Zhang et al. [2018b], Guo et al. [2020a], Lu et al. [2019a], Zhang et al. [2020j, l], Bai et al. [2020], Fang et al. [2019], Zhang et al. [2020e], Song et al. [2020a], Xu et al. [2020b], Wang et al. [2020g], Lv et al. [2020], Boukerche & Wang [2020b], Tang et al. [2020b], Guo et al. [2019c], Li et al. [2019b], Zhang et al. [2019d], Sun et al. [2020], Li et al. [2020f], Cao et al. [2020], Yu et al. [2018, 2019b], Li et al. [2020b], Chen et al. [2020g], Zhang et al. [2020a], Wang et al. [2020a], Xin et al. [2020], Qu et al. [2020], Wang et al. [2020b], Huang et al. [2020b], Guo et al. [2020b], Fang et al. [2020a], Li & Zhu [2021], Xu et al. [2020c], Chen et al. [2020c], Xiong et al. [2020], Ramadan et al. [2020], Zhou et al. [2020d], Sun et al. [2020], Peng et al. [2020], Zhou et al. [2019], Wang et al. [2020e], Qiu et al. [2020], He & Shin [2020a], Yeghikyan et al. [2020], Shi et al. [2020], Wang et al. [2020h], Ren & Xie [2019], Li et al. [2018a], Zhao et al. [2020a], Han et al. [2019], Zhang et al. [2020b, c], Liu et al. [2020b], Ye et al. [2020b], Zhu et al. [2019], Chai et al. [2018], He et al. [2020], Bai et al. [2021], Tang et al. [2020a], James [2020], Zhang et al. [2018a, 2019f], Yu & Gu [2019], Guo et al. [2019a], Diao et al. [2019], Zhang et al. [2019c], James [2019], Ge et al. [2019a, b], Zhang et al. [2019b], Lee & Rhee [2022], Yu et al. [2020a], Ge et al. [2020], Zhao et al. [2019], Cui et al. [2019], Zhang et al. [2019e], Yu et al. [2019a], Lee & Rhee [2019], Bogaerts et al. [2020], Cui et al. [2020b, a], Guo et al. [2020c], Cai et al. [2020], Wu et al. [2020c], Chen et al. [2020f], Jia et al. [2020], Sun et al. [2021], Xie et al. [2020b], Zhu et al. [2021], Feng et al. [2020], Zhu et al. [2020], Fu et al. [2020], Agafonov [2020], Chen et al. [2020a], Lu et al. [2020a], Jepsen et al. [2019, 2020], Bing et al. [2020], Lewenfus et al. [2020], Zhu et al. [2022], Liao et al. [2018], Maas & Bloem [2020], Li et al. [2020d], Song et al. [2020b], Zhao et al. [2020b], Guopeng et al. [2020], Shao et al. [2020], Dai et al. [2020], Mohanty & Pozdnukhov [2018], Mohanty et al. [2020], Qin et al. [2020a], Han et al. [2020], Hong et al. [2020], Hu et al. [2018], Li & Axhausen [2020], Jin et al. [2020a], Geng et al. [2019b], Bai et al. [2019b], Geng et al. [2019a], Bai et al. [2019a], Ke et al. [2021a], Li et al. [2020c], Ke et al. [2021b], Hu et al. [2020], Zheng et al. [2020a], Davis et al. [2020], Chen et al. [2020h], Du et al. [2020], Li & Moura [2020], Ye et al. [2021], Luo et al. [2020], Chen et al. [2020b], Wang et al. [2020d], Qin et al. [2020b], Xiao et al. [2020], Yoshida et al. [2019], Guo et al. [2019b], Kim et al. [2019], Lin et al. [2018], Zhou et al. [2020e], Yu et al. [2020b], Zhang et al. [2020k], Zhou et al. [2020f], Liu et al. [2020c], Zhang et al. [2020g], Yang et al. [2019], Zhang et al. [2020f], Xu et al. [2020d], Heglund et al. [2020] |
| DGC | Li et al. [2018b], Mallick et al. [2020], Chen et al. [2020e], Fukuda et al. [2020], Ou et al. [2020], Chen et al. [2019], Wang et al. [2020f], Zhou et al. [2020a, b], Mallick et al. [2021], Xie et al. [2020c], Kim et al. [2020], Wang et al. [2020c] |
| MPNN | Wei et al. [2019], Xu et al. [2020b], Wang et al. [2019] |
| GraphSAGE | Liu et al. [2020a] |
| GAT | Zheng et al. [2020b], Pan et al. [2020, 2019], Huang et al. [2020a], Kong et al. [2020], Zhang & Guo [2020], Tang et al. [2020b], Kang et al. [2019], Wu et al. [2018a], Wei & Sheng [2020], Yin et al. [2020], Xie et al. [2020d], Zhang et al. [2020h], Tian et al. [2020], He & Shin [2020b], Tang et al. [2020a], Zhang et al. [2019a], Cirstea et al. [2019], Yang et al. [2020], Guo & Yuan [2020], Zhang et al. [2020i, d], Park et al. [2020], Song et al. [2020b], Fang et al. [2020b], Pian & Wu [2020], Jin et al. [2020b], Xu & Li [2019], Wu et al. [2020a], Wright et al. [2019] |

During the process of customizing GNNs for traffic forecasting, some classical models stand out in the literature. The most famous one is the diffusion convolutional recurrent neural network (DCRNN) [Li et al., 2018b], which uses diffusion graph convolution and an RNN to learn representations of spatial dependencies and temporal relations. DCRNN was originally proposed for traffic speed forecasting and is now widely used as a baseline. To create the traffic graph, the adjacency matrix is built from thresholded pairwise road network distances. Compared with other graph convolutional models that can only operate on undirected graphs, e.g., ChebNet, DCRNN introduces the diffusion convolution (DC) operation for directed graphs, which makes it more suitable for transportation scenarios. The DC operation is defined as follows:

\[
\mathbf{X}_{*DC}=\sum_{k=0}^{K-1}\left(\theta_{k,1}\left(D_{O}^{-1}A\right)^{k}+\theta_{k,2}\left(D_{I}^{-1}A^{T}\right)^{k}\right)\mathbf{X} \tag{6}
\]

where $\mathbf{X}\in\mathbb{R}^{N\times d}$ is the node feature matrix, $A$ is the adjacency matrix, $D_{O}$ and $D_{I}$ are the diagonal out-degree and in-degree matrices, $\theta_{k,1}$ and $\theta_{k,2}$ are model parameters, and $K$ is the number of diffusion steps. By defining and using out-degree and in-degree matrices, DCRNN models the bidirectional diffusion process to capture the influence of both upstream and downstream traffic. While DCRNN is a strong baseline, it is neither suitable nor desirable for undirected graphs. DCRNN was later extended with a stronger learning ability in the graph GRU of Zhang et al. [2018a], which proposes a unified method for constructing an RNN from an arbitrary graph convolution operator, instead of the single RNN model used in DCRNN.
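
To make Eq. (6) concrete, the following NumPy sketch evaluates the diffusion convolution on a toy directed graph. It is an illustration under stated assumptions rather than the DCRNN implementation: the function name `diffusion_conv`, the (K, 2) layout of `theta`, the zero-degree guard, and the three-node ring graph are all choices made here.

```python
import numpy as np

def diffusion_conv(X, A, theta):
    """Evaluate Eq. (6) for node features X (N, d), adjacency A (N, N),
    and scalar filter parameters theta (K, 2)."""
    out_deg = A.sum(axis=1)
    in_deg = A.sum(axis=0)
    # Assumption: nodes with zero degree keep an all-zero transition row.
    P_fwd = A / np.where(out_deg > 0, out_deg, 1.0)[:, None]    # D_O^{-1} A
    P_bwd = A.T / np.where(in_deg > 0, in_deg, 1.0)[:, None]    # D_I^{-1} A^T
    Xf = X.astype(float)   # holds (D_O^{-1} A)^k X, starting at k = 0
    Xb = X.astype(float)   # holds (D_I^{-1} A^T)^k X, starting at k = 0
    out = np.zeros_like(Xf)
    for k in range(theta.shape[0]):
        out += theta[k, 0] * Xf + theta[k, 1] * Xb
        Xf = P_fwd @ Xf
        Xb = P_bwd @ Xb
    return out

# Toy directed ring 0 -> 1 -> 2 -> 0 with one feature per node.
A_ring = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
X_ring = np.array([[1.0], [2.0], [3.0]])
theta_ring = np.array([[1.0, 0.0],
                       [0.5, 0.5]])   # K = 2 diffusion steps
print(diffusion_conv(X_ring, A_ring, theta_ring))
```

The loop multiplies the running features by the transition matrices instead of forming explicit matrix powers, which keeps the cost at K sparse-friendly products per direction.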

Spatio-temporal graph convolutional network (STGCN) [Yu et al., 2018] stacks multiple spatio-temporal convolution blocks, each of which combines two temporal convolution layers and one graph convolution layer. ChebNet is chosen as the graph convolution operator in STGCN, after a comparison with its first-order approximation. Using temporal convolution layers instead of RNNs for temporal modeling accelerates the training of STGCN. The attention-based spatio-temporal graph convolutional network (ASTGCN) [Guo et al., 2019c] further introduces two attention layers into STGCN to capture dynamic correlations in the spatial and temporal dimensions, respectively.
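
As a rough illustration of the ChebNet operator mentioned above, the sketch below applies a Chebyshev polynomial filter of the rescaled normalized Laplacian to node features. The survey does not spell out this formula, so the sketch follows the standard ChebNet formulation; the function name, the scalar (rather than matrix-valued) coefficients in `theta`, and the three-node path graph are assumptions made here.

```python
import numpy as np

def cheb_graph_conv(X, A, theta):
    """Chebyshev graph convolution: sum_k theta[k] * T_k(L_tilde) @ X.

    X: (N, d) node features; A: symmetric (N, N) adjacency of an
    undirected graph with at least one edge; theta: K scalar coefficients.
    """
    N = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg, 1.0) ** -0.5
    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(N) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(N)   # rescale spectrum into [-1, 1]
    T_prev = X.astype(float)                  # T_0(L_tilde) X
    T_curr = L_tilde @ T_prev                 # T_1(L_tilde) X
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        # Chebyshev recurrence: T_k = 2 L_tilde T_{k-1} - T_{k-2}.
        T_prev, T_curr = T_curr, 2.0 * (L_tilde @ T_curr) - T_prev
        out = out + theta[k] * T_curr
    return out

# Toy undirected path graph 0 - 1 - 2.
A_path = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
X_path = np.array([[1.0], [2.0], [3.0]])
print(cheb_graph_conv(X_path, A_path, [0.0, 0.0, 1.0]))
```

Truncating `theta` to two coefficients leaves only the first-order terms, the part of the expansion that the first-order approximation simplifies further.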

Graph WaveNet [Wu et al., 2019] constructs a self-adaptive matrix to uncover unseen graph structures automatically from the data, and uses WaveNet, which is based on causal convolutions, to learn temporal relations. However, the self-adaptive matrix in Graph WaveNet is fixed after training and cannot be adjusted dynamically to the characteristics of the data.
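
The self-adaptive matrix can be sketched as follows. The survey text does not give its formula; the form used here, a row-wise softmax over the ReLU of the product of two node-embedding matrices, is the one reported in the Graph WaveNet paper. The random embeddings `E1` and `E2` merely stand in for parameters that would be learned during training, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def adaptive_adjacency(E1, E2):
    """Row-normalized adjacency softmax(relu(E1 @ E2.T)) from two
    (N, c) node-embedding matrices."""
    scores = np.maximum(E1 @ E2.T, 0.0)                    # ReLU
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)    # row-wise softmax

rng = np.random.default_rng(0)
num_nodes, emb_dim = 5, 3   # hypothetical sizes
E1 = rng.normal(size=(num_nodes, emb_dim))
E2 = rng.normal(size=(num_nodes, emb_dim))
A_adp = adaptive_adjacency(E1, E2)
print(A_adp.shape)
```

Once `E1` and `E2` stop being updated, `A_adp` is a constant, which is the limitation noted above: the same matrix is applied to every input regardless of the current traffic state.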

5 Open Data and Source Codes

In this section, we summarize the open data and source code used in the surveyed papers. These open data are suitable for GNN-related studies with graph structures discussed in Section IV, which can be used to formulate different forecasting problems in Section III. We also list the GNN-related code resources for those who want to replicate the previous GNN-based solutions as baselines in the follow-up studies.

5.1 Open Data

We categorize the data used in the surveyed studies into three major types, namely, graph-related data, historical traffic data, and external data. Graph-related data refer to data which exhibit a graph structure in the traffic domain, i.e., transportation network data. Historical traffic data refer to data which record historical traffic states, usually at different locations and time points. We further categorize the historical traffic data into sub-types as follows. External data refer to the factors that affect the traffic states, i.e., weather data and calendar data. Some of these data can be used in graph-based modeling directly, while others may require some pre-processing steps before being incorporated into GNN-based models.

Transportation Network Data. These data represent the underlying transportation infrastructure, e.g., road, subway, and bus networks. They can be obtained from government transportation departments or extracted from online map services, e.g., OpenStreetMap. Based on their topology structure, these data can be used to build the graphs directly, e.g., the road segments or the stations are nodes and the road intersections or subway links are the edges. While this modeling approach is straightforward, the disadvantage is that only static graphs can be built from transportation network data.
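
Building a static graph from network topology amounts to mapping nodes to indices and filling an adjacency matrix from the listed links. A minimal sketch, assuming a hypothetical four-station subway line; the station names and the helper `build_adjacency` are invented for illustration:

```python
def build_adjacency(nodes, links, directed=False):
    """Return a binary adjacency matrix (list of lists) for the given
    node names and (source, target) links."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    adj = [[0] * n for _ in range(n)]
    for u, v in links:
        adj[index[u]][index[v]] = 1
        if not directed:
            # Undirected links are stored symmetrically.
            adj[index[v]][index[u]] = 1
    return adj

# Hypothetical subway line: stations are nodes, track links are edges.
stations = ["A", "B", "C", "D"]
subway_links = [("A", "B"), ("B", "C"), ("C", "D")]
adj = build_adjacency(stations, subway_links)
for row in adj:
    print(row)
```

The resulting matrix never changes over time, which illustrates why topology alone yields only static graphs.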

Traffic Sensor Data. Traffic sensors, e.g., loop detectors, are installed on roads to collect traffic information, e.g., traffic volume or speed. This type of data is widely used for traffic prediction, especially for road traffic flow and speed prediction problems. For graph-based modeling, each sensor can be used as a node, with road connections as the edges. One advantage of using traffic sensor data for graph-based modeling is that the captured traffic information can be used directly as node attributes, with little pre-processing overhead. One caveat is that sensors are prone to hardware faults, which cause missing or noisy data and require corresponding pre-processing techniques, e.g., data imputation and denoising methods. Another disadvantage is that traffic sensors can only be installed in a limited number of locations for a series of reasons, e.g., installation cost. With this constraint, only the part of the road network covered by traffic sensors can be incorporated into a graph, while the uncovered areas are neglected.
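
One simple instance of the data imputation mentioned above is linear interpolation over gaps in a single sensor's time series. The sketch below is only one possible choice, not the method of any particular surveyed study; representing missing readings as `None` and copying the nearest valid reading at the series boundaries are assumptions made here.

```python
def impute_linear(series):
    """Fill None gaps by linear interpolation between the nearest valid
    readings; gaps at the start or end copy the nearest valid reading."""
    values = list(series)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    for i in range(len(values)):
        if values[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            values[i] = values[right]
        elif right is None:
            values[i] = values[left]
        else:
            w = (i - left) / (right - left)
            values[i] = values[left] * (1 - w) + values[right] * w
    return values

# Hypothetical speed readings (km/h) with detector dropouts.
raw_speeds = [None, 60.0, None, None, 45.0, None]
print(impute_linear(raw_speeds))
```

Interpolating along time ignores what neighboring sensors observed; graph-aware imputation would use the edges as well, at the cost of a more involved model.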

GPS Trajectory Data. Different types of vehicles (e.g., taxis, buses, online ride-hailing vehicles, and shared bikes) can be equipped with GPS receivers, which record GPS coordinates at intervals of 2 to 60 seconds. The trajectory data calculated from these GPS coordinate samples can be matched to road networks and further used to derive traffic flow or speed. Compared with traffic sensor data, GPS trajectory data offer two advantages for graph-based modeling: the low cost of collecting GPS data with smartphones, and wider coverage owing to the massive number of vehicles. However, GPS trajectory data contain no direct traffic information; quantities such as flow or speed must be derived under corresponding definitions. Data quality problems also remain with GPS trajectory data, and more pre-processing steps are required, e.g., map matching.
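
To illustrate how speed can be derived from raw GPS samples, the sketch below computes the great-circle (haversine) distance between consecutive points and divides by the elapsed time. It deliberately omits map matching, so it measures straight-line speed between samples rather than speed along a road segment; the sample coordinates and function names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def segment_speeds_kmh(points):
    """Speeds in km/h between consecutive (timestamp_s, lat, lon) samples,
    assumed sorted by time; samples with non-positive time gaps are skipped."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order fix
        speeds.append(haversine_m(la0, lo0, la1, lo1) / dt * 3.6)
    return speeds

# Hypothetical trajectory sampled every 10 s, with one duplicated fix.
trajectory = [(0, 0.0, 0.0), (10, 0.001, 0.0), (10, 0.001, 0.0), (20, 0.002, 0.0)]
print(segment_speeds_kmh(trajectory))
```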

Location-based Service Data. GPS receivers are also embedded in smartphones, which can be used to collect various types of location-related data, e.g., check-in data, point-of-interest data, and route navigation application data. The pros and cons of using location-based service data are similar to those of GPS trajectory data. The difference is that location-based service data are often collected through crowd-sourcing, with more data providers but potentially lower data quality.

Trip Record Data. These include departure and arrival dates/times, departure and arrival locations, and other trip information. Traffic speed and demand can be derived from trip record data from various sources, e.g., taxis, ride-hailing services, buses, bikes, or even the dockless e-scooters used in He & Shin [2020a]. These data can be collected in public transportation systems with mature methods, for example, by AFC (Automatic Fare Collection) in subway and bus systems. Trip record data have the advantage of supporting multiple graph-based problems, e.g., station-level traffic flow and demand problems. They are also easier to collect in existing public transportation systems.
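
The sketch below shows one way trip records can be aggregated into origin-destination demand and station-level flows per time slot. The record layout `(depart_time_s, origin, destination)`, the hourly slot length, and the station names are assumptions; inflow is attributed to the departure slot here for simplicity, whereas a real pipeline would more likely use arrival times.

```python
from collections import Counter

def od_demand(trips, interval_s=3600):
    """Count trips per (time_slot, origin, destination).

    trips: iterable of (depart_time_s, origin, destination) records.
    """
    demand = Counter()
    for depart, origin, dest in trips:
        demand[(depart // interval_s, origin, dest)] += 1
    return demand

def station_flows(demand):
    """Aggregate OD demand into per-slot outflow and inflow for each station."""
    outflow, inflow = Counter(), Counter()
    for (slot, origin, dest), n in demand.items():
        outflow[(slot, origin)] += n
        inflow[(slot, dest)] += n   # simplification: keyed by departure slot
    return outflow, inflow

# Hypothetical fare-collection records: (departure time in s, origin, destination).
trips = [(100, "S1", "S2"), (200, "S1", "S2"), (300, "S2", "S3"), (3700, "S1", "S3")]
demand = od_demand(trips)
outflow, inflow = station_flows(demand)
print(demand[(0, "S1", "S2")], outflow[(0, "S1")], inflow[(1, "S3")])
```

The same records thus feed both an edge-level target (OD demand) and node-level targets (station inflow and outflow), which is the sense in which trip records support multiple graph-based problems.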

Traffic Report Data. This type of data is often used for abnormal cases, e.g., anomaly report data used in Liu et al. [2020c] and traffic accident report data used in Zhou et al. [2020e], Zhang et al. [2020k], Zhou et al. [2020f]. Traffic report data are less used in graph-based modeling because of their sparsity in both spatial and temporal dimensions, compared with trip record data.

Multimedia Data. This type of data can be used as an additional input to deep learning models or for verifying the traffic status indicated by other data sources. Multimedia data used in the surveyed studies include the Baidu street-view images used in Qin et al. [2020a] for traffic congestion, as well as satellite imagery data [Zhang et al., 2020k] and video surveillance data [Shao et al., 2020]. Multimedia data are also less common in graph-based modeling because of their higher requirements for data collection, transmission, and storage, compared with traffic sensor data with similar functionality. It is also more difficult to extract precise traffic information, e.g., vehicle counts, from images or videos through image processing and object detection techniques.

Simulated Traffic Data. In addition to observed real-world datasets, microscopic traffic simulators are also used to build virtual training and testing datasets for deep learning models. Examples in the surveyed studies include the MATES Simulator used in Fukuda et al. [2020] and the INTEGRATION software used in Ramadan et al. [2020]. With many real-world datasets available, simulated traffic data are rarely used in GNN-based, or more broadly ML-based, traffic forecasting studies. Traffic simulations do have the potential to model unseen graphs, however, e.g., to evaluate a planned road topology.

Weather Data. Traffic states are strongly affected by meteorological factors, including temperature, humidity, precipitation, barometric pressure, and wind strength.

Calendar Data. This includes the information on weekends and holidays. Because traffic patterns vary significantly between weekdays and weekends/holidays, some studies consider these two cases separately. Both weather and calendar data have been proven useful for traffic forecasting in the literature and should not be neglected in graph-based modeling as external factors.
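
Calendar information is typically turned into a small feature vector that is attached to every node at a given time step. A minimal sketch, assuming a one-hot day-of-week encoding plus weekend and holiday flags; the feature layout and the single-entry holiday set are illustrative assumptions:

```python
from datetime import date

def calendar_features(day, holidays=frozenset()):
    """Encode a date as [7 one-hot day-of-week values, is_weekend, is_holiday]."""
    dow = [0.0] * 7
    dow[day.weekday()] = 1.0                       # Monday = 0 ... Sunday = 6
    is_weekend = 1.0 if day.weekday() >= 5 else 0.0
    is_holiday = 1.0 if day in holidays else 0.0
    return dow + [is_weekend, is_holiday]

# Hypothetical holiday list; a real study would load the local calendar.
holidays = {date(2021, 1, 1)}
print(calendar_features(date(2021, 1, 1), holidays))
print(calendar_features(date(2021, 1, 2), holidays))
```

Because these features are shared by all nodes, they are usually concatenated with the node attributes or fed through a separate branch of the model rather than encoded in the graph structure.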

While present road network and weather data can be easily found on the Internet, it is much more difficult to source historical traffic data, both due to data privacy concerns and the transmission and storage requirements of large data volumes. In Table 6 we present a list of the open data resources used in the surveyed studies. Most of these open data are already cleaned or preprocessed and can be readily used for benchmarking and comparing the performance of different models in future work.

Table 6: Open data for traffic prediction problems.
Dataset Name  Relevant Studies
METR-LA  Li et al. [2018b], Wu et al. [2019], Xu et al. [2020a], Pan et al. [