A Note on Improvement of Multi Object Tracking by Frame Interpolation for Intersection Traffic
Koji Saito
Graduate School of Engineering, Hokkaido University
N-13, W-8, Kita-ku, Sapporo, Hokkaido, 060-8628, Japan
E-mail: sk-1997_hd@eis.hokudai.ac.jp
Sho Takahashi and Toru Hagiwara
Faculty of Engineering, Hokkaido University
N-13, W-8, Kita-ku, Sapporo, Hokkaido, 060-8628, Japan
E-mail: {stakahashi, hagiwara}@eng.hokudai.ac.jp
Abstract
For video-based vehicle counting, methods that combine deep-learning-based object detection with DeepSORT for multi-object tracking (MOT) have been proposed. However, DeepSORT's tracking accuracy has been observed to degrade for vehicles turning at traffic intersections. According to recent work, tracking accuracy is expected to improve when the frame rate of the input video is increased. Therefore, this paper evaluates the accuracy of a vehicle counting system for intersection traffic while varying the frame rate of the input video. To examine efficient frame interpolation, this paper also evaluates MOT accuracy on frame-interpolated video. Specifically, the vehicle counting system is applied to low-frame-rate video and to video produced by simple frame interpolation based on averaged images.
Index Terms: vehicle counting, object tracking, DeepSORT, frame interpolation
I. INTRODUCTION
Automation of traffic volume surveys through AI analysis of closed-circuit television (CCTV) video is being promoted to reduce costs and improve operational efficiency. For AI-based traffic surveys, [1] proposed a method based on object detection and multi-object tracking (MOT) that uses DeepSORT [2] as the MOT component. DeepSORT extends the SORT algorithm: it improves tracking under occlusion by using objects' appearance features and performing nearest-neighbor matching, and it also reduces identity (ID) switches. DeepSORT is therefore also used for vehicle tracking on roads where congestion frequently causes occlusion. However, vehicles turning right or left at intersections are prone to ID switches during tracking. Because a turning vehicle's appearance changes from frame to frame, the tracker may fail to recognize it as the same target, resulting in an ID switch. In traffic volume surveys at intersections, robust tracking of these turning vehicles is essential for accurately counting passing vehicles. Recent work [3] showed that the accuracy of tracking algorithms improves significantly when higher frame rates (e.g., 240 fps) are available. Tracking accuracy is therefore expected to improve when the frame rate of the input video is increased. However, high-frame-rate cameras cannot be used in some CCTV installations because of data transmission limitations.
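To make the turning-vehicle failure mode concrete, the following is a minimal sketch of appearance-only association in the spirit of DeepSORT: each track keeps a gallery of appearance embeddings, and a detection is matched to the track whose nearest stored embedding has the smallest cosine distance, provided that distance passes a gate. This is a simplification under stated assumptions: the actual DeepSORT also gates with a Kalman-filter Mahalanobis distance and solves the assignment with the Hungarian algorithm inside a matching cascade, whereas this sketch matches greedily. The names `associate` and `embedding_at`, the 0.2 threshold, and the toy 2-D embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def associate(
    galleries: Dict[int, List[Vector]],
    detections: List[Vector],
    max_distance: float = 0.2,  # hypothetical gating threshold
) -> Dict[int, int]:
    """Greedily match detections to tracks by nearest-neighbor cosine distance.

    Returns {detection_index: track_id}. Detections whose best distance exceeds
    max_distance stay unmatched; a tracker would start a new ID for them.
    (Greedy matching stands in for DeepSORT's Hungarian assignment.)
    """
    candidates = []
    for d_idx, det in enumerate(detections):
        for track_id, gallery in galleries.items():
            # Nearest-neighbor distance: closest stored embedding of this track.
            dist = min(cosine_distance(det, g) for g in gallery)
            if dist <= max_distance:
                candidates.append((dist, d_idx, track_id))
    matches: Dict[int, int] = {}
    used_tracks = set()
    # Accept the closest pairs first, each detection and track at most once.
    for dist, d_idx, track_id in sorted(candidates):
        if d_idx not in matches and track_id not in used_tracks:
            matches[d_idx] = track_id
            used_tracks.add(track_id)
    return matches


def embedding_at(angle_deg: float) -> List[float]:
    """Toy 2-D 'appearance' that rotates as the vehicle's heading changes."""
    rad = math.radians(angle_deg)
    return [math.cos(rad), math.sin(rad)]
```

With a gallery holding only the 0° embedding, a detection seen at 20° is still matched (distance about 0.06), whereas one at 45° exceeds the gate (distance about 0.29) and would receive a new ID. Since each matched detection adds its embedding to the track's gallery, what matters in practice is the appearance change between consecutive frames, which shrinks as the frame rate rises.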
Fig. 1. The outline of the vehicle counting system.
Fig. 2. Counter lines on the intersection CCTV video.
In this paper, the accuracy of a vehicle counting system for intersection traffic is evaluated while the frame rate of the input video is varied. To examine efficient frame interpolation, this paper also evaluates MOT accuracy on frame-interpolated video. Specifically, the system performs frame interpolation on the input video as a pre-processing step before object detection and MOT. These results are expected to help realize a more accurate system for surveying traffic volumes at intersections based on object detection and DeepSORT.
II. VEHICLE COUNTING SYSTEM
The vehicle counting system obtains vehicle trajectories at an intersection and counts vehicles according to their movements. An outline of the system is shown in Fig. 1. First, the input video obtained from CCTV is pre-processed by frame interpolation. The pre-processed video is then passed to the vehicle tracking stage, which consists of an object detection model and an MOT model that together extract vehicle trajectories. For object detection, YOLOv4 [4] trained on the COCO dataset [5] is used. DeepSORT is then applied to the YOLOv4 detections to obtain vehicle trajectories. Finally, a movement is assigned to each trajectory, and the counting results are obtained. Specifically, movements are assigned according to which counter lines each trajectory crosses. The counter lines are the yellow lines A, B, C, and D shown in Fig. 2. For example, in Fig. 2, a vehicle that crosses counter line A and then counter line B is assigned "left turn from A to B". The counts of vehicles passing through the intersection are obtained from these movement assignments.

Fig. 3. Missing number versus frame rate.
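The movement assignment described above can be sketched as a segment-intersection test between each step of a trajectory and each counter line. This is a minimal sketch under assumptions not stated in the paper: trajectories are lists of image points (for example, bounding-box bottom centers), a movement is the pair (first line crossed, last line crossed), and mapping a pair such as ("A", "B") to a label like "left turn from A to B" is left to site-specific configuration. The function names and coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orientation(p: Point, q: Point, r: Point) -> float:
    """Cross product (q - p) x (r - p); its sign tells which side of pq r is on."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a: Segment, b: Segment) -> bool:
    """True if segments a and b properly intersect (touching endpoints ignored)."""
    (p1, p2), (q1, q2) = a, b
    d1 = _orientation(q1, q2, p1)
    d2 = _orientation(q1, q2, p2)
    d3 = _orientation(p1, p2, q1)
    d4 = _orientation(p1, p2, q2)
    # Endpoints of each segment must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def assign_movement(
    trajectory: List[Point], counter_lines: Dict[str, Segment]
) -> Optional[Tuple[str, str]]:
    """Return (entry_line, exit_line) from the order of counter-line crossings."""
    crossed: List[str] = []
    for step in zip(trajectory, trajectory[1:]):
        for name, line in counter_lines.items():
            # Skip repeated consecutive crossings of the same line (jitter).
            if segments_cross(step, line) and (not crossed or crossed[-1] != name):
                crossed.append(name)
    if len(crossed) < 2:
        return None  # not counted, e.g. the track ended before an exit line
    return crossed[0], crossed[-1]


def count_movements(
    trajectories: Iterable[List[Point]], counter_lines: Dict[str, Segment]
) -> Counter:
    """Count trajectories per (entry_line, exit_line) movement."""
    movements = (assign_movement(t, counter_lines) for t in trajectories)
    return Counter(m for m in movements if m is not None)
```

Under this kind of scheme, an ID switch midway through a turn splits one vehicle into two partial trajectories that each cross only one counter line, so neither is counted; this is one way a tracking failure can surface as a missed vehicle.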
III. VERIFICATION OF LOW FRAME-RATE VIDEO
To confirm the relationship between the frame rate of the input video and tracking accuracy, an experiment was conducted on low-frame-rate video. In this experiment, video from a CCTV camera installed at the intersection shown in Fig. 2 was used (30 fps, pixels, 1 hour). The count targets were turning vehicles traveling from A to B and from A to D in Fig. 2. The video contained 88 such turning vehicles, which served as the actual number (ground truth). The video was converted to a lower frame rate in the frame pre-processing phase of the vehicle counting system: frames were extracted from the 30 fps original video to generate low-frame-rate videos of , and 15 fps. The vehicle counting system was applied to the original video and to the generated low-frame-rate videos. The missing number is the number of vehicles that the system fails to track and therefore does not count. The experiment examines the relationship between the frame rate of the input video and the missing number.
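The paper does not specify which frames were extracted. One common choice, assumed here, is to keep for each output frame the latest source frame at or before its timestamp, i.e. source frame floor(k * src_fps / tgt_fps) for output frame k. The function name `subsample_indices` is hypothetical.

```python
import math
from fractions import Fraction
from typing import List


def subsample_indices(n_frames: int, src_fps: int, tgt_fps: int) -> List[int]:
    """Indices of source frames to keep when lowering src_fps to tgt_fps.

    Output frame k (timestamp k / tgt_fps) takes source frame
    floor(k * src_fps / tgt_fps), the latest source frame at or before it.
    """
    if not 0 < tgt_fps <= src_fps:
        raise ValueError("target frame rate must be in (0, src_fps]")
    step = Fraction(src_fps, tgt_fps)  # exact ratio avoids floating-point drift
    indices: List[int] = []
    k = 0
    while True:
        idx = math.floor(k * step)
        if idx >= n_frames:
            break
        indices.append(idx)
        k += 1
    return indices
```

For integer ratios this reduces to keeping every second frame (30 to 15 fps) or every third frame (30 to 10 fps); for a 1-hour, 30 fps video (108,000 frames), halving the rate keeps 54,000 frames.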
Fig. 3 shows the experimental results: the missing number tends to decrease as the frame rate increases. As the frame rate increased from 10 fps to 30 fps, more vehicles were tracked correctly, and ID switches no longer occurred for some of them. Therefore, tracking accuracy is expected to improve as the frame rate of the input video increases.
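A back-of-the-envelope way to see why a higher frame rate helps: the heading of a turning vehicle, and hence its appearance to a fixed camera, changes between consecutive frames by an amount inversely proportional to the frame rate. The sketch below assumes constant-speed motion along a circular arc (yaw rate = speed / radius); the function name and the example speed and radius are hypothetical, not measurements from the Fig. 2 intersection.

```python
import math


def heading_change_per_frame_deg(speed_kmh: float, radius_m: float, fps: float) -> float:
    """Heading change (degrees) between consecutive frames for a vehicle
    turning at constant speed along a circular arc of the given radius."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    yaw_rate = speed_ms / radius_m      # rad/s, from v = omega * r
    return math.degrees(yaw_rate / fps)
```

For an assumed 15 km/h turn on a 15 m radius, the heading changes by about 1.59 degrees per frame at 10 fps but only about 0.53 degrees per frame at 30 fps, a threefold smaller appearance change for the tracker to bridge.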
IV. FRAME INTERPOLATION BASED ON AVERAGE IMAGES
In frame pre-processing of the proposed system, frame interpolation is used to increase the frame rate of the input video. A simple method based on averaged images is used for frame interpolation: the average of the previous frame and the next frame is inserted between them as an intermediate frame. An experiment was conducted to confirm the effectiveness of the system with this interpolation method. The experimental video and settings are the same as in the experiment in Section III. A 60 fps video was generated from the 30 fps original video. In addition, to verify the frame interpolation method itself, the original video was converted to 15 fps and then restored to 30 fps by frame interpolation. The vehicle counting system was applied to the original video and to the two frame-interpolated videos.

TABLE I
VEHICLE COUNTING RESULTS

                     Original   Frame-interpolated   Frame-interpolated
                                (15 to 30 fps)       (30 to 60 fps)
Frame rate           30 fps     30 fps               60 fps
Counting result      94         88                   80
Over-detection       16         10                   3
Missing number       10         10                   11
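The averaged-image interpolation described above can be sketched with NumPy, assuming frames are decoded as uint8 arrays of identical shape (e.g. H x W x 3). The paper does not state its rounding or how the final frame is handled; this sketch floors the mean and ends on the last original frame, so n frames become 2n - 1. The function names are hypothetical.

```python
from typing import List

import numpy as np


def average_frame(prev: np.ndarray, nxt: np.ndarray) -> np.ndarray:
    """Pixel-wise mean of two uint8 frames, summed in uint16 so that
    values above 255 do not wrap around before halving."""
    total = prev.astype(np.uint16) + nxt.astype(np.uint16)
    return (total // 2).astype(np.uint8)


def interpolate_by_averaging(frames: List[np.ndarray]) -> List[np.ndarray]:
    """Insert the average of each consecutive pair between the two frames.

    n input frames -> 2n - 1 output frames, roughly doubling the frame rate
    (e.g. 30 fps -> 60 fps, or a 15 fps video back to 30 fps).
    """
    out: List[np.ndarray] = []
    for prev, nxt in zip(frames, frames[1:]):
        out.append(prev)
        out.append(average_frame(prev, nxt))
    if frames:
        out.append(frames[-1])
    return out
```

Because each inserted frame is a blend rather than a motion-compensated estimate, a moving object appears in it as two half-intensity copies at its previous and next positions: for a one-pixel object at x = 2 in one frame and x = 6 in the next, the blended frame holds 127 at both positions and 0 at x = 4. This ghosting is offered as one plausible illustration of why averaged frames may misrepresent turning vehicles, not as an analysis from the paper.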
Table I shows the experimental results. In Table I, over-detection is the number of excess counts caused by duplicate detections and false positives, and the missing number is the number of vehicles that the system fails to track and therefore does not count. Table I shows that over-detection is reduced in the frame-interpolated videos. For the video restored from 15 fps to 30 fps by frame interpolation, the missing number did not change; as a result, its counting result is closer to the actual number. On the other hand, the missing number increased by one vehicle at 60 fps. The simple frame interpolation method based on averaged images does not correctly reconstruct the intermediate frames of turning vehicles; these intermediate frames acted as noise, causing ID switches and tracking failures. It is therefore considered necessary not only to increase the frame rate but also to optimize the interpolation method according to vehicle behavior.
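Under the definitions above (over-detections add excess counts, and missed vehicles are absent from the count), each column of Table I satisfies a simple balance with the actual number of 88. Writing it out makes explicit that the 15-to-30 fps result matches the actual number because its 10 over-detections and 10 misses cancel, not because every vehicle was tracked correctly.

```latex
\begin{aligned}
N_{\mathrm{count}} &= N_{\mathrm{actual}} + N_{\mathrm{over}} - N_{\mathrm{missing}}, \qquad N_{\mathrm{actual}} = 88,\\
\text{original, 30 fps:}\quad 94 &= 88 + 16 - 10,\\
\text{interpolated, 15}\to\text{30 fps:}\quad 88 &= 88 + 10 - 10,\\
\text{interpolated, 30}\to\text{60 fps:}\quad 80 &= 88 + 3 - 11.
\end{aligned}
```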
V. CONCLUSIONS
In this paper, the performance of the vehicle counting system was evaluated under varying input frame rates. The results confirmed that vehicle tracking performance improves as the frame rate increases. In addition, an experiment was conducted on videos produced by simple frame interpolation based on averaged images, which reduced over-detection but did not reduce the missing number. In future work, optimal frame interpolation methods for turning vehicles will be examined. The system will also be evaluated at different intersections and under different weather conditions to confirm its robustness under more complex conditions.
ACKNOWLEDGMENT
Experimental data were provided by the Hokkaido Regional Development Bureau, Ministry of Land, Infrastructure, Transport and Tourism. This work was partly supported by JSPS KAKENHI Grant Number JP22H01607.
REFERENCES
[1] H. Liang, H. Song, H. Li, and Z. Dai, "Vehicle counting system using deep learning and multi-object tracking methods," Transportation Research Record, vol. 2674, no. 4, pp. 114-128, 2020.
[2] N. Wojke, A. Bewley, and D. Paulus, "Simple online and realtime tracking with a deep association metric," in Proc. 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645-3649, 2017.
[3] H. Kiani Galoogahi, A. Fagg, C. Huang, et al., "Need for speed: A benchmark for higher frame rate object tracking," in Proc. IEEE International Conference on Computer Vision (ICCV), pp. 1125-1134, 2017.
[4] A. Bochkovskiy et al., "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[5] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. European Conference on Computer Vision (ECCV), pp. 740-755, 2014.