This article was submitted to Optics and Photonics, a section of the journal Frontiers in Physics
Received: 19 October 2020
Accepted: 12 November 2020
Published: 09 December 2020
Citation:
Lin Y, Zhong Q and Sun H (2020) A Pointer Type Instrument Intelligent Reading System Design Based on Convolutional Neural Networks.
A Pointer Type Instrument Intelligent Reading System Design Based on Convolutional Neural Networks
Yue Lin, Qinghua Zhong * and Hailing Sun
Guangdong Provincial Key Laboratory of Optical Information Materials and Technology and Institute of Electronic Paper Displays, South China Academy of Advanced Optoelectronics, South China Normal University, Guangzhou, China
Abstract
The pointer instrument is simple, reliable, stable, easy to maintain, and strongly resistant to interference, and it has long held a leading position among electrical instruments. Although its structure is simple, it is not convenient for real-time reading of measurements. In this paper, an RK3399 microcomputer was used with a camera for real-time intelligent reading of a pointer instrument. First, a histogram normalization transform algorithm was used to optimize the brightness and enhance the contrast of the images; then, the feature recognition algorithm You Only Look Once 3rd (YOLOv3) was used to detect and capture the panel area in the images; and convolutional neural networks were used to read and predict values from the characteristic images. Finally, the predicted results were uploaded to a server. The system realized automatic identification, numerical reading, and intelligent online reading of pointer data, and it has high feasibility and practical value. The experimental results show that the recognition rate of the system was 98.71% and the reading accuracy was 97.42%. Moreover, the system can accurately locate the pointer-type instrument area and read the corresponding values under simple operating conditions. This meets the demand for real-time reading of analog instruments.
Keywords: pointer instrument, deep learning, You Only Look Once 3rd, convolutional neural networks, automatic reading
INTRODUCTION
The pointer instrument has been widely used in traditional industries, such as industrial production and power transmission, because it is simple, reliable, stable, little affected by temperature, strongly resistant to interference, and easy to maintain. However, it provides no digital interface, so most pointer instruments must be read by humans. Manual reading suffers from low accuracy, poor reliability, and low efficiency. The pointer position of the instrument should therefore be converted into digital signals by sensors to realize automatic meter reading, which is of great significance for unattended substations [1].
To solve this problem, many computer-based algorithms for automatic recognition and numerical reading have emerged in recent years. The existing pointer instrument recognition algorithms can be divided into two kinds: traditional algorithms based on digital image processing technology and modern algorithms based on machine learning or deep learning [2]. Traditional algorithms build on morphological dilation and erosion, and noise-reduction filtering and feature matching of the image matrices must be performed before target recognition. Common traditional algorithms include a symmetry-based binarization threshold segmentation method for dial recognition and an improved random sample consensus algorithm for pointer reading recognition [3], but this approach requires high image resolution and sharpness. A method for accurately positioning the pointer with the Circle-based Regional Cumulative Histogram method was then proposed [4]. This method adapted poorly to complex scenes, and its recognition rate was not high. In addition, a visual inspection method was used to examine the transformed image and obtain the reading of the pointer meter [5]. A region growing method was also used to locate the dial area and its center, after which an improved center projection method was used to perform scale marking and boundary detection on the dial image [6]. Such algorithms based on computer vision and machine learning have poor portability and versatility. Modern algorithms are machine learning techniques that build and simulate neural networks for analytical learning. From a statistical point of view, such a method can predict the distribution of data, learn a model from the data, and then use the model to predict new data, so data classification can be carried out without much domain knowledge [7, 8]. For example, Mask R-CNN was used to segment the meter dial and the pointer area, and a line-fitting and angle-reading method was used to calculate the pointer reading [9]. These algorithms placed relatively strict requirements on the position of the instrument. Faster R-CNN was used to detect the position of the target instrument, with the reading obtained by feature correspondence and perspective transformation [10]. However, it placed high demands on system computing performance. Mask R-CNN was also used to detect key points of the tick marks and pointer, and the intersection of a circle and a straight line was then used to calculate the reading [11], but the algorithm was complex and computationally intensive.
Therefore, we proposed a pointer type instrument intelligent reading system based on CNN. The system used a histogram normalization algorithm for image brightness optimization and contrast enhancement, and then the YOLOv3 feature recognition algorithm was used to detect the panel area in an image, and the corresponding area was saved as a feature image. A multilayer convolutional neural network was then introduced to predict the numerical reading. The system mainly includes four algorithms: histogram normalization transformation, dial grayscale transformation, YOLOv3 feature recognition, and convolutional neural network reading prediction. Together they can accurately locate the panel area and read the corresponding numerical values.
SYSTEM DESIGN PRINCIPLE
Histogram Normalization Transformation
Electrical pointer meters are often placed in special rooms where the surrounding lighting is complicated, so the surveillance images have low contrast and poor clarity. Since the gray value distribution of image pixels follows a statistical distribution, the image is preprocessed by the histogram normalization transformation, which balances the distribution of image gray levels to improve image contrast and optimize image brightness [12]. Let $r$ and $c$ denote the row and column of the input image $I$. $I(r, c)$ is the gray value of the input image $I$ in row $r$ and column $c$, and $O(r, c)$ is the gray value of the output image $O$ in row $r$ and column $c$. The minimum gray value of $I$ is $I_{\min}$ and the maximum is $I_{\max}$. The minimum gray value of $O$ is $O_{\min}$ and the maximum is $O_{\max}$. The relationship among them is shown in Eq. (1).
This process is called the histogram normalization transformation. Equivalently, $O(r, c)$ can be calculated as in Eq. (3), where $\alpha$ and $\beta$ are the weight and bias variables.
As a result, the histogram normalization transformation is a linear transformation that selects $\alpha$ and $\beta$ automatically. In general, setting $O_{\min}=0$ and $O_{\max}=255$, Eq. (4) reduces to Eq. (5).
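A minimal sketch of this transformation in Python with NumPy is given here. It assumes the usual linear-stretch form $O(r, c) = \alpha I(r, c) + \beta$ with $\alpha = (O_{\max} - O_{\min}) / (I_{\max} - I_{\min})$ and $\beta = O_{\min} - \alpha I_{\min}$. The function name `histogram_normalize` and the handling of a constant image are illustrative choices, not taken from the original system.

```python
import numpy as np

def histogram_normalize(image, o_min=0.0, o_max=255.0):
    """Linearly stretch gray values from [I_min, I_max] to [o_min, o_max]."""
    img = np.asarray(image, dtype=np.float64)
    i_min, i_max = img.min(), img.max()
    if i_max == i_min:
        # A flat image has no contrast to stretch; mapping it to o_min
        # is an assumption made for this sketch.
        return np.full(img.shape, o_min, dtype=np.uint8)
    alpha = (o_max - o_min) / (i_max - i_min)  # weight
    beta = o_min - alpha * i_min               # bias
    out = alpha * img + beta
    # Round and clip so the result is a valid 8-bit gray image.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# A low-contrast 2x2 image whose gray values span only [50, 200].
sample = np.array([[50, 100], [150, 200]], dtype=np.uint8)
normalized = histogram_normalize(sample)
print(normalized)
```

With the default $O_{\min}=0$ and $O_{\max}=255$, the darkest input pixel maps to 0 and the brightest to 255, so the output uses the full 8-bit range.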
Most of the panel area of a pointer meter has a relatively simple color, either black or white. To reduce the amount of computation, the image data can be transformed into grayscale. According to the importance of the three primary colors and the sensitivity of human eyes to different colors, the three color components are combined in a weighted average with different weights [13]. The grayscale image is obtained by Eq. (6), where $i$ and $j$ represent the horizontal and vertical coordinates of the image, and $R(i, j)$, $G(i, j)$ and $B(i, j)$ respectively represent the three primary color components of the point in row $i$ and column $j$.
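The weighted average can be sketched as follows. The weights 0.299, 0.587 and 0.114 are the widely used ITU-R BT.601 luma coefficients and are an assumption here; the weights in Eq. (6) may differ. The name `to_grayscale` is likewise illustrative.

```python
import numpy as np

# Assumed BT.601 luma weights for R, G, B; Eq. (6) may use other values.
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb):
    """Weighted average of the R, G, B components of an (H, W, 3) image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb @ WEIGHTS  # contracts the last (channel) axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# Four test pixels: white, black, pure red, pure green.
pixels = np.array([[[255, 255, 255], [0, 0, 0]],
                   [[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
gray = to_grayscale(pixels)
print(gray)
```

Green contributes the most to the gray value and blue the least, which reflects the differing sensitivity of the human eye described above.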
FIGURE 1 | YOLOv3 Tiny network structure diagram. CONV is a convolution operation, POOL is a pooling operation, and UPSAMPLE is an upsampling operation.
You Only Look Once 3rd Feature Recognition
There are two common kinds of feature recognition algorithm. One first generates candidate regions, and then classifies each region of interest (ROI) and predicts its location coordinates. This kind of algorithm is called a two-stage feature recognition algorithm [14]. The other is the one-stage detection algorithm, which needs only one network to generate the ROI and predict the category, such as the YOLOv3 feature recognition algorithm [15]. Compared with two-stage feature recognition algorithms, YOLOv3 uses a single network structure to predict object category and location when generating candidate regions, and each ground-truth box in YOLOv3 corresponds to only one correct candidate region [16]. These features give YOLOv3 less computation and a faster detection speed, which makes it more suitable for porting to an embedded computing platform with weak computing performance.
The standard network structure of YOLOv3 has 107 layers. The first 74 are based on the Darknet-53 network and serve as the main network structure. The 75th to 107th layers are the feature interaction layers, which realize local feature interaction by means of convolution kernels [17]. Because the target operating platform is an embedded platform, and the recognition characteristics of pointer meters are relatively obvious, we used YOLOv3 Tiny, a simplified version of the YOLO network structure. Its network structure is shown in Figure 1, where CONV is a convolution operation, POOL is a pooling operation, and UPSAMPLE is an upsampling operation. To reduce computation, the input layer used color images resized to a width of 640 pixels and a height of 640 pixels, and the YOLO layer was used as the output layer. Although the YOLOv3 Tiny network retained only 24 layers, it still retained two YOLO network layers, which greatly reduced the amount of computation while still ensuring the accuracy of model recognition [18, 19].
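As a rough illustration of what the two YOLO output layers produce for a 640 by 640 input, the sketch below computes their grid sizes. It assumes the strides (32 and 16) and the three anchors per scale of the standard Darknet YOLOv3 Tiny configuration, and a single object class (the meter panel); none of these values are stated in the text, and `yolo_output_shapes` is a hypothetical helper.

```python
def yolo_output_shapes(input_size=640, strides=(32, 16),
                       anchors_per_scale=3, num_classes=1):
    """Return (grid_h, grid_w, channels) for each YOLO output layer."""
    # Each anchor predicts 4 box coordinates, 1 objectness score,
    # and one score per class.
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]

shapes = yolo_output_shapes()
print(shapes)
```

Under these assumptions the coarse layer predicts on a 20 by 20 grid and the fine layer on a 40 by 40 grid, each with 18 channels per cell.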
As a feedforward neural network, the convolutional neural network performs very well in large-scale image processing and has been widely used in image classification and positioning. Panel image data cannot be classified linearly. To deal with this kind of data, we proposed a multi-layer convolutional neural network, which performs the classification by mapping the original data to a linearly separable high-dimensional space and then applying a specific linear classifier [20, 21]. A three-layer neural network model including an input layer, a hidden layer and an output layer was trained on images of the panel areas together with their numerical readings [22], as shown in Eq. (7).
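The forward pass of such a three-layer model can be sketched as follows, assuming each layer has the affine form $f(x) = Wx + b$. The ReLU activation on the hidden layer, the layer sizes, and the function names are assumptions made for illustration and are not taken from the original model.

```python
import numpy as np

def dense(x, W, b):
    """Affine layer f(x) = W x + b; W has shape (n, m), x length m, b length n."""
    return W @ x + b

def relu(z):
    """Element-wise rectifier, assumed here as the hidden activation."""
    return np.maximum(z, 0.0)

def forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer -> output layer."""
    hidden = relu(dense(x, W1, b1))
    return dense(hidden, W2, b2)

# Hypothetical sizes: 6 inputs, 4 hidden nodes, 3 outputs.
rng = np.random.default_rng(0)
m, h, n = 6, 4, 3
x = rng.normal(size=m)
W1, b1 = rng.normal(size=(h, m)), np.zeros(h)
W2, b2 = rng.normal(size=(n, h)), np.zeros(n)
y = forward(x, W1, b1, W2, b2)
print(y.shape)
```

The non-linear hidden layer is what lets the final linear layer separate data that is not linearly separable in the input space.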
If the input $x$ has $m$ nodes and the output $f(x)$ has $n$ nodes, then the weight $W$ is a matrix with $n$ rows and $m$ columns. The input $x$ is a vector of length $m$, the bias vector $b$ is a vector of length $n$,