TRAFFIC-RESPONSIVE SIGNAL TIMING FOR SYSTEM-WIDE TRAFFIC CONTROL
JAMES C. SPALL and DANIEL C. CHIN
The Johns Hopkins University, Applied Physics Laboratory, Laurel, Maryland 20723-6099, U.S.A.
(Received 9 August 1996; in revised form 15 July 1997)
A major component of advanced traffic management for complex road systems is the timing strategy for the signalized intersections. This is an extremely challenging control problem at a system (network)-wide level. For present purposes:
Abstract
System-wide control is the means for real-time (demand-responsive) adjustment of the timings of all signals in a traffic network to achieve a reduction in overall congestion consistent with the chosen system-wide measure of effectiveness (MOE). This real-time control is responsive to instantaneous changes in traffic conditions, including changes due to accidents or other traffic incidents. Further, the timings should change automatically to adapt to long-term changes in the system (e.g. street reconfiguration or seasonal variations). To achieve true system-wide optimality, the timings at different signals will not generally have a predetermined relationship to one another.*
To the best of the authors' knowledge, no existing or planned approach achieves such system-wide control. This paper presents an approach, S-TRAC (System-wide Traffic-Adaptive Control), for treating this challenging problem.
All attempts known to us at real-time demand-responsive control either are optimized only on a per-intersection basis or make simplifying assumptions to treat the multiple-intersection problem. An example of the former is OPAC (Gartner et al., 1991), while examples of the latter include SCOOT (Hunt et al., 1981; Martin and Hockaday, 1995) and REALBAND (Dell’Olmo and Mirchandani, 1995). SCOOT’s system-wide (i.e. multiple, interconnecting artery) approach is limited to broad strategy choices from one traffic corridor to another rather than a co-ordinated set of signal parameter selections for the entire network. Hence, although SCOOT may be implemented on a full traffic system, it is not a true system-wide controller in the sense considered here. “[SCOOT’s] regional boundaries are satisfactory for zoned control, but fail to offer widespread strategic control” (Martin and Hockaday, 1995). The other multiple-intersection technique mentioned above, REALBAND, provides a way to improve platoon progression, which the other techniques apparently lack. However, REALBAND is limited in its application to types of traffic patterns for which vehicle platoons are easily identifiable and, thus, may not perform well in heavily congested conditions with no readily identifiable platoons. Note that none of these techniques incorporates a method to automatically self-tune over a period of weeks or months. In addition, most approaches to traffic control have been developed independently of modern techniques in nonlinear stochastic control (notable exceptions to this for freeway traffic control are Messmer and Papageorgiou (1994) and Papageorgiou et al. (1995)).
The essential ingredient in these and other modern attempts to provide optimal signal timings for single or multiple intersections is a model for the traffic behavior. However, the problem of fully modeling traffic at a system-wide level is daunting: “To develop a ‘general theory’ for the stochastic behavior of a traffic system is out of the question. Even if it were possible such a theory would be so complex as to be of no practical value.” (Newell, 1989, p. 258). In the OPAC, SCOOT, and REALBAND approaches discussed above, the models used are in the form of traditional equation-based relationships, but it is also possible to use other model representations such as a neural network (Nakatsuji and Kaku, 1991), fuzzy associative memory matrix (Kelsey and Bisset, 1993), or rule base for an expert system (Ritchie, 1990). The signal timings are then based on relationships (algebraic or otherwise) derived from the assumed model of the traffic dynamics. For real-time (demand-responsive) approaches, this relationship (or ‘control function’) takes as input information about current traffic conditions and produces as output the timings for the signals. However, to the extent that the traffic dynamics model is flawed or oversimplified, the signal timings will be suboptimal.
The unique aspect of the S-TRAC control strategy here is that it does not require a system-wide traffic-dynamics model (this model avoidance is possible through use of a powerful method in stochastic optimization, as discussed in Sections 2-4 below). S-TRAC is based on a neural network or other function approximator for use in the control function; no model (e.g. set of differential equations or a second neural network) is needed for the traffic dynamics. Thus, in S-TRAC, there are no requirements to build equations describing critical traffic elements such as complex flow interactions among the arteries in the presence of traffic congestion, weather-related changes in driving patterns, flow changes as a result of variable message signs or radio announcements, etc. The extreme difficulty in mathematically describing such critical elements of the traffic system will inherently limit any control strategy that requires a model of the traffic dynamics, which is the implication of the Newell (1989) quote above. Related to this is the non-robustness of system model-based controls to operational traffic situations that differ significantly from situations represented in the data used to build the system model (this non-robustness can sometimes lead to unstable system behavior). Further, even if a reliable system model could be built, a change to the scenario or measure-of-effectiveness (MOE) would typically entail many complex calculations to modify the model and requisite optimization process.
In addition to the above considerations, system-wide control (as defined in the first paragraph) requires that the controller automatically adapt to the inevitable long-term (say, month-to-month) changes in the system. This is a formidable requirement for the current model-based controllers as these long-term changes encompass difficult-to-model aspects such as seasonal variations in flow patterns on all links in the system, long-term construction blockages or lane reconfigurations, changes in the number of residences and/or businesses in the system, etc. In fact, in the context of the Los Angeles traffic system, Rowe (1991) notes that the difficulty in adapting to long-term system changes is a major limitation of current traffic control strategies. By avoiding the need for a system model, however, S-TRAC is able to produce a controller that generates optimal instantaneous (minute-to-minute) signal timings while automatically adapting to long-term (month-to-month) system changes.
Central to S-TRAC is the use of the simultaneous perturbation stochastic approximation (SPSA) algorithm (Spall, 1992). SPSA provides a highly efficient means for estimating parameters without the need for the gradient of the underlying performance measure (with respect to the parameters being estimated). In the context of control problems, requiring the gradient vector is tantamount to requiring a model of the process (see, e.g. Spall and Cristion, 1994, 1995, 1997 for further discussion).
The remainder of this paper is organized as follows: Section 2 presents an overview of the S-TRAC approach, including the relationship between the demand-responsive instantaneous traffic controller and the long-term SPSA training process, and some of the practical issues associated with algorithm initialization and calculation of the measure of effectiveness. Section 3 discusses the SPSA algorithm for traffic control and Section 4 translates the principles in Section 2 and Section 3 into a step-by-step implementation guide. Section 5 illustrates S-TRAC for a nine-intersection network in mid-town Manhattan and Section 6 offers some concluding remarks.
2. OVERVIEW OF S-TRAC CONTROL STRATEGY
2.1. Summary
S-TRAC is based on developing a mathematical function, say $u(\bullet)$, that takes current information on the state of the traffic conditions and produces the timings for all signals in the network to optimize the performance of the system. (A dot shown here and throughout the rest of this paper as an argument in a mathematical function represents all relevant variables entering the function.) The inputs may include sensor readings from throughout the traffic system and other relevant information such as weather and time-of-day. The output values for each of the signals in the network may be any of the usual timing quantities: e.g. red-green splits, offsets, and cycle times.
The traffic control function $u(\bullet)$ in S-TRAC is implemented by a neural network (NN) for which the internal NN connection weights are estimated and refined by an on-line training process. These weights will fully define a function that takes sensor information on current traffic conditions and produces the optimal system-wide timings.* It is within these weights that information about the optimal control strategy is embedded. To reflect reality, it is important that the weights contain information to facilitate a response to traffic conditions (including accidents or other incidents). The weights are able to evolve in the long-term (say, month-to-month) in accordance with the inevitable changes in the transportation system. Hence, the values of the weights are absolutely critical to this framework.
Figure 1 illustrates the overall operation of S-TRAC. The lower loop provides the real-time feedback on traffic conditions for use by the NN controller (with specified weights) in providing real-time signal commands. The upper loop is the weight estimation path that refines the real-time control. This loop operates on a day-to-day basis and can be turned on and off as needed to build the NN controller and to self-tune the controller to long-term changes in the system. At the heart of the upper-loop weight training is the SPSA algorithm$^{\dagger}$, which provides a highly efficient and relatively easy-to-implement means of estimating the NN weights $\theta$ on an on-line basis. The use of SPSA in the day-to-day training will be presented in detail in Sections 3 and 4.

Fig. 1. S-TRAC configuration: relationship between traffic system, controller, and training algorithm.
In implementing S-TRAC, a different specific NN structure (number of inputs/outputs, number of weights, etc.) may be chosen to produce controls during each of several periods within a 24 h time-frame. The periods should be chosen so that system-wide traffic patterns are roughly consistent over the period. For example, a 24 h time-frame may be divided into five periods: 5:30 AM-9:30 AM, 9:30 AM-3:30 PM, 3:30 PM-7:30 PM, 7:30 PM-11:30 PM, and 11:30 PM-5:30 AM, each of which will have a separate NN controller. Hence, the controller illustrated in Fig. 1 pertains to one time period of interest. In principle, it would be possible to have one NN for a full 24 h period, but such a NN may be excessively complex due to the wide variety of traffic conditions over a full day (and further a fixed timing plan may be sufficient for the time periods 9:30 AM-3:30 PM and 11:30 PM-5:30 AM).
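As a small illustration of the per-period arrangement, the following Python sketch maps a clock time to one of the five example periods listed above. The period labels and the function name `period_for` are hypothetical and are introduced here only for illustration; the text specifies the boundaries but not how a deployed system would select the controller.

```python
from datetime import time

# Period boundaries from the five-period example in the text.
# The labels are hypothetical names for the per-period NN controllers.
PERIODS = [
    (time(5, 30), time(9, 30), "am_peak"),
    (time(9, 30), time(15, 30), "midday"),
    (time(15, 30), time(19, 30), "pm_peak"),
    (time(19, 30), time(23, 30), "evening"),
    (time(23, 30), time(5, 30), "overnight"),  # wraps past midnight
]


def period_for(t: time) -> str:
    """Return the label of the period that contains clock time t."""
    for start, end, label in PERIODS:
        if start < end:
            if start <= t < end:
                return label
        elif t >= start or t < end:  # interval wraps around midnight
            return label
    raise ValueError("time not covered by any period")
```

Each label would index a separate weight vector, so that training for one period never alters the controller used in another.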
2.2. Some practical issues
The upper loop (weight training process) in Fig. 1 will continue as long as needed to achieve effective convergence of the weight estimate; convergence is obtained when the MOE has been optimized subject to constraints on road capacity, minimum signal phase length, etc. While the SPSA training is occurring, only minor controller-imposed variations in traffic flow (from what would have occurred based on the previous day’s timing strategy) will be seen, which should be unnoticed by most drivers. After training is complete for a given period, there will be a control function $u(\bullet)$ (based on a converged value of weights $\theta$) that provides optimal signal timings for any specific time within the period given the current traffic conditions.* Note that the training is based on adjacent days having similar mean traffic behavior within the time period of interest (the actual traffic conditions are allowed to vary significantly day-to-day in line with the usual stochastic effects); so, for example, there may be a recursion for weekdays (perhaps with a special ‘tag’ for Friday evenings to accommodate the extra flow if that was significant) and another corresponding recursion (and associated NN) for weekends/holidays.
As part of the training process, an initial set of values (prior to running SPSA) must be chosen for the NN weights (these yield the control strategy on ‘day 0’ of the training process). It will generally be desirable to initialize the weights to produce a NN control with the same timing strategy as the traffic system had in place prior to the implementation of S-TRAC (this allows S-TRAC to take advantage of the ‘tuning’ and prior information embedded in the prior strategy). For a fixed time-of-day strategy, this is straightforward through the specification of ‘bias weights’ on the NN output (with other weights, except those linking time-of-day if that input is used, zeroed out). For a demand-responsive prior strategy, one could use current and recent-past data on traffic flow and corresponding (flow-dependent) signal timings in conjunction with standard (‘off the shelf’) back-propagation-type software. This will generate a NN controller that is able to reproduce the timing strategy embedded in these data. Then the SPSA optimization process will begin with that strategy and improve from there. We must emphasize that this off-line analysis is done only to initialize the weights in the algorithm. Alternatively (or supplementarily) ‘pseudo historical’ data could be generated by running traffic simulations (say, based on the well-known U.S. Federal Highway Administration-sponsored TRAF software collection) together with corresponding ‘reasonable’ (flow-dependent) signal timings. These pseudo historical data could then be used with back-propagation (as with the real historical data) to generate the initial weights.
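The bias-weight initialization for a fixed timing plan can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' network: it assumes a single tanh hidden layer, a linear output layer, and no time-of-day input, and the names `make_controller` and `control` are hypothetical.

```python
import math


def make_controller(n_inputs, n_hidden, fixed_timings):
    """Build day-0 weights so the NN output equals fixed_timings for any input.

    Assumed architecture: one tanh hidden layer and a linear output layer.
    All connection weights are zeroed; only the output biases are set.
    """
    return {
        "W1": [[0.0] * n_inputs for _ in range(n_hidden)],  # input -> hidden
        "b1": [0.0] * n_hidden,                             # hidden biases
        "W2": [[0.0] * n_hidden for _ in fixed_timings],    # hidden -> output
        "b2": list(fixed_timings),                          # output bias weights
    }


def control(weights, inputs):
    """Evaluate the control function u(theta, .) for one input vector."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights["W1"], weights["b1"])
    ]
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights["W2"], weights["b2"])
    ]
```

With these weights the controller ignores its sensor inputs on day 0 and reproduces the prior plan; the later SPSA perturbations move the zeroed weights away from zero, which is what lets the controller become demand-responsive.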
One appealing feature in using simulations for initialization is that it is possible to introduce ‘incidents’ (accidents, break-downs, special events, etc.) that may not have been encountered in other initialization information (e.g. historical data); having this incident information embedded in the initial weights may help the real-time NN controller cope with similar incidents in real operations after day 0. It is not required that all possible incident scenarios be introduced in the simulation since the NN (in principle, at least) can interpolate to unencountered incidents if the initialization information contains a reasonable variety of plausible incidents.
Periodically, after effective convergence for $\theta$ has been achieved (and the controller is operating without the use of SPSA, i.e. the upper loop in Fig. 1 is disconnected), the training should be turned ‘on’ in order to adapt the weights to the inevitable long-term changes in the traffic system and flow patterns. (The reason that it is not recommended to run training continuously day-to-day is that when the training is operative, the weight values $\theta$ used in the controller are slightly perturbed from those that the algorithm has currently found to be optimal.) This updating can be done relatively easily without the need to do the expensive and time-consuming off-line modeling that is required for standard model-based approaches to traffic control (e.g. in the context of the Los Angeles traffic system, Rowe (1991) points out that the adaptation to long-term changes is not done as frequently as necessary because of the high costs and extreme difficulty involved). Whether the training in SPSA is ‘on’ or ‘off’ should be invisible to most drivers.
3. THE MATHEMATICAL ALGORITHM: SPSA-BASED TRAINING
The above discussion outlines how NN functions for real-time traffic control can be constructed by setting up a recursion that iterates on a day-to-day basis for a fixed time period. The discussion here will provide the mathematical form of the recursion. Given the set of weights to be determined, we let $\hat{\theta}_{k}$ denote the estimate of $\theta$ at the $k$th iteration of the SPSA algorithm. The aim of the SPSA algorithm is to find that set of weight values that minimizes some ‘loss function’, which is directly related to optimizing the MOE. Mathematically, this is equivalent to finding a weight value such that the gradient of the loss function with respect to the weights is zero. However, since we are not assuming a model for the traffic dynamics, it is not possible to compute this gradient for use in standard NN optimization procedures such as back-propagation.
The SPSA algorithm is based on forming a succession of highly efficient approximations to the uncomputable gradient of the loss function in the process of finding the optimal weights. The SP gradient approximation used in SPSA only requires observed values of the system (e.g. loop detector counts, traffic queues, wait times, pollutant emission readings, etc.). The theoretical and numerical properties of the SPSA algorithm are thoroughly described in Spall (1992). The high efficiency of SPSA relative to competing (gradient-free) SA algorithms is established in Spall (1992) and Chin (1993, 1997). The application of SPSA to NN controller design has been considered in Spall and Cristion (1994, 1995, 1997). (The theoretical properties related to algorithm convergence in Spall (1992) provide a guarantee that SPSA will work properly in a wide variety of practical conditions; this contrasts with many other algorithms proposed for adaptive traffic control, which are ad hoc and have only been demonstrated on a limited set of test cases.) The SPSA algorithm for estimating $\theta$ has the form:

$$\hat{\theta}_{k+1}=\hat{\theta}_{k}-a_{k} \hat{g}_{k}\left(\hat{\theta}_{k}\right) \tag{1}$$
where $a_{k}$ is a scalar gain coefficient and $\hat{g}_{k}\left(\hat{\theta}_{k}\right)$ is the SP gradient estimate at $\theta=\hat{\theta}_{k}$. Note that eqn (1) states that the new estimate of $\theta$ is equal to the previous estimate plus an adjustment that is proportional to the negative of the gradient estimate. The initial value $\hat{\theta}_{0}$ may be chosen according to the discussion of subsection 2.2.
To calculate the most critical part of eqn (1), i.e. the gradient approximation $\hat{g}_{k}(\theta)$ for any $\theta$, we must define an underlying loss function $L(\theta)$. This loss function is directly related to the MOE, and mathematically expresses the MOE criteria. The form of $L(\theta)$ reflects the particular system aspects to be optimized and/or the relative importance to put on optimizing several criteria at once (e.g. mean queue length or wait times at intersections, traffic flow along certain arteries, pollutant emissions, etc.). Because of the variety of MOE criteria considered in practice, the specific form of $L(\theta)$ will be allowed to be flexible in this paper. An example loss function might be a standard quadratic measure such as

$$L(\theta)=E\left[x^{T} x \mid \theta\right] \tag{2}$$

where

$E(\bullet \mid \theta)$ denotes an expected value conditional on the set of controls with weights $\theta$;

$x$ represents the system state vector, e.g. vector of mean queue lengths or mean vehicle wait times at all intersections within the time period of interest (the state depends on $\theta$ through the fact that the control used in affecting the state $x$ depends on $\theta$).
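To make the quadratic measure concrete, the sketch below forms a state vector of mean queue lengths from sampled queue counts and evaluates one observed value of $x^{T} x$. The data layout (`samples[t][i]` for sampling time `t` and intersection `i`), the function names, and the numbers are assumptions made for illustration only.

```python
def mean_queues(samples):
    """Average queue length per intersection.

    samples[t][i] is the queue observed at intersection i at sampling time t
    (a hypothetical layout chosen for this sketch).
    """
    n_times = len(samples)
    return [sum(column) / n_times for column in zip(*samples)]


def sample_loss(x):
    """One observed value of the quadratic loss x^T x for state vector x."""
    return sum(v * v for v in x)


# Three sampling times, three intersections (made-up numbers).
samples = [[4, 0, 6], [6, 2, 6], [5, 1, 6]]
x = mean_queues(samples)   # [5.0, 1.0, 6.0]
loss = sample_loss(x)      # 25 + 1 + 36 = 62.0
```

A single day yields only one such observation, which is a noisy sample of the expectation in the loss function rather than the expectation itself.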
Given a definition of the loss function (as derived from the MOE), the critical step in implementing the SPSA algorithm in eqn (1) is to determine the gradient estimate $\hat{g}_{k}(\theta)$ at any value of $\theta$. This embodies the key and unique technical contribution of our approach since $\hat{g}_{k}(\theta)$ does not require a complete model for the system-wide traffic dynamics. Assuming that $\theta$ is $p$-dimensional, the gradient estimate at any $\theta$ has the form

$$\hat{g}_{k}(\theta)=\begin{bmatrix} \dfrac{\hat{L}\left(\theta+c_{k} \Delta_{k}\right)-\hat{L}\left(\theta-c_{k} \Delta_{k}\right)}{2 c_{k} \Delta_{k 1}} \\ \vdots \\ \dfrac{\hat{L}\left(\theta+c_{k} \Delta_{k}\right)-\hat{L}\left(\theta-c_{k} \Delta_{k}\right)}{2 c_{k} \Delta_{k p}} \end{bmatrix} \tag{3}$$
where $\hat{L}(\bullet)$ denotes an observed (sample) value of $L(\bullet)$, $\Delta_{k}=\left(\Delta_{k 1}, \Delta_{k 2}, \ldots, \Delta_{k p}\right)$ is a user-generated vector of random variables that satisfy certain important regularity conditions (Spall, 1992; Spall and Cristion, 1994, 1995, 1997; having $\Delta_{k i}= \pm 1\ \forall k, i$ with probability $1/2$ of each outcome satisfies these conditions and is used in the study of Section 5 below), and $c_{k}$ is a small positive number. Note that the numerators in the $p$ components of $\hat{g}_{k}(\theta)$ are identical; only the denominators change. Hence, to compute $\hat{g}_{k}(\theta)$, one only needs two values of $\hat{L}(\bullet)$ independent of the dimension $p$. Note also that SPSA (as a stochastic approximation algorithm) is designed specifically to deal with day-to-day stochastic variations in traffic conditions. The mathematical manifestation of this property is that SPSA will converge even though $\hat{L}(\bullet) \neq L(\bullet)$ in general.
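The simultaneous perturbation gradient estimate described above can be sketched in a few lines of Python. The function name `sp_gradient` and the callable `loss_obs` are hypothetical; in S-TRAC each call to `loss_obs` would correspond to a full period of observed traffic, whereas here it is simply a function passed in by the caller.

```python
import random


def sp_gradient(loss_obs, theta, c_k, rng):
    """Simultaneous perturbation gradient estimate at theta.

    loss_obs(weights) returns one (possibly noisy) observed loss value.
    It is called exactly twice, regardless of the dimension len(theta).
    """
    # Symmetric +/-1 perturbations, each outcome with probability 1/2.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c_k * d for t, d in zip(theta, delta)]
    minus = [t - c_k * d for t, d in zip(theta, delta)]
    diff = loss_obs(plus) - loss_obs(minus)          # numerator shared by all components
    return [diff / (2.0 * c_k * d) for d in delta]   # only the denominator varies
```

Because every component reuses the same loss difference, the cost per estimate stays at two observations whether $\theta$ has four elements or four hundred.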
The SPSA approach is in contrast to the standard approach for approximating gradients (the ‘finite-difference’ method), which requires $2p$ values of $\hat{L}(\bullet)$, each representing a positive or negative perturbation of one element of $\theta$ with all other elements held fixed. In the context of traffic control, each value of $\hat{L}(\bullet)$ represents data collected during one time period (within one 24 h period). For traffic control, the dimension $p$ is at least as large as the total number of factors to be controlled within the traffic system (e.g. in a system with 100 signals and an average of four control factors per light, $p \geq 400$). Hence, the SPSA method is easily two to three orders of magnitude more efficient than the standard finite-difference method in finding the optimal weights for most realistic traffic settings. Theory in Spall (1992) and Chin (1993, 1997) rigorously justifies this gain in efficiency. (In particular, it is shown that the SPSA method and the finite-difference method achieve a given level of accuracy in estimating $\theta$ in the same number of iterations, which translates into a $p$-fold total savings in $\hat{L}(\bullet)$ evaluations since each iteration of SPSA requires only $1/p$ times the number of $\hat{L}(\bullet)$ evaluations required by finite-difference.)
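The evaluation counts in this comparison can be checked with a short calculation. The sketch below uses the $p = 400$ example from the text; the iteration count of 50 is a made-up figure chosen only to show the scaling, and `loss_evaluations` is a hypothetical helper name.

```python
def loss_evaluations(p, iterations):
    """Total observed-loss evaluations needed by each method.

    Finite-difference uses 2p evaluations per iteration; SPSA uses 2.
    """
    finite_difference = 2 * p * iterations
    spsa = 2 * iterations
    return finite_difference, spsa


fd, spsa = loss_evaluations(p=400, iterations=50)
print(fd, spsa, fd // spsa)  # 40000 100 400
```

Since each evaluation occupies one like day for the period in question, the same 50 iterations would take 100 days of data with SPSA against 40,000 days with finite differences.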
4. STEP-BY-STEP IMPLEMENTATION OF SPSA TRAINING ALGORITHM FOR S-TRAC
Let us now present a step-by-step summary of how the SPSA algorithm in eqns (1) and (3) would be implemented to achieve optimal traffic control in the system-wide setting. This summary pertains to building up the controller (i.e. estimating a $\theta$) for one time period, as illustrated in Fig. 1 above. Obviously, the same procedure would apply in the other periods. Starting with some $\hat{\theta}_{0}$ (see the discussion in subsection 2.2) the step-by-step procedure for updating $\hat{\theta}_{k}$ to $\hat{\theta}_{k+1}$ is:
1. Given the current weight vector estimate $\hat{\theta}_{k}$, change all values to $\hat{\theta}_{k}+c_{k} \Delta_{k}$, where $c_{k}$ and $\Delta_{k}$ satisfy conditions in Spall (1992) or Spall and Cristion (1994, 1995, 1997).
2. Throughout the given time period, use a NN control $u(\theta, \cdot)$ with weights $\theta=\hat{\theta}_{k}+c_{k} \Delta_{k}$. Inputs to $u(\theta, \cdot)$ at any time within the period include current and recent past state information (e.g. queues at intersections), previous controls (signal parameter settings), time-of-day, weather, etc.
3. Monitor the system throughout the time period (and possibly slightly thereafter) and form the sample loss function $\hat{L}(\hat{\theta}_{k}+c_{k} \Delta_{k})$ based on observed system behavior. For example, with the loss function in eqn (2), we have
$$\hat{L}\left(\hat{\theta}_{k}+c_{k} \Delta_{k}\right)=x^{T} x$$
where the state values are based on the controls $u(\hat{\theta}_{k}+c_{k} \Delta_{k}, \cdot)$ used throughout the period (a possible state vector might include the queues of all intersections over a set of sampling times within the overall time period).
4. During the same time period on the following like day (e.g. weekday after weekday), repeat steps 1-3 with $\hat{\theta}_{k}-c_{k} \Delta_{k}$ replacing $\hat{\theta}_{k}+c_{k} \Delta_{k}$. Form $\hat{L}(\hat{\theta}_{k}-c_{k} \Delta_{k})$.
5. With the quantities computed in steps 3 and 4, $\hat{L}(\hat{\theta}_{k}+c_{k} \Delta_{k})$ and $\hat{L}(\hat{\theta}_{k}-c_{k} \Delta_{k})$, form the SP gradient estimate in eqn (3) and then take one iteration of the SPSA algorithm in eqn (1) to update the value of $\hat{\theta}_{k}$ to $\hat{\theta}_{k+1}$.
6. (Optional) During the same period on the following like day, use a NN control with updated weights $\theta=\hat{\theta}_{k+1}$. This provides information on performance with the current updated weight estimates (no perturbation); this information is not explicitly used in the SPSA updating algorithm.
7. Repeat steps 1-6 with the new value $\hat{\theta}_{k+1}$ replacing $\hat{\theta}_{k}$ until traffic flow is approximately optimized (or at least sufficiently improved) based on the chosen MOE.
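The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' software: it assumes that eqns (1) and (3) take the standard SPSA form of Spall (1992), uses random $\pm 1$ perturbations and power-law gain sequences whose constants (`a`, `c`, `A`, `alpha`, `gamma`) are illustrative choices, and replaces a day of real traffic with a hypothetical noisy quadratic loss (`make_toy_period`).

```python
import random


def spsa_iteration(theta, k, measure_loss, rng, a=0.1, c=0.2, A=10.0,
                   alpha=0.602, gamma=0.101):
    """One update of theta_k to theta_{k+1} from two loss measurements."""
    a_k = a / (k + 1 + A) ** alpha   # step-size gain (assumed decay form)
    c_k = c / (k + 1) ** gamma       # perturbation gain (assumed decay form)
    # Step 1: perturb every weight at once with random +/-1 signs.
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    theta_plus = [t + c_k * d for t, d in zip(theta, delta)]
    theta_minus = [t - c_k * d for t, d in zip(theta, delta)]
    # Steps 2-3: operate the period with theta_plus and record the sample loss.
    loss_plus = measure_loss(theta_plus)
    # Step 4: same period on the following like day with theta_minus.
    loss_minus = measure_loss(theta_minus)
    # Step 5: simultaneous perturbation gradient estimate, then the update.
    scale = (loss_plus - loss_minus) / (2.0 * c_k)
    return [t - a_k * scale / d for t, d in zip(theta, delta)]


def make_toy_period(rng, target=1.0, noise_sd=0.05):
    """Hypothetical stand-in for running the traffic network for one period."""
    def measure_loss(theta):
        clean = sum((t - target) ** 2 for t in theta)
        return clean + rng.gauss(0.0, noise_sd)  # noisy observation of the loss
    return measure_loss


rng = random.Random(0)
measure_loss = make_toy_period(rng)
theta = [0.0] * 5
initial_loss = sum((t - 1.0) ** 2 for t in theta)
for k in range(200):  # step 7: repeat with the updated estimate
    theta = spsa_iteration(theta, k, measure_loss, rng)
final_loss = sum((t - 1.0) ** 2 for t in theta)
print(initial_loss, final_loss)
```

The optional step 6 would correspond to one extra call of `measure_loss(theta)` with the unperturbed weights; its result would be logged for monitoring but not fed back into the update.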
There are several practical aspects of the above procedure that are worth noting. By initializing the weight vector at a value $\hat{\theta}_{0}$ that is able to produce the initial signal timings actually in the system (see Section 3), the algorithm will tend to produce signal timings that are between the initial and improved timings while it is in the training phase. Hence, there will likely be no significant control-induced disruption in the traffic system during the training phase. After the weight estimates have effectively converged (so we have a controller that produces improved signal timings for given traffic conditions), the algorithm may be turned ‘on’ or ‘off’ relatively easily without the need to perform detailed off-line modeling. It would, of course, be desirable to turn the algorithm ‘on’ periodically in order to adapt to the inevitable long-term changes in the underlying traffic flow patterns. A further point to note in using SPSA is that there will be some coupling between traffic flows in adjacent time periods within a 24 h time-frame. This is automatically accounted for by the fact that inputs to $u(\cdot)$ include previous states and controls (even if they are from the previous period). Hence, even though there are separate SPSA recursions (and neural networks) for each of the time periods, information is passed across periods to ensure true optimal performance.
5. EXAMPLE OF S-TRAC IMPLEMENTATION IN MANHATTAN
5.1. Introduction
This section illustrates by simulation an application of the S-TRAC approach to a nine-intersection network in mid-town Manhattan, NY. The small-scale realistic example here is intended to be illustrative of the ability of S-TRAC to address larger-scale traffic systems and is not entirely trivial, as it considers a congested (saturated) traffic network and includes nonlinear, stochastic effects. The simulation was calibrated based on actual Manhattan traffic data, as discussed in subsection 5.2.
We are considering control for one 4 h time period and are estimating, across days, the NN weights for the collective set of traffic signal responses to instantaneous traffic conditions during this 4 h period. The software used here is described in detail in Chin and Smith (1994); the simulation was conducted on a Pentium-based PC using C++. The traffic dynamics were simulated using state-space flow equations similar to those in Papageorgiou (1990) or Nataksuji and Kaku (1991) with Poisson-distributed vehicle arrivals at input nodes into the network. Of course, consistent with the fundamental S-TRAC approach as it would be applied in a real system, the controller does not have knowledge of the equations being used to generate the simulated traffic flows. The traffic simulation here is being applied as a surrogate for the real traffic system; SPSA on-line training in a real system would not require a traffic simulation. The controller is constructed via SPSA by the efficient use of small system changes and observation of resulting system performance. Recall that SPSA is explicitly designed to account for stochastic variations in the traffic flow in creating the NN weight estimates. This simulation will illustrate this capability.
5.2. The simulated traffic flow and form for NN controller
Two studies were conducted for a simulated 90-day period: one with constant mean Poisson-distributed arrival rates over the total period, and another with a 10% step increase in all mean arrival rates into the network (not including the internal egress discussed below) at day 10 during the total period. In both studies, the simulated traffic network runs between 55th and 57th Streets (North and South) and from 6th Avenue to Madison Avenue (East and West) and therefore includes nine intersections with 5th Avenue as the central artery. Figure 2 depicts the scenario. The time of control covers the 4 h period from 3:30 PM to 7:30 PM, which represents evening rush time. The technique could be applied to any other period during the day as well. In the 4 h control period several streets have their traffic levels gradually rising and then falling. Their traffic arrival rates increase linearly from non-rush hour rates starting at 3:30 PM. The rates peak at 5:30 PM at a rush hour saturated flow condition and then subside linearly until 7:30 PM. Backup occurs during some of the 4 h period in the sense that queues do not totally deplete during a green cycle. Nonlinear, flow-dependent driver behavioral aspects are embedded in the simulation (e.g. the probabilities of turns at intersections depend on the congestion levels of the through street and cross street). Some streets have unchanging traffic statistics during the total time period while others have inflow rates from garage-generated egress at the end of office hours from 4:30 PM to 5:30 PM. The simulation and baseline fixed-time controller have been extensively tested to ensure that they produce traffic volumes that correspond to actual recorded data for the Manhattan traffic sector as given in Rathi (1988). [A complete discussion of the development and testing of the baseline simulation and the details of its operation are given in Chin and Smith (1994).]
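The rising-and-falling demand described above can be sketched as a piecewise-linear mean arrival rate feeding a Poisson sampler. This is only an illustration of the idea: the off-peak and peak rates (4 and 12 vehicles per minute) are hypothetical values that do not come from the paper, arrivals are grouped into the 90 s signal cycles used in the study, and the sampler is Knuth's simple multiplication method rather than whatever the original C++ software used.

```python
import math
import random

PERIOD_MIN = 240.0  # 3:30 PM to 7:30 PM, in minutes from the start of the period
PEAK_MIN = 120.0    # 5:30 PM, when the mean arrival rate peaks
CYCLE_MIN = 1.5     # one 90 s signal cycle


def mean_arrival_rate(t_min, off_peak, peak):
    """Mean arrivals per minute: linear rise to the peak, then linear fall."""
    if not 0.0 <= t_min <= PERIOD_MIN:
        raise ValueError("time must lie inside the 4 h control period")
    frac = 1.0 - abs(t_min - PEAK_MIN) / PEAK_MIN  # 0 at both ends, 1 at the peak
    return off_peak + (peak - off_peak) * frac


def poisson_sample(mean, rng):
    """Draw one Poisson variate (Knuth's method; fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(1)
# Hypothetical input node: 4 veh/min off-peak, 12 veh/min at the peak.
arrivals = [
    poisson_sample(mean_arrival_rate(c * CYCLE_MIN, 4.0, 12.0) * CYCLE_MIN, rng)
    for c in range(160)  # 160 cycles of 90 s cover the 4 h period
]
print(sum(arrivals))
```

A street with unchanging statistics would simply use equal `off_peak` and `peak` values.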
For S-TRAC, we used a two-hidden-layer, feed-forward NN with 42 input nodes. The 42 NN inputs were (i) the queue levels* at each cycle termination for the 21 traffic queues in the simulation, (ii) the per-cycle vehicle arrivals at the 11 external nodes in the system, (iii) the time from the start of the simulation, and (iv) the nine outputs from the previous control solution. The output layer had nine nodes, one for each signal’s green/red split. The two hidden layers had 12 and 10 nodes, respectively. For this NN, there were a total of 745 NN weights that must be estimated.
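The input and weight totals quoted above can be checked with a short calculation. The sketch assumes fully connected layers with one bias weight per hidden and output node; the text does not state this explicitly, but the assumption reproduces the reported total. The function name is hypothetical.

```python
def count_weights(layer_sizes):
    """Connection weights plus one bias per non-input node, fully connected layers."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 21 queue levels + 11 external arrival counts + 1 time input + 9 previous controls
n_inputs = 21 + 11 + 1 + 9
layers = [n_inputs, 12, 10, 9]  # input, two hidden layers, output
total = count_weights(layers)   # (42+1)*12 + (12+1)*10 + (10+1)*9
print(n_inputs, total)
```

The total is the dimension $p$ of the weight vector $\theta$ that SPSA must estimate.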
In response to current traffic conditions, the controller determines the green/red split for the succeeding cycle of each of the nine signals in the traffic network. Each signal operates on a fixed 90 s cycle as discussed in Rathi (1988) (in a full implementation of S-TRAC, cycle length for each signal could also be a control variable). The controller operates in a real-time adaptive mode in which its cycle-by-cycle responses to traffic fluctuations are gradually improved, over a period of several days or weeks, based on an MOE (i.e. loss function) consisting of the summed square values of the cycle-traffic-wait time at each intersection over the daily 4 h period. Note that since the underlying MOE for the NN controller weight estimation is based on system-wide traffic data (i.e. data downstream from each traffic signal as well as upstream) over a several-hour time period, the effect of signal settings, turning movements, etc. on the future accumulation of traffic at internal queues is factored into the formation of the controller function. (This is an example of how a true system-wide solution would differ from a solution based on combining individual intersection, artery, or zoned solutions on a network-wide basis, as done e.g. in SCOOT.)
Fig. 2. Traffic simulation area (mid-Manhattan).
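The MOE described above, a sum of squared per-cycle wait times over all intersections and all cycles in the period, has the same $x^{T} x$ form as the sample loss in Section 4. A minimal sketch follows; the nested-list layout `wait_times[cycle][intersection]` and the numbers in the example are hypothetical, since the paper does not specify how the wait-time data are stored or scaled.

```python
def sample_loss(wait_times):
    """Sum of squared wait times; wait_times[c][i] is cycle c, intersection i."""
    return sum(w * w for cycle in wait_times for w in cycle)


# Tiny illustrative record: 2 cycles at 3 intersections (arbitrary units).
example = [
    [1.0, 2.0, 0.5],
    [0.0, 3.0, 1.5],
]
loss = sample_loss(example)
print(loss)
```

In the study itself the record for one day would span 160 cycles (4 h of 90 s cycles) at nine intersections, and a single such value is all that SPSA observes from that day.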
5.3. Results
The results of our simulation study of the system-wide traffic control algorithm are presented in Fig. 3 (mean arrival rate into the network does not change over the 90-day period) and Fig. 4 (step increase in mean arrival rates on day 10 for all artery points into the network). The ‘prior’ fixed-time control assumed a green-time/total-cycle-time value of 0.55 for all signals along N-S arteries. This was in the specified range of prior strategies in place in the Manhattan sector during the recording of actual data (Rathi, 1988). In order to show true learning effects (and not just random chance as from a single realization), the curves in Figs 3 and 4 are based on an average of 100 statistically independent simulations. Every third day for S-TRAC in both figures represented an optional ‘evaluation day’ (step 6 of implementation in Section 4) to demonstrate improved values of the MOE. However, only data from the other 60 ‘training days’ were used in the SPSA algorithm; thus, the adaptive training period could have been reduced to 60 days.
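The day counts above follow from the rotation implied by steps 1-6 of Section 4. The sketch below assumes the days cycle in the order plus-perturbation day, minus-perturbation day, evaluation day; that ordering is an inference from the step list rather than something the paper states for the simulation, and the function name is hypothetical.

```python
def day_role(day):
    """Role of a 1-based simulated day in a plus / minus / evaluate rotation."""
    return ("plus", "minus", "evaluate")[(day - 1) % 3]


roles = [day_role(d) for d in range(1, 91)]  # the 90-day study period
training_days = sum(r != "evaluate" for r in roles)
iterations = roles.count("minus")            # one SPSA update per plus/minus pair
print(training_days, iterations)
```

Under this assumption the 90-day run corresponds to 30 SPSA iterations, each using two training days.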
In Fig. 3, S-TRAC resulted in a net improvement of approximately 10% relative to the fixed-strategy-controlled system. This reduction in total wait time represents a reasonably large saving with a relatively small investment, particularly for high traffic density sectors. In comparison, major construction changes to achieve a net improvement in traffic flow of 10% in a well-developed area, such as for the traffic system in mid-Manhattan, would be enormously expensive. The large drop on the first day follows from the introduction of real-time (demand-responsive) control (vs the initial fixed-time strategy). Confidence bounds around the indicated curves that captured 90% of the daily variation were $\pm 2.8$ h for the prior control strategy and $\pm 5.2$ h for S-TRAC. Note that these bounds do not overlap after the first day, indicating the significance of the improvement offered by S-TRAC.
In the step increase case, Fig. 4 shows a corresponding step increase in total system wait time under the fixed-time (prior) strategy. Under S-TRAC, a step increase also occurred in total system wait time on day 10, but the wait time continued to decrease without any transient behavior subsequent to this phenomenon, and an approximate 11% improvement is evident after the 90-day test period.
Fig. 3. System-wide mean wait time for 3:30 PM-7:30 PM period with constant mean arrival rates over 90 days.
Fig. 4. System-wide mean wait time for 3:30 PM-7:30 PM period with increase in mean arrival rates on day 10.
6. CONCLUDING REMARKS
This paper has discussed S-TRAC for system-wide signal timing. It provides timings in response to instantaneous flow conditions while accounting for the inherent stochastic variations in traffic flow through a powerful stochastic optimization technique. The SPSA optimization technique (Spall, 1992) is critical to the feasibility of the approach since it efficiently provides the values of weight parameters in the neural network for control of signal timings in one of the periods within a 24 h time-frame. S-TRAC makes signal timing adjustments to accommodate short-term conditions such as congestion, accidents, brief construction blockages, adverse weather, etc. Through SPSA, S-TRAC also has the ability to automatically accommodate long-term system changes (such as seasonal traffic variations, new residences or businesses, long-term construction projects, etc.) without the cumbersome and expensive off-line remodeling process that has been customary in traffic control. The SPSA training process may be turned ‘on’ or ‘off’ as necessary to adapt to these long-term changes in a manner that would be essentially invisible to the drivers in the system.
A major issue in modern traffic control is practical implementation and maintainability. In practice, it has been found that most modern computer-based systems are not achieving their full potential as a result of inadequate understanding or commitment on the part of municipal authorities and the associated difficulties in implementation [see, e.g. DeSanto (1996), which mentions that only two of 24 systems recently surveyed by the U.S. Department of Transportation were operating at their full capability]. Approaches currently under development (e.g. OPAC) are even more complex than those currently implemented. On the other hand, S-TRAC avoids much of the complex modeling associated with other modern traffic control approaches (the main practical challenges in S-TRAC are the initialization of the search process and the choice of the NN structure for the controller). Further, S-TRAC may work with any existing sensor implementation provided there is some means of transmitting information between intersections and a central control facility; this contrasts with known model-based approaches (e.g. SCOOT) where additional sensors must be installed. Hence, S-TRAC has the potential to deliver real-time system-wide signal timings in a practically feasible manner.
Acknowledgements: The authors are grateful to Dr Richard H. Smith of the Johns Hopkins University, Applied Physics Laboratory for his help and knowledge of current traffic control systems. This work was supported by a JHU/APL Independent Research and Development Grant and U.S. Navy Contract N00039-95-C-0002.
REFERENCES
Chin, D. C. (1993) Performance of several stochastic approximation algorithms in the multivariate Kiefer-Wolfowitz setting. Proc. of the 25th Symp. on the Interface: Computing Science and Statistics, pp. 289-295.
Chin, D. C. (1997) Comparative study of several stochastic approximation algorithms for system optimization based on gradient approximations. IEEE Transactions on Systems, Man, and Cybernetics 27, 244-249.
Chin, D. C. and Smith, R. H. (1994) A traffic simulation for mid-Manhattan with model-free adaptive signal control. Proc. of the 1994 Summer Computer Simulation Conf., pp. 296-301. San Diego, CA.
Dell’Olmo, P. and Mirchandani, P. (1995) An approach for real-time coordination of traffic flows on networks. Transportation Research Board Annual Meeting, Paper no. 950837. Washington, DC.
DeSanto, R. (1996) Operations and Maintenance of Traffic Control Systems. Technical report: RDS Assoc., Rocky Hill, CT 06067, U.S.A.
Funahashi, K. I. (1989) On the approximate realization of continuous mapping by neural networks. Neural Nets. 2, 183-192.
Gartner, N. H., Tarnoff, P. J. and Andrews, C. M. (1991) Evaluation of optimized policies for adaptive control strategy. Transportation Research Record 1324, pp. 105-114.
Hunt, P. B., Robertson, D. I., Bretherton, R. D. and Winton, R. I. (1981) SCOOT - a traffic responsive method of coordinating signals. Transport and Road Research Lab., Crowthorne, U.K.
Kelsey, R. L. and Bisset, K. R. (1993) Simulation of traffic flow and control using fuzzy and conventional methods. Fuzzy Logic and Control, eds M. Jamshidi et al., Ch. 12. Prentice Hall, Englewood Cliffs, NJ.
Martin, P. J. and Hockaday, S. L. (1995) SCOOT - an update. ITE Journal, pp. 44-48.
Messmer, A. and Papageorgiou, M. (1994) Automatic control methods applied to freeway network traffic. Automatica 30, 691-702.
Nataksuji, T. and Kaku, T. (1991) Development of a self-organizing traffic control system using neural network models. Transportation Research Record 1324, pp. 137-145.
Newell, G. F. (1989) Theory of Highway Traffic Signals. Institute of Transportation Studies, University of California, Berkeley.
Papageorgiou, M. (1990) Dynamic modeling, assignment, and route guidance in traffic networks. Transportation Research-B 24B, 471-495.
Papageorgiou, M., Messmer, A., Azema, J. and Drewauz, D. (1995) A neural network approach to freeway network traffic control. Control Engineering Practice 3, 1719-1726.
Rathi, A. K. (1988) A control scheme for high traffic density sectors. Transportation Research-B 22B, 88-101.
Ritchie, S. G. (1990) A knowledge-based decision support architecture for advanced traffic management. Transportation Research-A 24A, 27-37.
Rowe, E. (1991) The Los Angeles automatic traffic surveillance and control system. IEEE Transactions on Vehicular Tech. 40, 16-20.
Santiago, A. J. and Smith, S. E. (1991) Evaluation of the highway capacity manual procedure for signal design. ITE 1991 Compendium of Technical Papers, pp. 239-243.
Spall, J. C. (1992) Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control 37, 332-341.
Spall, J. C. and Cristion, J. A. (1994) Nonlinear adaptive control using neural networks: estimation with a smoothed simultaneous perturbation gradient approximation. Statistica Sinica 4, 1-27.
Spall, J. C. and Cristion, J. A. (1995) Model-free control of nonlinear stochastic systems in discrete time. Proc. of the IEEE Conf. on Decision and Control, pp. 2199-2204.
Spall, J. C. and Cristion, J. A. (1997) A neural network controller for systems with unmodeled dynamics with applications to wastewater treatment. IEEE Transactions on Systems, Man, and Cybernetics-B 27, 369-375.
U.S. Dept. of Transportation (1991) Transyt-7F User’s Guide (Methodology for Optimizing Signal Timing, MOST Vol. 4).
U.S. Dept. of Transportation (1995) Intelligent Transportation System (ITS) Projects, Publication no. FHWA-JPO-95-001.
*One notable exception to this would be the synchronization of timings for signals along one or more arteries within the system, where this is desirable.
*Theory given in, say, Funahashi (1989) shows that any reasonable mathematical function can be approximated to a high level of accuracy by a NN if (and only if) the weights are properly estimated. In our case, the NN is being used to
†Note that SPSA is fundamentally different from infinitesimal perturbation analysis (IPA) (or other PA approaches) although the algorithms share one word in their names. SPSA uses only loss function evaluations in its optimization while IPA uses the gradient of the loss function. For traffic control problems, requiring the gradient is equivalent to requiring a network-wide model of the system; evaluating the loss function alone does not require a model. The lack of a gradient also precludes the use of such standard NN training algorithms as backpropagation.
*We must emphasize that although there is a fixed value of $\theta$ after training is complete, the signal timings given by $u(\cdot)$ will generally change throughout the period (possibly on a cycle-to-cycle basis) to adapt to instantaneous fluctuations in traffic conditions, i.e. the function $u(\cdot)$ is the same during the time period of interest, but the specific output values of $u(\cdot)$ will change during the period as the traffic conditions change. If necessary, this idea can perhaps be made clearer by viewing the NN control $u(\cdot)$ with specified weights as analogous to a polynomial function with specified coefficients. For a fixed set of coefficients, the value of the polynomial will change as the value of the independent variable changes. In contrast, a change in the coefficient values represents a change in the polynomial function itself. The former case is analogous to what happens in producing instantaneous controls for a fixed weight vector (the lower loop in Fig. 1) and the latter case is analogous to what happens as the NN undergoes its day-to-day training (the upper loop in Fig. 1).
*The traffic queues were approximated from the assumed travel time, the upstream and downstream loop-counts, the downstream traffic signal phases, and the depletion process. Also, a queue represents the total number of cars on a road sector at each intersection without being further divided into lane counts.