Contribution - Multi-agent traffic control using game theory and reinforcement learning

4.4 Summary

4.4.2 Contribution

[Thesis group 2] I developed a SARL method to generate the optimal decisions in a cooperative environment. To improve the efficiency of traffic flow control, I then proposed and examined a novel MARL method, i.e., SNASBQ. Finally, I compared the RL methods with the game theory-based and constant strategies in Thesis 1. In terms of averaged queues and throughput, all the other strategies outperform the constant strategy. Although the SARL and SNAQ methods perform worse than the game theory-based strategies in traffic control, the proposed SSBQ algorithm performs similarly to the game theory-based strategies but with less computation.

[Thesis 2.1] I developed a SARL method in which the traffic intersection is regarded as a "central agent" that controls the signal status of all incoming traffic links. The central agent learns the optimal policy and updates the Q-function with the maximum operator. The agent thus takes the actions corresponding to the maximum Q-values in the Q-function so that as many vehicles as possible pass through the intersection.

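The max-operator update mentioned in Thesis 2.1 can be illustrated with a minimal tabular sketch. It assumes a dictionary-based Q-function, a learning rate `alpha` and a discount factor `gamma`; the state names, signal-phase names and reward value are hypothetical placeholders and do not come from the thesis.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step that bootstraps with the maximum operator."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

def greedy_action(Q, state, actions):
    """Pick the action with the maximum Q-value in the given state."""
    return max(actions, key=lambda a: Q[(state, a)])

# Hypothetical example: two signal phases, reward = vehicles that passed.
Q = defaultdict(float)
actions = ["phase_A", "phase_B"]
q_update(Q, "s0", "phase_A", reward=4.0, next_state="s1", actions=actions)
```

After this single update, `greedy_action(Q, "s0", actions)` selects `"phase_A"`, because it is the only action in `s0` with a positive Q-value.
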
[Thesis 2.2] I proposed a novel MARL method, i.e., SNAQ, that integrates game theory and RL to improve the traffic situation in a non-cooperative environment. It updates the Q-function based on the Nash equilibrium of the current Q-values and, because of the informational non-uniqueness in the learning process, selects the Nash equilibrium solution with cooperative behaviour. I then extended SNAQ to the SSBQ algorithm with hierarchical equilibrium solutions, in which the Nash equilibrium is replaced by the Stackelberg equilibrium in the Q-learning process.

Related own publications

(Guo and Harmati, 2020c); (Guo and Harmati, 2020a); (Guo and Harmati, 2020b).

Chapter 5 Traffic Lane-changing Control with a Game Theory-based Decomposition Algorithm

In game theory-based lane-changing systems, the lane-changing process is formulated as a non-zero-sum non-cooperative or cooperative game in which the vehicles are the players. However, almost all proposed game theory-based models are constructed as two-player games or consider merging only between two traffic lanes, as mentioned in Section 1.2.2. Such approaches are therefore of limited use in scenarios where vehicles in multiple traffic lanes have many demands for MLC. Moreover, the space complexity of a single game can rise exponentially with the number of players, which makes the computation inefficient.
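A rough worked example of this growth follows. It assumes that every player chooses among the z = 7 actions listed later in Table 5.1, and that a single normal-form game stores one payoff per player for every joint action. The helper `payoff_table_entries` is hypothetical and only illustrates the arithmetic.

```python
def payoff_table_entries(num_players, num_actions=7):
    """Number of payoff values in one normal-form game (assumed storage model)."""
    joint_actions = num_actions ** num_players  # grows exponentially in the players
    return num_players * joint_actions

for players in (2, 4, 8):
    print(players, payoff_table_entries(players))
# 2 98
# 4 9604
# 8 46118408
```

Under these assumptions, going from two to eight players raises the table size from 98 to more than 46 million entries, which is the motivation for decomposing the game.
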

To fill these gaps, a Game Theory-based Decomposition Algorithm (GTDA) for more than two players is proposed and examined in this chapter.

This chapter first formulates a novel cellular lane-changing model in which the traffic lanes are discretized into cells; this representation is chosen for its simplicity. Then, different lane-changing control strategies (i.e., a rule-based approach, a classic Nash equilibrium approach and the proposed novel algorithm, GTDA) are presented, and both MLC and DLC maneuvers are considered to control the traffic flow reasonably and efficiently.

Next, the chapter describes the experimental setup, analyzes the results and compares the performance of all the presented strategies. Finally, the reasons for the presented results and the limitations and potential of the proposed algorithm are discussed.

5.1 Model Formulation

Fig. 5.1 shows the schematic of a basic traffic lane-changing system and the actions that a vehicle can take. The traffic lanes are discretized into cells by the dashed lines, and each cell can contain at most one vehicle. The length of a cell is determined by the speed limit and the safe distance between vehicles, which can be defined differently for different application scenarios such as freeways, urban roads, and traffic zones with different driving rules. The number of cells in each traffic lane is thus also determined by the size of the traffic zone. The width of a cell equals the width of a single traffic lane. The lane-changing system is divided into three sectors, i.e., the incoming queue sector behind the red line, the outgoing sector before the green line and the lane-changing sector in the middle. The incoming queue sector gathers all the waiting vehicles of each lane, defined as incoming queues. The outgoing sector is for passing vehicles with an intended turn, i.e., the vehicles must complete the lane change and arrive at the target lane within the lane-changing sector before entering the outgoing sector. As shown in Fig. 5.1, the system has four categories of turning lanes, i.e., U-turn and left-turn for lane 1, left-turn for lane 2, straight for lanes 3 and 4, and right-turn for lane 5. The lane-changing sector in the middle can be represented mathematically as a matrix Cel. Each cell is addressed as Cel_{i,j} by the lane index (i.e., x-axis) i ∈ Ω_N = {1, 2, ..., n} (n = 5 in this system) and the position along the lane (i.e., y-axis) j ∈ Ω_J = {1, 2, ..., J} (J = 26 in this system). Whether there is a vehicle in the cell can be expressed as follows:

\[
Cel_{i,j} =
\begin{cases}
0, & \text{no vehicle} \qquad (5.1a) \\
1, & \text{one vehicle} \qquad (5.1b)
\end{cases}
\]

Fig. 5.1: The schematic of the lane-changing system.

A normal vehicle in the position Cel_{2,1} is shown in Fig. 5.1 with all its possible actions (the green numbers correspond to the indices of the actions in Table 5.1). The other vehicles carry red marks (i.e., the value of the target lane) and take the corresponding actions to reach the target lane from the current lane according to the surrounding environment. For example, the vehicle with the element L in the position Cel_{4,3} intends to turn left, so it approaches lane 2 to complete an MLC maneuver. After arriving at lane 2, it evaluates, based on the traffic situation, whether a DLC maneuver to lane 1 is needed. The value of the target lane can thus be stored in a basic character matrix Tag corresponding to the matrix Cel, as follows:

\[
Tag_{i,j} =
\begin{cases}
T \in \{1\} & \qquad (5.2a) \\
L \in \{1, 2\} & \qquad (5.2b) \\
S \in \{3, 4\} & \qquad (5.2c) \\
R \in \{5\} & \qquad (5.2d)
\end{cases}
\]

where i, j are the same indices as in the matrix Cel. The elements T (U-turn), L (left-turn), S (straight) and R (right-turn) are the values of the target lane, each of which can be converted to a number (i.e., the index of the target lane). For example, the element L can be converted to either 1 or 2 according to the traffic situation.
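The two matrices can be sketched as nested lists, with n = 5 lanes and J = 26 cells as stated for this system. This is only an illustrative data layout: the helper names `empty_grid`, `place_vehicle` and `needs_mlc` are hypothetical, and the rule used in `needs_mlc` (a vehicle needs an MLC when its current lane is not among the lanes allowed by its tag) is an assumption derived from (5.2a) to (5.2d).

```python
N_LANES = 5   # n in the text
N_CELLS = 26  # J in the text

# Candidate target lanes for each tag, following (5.2a)-(5.2d).
TARGET_LANES = {"T": {1}, "L": {1, 2}, "S": {3, 4}, "R": {5}}

def empty_grid(fill):
    """Create an n x J matrix filled with the given value."""
    return [[fill for _ in range(N_CELLS)] for _ in range(N_LANES)]

def place_vehicle(cel, tag, lane, cell, target):
    """Mark cell (lane, cell) as occupied; indices are 1-based as in the text."""
    if target not in TARGET_LANES:
        raise ValueError(f"unknown tag {target!r}")
    if cel[lane - 1][cell - 1] == 1:
        raise ValueError("each cell can contain only one vehicle")
    cel[lane - 1][cell - 1] = 1
    tag[lane - 1][cell - 1] = target

def needs_mlc(tag, lane, cell):
    """Assumed rule: MLC is needed when the lane is not allowed by the tag."""
    return lane not in TARGET_LANES[tag[lane - 1][cell - 1]]

cel = empty_grid(0)
tag = empty_grid(None)
place_vehicle(cel, tag, lane=4, cell=3, target="L")  # the vehicle at Cel_{4,3}
```

For the vehicle placed at Cel_{4,3} with tag L, `needs_mlc(tag, 4, 3)` is true, since lane 4 is not in {1, 2}.
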

The new coordinates of the vehicles and the corresponding values of the target lanes in the next iteration are updated as in (5.3) and (5.4):

\[
Cel(t+1) = f\big(Cel(t), a_{1,1}[\ell], \ldots, a_{i,j}[\ell], \ldots, a_{n,J}[\ell]\big), \quad \forall i \in \Omega_N,\ \forall j \in \Omega_J,\ \forall a[\ell] \in \Omega_A \qquad (5.3)
\]

\[
Tag(t+1) = g\big(Tag(t), a_{1,1}[\ell], \ldots, a_{i,j}[\ell], \ldots, a_{n,J}[\ell]\big), \quad \forall i \in \Omega_N,\ \forall j \in \Omega_J,\ \forall a[\ell] \in \Omega_A \qquad (5.4)
\]

where Cel(t+1) and Tag(t+1) are the matrices of the next state, and f(·) and g(·) are functions that map the previous matrices Cel(t) and Tag(t) into the new matrices Cel(t+1) and Tag(t+1), respectively, under the corresponding joint actions. Specifically, the joint actions of all vehicles in the lane-changing sector are (a_{1,1}[ℓ], ..., a_{i,j}[ℓ], ..., a_{n,J}[ℓ]), where a_{i,j}[ℓ] represents an action a[ℓ] (ℓ is the index of the action in Table 5.1) of the vehicle at the position (i, j), and a[ℓ] ∈ Ω_A = {a[1], a[2], ..., a[ℓ], ..., a[z]} is the set of vehicle actions. An action can also be represented by its lateral velocity, with the directional signs '−' for left and '+' for right, and its longitudinal velocity, with the signs '+' for up and '−' for down, as shown in Table 5.1.

Fig. 5.2: Collision cases in the lane-changing system. (a) Position collision of mobile vehicles from the same layer. (b) Position collision of mobile vehicles from different layers. (c) Path collision of mobile vehicles. (d) Boundary collision of mobile vehicles.
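The update maps f(·) and g(·) of (5.3) and (5.4) can be sketched as a single function that moves every vehicle by the velocities of its action in Table 5.1. This is a simplified illustration, not the thesis implementation: the function name `step` is hypothetical, the joint action is assumed to be collision-free, the cell index j is assumed to increase towards the outgoing sector, and a vehicle that moves past the last cell is assumed to leave the lane-changing sector.

```python
# (lateral, longitudinal) velocity in cells/iteration per action index (Table 5.1).
ACTIONS = {
    1: (-1, +1),  # Turn Left
    2: (0, +1),   # Remain
    3: (+1, +1),  # Turn Right
    4: (-1, +2),  # Accelerate and Turn Left
    5: (0, +2),   # Accelerate
    6: (+1, +2),  # Accelerate and Turn Right
    7: (0, 0),    # Decelerate
}

def step(cel, tag, joint_action):
    """Return (Cel(t+1), Tag(t+1)); joint_action maps 1-based (i, j) to an index."""
    n_lanes, n_cells = len(cel), len(cel[0])
    new_cel = [[0] * n_cells for _ in range(n_lanes)]
    new_tag = [[None] * n_cells for _ in range(n_lanes)]
    for i in range(1, n_lanes + 1):
        for j in range(1, n_cells + 1):
            if cel[i - 1][j - 1] == 0:
                continue  # empty cell, nothing to move
            lateral, longitudinal = ACTIONS[joint_action[(i, j)]]
            ni, nj = i + lateral, j + longitudinal
            if not 1 <= ni <= n_lanes:
                raise ValueError(f"vehicle at {(i, j)} would leave the road")
            if nj > n_cells:
                continue  # assumed: the vehicle enters the outgoing sector
            if new_cel[ni - 1][nj - 1] == 1:
                raise ValueError(f"two vehicles target cell {(ni, nj)}")
            new_cel[ni - 1][nj - 1] = 1
            new_tag[ni - 1][nj - 1] = tag[i - 1][j - 1]  # the tag moves with the vehicle
    return new_cel, new_tag

# The vehicle with tag L at Cel_{4,3} turns left (action 1) and reaches Cel_{3,4}.
cel = [[0] * 26 for _ in range(5)]
tag = [[None] * 26 for _ in range(5)]
cel[3][2], tag[3][2] = 1, "L"
new_cel, new_tag = step(cel, tag, {(4, 3): 1})
```

Raising an error on a conflicting move is a design choice of this sketch only; it keeps the transition itself simple and leaves conflict resolution to the control strategy.
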

Table 5.1: Actions of vehicles

| Index (ℓ) | Action (a[ℓ]) | Lateral velocity (cells/iteration) | Longitudinal velocity (cells/iteration) |
|---|---|---|---|
| 1 | Turn Left | -1 | +1 |
| 2 | Remain | 0 | +1 |
| 3 | Turn Right | +1 | +1 |
| 4 | Accelerate and Turn Left | -1 | +2 |
| 5 | Accelerate | 0 | +2 |
| 6 | Accelerate and Turn Right | +1 | +2 |
| 7 | Decelerate | 0 | 0 |

The vehicles close to the outgoing sector have less time to turn into the target lane than those behind them, so they have higher priority when conflicts occur with vehicles behind them. At the end of a period, the positions of the vehicles in the road map are updated with the generated strategy. New variables, i.e., the queues, can then be computed from the updated positions. The total queue of the lane-changing sector can be represented as L_V = ∑_{i=1}^{n} L_i, indicating how many vehicles are in the lanes, where L_i represents the queue of lane i, i ∈ Ω_N = {1, 2, ..., n}. The queue of each lane can be expressed as follows:

\[
L_i = \sum_{j=1}^{J} Cel_{i,j} \qquad (5.5)
\]
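As a small illustration, the queue of a lane is the sum of the occupancy values over all cells of that lane, and the total queue is the sum over the lanes. The sketch assumes Cel is stored as a list of rows, one row per lane; the function names and the 3 x 4 example matrix are hypothetical.

```python
def lane_queues(cel):
    """L_i for every lane i: the number of occupied cells in that lane."""
    return [sum(row) for row in cel]

def total_queue(cel):
    """L_V: the number of vehicles in the whole lane-changing sector."""
    return sum(lane_queues(cel))

# Hypothetical 3-lane, 4-cell occupancy matrix.
example_cel = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]
print(lane_queues(example_cel))  # [2, 0, 3]
print(total_queue(example_cel))  # 5
```
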

This lane-changing system is subject to an unavoidable constraint, i.e., the collision problem. Fig. 5.2 shows the possible collision cases of the vehicles driving in the traffic lanes, which fall into four cases. Fig. 5.2(a) shows a position collision between two mobile vehicles from the same layer (i.e., row): the vehicle in lane i−1 is about to move to the neighbouring lane i with the Turn Right action a[3] of Table 5.1, while the vehicle in lane i+1 of the same layer plans to move to lane i with the Turn Left action a[1], so both vehicles would end up in the same position in lane i. As Fig. 5.2(b) shows, position collisions can also occur between different layers, e.g., when one vehicle accelerates (i.e., actions such as a[4], a[5] and a[6]) in lane i and another vehicle in the adjacent lane heads for the same destination. Fig. 5.2(c) shows a typical example of a path collision, i.e., a pairwise collision that can occur where the trajectories of vehicles driving from different lanes cross. Finally, Fig. 5.2(d) shows that the first and the last traffic lanes have boundaries that block the vehicles, so vehicles cannot pass beyond them.
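The four cases can be checked on a candidate joint action with a short sketch. The function `find_collisions` and the case labels are hypothetical. The path-collision test is an assumption made here for illustration: two vehicles from the same layer that swap lanes are treated as crossing trajectories, which may be narrower than the definition used in the thesis.

```python
from itertools import combinations

# (lateral, longitudinal) velocity per action index, as in Table 5.1.
ACTIONS = {1: (-1, 1), 2: (0, 1), 3: (1, 1), 4: (-1, 2),
           5: (0, 2), 6: (1, 2), 7: (0, 0)}

def destination(position, action_index):
    """Cell reached from a 1-based (lane, cell) position with the given action."""
    lateral, longitudinal = ACTIONS[action_index]
    return (position[0] + lateral, position[1] + longitudinal)

def find_collisions(joint_action, n_lanes):
    """Return a list of (case, vehicles) found in {position: action index}."""
    found = []
    dest = {p: destination(p, a) for p, a in joint_action.items()}
    for p, d in dest.items():
        if not 1 <= d[0] <= n_lanes:
            found.append(("boundary", (p,)))  # case (d)
    for p, q in combinations(sorted(joint_action), 2):
        if dest[p] == dest[q]:
            # cases (a) and (b): same target cell
            same_layer = p[1] == q[1]
            case = "position_same_layer" if same_layer else "position_different_layer"
            found.append((case, (p, q)))
        elif p[1] == q[1] and dest[p][0] == q[0] and dest[q][0] == p[0]:
            # case (c), assumed form: the two vehicles swap lanes
            found.append(("path", (p, q)))
    return found

# Fig. 5.2(a)-like situation: lanes 1 and 3 both move into lane 2.
print(find_collisions({(1, 1): 3, (3, 1): 1}, n_lanes=5))
# [('position_same_layer', ((1, 1), (3, 1)))]
```

A control strategy could use such a check to discard or re-plan joint actions before applying the state update.
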
