
Traffic Data Imputation using Deep Convolutional Neural Networks

Ouafa Benkraouda1, Bilal Thonnam Thodi1, Hwasoo Yeo3, Monica Menendez4, and Saif Eddin Jabari?,1,4

1 University of Illinois at Urbana-Champaign, Champaign, IL 61820, U.S.A.
2 New York University Tandon School of Engineering, Brooklyn, NY, U.S.A.

3 Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
4 New York University Abu Dhabi, Saadiyat Island, P.O. Box 129188, Abu Dhabi, U.A.E.

Abstract

We propose a statistical learning-based traffic speed estimation method that uses sparse vehicle trajectory information. Using a convolutional encoder-decoder based architecture, we show that a well trained neural network can learn spatio-temporal traffic speed dynamics from time-space diagrams. We demonstrate this for a homogeneous road section using simulated vehicle trajectories, and then validate it using real-world data from NGSIM. Our results show that with probe vehicle penetration levels as low as 5%, the proposed estimation method can provide a sound reconstruction of macroscopic traffic speeds and reproduce realistic shockwave patterns, implying applicability in a variety of traffic conditions. We further discuss the model's reconstruction mechanisms and confirm its ability to differentiate various traffic behaviors such as congested and free flow traffic states, transition dynamics, and shockwave propagation.

Keywords: Convolutional neural networks, data expansion, data imputation, estimation, filtering, traffic dynamics, traffic state estimation.

1 Introduction

Recent studies have shown that connected and autonomous vehicles (CAVs) can ease traffic flow instabilities at low CAV penetration levels [5, 33, 34, 39]. To do so, these systems require accurate knowledge of traffic conditions. In general, most advanced traffic management and control tools require accurate inputs. However, measurements remain sparse in today's road traffic networks, and high penetration levels of CAVs are not expected in the coming decade or two. In the meantime, traffic state reconstruction methods are needed to fill this gap. Most traffic state estimation techniques in the literature have focused on reproducing traffic conditions at aggregate levels (averaged over segment lengths greater than 100 meters and time intervals greater than one minute [23, 28]). Advanced traffic management tools (e.g., adaptive traffic signal control) require knowledge of traffic conditions at a much finer scale [9, 17, 19, 42, 43]; see [30] for a comprehensive review of past traffic state estimation methods.

? Corresponding author, Email: [email protected]

We present here a data-driven methodology to estimate traffic speeds across a road section at fine spatio-temporal resolutions using limited probe vehicle information. Characterizing traffic speed across small intervals of space and time is especially challenging because of the inherent stochastic nature of traffic flow and noisy, sparse traffic observations [15, 16, 41]. Researchers often resort to different statistical learning methods to model traffic speed variations from historical datasets [13, 21, 25].


However, the majority of these methods only succeed in capturing recurring traffic patterns such as daily variations or within-day variations, and do not necessarily learn the actual flow dynamics. A few studies have focused on modeling the sharp discontinuities found in spatio-temporal traffic dynamics (e.g., those caused by the transition from free flow to a congested state). Nonetheless, the low interpretability of these non-linear models poses reliability and robustness issues when applied [20, 28, 37].

Various kernel-based estimation schemes have also been proposed to interpolate spatio-temporal macroscopic traffic speeds from heterogeneous data sources, mainly loop detectors and probe vehicles [1, 4, 6, 35, 36]. However, these interpolation methods need field calibration of static model parameters (offline or online), such as shockwave speeds and free flow speeds, which can change dynamically depending on the local traffic conditions, leading to biased results.

In this paper, we propose a method for reconstructing traffic speed dynamics using vehicle trajectory information obtained from sparse probe data (with as low as a 5% penetration rate). We employ a deep convolutional neural network (CNN) [8, 24] to learn the traffic speed dynamics from a time-space diagram plotted using partial observations of vehicle trajectories. Using an encoder-decoder architecture [26], we develop a CNN that learns the local spatial and temporal correlations (or features) from the time-space traffic speed profiles, and reconstructs the full (macroscopic) traffic speed dynamics for any given sparse vehicle trajectories. We show that a well trained deep CNN can learn to reproduce short-term non-recurring traffic features such as congestion waves and free flow speed dynamics, which state-of-the-art estimation methods often fail to capture. The proposed learning method can adapt to local traffic speed conditions, and reproduce observed shockwave patterns and stop-and-go traffic. We further provide critical insights into the learned neural network model to understand the essential features detected while reconstructing the full speed map. Such insights can help unlock some of the black-box features of neural networks when applied to traffic state estimation problems.

In the rest of the paper, we formally state the traffic speed estimation problem and motivate the use of an encoder-decoder based convolutional neural network architecture to learn the spatio-temporal traffic speeds in Sec. 2. We perform numerical experiments using the proposed model on a homogeneous road section and discuss its efficacy in reconstructing the traffic speed maps on a simulated scenario, as well as on the Next Generation Simulation (NGSIM) dataset, in Sec. 3. This is followed by a brief discussion of the model's interpretability and the role played by the hidden layers in the estimation of speed dynamics. We conclude the paper with a discussion of the main insights and suggest potential future research directions in Sec. 4.

2 Methodology

2.1 Traffic Speed Estimation Problem

Consider a set of vehicles N passing through a given road section of length X = |𝒳| over a time period T = |𝒯|. Let (x_i, t_i, v_i) denote the coordinates of vehicle i ∈ N, where x_i and v_i represent its relative longitudinal position (with respect to the starting position of the road section) and its speed at time t_i. We denote by G := {(x_i, t_i, v_i) | i ∈ N} the set of all N vehicle trajectories. Let Gp ⊆ G represent the trajectories of sampled vehicles in N (with sampling percentage p), within the space-time domain 𝒳 × 𝒯. Note that the cardinalities of G and Gp can vary depending on the number of vehicles in the system and the resolution of vehicle coordinates (the sampling cadence).

The estimation problem is to determine an arbitrary function from the domain of partial trajectories Gp to the full trajectories G (or some function of G, such as macroscopic traffic speeds or traffic density). However, the variable (and not necessarily known) size of G (and Gp) presents a challenge, as a unique mapping function may not exist. Hence, we represent G (and Gp) on a two-dimensional space-time plane as plotted vehicle trajectories, color coded with the respective speed data; see Fig. 1 for an example.

To interpret this graphical representation, we discretize the entire space-time domain into finite regions called cells, each of which has dimensions x and τ, the cell's length and time duration, respectively.


(a) All vehicles. (b) 10% sampled vehicles.

Figure 1: Vehicle trajectories plotted on a space-time diagram and color coded with speeds (X = 800 m, T = 60 sec); axes show time (secs) against space (m), with a color bar for speed (kmph). We convert these into three-dimensional tensors, which serve as output and input, respectively, to the speed reconstruction model.

The values in each cell represent the traffic state in that cell, defined by a three-scale color value based on the RGB format. The three-scale color representation of traffic states allows us to clearly differentiate between the empty cells (where there are no vehicles), free-flowing cells (where vehicles travel at the free-flow speed), and heavily congested cells (where vehicles cannot move). For example, empty cells are represented by a white color with RGB value (255, 255, 255), whereas heavily congested cells are represented by a red color with RGB value (231, 11, 5). Colors ranging between red and blue represent traffic states in between free-flow and heavily congested. Using this representation, we encode the vehicle trajectory information as a three-dimensional tensor, denoted by z ∈ {1, . . . , 255}^{X×T×C}, where C is the number of color channel arrays. Note here that when treating the time-space diagrams as images using RGB, the tensor values are restricted to the set {1, . . . , 255}. Let zp denote the respective tensor with partial trajectory information, and zf the tensor with full vehicle trajectory information. The traffic state estimation problem can be stated as one that seeks to determine a mapping function g such that g(zp) = zf, where zp and zf are now fixed-sized tensors. We refer to the mapping function g as the traffic speed reconstruction model.
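To make the encoding concrete, the sketch below rasterizes a set of sparse probe records (x_i, t_i, v_i) into such a tensor. It is a minimal illustration rather than the authors' code: the 10 m and 1 sec cell sizes and the 0-60 kmph speed range are taken from Sec. 3.1, while the simple linear red-to-blue color ramp is an assumed simplification of the color map described above.

import numpy as np

def rasterize_trajectories(records, X=800.0, T=60.0, dx=10.0, dt=1.0, v_max=60.0):
    """Encode sparse probe records (x_i, t_i, v_i) as an RGB space-time tensor.

    Empty cells are white (255, 255, 255); speed 0 maps to red and v_max to
    blue via a linear ramp (a simplification of the paper's color coding).
    """
    nx, nt = int(X / dx), int(T / dt)                # e.g., 80 x 60 cells
    z = np.full((nx, nt, 3), 255, dtype=np.uint8)    # start with empty (white) cells
    for x, t, v in records:
        i, j = int(x // dx), int(t // dt)
        if 0 <= i < nx and 0 <= j < nt:
            w = np.clip(v / v_max, 0.0, 1.0)         # 0 = congested, 1 = free flow
            r, b = int(255 * (1 - w)), int(255 * w)
            z[i, j] = (max(r, 1), 1, max(b, 1))      # values kept in {1, ..., 255}
    return z

# Example: a single probe record at x = 120 m, t = 14 sec, moving at 35 kmph.
example = rasterize_trajectories([(120.0, 14.0, 35.0)])
print(example.shape)  # (80, 60, 3)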

2.2 Representation of Speed Reconstruction Model

We approximate the speed reconstruction function g using a neural network model. We chose an encoder-decoder convolutional neural network architecture [8, 26] to represent the reconstruction model, as shown in Fig. 2a. The encoder model ge extracts the primary features from the sparse input vehicle trajectory tensor zp and maps them into a latent space representation h, which can be thought of as an abstract projection of the sparse vehicle trajectory information contained in zp onto some higher dimensional space (often uninterpretable). The decoder model, gd, uses the information contained in h to reconstruct the output tensor zf, which in this case is the macroscopic traffic speed map.

The encoder model (ge) is a sequence of CNN layers Λ stacked horizontally, where each layer l ∈ Λ consists of three operations: convolution, non-linear activation, and down-sampling (up-sampling for the decoder model); see Fig. 2b. The convolution operation is performed using a set of layer-specific kernels Θ^{(l)} ≡ [Θ_k^{(l)}]_{k ∈ 𝒦^{(l)}}, where 𝒦^{(l)} denotes the set of kernels in layer l. Each kernel is a tensor Θ_k^{(l)} ∈ R^{M×N×C}, where M × N is the kernel size (a hyper-parameter) and C is the number of color channels used in the feature maps. The convolution operation is given by

z_k^{(l),c} = Σ_{i ∈ ℳ^{(l−1)}} z_i^{(l−1),in} ∗ Θ_k^{(l)},   k = 1, . . . , K,  l ∈ Λ,   (1)

where z_i^{(l−1),in} is the i-th input feature map, ℳ^{(l−1)} is the set of feature maps in layer l − 1, and z_k^{(l),c} is the convolved feature map produced using the k-th kernel in layer l.


(a) Convolutional encoder-decoder architecture: input zp → encoder model ge → latent h → decoder model gd → output zf, each model a stack of CNN layers.
(b) A single CNN layer in the encoder model: 2D convolution, non-linear activation, then down-sampling (subsampling).
(c) A single CNN layer in the decoder model: 2D convolution, non-linear activation, then up-sampling.

Figure 2: Representation of the speed reconstruction model using a convolutional encoder-decoder neural network architecture.

This is followed by a clipping operation using a non-linear activation function σ:

z_k^{(l),a} = σ(z_k^{(l),c}),   k = 1, . . . , K,  l ∈ Λ,   (2)

where z_k^{(l),a} is the k-th activated feature map in layer l. The kernels Θ^{(l)} represent the primary feature detectors in layer l, where each kernel Θ_k^{(l)} detects distinct patterns (or features) from the input feature maps z^{(l),in} ≡ [z_i^{(l)}]_{i ∈ ℳ^{(l)}}. For instance, the initial layer can detect discrete traffic states such as free flow and congested traffic. Successive layers detect more complex patterns such as shockwave propagation and transient dynamics. We provide a more detailed interpretation of the kernels in the Results and Discussion section.

In the encoder model (Fig. 2a), the activated feature maps undergo a down-sampling operation to produce an abstract (lower dimensional) representation of the feature activations, as shown below:

z^{(l),dn} = p_dn(z^{(l),a}) = max_{m,n} (z^{(l),a} · u(m, n)),   l ∈ Λ,   (3)

where u(m, n) is a two-dimensional spatial window applied to the input tensor z^{(l),a}. Note that these depend on the kernel dimensions M × N. The down-sampling in (3) returns the largest value of z^{(l),a} (a.k.a. the max pooling function [29]). Note that (3) reduces the dimension of z^{(l),a} by a factor of (m × n), where (m, n) represents the max-pooling filter size. The max-pooling operation summarizes the primary features detected over various spatial regions in the time-space diagram using the information contained in its smaller sub-regions.
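For concreteness, the following sketch applies Eqs. (1)-(3) to a single multi-channel input with NumPy and SciPy. The 5 × 5 kernel shape and (2, 3) pooling window mirror the first encoder layer in Table 1; the random weights and the 'same'-mode convolution are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.signal import convolve2d

def encoder_layer(z_in, kernels, pool=(2, 3)):
    """One encoder layer: Eq. (1) convolution, Eq. (2) ReLU, Eq. (3) max pooling.

    z_in    : input feature maps, shape (H, W, C_in)
    kernels : layer kernels, shape (M, N, C_in, K)
    """
    H, W, C_in = z_in.shape
    K = kernels.shape[-1]
    # Eq. (1): each output map is the sum of per-channel 2D convolutions.
    z_c = np.stack(
        [sum(convolve2d(z_in[:, :, i], kernels[:, :, i, k], mode="same")
             for i in range(C_in))
         for k in range(K)], axis=-1)
    # Eq. (2): ReLU activation.
    z_a = np.maximum(z_c, 0.0)
    # Eq. (3): max pooling over non-overlapping (m, n) windows.
    m, n = pool
    Hp, Wp = H // m, W // n
    z_dn = z_a[:Hp * m, :Wp * n].reshape(Hp, m, Wp, n, K).max(axis=(1, 3))
    return z_dn

# Example: an 80 x 60 RGB input and 8 random 5 x 5 kernels, as in layer CNN-1.
z0 = np.random.rand(80, 60, 3)
theta = np.random.randn(5, 5, 3, 8)
print(encoder_layer(z0, theta).shape)  # (40, 20, 8)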

The down-sampling operation can handle the sparsity in the vehicle trajectory information contained in the input tensor. We illustrate this using an arbitrary (sparse) input vehicle trajectory dataset, and its corresponding activated feature map obtained before and after the down-sampling operation, in Fig. 3. The traffic state information contained in cells (2, 3) and (3, 2) is absent from the activated feature maps (middle part of Fig. 3). However, the down-sampling operation provides an improved estimate of the states of these cells (as free flow and congested states, respectively) by utilizing information contained in neighboring cells (left part of Fig. 3). This partially explains why the spatial distribution of probe vehicles along a road section can affect state estimation accuracy [7, 18]. The output of the down-sampling operation, z^{(l),dn}, serves as the input to the next layer in the encoder model ge, or to the hidden state representation h. The decoder model gd also consists of a similar set of layers as ge, except that the down-sampling operation is replaced by an up-sampling operation; see Fig. 2b.


(Panels, left to right: the sparse input trajectory, the activated feature map before sub-sampling, and the feature map after sub-sampling, with cells labeled free-flow, congested, or transition.)

Figure 3: Illustrating the need for down-sampling to handle the sparsity in vehicle trajectory information. The down-sampling operation provides educated guesses of traffic states in each spatial region by inference from traffic states detected in smaller sub-regions.

The most common up-sampling operation is the nearest neighbor function [29], defined as

z^{(l),up} = p_up(z^{(l),a}) = z^{(l),a} ⊗ 1_{m′×n′},   l ∈ Λ,   (4)

where ⊗ is the Kronecker product and 1_{m′×n′} is an m′ × n′ matrix of ones. The dimensions m′ × n′ depend on the size of the full feature maps X × T. The up-sampling operation in (4), besides scaling the activated feature maps by (m′ × n′), enforces a plausible and logical reconstruction of the traffic state by drawing inferences from super-regions. For example, in the left part of Fig. 3, one can reconstruct the traffic state in its sub-regions by logically looking at the states in the neighboring regions.
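As a quick numerical illustration of Eq. (4), nearest-neighbor up-sampling is simply a Kronecker product with a matrix of ones; the NumPy sketch below (illustrative values only) doubles each feature-map cell in both dimensions.

import numpy as np

# A 2 x 2 activated feature map (illustrative values).
z_a = np.array([[0.2, 0.9],
                [0.0, 0.5]])

# Eq. (4): nearest-neighbor up-sampling by a factor of (2, 2)
# via the Kronecker product with a matrix of ones.
z_up = np.kron(z_a, np.ones((2, 2)))
print(z_up)
# [[0.2 0.2 0.9 0.9]
#  [0.2 0.2 0.9 0.9]
#  [0.  0.  0.5 0.5]
#  [0.  0.  0.5 0.5]]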

The motivation to use a CNN layer as opposed to a naive feed-forward neural network in the encoder-decoder architecture (Fig. 2) is the parameter sharing and sparse connectivity properties of CNNs [8]. For traffic data, parameter sharing implies that a specific set of learned features Θ^{(l)} (such as for recognizing the free flow speed, congested traffic, and transition dynamics) can be used anywhere in the time-space plane (in other words, it is translation invariant). Sparse connectivity ensures that the speed estimates in certain regions in space and time heavily depend on the states in the immediate (local) surroundings, which is especially true for traffic dynamics [36]. Moreover, the estimation from sparse data problem can be thought of as a spatial imputation/interpolation problem, to which CNNs are well suited.

2.3 Parameter Optimization

The output from the decoder model gd is the reconstructed speed map, denoted zf_Θ, and is a function of the kernel set Θ ≡ {Θ^{(1)}, . . . , Θ^{(|Λ|)}}. Thus, the traffic speed estimation problem is cast as a kernel set learning problem:

Θ* = arg min_Θ L(zf, zf_Θ),   (5)

where L denotes a loss function over the estimated and actual speed values. Here, (5) is a differentiable function in Θ^{(l)} ∈ Θ, and any gradient-based optimization technique can be used to find the optimal kernel set [8]. Note that (5) only optimizes the kernel set Θ, and does not include other hyper-parameters such as the number of CNN layers |Λ| in the encoder and decoder models (ge and gd), the number of kernels in each layer ({|𝒦^{(l)}|}_{l∈Λ}), the kernel dimensions (M and N), the down-sampling filter dimensions (m and n), and the up-sampling filter dimensions (m′ and n′). These hyper-parameters have to be fine-tuned separately.

3 Experiments and Results

Here we demonstrate the proposed speed reconstruction model for a homogeneous road segment (a freeway section) using both simulated data and NGSIM data. By looking at different initial and boundary conditions, we produce various traffic conditions corresponding to free-flow, congested, and stop-and-go traffic. The trajectories of all vehicles passing through the segment are recorded at a one-second cadence.

3.1 Learning Speed Reconstruction Model from Data

Using the simulated vehicle trajectories (for a 3-lane road that is 800 m long), we generate input-output samples for training the speed reconstruction model, with the following parameters: X = 800 m, T = 60 sec, x = 10 m, and τ = 1 sec. The input samples are generated using a p = 5% penetration rate sampled from an unknown probability distribution. This ensures that the sampling replicates actual field observations and that no bias in the spatial distribution of probe vehicles is introduced.


The output samples are generated using all vehicle trajectories. In each sample, the vehicle trajectories are color-coded using a fixed speed gradient of 0-60 kmph, where 0 kmph is represented by red and 60 kmph is represented by blue; see Fig. 1 for an example. These figures are then converted to 80 × 60 × 3 tensors, where 3 corresponds to the RGB channel arrays.

We now test our model using two different datasets. The first set is generated from the simulated vehicle trajectories for the same road section used to train the model, but under different traffic demand conditions. The second set is produced from the NGSIM vehicle trajectory dataset for I-80 [27]. This allows us not only to test the transferability of our model, but also to validate it against real-world field data and check its applicability to traffic behaviors not seen in the training sets. For both datasets we sample 5% of vehicle trajectories as input data.

The encoder and decoder models used here consist of three CNN layers each, and the kernel size, number of kernels, and sampling stride for each layer are shown in Table 1. We use a random search technique to determine the best number of layers, the number of kernels in each layer, and the kernel sizes. One can also resort to more sophisticated approaches such as Bayesian optimization to determine the optimal hyper-parameters [32]. For all the layers, we use a rectified linear unit (ReLU) activation (defined as ReLU(z) = max(z, 0) for any given input z), except for the last layer, which has a sigmoid activation function (defined as σ(z) = (1 + exp(−z))^{−1} for any given input z). As we treat the reconstructed speed maps as images, the (normalized) model outputs are bounded between 0 and 1, and this can be obtained using a sigmoid activation function.

We use a larger kernel size (5 × 5) for the initial and the final CNN layers, as opposed to the smaller (3 × 3) kernels commonly found in traditional image processing architectures (see [24] for instance). This is because our input tensor is very sparse (only 5-10% of the pixels have values other than 1), and since the convolution operation primarily detects local correlations in the data, smaller kernels may fail to detect essential features in the input tensor and can hence produce unrealistic results. We overcome this by resorting to larger kernels, which when convolved can better learn the spatio-temporal speed dynamics, especially the backward propagating shockwave patterns, even from sparsely observed speeds.

Notice that the number of kernels in the middle layers of the CNN (the bottleneck region) is larger than the number of kernels in the boundary layers of the CNN (see Table 1). This is attributed to the feature learning capability of each layer in the network. For instance, kernels in the initial layers can be considered as primary feature detectors, useful, for example, to detect colors (corresponding to congested and free-flowing traffic) and slow moving traffic, which are primitive and can be characterized using fewer kernels. However, subsequent layers use these primary features to capture more complex patterns such as free flowing regimes and transition dynamics, which may require a larger number of kernels to capture all relevant patterns. We use an iterative optimization algorithm known as Adam [22] to train the model. Adam optimization is based on first-order gradient descent with an adaptive learning rate, and is found to have good convergence rates in practice. The test results and discussion are presented below.
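For concreteness, the following is a minimal Keras sketch of this architecture and training setup. It follows the layer sizes in Table 1 (80 × 60 × 3 input, three encoder and three decoder CNN layers, sigmoid output) and uses Adam as described above; the 'same' padding, the mean-squared-error loss, and the batch size and epoch count are assumptions the paper does not specify, so this is an illustrative reconstruction rather than the authors' code.

from tensorflow.keras import layers, models

def build_reconstruction_model():
    """Encoder-decoder CNN following Table 1 (assumed 'same' padding)."""
    inp = layers.Input(shape=(80, 60, 3))                       # sparse input tensor z_p
    # Encoder: CNN-1 to CNN-3.
    x = layers.Conv2D(8, (5, 5), padding="same", activation="relu")(inp)
    x = layers.MaxPooling2D((2, 3))(x)
    x = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (5, 5), padding="same", activation="relu")(x)
    h = layers.MaxPooling2D((2, 2))(x)                          # latent h: (10, 5, 64)
    # Decoder: CNN-4 to CNN-6 with nearest-neighbor up-sampling.
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(h)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(x)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(8, (5, 5), padding="same", activation="relu")(x)
    x = layers.UpSampling2D((2, 3))(x)
    out = layers.Conv2D(3, (5, 5), padding="same", activation="sigmoid")(x)  # output z_f
    return models.Model(inp, out)

model = build_reconstruction_model()
# Eq. (5) with an assumed mean-squared-error loss L, optimized with Adam [22].
model.compile(optimizer="adam", loss="mse")
# z_p_train, z_f_train: normalized input/output tensors of shape (n, 80, 60, 3).
# model.fit(z_p_train, z_f_train, batch_size=32, epochs=50, validation_split=0.1)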

3.2 Results and Discussion

We first provide a visual comparison between the actual vehicle trajectories and the reconstructed speed maps for the test datasets, and quantify the reconstruction error using the Structural Similarity Index. We then analyze and interpret the model's reconstruction behavior by visualizing what the kernels learned from the data and its representation behavior in the latent space.

3.2.1 Reconstruction Performance

The test results are presented in Figures 4-6. From Fig. 4, we can see examples where the proposed approach reproduced the traffic speeds correctly. We observe that the model has learned to reconstruct the actual spatio-temporal speed maps and can clearly differentiate between the direction and regime of free flow (blue), congested (red), and transient dynamics (orange, yellow, and cyan).


Table 1: Hyper-parameters for the speed reconstruction model.

Layers | Convolution operation | Non-linearity | Sampling operation
Input layer | Identity | - | -
Encoder model:
CNN-1 | Convolution (5 × 5 × 8) | ReLU | Max-pooling (2 × 3)
CNN-2 | Convolution (5 × 5 × 32) | ReLU | Max-pooling (2 × 2)
CNN-3 | Convolution (5 × 5 × 64) | ReLU | Max-pooling (2 × 2)
Decoder model:
CNN-4 | Convolution (3 × 3 × 64) | ReLU | Nearest neighbour (2 × 2)
CNN-5 | Convolution (5 × 5 × 32) | ReLU | Nearest neighbour (2 × 2)
CNN-6 | Convolution (5 × 5 × 8) | ReLU | Nearest neighbour (2 × 3)
Output layer | Convolution (5 × 5 × 3) | Sigmoid (σ) | -

(Each panel is a time-space plot; rows show, respectively, the full vehicle trajectories, the sampled vehicle trajectories, and the reconstructed speed maps for examples (i)-(v).)

Figure 4: Selected examples where the model accurately reconstructed the macroscopic speed states (simulation data).

Interestingly, it succeeds in tracing backward propagating shockwave patterns by tying together the limited information obtained from the sparse trajectories. In other words, the model is able to reconstruct shockwave patterns with varying sizes and shapes depending on the local traffic conditions, rather than using a mere interpolation assuming a constant wave speed [2, 3, 10–12, 14, 34, 40].


(Each panel is a time-space plot; rows show, respectively, the full vehicle trajectories, the sampled vehicle trajectories, and the reconstructed speed maps for examples (i)-(v).)

Figure 5: Selected examples where the model partially failed to provide a sound reconstruction of traffic states (simulation data).

Fig. 4 (i, iv, v) shows a scenario where the model captured the dynamic stop-and-go traffic regime (red regions), which cannot be characterized by a single shockwave speed.

Moreover, the model cleverly extrapolates the shockwave occurrence using few speed measurements obtained from the transient dynamics (or the onset of congestion). This can be clearly seen in the lower pane of Fig. 4 (i, iii), where the probe trajectories only capture slowing traffic (cyan color), yet the model still managed to detect the occurrence of a shockwave and reconstruct it accordingly. Also, in cases when there was a sufficient number of representative trajectories, the neural network learned to reconstruct smaller shockwaves and their dissipation behavior, as seen from Fig. 4 (i, iv, v).

There are, however, a few cases where the method did not perform well. Where probe trajectories are very sparse, the method produces poor estimates of the propagation length and dissipation behavior of the shockwaves. As a result, the reconstructed upstream and downstream jams are either exaggerated or missed; see Fig. 5 (i, ii, v). Another shortcoming is the model's extrapolation of nonexistent shockwaves. The model learns that with the occurrence of any lower speeds (cyan or yellow), a congestion instance will take place, which is not always true (see Fig. 5 (i, iv)). This is caused by insufficient information about the local traffic conditions. The backward propagation of stop-and-go traffic depends on the local densities, and if the observed trajectories failed to capture this, then the model may overestimate a shockwave occurrence. Also, if the sample trajectories are very sparse (as in Fig. 5 (iii)), the model only succeeds in predicting the represented parts and fails elsewhere by completely ignoring the section for which no information exists.


(Each panel is a time-space plot; rows show, respectively, the full vehicle trajectories, the sampled vehicle trajectories, and the reconstructed speed maps for examples (i)-(v).)

Figure 6: Selected examples from the reconstruction of the NGSIM dataset.

Additionally, one can notice the random reconstruction of white areas, which depict the absence of vehicle trajectories. If these areas are too large, the model almost always fails in reconstructing them, as they do not follow any specific pattern. Notice, however, that the undesirable cases described above are very few and not representative of the model's overall performance. They are just included here for completeness.

In order to assess the model's estimation precision with respect to real traffic behavior, we now validate it against NGSIM data (Fig. 6). The reason behind cross-validating our model with unseen and real data is to ensure that the encoder-decoder neural network has in fact learned to mimic traffic behavior rather than memorizing images similar to the training data. Results show that the method successfully captures most of the shockwaves and their varying characteristics. However, since the neural network was only trained on simulation data, it performs less accurately when estimating shockwaves than in Fig. 4. In real traffic, congestion and free flow patterns depend on unobservable aspects that a simulator cannot always reproduce, and therefore we can see how the model did not accurately reconstruct the shockwave propagation, for example in Fig. 6 (v). The same applies to the free flow regimes. The method was not trained on cases with pure free flow behavior, which explains the encoder-decoder's poor estimation power in uncongested areas (Fig. 6 (iii, iv)).

To properly quantify the error difference between the reconstructed and the actual traffic speed maps, we use the Structural Similarity Index Method (SSIM) [38]. Unlike conventional distance functions such as mean squared error, SSIM captures changes occurring in the image's structural information. It is defined as:

SSIM(a, b) = (2µ_a µ_b + c_1)(2σ_{ab} + c_2) / [(µ_a^2 + µ_b^2 + c_1)(σ_a^2 + σ_b^2 + c_2)],   (6)


where a and b represent a sliding Gaussian window with a size of 11 × 11, µ_a and µ_b are the means of a and b, respectively, σ_a^2 and σ_b^2 are the variances of a and b, respectively, σ_{ab} is the covariance of a and b, and c_1 and c_2 are stabilizing variables, where c_1 = (k_1 L)^2 and c_2 = (k_2 L)^2, L is the dynamic range of pixels, and k_1 and k_2 are default constants set to 0.01 and 0.03, respectively (as defined by [38]). The resulting SSIM index takes values in [−1, 1], where 1 indicates perfect similarity and 0 indicates no structural similarity. We compute the SSIM for the free flow and congested regimes separately, and the results are shown in Fig. 7 for the validation dataset.
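In practice, this index can be computed with an off-the-shelf implementation; the snippet below uses scikit-image's structural_similarity, which follows the same Gaussian-window formulation of [38]. The 11 × 11 window, k_1 = 0.01, k_2 = 0.03, and the 0-255 dynamic range mirror the values quoted above; treating the two speed maps as 8-bit RGB arrays is an assumption about the data format.

import numpy as np
from skimage.metrics import structural_similarity

def speed_map_ssim(actual, reconstructed):
    """SSIM (Eq. 6) between two RGB speed maps stored as uint8 arrays."""
    return structural_similarity(
        actual, reconstructed,
        win_size=11,              # 11 x 11 sliding window, as in [38]
        gaussian_weights=True,    # Gaussian-weighted window
        sigma=1.5,
        K1=0.01, K2=0.03,         # default stabilizing constants
        data_range=255,           # dynamic range L of 8-bit pixels
        channel_axis=-1,          # RGB channels in the last axis
    )

# Example with two random 80 x 60 x 3 maps (in practice: actual vs. reconstructed tensors).
a = np.random.randint(1, 256, (80, 60, 3), dtype=np.uint8)
b = np.random.randint(1, 256, (80, 60, 3), dtype=np.uint8)
print(speed_map_ssim(a, b))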

(Histograms of Structural Similarity Index values, shown separately for the free flow and congestion regimes. (a) Simulation data. (b) NGSIM data.)

Figure 7: Histogram showing similarity indices between the actual and reconstructed spatiotemporal maps for the congested and free flow regimes.

We observe that the average SSIM values for the simulation data in the congested and free flow regimes are 0.6 and 0.5, respectively (Fig. 7a), implying the model's strong reconstruction power. On the other hand, the average SSIM values for the NGSIM data are lower, at 0.4 and 0.45 for the congestion and free flow regimes, respectively (Fig. 7b). However, having SSIM_cong > SSIM_free in both cases indicates that the model is better at estimating slow-moving traffic than free flowing traffic. This, however, could probably be changed if the model were better trained with pure free flow behavior.

3.2.2 Interpreting the Speed Reconstruction Model

We provide here a visual interpretation of the features or patterns that each kernel in the encoder model has learned to detect. Each kernel is activated if it sees a specific pattern in the input space, and this is determined as follows [31]:

z_k = arg max_z L′(z_k^{(l),a}(Θ_k^{(l)})),   (7)

where z_k^{(l),a}(Θ_k^{(l)}) is the activated feature map in layer l using kernel k, and L′(·) can be any monotonically increasing function (such as the L1-norm or L2-norm) defined over the activated feature map for kernel k in layer l. Thus, (7) computes an input space z_k which produces the maximum activations for the kernel Θ_k^{(l)}. This is shown in Fig. 8 for a selected subset of kernels in the first three layers of our trained speed reconstruction model.
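One common way to solve Eq. (7) numerically is gradient ascent on the input, as in [31]. The sketch below is a hypothetical implementation on top of the Keras model sketched in Sec. 3.1; the layer and kernel indices, the step count and step size, and the choice of the mean feature-map response for L′(·) are illustrative assumptions.

import tensorflow as tf

def maximize_activation(model, layer_index, kernel_index, steps=200, step_size=1.0):
    """Gradient-ascent sketch of Eq. (7): synthesize an input that maximally
    activates one kernel's feature map in a chosen encoder layer."""
    layer = model.layers[layer_index]
    feature_extractor = tf.keras.Model(model.input, layer.output)
    # Start from a random input of the same shape as the training tensors.
    z = tf.Variable(tf.random.uniform((1, 80, 60, 3)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = feature_extractor(z)
            # L'(.) chosen here as the mean response of the k-th feature map.
            score = tf.reduce_mean(activation[..., kernel_index])
        grad = tape.gradient(score, z)
        grad /= tf.norm(grad) + 1e-8          # normalized gradient ascent step
        z.assign_add(step_size * grad)
    return z.numpy()[0]                        # the synthesized input pattern z_k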

Interestingly, we see that the patterns or features learned by the model are highly relevant to the macroscopic traffic speed states. For instance, the kernels in the first layer (Fig. 8a) detect discrete traffic states such as congestion (pale red color), free flow (purple color), slow moving traffic (red stripes), and vehicles in the transition regime (yellow). In the second layer (Fig. 8b), the kernels detect information propagation for congested and free flow traffic, as seen from the sloped green, red, and purple regions. The third layer (Fig. 8c) combines these primitive features and produces more realistic spatio-temporal patterns such as the free flow regime, the congested regime, and transient dynamics (from free flow to congestion and vice-versa).


(a) Learned kernels in layer 1

(b) Learned kernels in layer 2

(c) Learned kernels in layer 3

Figure 8: Primary features learned by each layer of the trained speed reconstruction model (only a subset of kernels in each layer are shown here).

This clearly depicts the model's ability to recognize different traffic phenomena, and the mechanisms by which this information is used to reconstruct the complete spatio-temporal speed maps.

To further understand the reconstruction abilities of the model, we investigate the latent space representation (h) for different probe levels. Here, h ∈ R^{10×5×64} contains 64 feature maps, each of which has a dimension of 10 × 5 (see Table 1). We take the (normalized) sum of all these feature map activations in order to visualize them over a two-dimensional plane; this is shown in Fig. 9 for probe levels of 1%, 5%, 10%, and 100%.
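As a small illustration of how such a latent-space picture can be produced, the sketch below (hypothetical, built on the Keras model sketched in Sec. 3.1, with `model` and a sparse input tensor `z_p` assumed to exist) extracts the bottleneck output for one input and collapses its 64 feature maps into a single normalized 10 × 5 activation map.

import tensorflow as tf

# Hypothetical: 'model' is the trained encoder-decoder from the earlier sketch;
# the bottleneck (latent h) is the output of its last MaxPooling2D layer.
bottleneck = [l for l in model.layers if isinstance(l, tf.keras.layers.MaxPooling2D)][-1]
encoder = tf.keras.Model(model.input, bottleneck.output)

h = encoder.predict(z_p[None])          # z_p: one sparse input tensor, shape (80, 60, 3)
act = h[0].sum(axis=-1)                 # sum over the 64 feature maps -> (10, 5)
act = (act - act.min()) / (act.max() - act.min() + 1e-8)   # normalize to [0, 1]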

Recall that h encodes the primary information from the sparse input vehicle trajectories, and the decoder model uses this encoded information to reconstruct the macroscopic speed states; i.e., the output from the decoder model only depends on h. Assuming that the visualized activations in Fig. 9 are a surrogate for the information contained in h, we see similar patterns at the 5% and 100% probe levels (Fig. 9c and Fig. 9e, respectively), implying that the reconstructed speed states will also be the same. This explains why the estimation model performs well even at 5% probe penetration rates.

4 Conclusions

In this paper, we addressed the problem of estimating dynamic traffic states (namely speeds) from limited probe vehicle data.


(a) Full trajectory. (b) 1%. (c) 5%. (d) 10%. (e) 100%.

Figure 9: Normalized latent space activations for different probe levels. To reduce the variance, the input trajectories are sampled 100 times and average activations are shown.

We proposed a convolutional encoder-decoder neural network model to learn traffic speed dynamics from space-time diagrams. We illustrated this for a long road section, and the results showed a sound and accurate reconstruction of macroscopic speed maps, even at a 5% probe penetration rate. The model captured various spatio-temporal traffic behaviors such as backward propagating shockwaves, free-flow regimes, and transient dynamics, which existing estimation methods fail to reproduce at low probe penetration rates. This was further confirmed with an analysis of the model's reconstruction mechanism, where we visually observed that the kernels in the initial layers of the CNN learned to identify discrete traffic states: free flow, congested, and transient states. Successive layers recognized more compound patterns such as stop-and-go traffic, free flow propagation, and transient dynamics.

We also found that the model learned to reproduce dynamically varying shockwave patterns depending on the local traffic conditions, thus circumventing the constant wave speed assumption widely used in many of the existing estimation tools. Furthermore, when validated against the NGSIM dataset, the neural network successfully identified most of the shockwaves, despite being trained using simulated data. These insights attest to the viability of data-driven methods for real-world applications. In the future, our efforts will continue along the lines of rigorous analysis of the model's reconstruction mechanism, as well as further applications to arterials with signalized intersections.

Acknowledgment

This work was supported by the NYUAD Center for Interacting Urban Networks (CITIES), funded by Tamkeen under the NYUAD Research Institute Award CG001 and by the Swiss Re Institute under the Quantum Cities™ initiative.

References

[1] L. Ambuhl and M. Menendez. Data fusion algorithm for macroscopic fundamental diagram estimation. Transportation Research Part C: Emerging Technologies, 71:184–197, 2016.

[2] X. Ban, R. Herring, P. Hao, and A. Bayen. Delay pattern estimation for signalized intersections using sampled travel times. Transportation Research Record: Journal of the Transportation Research Board, 2130(1):109–119, 2009.

[3] X. Ban, P. Hao, and Z. Sun. Real time queue length estimation for signalized intersections using travel times from mobile sensors. Transportation Research Part C: Emerging Technologies, 19(6):1133–1156, 2011.


[4] X. Chen, S. Zhang, L. Li, and L. Li. Adaptive rolling smoothing with heterogeneous data for traffic state estimation and prediction. IEEE Transactions on Intelligent Transportation Systems, 20(4):1247–1258, 2019.

[5] S. Cui, B. Seibold, R. Stern, and D. Work. Stabilizing traffic flow via a single autonomous vehicle: Possibilities and limitations. In 2017 IEEE Intelligent Vehicles Symposium (IV), pages 1336–1341, June 2017.

[6] I. Dakic and M. Menendez. On the use of Lagrangian observations from public transport and probe vehicles to estimate car space-mean speeds in bi-modal urban networks. Transportation Research Part C: Emerging Technologies, 91:317–334, 2018.

[7] D. Dilip, N. Freris, and S.E. Jabari. Sparse estimation of travel time distributions using Gamma kernels. In The 96th Annual Meeting of the Transportation Research Board, Washington D.C., pages 17–02971, 2017.

[8] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.

[9] S. Guler, M. Menendez, and L. Meier. Using connected vehicle technology to improve the efficiency of intersections. Transportation Research Part C: Emerging Technologies, 46:121–131, 2014.

[10] P. Hao and X. Ban. Long queue estimation for signalized intersections using mobile data. Transportation Research Part B: Methodological, 82:54–73, 2015.

[11] P. Hao, X. Ban, K. Bennett, Q. Ji, and Z. Sun. Signal timing estimation using sample intersection travel times. IEEE Transactions on Intelligent Transportation Systems, 13(2):792–804, 2012.

[12] P. Hao, X. Ban, D. Guo, and Q. Ji. Cycle-by-cycle intersection queue length distribution estimation using sample travel times. Transportation Research Part B: Methodological, 68:185–204, 2014.

[13] A. Hofleitner, R. Herring, P. Abbeel, and A. Bayen. Learning the dynamics of arterial traffic from probe data using a dynamic Bayesian network. IEEE Transactions on Intelligent Transportation Systems, 13(4):1679–1693, 2012.

[14] P. Izadpanah, B. Hellinga, and L. Fu. Automatic traffic shockwave identification using vehicles trajectories. In Proceedings of the 88th Annual Meeting of the Transportation Research Board, pages 09–0807, 2009.

[15] S.E. Jabari and H. Liu. A stochastic model of traffic flow: Theoretical foundations. Transportation Research Part B: Methodological, 46(1):156–174, 2012.

[16] S.E. Jabari, J. Zheng, and H. Liu. A probabilistic stationary speed-density relation based on Newell's simplified car-following model. Transportation Research Part B: Methodological, 68:205–223, 2014.

[17] S.E. Jabari, F. Zheng, H. Liu, and M. Filipovska. Stochastic Lagrangian modeling of traffic dynamics. In The 97th Annual Meeting of the Transportation Research Board, Washington D.C., pages 18–04170, 2018.

[18] S.E. Jabari, D. Dilip, D. Lin, and B. Thonnam Thodi. Learning traffic flow dynamics using random fields. IEEE Access, 7:130566–130577, 2019.

[19] S.E. Jabari, N. Freris, and D. Dilip. Sparse travel time estimation from streaming data. Transportation Science, Forthcoming. doi: 10.1287/trsc.2019.0920.

[20] Y. Jia, J. Wu, and Y. Du. Traffic speed prediction using deep learning method. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 1217–1222, 2016.

[21] Y. Jia, J. Wu, and Y. Du. Traffic speed prediction using deep learning method. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pages 1217–1222, 2016.


[22] D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[23] A. Koesdwiady, R. Soua, and F. Karray. Improving traffic flow prediction with weather information in connected cars: A deep learning approach. IEEE Transactions on Vehicular Technology, 65(12):9508–9517, 2016.

[24] A. Krizhevsky, I. Sutskever, and G. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[25] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C: Emerging Technologies, 54:187–197, 2015.

[26] J. Masci, U. Meier, D. Ciresan, and J. Schmidhuber. Stacked convolutional auto-encoders for hierarchical feature extraction. In T. Honkela, W. Duch, M. Girolami, and S. Kaski, editors, Artificial Neural Networks and Machine Learning – ICANN 2011, pages 52–59. Springer, Berlin Heidelberg, 2011.

[27] US Department of Transportation. NGSIM: Next Generation Simulation, 2006. URL https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm.

[28] N. Polson and V. Sokolov. Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies, 79:1–17, 2017.

[29] D. Scherer, A. Muller, and S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In International Conference on Artificial Neural Networks, pages 92–101, 2010.

[30] T. Seo, A. Bayen, T. Kusakabe, and Y. Asakura. Traffic state estimation on highway: A comprehensive survey. Annual Reviews in Control, 43:128–151, 2017.

[31] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[32] J. Snoek, H. Larochelle, and R. Adams. Practical Bayesian optimization of machine learning algorithms. Neural Information Processing Systems, 2012.

[33] R. Stern, S. Cui, M. Delle Monache, R. Bhadani, M. Bunting, M. Churchill, N. Hamilton, R. Haulcy, H. Pohlmann, F. Wu, B. Piccoli, B. Seibold, J. Sprinkle, and D. Work. Dissipation of stop-and-go waves via control of autonomous vehicles: Field experiments. Transportation Research Part C: Emerging Technologies, 89:205–221, 2018.

[34] G. Tilg, K. Yang, and M. Menendez. Evaluating the effects of automated vehicle technology on the capacity of freeway weaving sections. Transportation Research Part C: Emerging Technologies, 96:3–21, 2018.

[35] M. Treiber and D. Helbing. Reconstructing the spatio-temporal traffic dynamics from stationary detector data. Cooper@tive Tr@nsport@tion Dyn@mics, 1(3):3–1, 2002.

[36] M. Treiber, A. Kesting, and R.E. Wilson. Reconstructing the traffic state by fusion of heterogeneous data. Computer-Aided Civil and Infrastructure Engineering, 26(6):408–419, 2011.

[37] J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong. Traffic speed prediction and congestion source exploration: A deep learning method. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 499–508, 2016.

[38] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[39] C. Wu, A. Bayen, and A. Mehta. Stabilizing traffic with autonomous vehicles. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–7, 2018.


[40] C. Yang, Q. Xiao, J. Jing, and B. Ran. An exploratory shockwave approach to estimating queue length using probe trajectories. Journal of Intelligent Transportation Systems, 16(1):12–23, 2012.

[41] K. Yang and M. Menendez. Queue estimation in a connected vehicle environment: A convex approach. IEEE Transactions on Intelligent Transportation Systems, 20(7):2480–2496, 2019.

[42] K. Yang, S. Guler, and M. Menendez. Isolated intersection control for various levels of vehicle technology: Conventional, connected, and automated vehicles. Transportation Research Part C: Emerging Technologies, 72:109–129, 2016.

[43] F. Zheng, S.E. Jabari, H. Liu, and D. Lin. Traffic state estimation using stochastic Lagrangian dynamics. Transportation Research Part B: Methodological, 115:143–165, 2018.
