Fusion convolutional neural network-based interpretation of unobserved heterogeneous factors in driver injury severity outcomes in single-vehicle crashes
abstract
-
Model:
-
a fusion convolutional neural network with random term (FCNN-R) model is proposed for driver injury severity analysis
-
sub-NNs ( for categorical variables ) + multi-layer convolutional neural network, CNN ( captures the potential nonlinear relationship between impact factors and driver injury severity outcomes )
-
Introduction
Huge loss caused by traffic accident -> need understanding relations between driver injury severity and the impact factors
- Problem :unobserved heterogeneity:
- multinomial logit models are associated with the constraint of the independence of irrelevant alternatives (IIA) property, which assumes that unobserved factors are independent across individual crashes.
- Statistical techniques:
- nested logit models: hierarchically capture the correlation of unobserved factors among severity levels
- random parameter logit models: uncover unobserved heterogeneity by allowing parameters varying across observations
- ordered logit/probit model with random parameters: handle the ordered nature of crash injury severity and unobserved heterogeneities simultaneously
- latent class models: address the unobserved heterogeneity by classifying crash data into homogeneous groups
- **Machine learning techniques **:
- only one-hidden layer( shallow feature learning ). better or comparable predictive performance, depend on the data characteristics used in the training process and may be prone to overfitting data.
- limited studies have been conducted to apply DNN approaches in driver injury severity analysis
- limitations:
- black box, do not provide the interpretable parameters we get when using statistical models
- randomly initialized weights and local optimized training algorithm - (caused) -> training results of multiple runs are usually different (e.g. some variables may be essential in one run but not in the other run)
- overfitting
- limitations:
- primary objective:
- apply the fusion convolutional neural networks to investigate the relationships between the impact factors and driver injury severity
- The unobserved heterogeneity across different crash records is illustrated using a random error term with zero means
- Marginal effect analysis is applied to uncover the essentiality of each variable for driver injury severity
- apply the fusion convolutional neural networks to investigate the relationships between the impact factors and driver injury severity
Data description
still Washington State Department of Transportation (WDOT) , single-vehicle crashes
See [A latent class approach for driver injury severity analysis in highway single vehicle crash considering unobserved heterogeneity and temporal influence](./A latent class approach for driver injury severity analysis in highway single vehicle crash.md)
- remove incorrect information
- examined and combine existing variables into classes
Methodology
general description
- fusion convolutional neural network for analyzing and predicting the driver injury severity in highway single-vehicle crashes
- FCNN-R model includes two components:
- sub-neural networks dealing with the input issue of various categorical variables
- deep convolutional neural network capturing the potential nonlinear relationship among the impact factors and driver injury severity outcomes
Input structure for crash characteristics
-
most ML crash analysis, the information is imported using a single data input layer or using separated sub-models based on their spatiotemporal patterns
-
one-hot variables -> zero inputs do make interactions with other variables -> each categorical variable has a sub-NN to deal with relationships among the alternatives within a variable -> only affect the value of the sub-NN’s output
-
random generator : captured unobserved heterogeneity for each record
23 sub-NNs + randomly generated parameter 24-element vector represents the crash features
Multiple hidden layers
-
convolutional layer ( learnable filters ) + pooling ( max-pooling ) layer
-
output layer contains three cells associated with the three injury severity levels, respectively, and the driver injury severity for the input crash record is the injury severity level with the most substantial output value.
Objective function
minimize the cross-entropy loss ( between the model outputs and the real driver injury severity in each crash record )
- where is the label value for sth driver injury severity obtained from the original crash record, and is the model prediction result for sth driver injury severity
- dropout method with a probability of 0.5 at the second dense layer and the L2-norm regularization approaches
- W indicates all the weights in the proposed FCNN-R model, and k indicates the regularization parameter, which bal- ances the bias-variance tradeoff
- two measurements, i.e., the prediction accuracy rate (PAR) and false positive rate (FPR)
Marginal effect analysis
……
Results of data analysis
compared these 4 models below
- I: simple FCNN-R model
- II: FCNN-R model with a dropout layer
- III: FCNN-R model with L2-norm regularization term in the objective function
- IV: FCNN-R model with dropout layer and L2-norm regularization term in the objective function
then compared with 2statistical methods:
-
the multinomial logit (MNL) model
-
the mixed multinomial logit (MMNL) model
and three neural network-related approaches: NN CNN FCNN
选择的数据集分为三个部分,包括训练集、验证集和测试集。更具体地说,从 7 年的数据集中随机选择了总共 9000 条崩溃记录(约占整个数据集的 30%)并作为测试集保留。在此之后,我们随机选择其余的崩溃记录中的 80% 作为训练集,而剩下的 20% 作为验证集。验证集和测试集的区别在于验证集用于为调整模型超参数(模型布局的参数)提供无偏评估,而测试集用于评估最终模型。
it is noted that the regularization and dropout techniques do improve the stability of the proposed model. However, they do not improve the predictive performance of the proposed model
terminology
hyperparameter
is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are derived via training.