Anticipating Injury Severity After Car Accidents

Group 20
Dhruv Samdani, Devashru Patel, Eric Ricci, Thomas Malchodi


Introduction

Road accidents are one of the leading causes of death worldwide. Automobile companies continually refine their safety mechanisms to reduce the risk of injuries and fatalities. These efforts often focus on mitigating harm once an accident has already occurred, for example through improved airbags. However, it is also important to analyze the factors that contribute to fatal accidents, such as driver behavior and the environment. The goal of this project is to identify the components that are reliable predictors of accidents and use them to predict injury level.

The data set used in this analysis comes from the Fatality Analysis Reporting System (FARS), run by the National Highway Traffic Safety Administration (NHTSA). The data covers crashes that occur throughout the US and includes all crashes resulting in injury or fatality in 2018.

Data Set

The full data set includes a wealth of information on outcomes for each vehicle and person involved in a crash, as well as a number of variables relating to the crash and the emergency response. In total, it includes over 80,000 data points, each representing a person involved in a crash, and 61 input features. The features range from type of road to collision orientation and many others. The outcome for each individual has 5 possible levels of injury: "No Apparent Injury", "Possible Injury", "Suspected Minor Injury", "Suspected Serious Injury", and "Fatal Injury". We aim to analyze the features present in the data and determine which factors best predict this level of injury.

Data Formatting

Factors that occurred after the crash, such as emergency response time and time of death, were ignored. This decision was made in part because determining the injury level would be trivial with features such as whether the person was taken to the hospital or the time of death. Such features also do not help accomplish the goal of determining how to improve safety precautions in vehicles, so we excluded all features that occurred as a result of the crash. This reduced the number of features to 17, and once incomplete records were trimmed from the set, the final data had approximately 60,000 data points. The final change was grouping injury severity into 3 outcomes ("No Injury", "Injury", and "Fatality"), as five categories would lead to poor prediction accuracy.
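The formatting steps described here can be illustrated with a minimal sketch using only the Python standard library. The column names (`hospital`, `death_time`, `ems_arrival_time`, `injury_severity`) are hypothetical placeholders rather than actual FARS field names. The mapping of the three intermediate levels to "Injury" is our assumption about how the five levels were grouped.

```python
# Hypothetical names for post-crash features; real FARS field names may differ.
POST_CRASH_COLUMNS = {"hospital", "death_time", "ems_arrival_time"}

# Assumed grouping of the five reported levels into three outcomes.
SEVERITY_GROUPS = {
    "No Apparent Injury": "No Injury",
    "Possible Injury": "Injury",
    "Suspected Minor Injury": "Injury",
    "Suspected Serious Injury": "Injury",
    "Fatal Injury": "Fatality",
}


def clean_records(records, label_key="injury_severity"):
    """Drop post-crash features, skip incomplete rows, and regroup the label."""
    cleaned = []
    for record in records:
        # Remove features that only exist as a result of the crash.
        kept = {k: v for k, v in record.items() if k not in POST_CRASH_COLUMNS}
        # Skip rows with any missing value among the remaining features.
        if any(v is None or v == "" for v in kept.values()):
            continue
        # Skip rows whose label is absent or not one of the five known levels.
        if kept.get(label_key) not in SEVERITY_GROUPS:
            continue
        kept[label_key] = SEVERITY_GROUPS[kept[label_key]]
        cleaned.append(kept)
    return cleaned
```

Dropping the post-crash columns before checking for missing values means a row is not discarded just because a field that would be removed anyway is blank.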

Analysis Methods

Preliminary data analysis focused on selecting the relevant factors, both to remove extraneous ones and to reduce model training time. Unsupervised methods included using mutual information to select the most relevant columns of data. Because the majority of features were categorical, they were expanded into binary dummy variables so that all possible factors could be included. PCA was then applied to reduce the resulting dimensionality and speed up model training. Supervised learning methods were used on the final data set to predict the injury level of each person in the test data set. The methods tested include decision trees, neural networks, SVMs, and random forests.
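To make the feature-selection and encoding steps concrete, the sketch below computes the mutual information between each categorical feature and the injury label, then expands selected features into binary dummy variables. It uses only the standard library and is an illustration under our own assumptions, not the project's actual code. The function names and the row-as-dictionary layout are hypothetical, and PCA and the classifiers are left out.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, label_key):
    """Return (feature, mutual information) pairs, most informative first."""
    labels = [row[label_key] for row in rows]
    features = [k for k in rows[0] if k != label_key]
    scores = [(f, mutual_information([row[f] for row in rows], labels))
              for f in features]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def one_hot(rows, feature_keys):
    """Expand categorical features into binary dummy columns.

    Assumes the values within each feature are mutually comparable
    (for example, all strings) so the columns can be sorted.
    """
    columns = sorted({(k, row[k]) for row in rows for k in feature_keys})
    matrix = [[1 if row[k] == v else 0 for (k, v) in columns] for row in rows]
    return columns, matrix
```

A feature that fully determines the label scores the entropy of the label, while a feature independent of the label scores zero. Keeping the top-ranked features before one-hot encoding limits how many dummy columns are created.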
