Prioritizing Recall: Recall‑First Machine Learning for Traffic Accident Severity Detection under Class Imbalance
Keywords:
Traffic Crash Risk Prediction, Class Imbalance, Recall-First Evaluation, Machine Learning Classification, Cost-Sensitive Decision Thresholds, Public-Safety Operations
Abstract
Traffic safety agencies seek predictive models that flag high-risk crashes before they occur, but real-world crash data are highly imbalanced, which makes overall accuracy a misleading measure of a model's predictive performance. Using crash data provided by the Virginia Department of Transportation (VDOT), this study investigates a recall-first approach to crash-risk classification in Virginia, arguing that missing a dangerous event costs far more than issuing an extra alert. The study framed high-risk identification as a binary classification task and used two standard classifiers, Logistic Regression and Random Forest, augmented with imbalance-aware training. Evaluation emphasized recall and precision, using precision–recall curves and confusion matrices. Across experiments, recall-oriented training and thresholding consistently improved detection of high-risk cases relative to accuracy-optimized baselines, at the expected cost of lower precision and overall accuracy. The paper further translates operational priorities into simple threshold policies, showing how agencies can tune decision thresholds for recall to fit their resource constraints while minimizing costly misses. The study concludes that recall should be the primary measure of success in safety-critical applications, and that straightforward imbalance treatments, without added model complexity, can realign model performance with public-safety goals.
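The threshold policies mentioned above can be illustrated with a small, library-agnostic sketch. It is not the study's implementation: it assumes only that a fitted classifier (for example, a class-weighted Logistic Regression or Random Forest) has produced a predicted probability of "high-risk" for each validation case. It shows two hypothetical policies: a cost-based cutoff derived from assumed costs of a miss and a false alarm, and the highest cutoff that reaches a target recall on labelled validation data. The function names, the toy scores, and the 9:1 cost ratio are illustrative assumptions.

```python
from typing import Sequence, Tuple


def cost_sensitive_threshold(cost_fn: float, cost_fp: float) -> float:
    """Cutoff minimizing expected cost, assuming calibrated probabilities.

    Flag a case when the expected cost of missing it is at least that of a
    false alarm: p * cost_fn >= (1 - p) * cost_fp  <=>  p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def precision_recall_at(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged high-risk."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
    # Convention: precision is 1.0 when nothing is flagged (no false alarms).
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_recall(
    scores: Sequence[float], labels: Sequence[int], target_recall: float
) -> Tuple[float, float, float]:
    """Highest cutoff whose recall meets the target (i.e. the fewest alerts).

    Scans distinct scores from highest to lowest; lowering the cutoff can only
    add flagged cases, so the first cutoff that reaches the target raises the
    fewest alerts among all cutoffs that do.
    """
    for threshold in sorted(set(scores), reverse=True):
        precision, recall = precision_recall_at(scores, labels, threshold)
        if recall >= target_recall:
            return threshold, precision, recall
    raise ValueError("target recall is unreachable (no positive labels?)")


# Hypothetical validation scores (predicted P(high-risk)) and true labels.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0]

print("default 0.5:", precision_recall_at(scores, labels, 0.5))
print("recall >= 0.9:", threshold_for_recall(scores, labels, 0.9))
# Assumed costs: a missed high-risk crash is 9x as costly as a false alarm.
cut = cost_sensitive_threshold(cost_fn=9.0, cost_fp=1.0)
print("cost-based cutoff:", cut, precision_recall_at(scores, labels, cut))
```

On this toy data, the default 0.5 cutoff catches only two of the three high-risk cases, while the recall-targeted policy lowers the cutoff to 0.40 and catches all three at a precision of 0.6. The cost-based cutoff is meaningful only if the scores are calibrated probabilities; class-weighted training shifts predicted probabilities toward the minority class, so in practice such a cutoff would be checked against held-out data.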
