OpenML dataset: US-Accidents-(4.2-million-records)

Status: active. Format: ARFF. License: CC BY-NC-SA 4.0. Visibility: public. Uploaded 22-03-2022 by Stewart.

Description
This is a countrywide car accident dataset covering 49 states of the USA. The accident data were collected from February 2016 to December 2020 using two APIs that provide streaming traffic incident (or event) data. These APIs broadcast traffic data captured by a variety of entities, such as the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. Currently, there are about 4.2 million accident records in this dataset. Check here to learn more about this dataset.

Acknowledgements
Please cite the following papers if you use this dataset:
Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. "A Countrywide Traffic Accident Dataset." 2019.
Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, 2019.

Content
This dataset has been collected in real time using multiple traffic APIs. Currently, it contains accident data collected from February 2016 to December 2020 for the contiguous United States.

Inspiration
US-Accidents can be used for numerous applications, such as real-time car accident prediction, studying accident hotspot locations, casualty analysis, extracting cause-and-effect rules to predict car accidents, and studying the impact of precipitation or other environmental stimuli on accident occurrence. The most recent release of the dataset can also be useful for studying the impact of COVID-19 on traffic behavior and accidents.

Usage Policy and Legal Disclaimer
This dataset is distributed only for research purposes, under the Creative Commons Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA 4.0). By clicking on the download button(s) below, you agree to use this data only for non-commercial, research, or academic applications. You may need to cite the above papers if you use this dataset.
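The Inspiration paragraph lists studying accident hotspot locations as one application. A minimal sketch of one common approach, fixed-size grid binning, follows. It assumes records have already been loaded as Python dicts keyed by the Start_Lat and Start_Lng column names from the feature list; the 0.1-degree cell size and the helper names `grid_cell` and `top_hotspots` are hypothetical choices, not part of the dataset or the cited papers.

```python
import math
from collections import Counter

# Hypothetical grid resolution in degrees (an assumption, not part of the dataset).
# 0.1 degree of latitude is roughly 11 km.
CELL_DEG = 0.1


def grid_cell(lat: float, lng: float, cell_deg: float = CELL_DEG) -> tuple:
    """Return the (row, col) index of the lat/lng grid cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lng / cell_deg))


def top_hotspots(records, k: int = 3, cell_deg: float = CELL_DEG):
    """Count accidents per grid cell from Start_Lat/Start_Lng and return the k busiest cells."""
    counts = Counter(
        grid_cell(float(r["Start_Lat"]), float(r["Start_Lng"]), cell_deg)
        for r in records
    )
    return counts.most_common(k)


# Usage with made-up sample records (not real rows from the dataset).
sample = [
    {"Start_Lat": 39.961, "Start_Lng": -82.998},
    {"Start_Lat": 39.965, "Start_Lng": -82.991},
    {"Start_Lat": 39.969, "Start_Lng": -82.951},
    {"Start_Lat": 34.052, "Start_Lng": -118.243},
]
print(top_hotspots(sample, k=1))  # [((399, -830), 3)]
```

A fixed grid is easy to reason about, but a cluster that straddles a cell border is split across cells; the cell size trades spatial precision against count stability.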

46 features

Feature                Type     Unique values  Missing
No_Exit                nominal  2              0
Pressure(in)           numeric  1068           59200
Visibility(mi)         numeric  76             70546
Wind_Direction         string   24             73775
Wind_Speed(mph)        numeric  136            157944
Precipitation(in)      numeric  230            549458
Weather_Condition      string   127            70636
Amenity                nominal  2              0
Bump                   nominal  2              0
Crossing               nominal  2              0
Give_Way               nominal  2              0
Junction               nominal  2              0
Humidity(%)            numeric  100            73092
Railway                nominal  2              0
Roundabout             nominal  2              0
Station                nominal  2              0
Stop                   nominal  2              0
Traffic_Calming        nominal  2              0
Traffic_Signal         nominal  2              0
Turning_Loop           nominal  1              0
Sunrise_Sunset         string   2              2867
Civil_Twilight         string   2              2867
Nautical_Twilight      string   2              2867
Astronomical_Twilight  string   2              2867
Side                   string   3              0
Severity               numeric  4              0
Start_Time             string   1959333        0
End_Time               string   2351505        0
Start_Lat              numeric  1093622        0
Start_Lng              numeric  1120374        0
End_Lat                numeric  1080817        0
End_Lng                numeric  1105411        0
Distance(mi)           numeric  14165          0
Description            string   1174563        0
Number                 numeric  46402          1743911
Street                 string   159651         2
ID (ignore)            string   2845342        0
City                   string   11681          137
County                 string   1707           0
State                  string   49             0
Zipcode                string   363085         1319
Country                string   1              0
Timezone               string   4              3659
Airport_Code           string   2004           9549
Weather_Timestamp      string   474214         50736
Temperature(F)         numeric  789            69274
Wind_Chill(F)          numeric  897            469643
469643 missing

19 properties

Number of instances (rows) of the dataset: 2845342
Number of attributes (columns) of the dataset: 46
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 3414349
Number of instances with at least one value missing: 1902024
Number of numeric attributes: 14
Number of nominal attributes: 13
Percentage of nominal attributes: 28.26
Average class difference between consecutive instances: not reported
Percentage of numeric attributes: 30.43
Percentage of missing values: 2.61
Percentage of instances having missing values: 66.85
Percentage of binary attributes: 28.26
Number of binary attributes: 13
Number of instances belonging to the least frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the most frequent class: not reported
Number of attributes divided by the number of instances: 0 (46 / 2845342 ≈ 0.000016, shown rounded)
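Several of the properties are ratios of the raw counts. A minimal sketch, assuming plain ratios over 46 attributes and 2,845,342 instances; the formulas are inferred from the reported numbers rather than taken from OpenML's own definitions, and the function name `derived_properties` is hypothetical.

```python
# Raw counts as listed in the dataset properties.
N_INSTANCES = 2_845_342
N_ATTRIBUTES = 46
N_MISSING = 3_414_349
N_INSTANCES_WITH_MISSING = 1_902_024
N_NUMERIC = 14
N_NOMINAL = 13


def derived_properties() -> dict:
    """Recompute the percentage-style properties from the raw counts (assumed formulas)."""
    return {
        "Percentage of missing values": 100 * N_MISSING / (N_INSTANCES * N_ATTRIBUTES),
        "Percentage of instances having missing values": 100 * N_INSTANCES_WITH_MISSING / N_INSTANCES,
        "Percentage of numeric attributes": 100 * N_NUMERIC / N_ATTRIBUTES,
        "Percentage of nominal attributes": 100 * N_NOMINAL / N_ATTRIBUTES,
        "Number of attributes divided by the number of instances": N_ATTRIBUTES / N_INSTANCES,
    }


for name, value in derived_properties().items():
    # Two decimals, matching the page; the last ratio therefore prints as 0.00.
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these reproduce 2.61, 66.85, 30.43 and 28.26, which suggests the missing-value percentage is taken over all instance-attribute cells rather than per column.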

0 tasks