WHO-national-life-expectancy

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public
Uploaded 23-03-2022 by Mark Murphy

Context

I am developing my data science skills in areas outside of my previous work. An interesting problem for me was to identify which factors influence life expectancy at a national level. There is an existing Kaggle data set that explored this, but its information was corrupted. Part of the problem-solving process is to step back periodically and ask "does this make sense?" Without reliable data, it is harder to tell mistakes in my analysis code apart from unusual behavior caused by the data itself. I wanted to make a similar data set, but with reliable information.

This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. A number of potentially interesting features (such as air pollution) were left out because of limited year or country coverage. Because the data were collected from more than one server, some features appear more than once so that the differences between sources can be explored.

Content

One goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies and are provided with better health and well-being. The WHO provides public data, collected from many sources, to identify and monitor factors that are important for reaching this goal. This set was primarily made using information from the GHO (Global Health Observatory) and UNESCO (United Nations Educational, Scientific and Cultural Organization). The set covers the years 2000-2016 for 183 countries, in a single CSV file. Missing values are left in place so that users can decide how to handle them. Three notebooks are provided: my cursory analysis, a comparison with the other Kaggle data set, and a template for creating this data set.

Inspiration

There is a lot to explore for interested users. The GHO server alone has over 2000 "indicators". How are the GHO and UNESCO life expectancies calculated, and what causes the difference between them? The same question applies to the Gross National Income (GNI) and mortality features. How does life expectancy at age 60 compare to life expectancy at birth? Do the features in this data set relate differently to those two targets? What other indicators on the servers might be worth using? Some GHO indicators come from different studies with different coverage. Can they be combined into a more useful and robust feature? Unraveling the correlations between the features would take significant work.
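Because the description leaves missing values in place, users have to choose a strategy. For yearly panel data like this (183 countries over 2000-2016), one common option is linear interpolation within each country between observed years. The sketch below illustrates that option only; it is not the author's method, the function names are hypothetical, and it assumes the rows have already been read into (country, year, value) tuples with None marking a missing value. It deliberately leaves gaps before the first and after the last observed year unfilled, since extrapolation is a stronger assumption than interpolation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical row shape: (country, year, value); None marks a missing value.
Row = Tuple[str, int, Optional[float]]


def interpolate_series(years: List[int], values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps in one country's yearly series by linear interpolation.

    Assumes `years` is sorted ascending and aligned with `values`.
    Leading and trailing gaps stay None (no extrapolation).
    """
    filled = list(values)
    # Positions of the observed (non-missing) values.
    known = [i for i, v in enumerate(values) if v is not None]
    # Interpolate between each pair of consecutive observations.
    for left, right in zip(known, known[1:]):
        y0, y1 = years[left], years[right]
        v0, v1 = values[left], values[right]
        for i in range(left + 1, right):
            t = (years[i] - y0) / (y1 - y0)  # fraction of the way from y0 to y1
            filled[i] = v0 + t * (v1 - v0)
    return filled


def interpolate_by_country(rows: List[Row]) -> Dict[str, List[Tuple[int, Optional[float]]]]:
    """Group rows by country, sort each group by year, and interpolate it."""
    groups: Dict[str, List[Tuple[int, Optional[float]]]] = defaultdict(list)
    for country, year, value in rows:
        groups[country].append((year, value))
    result = {}
    for country, pairs in groups.items():
        pairs.sort(key=lambda pair: pair[0])  # order by year only
        years = [year for year, _ in pairs]
        values = [value for _, value in pairs]
        result[country] = list(zip(years, interpolate_series(years, values)))
    return result


if __name__ == "__main__":
    rows = [("A", 2000, 1.0), ("A", 2001, None), ("A", 2002, 3.0), ("A", 2003, None)]
    print(interpolate_by_country(rows))
    # {'A': [(2000, 1.0), (2001, 2.0), (2002, 3.0), (2003, None)]}
```

Whether interpolation is appropriate depends on the feature: it is more defensible for slowly changing indicators than for values that can jump between years.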

32 features

Feature          Type     Unique values  Missing
diphtheria       numeric  79             19
une_school       numeric  805            2306
une_literacy     numeric  565            2540
une_edu_spend    numeric  1824           1286
une_poverty      numeric  291            2198
une_gni          numeric  1870           117
une_hiv          numeric  187            741
une_life         numeric  2944           0
une_infant       numeric  867            0
une_pop          numeric  3073           37
che_gdp          numeric  2988           117
gghe-d           numeric  3004           100
gni_capita       numeric  1559           682
hospitals        numeric  128            2981
doctors          numeric  1737           1331
basic_water      numeric  2699           32
country          string   183            0
polio            numeric  77             19
measles          numeric  78             19
hepatitis        numeric  92             569
age5-19obesity   numeric  213            34
age5-19thinness  numeric  227            34
bmi              numeric  122            34
alcohol          numeric  2980           50
age1-4mort       numeric  1360           0
infant_mort      numeric  2758           0
adult_mortality  numeric  3110           0
life_exp60       numeric  3107           0
life_expect      numeric  3109           0
year             numeric  17             0
region           string   6              0
country_code     string   183            0
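The missing counts above vary widely between features, from 0 to nearly every row. As a rough guide to which columns are usable, the sketch below converts each count to a fraction of the 3111 rows and lists the features missing in more than half of them. The counts are copied from the list above; the 50% threshold and the helper name `sparse_features` are arbitrary choices for illustration, not part of the dataset.

```python
from typing import Dict, List

N_ROWS = 3111  # number of instances reported in the dataset properties

# Missing-value counts copied from the feature list; features with 0 missing are omitted.
MISSING: Dict[str, int] = {
    "diphtheria": 19, "une_school": 2306, "une_literacy": 2540,
    "une_edu_spend": 1286, "une_poverty": 2198, "une_gni": 117,
    "une_hiv": 741, "une_pop": 37, "che_gdp": 117, "gghe-d": 100,
    "gni_capita": 682, "hospitals": 2981, "doctors": 1331,
    "basic_water": 32, "polio": 19, "measles": 19, "hepatitis": 569,
    "age5-19obesity": 34, "age5-19thinness": 34, "bmi": 34, "alcohol": 50,
}


def sparse_features(missing: Dict[str, int], n_rows: int, threshold: float = 0.5) -> List[str]:
    """Return features whose missing fraction exceeds `threshold`, most-missing first."""
    fractions = {name: count / n_rows for name, count in missing.items()}
    flagged = [name for name, frac in fractions.items() if frac > threshold]
    # Sort by descending missing fraction, breaking ties by name.
    return sorted(flagged, key=lambda name: (-fractions[name], name))


if __name__ == "__main__":
    # The per-feature counts add up to the dataset-wide missing total (15246).
    assert sum(MISSING.values()) == 15246
    for name in sparse_features(MISSING, N_ROWS):
        print(f"{name}: {100 * MISSING[name] / N_ROWS:.1f}% missing")
```

With the default threshold this flags hospitals, une_literacy, une_school and une_poverty; lowering it to 0.4 also brings in doctors and une_edu_spend.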

19 properties

Number of instances (rows) of the dataset: 3111
Number of attributes (columns) of the dataset: 32
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 15246
Number of instances with at least one value missing: 3109
Number of numeric attributes: 29
Number of nominal attributes: 0
Percentage of nominal attributes: 0%
Average class difference between consecutive instances: not reported
Percentage of numeric attributes: 90.63%
Percentage of missing values: 15.31%
Percentage of instances having missing values: 99.94%
Percentage of binary attributes: 0%
Number of binary attributes: 0
Number of instances belonging to the least frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the most frequent class: not reported
Number of attributes divided by the number of instances: 0.01
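The derived properties follow from the raw counts. The sketch below recomputes them as a consistency check, and also confirms that 183 countries times 17 years (2000-2016) gives the 3111 instances. Two points are assumptions inferred from the numbers rather than documented here: the percentage of missing values is taken over all cells (instances times attributes), and values are rounded half-up to two decimals. Half-up rounding matters for 29/32 = 90.625%, which Python's built-in `round` would turn into 90.62 (it rounds exact halves to even), so the sketch uses `decimal` instead.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def pct(part: int, whole: int) -> Decimal:
    """Percentage part/whole, rounded half-up to two decimals (assumed rounding rule)."""
    return (Decimal(100 * part) / Decimal(whole)).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


# Raw counts from the description and the property list.
N_COUNTRIES, N_YEARS = 183, 17  # 183 countries, years 2000-2016 inclusive
N_INSTANCES = 3111
N_ATTRIBUTES = 32
N_NUMERIC = 29
N_MISSING = 15246
N_INSTANCES_WITH_MISSING = 3109

derived = {
    "pct_numeric": pct(N_NUMERIC, N_ATTRIBUTES),
    # Assumed denominator: every cell in the table.
    "pct_missing": pct(N_MISSING, N_INSTANCES * N_ATTRIBUTES),
    "pct_instances_missing": pct(N_INSTANCES_WITH_MISSING, N_INSTANCES),
    "attributes_per_instance": (Decimal(N_ATTRIBUTES) / Decimal(N_INSTANCES)).quantize(
        TWO_PLACES, rounding=ROUND_HALF_UP
    ),
}

if __name__ == "__main__":
    # One row per country-year pair accounts for every instance.
    assert N_COUNTRIES * N_YEARS == N_INSTANCES
    for name, value in derived.items():
        print(name, value)
```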

0 tasks