
COVID19-Dataset-with-100-World-Countries

Status: active. Format: ARFF. License: Attribution 4.0 International (CC BY 4.0). Visibility: public. Uploaded 23-03-2022 by Lowe.


COVID19-Algeria-and-World-Dataset

A coronavirus dataset with 104 countries constructed from different reliable sources, where each row represents a country, and the columns represent geographic, climate, healthcare, economic, and demographic factors that may accelerate or slow the spread of COVID-19. The assumptions for the different factors are as follows:

Geography: some continents/areas may be more affected by the disease
Climate: cold temperatures may promote the spread of the virus
Healthcare: a lack of hospital beds/doctors may lead to more deaths
Economy: weak economies (GDP) have fewer means to fight the disease
Demography: older populations may be at higher risk of the disease

The last columns represent the number of daily tests performed and the total number of cases and deaths reported each day.

Data description

Countries in the dataset by geographic coordinates:

Europe: 33 countries
Asia: 28 countries
Africa: 21 countries
North America: 11 countries
South America: 8 countries
Oceania: 3 countries

Statistical description of the data

Data distribution

Download

The dataset is available in an encoded CSV form on GitHub.

Python code

The Python Jupyter Notebook to read and visualize the data is available on nbviewer.

Data update

The dataset is updated every month with the latest numbers of COVID-19 cases, deaths, and tests. The last update was on March 01, 2021.

Data construction

The dataset is constructed from different reliable sources, where each row represents a country, and the columns represent geographic, climate, healthcare, economic, and demographic factors that may accelerate or slow the spread of the coronavirus. Note that we selected only the main factors for which we found data; other factors could also be used.
All data were retrieved from the reliable Our World in Data website, except for data on:

Continents: www.kaggle.com/statchaitya/country-to-continent
Geographic coordinates: www.kaggle.com/eidanch/counties-geographic-coordinates
Temperatures: www.kaggle.com/berkeleyearth/climate-change-earth-surface-temperature-data
Share of the population over 65 years old: https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS
GDP/Capita: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

Citation

If you want to use the dataset, please cite the following arXiv paper, which provides more details about the data construction.

@article{belkacem_covid-19_2020,
  title = {COVID-19 data analysis and forecasting: Algeria and the world},
  shorttitle = {COVID-19 data analysis and forecasting},
  journal = {arXiv preprint arXiv:2007.09755},
  author = {Belkacem, Sami},
  year = {2020}
}

Contact

If you have any questions or suggestions, please contact me at this email address: s.belkacem@usthb.dz

15 features

Entity (string): 104 unique values, 0 missing
Continent (string): 6 unique values, 0 missing
Latitude (numeric): 104 unique values, 0 missing
Longitude (numeric): 104 unique values, 0 missing
Average_temperature_per_year (numeric): 28 unique values, 0 missing
Hospital_beds_per_1000_people (numeric): 77 unique values, 0 missing
Medical_doctors_per_1000_people (numeric): 86 unique values, 0 missing
GDP/Capita (numeric): 104 unique values, 0 missing
Population (numeric): 104 unique values, 0 missing
Median_age (numeric): 32 unique values, 0 missing
Population_aged_65_and_over_(%) (numeric): 23 unique values, 0 missing
Date (string): 425 unique values, 0 missing
Daily_tests (numeric): 17747 unique values, 7895 missing
Cases (numeric): 26843 unique values, 254 missing
Deaths (numeric): 10404 unique values, 3610 missing
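
The feature list above reports missing values only in Daily_tests, Cases, and Deaths. Counts like these can be recomputed from a local copy of the data. The following is a minimal sketch using only the Python standard library. It assumes a CSV export in which missing entries are empty strings (in an ARFF file they usually appear as `?`). The three-row sample and the helper name `count_missing` are invented for illustration and are not real records from the dataset.

```python
import csv
import io

# Hypothetical sample: column names follow the feature list, values are made up.
SAMPLE_CSV = """Entity,Date,Daily_tests,Cases,Deaths
Algeria,2020-03-01,,1,
Algeria,2020-03-02,,3,
France,2020-03-01,1000,130,2
"""

# Assumption: an empty field (CSV) or "?" (ARFF) marks a missing value.
MISSING_MARKERS = {"", "?"}


def count_missing(rows, columns):
    """Return ({column: missing count}, number of rows with at least one missing value)."""
    per_column = {name: 0 for name in columns}
    incomplete_rows = 0
    for row in rows:
        missing_here = [
            name for name in columns
            if (row.get(name) or "").strip() in MISSING_MARKERS
        ]
        for name in missing_here:
            per_column[name] += 1
        if missing_here:
            incomplete_rows += 1
    return per_column, incomplete_rows


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
per_column, incomplete_rows = count_missing(reader, reader.fieldnames)
print(per_column)
print(incomplete_rows)
```

To run this on the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the downloaded CSV.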

19 properties

Number of instances (rows) of the dataset: 38472
Number of attributes (columns) of the dataset: 15
Number of distinct values of the target attribute (if it is nominal): no value reported
Number of missing values in the dataset: 11759
Number of instances with at least one value missing: 9603
Number of numeric attributes: 12
Number of nominal attributes: 0
Percentage of nominal attributes: 0
Average class difference between consecutive instances: no value reported
Percentage of numeric attributes: 80
Percentage of missing values: 2.04
Percentage of instances having missing values: 24.96
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: no value reported
Percentage of instances belonging to the least frequent class: no value reported
Number of instances belonging to the most frequent class: no value reported
Percentage of instances belonging to the most frequent class: no value reported
Number of attributes divided by the number of instances: 0
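
The percentage properties above follow directly from the reported counts. As a quick consistency check, the arithmetic can be sketched as below. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays these values.

```python
# Counts copied from the feature list and the properties above.
n_rows = 38472
n_columns = 15
n_numeric = 12
n_missing_values = 11759
n_rows_with_missing = 9603
missing_per_feature = {"Daily_tests": 7895, "Cases": 254, "Deaths": 3610}

# Share of numeric columns among all columns.
pct_numeric = round(100 * n_numeric / n_columns, 2)
# Missing cells as a share of all cells (rows x columns).
pct_missing_values = round(100 * n_missing_values / (n_rows * n_columns), 2)
# Rows with at least one missing value as a share of all rows.
pct_rows_with_missing = round(100 * n_rows_with_missing / n_rows, 2)
# Attributes divided by instances; about 0.0004, which rounds to 0.
dimensionality = round(n_columns / n_rows, 2)

# The per-feature missing counts add up to the dataset-level total.
assert sum(missing_per_feature.values()) == n_missing_values

print(pct_numeric, pct_missing_values, pct_rows_with_missing, dimensionality)
# prints: 80.0 2.04 24.96 0.0
```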

0 tasks
