OpenML
penguins


Status: active | Format: ARFF | License: Public Domain (CC0) | Visibility: public | Uploaded 22-07-2020 by Robinson
![palmerpenguins](https://github.com/allisonhorst/palmerpenguins/raw/master/man/figures/logo.png)

## Description

The goal of palmerpenguins is to provide a great dataset for data exploration & visualization, as an alternative to iris. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

Please see [https://github.com/allisonhorst/palmerpenguins](https://github.com/allisonhorst/palmerpenguins) for more information.

## Citation

Anyone interested in publishing the data should contact [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) about analysis and working together on any final products. From Gorman et al. (2014): _"Individuals interested in using these data are expected to follow the US LTER Network’s Data Access Policy, Requirements and Use Agreement: https://lternet.edu/data-access-policy/."_

This dataset has been derived from the R package palmerpenguins, available from [https://allisonhorst.github.io/palmerpenguins/](https://allisonhorst.github.io/palmerpenguins/). Please cite it as follows in publications:

Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/

A BibTeX entry for LaTeX users is:

    @Manual{,
      title = {palmerpenguins: Palmer Archipelago (Antarctica) penguin data},
      author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
      year = {2020},
      note = {R package version 0.1.0},
      url = {https://allisonhorst.github.io/palmerpenguins/},
    }

## Artwork

You can download palmerpenguins art (useful for teaching with the data) from the GitHub repo or the R package. If you use this artwork, please cite it with: "Artwork by @allison_horst".

#### Meet the Palmer penguins

#### Bill dimensions

The culmen is the upper ridge of a bird’s bill. In the simplified palmerpenguins dataset, culmen length and depth are renamed as the variables bill_length_mm and bill_depth_mm to be more intuitive; this OpenML version keeps the names culmen_length_mm and culmen_depth_mm. For this penguin data, the culmen (bill) length and depth are measured as shown in the bill-dimensions figure of the palmerpenguins repository (thanks Kristen Gorman for clarifying!).
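The OpenML feature list names the bill measurements culmen_length_mm and culmen_depth_mm, while the R package uses bill_length_mm and bill_depth_mm. A minimal sketch of mapping one naming scheme to the other, assuming each row is a plain Python dict keyed by the OpenML column names; `OPENML_TO_R_NAMES` and `rename_columns` are hypothetical helpers, not part of any OpenML or palmerpenguins API.

```python
# Hypothetical mapping from the OpenML column names to the palmerpenguins R names.
OPENML_TO_R_NAMES = {
    "culmen_length_mm": "bill_length_mm",
    "culmen_depth_mm": "bill_depth_mm",
}


def rename_columns(row: dict, mapping: dict = OPENML_TO_R_NAMES) -> dict:
    """Return a copy of `row` with mapped keys renamed; other keys are unchanged."""
    return {mapping.get(key, key): value for key, value in row.items()}


# Illustrative row; the values are for demonstration only.
row = {"species": "Adelie", "culmen_length_mm": 39.1, "culmen_depth_mm": 18.7}
print(rename_columns(row))
# {'species': 'Adelie', 'bill_length_mm': 39.1, 'bill_depth_mm': 18.7}
```

A dict comprehension keeps the original key order and leaves the input row untouched.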

7 features

| Feature | Type | Unique values | Missing values |
|---|---|---|---|
| species (target) | nominal | 3 | 0 |
| island | nominal | 3 | 0 |
| culmen_length_mm | numeric | 164 | 2 |
| culmen_depth_mm | numeric | 80 | 2 |
| flipper_length_mm | numeric | 55 | 2 |
| body_mass_g | numeric | 94 | 2 |
| sex | nominal | 3 | 10 |
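The unique-value and missing-value counts in the feature list are per-column tallies. A minimal sketch of how such tallies could be computed for one column, assuming missing cells are stored as None; `summarize_feature` is a hypothetical helper, and OpenML's own computation (for example, how it treats missing cells when counting distinct values) may differ.

```python
from typing import Any, Optional


def summarize_feature(values: list[Optional[Any]]) -> tuple[int, int]:
    """Return (distinct non-missing values, missing values) for one column.

    In this sketch a missing cell is represented as None.
    """
    present = [v for v in values if v is not None]
    return len(set(present)), len(values) - len(present)


# Toy column with made-up values, not taken from the dataset.
toy_sex = ["MALE", "FEMALE", None, "FEMALE", None]
print(summarize_feature(toy_sex))  # (2, 2)
```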

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 344 |
| Number of attributes (columns) of the dataset | 7 |
| Number of distinct values of the target attribute (if it is nominal) | 3 |
| Number of missing values in the dataset | 18 |
| Number of instances with at least one value missing | 10 |
| Number of numeric attributes | 4 |
| Number of nominal attributes | 3 |
| Percentage of nominal attributes | 42.86 |
| Average class difference between consecutive instances | 0.99 |
| Percentage of numeric attributes | 57.14 |
| Percentage of missing values | 0.75 |
| Percentage of instances having missing values | 2.91 |
| Percentage of binary attributes | 0 |
| Number of binary attributes | 0 |
| Number of instances belonging to the least frequent class | 68 |
| Percentage of instances belonging to the least frequent class | 19.77 |
| Number of instances belonging to the most frequent class | 152 |
| Percentage of instances belonging to the most frequent class | 44.19 |
| Number of attributes divided by the number of instances | 0.02 |
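Most of the percentage properties follow directly from the raw counts. The sketch below recomputes them, using only counts copied from the feature and property lists and rounding to two decimals as the page does; the denominator for "Percentage of missing values" is assumed to be the total number of cells (instances times attributes), which reproduces the listed 0.75.

```python
# Counts copied from the feature and property lists.
MISSING_PER_FEATURE = {
    "species": 0, "island": 0, "culmen_length_mm": 2, "culmen_depth_mm": 2,
    "flipper_length_mm": 2, "body_mass_g": 2, "sex": 10,
}
N_INSTANCES = 344
N_ATTRIBUTES = len(MISSING_PER_FEATURE)            # 7
N_NUMERIC = 4
N_NOMINAL = 3
N_MISSING_VALUES = sum(MISSING_PER_FEATURE.values())  # 18, matches the property list
N_INSTANCES_WITH_MISSING = 10
LEAST_FREQUENT_CLASS = 68
MOST_FREQUENT_CLASS = 152


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "Percentage of nominal attributes": pct(N_NOMINAL, N_ATTRIBUTES),              # 42.86
    "Percentage of numeric attributes": pct(N_NUMERIC, N_ATTRIBUTES),              # 57.14
    "Percentage of missing values": pct(N_MISSING_VALUES, N_INSTANCES * N_ATTRIBUTES),  # 0.75
    "Percentage of instances having missing values": pct(N_INSTANCES_WITH_MISSING, N_INSTANCES),  # 2.91
    "Percentage of instances belonging to the least frequent class": pct(LEAST_FREQUENT_CLASS, N_INSTANCES),  # 19.77
    "Percentage of instances belonging to the most frequent class": pct(MOST_FREQUENT_CLASS, N_INSTANCES),  # 44.19
    "Number of attributes divided by the number of instances": round(N_ATTRIBUTES / N_INSTANCES, 2),  # 0.02
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The check that the per-feature missing counts sum to 18 also confirms the feature list and the property list agree.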

8 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: accuracy - target_feature: species
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: species
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
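The two classification tasks estimate accuracy with 10-fold cross-validation on the species target. As a rough reference point, here is a minimal pure-Python sketch of 10-fold cross-validation for a majority-class baseline. It assumes shuffled, unstratified folds with a fixed seed; OpenML tasks come with their own predefined splits, which this sketch does not reproduce, and `k_fold_indices` and `majority_baseline_cv_accuracy` are hypothetical helpers. The class sizes 152 and 68 are the most and least frequent counts from the property list; the third class has 344 - 152 - 68 = 124 instances.

```python
import random
from collections import Counter


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_baseline_cv_accuracy(labels: list[str], k: int = 10, seed: int = 0) -> float:
    """Mean test-fold accuracy of always predicting the training folds' most common label."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for i, test in enumerate(folds):
        # Training set: every index that is not in the current test fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        majority = Counter(labels[j] for j in train).most_common(1)[0][0]
        correct = sum(labels[j] == majority for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# palmerpenguins class sizes: Adelie 152, Gentoo 124, Chinstrap 68.
labels = ["Adelie"] * 152 + ["Gentoo"] * 124 + ["Chinstrap"] * 68
print(round(majority_baseline_cv_accuracy(labels), 3))
```

Because the most frequent class makes up 44.19% of the instances, a baseline like this would be expected to land near that figure, which gives a floor for comparing classifiers on these tasks.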