OpenML


fruitfly

active ARFF Publicly available Visibility: public Uploaded 23-04-2014 by unknown

Author:
Source: Unknown
Please cite:

Identifier attribute deleted.

NAME: Sexual activity and the lifespan of male fruitflies

TYPE: Designed (almost factorial) experiment

SIZE: 125 observations, 5 variables

DESCRIPTIVE ABSTRACT:
A cost of increased reproduction in terms of reduced longevity has been shown for female fruitflies, but not for males. The flies used were an outbred stock. Sexual activity was manipulated by supplying individual males with one or eight receptive virgin females per day. The longevity of these males was compared with that of two control types. The first control consisted of two sets of individual males kept with one or eight newly inseminated females. Newly inseminated females will not usually remate for at least two days, and thus served as a control for any effect of competition with the male for food or space. The second control was a set of individual males kept with no females. There were 25 males in each of the five groups, which were treated identically in number of anaesthetizations (using CO2) and provision of fresh food medium.

SOURCE:
Figure 2 in the article "Sexual Activity and the Lifespan of Male Fruitflies" by Linda Partridge and Marion Farquhar. _Nature_, 294, 580-581, 1981.

VARIABLE DESCRIPTIONS:

Columns  Variable   Description
-------  --------   -----------
 1- 2    ID         Serial No. (1-25) within each group of 25
                    (the order in which data points were abstracted)
 4       PARTNERS   Number of companions (0, 1 or 8)
 6       TYPE       Type of companion
                      0: newly pregnant female
                      1: virgin female
                      9: not applicable (when PARTNERS=0)
 8- 9    LONGEVITY  Lifespan, in days
11-14    THORAX     Length of thorax, in mm (x.xx)
16-17    SLEEP      Percentage of each day spent sleeping

SPECIAL NOTES:
'Compliance' of the males in the two experimental groups was documented as follows: On two days per week throughout the life of each experimental male, the females that had been supplied as virgins to that male were kept and examined for fertile eggs. The insemination rate declined from approximately 7 females/day at age one week to just under 2/day at age eight weeks in the males supplied with eight virgin females per day, and from just under 1/day at age one week to approximately 0.6/day at age eight weeks in the males supplied with one virgin female per day. These 'compliance' data were not supplied for individual males, but the authors say that "There were no significant differences between the individual males within each experimental group."

STORY BEHIND THE DATA:
James Hanley found this dataset in _Nature_ and was attracted by the way the raw data were presented in classical analysis of covariance style in Figure 2. He read the data points from the graphs and brought them to the attention of a colleague with whom he was teaching the applied statistics course. Dr. Liddell thought that with only three explanatory variables (THORAX, plus PARTNERS and TYPE to describe the five groups), it would not be challenging enough as a data-analysis project. He suggested adding another variable. James Hanley added SLEEP, a variable not mentioned in the published article. Teachers can contact us about the construction of this variable. (We prefer to divulge the details at the end of the data-analysis project.)

Further discussion of the background and pedagogical use of this dataset can be found in Hanley (1983) and in Hanley and Shapiro (1994). To obtain the Hanley and Shapiro article, send the one-line e-mail message:

    send jse/v2n1/datasets.hanley

to the address archive@jse.stat.ncsu.edu

PEDAGOGICAL NOTES:
This has been the most successful and the most memorable dataset we have used in an "applications of statistics" course, which we have taught for ten years. The most common analysis techniques have been analysis of variance, classical analysis of covariance, and multiple regression.

Because the variable THORAX is so strong (it explains about 1/3 of the variance in LONGEVITY), it is important to consider it to increase the precision of between-group contrasts. When students first check and find that the distributions of thorax length, and in particular, the mean thorax length, are very similar in the different groups, many of them are willing to say (in epidemiological terminology) that THORAX is not a confounding variable, and that it can be omitted from the analysis.

There is usually lively discussion about the primary contrast. The five groups and their special structure allow opportunities for students to understand and verbalize what we mean by the term "statistical interaction."

There is also much debate as to whether one should take the SLEEP variable into account. Some students say that it is an 'intermediate' variable. Some students formally test the mean level of SLEEP across groups, find one pair where there is a statistically significant difference, and want to treat it as a confounding variable. A few students muse about how it was measured.

There is heteroscedasticity in the LONGEVITY variable.

One very observant student (now a professor) argued that THORAX cannot be used as a predictor or explanatory variable for the LONGEVITY outcome since fruitflies who die young may not be fully grown, i.e., it is also an intermediate variable. One Ph.D. student who had studied entomology assured us that fruitflies do not grow longer after birth; therefore, the THORAX length is not time-dependent!

Curiously, the dataset has seldom been analyzed using techniques from survival analysis. The fact that there are no censored observations is not really an excuse, and one could easily devise a way to introduce censoring of LONGEVITY.

REFERENCES:
Hanley, J. A. (1983), "Appropriate Uses of Multivariate Analysis," _Annual Review of Public Health_, 4, 155-180.
Hanley, J. A., and Shapiro, S. H. (1994), "Sexual Activity and the Lifespan of Male Fruitflies: A Dataset That Gets Attention," _Journal of Statistics Education_, Volume 2, Number 1.

SUBMITTED BY:
James A. Hanley and Stanley H. Shapiro
Department of Epidemiology and Biostatistics
McGill University
1020 Pine Avenue West
Montreal, Quebec, H3A 1A2
Canada
tel: +1 (514) 398-6270 (JH)
     +1 (514) 398-6272 (SS)
fax: +1 (514) 398-4503
INJH@musicb.mcgill.ca, StanS@epid.lan.mcgill.ca
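The classical analysis of covariance mentioned in the pedagogical notes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes a single covariate with a common slope in every group, pools that slope from the within-group sums of squares and cross-products, and adjusts each group mean to the grand mean of the covariate. The function name `ancova_adjusted_means` and the six data points are made up for illustration; they are not the fruitfly measurements.

```python
from collections import defaultdict
from statistics import fmean


def ancova_adjusted_means(groups, x, y):
    """Return (common_slope, {group: adjusted mean of y}).

    One-covariate ANCOVA with a common slope: the slope is pooled from
    the within-group sums of squares and cross-products, and each group
    mean of y is adjusted to the grand mean of x.
    """
    by_group = defaultdict(list)
    for g, xi, yi in zip(groups, x, y):
        by_group[g].append((xi, yi))

    grand_mean_x = fmean(x)
    sxx = sxy = 0.0
    means = {}
    for g, pairs in by_group.items():
        mx = fmean(p[0] for p in pairs)
        my = fmean(p[1] for p in pairs)
        means[g] = (mx, my)
        sxx += sum((xi - mx) ** 2 for xi, _ in pairs)
        sxy += sum((xi - mx) * (yi - my) for xi, yi in pairs)

    slope = sxy / sxx  # pooled within-group slope
    adjusted = {
        g: my - slope * (mx - grand_mean_x) for g, (mx, my) in means.items()
    }
    return slope, adjusted


# Made-up illustration data (NOT the fruitfly measurements).
groups = ["a", "a", "a", "b", "b", "b"]
thorax = [0.7, 0.8, 0.9, 0.8, 0.9, 1.0]
longevity = [80, 90, 100, 70, 80, 90]
slope, adjusted = ancova_adjusted_means(groups, thorax, longevity)
print(round(slope, 6), {g: round(m, 6) for g, m in adjusted.items()})
```

In this toy example the raw group means differ by 10 while the covariate-adjusted means differ by about 20, which is the kind of change in a between-group contrast that the notes on THORAX describe.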

5 features

Feature	Type	Unique values	Missing
class (target)	numeric	47	0
PARTNERS	nominal	3	0
TYPE	nominal	3	0
THORAX	numeric	46	0
SLEEP	numeric	14	0
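
The five experimental groups are not stored as a single feature; they follow from the PARTNERS and TYPE codes given in the variable descriptions. A minimal sketch of that decoding is shown here. It assumes the nominal values arrive as the strings or integers 0, 1, 8 and 9 once the file is parsed; the function name `treatment_group` and the group labels are hypothetical.

```python
def treatment_group(partners, type_code):
    """Map the PARTNERS and TYPE codes to one of the five groups."""
    partners, type_code = int(partners), int(type_code)
    if partners == 0:
        # TYPE is coded 9 (not applicable) when there are no companions.
        if type_code != 9:
            raise ValueError("TYPE should be 9 when PARTNERS is 0")
        return "no females"
    if partners not in (1, 8) or type_code not in (0, 1):
        raise ValueError(f"unexpected codes: {partners}, {type_code}")
    kind = "virgin" if type_code == 1 else "newly inseminated"
    return f"{partners} {kind}"


print(treatment_group("8", "1"))
print(treatment_group("0", "9"))
```

Raising on unexpected code combinations is a design choice for the sketch: it surfaces parsing problems early instead of silently creating a sixth group.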


107 properties

NumberOfInstances

125

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

AutoCorrelation

-16.65

Average class difference between consecutive instances.

CfsSubsetEval_DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

ClassEntropy

Entropy of the target attribute values.

DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

Dimensionality

0.04

Number of attributes divided by the number of instances.

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

J48.00001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.0001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MajorityClassPercentage

Percentage of instances belonging to the most frequent class.

MajorityClassSize

Number of instances belonging to the most frequent class.

MaxAttributeEntropy

Maximum entropy among attributes.

MaxKurtosisOfNumericAtts

3.15

Maximum kurtosis among attributes of the numeric type.

MaxMeansOfNumericAtts

57.44

Maximum of means among attributes of the numeric type.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MaxSkewnessOfNumericAtts

1.59

Maximum skewness among attributes of the numeric type.

MaxStdDevOfNumericAtts

17.56

Maximum standard deviation of attributes of the numeric type.

MeanAttributeEntropy

Average entropy of the attributes.

MeanKurtosisOfNumericAtts

0.78

Mean kurtosis among attributes of the numeric type.

MeanMeansOfNumericAtts

27.24

Mean of means among attributes of the numeric type.

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

MeanSkewnessOfNumericAtts

0.31

Mean skewness among attributes of the numeric type.

MeanStdDevOfNumericAtts

11.17

Mean standard deviation of attributes of the numeric type.

MinAttributeEntropy

Minimal entropy among attributes.

MinKurtosisOfNumericAtts

-0.41

Minimum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

0.82

Minimum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-0.64

Minimum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.08

Minimum standard deviation of attributes of the numeric type.

MinorityClassPercentage

Percentage of instances belonging to the least frequent class.

MinorityClassSize

Number of instances belonging to the least frequent class.

NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NumberOfBinaryFeatures

Number of binary attributes.

PercentageOfBinaryFeatures

Percentage of binary attributes.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

PercentageOfMissingValues

Percentage of missing values.

PercentageOfNumericFeatures

Percentage of numeric attributes.

PercentageOfSymbolicFeatures

Percentage of nominal attributes.

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile1KurtosisOfNumericAtts

-0.41

First quartile of kurtosis among attributes of the numeric type.

Quartile1MeansOfNumericAtts

0.82

First quartile of means among attributes of the numeric type.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

Quartile1SkewnessOfNumericAtts

-0.64

First quartile of skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

0.08

First quartile of standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

Quartile2KurtosisOfNumericAtts

-0.4

Second quartile (Median) of kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

23.46

Second quartile (Median) of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

-0.01

Second quartile (Median) of skewness among attributes of the numeric type.

Quartile2StdDevOfNumericAtts

15.88

Second quartile (Median) of standard deviation of attributes of the numeric type.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

Quartile3KurtosisOfNumericAtts

3.15

Third quartile of kurtosis among attributes of the numeric type.

Quartile3MeansOfNumericAtts

57.44

Third quartile of means among attributes of the numeric type.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

Quartile3SkewnessOfNumericAtts

1.59

Third quartile of skewness among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

17.56

Third quartile of standard deviation of attributes of the numeric type.

REPTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

RandomTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
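
Two of the properties above, EquivalentNumberOfAtts and MeanNoiseToSignalRatio, are defined as ratios of entropies and mutual information. The sketch below is one illustrative reading of those definitions. It assumes base-2 logarithms and plug-in (frequency) estimates, neither of which is stated on this page, and all function names are hypothetical. Because these quantities presuppose discrete values, the example uses made-up nominal columns rather than the fruitfly data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def equivalent_number_of_atts(attributes, target):
    """ClassEntropy divided by MeanMutualInformation."""
    mean_mi = sum(mutual_information(a, target) for a in attributes) / len(attributes)
    return entropy(target) / mean_mi


def mean_noise_to_signal_ratio(attributes, target):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    mean_h = sum(entropy(a) for a in attributes) / len(attributes)
    mean_mi = sum(mutual_information(a, target) for a in attributes) / len(attributes)
    return (mean_h - mean_mi) / mean_mi


# Made-up nominal columns: attr1 determines the target, attr2 is unrelated.
target = ["y", "y", "n", "n"]
attr1 = ["a", "a", "b", "b"]
attr2 = ["p", "q", "p", "q"]
print(equivalent_number_of_atts([attr1, attr2], target))   # 2.0
print(mean_noise_to_signal_ratio([attr1, attr2], target))  # 1.0
```

Both functions divide by the mean mutual information, so they are undefined when no attribute carries any information about the target; a production version would need to handle that case.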


18 tasks

Supervised Regression on fruitfly

4 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: class

Supervised Regression on fruitfly

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Regression on fruitfly

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: class

Supervised Regression on fruitfly

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Regression on fruitfly

0 runs - estimation_procedure: 5 times 2-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Regression on fruitfly

0 runs - estimation_procedure: Custom 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Regression on fruitfly

0 runs - estimation_procedure: Test on Training Data - evaluation_measure: predictive_accuracy - target_feature: class
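
The regression tasks above pair an estimation procedure with an evaluation measure, for example 10-fold cross-validation with mean_absolute_error. The sketch below shows that pairing without any library. It is a rough illustration only: the model is a trivial baseline that predicts the training-fold mean, the shuffle uses a fixed seed, and folds are assigned by index modulo k. None of this is claimed to match how OpenML generates its splits, and `cross_validated_mae` and the toy targets are hypothetical.

```python
import random
from statistics import fmean


def cross_validated_mae(y, k=10, seed=0):
    """k-fold cross-validated MAE of a predict-the-training-mean baseline.

    Requires len(y) >= k so that no fold is empty.
    """
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds

    fold_maes = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [y[i] for i in indices if i not in held_out]
        prediction = fmean(train)  # the "model" is just the training mean
        fold_maes.append(fmean(abs(y[i] - prediction) for i in test_idx))
    return fmean(fold_maes)  # average of the per-fold MAEs


# Toy targets (made up); the real task would use the 125 `class` values.
toy_y = [float(v) for v in range(40, 80)]
print(cross_validated_mae(toy_y))
```

Averaging the per-fold errors is one common convention; pooling all held-out errors before averaging gives a slightly different number when folds differ in size, as they do with 125 instances and 10 folds.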

Clustering on fruitfly

0 runs

Clustering on fruitfly