OpenML

analcatdata_authorship

Status: active. Format: ARFF. Visibility: public. Uploaded 28-09-2014 by Felicia West.

Author:
Source: Unknown - Date unknown
Please cite:

analcatdata

A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two versions of each of 84 data sets, plus this README file. Each data set is given in comma-delimited ASCII (.csv) form and in Microsoft Excel (.xls) form.

NOTICE: These data sets may be used freely for scientific, educational and/or noncommercial purposes, provided suitable acknowledgment is given (by citing the above-named reference).

Further details concerning the book, including information on statistical software (including sample S-PLUS/R and SAS code), are available at the web site http://www.stern.nyu.edu/~jsimonof/AnalCatData

Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last

Note: Quotes, single quotes and backslashes were removed; blanks were replaced with underscores.

71 features

Author (target)	nominal	4 unique values 0 missing
then	numeric	13 unique values 0 missing
not	numeric	37 unique values 0 missing
their	numeric	26 unique values 0 missing
the	numeric	137 unique values 0 missing
that	numeric	38 unique values 0 missing
than	numeric	14 unique values 0 missing
such	numeric	14 unique values 0 missing
some	numeric	13 unique values 0 missing
so	numeric	24 unique values 0 missing
should	numeric	14 unique values 0 missing
our	numeric	30 unique values 0 missing
or	numeric	29 unique values 0 missing
only	numeric	11 unique values 0 missing
one	numeric	18 unique values 0 missing
on	numeric	26 unique values 0 missing
of	numeric	76 unique values 0 missing
now	numeric	16 unique values 0 missing
no	numeric	23 unique values 0 missing
what	numeric	21 unique values 0 missing
BookID	numeric	12 unique values 0 missing
your	numeric	31 unique values 0 missing
would	numeric	23 unique values 0 missing
with	numeric	36 unique values 0 missing
will	numeric	25 unique values 0 missing
who	numeric	16 unique values 0 missing
which	numeric	20 unique values 0 missing
when	numeric	16 unique values 0 missing
there	numeric	16 unique values 0 missing
were	numeric	31 unique values 0 missing
was	numeric	64 unique values 0 missing
upon	numeric	11 unique values 0 missing
up	numeric	14 unique values 0 missing
to	numeric	63 unique values 0 missing
this	numeric	27 unique values 0 missing
things	numeric	10 unique values 0 missing
be	numeric	39 unique values 0 missing
every	numeric	12 unique values 0 missing
even	numeric	9 unique values 0 missing
down	numeric	13 unique values 0 missing
do	numeric	22 unique values 0 missing
can	numeric	12 unique values 0 missing
by	numeric	21 unique values 0 missing
but	numeric	25 unique values 0 missing
been	numeric	24 unique values 0 missing
for	numeric	27 unique values 0 missing
at	numeric	22 unique values 0 missing
as	numeric	31 unique values 0 missing
are	numeric	21 unique values 0 missing
any	numeric	16 unique values 0 missing
and	numeric	83 unique values 0 missing
an	numeric	50 unique values 0 missing
also	numeric	6 unique values 0 missing
all	numeric	27 unique values 0 missing
in	numeric	45 unique values 0 missing
my	numeric	51 unique values 0 missing
must	numeric	16 unique values 0 missing
more	numeric	15 unique values 0 missing
may	numeric	11 unique values 0 missing
its	numeric	12 unique values 0 missing
it	numeric	49 unique values 0 missing
is	numeric	40 unique values 0 missing
into	numeric	15 unique values 0 missing
a	numeric	57 unique values 0 missing
if	numeric	18 unique values 0 missing
his	numeric	54 unique values 0 missing
her	numeric	66 unique values 0 missing
have	numeric	31 unique values 0 missing
has	numeric	15 unique values 0 missing
had	numeric	45 unique values 0 missing
from	numeric	25 unique values 0 missing

107 properties

NumberOfInstances

841

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

AutoCorrelation

Average class difference between consecutive instances.

CfsSubsetEval_DecisionStumpAUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpKappa

0.88

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
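
The kappa values reported for the landmarkers measure agreement between predicted and true classes after correcting for chance agreement. The following is a minimal sketch of Cohen's kappa computed from a confusion matrix. The function name and the toy matrix are illustrative assumptions; the matrix is not a result from this dataset.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of the row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix, for illustration only.
toy = [[45, 5], [5, 45]]
kappa = cohen_kappa(toy)
print(round(kappa, 2))
```

With 90% observed agreement and 50% chance agreement, the toy matrix gives a kappa of 0.8, which shows why kappa is always lower than plain accuracy when some agreement is expected by chance.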

CfsSubsetEval_NaiveBayesAUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesKappa

0.88

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NAUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NKappa

0.88

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

ClassEntropy

1.79

Entropy of the target attribute values.
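
As a rough illustration of this quantity, the sketch below computes the Shannon entropy of a list of class labels. The base-2 logarithm is an assumption (the page does not state the base), although a value of 1.79 for a four-class target is consistent with it: the base-2 maximum for four classes is 2, while the natural-log maximum would be about 1.39. The label lists are made up and are not the dataset's class distribution.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits, assuming log base 2) of a sequence of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical label lists, for illustration only.
balanced = ["A", "B", "C", "D"] * 10                          # four equally frequent classes
skewed = ["A"] * 30 + ["B"] * 6 + ["C"] * 2 + ["D"] * 2       # one dominant class

print(class_entropy(balanced))  # the maximum for four classes
print(class_entropy(skewed))    # lower, because the classes are imbalanced
```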

DecisionStumpAUC

0.75

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpErrRate

0.44

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpKappa

0.34

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

Dimensionality

0.08

Number of attributes divided by the number of instances.
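
The reported value can be reproduced from the counts shown on this page (71 features, 841 instances). A small sketch, with variable names chosen here for illustration:

```python
n_features = 71     # "71 features" listed above, including the target
n_instances = 841   # NumberOfInstances

dimensionality = n_features / n_instances
print(round(dimensionality, 2))
```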

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
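
The definition above is a simple ratio. The sketch below writes it out as a function. The function name is an assumption, and the mean mutual information passed in is a hypothetical value, since the page does not report one for this dataset.

```python
def equivalent_number_of_atts(class_entropy, mean_mutual_information):
    """ClassEntropy divided by MeanMutualInformation, as defined above."""
    if mean_mutual_information == 0:
        # No attribute carries information about the class: the ratio is undefined.
        return float("nan")
    return class_entropy / mean_mutual_information


# 1.79 is the ClassEntropy reported on this page; 0.5 is a made-up mean mutual information.
example = equivalent_number_of_atts(1.79, 0.5)
print(example)
```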

J48.00001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.ErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.Kappa

0.88

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.0001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.ErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.Kappa

0.88

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.ErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.Kappa

0.88

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MajorityClassPercentage

37.69

Percentage of instances belonging to the most frequent class.

MajorityClassSize

317

Number of instances belonging to the most frequent class.
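
The two majority-class figures are consistent with each other, as this small check using the values on this page shows (variable names are illustrative):

```python
majority_class_size = 317   # MajorityClassSize
n_instances = 841           # NumberOfInstances

majority_class_percentage = 100 * majority_class_size / n_instances
print(round(majority_class_percentage, 2))
```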

MaxAttributeEntropy

Maximum entropy among attributes.

MaxKurtosisOfNumericAtts

21.23

Maximum kurtosis among attributes of the numeric type.

MaxMeansOfNumericAtts

77.36

Maximum of means among attributes of the numeric type.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MaxSkewnessOfNumericAtts

4.09

Maximum skewness among attributes of the numeric type.

MaxStdDevOfNumericAtts

31.07

Maximum standard deviation of attributes of the numeric type.

MeanAttributeEntropy

Average entropy of the attributes.

MeanKurtosisOfNumericAtts

2.36

Mean kurtosis among attributes of the numeric type.

MeanMeansOfNumericAtts

10.13

Mean of means among attributes of the numeric type.

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
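
The formula above can be written as a short function. This is only a sketch: the function name is an assumption, and both inputs in the example are hypothetical, since neither MeanAttributeEntropy nor MeanMutualInformation is reported for this dataset.

```python
def mean_noise_to_signal_ratio(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information == 0:
        # Without any shared information the ratio is undefined.
        return float("nan")
    noise = mean_attribute_entropy - mean_mutual_information
    return noise / mean_mutual_information


# Made-up inputs: attributes carry 2.0 bits on average, 0.5 of which relate to the class.
ratio = mean_noise_to_signal_ratio(2.0, 0.5)
print(ratio)
```

A larger ratio means that more of the information in the attributes is unrelated to the class.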

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

MeanSkewnessOfNumericAtts

1.11

Mean skewness among attributes of the numeric type.

MeanStdDevOfNumericAtts

5.36

Mean standard deviation of attributes of the numeric type.

MinAttributeEntropy

Minimal entropy among attributes.

MinKurtosisOfNumericAtts

-0.79

Minimum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

0.44

Minimum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-0.04

Minimum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.8

Minimum standard deviation of attributes of the numeric type.

MinorityClassPercentage

6.54

Percentage of instances belonging to the least frequent class.

MinorityClassSize

Number of instances belonging to the least frequent class.

NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesErrRate

0.01

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesKappa

0.98

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NumberOfBinaryFeatures

Number of binary attributes.

PercentageOfBinaryFeatures

Percentage of binary attributes.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

PercentageOfMissingValues

Percentage of missing values.

PercentageOfNumericFeatures

98.59

Percentage of numeric attributes.

PercentageOfSymbolicFeatures

1.41

Percentage of nominal attributes.
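
Both percentages follow from the feature list above: of the 71 features, only the target Author is nominal and the remaining 70 are numeric. A quick check, with illustrative variable names:

```python
n_features = 71
n_nominal = 1                       # Author (target)
n_numeric = n_features - n_nominal  # the 70 numeric features

pct_numeric = 100 * n_numeric / n_features
pct_symbolic = 100 * n_nominal / n_features
print(round(pct_numeric, 2), round(pct_symbolic, 2))
```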

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile1KurtosisOfNumericAtts

0.36

First quartile of kurtosis among attributes of the numeric type.

Quartile1MeansOfNumericAtts

3.13

First quartile of means among attributes of the numeric type.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

Quartile1SkewnessOfNumericAtts

0.7

First quartile of skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

2.52

First quartile of standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

Quartile2KurtosisOfNumericAtts

1.41

Second quartile (Median) of kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

5.03

Second quartile (Median) of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

1.03

Second quartile (Median) of skewness among attributes of the numeric type.

Quartile2StdDevOfNumericAtts

3.9

Second quartile (Median) of standard deviation of attributes of the numeric type.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

Quartile3KurtosisOfNumericAtts

2.44

Third quartile of kurtosis among attributes of the numeric type.

Quartile3MeansOfNumericAtts

11.94

Third quartile of means among attributes of the numeric type.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

Quartile3SkewnessOfNumericAtts

1.34

Third quartile of skewness among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

6.21

Third quartile of standard deviation of attributes of the numeric type.

REPTreeDepth1AUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1ErrRate

0.11

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth2AUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2ErrRate

0.11

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth3AUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3ErrRate

0.11

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

RandomTreeDepth1AUC

0.89

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1ErrRate

0.16

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1Kappa

0.77

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth2AUC

0.89

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2ErrRate

0.16

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2Kappa

0.77

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth3AUC

0.89

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3ErrRate

0.16

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3Kappa

0.77

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

kNN1NAUC

0.99

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NErrRate

0.01

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NKappa

0.98

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

36 tasks

Supervised Classification on analcatdata_authorship

23583 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: Author
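
To illustrate what a 10-fold cross-validation procedure does with the 841 instances, here is a minimal, unstratified sketch in plain Python. It is an assumption-laden illustration only: OpenML tasks define their own fixed splits, and the function name, seed, and shuffling scheme below are hypothetical.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists for a plain, unstratified k-fold split."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(841, k=10))
test_sizes = sorted(len(test) for _, test in splits)
print(test_sizes)
```

Because 841 is not divisible by 10, one fold holds 85 instances and the other nine hold 84; every instance is used for testing exactly once.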

Supervised Classification on analcatdata_authorship

175 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Author

Supervised Classification on analcatdata_authorship

1 run - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: Author

Supervised Classification on analcatdata_authorship

0 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: Author

Supervised Classification on analcatdata_authorship

0 runs - estimation_procedure: 33% Holdout set - target_feature: Author

Supervised Classification on analcatdata_authorship

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: Author

Learning Curve on analcatdata_authorship

44 runs - estimation_procedure: 10-fold Learning Curve - target_feature: Author

Supervised Data Stream Classification on analcatdata_authorship

0 runs - estimation_procedure: Interleaved Test then Train - target_feature: Author
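
In an interleaved test-then-train (prequential) evaluation, each instance of the stream is first used to test the current model and only then used to update it. The sketch below shows the idea with a trivial majority-class learner. The learner, the function name, the choice to skip the first instance (no model exists yet), and the toy stream are all assumptions made for illustration.

```python
from collections import Counter


def prequential_accuracy(stream):
    """Test-then-train accuracy of a majority-class predictor over a label stream."""
    seen = Counter()
    correct = 0
    total = 0
    for label in stream:
        if seen:
            # Test first: predict the most frequent label observed so far.
            prediction = seen.most_common(1)[0][0]
            correct += prediction == label
            total += 1
        # Then train: update the label counts with the true label.
        seen[label] += 1
    return correct / total if total else 0.0


# Hypothetical label stream, for illustration only.
toy_stream = ["A", "A", "B", "A", "A"]
accuracy = prequential_accuracy(toy_stream)
print(accuracy)
```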

Clustering on analcatdata_authorship

0 runs - target_feature: Author

Clustering on analcatdata_authorship

0 runs

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Clustering on analcatdata_authorship

0 runs - estimation_procedure: 50 times Clustering

Subgroup Discovery on analcatdata_authorship

1308 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

1307 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

1304 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

1303 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author

Subgroup Discovery on analcatdata_authorship

0 runs - target_feature: Author
