OpenML

covertype

Status: active. Format: ARFF. Publicly available (visibility: public). Uploaded 23-04-2014 by unknown.

Covertype

Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from the US Geological Survey (USGS) and the USFS. The data is in raw form (not scaled) and contains binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types).

The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are more a result of ecological processes than of forest management practices.

Some background information on these four wilderness areas: Neota (area 2) probably has the highest mean elevation of the four. Rawah (area 1) and Comanche Peak (area 3) would have a lower mean elevation, while Cache la Poudre (area 4) would have the lowest.

As for primary major tree species in these areas, Neota would have spruce/fir (type 1), while Rawah and Comanche Peak would probably have lodgepole pine (type 2) as their primary species, followed by spruce/fir and aspen (type 5). Cache la Poudre would tend to have Ponderosa pine (type 3), Douglas-fir (type 6), and cottonwood/willow (type 4).

The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Cache la Poudre would probably be the most distinctive of the four, due to its relatively low elevation range and species composition.

Attribute Information:

Given are the attribute name, attribute type, measurement unit, and a brief description. Predicting the forest cover type is the classification problem. The order of this listing corresponds to the order of numerals along the rows of the database.

```
Name / Data Type / Measurement / Description
Elevation / quantitative / meters / Elevation in meters
Aspect / quantitative / azimuth / Aspect in degrees azimuth
Slope / quantitative / degrees / Slope in degrees
Horizontal_Distance_To_Hydrology / quantitative / meters / Horz Dist to nearest surface water features
Vertical_Distance_To_Hydrology / quantitative / meters / Vert Dist to nearest surface water features
Horizontal_Distance_To_Roadways / quantitative / meters / Horz Dist to nearest roadway
Hillshade_9am / quantitative / 0 to 255 index / Hillshade index at 9am, summer solstice
Hillshade_Noon / quantitative / 0 to 255 index / Hillshade index at noon, summer solstice
Hillshade_3pm / quantitative / 0 to 255 index / Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points / quantitative / meters / Horz Dist to nearest wildfire ignition points
Wilderness_Area (4 binary columns) / qualitative / 0 (absence) or 1 (presence) / Wilderness area designation
Soil_Type (40 binary columns) / qualitative / 0 (absence) or 1 (presence) / Soil Type designation
Cover_Type (7 types) / integer / 1 to 7 / Forest Cover Type designation
```

Relevant Papers:

- Blackard, Jock A. and Denis J. Dean. 2000. "Comparative Accuracies of Artificial Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables." Computers and Electronics in Agriculture 24(3):131-151.
- Blackard, Jock A. and Denis J. Dean. 1998. "Comparative Accuracies of Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables." Second Southern Forestry GIS Conference. University of Georgia. Athens, GA. Pages 189-199.
- Blackard, Jock A. 1998. "Comparison of Neural Networks and Discriminant Analysis in Predicting Forest Cover Types." Ph.D. dissertation. Department of Forest Sciences. Colorado State University. Fort Collins, Colorado. 165 pages.
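The description says the qualitative variables (wilderness area and soil type) are stored as groups of binary 0/1 columns. The following minimal sketch shows one way such a group could be collapsed back into a single 1-based label. The helper name `decode_one_hot` and the example row are hypothetical, and the code assumes that exactly one column in each group is set to 1, which the page itself does not state.

```python
from typing import Sequence


def decode_one_hot(flags: Sequence[int]) -> int:
    """Return the 1-based position of the single flag set to 1.

    Assumption (not stated on the page): exactly one flag per group is 1.
    """
    hot = [i for i, value in enumerate(flags, start=1) if value == 1]
    if len(hot) != 1:
        raise ValueError(f"expected exactly one flag set to 1, got {len(hot)}")
    return hot[0]


# Hypothetical row fragment: 4 wilderness-area flags and 40 soil-type flags.
wilderness_flags = [0, 0, 1, 0]
soil_flags = [0] * 40
soil_flags[28] = 1  # position 29 when counted from 1

wilderness_area = decode_one_hot(wilderness_flags)
soil_type = decode_one_hot(soil_flags)
print(wilderness_area, soil_type)  # 3 29
```

Raising on rows that violate the one-flag assumption makes a malformed group visible instead of silently picking a label.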

55 features

class (target)	nominal	7 unique values 0 missing
soil_type_15	nominal	2 unique values 0 missing
soil_type_14	nominal	2 unique values 0 missing
soil_type_16	nominal	2 unique values 0 missing
soil_type_17	nominal	2 unique values 0 missing
soil_type_18	nominal	2 unique values 0 missing
soil_type_19	nominal	2 unique values 0 missing
soil_type_20	nominal	2 unique values 0 missing
soil_type_21	nominal	2 unique values 0 missing
soil_type_22	nominal	2 unique values 0 missing
soil_type_23	nominal	2 unique values 0 missing
soil_type_24	nominal	2 unique values 0 missing
soil_type_25	nominal	2 unique values 0 missing
soil_type_26	nominal	2 unique values 0 missing
soil_type_27	nominal	2 unique values 0 missing
soil_type_28	nominal	2 unique values 0 missing
soil_type_29	nominal	2 unique values 0 missing
soil_type_30	nominal	2 unique values 0 missing
soil_type_31	nominal	2 unique values 0 missing
soil_type_32	nominal	2 unique values 0 missing
soil_type_33	nominal	2 unique values 0 missing
soil_type_34	nominal	2 unique values 0 missing
soil_type_35	nominal	2 unique values 0 missing
soil_type_36	nominal	2 unique values 0 missing
soil_type_37	nominal	2 unique values 0 missing
soil_type_38	nominal	2 unique values 0 missing
soil_type_39	nominal	2 unique values 0 missing
soil_type_40	nominal	2 unique values 0 missing
soil_type_1	nominal	2 unique values 0 missing
aspect	numeric	361 unique values 0 missing
slope	numeric	57 unique values 0 missing
horizontal_distance_to_hydrology	numeric	487 unique values 0 missing
Vertical_Distance_To_Hydrology	numeric	562 unique values 0 missing
Horizontal_Distance_To_Roadways	numeric	5343 unique values 0 missing
Hillshade_9am	numeric	195 unique values 0 missing
Hillshade_Noon	numeric	155 unique values 0 missing
Hillshade_3pm	numeric	253 unique values 0 missing
Horizontal_Distance_To_Fire_Points	numeric	5276 unique values 0 missing
wilderness_area1	numeric	2 unique values 0 missing
wilderness_area2	numeric	2 unique values 0 missing
wilderness_area3	numeric	2 unique values 0 missing
wilderness_area4	numeric	2 unique values 0 missing
elevation	numeric	1769 unique values 0 missing
soil_type_2	nominal	2 unique values 0 missing
soil_type_3	nominal	2 unique values 0 missing
soil_type_4	nominal	2 unique values 0 missing
soil_type_5	nominal	2 unique values 0 missing
soil_type_6	nominal	2 unique values 0 missing
soil_type_7	nominal	2 unique values 0 missing
soil_type_8	nominal	2 unique values 0 missing
soil_type_9	nominal	2 unique values 0 missing
soil_type_10	nominal	2 unique values 0 missing
soil_type_11	nominal	2 unique values 0 missing
soil_type_12	nominal	2 unique values 0 missing
soil_type_13	nominal	2 unique values 0 missing


107 properties

NumberOfInstances

110393

Number of instances (rows) of the dataset.

NumberOfFeatures

55

Number of attributes (columns) of the dataset.

NumberOfClasses

7

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

0

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

0

Number of instances with at least one value missing.

NumberOfNumericFeatures

14

Number of numeric attributes.

NumberOfSymbolicFeatures

41

Number of nominal attributes.

AutoCorrelation

0.39

Average class difference between consecutive instances.

CfsSubsetEval_DecisionStumpAUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpErrRate

0.33

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpKappa

0.46

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
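The landmarker entries on this page report a kappa coefficient alongside AUC and error rate. As a rough illustration, and assuming the statistic is Cohen's kappa (agreement corrected for chance), it can be computed from a confusion matrix as below. The function name and the toy matrix are illustrative only and are not taken from this dataset.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Toy two-class matrix (not from covertype): accuracy 0.7, chance agreement 0.5.
toy_confusion = [[40, 10], [20, 30]]
kappa = cohen_kappa(toy_confusion)
print(round(kappa, 2))  # 0.4
```

A kappa near 0 means the classifier agrees with the labels about as often as chance would; a value of 1 means perfect agreement.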

CfsSubsetEval_NaiveBayesAUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesErrRate

0.33

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesKappa

0.46

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NAUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NErrRate

0.33

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NKappa

0.46

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

ClassEntropy

1.87

Entropy of the target attribute values.
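For reference, the entropy of a target attribute can be sketched as follows. The page does not state the logarithm base; this sketch assumes base 2 (bits), under which seven equally frequent classes would give log2(7), roughly 2.81. The function name and toy labels are illustrative.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits, by assumption) of a sequence of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy labels (not from covertype): two equally frequent classes give 1 bit.
toy_entropy = class_entropy([1, 1, 2, 2])
print(toy_entropy)  # 1.0
```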

DecisionStumpAUC

0.6

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpErrRate

0.53

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpKappa

0.06

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

Dimensionality

Number of attributes divided by the number of instances.
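Using the figures shown elsewhere on this page (55 features, 110393 instances), the dimensionality defined above works out as in this short sketch. The variable names are illustrative.

```python
n_features = 55        # "55 features" as listed on this page
n_instances = 110393   # NumberOfInstances as listed on this page

# Dimensionality = number of attributes / number of instances.
dimensionality = n_features / n_instances
print(f"{dimensionality:.6f}")  # 0.000498
```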

EquivalentNumberOfAtts

148.97

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
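The values on this page are rounded to two decimals, so plugging the displayed ClassEntropy (1.87) and MeanMutualInformation (0.01) into the definition gives about 187 rather than 148.97. The sketch below inverts the definition to recover the mean mutual information implied by the displayed figures; it is a consistency check on rounded numbers, not a recomputation from the data.

```python
class_entropy = 1.87        # ClassEntropy as displayed (rounded)
equivalent_atts = 148.97    # EquivalentNumberOfAtts as displayed (rounded)

# EquivalentNumberOfAtts = ClassEntropy / MeanMutualInformation,
# so MeanMutualInformation = ClassEntropy / EquivalentNumberOfAtts.
implied_mean_mi = class_entropy / equivalent_atts
print(round(implied_mean_mi, 4))  # 0.0126

# Rounded to two decimals this matches the displayed MeanMutualInformation.
print(round(implied_mean_mi, 2))  # 0.01
```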

J48.00001.AUC

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.ErrRate

0.19

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.Kappa

0.71

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.0001.AUC

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.ErrRate

0.19

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.Kappa

0.71

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.001.AUC

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.ErrRate

0.19

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.Kappa

0.71

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MajorityClassPercentage

46.82

Percentage of instances belonging to the most frequent class.

MajorityClassSize

51682

Number of instances belonging to the most frequent class.

MaxAttributeEntropy

0.72

Maximum entropy among attributes.

MaxKurtosisOfNumericAtts

14.48

Maximum kurtosis among attributes of the numeric type.

MaxMeansOfNumericAtts

2957.98

Maximum of means among attributes of the numeric type.

MaxMutualInformation

0.08

Maximum mutual information between the nominal attributes and the target attribute.

MaxNominalAttDistinctValues

7

The maximum number of distinct values among attributes of the nominal type.

MaxSkewnessOfNumericAtts

4.06

Maximum skewness among attributes of the numeric type.

MaxStdDevOfNumericAtts

1558.39

Maximum standard deviation of attributes of the numeric type.

MeanAttributeEntropy

0.14

Average entropy of the attributes.

MeanKurtosisOfNumericAtts

2.35

Mean kurtosis among attributes of the numeric type.

MeanMeansOfNumericAtts

596.3

Mean of means among attributes of the numeric type.

MeanMutualInformation

0.01

Average mutual information between the nominal attributes and the target attribute.

MeanNoiseToSignalRatio

9.98

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
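The formula above can be written as a small function. Note that the two-decimal values displayed on this page (MeanAttributeEntropy 0.14, MeanMutualInformation 0.01) give about 13 when plugged in directly, not 9.98, because the ratio is sensitive to rounding of the small denominator. The function name and toy inputs are illustrative.

```python
def noise_to_signal(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


# Toy inputs (not from covertype).
toy_ratio = noise_to_signal(0.5, 0.1)
print(round(toy_ratio, 2))  # 4.0

# Displayed, rounded values from this page: the result differs from 9.98
# only because both inputs were rounded to two decimals.
rounded_ratio = noise_to_signal(0.14, 0.01)
print(round(rounded_ratio, 2))  # 13.0
```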

MeanNominalAttDistinctValues

2.12

Average number of distinct values among the attributes of the nominal type.

MeanSkewnessOfNumericAtts

0.77

Mean skewness among attributes of the numeric type.

MeanStdDevOfNumericAtts

259.78

Mean standard deviation of attributes of the numeric type.

MinAttributeEntropy

Minimal entropy among attributes.

MinKurtosisOfNumericAtts

-1.95

Minimum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

0.05

Minimum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

2

The minimal number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-1.2

Minimum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.22

Minimum standard deviation of attributes of the numeric type.

MinorityClassPercentage

1.21

Percentage of instances belonging to the least frequent class.

MinorityClassSize

1339

Number of instances belonging to the least frequent class.
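The class-balance percentages follow directly from the class sizes and the instance count listed on this page. A short check, with illustrative variable names:

```python
n_instances = 110393   # NumberOfInstances
majority_size = 51682  # MajorityClassSize
minority_size = 1339   # MinorityClassSize

majority_pct = 100 * majority_size / n_instances
minority_pct = 100 * minority_size / n_instances
print(f"{majority_pct:.2f} {minority_pct:.2f}")  # 46.82 1.21

# Error rate of a baseline that always predicts the most frequent class.
baseline_error = 1 - majority_size / n_instances
print(f"{baseline_error:.2f}")  # 0.53
```

The baseline error is a useful reference point when reading the landmarker error rates on this page.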

NaiveBayesAUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesErrRate

0.38

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesKappa

0.43

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NumberOfBinaryFeatures

40

Number of binary attributes.

PercentageOfBinaryFeatures

72.73

Percentage of binary attributes.

PercentageOfInstancesWithMissingValues

0

Percentage of instances having missing values.

PercentageOfMissingValues

0

Percentage of missing values.

PercentageOfNumericFeatures

25.45

Percentage of numeric attributes.

PercentageOfSymbolicFeatures

74.55

Percentage of nominal attributes.
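The three feature-type percentages can be reproduced from counts tallied by hand from the feature table on this page: 14 numeric attributes, 41 nominal attributes (the class plus 40 soil types), and 40 two-valued nominal attributes. Treating only nominal two-valued attributes as "binary" is an assumption made here because it matches the displayed 72.73; the four wilderness_area columns also have two unique values but are typed numeric.

```python
n_features = 55
n_numeric = 14          # aspect ... elevation, including the 4 wilderness_area columns
n_nominal = 41          # class + soil_type_1 .. soil_type_40
n_binary_nominal = 40   # assumption: only nominal two-valued attributes count as binary


def percentage(count, total=n_features):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


print(percentage(n_numeric), percentage(n_nominal), percentage(n_binary_nominal))
# 25.45 74.55 72.73
```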

Quartile1AttributeEntropy

0.02

First quartile of entropy among attributes.

Quartile1KurtosisOfNumericAtts

-0.59

First quartile of kurtosis among attributes of the numeric type.

Quartile1MeansOfNumericAtts

0.44

First quartile of means among attributes of the numeric type.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

Quartile1SkewnessOfNumericAtts

-0.41

First quartile of skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

0.5

First quartile of standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

0.06

Second quartile (Median) of entropy among attributes.

Quartile2KurtosisOfNumericAtts

1.03

Second quartile (Median) of kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

148.95

Second quartile (Median) of means among attributes of the numeric type.

Quartile2MutualInformation

0.01

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

0.56

Second quartile (Median) of skewness among attributes of the numeric type.

Quartile2StdDevOfNumericAtts

32.56

Second quartile (Median) of standard deviation of attributes of the numeric type.

Quartile3AttributeEntropy

0.22

Third quartile of entropy among attributes.

Quartile3KurtosisOfNumericAtts

2.71

Third quartile of kurtosis among attributes of the numeric type.

Quartile3MeansOfNumericAtts

697.14

Third quartile of means among attributes of the numeric type.

Quartile3MutualInformation

0.02

Third quartile of mutual information between the nominal attributes and the target attribute.

Quartile3SkewnessOfNumericAtts

1.4

Third quartile of skewness among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

229.24

Third quartile of standard deviation of attributes of the numeric type.

REPTreeDepth1AUC

0.88

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1ErrRate

0.22

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1Kappa

0.65

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth2AUC

0.88

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2ErrRate

0.22

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2Kappa

0.65

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth3AUC

0.88

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3ErrRate

0.22

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3Kappa

0.65

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

RandomTreeDepth1AUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1ErrRate

0.25

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1Kappa

0.61

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth2AUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2ErrRate

0.25

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2Kappa

0.61

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth3AUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3ErrRate

0.25

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3Kappa

0.61

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

StdvNominalAttDistinctValues

0.78

Standard deviation of the number of distinct values among attributes of the nominal type.

kNN1NAUC

0.86

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NErrRate

0.19

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NKappa

0.71

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk


25 tasks

Supervised Classification on covertype

159 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
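To illustrate the 10-fold cross-validation estimation procedure, here is a minimal sketch that partitions instance indices into ten contiguous folds. It is not how OpenML generates the task's splits (tasks ship their own predefined splits, and shuffling or stratification may be involved); the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Partition range(n_samples) into k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds get one extra instance.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(range(start, start + size))
        start += size
    return folds


folds = k_fold_indices(110393, k=10)
for fold_number, test_fold in enumerate(folds, start=1):
    # Each fold serves once as the test set; the rest is used for training.
    train_size = 110393 - len(test_fold)
    print(fold_number, len(test_fold), train_size)
```

With 110393 instances, three folds hold 11040 instances and seven hold 11039.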

Supervised Classification on covertype

2 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on covertype

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on covertype

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: precision - target_feature: class

Learning Curve on covertype

7 runs - estimation_procedure: 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Supervised Data Stream Classification on covertype

48 runs - estimation_procedure: Interleaved Test then Train - target_feature: class
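In an interleaved test-then-train (prequential) evaluation, each instance is first used to test the current model and only then used to update it. The sketch below illustrates the loop with a trivial majority-class learner; the class, function, and toy stream are hypothetical and are not part of any OpenML or stream-mining API.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner that predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, features, label):
        self.counts[label] += 1


def prequential_accuracy(stream, learner):
    """Test on each instance first, then train on it; return overall accuracy."""
    correct = total = 0
    for features, label in stream:
        if learner.predict(features) == label:
            correct += 1
        learner.learn(features, label)
        total += 1
    return correct / total if total else 0.0


# Toy stream of (features, label) pairs; features are unused by this learner.
toy_stream = [((), 2), ((), 2), ((), 1), ((), 2)]
accuracy = prequential_accuracy(toy_stream, MajorityClassLearner())
print(accuracy)  # 0.5
```

Because every instance is tested before the model has seen it, no separate holdout set is needed.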

Clustering on covertype

0 runs

Clustering on covertype