OpenML

covertype

Active ARFF dataset, publicly available (visibility: public). Uploaded 23-04-2014 by unknown.


Covertype: predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (a 30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from the US Geological Survey (USGS) and the USFS. The data are in raw form (not scaled) and contain binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types).

The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbance, so existing forest cover types are more a result of ecological processes than of forest management practices.

Some background information on the four wilderness areas: Neota (area 2) probably has the highest mean elevation of the four. Rawah (area 1) and Comanche Peak (area 3) would have a lower mean elevation, while Cache la Poudre (area 4) would have the lowest. As for the primary major tree species, Neota would have spruce/fir (type 1), while Rawah and Comanche Peak would probably have lodgepole pine (type 2) as their primary species, followed by spruce/fir and aspen (type 5). Cache la Poudre would tend to have Ponderosa pine (type 3), Douglas-fir (type 6), and cottonwood/willow (type 4).

The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Cache la Poudre would probably be more distinctive than the others, due to its relatively low elevation range and its species composition.

Attribute information: for each attribute, the name, data type, measurement unit, and a brief description are given.
The forest cover type is the classification target. The order of this listing corresponds to the order of the columns in each row of the original data.

```
Name / Data Type / Measurement / Description
Elevation / quantitative / meters / Elevation in meters
Aspect / quantitative / azimuth / Aspect in degrees azimuth
Slope / quantitative / degrees / Slope in degrees
Horizontal_Distance_To_Hydrology / quantitative / meters / Horizontal distance to nearest surface water features
Vertical_Distance_To_Hydrology / quantitative / meters / Vertical distance to nearest surface water features
Horizontal_Distance_To_Roadways / quantitative / meters / Horizontal distance to nearest roadway
Hillshade_9am / quantitative / 0 to 255 index / Hillshade index at 9am, summer solstice
Hillshade_Noon / quantitative / 0 to 255 index / Hillshade index at noon, summer solstice
Hillshade_3pm / quantitative / 0 to 255 index / Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points / quantitative / meters / Horizontal distance to nearest wildfire ignition points
Wilderness_Area (4 binary columns) / qualitative / 0 (absence) or 1 (presence) / Wilderness area designation
Soil_Type (40 binary columns) / qualitative / 0 (absence) or 1 (presence) / Soil type designation
Cover_Type (7 types) / integer / 1 to 7 / Forest cover type designation
```

Relevant Papers:
- Blackard, Jock A. and Denis J. Dean. 2000. "Comparative Accuracies of Artificial Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables." Computers and Electronics in Agriculture 24(3):131-151.
- Blackard, Jock A. and Denis J. Dean. 1998. "Comparative Accuracies of Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables." Second Southern Forestry GIS Conference. University of Georgia. Athens, GA. Pages 189-199.
- Blackard, Jock A. 1998. "Comparison of Neural Networks and Discriminant Analysis in Predicting Forest Cover Types." Ph.D. dissertation. Department of Forest Sciences. Colorado State University. Fort Collins, Colorado. 165 pages.

55 features

class (target): nominal, 7 unique values, 0 missing
soil_type_15: nominal, 2 unique values, 0 missing
soil_type_14: nominal, 2 unique values, 0 missing
soil_type_16: nominal, 2 unique values, 0 missing
soil_type_17: nominal, 2 unique values, 0 missing
soil_type_18: nominal, 2 unique values, 0 missing
soil_type_19: nominal, 2 unique values, 0 missing
soil_type_20: nominal, 2 unique values, 0 missing
soil_type_21: nominal, 2 unique values, 0 missing
soil_type_22: nominal, 2 unique values, 0 missing
soil_type_23: nominal, 2 unique values, 0 missing
soil_type_24: nominal, 2 unique values, 0 missing
soil_type_25: nominal, 2 unique values, 0 missing
soil_type_26: nominal, 2 unique values, 0 missing
soil_type_27: nominal, 2 unique values, 0 missing
soil_type_28: nominal, 2 unique values, 0 missing
soil_type_29: nominal, 2 unique values, 0 missing
soil_type_30: nominal, 2 unique values, 0 missing
soil_type_31: nominal, 2 unique values, 0 missing
soil_type_32: nominal, 2 unique values, 0 missing
soil_type_33: nominal, 2 unique values, 0 missing
soil_type_34: nominal, 2 unique values, 0 missing
soil_type_35: nominal, 2 unique values, 0 missing
soil_type_36: nominal, 2 unique values, 0 missing
soil_type_37: nominal, 2 unique values, 0 missing
soil_type_38: nominal, 2 unique values, 0 missing
soil_type_39: nominal, 2 unique values, 0 missing
soil_type_40: nominal, 2 unique values, 0 missing
soil_type_1: nominal, 2 unique values, 0 missing
aspect: numeric, 361 unique values, 0 missing
slope: numeric, 57 unique values, 0 missing
horizontal_distance_to_hydrology: numeric, 487 unique values, 0 missing
Vertical_Distance_To_Hydrology: numeric, 562 unique values, 0 missing
Horizontal_Distance_To_Roadways: numeric, 5343 unique values, 0 missing
Hillshade_9am: numeric, 195 unique values, 0 missing
Hillshade_Noon: numeric, 155 unique values, 0 missing
Hillshade_3pm: numeric, 253 unique values, 0 missing
Horizontal_Distance_To_Fire_Points: numeric, 5276 unique values, 0 missing
wilderness_area1: numeric, 2 unique values, 0 missing
wilderness_area2: numeric, 2 unique values, 0 missing
wilderness_area3: numeric, 2 unique values, 0 missing
wilderness_area4: numeric, 2 unique values, 0 missing
elevation: numeric, 1769 unique values, 0 missing
soil_type_2: nominal, 2 unique values, 0 missing
soil_type_3: nominal, 2 unique values, 0 missing
soil_type_4: nominal, 2 unique values, 0 missing
soil_type_5: nominal, 2 unique values, 0 missing
soil_type_6: nominal, 2 unique values, 0 missing
soil_type_7: nominal, 2 unique values, 0 missing
soil_type_8: nominal, 2 unique values, 0 missing
soil_type_9: nominal, 2 unique values, 0 missing
soil_type_10: nominal, 2 unique values, 0 missing
soil_type_11: nominal, 2 unique values, 0 missing
soil_type_12: nominal, 2 unique values, 0 missing
soil_type_13: nominal, 2 unique values, 0 missing
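The wilderness area and soil type are stored as groups of 0/1 indicator columns (wilderness_area1 to wilderness_area4 and soil_type_1 to soil_type_40 in the list above). When a single categorical code is more convenient, each group can be collapsed back into one value. The following is a minimal sketch, assuming each row is available as a mapping from these column names to 0/1 values (numbers or the strings "0"/"1"); `decode_one_hot` is a hypothetical helper, not part of OpenML or Weka, and it assumes at most one indicator per group is set, raising an error otherwise.

```python
from typing import Mapping, Optional, Union


# Hypothetical helper (not part of OpenML or Weka): collapse a group of 0/1
# indicator columns into a single 1-based category code.
def decode_one_hot(
    row: Mapping[str, Union[int, float, str]], prefix: str, count: int
) -> Optional[int]:
    """Return i if column prefix+i is the only one set to 1, or None if none is set."""
    active = [i for i in range(1, count + 1) if float(row[f"{prefix}{i}"]) == 1.0]
    if len(active) > 1:
        raise ValueError(f"more than one {prefix}* column is set: {active}")
    return active[0] if active else None


# Usage with the column names listed above (note the underscore before the
# number in soil_type_N but not in wilderness_areaN).
row = {f"wilderness_area{i}": 0 for i in range(1, 5)}
row["wilderness_area3"] = 1
row.update({f"soil_type_{i}": "0" for i in range(1, 41)})
row["soil_type_29"] = "1"
print(decode_one_hot(row, "wilderness_area", 4), decode_one_hot(row, "soil_type_", 40))  # 3 29
```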

107 properties

Number of instances (rows) of the dataset: 110393
Number of attributes (columns) of the dataset: 55
Number of distinct values of the target attribute (if it is nominal): 7
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 14
Number of nominal attributes: 41
Average class difference between consecutive instances: 0.39
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.8, 0.33, 0.46
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.8, 0.33, 0.46
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: 0.8, 0.33, 0.46
Entropy of the target attribute values: 1.87
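Class entropy is H = -Σ p_k log2 p_k over the class proportions p_k. For seven classes the maximum is log2 7 ≈ 2.81 bits, so a value of 1.87 is consistent with the imbalanced class distribution reported in this list (most frequent class 46.82%, least frequent 1.21%). A minimal sketch, assuming base-2 logarithms; the log base OpenML uses is not stated on this page.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def class_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = sum(counts.values())
    # Sum of -p * log2(p) over the observed classes; classes with p = 0 never appear.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(round(math.log2(7), 2))       # upper bound for 7 equally frequent classes: 2.81
print(class_entropy([1, 1, 2, 2]))  # two equally frequent classes: 1.0
```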
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump: 0.6, 0.53, 0.06
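Kappa corrects accuracy for chance agreement: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from the row and column totals of the confusion matrix. This helps read the plain DecisionStump landmarker: an error rate of 0.53 with a kappa of 0.06 means it agrees with the true class only slightly more often than chance would. A minimal sketch of the textbook formula; Weka's own evaluation code may differ in details such as instance weighting, so treat it as illustrative.

```python
from typing import Sequence


def cohens_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: actual, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: products of matching row and column marginals.
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


print(round(cohens_kappa([[20, 5], [10, 15]]), 4))  # 0.4
```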
Number of attributes divided by the number of instances: 0 (rounded; 55 / 110393 ≈ 0.0005)
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes), equal to ClassEntropy divided by MeanMutualInformation: 148.97
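Since the equivalent number of attributes is a plain ratio, it can be inverted as a rough consistency check on the rounded values in this list. Writing H(C) for the class entropy (1.87) and \overline{MI} for MeanMutualInformation:

```latex
\mathrm{ENA} = \frac{H(C)}{\overline{MI}}
\quad\Longrightarrow\quad
\overline{MI} = \frac{H(C)}{\mathrm{ENA}} \approx \frac{1.87}{148.97} \approx 0.0126
```

This agrees with the average mutual information of 0.01 listed on this page once rounding is taken into account; the displayed inputs are themselves rounded, so the check is only approximate.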
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001: 0.87, 0.19, 0.71
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001: 0.87, 0.19, 0.71
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001: 0.87, 0.19, 0.71
Percentage of instances belonging to the most frequent class: 46.82%
Number of instances belonging to the most frequent class: 51682
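A useful reference point for the landmarker error rates in this list is the trivial classifier that always predicts the most frequent class. With the counts given on this page it would reach about 46.82% accuracy, i.e. an error rate of about 0.53, which is essentially the error reported for the plain DecisionStump landmarker. A minimal sketch of that arithmetic; `majority_baseline` is a hypothetical helper name.

```python
from typing import Tuple

# Counts copied from the property list on this page.
N_INSTANCES = 110393
MOST_FREQUENT_CLASS_COUNT = 51682


def majority_baseline(majority_count: int, n_instances: int) -> Tuple[float, float]:
    """Accuracy and error rate of always predicting the most frequent class."""
    accuracy = majority_count / n_instances
    return accuracy, 1.0 - accuracy


accuracy, error = majority_baseline(MOST_FREQUENT_CLASS_COUNT, N_INSTANCES)
print(f"accuracy={accuracy:.4f} error={error:.4f}")  # accuracy=0.4682 error=0.5318
```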
Maximum entropy among attributes: 0.72
Maximum kurtosis among attributes of the numeric type: 14.48
Maximum of means among attributes of the numeric type: 2957.98
Maximum mutual information between the nominal attributes and the target attribute: 0.08
Maximum number of distinct values among attributes of the nominal type: 7
Maximum skewness among attributes of the numeric type: 4.06
Maximum standard deviation of attributes of the numeric type: 1558.39
Average entropy of the attributes: 0.14
Mean kurtosis among attributes of the numeric type: 2.35
Mean of means among attributes of the numeric type: 596.3
Average mutual information between the nominal attributes and the target attribute: 0.01
Estimate of the amount of irrelevant information in the attributes regarding the class, equal to (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation: 9.98
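The irrelevant-information estimate can be rearranged so that it, too, pins down the mean mutual information. Writing \overline{H}(X) for MeanAttributeEntropy (0.14), \overline{MI} for MeanMutualInformation, and NSR as a label for the 9.98 value, with the displayed, rounded inputs:

```latex
\mathrm{NSR} = \frac{\overline{H}(X) - \overline{MI}}{\overline{MI}} = \frac{\overline{H}(X)}{\overline{MI}} - 1
\quad\Longrightarrow\quad
\overline{MI} = \frac{\overline{H}(X)}{\mathrm{NSR} + 1} \approx \frac{0.14}{10.98} \approx 0.0128
```

This is consistent with the listed average mutual information of 0.01, and a ratio near 10 means the attributes carry roughly ten times more entropy unrelated to the class than entropy shared with it.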
Average number of distinct values among the attributes of the nominal type: 2.12
Mean skewness among attributes of the numeric type: 0.77
Mean standard deviation of attributes of the numeric type: 259.78
Minimum entropy among attributes: 0
Minimum kurtosis among attributes of the numeric type: -1.95
Minimum of means among attributes of the numeric type: 0.05
Minimum mutual information between the nominal attributes and the target attribute: 0
Minimum number of distinct values among attributes of the nominal type: 2
Minimum skewness among attributes of the numeric type: -1.2
Minimum standard deviation of attributes of the numeric type: 0.22
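Excess kurtosis can never fall below −2, and that bound is reached by a 0/1 column with a 50/50 split. For a 0/1 column with proportion p of ones it equals (1 − 6p(1 − p)) / (p(1 − p)), so the minimum of −1.95 corresponds to roughly a 44/56 split. That suggests, though this page does not confirm it, that the minimum comes from one of the wilderness_area indicators, which are typed numeric here. A minimal sketch using population (biased) moment estimators; OpenML may use sample-corrected estimators, which give slightly different values.

```python
from typing import Sequence, Tuple


def skewness_and_excess_kurtosis(values: Sequence[float]) -> Tuple[float, float]:
    """Population moment estimates: skewness m3/m2**1.5, excess kurtosis m4/m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("constant column: skewness and kurtosis are undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# A 0/1 indicator with a 50/50 split reaches the lower bound of -2.
print(skewness_and_excess_kurtosis([0, 1, 0, 1]))  # (0.0, -2.0)
```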
Percentage of instances belonging to the least frequent class: 1.21%
Number of instances belonging to the least frequent class: 1339
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes: 0.8, 0.38, 0.43
Number of binary attributes: 40
Percentage of binary attributes: 72.73%
Percentage of instances having missing values: 0%
Percentage of missing values: 0%
Percentage of numeric attributes: 25.45%
Percentage of nominal attributes: 74.55%
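The attribute percentages follow directly from the counts and the feature list: the 14 numeric attributes are the 10 quantitative measurements plus the 4 wilderness_area indicators, and the 41 nominal attributes are the 40 soil_type indicators plus the class. The count of 40 binary attributes appears to cover only the soil_type columns; that reading is an assumption, since the page does not say how OpenML counts binary attributes. A minimal sketch of the arithmetic:

```python
N_ATTRIBUTES = 55

# Decomposition read off the feature list on this page; the "binary" count
# covering only the 40 soil_type columns is an assumption.
counts = {"numeric": 10 + 4, "nominal": 40 + 1, "binary": 40}


def percentage(part: int, total: int) -> float:
    """Share of `total`, in percent, rounded to two decimals as on this page."""
    return round(100 * part / total, 2)


for kind, k in counts.items():
    print(kind, percentage(k, N_ATTRIBUTES))
# numeric 25.45
# nominal 74.55
# binary 72.73
```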
First quartile of entropy among attributes: 0.02
First quartile of kurtosis among attributes of the numeric type: -0.59
First quartile of means among attributes of the numeric type: 0.44
First quartile of mutual information between the nominal attributes and the target attribute: 0
First quartile of skewness among attributes of the numeric type: -0.41
First quartile of standard deviation of attributes of the numeric type: 0.5
Second quartile (median) of entropy among attributes: 0.06
Second quartile (median) of kurtosis among attributes of the numeric type: 1.03
Second quartile (median) of means among attributes of the numeric type: 148.95
Second quartile (median) of mutual information between the nominal attributes and the target attribute: 0.01
Second quartile (median) of skewness among attributes of the numeric type: 0.56
Second quartile (median) of standard deviation of attributes of the numeric type: 32.56
Third quartile of entropy among attributes: 0.22
Third quartile of kurtosis among attributes of the numeric type: 2.71
Third quartile of means among attributes of the numeric type: 697.14
Third quartile of mutual information between the nominal attributes and the target attribute: 0.02
Third quartile of skewness among attributes of the numeric type: 1.4
Third quartile of standard deviation of attributes of the numeric type: 229.24
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1: 0.88, 0.22, 0.65
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2: 0.88, 0.22, 0.65
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3: 0.88, 0.22, 0.65
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: 0.81, 0.25, 0.61
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: 0.81, 0.25, 0.61
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: 0.81, 0.25, 0.61
Standard deviation of the number of distinct values among attributes of the nominal type: 0.78
Area under the ROC curve, error rate, and Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk: 0.86, 0.19, 0.71

25 tasks

159 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
2 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: precision - target_feature: class
7 runs - estimation_procedure: 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
48 runs - estimation_procedure: Interleaved Test then Train - target_feature: class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
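Most runs use 10-fold cross-validation, where each of ten folds serves once as the test set while the other nine are used for training, and 48 runs use interleaved test then train, where each instance is first used to test the current model and then to update it. The sketch below illustrates both ideas under stated assumptions: the folds are contiguous and unstratified (OpenML tasks define their own splits, which this does not reproduce), and a running majority-class predictor is a hypothetical stand-in for a real incremental learner. `k_fold_indices` and `prequential_accuracy` are illustrative names, not OpenML APIs.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Partition range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def prequential_accuracy(labels: Sequence[Hashable]) -> float:
    """Interleaved test-then-train accuracy of a running majority-class learner.

    Each label is first predicted by the model built from all earlier labels,
    then used to update that model. With no earlier labels the prediction is None.
    """
    if not labels:
        raise ValueError("need at least one instance")
    seen: Counter = Counter()
    correct = 0
    for y in labels:
        prediction = seen.most_common(1)[0][0] if seen else None
        correct += prediction == y  # test first ...
        seen[y] += 1                # ... then train
    return correct / len(labels)


print([len(f) for f in k_fold_indices(25, 10)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(prequential_accuracy([1, 1, 2, 1]))        # 0.5
```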