auto93

active ARFF Publicly available Visibility: public Uploaded 03-10-2014 by unknown
Author:
Source: Unknown - Date unknown
Please cite:

Attributes 2, 4, and 6 deleted. Midrange price treated as the class attribute.

As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems. Singapore: Springer-Verlag.

NAME: 1993 New Car Data
TYPE: Sample
SIZE: 93 observations, 26 variables

DESCRIPTIVE ABSTRACT:
Specifications are given for 93 new car models for the 1993 year. Several measures are given to evaluate price, mpg ratings, engine size, body size, and features.

SOURCES:
_Consumer Reports: The 1993 Cars - Annual Auto Issue_ (April 1993), Yonkers, NY: Consumers Union.
_PACE New Car & Truck 1993 Buying Guide_ (1993), Milwaukee, WI: Pace Publications Inc.

VARIABLE DESCRIPTIONS:

Line 1
Columns
 1 - 14  Manufacturer
15 - 29  Model
30 - 36  Type: Small, Sporty, Compact, Midsize, Large - as defined in the _Consumer Reports_ article
38 - 41  Minimum Price (in $1,000) - Price for basic version of this model
43 - 46  Midrange Price (in $1,000) - Average of Min and Max prices
48 - 51  Maximum Price (in $1,000) - Price for a premium version
53 - 54  City MPG (miles per gallon by EPA rating)
56 - 57  Highway MPG
59 - 59  Air Bags standard: 0 = none, 1 = driver only, 2 = driver & passenger
61 - 61  Drive train type: 0 = rear wheel drive, 1 = front wheel drive, 2 = all wheel drive
63 - 63  Number of cylinders
65 - 67  Engine size (liters)
69 - 71  Horsepower (maximum)
73 - 76  RPM (revs per minute at maximum horsepower)

Line 2
Columns
 1 -  4  Engine revolutions per mile (in highest gear)
 6 -  6  Manual transmission available: 0 = No, 1 = Yes
 8 - 11  Fuel tank capacity (gallons)
13 - 13  Passenger capacity (persons)
15 - 17  Length (inches)
19 - 21  Wheelbase (inches)
23 - 24  Width (inches)
26 - 27  U-turn space (feet)
29 - 32  Rear seat room (inches)
34 - 35  Luggage capacity (cu. ft.)
37 - 40  Weight (pounds)
42 - 42  Domestic? 0 = non-U.S. manufacturer, 1 = U.S. manufacturer

Values are aligned and delimited by blanks. Missing values are denoted with *. There are two data lines for each case.

SPECIAL NOTES:
The only missing values are for CYLINDERS in the rotary engine Mazda RX-7, REAR SEAT room for the two-seaters (Corvette and RX-7), and LUGGAGE capacity for the vans and two-seaters. WEIGHT is taken from the _Consumer Reports_ data and includes a full fuel tank, automatic transmission (if available), and air conditioning.

STORY BEHIND THE DATA:
Cars were selected at random from among 1993 passenger car models that were listed in both the _Consumer Reports_ issue and the _PACE Buying Guide_. Pickup trucks and Sport/Utility vehicles were eliminated due to incomplete information in the _Consumer Reports_ source. Duplicate models (e.g., Dodge Shadow and Plymouth Sundance) were listed at most once. A similar dataset for 1989 model cars appeared as one of the sample datasets shipped with the _Student Edition of Execustat_ (PWS-KENT 1990).

Further description can be found in the "Datasets and Stories" article "1993 New Car Data" in the _Journal of Statistics Education_ (Lock 1993). Send the message "send jse/v1n1/datasets.lock" to the address archive@jse.stat.ncsu.edu

PEDAGOGICAL NOTES:
This is a multi-purpose dataset that can be used at many points in an introductory course. It includes many good numeric variables and several options for dividing the cars up into groups. Students tend to be familiar with most of the variables (and specific car models). They can anticipate and pose explanations for many of the relationships to be found in the data, although some surprises may be encountered. One can easily find examples of pairs of variables that demonstrate strong or weak, positive or negative associations. PRICE and MPG variables tend to be popular choices as "dependent" variables. Basic graphs will often reveal unusual data values (like the price for a Mercedes-Benz).

REFERENCES:
Lock, R. H. (1993), "1993 New Car Data," _Journal of Statistics Education_, 1, No. 1.
_Student Edition of Execustat_ (1990), Boston, MA: PWS-KENT Publishing Co.

SUBMITTED BY:
Robin H. Lock
Mathematics Department
St. Lawrence University
Canton, NY 13617
(315) 379-5960
rlock@stlawu.bitnet
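
The fixed-column layout in VARIABLE DESCRIPTIONS (two data lines per case, `*` for missing values) can be read with plain string slicing. The following is a minimal sketch, not the original authors' loader: the field names are illustrative labels chosen here, and the sample record is invented purely to exercise the column positions.

```python
# Column layout (1-based, inclusive) copied from VARIABLE DESCRIPTIONS.
# Field names are illustrative labels, not official identifiers.
LINE1_LAYOUT = [
    ("Manufacturer", 1, 14), ("Model", 15, 29), ("Type", 30, 36),
    ("Min_Price", 38, 41), ("Midrange_Price", 43, 46), ("Max_Price", 48, 51),
    ("City_MPG", 53, 54), ("Highway_MPG", 56, 57), ("Air_Bags", 59, 59),
    ("Drive_Train", 61, 61), ("Cylinders", 63, 63), ("Engine_Size", 65, 67),
    ("Horsepower", 69, 71), ("RPM", 73, 76),
]
LINE2_LAYOUT = [
    ("Revs_Per_Mile", 1, 4), ("Manual_Transmission", 6, 6),
    ("Fuel_Tank", 8, 11), ("Passengers", 13, 13), ("Length", 15, 17),
    ("Wheelbase", 19, 21), ("Width", 23, 24), ("U_Turn_Space", 26, 27),
    ("Rear_Seat_Room", 29, 32), ("Luggage", 34, 35), ("Weight", 37, 40),
    ("Domestic", 42, 42),
]
TEXT_FIELDS = {"Manufacturer", "Model", "Type"}


def parse_field(name, raw):
    """Convert one fixed-width field; '*' marks a missing value."""
    value = raw.strip()
    if name in TEXT_FIELDS:
        return value
    if value == "*":
        return None
    return float(value) if "." in value else int(value)


def parse_case(line1, line2):
    """Combine the two data lines of one case into a single record."""
    record = {}
    for layout, line in ((LINE1_LAYOUT, line1), (LINE2_LAYOUT, line2)):
        for name, start, end in layout:
            record[name] = parse_field(name, line[start - 1:end])
    return record


def parse_cases(lines):
    """Pair up consecutive non-blank lines and parse each pair."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    return [parse_case(rows[i], rows[i + 1]) for i in range(0, len(rows) - 1, 2)]


# Invented two-seater record (hypothetical values), padded to the documented columns.
sample_line1 = (
    f"{'ExampleCo':<14}{'Demo':<15}{'Sporty':<7} "
    f"{'20.5':>4} {'25.0':>4} {'29.5':>4} {'17':>2} {'25':>2} "
    f"{'1':>1} {'0':>1} {'*':>1} {'1.3':>3} {'255':>3} {'6500':>4}"
)
sample_line2 = (
    f"{'2400':>4} {'1':>1} {'15.0':>4} {'2':>1} {'170':>3} {'95':>3} "
    f"{'68':>2} {'38':>2} {'*':>4} {'*':>2} {'2900':>4} {'0':>1}"
)

record = parse_case(sample_line1, sample_line2)
print(record["Manufacturer"], record["Midrange_Price"], record["Cylinders"])
```

Missing values come back as `None`, so downstream code can decide how to treat the missing cylinder, rear seat, and luggage entries described in SPECIAL NOTES.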

23 features

class (target): numeric, 81 unique values, 0 missing
Manual_transmission_available: nominal, 2 unique values, 0 missing
Domestic: nominal, 2 unique values, 0 missing
Weight: numeric, 81 unique values, 0 missing
Luggage_capacity: numeric, 16 unique values, 11 missing
Rear_seat_room: numeric, 24 unique values, 2 missing
U-turn_space: numeric, 14 unique values, 0 missing
Width: numeric, 16 unique values, 0 missing
Wheelbase: numeric, 27 unique values, 0 missing
Length: numeric, 51 unique values, 0 missing
Passenger_capacity: numeric, 6 unique values, 0 missing
Fuel_tank_capacity: numeric, 38 unique values, 0 missing
Manufacturer: nominal, 31 unique values, 0 missing
Engine_revolutions_per_mile: numeric, 78 unique values, 0 missing
RPM: numeric, 24 unique values, 0 missing
Horsepower: numeric, 57 unique values, 0 missing
Engine_size: numeric, 26 unique values, 0 missing
Number_of_cylinders: numeric, 5 unique values, 1 missing
Drive_train_type: nominal, 3 unique values, 0 missing
Air_Bags_standard: nominal, 3 unique values, 0 missing
Highway_MPG: numeric, 22 unique values, 0 missing
City_MPG: numeric, 21 unique values, 0 missing
Type: nominal, 6 unique values, 0 missing

107 properties

Number of instances (rows) of the dataset: 93
Number of attributes (columns) of the dataset: 23
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 14
Number of instances with at least one value missing: 11
Number of numeric attributes: 17
Number of nominal attributes: 6
Average class difference between consecutive instances: -6.5
Number of attributes divided by the number of instances: 0.25
Maximum kurtosis among attributes of the numeric type: 4
Maximum of means among attributes of the numeric type: 5280.65
The maximum number of distinct values among attributes of the nominal type: 31
Maximum skewness among attributes of the numeric type: 1.7
Maximum standard deviation of attributes of the numeric type: 596.73
Mean kurtosis among attributes of the numeric type: 0.67
Mean of means among attributes of the numeric type: 668.65
Average number of distinct values among the attributes of the nominal type: 7.83
Mean skewness among attributes of the numeric type: 0.45
Mean standard deviation of attributes of the numeric type: 105.72
Minimum kurtosis among attributes of the numeric type: -0.86
Minimum of means among attributes of the numeric type: 2.67
The minimal number of distinct values among attributes of the nominal type: 2
Minimum skewness among attributes of the numeric type: -0.26
Minimum standard deviation of attributes of the numeric type: 1.04
Number of binary attributes: 2
Percentage of binary attributes: 8.7
Percentage of instances having missing values: 11.83
Percentage of missing values: 0.65
Percentage of numeric attributes: 73.91
Percentage of nominal attributes: 26.09
First quartile of kurtosis among attributes of the numeric type: -0.33
First quartile of means among attributes of the numeric type: 15.28
First quartile of skewness among attributes of the numeric type: -0.01
First quartile of standard deviation of attributes of the numeric type: 2.99
Second quartile (Median) of kurtosis among attributes of the numeric type: 0.38
Second quartile (Median) of means among attributes of the numeric type: 29.09
Second quartile (Median) of skewness among attributes of the numeric type: 0.23
Second quartile (Median) of standard deviation of attributes of the numeric type: 5.33
Third quartile of kurtosis among attributes of the numeric type: 1.02
Third quartile of means among attributes of the numeric type: 163.52
Third quartile of skewness among attributes of the numeric type: 0.91
Third quartile of standard deviation of attributes of the numeric type: 33.49
Standard deviation of the number of distinct values among attributes of the nominal type: 11.44

No value is listed for the following properties:

Entropy of the target attribute values.
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
Percentage of instances belonging to the most frequent class.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
Maximum entropy among attributes.
Average entropy of the attributes.
Minimal entropy among attributes.
First quartile of entropy among attributes.
Second quartile (Median) of entropy among attributes.
Third quartile of entropy among attributes.
Maximum mutual information between the nominal attributes and the target attribute.
Average mutual information between the nominal attributes and the target attribute.
Minimal mutual information between the nominal attributes and the target attribute.
First quartile of mutual information between the nominal attributes and the target attribute.
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
Third quartile of mutual information between the nominal attributes and the target attribute.
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

Area Under the ROC Curve, error rate, and Kappa coefficient achieved by each of the following landmarkers:

weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
weka.classifiers.trees.DecisionStump
weka.classifiers.trees.J48 -C .00001
weka.classifiers.trees.J48 -C .0001
weka.classifiers.trees.J48 -C .001
weka.classifiers.bayes.NaiveBayes
weka.classifiers.trees.REPTree -L 1
weka.classifiers.trees.REPTree -L 2
weka.classifiers.trees.REPTree -L 3
weka.classifiers.trees.RandomTree -depth 1
weka.classifiers.trees.RandomTree -depth 2
weka.classifiers.trees.RandomTree -depth 3
weka.classifiers.lazy.IBk
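
Several of the simple properties above follow directly from the feature list, and the arithmetic can be checked in a few lines. This is a sketch under two assumptions: the count of 11 instances with a missing value is taken from the page rather than derived, and the 11.44 figure is assumed to be a sample standard deviation (n - 1 denominator), which is the variant that reproduces it.

```python
from statistics import mean, stdev

N_INSTANCES = 93
INSTANCES_WITH_MISSING = 11  # taken from the page, not derivable from the counts below

# (type, unique values, missing values) per feature, copied from the feature list.
FEATURES = {
    "class": ("numeric", 81, 0),
    "Manual_transmission_available": ("nominal", 2, 0),
    "Domestic": ("nominal", 2, 0),
    "Weight": ("numeric", 81, 0),
    "Luggage_capacity": ("numeric", 16, 11),
    "Rear_seat_room": ("numeric", 24, 2),
    "U-turn_space": ("numeric", 14, 0),
    "Width": ("numeric", 16, 0),
    "Wheelbase": ("numeric", 27, 0),
    "Length": ("numeric", 51, 0),
    "Passenger_capacity": ("numeric", 6, 0),
    "Fuel_tank_capacity": ("numeric", 38, 0),
    "Manufacturer": ("nominal", 31, 0),
    "Engine_revolutions_per_mile": ("numeric", 78, 0),
    "RPM": ("numeric", 24, 0),
    "Horsepower": ("numeric", 57, 0),
    "Engine_size": ("numeric", 26, 0),
    "Number_of_cylinders": ("numeric", 5, 1),
    "Drive_train_type": ("nominal", 3, 0),
    "Air_Bags_standard": ("nominal", 3, 0),
    "Highway_MPG": ("numeric", 22, 0),
    "City_MPG": ("numeric", 21, 0),
    "Type": ("nominal", 6, 0),
}

n_features = len(FEATURES)
n_numeric = sum(1 for kind, _, _ in FEATURES.values() if kind == "numeric")
nominal_cardinalities = [u for kind, u, _ in FEATURES.values() if kind == "nominal"]
n_binary = sum(1 for u in nominal_cardinalities if u == 2)
n_missing = sum(m for _, _, m in FEATURES.values())

summary = {
    "dimensionality": round(n_features / N_INSTANCES, 2),
    "pct_numeric": round(100 * n_numeric / n_features, 2),
    "pct_nominal": round(100 * len(nominal_cardinalities) / n_features, 2),
    "pct_binary": round(100 * n_binary / n_features, 2),
    "pct_missing_values": round(100 * n_missing / (N_INSTANCES * n_features), 2),
    "pct_instances_with_missing": round(100 * INSTANCES_WITH_MISSING / N_INSTANCES, 2),
    "mean_nominal_distinct": round(mean(nominal_cardinalities), 2),
    "std_nominal_distinct": round(stdev(nominal_cardinalities), 2),  # sample std
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

Each printed value can be compared against the corresponding property listed above.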

18 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: class
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 5 times 2-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: Custom 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: Test on Training Data - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
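
As an illustration of the first task above (10-fold cross-validation scored by mean_absolute_error on the numeric target), the sketch below evaluates a trivial predict-the-training-mean baseline in plain Python. The helper names, the shuffled fold assignment, the per-fold averaging, and the synthetic target values are all assumptions made for this example; it does not reproduce the predefined splits of the listed tasks or use the real data.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mae(y, k=10, seed=0):
    """Mean of per-fold MAE for a baseline that predicts the training mean."""
    fold_scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = sum(train) / len(train)
        y_test = [y[i] for i in test_idx]
        fold_scores.append(mean_absolute_error(y_test, [prediction] * len(y_test)))
    return sum(fold_scores) / len(fold_scores)


# Synthetic stand-in for the 93 midrange prices (hypothetical values).
targets = [10 + (i % 7) * 2.5 for i in range(93)]
print(round(cross_validated_mae(targets, k=10), 3))
```

Any regression model can be scored the same way by replacing the training-mean prediction inside the loop.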