OpenML


ipums_la_97-small

Status: active. Format: ARFF. Visibility: public. Uploaded 27-09-2014 by Felicia West.


Author: IPUMS (ipums@hist.umn.edu)
Donor: Stephen Bay (sbay@ics.uci.edu)
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/IPUMS+Census+Database) - 1999
Please cite: IPUMS Database

This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be consistent across years.

The original source for this data set is the IPUMS project (Ruggles and Sobek, 1997). The IPUMS project is a large collection of federal census data with standardized coding schemes that make comparisons across time easy.

The data is an unweighted 1 in 100 sample of responses from the Los Angeles -- Long Beach area for the years 1970, 1980, and 1990. The household and individual records were flattened into a single table, and we used all variables that were available for all three years. When there was more than one version of a variable, such as for race, we used the most general. For occupation and industry we used the 1950 basis.

Note that PUMS data is based on cluster samples, i.e. samples are made of households or dwellings from which there may be multiple individuals. Individuals from the same household are therefore not independent. Ruggles (1995) considers this issue further and discusses its effect (along with the effects of stratification) on standard errors.

The variable schltype appears to have different coding values across the years 1970, 1980, and 1990.

There are two versions of this data set. The large data set contains a 1 in 100 sample of the Los Angeles and Long Beach area. The small data set contains a 1 in 1000 sample of the same area and was formed by sampling from the large data set.

Past Usage

S. D. Bay and M. J. Pazzani. (1999) "Detecting Group Differences: Mining Contrast Sets". submitted.

Copyright Information

All persons are granted a limited license to use and distribute this documentation and the accompanying data, subject to the following conditions:

* No fee may be charged for use or distribution.
* Publications and research reports based on the database must cite it appropriately. The citation should include the following: Steven Ruggles and Matthew Sobek et al. Integrated Public Use Microdata Series: Version 2.0. Minneapolis: Historical Census Projects, University of Minnesota, 1997.

If possible, citations should also include the URL for the IPUMS site: http://www.ipums.umn.edu/.

In addition, we request that users send us a copy of any publications, research reports, or educational material making use of the data or documentation. Send all electronic material to ipums@hist.umn.edu.

61 features

movedin (target)	nominal	8 unique values 0 missing
yrlastwk	nominal	7 unique values 4618 missing
marst	nominal	6 unique values 0 missing
chborn	nominal	13 unique values 4283 missing
bplg	nominal	103 unique values 0 missing
school	nominal	2 unique values 344 missing
educrec	nominal	9 unique values 344 missing
schltype	nominal	4 unique values 344 missing
empstatg	nominal	3 unique values 1772 missing
labforce	nominal	2 unique values 1772 missing
occ1950	nominal	191 unique values 3040 missing
occscore	nominal	45 unique values 0 missing
sei	nominal	80 unique values 0 missing
ind1950	nominal	133 unique values 3040 missing
classwkg	nominal	2 unique values 3022 missing
wkswork2	nominal	6 unique values 3625 missing
hrswork2	nominal	8 unique values 4309 missing
raceg	nominal	7 unique values 0 missing
workedyr	nominal	2 unique values 1772 missing
inctot	nominal	288 unique values 0 missing
incwage	nominal	216 unique values 0 missing
incbus	nominal	107 unique values 0 missing
incfarm	nominal	18 unique values 0 missing
incss	nominal	40 unique values 0 missing
incwelfr	nominal	38 unique values 0 missing
incother	nominal	101 unique values 0 missing
poverty	nominal	488 unique values 0 missing
migrat5g	nominal	7 unique values 576 missing
migplac5	nominal	98 unique values 6276 missing
vetstat	nominal	2 unique values 4542 missing
tranwork	nominal	9 unique values 4275 missing
poploc	nominal	8 unique values 0 missing
gq	nominal	3 unique values 0 missing
gqtypeg	nominal	8 unique values 0 missing
farm	nominal	2 unique values 0 missing
ownershg	nominal	2 unique values 135 missing
value	nominal	12 unique values 0 missing
rent	nominal	154 unique values 0 missing
ftotinc	nominal	409 unique values 0 missing
nfams	nominal	5 unique values 0 missing
ncouples	nominal	4 unique values 0 missing
nmothers	nominal	5 unique values 0 missing
nfathers	nominal	3 unique values 0 missing
momloc	nominal	12 unique values 0 missing
stepmom	nominal	4 unique values 0 missing
momrule	nominal	6 unique values 0 missing
year	nominal	1 unique values 0 missing
steppop	nominal	3 unique values 0 missing
poprule	nominal	5 unique values 0 missing
sploc	nominal	8 unique values 0 missing
sprule	nominal	5 unique values 0 missing
famsize	nominal	15 unique values 0 missing
nchild	nominal	10 unique values 0 missing
nchlt5	nominal	6 unique values 0 missing
famunit	nominal	5 unique values 0 missing
eldch	nominal	66 unique values 0 missing
yngch	nominal	65 unique values 0 missing
nsibs	nominal	10 unique values 0 missing
relateg	nominal	13 unique values 0 missing
age	nominal	97 unique values 0 missing
sex	nominal	2 unique values 0 missing


107 properties

NumberOfInstances

7019

Number of instances (rows) of the dataset.

NumberOfFeatures

61

Number of attributes (columns) of the dataset.

NumberOfClasses

8

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

48089

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

7019

Number of instances with at least one value missing.

NumberOfNumericFeatures

0

Number of numeric attributes.

NumberOfSymbolicFeatures

61

Number of nominal attributes.

AutoCorrelation

0.17

Average class difference between consecutive instances.

CfsSubsetEval_DecisionStumpAUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_DecisionStumpKappa

0.29

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesAUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_NaiveBayesKappa

0.29

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NAUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NKappa

0.29

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

ClassEntropy

2.76

Entropy of the target attribute values.
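The reported value is easier to read against its upper bound: a target with 8 distinct values (as listed for movedin) can have at most log2(8) = 3 bits of entropy. The following is a minimal sketch, assuming the entropy is measured in bits (base-2 logarithm); the helper name `shannon_entropy` and the toy label lists are illustrative and not part of OpenML.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of nominal labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy data (not the real dataset): 8 equally frequent classes hit the 3-bit maximum.
uniform_8 = list(range(8)) * 10
print(shannon_entropy(uniform_8))  # 3.0

# A skewed two-class distribution has far less than 1 bit.
skewed = ["a"] * 9 + ["b"]
print(round(shannon_entropy(skewed), 3))

# A constant column carries no information; the result compares equal to zero.
constant = ["x"] * 5
print(shannon_entropy(constant) == 0.0)  # True
```

A constant attribute such as year (1 unique value) has zero entropy under this definition. With this particular formula the floating-point result is negative zero, which may be why MinAttributeEntropy is displayed as -0, although that is a guess about the page's rendering.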

DecisionStumpAUC

0.76

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

DecisionStumpKappa

0.28

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
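Kappa corrects raw accuracy for the agreement expected by chance, which matters here because the classes are imbalanced. The sketch below assumes the standard Cohen's kappa definition, (p_o - p_e) / (1 - p_e); the function name `cohen_kappa` and the toy labels are illustrative, and the code is not the Weka implementation.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    # Observed agreement: fraction of matching predictions.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate case: only one class in both vectors
    return (p_o - p_e) / (1 - p_e)

# Toy labels, not taken from the dataset.
y_true = ["a", "a", "a", "b", "b", "c"]
always_a = ["a"] * 6

print(cohen_kappa(y_true, always_a))  # 0.0
print(cohen_kappa(y_true, y_true))    # 1.0
```

A classifier that always predicts the most frequent class scores kappa 0 even though its accuracy is nonzero, so a kappa of 0.28 indicates modest agreement beyond chance.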

Dimensionality

0.01

Number of attributes divided by the number of instances.

EquivalentNumberOfAtts

28.23

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
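As a quick consistency check on the formula above, the displayed (rounded) values can be plugged in directly. This is plain arithmetic on numbers shown on this page; the variable names are illustrative.

```python
# Values as displayed on this page (rounded to two decimals).
class_entropy = 2.76
mean_mutual_information = 0.1
equivalent_number_of_atts = 28.23

# Ratio computed from the rounded inputs.
approx = class_entropy / mean_mutual_information
print(round(approx, 2))  # 27.6

# Mean mutual information implied by the reported ratio.
implied_mi = class_entropy / equivalent_number_of_atts
print(round(implied_mi, 4))  # 0.0978
```

The small gap between 27.6 and 28.23 is consistent with MeanMutualInformation being slightly below 0.1 before rounding.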

J48.00001.AUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.ErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.00001.Kappa

0.29

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

J48.0001.AUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.ErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.0001.Kappa

0.29

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

J48.001.AUC

0.79

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.ErrRate

0.55

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

J48.001.Kappa

0.29

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MajorityClassPercentage

27.61

Percentage of instances belonging to the most frequent class.

MajorityClassSize

1938

Number of instances belonging to the most frequent class.

MaxAttributeEntropy

7.92

Maximum entropy among attributes.

MaxKurtosisOfNumericAtts

Maximum kurtosis among attributes of the numeric type.

MaxMeansOfNumericAtts

Maximum of means among attributes of the numeric type.

MaxMutualInformation

0.81

Maximum mutual information between the nominal attributes and the target attribute.
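For reference, mutual information between a nominal attribute and the target can be computed from joint and marginal counts. This is a minimal sketch assuming base-2 logarithms and plain frequency estimates; the function name `mutual_information` and the toy columns are illustrative, and missing values are not handled.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long nominal sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the attribute
    py = Counter(ys)            # marginal counts of the target
    pxy = Counter(zip(xs, ys))  # joint counts
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

# An attribute identical to a balanced binary target shares 1 bit with it.
identical = mutual_information([0, 1] * 4, [0, 1] * 4)
print(identical)  # 1.0

# An attribute independent of the target shares no information.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(independent)  # 0.0
```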

MaxNominalAttDistinctValues

488

The maximum number of distinct values among attributes of the nominal type.

MaxSkewnessOfNumericAtts

Maximum skewness among attributes of the numeric type.

MaxStdDevOfNumericAtts

Maximum standard deviation of attributes of the numeric type.

MeanAttributeEntropy

1.92

Average entropy of the attributes.

MeanKurtosisOfNumericAtts

Mean kurtosis among attributes of the numeric type.

MeanMeansOfNumericAtts

Mean of means among attributes of the numeric type.

MeanMutualInformation

0.1

Average mutual information between the nominal attributes and the target attribute.

MeanNoiseToSignalRatio

18.65

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
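The formula above can be checked against the displayed values. The sketch uses only numbers shown on this page, which are rounded, so the result is approximate; the variable names are illustrative.

```python
# Values as displayed on this page (rounded to two decimals).
mean_attribute_entropy = 1.92
mean_mutual_information = 0.1
reported_ratio = 18.65

# Ratio computed from the rounded inputs.
ratio = (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information
print(round(ratio, 2))  # 18.2

# Solving the formula for the mutual information implied by the reported ratio:
# ratio = (H - MI) / MI  =>  MI = H / (ratio + 1)
implied_mi = mean_attribute_entropy / (reported_ratio + 1)
print(round(implied_mi, 4))  # 0.0977
```

The difference between 18.2 and 18.65 is consistent with the mean mutual information being slightly below 0.1 before rounding.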

MeanNominalAttDistinctValues

49.03

Average number of distinct values among the attributes of the nominal type.

MeanSkewnessOfNumericAtts

Mean skewness among attributes of the numeric type.

MeanStdDevOfNumericAtts

Mean standard deviation of attributes of the numeric type.

MinAttributeEntropy

-0

Minimal entropy among attributes.

MinKurtosisOfNumericAtts

Minimum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

Minimum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

Minimum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

Minimum standard deviation of attributes of the numeric type.

MinorityClassPercentage

3.68

Percentage of instances belonging to the least frequent class.

MinorityClassSize

258

Number of instances belonging to the least frequent class.

NaiveBayesAUC

0.74

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesErrRate

0.68

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NaiveBayesKappa

0.21

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

NumberOfBinaryFeatures

8

Number of binary attributes.

PercentageOfBinaryFeatures

13.11

Percentage of binary attributes.

PercentageOfInstancesWithMissingValues

100

Percentage of instances having missing values.

PercentageOfMissingValues

11.23

Percentage of missing values.
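Several of the percentages on this page can be reproduced from the raw counts. The sketch below uses the instance count, the 61 features, and the per-feature missing counts from the feature list; the binary count of 8 is an assumption obtained by counting the features listed with 2 unique values.

```python
n_instances = 7019
n_features = 61

# Per-feature missing counts copied from the feature list (features with 0 missing omitted).
missing_per_feature = [
    4618, 4283, 344, 344, 344, 1772, 1772, 3040, 3040,
    3022, 3625, 4309, 1772, 576, 6276, 4542, 4275, 135,
]
n_missing = sum(missing_per_feature)
print(n_missing)  # 48089

# Share of missing cells among all cells (rows x columns).
pct_missing = 100 * n_missing / (n_instances * n_features)
print(round(pct_missing, 2))  # 11.23

# Assumption: 8 features (school, labforce, classwkg, workedyr,
# vetstat, farm, ownershg, sex) are listed with exactly 2 unique values.
n_binary = 8
pct_binary = 100 * n_binary / n_features
print(round(pct_binary, 2))  # 13.11

# Dimensionality: attributes per instance.
dimensionality = n_features / n_instances
print(round(dimensionality, 2))  # 0.01
```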

PercentageOfNumericFeatures

Percentage of numeric attributes.

PercentageOfSymbolicFeatures

100

Percentage of nominal attributes.

Quartile1AttributeEntropy

0.92

First quartile of entropy among attributes.

Quartile1KurtosisOfNumericAtts

First quartile of kurtosis among attributes of the numeric type.

Quartile1MeansOfNumericAtts

First quartile of means among attributes of the numeric type.

Quartile1MutualInformation

0.02

First quartile of mutual information between the nominal attributes and the target attribute.

Quartile1SkewnessOfNumericAtts

First quartile of skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

First quartile of standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

1.3

Second quartile (Median) of entropy among attributes.

Quartile2KurtosisOfNumericAtts

Second quartile (Median) of kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

Second quartile (Median) of means among attributes of the numeric type.

Quartile2MutualInformation

0.05

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

Second quartile (Median) of skewness among attributes of the numeric type.

Quartile2StdDevOfNumericAtts

Second quartile (Median) of standard deviation of attributes of the numeric type.

Quartile3AttributeEntropy

2.43

Third quartile of entropy among attributes.

Quartile3KurtosisOfNumericAtts

Third quartile of kurtosis among attributes of the numeric type.

Quartile3MeansOfNumericAtts

Third quartile of means among attributes of the numeric type.

Quartile3MutualInformation

0.13

Third quartile of mutual information between the nominal attributes and the target attribute.

Quartile3SkewnessOfNumericAtts

Third quartile of skewness among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

Third quartile of standard deviation of attributes of the numeric type.

REPTreeDepth1AUC

0.5

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1ErrRate

0.72

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

REPTreeDepth2AUC

0.5

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2ErrRate

0.72

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth3AUC

0.5

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3ErrRate

0.72

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

REPTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
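The REPTree landmarkers report an AUC of 0.5 and an error rate of 0.72, with no Kappa value shown. These figures are consistent with a model that always predicts the most frequent class, although the page does not state that this is what happened. The arithmetic below uses only the class sizes listed on this page; the variable names are illustrative.

```python
n_instances = 7019
majority_class_size = 1938
minority_class_size = 258

# Class shares as percentages.
majority_pct = 100 * majority_class_size / n_instances
minority_pct = 100 * minority_class_size / n_instances
print(round(majority_pct, 2))  # 27.61
print(round(minority_pct, 2))  # 3.68

# Error rate of a classifier that always predicts the majority class.
baseline_error = 1 - majority_class_size / n_instances
print(round(baseline_error, 2))  # 0.72
```

Any landmarker with an error rate near 0.72 is therefore performing no better than this trivial baseline on accuracy.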

RandomTreeDepth1AUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1ErrRate

0.72

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth1Kappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

RandomTreeDepth2AUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2ErrRate

0.72

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth2Kappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

RandomTreeDepth3AUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3ErrRate

0.72

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

RandomTreeDepth3Kappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

StdvNominalAttDistinctValues

94.87

Standard deviation of the number of distinct values among attributes of the nominal type.

kNN1NAUC

0.64

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NErrRate

0.66

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

kNN1NKappa

0.19

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk


14 tasks

Supervised Classification on ipums_la_97-small

269 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: movedin
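To illustrate the estimation procedure named in this task, here is a minimal sketch of 10-fold cross-validation index generation for a dataset of this size. It is an unstratified, self-written split with a hypothetical helper name (`k_fold_indices`) and an arbitrary seed; it does not reproduce the splits that the task itself defines.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test

splits = list(k_fold_indices(7019, k=10))
test_sizes = sorted(len(test) for _, test in splits)
print(test_sizes)  # nine folds of 702 and one of 701

# Every instance is used for testing exactly once.
all_test = sorted(j for _, test in splits for j in test)
print(all_test == list(range(7019)))  # True
```

Predictive accuracy would then be averaged over the ten test folds. With a target this imbalanced, a stratified split that preserves class proportions in each fold is usually preferable.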

Supervised Classification on ipums_la_97-small

165 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: movedin

Supervised Data Stream Classification on ipums_la_97-small

0 runs - estimation_procedure: Interleaved Test then Train - target_feature: movedin
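Interleaved test-then-train (also called prequential evaluation) scores each instance before the model learns from it, so every instance serves first as test data and then as training data. The sketch below is a toy illustration with a majority-class learner that ignores the features; the function name `prequential_accuracy` and the label stream are hypothetical.

```python
from collections import Counter

def prequential_accuracy(stream):
    """Test-then-train accuracy of a majority-class learner on a label stream."""
    counts = Counter()
    correct = 0
    n = 0
    for label in stream:
        # Test first: predict the most frequent label seen so far (None if nothing seen).
        prediction = counts.most_common(1)[0][0] if counts else None
        correct += prediction == label
        n += 1
        # Then train: update the model with the true label.
        counts[label] += 1
    return correct / n

toy_stream = ["a", "a", "b", "a", "a"]
accuracy = prequential_accuracy(toy_stream)
print(accuracy)  # 0.6
```

The first prediction is always counted as wrong here because the learner has seen no data yet; real stream learners handle that cold start in their own ways.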

Clustering on ipums_la_97-small

0 runs

Clustering on ipums_la_97-small