OpenML · Data
micro-mass_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

active · ARFF · Publicly available · Visibility: public · Uploaded 17-11-2022 by David Wilson
0 likes · downloaded by 0 people, 0 total downloads · 0 issues · 0 downvotes


Subsampling of the dataset micro-mass (1515) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the source project's `Dataset` class (its definition is not shown here).
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by class frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            # Align p with `classes`: value_counts() is ordered by frequency
            p=(vcs.loc[classes] / vcs.sum()).to_numpy(),
        )

        # Select the rows where one of the selected classes is present.
        # The uploaded code filtered on `y.isin(classes)`, which keeps every row;
        # this is consistent with this dataset still having 20 target values.
        idxs = y.index[y.isin(selected_classes)]
        x = x.loc[idxs]
        y = y.loc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Keep exactly nrows_max rows, stratified on the target only if requested
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Carry the categorical mask over to the retained columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
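The class-selection step is where this upload diverges from its arguments: nclasses=10, yet the target has 20 unique values. The minimal sketch below uses a hypothetical toy target (not micro-mass data) to contrast the filter in the uploaded code with a filter on the sampled classes. `select_classes` is an illustrative helper, not part of the source project; it draws labels from `value_counts()` directly so that each label and its probability stay aligned.

```python
import numpy as np
import pandas as pd


def select_classes(y: pd.Series, nclasses_max: int, seed: int) -> np.ndarray:
    """Draw nclasses_max distinct labels, weighted by class frequency (illustrative)."""
    rng = np.random.default_rng(seed)
    vcs = y.value_counts()
    # Labels and probabilities both come from vcs, so they stay aligned
    labels = vcs.index.to_numpy()
    p = (vcs / vcs.sum()).to_numpy()
    return rng.choice(labels, size=nclasses_max, replace=False, p=p)


# Hypothetical toy target: 4 classes of unequal size, 11 rows
y = pd.Series(["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"], name="Class")
classes = y.unique()
selected = select_classes(y, nclasses_max=2, seed=3)

buggy_rows = y[y.isin(classes)]     # filter as written in the uploaded code
fixed_rows = y[y.isin(selected)]    # filter on the sampled classes only

print(len(buggy_rows), buggy_rows.nunique())   # all 11 rows, all 4 classes survive
print(sorted(selected), len(fixed_rows), fixed_rows.nunique())
```

Filtering on `classes` can never drop a row, because `classes = y.unique()` contains every label by construction.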

101 features
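The 101 features are the nominal target plus the 100 numeric columns kept by the column-sampling step, and they appear in their original order (V2, V6, V38, ...) because the sampled positions are sorted before selection. A minimal sketch of that step on a hypothetical 20-column frame; `sample_columns` is an illustrative name, not part of the source project.

```python
import numpy as np
import pandas as pd


def sample_columns(x: pd.DataFrame, ncols_max: int, seed: int) -> pd.DataFrame:
    """Keep ncols_max distinct columns, chosen uniformly, in their original order."""
    if len(x.columns) <= ncols_max:
        return x
    rng = np.random.default_rng(seed)
    # An int argument samples from np.arange(len(x.columns))
    idxs = rng.choice(len(x.columns), size=ncols_max, replace=False)
    # Sorting the positions preserves the original left-to-right column order
    return x.iloc[:, np.sort(idxs)]


# Hypothetical frame with 20 columns named V1..V20
x = pd.DataFrame(np.zeros((3, 20)), columns=[f"V{i}" for i in range(1, 21)])
sub = sample_columns(x, ncols_max=5, seed=3)
print(list(sub.columns))
```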

Class (target): nominal, 20 unique values, 0 missing
V2: numeric, 1 unique value, 0 missing
V6: numeric, 1 unique value, 0 missing
V38: numeric, 1 unique value, 0 missing
V53: numeric, 34 unique values, 0 missing
V58: numeric, 60 unique values, 0 missing
V94: numeric, 1 unique value, 0 missing
V113: numeric, 9 unique values, 0 missing
V131: numeric, 88 unique values, 0 missing
V167: numeric, 101 unique values, 0 missing
V171: numeric, 187 unique values, 0 missing
V188: numeric, 7 unique values, 0 missing
V191: numeric, 28 unique values, 0 missing
V209: numeric, 7 unique values, 0 missing
V235: numeric, 3 unique values, 0 missing
V255: numeric, 1 unique value, 0 missing
V259: numeric, 1 unique value, 0 missing
V273: numeric, 60 unique values, 0 missing
V275: numeric, 47 unique values, 0 missing
V285: numeric, 34 unique values, 0 missing
V301: numeric, 1 unique value, 0 missing
V310: numeric, 56 unique values, 0 missing
V315: numeric, 33 unique values, 0 missing
V344: numeric, 64 unique values, 0 missing
V355: numeric, 1 unique value, 0 missing
V356: numeric, 1 unique value, 0 missing
V365: numeric, 1 unique value, 0 missing
V367: numeric, 7 unique values, 0 missing
V373: numeric, 64 unique values, 0 missing
V375: numeric, 1 unique value, 0 missing
V385: numeric, 94 unique values, 0 missing
V388: numeric, 17 unique values, 0 missing
V414: numeric, 98 unique values, 0 missing
V464: numeric, 52 unique values, 0 missing
V488: numeric, 56 unique values, 0 missing
V489: numeric, 1 unique value, 0 missing
V494: numeric, 4 unique values, 0 missing
V503: numeric, 93 unique values, 0 missing
V505: numeric, 23 unique values, 0 missing
V515: numeric, 21 unique values, 0 missing
V517: numeric, 20 unique values, 0 missing
V518: numeric, 38 unique values, 0 missing
V543: numeric, 1 unique value, 0 missing
V580: numeric, 24 unique values, 0 missing
V586: numeric, 49 unique values, 0 missing
V610: numeric, 56 unique values, 0 missing
V614: numeric, 59 unique values, 0 missing
V643: numeric, 58 unique values, 0 missing
V647: numeric, 3 unique values, 0 missing
V676: numeric, 1 unique value, 0 missing
V707: numeric, 23 unique values, 0 missing
V719: numeric, 294 unique values, 0 missing
V725: numeric, 1 unique value, 0 missing
V756: numeric, 4 unique values, 0 missing
V759: numeric, 53 unique values, 0 missing
V787: numeric, 136 unique values, 0 missing
V789: numeric, 94 unique values, 0 missing
V794: numeric, 21 unique values, 0 missing
V802: numeric, 53 unique values, 0 missing
V808: numeric, 41 unique values, 0 missing
V814: numeric, 64 unique values, 0 missing
V821: numeric, 13 unique values, 0 missing
V829: numeric, 125 unique values, 0 missing
V837: numeric, 96 unique values, 0 missing
V846: numeric, 41 unique values, 0 missing
V862: numeric, 49 unique values, 0 missing
V874: numeric, 11 unique values, 0 missing
V890: numeric, 97 unique values, 0 missing
V896: numeric, 54 unique values, 0 missing
V906: numeric, 66 unique values, 0 missing
V914: numeric, 14 unique values, 0 missing
V929: numeric, 94 unique values, 0 missing
V951: numeric, 38 unique values, 0 missing
V953: numeric, 80 unique values, 0 missing
V963: numeric, 7 unique values, 0 missing
V971: numeric, 71 unique values, 0 missing
V1026: numeric, 176 unique values, 0 missing
V1035: numeric, 44 unique values, 0 missing
V1037: numeric, 64 unique values, 0 missing
V1044: numeric, 1 unique value, 0 missing
V1057: numeric, 50 unique values, 0 missing
V1061: numeric, 84 unique values, 0 missing
V1083: numeric, 1 unique value, 0 missing
V1094: numeric, 39 unique values, 0 missing
V1116: numeric, 1 unique value, 0 missing
V1121: numeric, 8 unique values, 0 missing
V1143: numeric, 49 unique values, 0 missing
V1153: numeric, 37 unique values, 0 missing
V1156: numeric, 35 unique values, 0 missing
V1159: numeric, 32 unique values, 0 missing
V1188: numeric, 23 unique values, 0 missing
V1207: numeric, 1 unique value, 0 missing
V1213: numeric, 13 unique values, 0 missing
V1239: numeric, 207 unique values, 0 missing
V1245: numeric, 1 unique value, 0 missing
V1248: numeric, 103 unique values, 0 missing
V1255: numeric, 42 unique values, 0 missing
V1265: numeric, 54 unique values, 0 missing
V1273: numeric, 71 unique values, 0 missing
V1277: numeric, 4 unique values, 0 missing
V1298: numeric, 34 unique values, 0 missing
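Several numeric features (for example V2, V6 and V38) have a single unique value, so they are constant across all rows and carry no information for a classifier. A minimal sketch of producing a per-column profile like the list above and flagging such columns with pandas; the helper names are hypothetical, the toy frame is invented, and pandas' counting may differ in detail from OpenML's own computation.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count of distinct non-missing values and of missing values."""
    return pd.DataFrame(
        {
            "unique_values": df.nunique(dropna=True),
            "missing": df.isna().sum(),
        }
    )


def constant_columns(df: pd.DataFrame) -> list[str]:
    """Columns with at most one distinct non-missing value."""
    return [c for c in df.columns if df[c].nunique(dropna=True) <= 1]


# Hypothetical toy frame: V2 is constant, V53 varies
df = pd.DataFrame({"V2": [0.0, 0.0, 0.0], "V53": [0.1, 0.0, 0.3]})
print(profile_columns(df))
print(constant_columns(df))
```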

19 properties

Number of instances (rows) of the dataset: 571
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 20
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.7
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: 11
Percentage of instances belonging to the least frequent class: 1.93
Number of instances belonging to the most frequent class: 60
Percentage of instances belonging to the most frequent class: 10.51
Number of attributes divided by the number of instances: 0.18
Number of attributes divided by the number of instances.

0 tasks
