micro-mass_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True


active ARFF Publicly available Visibility: public Uploaded 17-11-2022 by David Wilson


Subsampling of the dataset micro-mass (1515) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
# Imports needed by this method (not shown in the original listing)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: this filters on `classes` (all of them), not
        # `selected_classes`, so no rows are dropped here.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
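In the listing above, the row filter is `y.isin(classes)` rather than `y.isin(selected_classes)`, so the sampled classes do not appear to restrict the data; that would be consistent with the 20 target values this subsample still reports despite nclasses=10. The following is only a minimal sketch of what filtering on the sampled subset might look like. It assumes plain NumPy label arrays instead of the original `Dataset` object, and `select_classes` is a hypothetical helper that is not part of the original source.

```python
import numpy as np


def select_classes(labels, nclasses_max, seed):
    """Hypothetical helper: sample at most `nclasses_max` classes,
    weighted by class frequency, and return them with a row mask."""
    rng = np.random.default_rng(seed)

    # np.unique returns classes and counts in the same (sorted) order,
    # so the probabilities line up with the classes they describe.
    classes, counts = np.unique(labels, return_counts=True)
    if len(classes) <= nclasses_max:
        return classes, np.ones(len(labels), dtype=bool)

    selected = rng.choice(
        classes,
        size=nclasses_max,
        replace=False,
        p=counts / counts.sum(),
    )

    # Keep only rows whose label is one of the sampled classes.
    mask = np.isin(labels, selected)
    return selected, mask


# Toy data: 20 classes with 5 rows each.
labels = np.repeat(np.arange(20), 5)
selected, mask = select_classes(labels, nclasses_max=10, seed=0)
print(len(selected), int(mask.sum()))
```

With balanced toy labels like these, the mask keeps exactly the rows of the ten sampled classes; on real data the number of retained rows depends on which classes are drawn.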

101 features

Class (target): nominal, 20 unique values, 0 missing
V7: numeric, 93 unique values, 0 missing
V11: numeric, 55 unique values, 0 missing
V27: numeric, 1 unique values, 0 missing
V35: numeric, 84 unique values, 0 missing
V41: numeric, 35 unique values, 0 missing
V62: numeric, 63 unique values, 0 missing
V68: numeric, 53 unique values, 0 missing
V76: numeric, 75 unique values, 0 missing
V92: numeric, 63 unique values, 0 missing
V98: numeric, 19 unique values, 0 missing
V102: numeric, 1 unique values, 0 missing
V108: numeric, 8 unique values, 0 missing
V152: numeric, 98 unique values, 0 missing
V168: numeric, 2 unique values, 0 missing
V194: numeric, 27 unique values, 0 missing
V213: numeric, 1 unique values, 0 missing
V259: numeric, 1 unique values, 0 missing
V289: numeric, 68 unique values, 0 missing
V299: numeric, 35 unique values, 0 missing
V307: numeric, 157 unique values, 0 missing
V316: numeric, 175 unique values, 0 missing
V321: numeric, 5 unique values, 0 missing
V335: numeric, 7 unique values, 0 missing
V364: numeric, 1 unique values, 0 missing
V388: numeric, 17 unique values, 0 missing
V405: numeric, 8 unique values, 0 missing
V406: numeric, 141 unique values, 0 missing
V416: numeric, 90 unique values, 0 missing
V428: numeric, 1 unique values, 0 missing
V433: numeric, 1 unique values, 0 missing
V450: numeric, 19 unique values, 0 missing
V468: numeric, 118 unique values, 0 missing
V469: numeric, 1 unique values, 0 missing
V472: numeric, 106 unique values, 0 missing
V474: numeric, 73 unique values, 0 missing
V481: numeric, 76 unique values, 0 missing
V483: numeric, 7 unique values, 0 missing
V491: numeric, 51 unique values, 0 missing
V496: numeric, 146 unique values, 0 missing
V512: numeric, 160 unique values, 0 missing
V514: numeric, 84 unique values, 0 missing
V526: numeric, 1 unique values, 0 missing
V529: numeric, 87 unique values, 0 missing
V567: numeric, 79 unique values, 0 missing
V581: numeric, 71 unique values, 0 missing
V585: numeric, 50 unique values, 0 missing
V608: numeric, 121 unique values, 0 missing
V638: numeric, 1 unique values, 0 missing
V643: numeric, 58 unique values, 0 missing
V655: numeric, 30 unique values, 0 missing
V657: numeric, 47 unique values, 0 missing
V667: numeric, 15 unique values, 0 missing
V719: numeric, 294 unique values, 0 missing
V720: numeric, 83 unique values, 0 missing
V739: numeric, 38 unique values, 0 missing
V740: numeric, 42 unique values, 0 missing
V751: numeric, 112 unique values, 0 missing
V755: numeric, 84 unique values, 0 missing
V793: numeric, 27 unique values, 0 missing
V804: numeric, 53 unique values, 0 missing
V820: numeric, 60 unique values, 0 missing
V846: numeric, 41 unique values, 0 missing
V853: numeric, 119 unique values, 0 missing
V863: numeric, 23 unique values, 0 missing
V873: numeric, 6 unique values, 0 missing
V880: numeric, 1 unique values, 0 missing
V898: numeric, 1 unique values, 0 missing
V900: numeric, 1 unique values, 0 missing
V908: numeric, 2 unique values, 0 missing
V913: numeric, 22 unique values, 0 missing
V922: numeric, 19 unique values, 0 missing
V938: numeric, 77 unique values, 0 missing
V963: numeric, 7 unique values, 0 missing
V991: numeric, 22 unique values, 0 missing
V994: numeric, 64 unique values, 0 missing
V1006: numeric, 3 unique values, 0 missing
V1015: numeric, 1 unique values, 0 missing
V1022: numeric, 1 unique values, 0 missing
V1029: numeric, 29 unique values, 0 missing
V1031: numeric, 3 unique values, 0 missing
V1040: numeric, 44 unique values, 0 missing
V1045: numeric, 92 unique values, 0 missing
V1053: numeric, 1 unique values, 0 missing
V1063: numeric, 1 unique values, 0 missing
V1087: numeric, 43 unique values, 0 missing
V1109: numeric, 5 unique values, 0 missing
V1114: numeric, 128 unique values, 0 missing
V1124: numeric, 69 unique values, 0 missing
V1129: numeric, 103 unique values, 0 missing
V1156: numeric, 35 unique values, 0 missing
V1172: numeric, 21 unique values, 0 missing
V1174: numeric, 8 unique values, 0 missing
V1209: numeric, 50 unique values, 0 missing
V1227: numeric, 22 unique values, 0 missing
V1255: numeric, 42 unique values, 0 missing
V1257: numeric, 74 unique values, 0 missing
V1271: numeric, 1 unique values, 0 missing
V1272: numeric, 1 unique values, 0 missing
V1274: numeric, 49 unique values, 0 missing
V1295: numeric, 37 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 571
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 20
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.7
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: 11
Percentage of instances belonging to the least frequent class: 1.93
Number of instances belonging to the most frequent class: 60
Percentage of instances belonging to the most frequent class: 10.51
Number of attributes divided by the number of instances: 0.18
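Several of the properties above are simple ratios of the listed counts. As a quick consistency check, the sketch below recomputes them from the raw counts; the variable names and the `pct` helper are illustrative choices, and rounding to two decimals is an assumption that happens to match the values shown on this page.

```python
# Counts copied from the property list above.
n_instances = 571
n_attributes = 101
n_numeric = 100
n_nominal = 1
minority_class_size = 11
majority_class_size = 60


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals (assumed)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "pct_nominal_attributes": pct(n_nominal, n_attributes),
    "pct_minority_class": pct(minority_class_size, n_instances),
    "pct_majority_class": pct(majority_class_size, n_instances),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed figures agree with the listed percentages, which suggests the reported properties are internally consistent for this 571-row subsample.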

0 tasks
