micro-mass_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by David Wilson.


Subsampling of the dataset micro-mass (1515) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
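In the source code above, the row filter uses `y.isin(classes)` rather than `y.isin(selected_classes)`, so the class sampling step appears to have no effect on the output. This may explain why the target listed below still has 20 unique values despite `nclasses=10`. In addition, `y.value_counts()` is ordered by frequency while `y.unique()` is ordered by first appearance, so the probabilities passed to `rng.choice` are not necessarily aligned with the classes. The following is a minimal sketch of how the class-selection step could be written if the intent was to keep only the sampled classes. The function name `filter_to_sampled_classes` is hypothetical and is not part of the original code.

```python
import numpy as np
import pandas as pd


def filter_to_sampled_classes(
    x: pd.DataFrame, y: pd.Series, nclasses_max: int, seed: int
) -> tuple[pd.DataFrame, pd.Series]:
    """Keep only rows whose label is among `nclasses_max` sampled classes.

    Hypothetical helper: a sketch of the presumed intent, not the code
    that generated this dataset.
    """
    rng = np.random.default_rng(seed)
    classes = np.asarray(y.unique())
    if len(classes) <= nclasses_max:
        return x, y

    vcs = y.value_counts()
    # Reorder the counts so that p[i] is the frequency of classes[i]
    p = (vcs.loc[classes] / vcs.sum()).to_numpy(dtype=float)
    selected_classes = rng.choice(
        classes, size=nclasses_max, replace=False, p=p
    )

    # Boolean mask avoids mixing index labels with positional indexing
    mask = y.isin(selected_classes)
    return x.loc[mask], y.loc[mask]
```

A boolean mask is used here instead of `y.index[...]` with `.iloc`, since the latter only works when the index happens to be a default positional range.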

101 features

Class (target, nominal): 20 unique values, 0 missing
V49 (numeric): 79 unique values, 0 missing
V50 (numeric): 50 unique values, 0 missing
V73 (numeric): 49 unique values, 0 missing
V97 (numeric): 44 unique values, 0 missing
V123 (numeric): 2 unique values, 0 missing
V129 (numeric): 1 unique value, 0 missing
V130 (numeric): 69 unique values, 0 missing
V133 (numeric): 14 unique values, 0 missing
V138 (numeric): 92 unique values, 0 missing
V223 (numeric): 1 unique value, 0 missing
V228 (numeric): 28 unique values, 0 missing
V236 (numeric): 1 unique value, 0 missing
V250 (numeric): 80 unique values, 0 missing
V255 (numeric): 1 unique value, 0 missing
V266 (numeric): 70 unique values, 0 missing
V268 (numeric): 13 unique values, 0 missing
V276 (numeric): 24 unique values, 0 missing
V330 (numeric): 92 unique values, 0 missing
V343 (numeric): 83 unique values, 0 missing
V391 (numeric): 3 unique values, 0 missing
V406 (numeric): 141 unique values, 0 missing
V413 (numeric): 1 unique value, 0 missing
V422 (numeric): 130 unique values, 0 missing
V430 (numeric): 207 unique values, 0 missing
V462 (numeric): 32 unique values, 0 missing
V470 (numeric): 43 unique values, 0 missing
V476 (numeric): 85 unique values, 0 missing
V496 (numeric): 146 unique values, 0 missing
V508 (numeric): 53 unique values, 0 missing
V510 (numeric): 96 unique values, 0 missing
V521 (numeric): 15 unique values, 0 missing
V527 (numeric): 1 unique value, 0 missing
V551 (numeric): 1 unique value, 0 missing
V555 (numeric): 1 unique value, 0 missing
V557 (numeric): 24 unique values, 0 missing
V562 (numeric): 11 unique values, 0 missing
V566 (numeric): 58 unique values, 0 missing
V571 (numeric): 1 unique value, 0 missing
V580 (numeric): 24 unique values, 0 missing
V585 (numeric): 50 unique values, 0 missing
V590 (numeric): 75 unique values, 0 missing
V592 (numeric): 3 unique values, 0 missing
V612 (numeric): 1 unique value, 0 missing
V621 (numeric): 57 unique values, 0 missing
V624 (numeric): 23 unique values, 0 missing
V630 (numeric): 15 unique values, 0 missing
V639 (numeric): 77 unique values, 0 missing
V642 (numeric): 70 unique values, 0 missing
V646 (numeric): 87 unique values, 0 missing
V649 (numeric): 153 unique values, 0 missing
V664 (numeric): 1 unique value, 0 missing
V667 (numeric): 15 unique values, 0 missing
V701 (numeric): 88 unique values, 0 missing
V707 (numeric): 23 unique values, 0 missing
V740 (numeric): 42 unique values, 0 missing
V741 (numeric): 64 unique values, 0 missing
V742 (numeric): 30 unique values, 0 missing
V743 (numeric): 12 unique values, 0 missing
V764 (numeric): 1 unique value, 0 missing
V765 (numeric): 122 unique values, 0 missing
V774 (numeric): 73 unique values, 0 missing
V802 (numeric): 53 unique values, 0 missing
V803 (numeric): 13 unique values, 0 missing
V806 (numeric): 15 unique values, 0 missing
V817 (numeric): 72 unique values, 0 missing
V828 (numeric): 8 unique values, 0 missing
V845 (numeric): 1 unique value, 0 missing
V846 (numeric): 41 unique values, 0 missing
V855 (numeric): 49 unique values, 0 missing
V864 (numeric): 81 unique values, 0 missing
V877 (numeric): 108 unique values, 0 missing
V893 (numeric): 58 unique values, 0 missing
V898 (numeric): 1 unique value, 0 missing
V900 (numeric): 1 unique value, 0 missing
V950 (numeric): 30 unique values, 0 missing
V977 (numeric): 111 unique values, 0 missing
V982 (numeric): 4 unique values, 0 missing
V990 (numeric): 9 unique values, 0 missing
V1013 (numeric): 17 unique values, 0 missing
V1016 (numeric): 31 unique values, 0 missing
V1047 (numeric): 1 unique value, 0 missing
V1052 (numeric): 2 unique values, 0 missing
V1057 (numeric): 50 unique values, 0 missing
V1073 (numeric): 22 unique values, 0 missing
V1082 (numeric): 61 unique values, 0 missing
V1090 (numeric): 1 unique value, 0 missing
V1097 (numeric): 64 unique values, 0 missing
V1114 (numeric): 128 unique values, 0 missing
V1123 (numeric): 1 unique value, 0 missing
V1131 (numeric): 1 unique value, 0 missing
V1135 (numeric): 1 unique value, 0 missing
V1140 (numeric): 79 unique values, 0 missing
V1151 (numeric): 90 unique values, 0 missing
V1171 (numeric): 28 unique values, 0 missing
V1185 (numeric): 91 unique values, 0 missing
V1197 (numeric): 25 unique values, 0 missing
V1255 (numeric): 42 unique values, 0 missing
V1272 (numeric): 1 unique value, 0 missing
V1273 (numeric): 71 unique values, 0 missing
V1284 (numeric): 20 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 571
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 20
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.7
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: 11
Percentage of instances belonging to the least frequent class: 1.93
Number of instances belonging to the most frequent class: 60
Percentage of instances belonging to the most frequent class: 10.51
Number of attributes divided by the number of instances: 0.18
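The percentage and ratio properties above follow directly from the listed counts. As a quick consistency check, the short sketch below recomputes them from the raw numbers on this page. The variable names and the `pct` helper are illustrative only, and rounding to two decimals is an assumption about how the page displays values.

```python
# Counts taken from the property list on this page
n_instances = 571
n_attributes = 101
n_numeric = 100
n_nominal = 1
least_frequent_class = 11
most_frequent_class = 60


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


pct_nominal = pct(n_nominal, n_attributes)
pct_numeric = pct(n_numeric, n_attributes)
pct_least = pct(least_frequent_class, n_instances)
pct_most = pct(most_frequent_class, n_instances)
dimensionality = round(n_attributes / n_instances, 2)

print(pct_nominal, pct_numeric, pct_least, pct_most, dimensionality)
```

The recomputed values match the ones listed (0.99, 99.01, 1.93, 10.51 and 0.18).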

0 tasks
