micro-mass_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

active | ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by David Wilson

Subsampling of the dataset micro-mass (1515) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
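The class-selection step in the source code above filters rows with `y.isin(classes)` rather than `y.isin(selected_classes)`. If that reading is right, the class cap has no effect, which would be consistent with the 20 target values listed on this page despite `nclasses=10`. The following is a minimal sketch of that step on toy data, not the author's code: the label array, the cap of 2, and all variable names are hypothetical, and it uses NumPy only (`np.unique` with `return_counts=True` keeps the class labels and their frequencies aligned).

```python
import numpy as np

# Hypothetical toy target: 4 classes with unequal frequencies.
y = np.array(["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"])
nclasses_max = 2  # hypothetical cap, analogous to nclasses_max in the listing

rng = np.random.default_rng(1)

# Class labels and their counts, returned in the same (sorted) order.
classes, counts = np.unique(y, return_counts=True)

# Draw a subset of classes without replacement, weighted by class frequency.
selected_classes = rng.choice(
    classes, size=nclasses_max, replace=False, p=counts / counts.sum()
)

# Filtering on ALL classes keeps every row (the behaviour the listing appears to have).
keep_all = np.isin(y, classes)

# Filtering on the sampled classes actually reduces the number of classes.
keep_selected = np.isin(y, selected_classes)

n_classes_unfiltered = len(np.unique(y[keep_all]))
n_classes_filtered = len(np.unique(y[keep_selected]))
print(n_classes_unfiltered, n_classes_filtered)
```

Only the second mask enforces the cap; the first leaves the number of classes unchanged.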

101 features

Class (target, nominal): 20 unique values, 0 missing
V10 (numeric): 25 unique values, 0 missing
V25 (numeric): 72 unique values, 0 missing
V50 (numeric): 50 unique values, 0 missing
V69 (numeric): 63 unique values, 0 missing
V76 (numeric): 75 unique values, 0 missing
V79 (numeric): 60 unique values, 0 missing
V88 (numeric): 22 unique values, 0 missing
V105 (numeric): 25 unique values, 0 missing
V113 (numeric): 9 unique values, 0 missing
V143 (numeric): 77 unique values, 0 missing
V144 (numeric): 1 unique value, 0 missing
V150 (numeric): 104 unique values, 0 missing
V162 (numeric): 23 unique values, 0 missing
V189 (numeric): 70 unique values, 0 missing
V198 (numeric): 41 unique values, 0 missing
V246 (numeric): 31 unique values, 0 missing
V247 (numeric): 1 unique value, 0 missing
V271 (numeric): 101 unique values, 0 missing
V284 (numeric): 64 unique values, 0 missing
V319 (numeric): 3 unique values, 0 missing
V329 (numeric): 18 unique values, 0 missing
V341 (numeric): 26 unique values, 0 missing
V342 (numeric): 97 unique values, 0 missing
V356 (numeric): 1 unique value, 0 missing
V360 (numeric): 69 unique values, 0 missing
V363 (numeric): 28 unique values, 0 missing
V366 (numeric): 2 unique values, 0 missing
V398 (numeric): 93 unique values, 0 missing
V402 (numeric): 56 unique values, 0 missing
V436 (numeric): 4 unique values, 0 missing
V452 (numeric): 46 unique values, 0 missing
V460 (numeric): 8 unique values, 0 missing
V462 (numeric): 32 unique values, 0 missing
V464 (numeric): 52 unique values, 0 missing
V488 (numeric): 56 unique values, 0 missing
V521 (numeric): 15 unique values, 0 missing
V527 (numeric): 1 unique value, 0 missing
V533 (numeric): 14 unique values, 0 missing
V544 (numeric): 68 unique values, 0 missing
V547 (numeric): 16 unique values, 0 missing
V566 (numeric): 58 unique values, 0 missing
V574 (numeric): 117 unique values, 0 missing
V576 (numeric): 1 unique value, 0 missing
V592 (numeric): 3 unique values, 0 missing
V608 (numeric): 121 unique values, 0 missing
V610 (numeric): 56 unique values, 0 missing
V611 (numeric): 2 unique values, 0 missing
V617 (numeric): 18 unique values, 0 missing
V638 (numeric): 1 unique value, 0 missing
V646 (numeric): 87 unique values, 0 missing
V648 (numeric): 55 unique values, 0 missing
V661 (numeric): 23 unique values, 0 missing
V665 (numeric): 91 unique values, 0 missing
V694 (numeric): 132 unique values, 0 missing
V712 (numeric): 89 unique values, 0 missing
V736 (numeric): 14 unique values, 0 missing
V748 (numeric): 5 unique values, 0 missing
V763 (numeric): 117 unique values, 0 missing
V774 (numeric): 73 unique values, 0 missing
V800 (numeric): 40 unique values, 0 missing
V806 (numeric): 15 unique values, 0 missing
V840 (numeric): 1 unique value, 0 missing
V854 (numeric): 1 unique value, 0 missing
V861 (numeric): 38 unique values, 0 missing
V872 (numeric): 1 unique value, 0 missing
V889 (numeric): 89 unique values, 0 missing
V894 (numeric): 65 unique values, 0 missing
V898 (numeric): 1 unique value, 0 missing
V913 (numeric): 22 unique values, 0 missing
V917 (numeric): 70 unique values, 0 missing
V948 (numeric): 7 unique values, 0 missing
V957 (numeric): 73 unique values, 0 missing
V963 (numeric): 7 unique values, 0 missing
V965 (numeric): 48 unique values, 0 missing
V968 (numeric): 88 unique values, 0 missing
V970 (numeric): 21 unique values, 0 missing
V1006 (numeric): 3 unique values, 0 missing
V1015 (numeric): 1 unique value, 0 missing
V1029 (numeric): 29 unique values, 0 missing
V1045 (numeric): 92 unique values, 0 missing
V1062 (numeric): 1 unique value, 0 missing
V1073 (numeric): 22 unique values, 0 missing
V1085 (numeric): 39 unique values, 0 missing
V1089 (numeric): 103 unique values, 0 missing
V1091 (numeric): 6 unique values, 0 missing
V1095 (numeric): 1 unique value, 0 missing
V1102 (numeric): 2 unique values, 0 missing
V1112 (numeric): 3 unique values, 0 missing
V1118 (numeric): 44 unique values, 0 missing
V1133 (numeric): 56 unique values, 0 missing
V1137 (numeric): 1 unique value, 0 missing
V1143 (numeric): 49 unique values, 0 missing
V1178 (numeric): 36 unique values, 0 missing
V1180 (numeric): 65 unique values, 0 missing
V1193 (numeric): 1 unique value, 0 missing
V1197 (numeric): 25 unique values, 0 missing
V1199 (numeric): 2 unique values, 0 missing
V1205 (numeric): 119 unique values, 0 missing
V1245 (numeric): 1 unique value, 0 missing
V1284 (numeric): 20 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 571
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 20
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.7
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: 11
Percentage of instances belonging to the least frequent class: 1.93
Number of instances belonging to the most frequent class: 60
Percentage of instances belonging to the most frequent class: 10.51
Number of attributes divided by the number of instances: 0.18
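The percentage and ratio properties above follow from the raw counts on this page. As a quick consistency check, the sketch below recomputes them from those counts; the variable names and the `pct` helper are illustrative, and rounding to two decimals is an assumption about how the page displays values.

```python
# Raw counts as listed in the properties above.
n_rows = 571
n_cols = 101
n_numeric = 100
n_nominal = 1
minority_count = 11
majority_count = 60


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "pct_nominal": pct(n_nominal, n_cols),
    "pct_numeric": pct(n_numeric, n_cols),
    "pct_minority": pct(minority_count, n_rows),
    "pct_majority": pct(majority_count, n_rows),
    "dimensionality": round(n_cols / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values match the listed ones (0.99, 99.01, 1.93, 10.51, and 0.18).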

0 tasks
