arcene_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by David Wilson
Subsampling of the dataset arcene (41157) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
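The column-sampling step in the source code above can be tried in isolation. The following is a minimal sketch, not the uploader's code: the helper name `subsample_columns` and the synthetic 10,000-column frame are assumptions made for illustration. It draws column positions uniformly without replacement and sorts them so the surviving columns keep their original order, which is consistent with the ascending feature names (V269 through V9953) listed on this page.

```python
import numpy as np
import pandas as pd


def subsample_columns(x: pd.DataFrame, seed: int, ncols_max: int = 100) -> pd.DataFrame:
    """Keep at most `ncols_max` columns, chosen uniformly without replacement.

    Hypothetical helper written for this example only.
    """
    if len(x.columns) <= ncols_max:
        return x
    rng = np.random.default_rng(seed)
    column_idxs = rng.choice(len(x.columns), size=ncols_max, replace=False)
    # Sorting keeps the surviving columns in their original left-to-right order.
    return x.iloc[:, np.sort(column_idxs)]


# Synthetic stand-in for a wide table: 100 rows, 10,000 columns named V1..V10000.
demo = pd.DataFrame(
    np.zeros((100, 10_000)),
    columns=[f"V{i}" for i in range(1, 10_001)],
)
subset = subsample_columns(demo, seed=4)
print(subset.shape)
```

Because the generator is seeded, repeated calls with the same seed and input select the same columns. The sketch makes no claim to reproduce the exact columns of this dataset, since that depends on the original column order and library versions.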

101 features

class (target): nominal, 2 unique values, 0 missing
V269: numeric, 3 unique values, 0 missing
V347: numeric, 25 unique values, 0 missing
V601: numeric, 43 unique values, 0 missing
V790: numeric, 47 unique values, 0 missing
V801: numeric, 90 unique values, 0 missing
V898: numeric, 57 unique values, 0 missing
V1151: numeric, 90 unique values, 0 missing
V1278: numeric, 37 unique values, 0 missing
V1328: numeric, 48 unique values, 0 missing
V1385: numeric, 33 unique values, 0 missing
V1643: numeric, 36 unique values, 0 missing
V1706: numeric, 39 unique values, 0 missing
V1731: numeric, 67 unique values, 0 missing
V1766: numeric, 16 unique values, 0 missing
V1795: numeric, 27 unique values, 0 missing
V1906: numeric, 59 unique values, 0 missing
V2001: numeric, 87 unique values, 0 missing
V2067: numeric, 94 unique values, 0 missing
V2166: numeric, 27 unique values, 0 missing
V2179: numeric, 50 unique values, 0 missing
V2213: numeric, 52 unique values, 0 missing
V2229: numeric, 41 unique values, 0 missing
V2671: numeric, 39 unique values, 0 missing
V2807: numeric, 84 unique values, 0 missing
V3022: numeric, 24 unique values, 0 missing
V3356: numeric, 98 unique values, 0 missing
V3485: numeric, 28 unique values, 0 missing
V3615: numeric, 47 unique values, 0 missing
V3625: numeric, 23 unique values, 0 missing
V3670: numeric, 44 unique values, 0 missing
V3673: numeric, 62 unique values, 0 missing
V3732: numeric, 70 unique values, 0 missing
V3823: numeric, 6 unique values, 0 missing
V4060: numeric, 40 unique values, 0 missing
V4274: numeric, 61 unique values, 0 missing
V4494: numeric, 58 unique values, 0 missing
V4594: numeric, 27 unique values, 0 missing
V4613: numeric, 29 unique values, 0 missing
V4736: numeric, 11 unique values, 0 missing
V4750: numeric, 43 unique values, 0 missing
V4785: numeric, 79 unique values, 0 missing
V4905: numeric, 21 unique values, 0 missing
V4913: numeric, 43 unique values, 0 missing
V4954: numeric, 71 unique values, 0 missing
V4961: numeric, 87 unique values, 0 missing
V4968: numeric, 68 unique values, 0 missing
V4980: numeric, 75 unique values, 0 missing
V5065: numeric, 76 unique values, 0 missing
V5186: numeric, 34 unique values, 0 missing
V5202: numeric, 15 unique values, 0 missing
V5326: numeric, 26 unique values, 0 missing
V5374: numeric, 15 unique values, 0 missing
V5396: numeric, 17 unique values, 0 missing
V5698: numeric, 60 unique values, 0 missing
V5738: numeric, 66 unique values, 0 missing
V5760: numeric, 29 unique values, 0 missing
V5802: numeric, 84 unique values, 0 missing
V5857: numeric, 52 unique values, 0 missing
V6019: numeric, 41 unique values, 0 missing
V6052: numeric, 67 unique values, 0 missing
V6216: numeric, 61 unique values, 0 missing
V6389: numeric, 39 unique values, 0 missing
V6606: numeric, 7 unique values, 0 missing
V6621: numeric, 2 unique values, 0 missing
V6704: numeric, 85 unique values, 0 missing
V6742: numeric, 52 unique values, 0 missing
V6914: numeric, 93 unique values, 0 missing
V7008: numeric, 24 unique values, 0 missing
V7193: numeric, 44 unique values, 0 missing
V7278: numeric, 92 unique values, 0 missing
V7694: numeric, 67 unique values, 0 missing
V7833: numeric, 56 unique values, 0 missing
V7951: numeric, 41 unique values, 0 missing
V8041: numeric, 59 unique values, 0 missing
V8083: numeric, 42 unique values, 0 missing
V8516: numeric, 43 unique values, 0 missing
V8645: numeric, 84 unique values, 0 missing
V8729: numeric, 64 unique values, 0 missing
V8801: numeric, 35 unique values, 0 missing
V8841: numeric, 1 unique value, 0 missing
V8909: numeric, 14 unique values, 0 missing
V8952: numeric, 47 unique values, 0 missing
V8959: numeric, 31 unique values, 0 missing
V9231: numeric, 85 unique values, 0 missing
V9235: numeric, 82 unique values, 0 missing
V9272: numeric, 21 unique values, 0 missing
V9316: numeric, 42 unique values, 0 missing
V9339: numeric, 59 unique values, 0 missing
V9360: numeric, 40 unique values, 0 missing
V9376: numeric, 12 unique values, 0 missing
V9544: numeric, 24 unique values, 0 missing
V9577: numeric, 93 unique values, 0 missing
V9613: numeric, 23 unique values, 0 missing
V9622: numeric, 50 unique values, 0 missing
V9626: numeric, 8 unique values, 0 missing
V9671: numeric, 19 unique values, 0 missing
V9729: numeric, 25 unique values, 0 missing
V9773: numeric, 55 unique values, 0 missing
V9937: numeric, 58 unique values, 0 missing
V9953: numeric, 34 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 100
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.39
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0.99
Number of binary attributes: 1
Number of instances belonging to the least frequent class: 44
Percentage of instances belonging to the least frequent class: 44
Number of instances belonging to the most frequent class: 56
Percentage of instances belonging to the most frequent class: 56
Number of attributes divided by the number of instances: 1.01
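The ratio-style properties above can be checked against the raw counts on this page. The sketch below assumes each percentage is a plain ratio over the attribute or instance count, rounded to two decimals. That formula is an assumption for illustration, not a statement of how OpenML computes its qualities, and the variable names are made up for this example.

```python
# Counts taken from the property list on this page.
n_instances = 100
n_attributes = 101
n_nominal = 1
n_numeric = 100
minority_count = 44
majority_count = 56

# Assumed formulas: simple ratios, expressed as percentages where applicable.
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_minority = round(100 * minority_count / n_instances, 2)
pct_majority = round(100 * majority_count / n_instances, 2)
dimensionality = round(n_attributes / n_instances, 2)

print(pct_nominal, pct_numeric, pct_minority, pct_majority, dimensionality)
```

Under that assumption the computed values agree with the listed ones: 0.99, 99.01, 44, 56, and 1.01.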

0 tasks
