arcene_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by David Wilson
Subsampling of the dataset arcene (41157) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
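To see why this dataset ends up with 100 rows and 100 feature columns, it can help to trace which of the three size checks in `subsample` would fire. The sketch below is illustrative only: `subsample_steps` is a hypothetical helper, not part of the original code, and the input shape (100 rows, 10,000 feature columns, 2 classes) is an assumption about the parent dataset, chosen to be consistent with the properties reported on this page.

```python
def subsample_steps(
    nrows: int,
    ncols: int,
    nclasses: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
) -> dict:
    """Report which size checks in `subsample` would trigger (hypothetical helper)."""
    return {
        "sample_classes": nclasses > nclasses_max,  # more classes than allowed
        "sample_columns": ncols > ncols_max,        # more feature columns than allowed
        "sample_rows": nrows > nrows_max,           # more rows than allowed
    }


# Assumed shape of the parent dataset (not stated on this page).
steps = subsample_steps(nrows=100, ncols=10_000, nclasses=2)
print(steps)
# {'sample_classes': False, 'sample_columns': True, 'sample_rows': False}
```

Under those assumptions only the column step runs, so the class-filter and row-stratification branches are never exercised here. Read literally, the class filter tests membership in `classes` rather than `selected_classes`, and the `stratified` argument is not consulted; neither matters for this dataset, but both may be worth checking before reusing the code on data with more than 10 classes.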

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| V15 | numeric | 87 | 0 |
| V48 | numeric | 59 | 0 |
| V303 | numeric | 12 | 0 |
| V325 | numeric | 81 | 0 |
| V391 | numeric | 19 | 0 |
| V429 | numeric | 62 | 0 |
| V765 | numeric | 60 | 0 |
| V849 | numeric | 12 | 0 |
| V906 | numeric | 28 | 0 |
| V933 | numeric | 63 | 0 |
| V1023 | numeric | 83 | 0 |
| V1128 | numeric | 27 | 0 |
| V1386 | numeric | 41 | 0 |
| V1584 | numeric | 28 | 0 |
| V1717 | numeric | 51 | 0 |
| V1778 | numeric | 36 | 0 |
| V1797 | numeric | 14 | 0 |
| V1895 | numeric | 60 | 0 |
| V2066 | numeric | 41 | 0 |
| V2183 | numeric | 2 | 0 |
| V2249 | numeric | 52 | 0 |
| V2346 | numeric | 88 | 0 |
| V2436 | numeric | 40 | 0 |
| V2473 | numeric | 82 | 0 |
| V2528 | numeric | 60 | 0 |
| V2534 | numeric | 56 | 0 |
| V2626 | numeric | 17 | 0 |
| V2824 | numeric | 52 | 0 |
| V2910 | numeric | 21 | 0 |
| V2919 | numeric | 52 | 0 |
| V2968 | numeric | 79 | 0 |
| V2974 | numeric | 35 | 0 |
| V3124 | numeric | 25 | 0 |
| V3180 | numeric | 52 | 0 |
| V3254 | numeric | 55 | 0 |
| V3293 | numeric | 36 | 0 |
| V3729 | numeric | 20 | 0 |
| V3844 | numeric | 6 | 0 |
| V3871 | numeric | 50 | 0 |
| V3882 | numeric | 62 | 0 |
| V3940 | numeric | 79 | 0 |
| V4170 | numeric | 19 | 0 |
| V4275 | numeric | 52 | 0 |
| V4282 | numeric | 17 | 0 |
| V4294 | numeric | 30 | 0 |
| V4486 | numeric | 19 | 0 |
| V4692 | numeric | 79 | 0 |
| V4725 | numeric | 46 | 0 |
| V4750 | numeric | 43 | 0 |
| V4861 | numeric | 27 | 0 |
| V5129 | numeric | 17 | 0 |
| V5177 | numeric | 20 | 0 |
| V5185 | numeric | 74 | 0 |
| V5769 | numeric | 43 | 0 |
| V5824 | numeric | 28 | 0 |
| V5826 | numeric | 9 | 0 |
| V5933 | numeric | 65 | 0 |
| V6064 | numeric | 73 | 0 |
| V6065 | numeric | 31 | 0 |
| V6159 | numeric | 34 | 0 |
| V6284 | numeric | 69 | 0 |
| V6444 | numeric | 65 | 0 |
| V6463 | numeric | 4 | 0 |
| V6524 | numeric | 39 | 0 |
| V6564 | numeric | 24 | 0 |
| V6566 | numeric | 3 | 0 |
| V6583 | numeric | 93 | 0 |
| V6595 | numeric | 29 | 0 |
| V6615 | numeric | 44 | 0 |
| V6819 | numeric | 52 | 0 |
| V6857 | numeric | 33 | 0 |
| V6919 | numeric | 77 | 0 |
| V7042 | numeric | 44 | 0 |
| V7206 | numeric | 44 | 0 |
| V7286 | numeric | 70 | 0 |
| V7327 | numeric | 14 | 0 |
| V7400 | numeric | 29 | 0 |
| V7516 | numeric | 48 | 0 |
| V7581 | numeric | 13 | 0 |
| V7699 | numeric | 41 | 0 |
| V7810 | numeric | 1 | 0 |
| V7938 | numeric | 27 | 0 |
| V8035 | numeric | 21 | 0 |
| V8191 | numeric | 19 | 0 |
| V8284 | numeric | 19 | 0 |
| V8413 | numeric | 4 | 0 |
| V8496 | numeric | 38 | 0 |
| V8612 | numeric | 45 | 0 |
| V8640 | numeric | 35 | 0 |
| V8780 | numeric | 44 | 0 |
| V8810 | numeric | 66 | 0 |
| V8873 | numeric | 14 | 0 |
| V9127 | numeric | 80 | 0 |
| V9251 | numeric | 26 | 0 |
| V9285 | numeric | 67 | 0 |
| V9333 | numeric | 6 | 0 |
| V9387 | numeric | 21 | 0 |
| V9498 | numeric | 80 | 0 |
| V9562 | numeric | 46 | 0 |
| V9681 | numeric | 20 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 100 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of nominal attributes. | 0.99 |
| Average class difference between consecutive instances. | 0.39 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of missing values. | 0 |
| Percentage of instances having missing values. | 0 |
| Percentage of binary attributes. | 0.99 |
| Number of binary attributes. | 1 |
| Number of instances belonging to the least frequent class. | 44 |
| Percentage of instances belonging to the least frequent class. | 44 |
| Number of instances belonging to the most frequent class. | 56 |
| Percentage of instances belonging to the most frequent class. | 56 |
| Number of attributes divided by the number of instances. | 1.01 |
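Several of the properties above are derived from the raw counts, so they can be cross-checked with a few lines of arithmetic. The following is a minimal sketch: the variable names and the `pct` helper are made up for illustration, and rounding to two decimals is an assumption based on the precision shown in the listing.

```python
# Counts copied from the property list above.
n_rows = 100
n_columns = 101
n_numeric = 100
n_nominal = 1
minority = 44
majority = 56


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals (assumed display precision)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_nominal": pct(n_nominal, n_columns),       # 1 of 101 columns
    "pct_numeric": pct(n_numeric, n_columns),       # 100 of 101 columns
    "pct_minority": pct(minority, n_rows),          # least frequent class
    "pct_majority": pct(majority, n_rows),          # most frequent class
    "columns_per_row": round(n_columns / n_rows, 2),
}
print(derived)
```

The results agree with the listed values (0.99, 99.01, 44, 56 and 1.01), which suggests the percentages are taken over all 101 columns, target included.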

0 tasks
