cnae-9_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.

Subsampling of the dataset cnae-9 (1468) with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10 and args.no_stratify=True. It was generated with the following source code:

```python
from __future__ import annotations

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# `subsample` is a method of the author's dataset wrapper (exposing `x`, `y`,
# `categorical_mask` and `dataset`); `Dataset` is the author's class and is not shown.


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes, weighted by their frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        # Align the counts with `classes` so each probability matches its class
        vcs = y.value_counts().reindex(classes)
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),
        )

        # Keep only the rows whose class was selected
        mask = y.isin(selected_classes)
        x = x[mask]
        y = y[mask]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify on the target only when requested
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical mask aligned with the selected columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same dataset, but it is the one it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
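
The same three steps (classes, then columns, then rows) can be sketched with only the standard library. This is a minimal sketch under stated assumptions, not the author's code: the name `subsample_rows_cols` is hypothetical, the data are plain lists, and `random.Random` stands in for NumPy's `default_rng` and scikit-learn's `train_test_split`, so it will not reproduce the exact columns selected for this dataset. It does illustrate why this subsample keeps all 1080 rows and 9 classes: with 1080 ≤ 2000 rows and 9 ≤ 10 classes, only the column step changes anything.

```python
import random
from collections import defaultdict


def subsample_rows_cols(rows, labels, seed, nrows_max=2_000, ncols_max=100,
                        nclasses_max=10, stratified=True):
    """Hypothetical stdlib sketch: subsample classes, then columns, then rows.

    `rows` is a list of equal-length lists; `labels` holds one class label per row.
    Returns the subsampled rows, their labels, and the kept column indices.
    """
    rng = random.Random(seed)  # stand-in for np.random.default_rng(seed)

    # 1. Keep at most `nclasses_max` classes, drawn one at a time without
    #    replacement, with probability proportional to class frequency.
    counts = defaultdict(int)
    for label in labels:
        counts[label] += 1
    if len(counts) > nclasses_max:
        remaining = sorted(counts)
        keep = set()
        while len(keep) < nclasses_max:
            pick = rng.choices(remaining, weights=[counts[c] for c in remaining])[0]
            remaining.remove(pick)
            keep.add(pick)
        pairs = [(row, y) for row, y in zip(rows, labels) if y in keep]
        rows = [row for row, _ in pairs]
        labels = [y for _, y in pairs]

    # 2. Uniformly sample `ncols_max` column indices, keeping their original order.
    ncols = len(rows[0]) if rows else 0
    if ncols > ncols_max:
        col_idxs = sorted(rng.sample(range(ncols), ncols_max))
    else:
        col_idxs = list(range(ncols))
    rows = [[row[i] for i in col_idxs] for row in rows]

    # 3. Sample `nrows_max` rows; when stratifying, each class gets a share
    #    proportional to its size (largest-remainder rounding keeps the total exact).
    if len(rows) > nrows_max:
        if stratified:
            by_class = defaultdict(list)
            for i, y in enumerate(labels):
                by_class[y].append(i)
            quotas = {y: nrows_max * len(ix) / len(labels) for y, ix in by_class.items()}
            alloc = {y: int(q) for y, q in quotas.items()}
            shortfall = nrows_max - sum(alloc.values())
            by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
            for y in by_remainder[:shortfall]:
                alloc[y] += 1
            chosen = [i for y, ix in by_class.items() for i in rng.sample(ix, alloc[y])]
        else:
            chosen = rng.sample(range(len(rows)), nrows_max)
        chosen.sort()
        rows = [rows[i] for i in chosen]
        labels = [labels[i] for i in chosen]

    return rows, labels, col_idxs


# Illustrative input: 1080 rows, 9 balanced classes, and an assumed width of 856 columns.
rows = [[0] * 856 for _ in range(1080)]
labels = [i % 9 for i in range(1080)]
sub_rows, sub_labels, cols = subsample_rows_cols(rows, labels, seed=0)
print(len(sub_rows), len(sub_rows[0]), len(set(sub_labels)))  # 1080 100 9
```

Allocating per-class quotas explicitly mirrors what a stratified split aims for, while keeping the sketch dependency-free.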

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| Class (target) | nominal | 9 | 0 |
| V3 | numeric | 2 | 0 |
| V5 | numeric | 2 | 0 |
| V7 | numeric | 4 | 0 |
| V13 | numeric | 2 | 0 |
| V18 | numeric | 2 | 0 |
| V23 | numeric | 2 | 0 |
| V27 | numeric | 2 | 0 |
| V32 | numeric | 2 | 0 |
| V42 | numeric | 2 | 0 |
| V58 | numeric | 2 | 0 |
| V61 | numeric | 2 | 0 |
| V64 | numeric | 2 | 0 |
| V71 | numeric | 4 | 0 |
| V72 | numeric | 2 | 0 |
| V100 | numeric | 2 | 0 |
| V112 | numeric | 2 | 0 |
| V135 | numeric | 3 | 0 |
| V139 | numeric | 2 | 0 |
| V194 | numeric | 2 | 0 |
| V206 | numeric | 2 | 0 |
| V208 | numeric | 2 | 0 |
| V214 | numeric | 2 | 0 |
| V216 | numeric | 2 | 0 |
| V225 | numeric | 2 | 0 |
| V235 | numeric | 2 | 0 |
| V238 | numeric | 2 | 0 |
| V257 | numeric | 2 | 0 |
| V271 | numeric | 2 | 0 |
| V278 | numeric | 2 | 0 |
| V286 | numeric | 2 | 0 |
| V300 | numeric | 2 | 0 |
| V308 | numeric | 2 | 0 |
| V309 | numeric | 2 | 0 |
| V311 | numeric | 2 | 0 |
| V319 | numeric | 2 | 0 |
| V322 | numeric | 2 | 0 |
| V323 | numeric | 2 | 0 |
| V332 | numeric | 2 | 0 |
| V337 | numeric | 2 | 0 |
| V351 | numeric | 2 | 0 |
| V373 | numeric | 3 | 0 |
| V383 | numeric | 3 | 0 |
| V388 | numeric | 2 | 0 |
| V404 | numeric | 2 | 0 |
| V421 | numeric | 3 | 0 |
| V423 | numeric | 3 | 0 |
| V426 | numeric | 2 | 0 |
| V429 | numeric | 2 | 0 |
| V434 | numeric | 2 | 0 |
| V435 | numeric | 2 | 0 |
| V444 | numeric | 2 | 0 |
| V468 | numeric | 2 | 0 |
| V477 | numeric | 2 | 0 |
| V479 | numeric | 2 | 0 |
| V483 | numeric | 2 | 0 |
| V489 | numeric | 2 | 0 |
| V497 | numeric | 2 | 0 |
| V499 | numeric | 3 | 0 |
| V501 | numeric | 2 | 0 |
| V521 | numeric | 2 | 0 |
| V523 | numeric | 2 | 0 |
| V531 | numeric | 2 | 0 |
| V538 | numeric | 2 | 0 |
| V559 | numeric | 2 | 0 |
| V564 | numeric | 2 | 0 |
| V574 | numeric | 2 | 0 |
| V577 | numeric | 2 | 0 |
| V595 | numeric | 3 | 0 |
| V598 | numeric | 2 | 0 |
| V601 | numeric | 2 | 0 |
| V606 | numeric | 2 | 0 |
| V608 | numeric | 3 | 0 |
| V617 | numeric | 2 | 0 |
| V623 | numeric | 2 | 0 |
| V635 | numeric | 2 | 0 |
| V643 | numeric | 2 | 0 |
| V644 | numeric | 2 | 0 |
| V653 | numeric | 2 | 0 |
| V667 | numeric | 2 | 0 |
| V671 | numeric | 3 | 0 |
| V682 | numeric | 2 | 0 |
| V687 | numeric | 2 | 0 |
| V698 | numeric | 2 | 0 |
| V701 | numeric | 3 | 0 |
| V713 | numeric | 2 | 0 |
| V719 | numeric | 2 | 0 |
| V726 | numeric | 3 | 0 |
| V741 | numeric | 2 | 0 |
| V749 | numeric | 2 | 0 |
| V755 | numeric | 2 | 0 |
| V769 | numeric | 2 | 0 |
| V775 | numeric | 2 | 0 |
| V779 | numeric | 2 | 0 |
| V797 | numeric | 2 | 0 |
| V808 | numeric | 2 | 0 |
| V818 | numeric | 2 | 0 |
| V826 | numeric | 2 | 0 |
| V827 | numeric | 2 | 0 |
| V837 | numeric | 2 | 0 |
| V852 | numeric | 2 | 0 |
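
Per-feature statistics like the unique-value and missing counts above can be computed with a few lines of Python. This is a minimal sketch, assuming columns arrive as a dict of lists with `None` marking a missing value; the helper `summarize_columns` is hypothetical, and its type rule (numeric if every present value is an `int` or `float`) is an illustrative heuristic, since the types shown above presumably come from the ARFF attribute declarations rather than from inference.

```python
def summarize_columns(columns):
    """Hypothetical helper: map each column name to (type, unique count, missing count).

    Missing values are represented as None. A column is labeled "numeric" when every
    present value is an int or float (bools excluded), otherwise "nominal".
    """
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        summary[name] = (
            "numeric" if is_numeric else "nominal",
            len(set(present)),             # distinct non-missing values
            len(values) - len(present),    # number of missing entries
        )
    return summary


stats = summarize_columns({"Class": ["1", "2", "1", "9"], "V7": [0, 1, 2, 3, None]})
print(stats)
```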

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 1080 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 9 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 100 |
| Number of nominal attributes | 1 |
| Percentage of nominal attributes | 0.99 |
| Average class difference between consecutive instances | 0 |
| Percentage of numeric attributes | 99.01 |
| Percentage of missing values | 0 |
| Percentage of instances having missing values | 0 |
| Percentage of binary attributes | 0 |
| Number of binary attributes | 0 |
| Number of instances belonging to the least frequent class | 120 |
| Percentage of instances belonging to the least frequent class | 11.11 |
| Number of instances belonging to the most frequent class | 120 |
| Percentage of instances belonging to the most frequent class | 11.11 |
| Number of attributes divided by the number of instances | 0.09 |
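
Several of these properties are simple ratios of the counts above, which makes them easy to check. This is a minimal sketch; the helper `basic_meta_features` and its dictionary keys are hypothetical names, not OpenML's. It assumes the class distribution is nine classes of 120 instances each (implied by 9 classes with both the least and most frequent class at 120) and that the single nominal attribute is the target.

```python
def basic_meta_features(class_counts, n_numeric, n_nominal):
    """Hypothetical helper recomputing some of the simple properties listed above."""
    n_instances = sum(class_counts)
    n_attributes = n_numeric + n_nominal
    minority = min(class_counts)
    majority = max(class_counts)
    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "classes": len(class_counts),
        "pct_nominal": 100 * n_nominal / n_attributes,
        "pct_numeric": 100 * n_numeric / n_attributes,
        "minority_size": minority,
        "minority_pct": 100 * minority / n_instances,
        "majority_size": majority,
        "majority_pct": 100 * majority / n_instances,
        "attributes_per_instance": n_attributes / n_instances,
    }


# Counts from this page: 9 classes x 120 instances, 100 numeric + 1 nominal attribute.
props = basic_meta_features([120] * 9, n_numeric=100, n_nominal=1)
for key, value in props.items():
    print(key, round(value, 2))
```

Rounded to two decimals, the results match the table: 0.99 % nominal, 99.01 % numeric, 11.11 % for both the least and most frequent class, and 101 / 1080 ≈ 0.09 attributes per instance.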

0 tasks