volkert_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by David Wilson


Subsampling of the dataset volkert (41166) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
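The row-subsampling step in the source code keeps class proportions by passing `stratify=` to `train_test_split`. As a rough illustration of what stratification means, the sketch below does the same kind of class-proportional draw using only the standard library. The helper `stratified_subsample` and the toy labels are hypothetical and are not part of the code that generated this dataset; scikit-learn's own allocation and shuffling differ in detail.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional sample of size n_rows.

    Hypothetical stdlib stand-in for a stratified train_test_split call;
    assumes n_rows <= len(labels).
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Ideal (fractional) share of the sample for each class.
    ideal = {c: n_rows * len(rows) / total for c, rows in by_class.items()}
    quota = {c: int(share) for c, share in ideal.items()}

    # Hand out the rows lost to rounding down, largest remainder first.
    leftover = n_rows - sum(quota.values())
    by_remainder = sorted(ideal, key=lambda c: ideal[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1

    picked = []
    for c, rows in by_class.items():
        picked.extend(rng.sample(rows, quota[c]))
    return sorted(picked)


# Toy labels with a 70/20/10 class split (illustrative data only).
labels = ["a"] * 700 + ["b"] * 200 + ["c"] * 100
idx = stratified_subsample(labels, n_rows=100, seed=1)
counts = Counter(labels[i] for i in idx)
print(sorted(counts.items()))  # [('a', 70), ('b', 20), ('c', 10)]
```

The sample keeps the 70/20/10 split of the toy labels, which is the property the stratified draw is meant to preserve when the dataset is cut down to at most 2000 rows.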

101 features

class (target): nominal, 10 unique values, 0 missing
V3: numeric, 1 unique value, 0 missing
V7: numeric, 1 unique value, 0 missing
V8: numeric, 1 unique value, 0 missing
V9: numeric, 1 unique value, 0 missing
V10: numeric, 360 unique values, 0 missing
V13: numeric, 1 unique value, 0 missing
V14: numeric, 1 unique value, 0 missing
V15: numeric, 1 unique value, 0 missing
V16: numeric, 1 unique value, 0 missing
V17: numeric, 1 unique value, 0 missing
V23: numeric, 1 unique value, 0 missing
V25: numeric, 1 unique value, 0 missing
V26: numeric, 1 unique value, 0 missing
V27: numeric, 1 unique value, 0 missing
V29: numeric, 1 unique value, 0 missing
V32: numeric, 1 unique value, 0 missing
V34: numeric, 1 unique value, 0 missing
V35: numeric, 1 unique value, 0 missing
V36: numeric, 1 unique value, 0 missing
V38: numeric, 998 unique values, 0 missing
V39: numeric, 909 unique values, 0 missing
V40: numeric, 828 unique values, 0 missing
V42: numeric, 642 unique values, 0 missing
V44: numeric, 505 unique values, 0 missing
V45: numeric, 462 unique values, 0 missing
V47: numeric, 365 unique values, 0 missing
V49: numeric, 313 unique values, 0 missing
V51: numeric, 229 unique values, 0 missing
V54: numeric, 160 unique values, 0 missing
V55: numeric, 174 unique values, 0 missing
V56: numeric, 161 unique values, 0 missing
V59: numeric, 114 unique values, 0 missing
V60: numeric, 115 unique values, 0 missing
V62: numeric, 92 unique values, 0 missing
V63: numeric, 87 unique values, 0 missing
V66: numeric, 63 unique values, 0 missing
V67: numeric, 53 unique values, 0 missing
V72: numeric, 719 unique values, 0 missing
V73: numeric, 793 unique values, 0 missing
V74: numeric, 831 unique values, 0 missing
V77: numeric, 744 unique values, 0 missing
V78: numeric, 649 unique values, 0 missing
V80: numeric, 497 unique values, 0 missing
V83: numeric, 273 unique values, 0 missing
V84: numeric, 184 unique values, 0 missing
V86: numeric, 1868 unique values, 0 missing
V87: numeric, 1853 unique values, 0 missing
V88: numeric, 1868 unique values, 0 missing
V89: numeric, 1883 unique values, 0 missing
V91: numeric, 1870 unique values, 0 missing
V92: numeric, 1873 unique values, 0 missing
V96: numeric, 1617 unique values, 0 missing
V97: numeric, 1527 unique values, 0 missing
V98: numeric, 1459 unique values, 0 missing
V99: numeric, 1486 unique values, 0 missing
V100: numeric, 1459 unique values, 0 missing
V104: numeric, 1728 unique values, 0 missing
V106: numeric, 1875 unique values, 0 missing
V108: numeric, 1875 unique values, 0 missing
V111: numeric, 313 unique values, 0 missing
V112: numeric, 305 unique values, 0 missing
V113: numeric, 305 unique values, 0 missing
V115: numeric, 300 unique values, 0 missing
V117: numeric, 284 unique values, 0 missing
V118: numeric, 288 unique values, 0 missing
V121: numeric, 281 unique values, 0 missing
V122: numeric, 280 unique values, 0 missing
V124: numeric, 272 unique values, 0 missing
V125: numeric, 286 unique values, 0 missing
V126: numeric, 379 unique values, 0 missing
V129: numeric, 282 unique values, 0 missing
V130: numeric, 282 unique values, 0 missing
V131: numeric, 276 unique values, 0 missing
V133: numeric, 297 unique values, 0 missing
V135: numeric, 299 unique values, 0 missing
V136: numeric, 300 unique values, 0 missing
V137: numeric, 318 unique values, 0 missing
V138: numeric, 332 unique values, 0 missing
V139: numeric, 352 unique values, 0 missing
V140: numeric, 354 unique values, 0 missing
V141: numeric, 368 unique values, 0 missing
V143: numeric, 390 unique values, 0 missing
V145: numeric, 444 unique values, 0 missing
V146: numeric, 395 unique values, 0 missing
V148: numeric, 350 unique values, 0 missing
V150: numeric, 340 unique values, 0 missing
V157: numeric, 284 unique values, 0 missing
V158: numeric, 280 unique values, 0 missing
V161: numeric, 281 unique values, 0 missing
V163: numeric, 343 unique values, 0 missing
V164: numeric, 285 unique values, 0 missing
V167: numeric, 278 unique values, 0 missing
V169: numeric, 277 unique values, 0 missing
V170: numeric, 367 unique values, 0 missing
V171: numeric, 286 unique values, 0 missing
V172: numeric, 279 unique values, 0 missing
V173: numeric, 318 unique values, 0 missing
V175: numeric, 302 unique values, 0 missing
V176: numeric, 303 unique values, 0 missing
V177: numeric, 314 unique values, 0 missing
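
Several of the features listed above (V3, V7, V8 and others) have a single unique value in this subsample, so they cannot help separate the classes. The sketch below shows one way such constant columns might be flagged from their unique-value counts. The dictionary holds only a handful of entries copied from the listing, and the helper name `constant_features` is hypothetical.

```python
# Unique-value counts copied from a few entries of the feature listing.
unique_counts = {
    "V3": 1,
    "V7": 1,
    "V10": 360,
    "V36": 1,
    "V38": 998,
    "V86": 1868,
}


def constant_features(counts):
    """Return the names of features with at most one unique value."""
    return [name for name, n_unique in counts.items() if n_unique <= 1]


constant = constant_features(unique_counts)
informative = [name for name in unique_counts if name not in constant]

print(constant)     # ['V3', 'V7', 'V36']
print(informative)  # ['V10', 'V38', 'V86']
```

Whether to drop such columns depends on the use case; they may be constant only in this 2000-row subsample.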

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.14
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: 47
Percentage of instances belonging to the least frequent class: 2.35
Number of instances belonging to the most frequent class: 439
Percentage of instances belonging to the most frequent class: 21.95
Number of attributes divided by the number of instances: 0.05
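
The percentage and ratio properties follow directly from the raw counts reported alongside them. The short check below recomputes them as a sanity test. The variable names and the `pct` helper are illustrative, and rounding to two decimals is an assumption chosen to match the displayed values.

```python
# Raw counts taken from the properties listed above.
n_rows = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
least_frequent = 47
most_frequent = 439


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(pct(n_nominal, n_attributes))     # 0.99
print(pct(n_numeric, n_attributes))     # 99.01
print(pct(least_frequent, n_rows))      # 2.35
print(pct(most_frequent, n_rows))       # 21.95
print(round(n_attributes / n_rows, 2))  # 0.05
```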

0 tasks
