volkert_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.


Subsampling of the dataset volkert (41166) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
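The row-subsampling step in the source code above relies on scikit-learn's `train_test_split` with `stratify=...`, so that each class keeps roughly its original share of the rows. The sketch below illustrates that idea using only the standard library. It is a stand-in written for illustration, not the code that produced this dataset and not scikit-learn's implementation; the function name `stratified_sample` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, n_samples, seed):
    """Pick n_samples row indices so each class keeps its share of the rows.

    Hypothetical helper for illustration; assumes n_samples <= len(labels).
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Ideal (fractional) number of rows to draw from each class.
    total = len(labels)
    quotas = {c: n_samples * len(rows) / total for c, rows in by_class.items()}

    # Round down, then give the leftover slots to the largest remainders.
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = n_samples - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1

    # Draw the allotted number of rows from each class without replacement.
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, counts[c]))
    return sorted(chosen)


# Toy labels with a 60/30/10 class split (made up for this example).
labels = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
picked = stratified_sample(labels, n_samples=100, seed=3)
picked_counts = Counter(labels[i] for i in picked)
print(picked_counts)
```

With these toy labels the sample keeps the 60/30/10 split, which is the property the stratified split is meant to preserve when the row count is reduced.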

101 features

class (target): nominal, 10 unique values, 0 missing
V1: numeric, 782 unique values, 0 missing
V4: numeric, 1 unique value, 0 missing
V5: numeric, 1 unique value, 0 missing
V6: numeric, 1 unique value, 0 missing
V8: numeric, 1 unique value, 0 missing
V9: numeric, 1 unique value, 0 missing
V10: numeric, 365 unique values, 0 missing
V12: numeric, 1 unique value, 0 missing
V14: numeric, 1 unique value, 0 missing
V15: numeric, 1 unique value, 0 missing
V16: numeric, 1 unique value, 0 missing
V18: numeric, 813 unique values, 0 missing
V19: numeric, 1 unique value, 0 missing
V20: numeric, 1 unique value, 0 missing
V26: numeric, 1 unique value, 0 missing
V31: numeric, 1 unique value, 0 missing
V32: numeric, 1 unique value, 0 missing
V33: numeric, 1 unique value, 0 missing
V35: numeric, 1 unique value, 0 missing
V36: numeric, 1 unique value, 0 missing
V37: numeric, 798 unique values, 0 missing
V38: numeric, 996 unique values, 0 missing
V40: numeric, 814 unique values, 0 missing
V41: numeric, 737 unique values, 0 missing
V45: numeric, 453 unique values, 0 missing
V46: numeric, 398 unique values, 0 missing
V52: numeric, 202 unique values, 0 missing
V54: numeric, 167 unique values, 0 missing
V57: numeric, 148 unique values, 0 missing
V58: numeric, 131 unique values, 0 missing
V63: numeric, 96 unique values, 0 missing
V64: numeric, 91 unique values, 0 missing
V66: numeric, 77 unique values, 0 missing
V68: numeric, 158 unique values, 0 missing
V69: numeric, 153 unique values, 0 missing
V71: numeric, 570 unique values, 0 missing
V72: numeric, 713 unique values, 0 missing
V73: numeric, 804 unique values, 0 missing
V74: numeric, 871 unique values, 0 missing
V76: numeric, 821 unique values, 0 missing
V77: numeric, 746 unique values, 0 missing
V78: numeric, 652 unique values, 0 missing
V82: numeric, 328 unique values, 0 missing
V83: numeric, 275 unique values, 0 missing
V84: numeric, 178 unique values, 0 missing
V87: numeric, 1862 unique values, 0 missing
V89: numeric, 1889 unique values, 0 missing
V92: numeric, 1881 unique values, 0 missing
V96: numeric, 1620 unique values, 0 missing
V97: numeric, 1548 unique values, 0 missing
V99: numeric, 1509 unique values, 0 missing
V101: numeric, 1728 unique values, 0 missing
V102: numeric, 1729 unique values, 0 missing
V104: numeric, 1729 unique values, 0 missing
V106: numeric, 1893 unique values, 0 missing
V107: numeric, 1888 unique values, 0 missing
V108: numeric, 1894 unique values, 0 missing
V114: numeric, 299 unique values, 0 missing
V115: numeric, 288 unique values, 0 missing
V116: numeric, 305 unique values, 0 missing
V118: numeric, 285 unique values, 0 missing
V121: numeric, 271 unique values, 0 missing
V130: numeric, 269 unique values, 0 missing
V131: numeric, 269 unique values, 0 missing
V133: numeric, 286 unique values, 0 missing
V134: numeric, 364 unique values, 0 missing
V135: numeric, 292 unique values, 0 missing
V136: numeric, 287 unique values, 0 missing
V137: numeric, 321 unique values, 0 missing
V138: numeric, 323 unique values, 0 missing
V139: numeric, 338 unique values, 0 missing
V140: numeric, 324 unique values, 0 missing
V142: numeric, 383 unique values, 0 missing
V143: numeric, 375 unique values, 0 missing
V145: numeric, 445 unique values, 0 missing
V147: numeric, 374 unique values, 0 missing
V149: numeric, 339 unique values, 0 missing
V151: numeric, 315 unique values, 0 missing
V152: numeric, 317 unique values, 0 missing
V153: numeric, 294 unique values, 0 missing
V154: numeric, 288 unique values, 0 missing
V155: numeric, 365 unique values, 0 missing
V156: numeric, 276 unique values, 0 missing
V157: numeric, 272 unique values, 0 missing
V159: numeric, 263 unique values, 0 missing
V162: numeric, 371 unique values, 0 missing
V163: numeric, 336 unique values, 0 missing
V164: numeric, 270 unique values, 0 missing
V165: numeric, 270 unique values, 0 missing
V166: numeric, 258 unique values, 0 missing
V167: numeric, 273 unique values, 0 missing
V168: numeric, 262 unique values, 0 missing
V169: numeric, 276 unique values, 0 missing
V171: numeric, 275 unique values, 0 missing
V174: numeric, 287 unique values, 0 missing
V175: numeric, 292 unique values, 0 missing
V176: numeric, 301 unique values, 0 missing
V177: numeric, 304 unique values, 0 missing
V178: numeric, 312 unique values, 0 missing
V179: numeric, 297 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 10
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.16
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: 47
Percentage of instances belonging to the least frequent class: 2.35
Number of instances belonging to the most frequent class: 439
Percentage of instances belonging to the most frequent class: 21.95
Number of attributes divided by the number of instances: 0.05
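
Several of the properties above are simple ratios of the listed counts. As a worked check, the sketch below recomputes them from the raw numbers on this page, assuming two-decimal rounding; the helper name `percent` and the variable names are illustrative and are not part of any OpenML API.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Raw counts taken from the property list on this page.
n_rows = 2000
n_numeric = 100
n_nominal = 1
n_attributes = n_numeric + n_nominal  # 101, target column included
minority_rows = 47
majority_rows = 439

derived = {
    "pct_nominal": percent(n_nominal, n_attributes),
    "pct_numeric": percent(n_numeric, n_attributes),
    "pct_minority_class": percent(minority_rows, n_rows),
    "pct_majority_class": percent(majority_rows, n_rows),
    "dimensionality": round(n_attributes / n_rows, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values match the listed ones (0.99, 99.01, 2.35, 21.95 and 0.05), and the attribute count of 101 covers the 100 numeric features plus the nominal target.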

0 tasks
