volkert_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.
0 likes, downloaded by 0 people, 0 total downloads, 0 issues, 0 downvotes.

Subsampling of the dataset volkert (41166) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the author's `Dataset` class (the class itself is not shown here)
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> "Dataset":
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by their frequency
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        # Draw from vcs.index so each class lines up with its own probability
        selected_classes = rng.choice(
            vcs.index,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x.loc[mask]
        y = y.loc[mask]

    # Uniformly sample columns (without replacement) if there are too many
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify by the target so class proportions are preserved
        # (the `stratified` argument is not consulted: rows are always stratified)
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical mask aligned with the selected columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
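The row-subsampling step keeps class proportions by passing `stratify=` to scikit-learn's `train_test_split`. As a rough illustration of what proportional (stratified) sampling means, here is a minimal standard-library sketch. `stratified_subsample` is a hypothetical helper, and its largest-remainder rounding of per-class quotas is an assumption: it approximates, but does not reproduce, scikit-learn's own allocation, so it will not regenerate this exact dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Pick n_rows row indices so each class keeps (roughly) its share of the data.

    Hypothetical helper: per-class quotas use largest-remainder rounding, an
    assumption that approximates scikit-learn's stratified split but is not it.
    """
    if not 0 <= n_rows <= len(labels):
        raise ValueError("n_rows must be between 0 and len(labels)")
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    # Ideal (fractional) quota per class, proportional to its frequency
    total = len(labels)
    exact = {c: len(idx) * n_rows / total for c, idx in by_class.items()}
    quotas = {c: int(q) for c, q in exact.items()}

    # Give the slots lost to rounding down to the classes with the largest remainders
    leftover = n_rows - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - quotas[c], reverse=True)
    for c in by_remainder[:leftover]:
        quotas[c] += 1

    # Sample each class independently, without replacement
    chosen = []
    for c, idx in by_class.items():
        chosen.extend(rng.sample(idx, quotas[c]))
    return sorted(chosen)


labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
idx = stratified_subsample(labels, 20, seed=2)
print(Counter(labels[i] for i in idx))  # Counter({'a': 12, 'b': 6, 'c': 2})
```

With a 60/30/10 split and `n_rows=20`, the quotas come out exactly 12/6/2, so the subsample has the same class shares as the input; only which rows are drawn depends on the seed.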

101 features

| Name | Type | Unique values | Missing values |
|---|---|---|---|
| class (target) | nominal | 10 | 0 |
| V5 | numeric | 1 | 0 |
| V6 | numeric | 1 | 0 |
| V7 | numeric | 1 | 0 |
| V9 | numeric | 1 | 0 |
| V10 | numeric | 369 | 0 |
| V13 | numeric | 1 | 0 |
| V15 | numeric | 1 | 0 |
| V16 | numeric | 1 | 0 |
| V17 | numeric | 1 | 0 |
| V18 | numeric | 819 | 0 |
| V20 | numeric | 1 | 0 |
| V22 | numeric | 1 | 0 |
| V23 | numeric | 1 | 0 |
| V25 | numeric | 1 | 0 |
| V26 | numeric | 1 | 0 |
| V27 | numeric | 1 | 0 |
| V29 | numeric | 1 | 0 |
| V30 | numeric | 1 | 0 |
| V31 | numeric | 1 | 0 |
| V34 | numeric | 1 | 0 |
| V36 | numeric | 1 | 0 |
| V39 | numeric | 905 | 0 |
| V40 | numeric | 797 | 0 |
| V41 | numeric | 716 | 0 |
| V42 | numeric | 634 | 0 |
| V43 | numeric | 562 | 0 |
| V45 | numeric | 451 | 0 |
| V46 | numeric | 391 | 0 |
| V47 | numeric | 360 | 0 |
| V55 | numeric | 168 | 0 |
| V58 | numeric | 138 | 0 |
| V59 | numeric | 123 | 0 |
| V62 | numeric | 88 | 0 |
| V64 | numeric | 77 | 0 |
| V65 | numeric | 74 | 0 |
| V66 | numeric | 66 | 0 |
| V67 | numeric | 44 | 0 |
| V68 | numeric | 165 | 0 |
| V69 | numeric | 188 | 0 |
| V71 | numeric | 557 | 0 |
| V72 | numeric | 713 | 0 |
| V73 | numeric | 808 | 0 |
| V74 | numeric | 849 | 0 |
| V75 | numeric | 842 | 0 |
| V76 | numeric | 822 | 0 |
| V77 | numeric | 748 | 0 |
| V79 | numeric | 570 | 0 |
| V80 | numeric | 505 | 0 |
| V84 | numeric | 185 | 0 |
| V87 | numeric | 1844 | 0 |
| V89 | numeric | 1856 | 0 |
| V90 | numeric | 1852 | 0 |
| V91 | numeric | 1870 | 0 |
| V92 | numeric | 1852 | 0 |
| V93 | numeric | 1615 | 0 |
| V95 | numeric | 1573 | 0 |
| V96 | numeric | 1615 | 0 |
| V97 | numeric | 1521 | 0 |
| V100 | numeric | 1443 | 0 |
| V101 | numeric | 1726 | 0 |
| V103 | numeric | 1672 | 0 |
| V104 | numeric | 1688 | 0 |
| V106 | numeric | 1871 | 0 |
| V108 | numeric | 1871 | 0 |
| V109 | numeric | 444 | 0 |
| V111 | numeric | 309 | 0 |
| V112 | numeric | 300 | 0 |
| V113 | numeric | 293 | 0 |
| V118 | numeric | 276 | 0 |
| V119 | numeric | 368 | 0 |
| V122 | numeric | 282 | 0 |
| V126 | numeric | 387 | 0 |
| V127 | numeric | 355 | 0 |
| V128 | numeric | 279 | 0 |
| V133 | numeric | 304 | 0 |
| V134 | numeric | 386 | 0 |
| V136 | numeric | 299 | 0 |
| V137 | numeric | 327 | 0 |
| V138 | numeric | 332 | 0 |
| V140 | numeric | 348 | 0 |
| V144 | numeric | 504 | 0 |
| V145 | numeric | 470 | 0 |
| V146 | numeric | 395 | 0 |
| V148 | numeric | 357 | 0 |
| V150 | numeric | 340 | 0 |
| V154 | numeric | 299 | 0 |
| V155 | numeric | 377 | 0 |
| V156 | numeric | 289 | 0 |
| V157 | numeric | 280 | 0 |
| V160 | numeric | 276 | 0 |
| V161 | numeric | 282 | 0 |
| V163 | numeric | 363 | 0 |
| V167 | numeric | 277 | 0 |
| V168 | numeric | 270 | 0 |
| V170 | numeric | 365 | 0 |
| V173 | numeric | 305 | 0 |
| V174 | numeric | 287 | 0 |
| V175 | numeric | 306 | 0 |
| V177 | numeric | 293 | 0 |
| V178 | numeric | 310 | 0 |
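Nineteen of the numeric features above (V5, V6, V7, V9, V13, V15, V16, V17, V20, V22, V23, V25, V26, V27, V29, V30, V31, V34 and V36) have only one unique value across the 2000 rows, so they carry no information for a classifier. A minimal standard-library sketch of how such columns could be detected and dropped before modelling follows; `constant_columns` and `drop_columns` are hypothetical helper names, and the toy values are made up for illustration. With pandas, the same check is `df.nunique() == 1`.

```python
def constant_columns(rows, columns):
    """Return the names of columns that take a single distinct value in every row."""
    distinct = {name: set() for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            distinct[name].add(value)
    return [name for name in columns if len(distinct[name]) <= 1]


def drop_columns(rows, columns, to_drop):
    """Return (kept column names, rows restricted to those columns)."""
    drop = set(to_drop)
    keep = [i for i, name in enumerate(columns) if name not in drop]
    return [columns[i] for i in keep], [[row[i] for i in keep] for row in rows]


# Toy data in the shape of this dataset: V5 and V6 are constant, V10 is not
columns = ["V5", "V10", "V6"]
rows = [[0.0, 1.5, 3.0], [0.0, 2.5, 3.0], [0.0, 1.5, 3.0]]
const = constant_columns(rows, columns)
print(const)  # ['V5', 'V6']
kept_columns, kept_rows = drop_columns(rows, columns, const)
print(kept_columns)  # ['V10']
```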

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 10 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 100 |
| Number of nominal attributes | 1 |
| Percentage of nominal attributes | 0.99 |
| Average class difference between consecutive instances | 0.15 |
| Percentage of numeric attributes | 99.01 |
| Percentage of missing values | 0 |
| Percentage of instances having missing values | 0 |
| Percentage of binary attributes | 0 |
| Number of binary attributes | 0 |
| Number of instances belonging to the least frequent class | 47 |
| Percentage of instances belonging to the least frequent class | 2.35 |
| Number of instances belonging to the most frequent class | 439 |
| Percentage of instances belonging to the most frequent class | 21.95 |
| Number of attributes divided by the number of instances | 0.05 |
Number of attributes divided by the number of instances.

0 tasks
