guillermo_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Visibility: public | Uploaded 17-11-2022 by David Wilson
Subsampling of the dataset guillermo (41159) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
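The row-subsampling step of the source code can be tried in isolation. The following is a minimal sketch, not the uploader's code: the helper name `stratified_rows` and the synthetic 60/40 demo data are assumptions made for illustration. It uses the same `train_test_split` call pattern, keeping the "test" part as the subsample so that class proportions are preserved.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_rows(x: pd.DataFrame, y: pd.Series, nrows_max: int, seed: int):
    """Keep at most nrows_max rows while preserving class proportions.

    Hypothetical helper: it mirrors only the row-subsampling branch of the
    source code, not the class or column selection.
    """
    if len(x) <= nrows_max:
        return x, y
    data = pd.concat((x, y), axis="columns")
    # The "test" split of size nrows_max is used as the subsample.
    _, subset = train_test_split(
        data,
        test_size=nrows_max,
        stratify=data[y.name],
        shuffle=True,
        random_state=seed,
    )
    return subset.drop(columns=y.name), subset[y.name]


# Synthetic demo data (an assumption, not the guillermo dataset):
# 10,000 rows, 5 numeric columns, classes split 60% / 40%.
rng = np.random.default_rng(0)
x_demo = pd.DataFrame(
    rng.normal(size=(10_000, 5)), columns=[f"f{i}" for i in range(5)]
)
y_demo = pd.Series(np.where(np.arange(10_000) < 6_000, "a", "b"), name="class")

x_sub, y_sub = stratified_rows(x_demo, y_demo, nrows_max=2_000, seed=2)
print(len(x_sub))
print(y_sub.value_counts(normalize=True))
```

Because the split is stratified, the class shares in the subsample should stay close to those of the full data.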

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| V166 | numeric | 845 | 0 |
| V169 | numeric | 812 | 0 |
| V233 | numeric | 545 | 0 |
| V245 | numeric | 832 | 0 |
| V328 | numeric | 513 | 0 |
| V387 | numeric | 485 | 0 |
| V415 | numeric | 286 | 0 |
| V433 | numeric | 713 | 0 |
| V445 | numeric | 870 | 0 |
| V457 | numeric | 330 | 0 |
| V459 | numeric | 711 | 0 |
| V634 | numeric | 589 | 0 |
| V792 | numeric | 491 | 0 |
| V794 | numeric | 565 | 0 |
| V849 | numeric | 618 | 0 |
| V860 | numeric | 425 | 0 |
| V911 | numeric | 393 | 0 |
| V929 | numeric | 614 | 0 |
| V951 | numeric | 859 | 0 |
| V1098 | numeric | 396 | 0 |
| V1099 | numeric | 744 | 0 |
| V1111 | numeric | 418 | 0 |
| V1159 | numeric | 566 | 0 |
| V1254 | numeric | 814 | 0 |
| V1290 | numeric | 790 | 0 |
| V1351 | numeric | 437 | 0 |
| V1409 | numeric | 467 | 0 |
| V1411 | numeric | 479 | 0 |
| V1454 | numeric | 579 | 0 |
| V1467 | numeric | 617 | 0 |
| V1610 | numeric | 683 | 0 |
| V1659 | numeric | 859 | 0 |
| V1736 | numeric | 816 | 0 |
| V1739 | numeric | 852 | 0 |
| V1787 | numeric | 297 | 0 |
| V1827 | numeric | 121 | 0 |
| V1874 | numeric | 272 | 0 |
| V1885 | numeric | 728 | 0 |
| V1897 | numeric | 582 | 0 |
| V1904 | numeric | 860 | 0 |
| V1913 | numeric | 183 | 0 |
| V1937 | numeric | 349 | 0 |
| V2002 | numeric | 282 | 0 |
| V2016 | numeric | 196 | 0 |
| V2030 | numeric | 871 | 0 |
| V2080 | numeric | 813 | 0 |
| V2129 | numeric | 598 | 0 |
| V2134 | numeric | 804 | 0 |
| V2153 | numeric | 581 | 0 |
| V2167 | numeric | 580 | 0 |
| V2206 | numeric | 704 | 0 |
| V2226 | numeric | 575 | 0 |
| V2243 | numeric | 827 | 0 |
| V2352 | numeric | 699 | 0 |
| V2372 | numeric | 449 | 0 |
| V2421 | numeric | 528 | 0 |
| V2458 | numeric | 787 | 0 |
| V2525 | numeric | 828 | 0 |
| V2534 | numeric | 489 | 0 |
| V2535 | numeric | 230 | 0 |
| V2536 | numeric | 883 | 0 |
| V2627 | numeric | 527 | 0 |
| V2678 | numeric | 917 | 0 |
| V2714 | numeric | 740 | 0 |
| V2750 | numeric | 900 | 0 |
| V2772 | numeric | 590 | 0 |
| V2828 | numeric | 585 | 0 |
| V2867 | numeric | 676 | 0 |
| V2891 | numeric | 1 | 0 |
| V2898 | numeric | 840 | 0 |
| V2937 | numeric | 764 | 0 |
| V2950 | numeric | 787 | 0 |
| V2969 | numeric | 551 | 0 |
| V2979 | numeric | 430 | 0 |
| V3066 | numeric | 1038 | 0 |
| V3163 | numeric | 809 | 0 |
| V3292 | numeric | 309 | 0 |
| V3299 | numeric | 523 | 0 |
| V3352 | numeric | 492 | 0 |
| V3422 | numeric | 780 | 0 |
| V3516 | numeric | 605 | 0 |
| V3552 | numeric | 498 | 0 |
| V3622 | numeric | 934 | 0 |
| V3673 | numeric | 647 | 0 |
| V3685 | numeric | 822 | 0 |
| V3707 | numeric | 373 | 0 |
| V3722 | numeric | 308 | 0 |
| V3768 | numeric | 354 | 0 |
| V3781 | numeric | 916 | 0 |
| V3818 | numeric | 849 | 0 |
| V3854 | numeric | 299 | 0 |
| V3927 | numeric | 865 | 0 |
| V3954 | numeric | 805 | 0 |
| V3994 | numeric | 746 | 0 |
| V4078 | numeric | 711 | 0 |
| V4093 | numeric | 804 | 0 |
| V4146 | numeric | 1231 | 0 |
| V4179 | numeric | 1234 | 0 |
| V4207 | numeric | 1231 | 0 |
| V4292 | numeric | 1229 | 0 |
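A per-feature summary like the one listed above (type, number of unique values, number of missing values) can be recomputed from any pandas DataFrame. The sketch below is illustrative only: `feature_summary` is a hypothetical helper and the toy frame is made up, not the actual data. It also flags constant columns, which is relevant here because V2891 is listed with a single unique value.

```python
import pandas as pd


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return type, unique-value count and missing count per column."""
    types = pd.Series(
        [
            "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "nominal"
            for col in df.columns
        ],
        index=df.columns,
    )
    return pd.DataFrame(
        {
            "type": types,
            "unique_values": df.nunique(),  # NaN is not counted as a value
            "missing": df.isna().sum(),
        }
    )


# Toy data (an assumption): one nominal target, one varying numeric
# column and one constant numeric column.
toy = pd.DataFrame(
    {
        "class": pd.Categorical(["0", "1", "0", "1"]),
        "V_a": [0.1, 0.2, 0.3, 0.4],
        "V_const": [5.0, 5.0, 5.0, 5.0],
    }
)

summary = feature_summary(toy)
# Columns with at most one distinct value carry no information for a model.
constant_features = list(summary.index[summary["unique_values"] <= 1])
print(summary)
print(constant_features)
```

Dropping such constant columns before modelling is a common preprocessing choice, though whether to do so depends on the benchmark protocol.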

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of nominal attributes. | 0.99 |
| Average class difference between consecutive instances. | 0.52 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of missing values. | 0 |
| Percentage of instances having missing values. | 0 |
| Percentage of binary attributes. | 0.99 |
| Number of binary attributes. | 1 |
| Number of instances belonging to the least frequent class. | 800 |
| Percentage of instances belonging to the least frequent class. | 40 |
| Number of instances belonging to the most frequent class. | 1200 |
| Percentage of instances belonging to the most frequent class. | 60 |
| Number of attributes divided by the number of instances. | 0.05 |
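Several of the derived properties follow directly from the raw counts listed above. As a quick sanity check, the sketch below recomputes them from those counts. The dictionary keys are made-up labels, and rounding to two decimals is an assumption chosen to match the displayed values.

```python
# Raw counts taken from the property list.
n_instances = 2000
n_numeric = 100
n_nominal = 1
minority_class_size = 800
majority_class_size = 1200

n_attributes = n_numeric + n_nominal  # 101, includes the target column

derived = {
    "pct_nominal_attributes": round(100 * n_nominal / n_attributes, 2),
    "pct_numeric_attributes": round(100 * n_numeric / n_attributes, 2),
    "pct_minority_class": round(100 * minority_class_size / n_instances, 2),
    "pct_majority_class": round(100 * majority_class_size / n_instances, 2),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class sizes add up to the 2000 instances, consistent with a binary target.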

0 tasks