guillermo_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.


Subsampling of the dataset guillermo (41159) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
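The row-subsampling step in the source code above takes the "test" side of a stratified `train_test_split` as the subsample. The following is a minimal sketch of that idea on toy data, not the original pipeline: the frame contents, the column names `V1` to `V3`, the 60/40 class ratio, and the reduced `nrows_max = 200` are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

seed = 4
nrows_max = 200  # toy value for illustration; the dataset above used 2000

# Hypothetical toy data: 1000 rows, three numeric features, 60/40 class ratio.
rng = np.random.default_rng(seed)
n_rows = 1000
x = pd.DataFrame(rng.normal(size=(n_rows, 3)), columns=["V1", "V2", "V3"])
y = pd.Series(["0"] * 600 + ["1"] * 400, name="class")

# Keep the "test" part of a stratified split as the subsample, so the
# class proportions of the full data are preserved in the smaller set.
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data[y.name],
    shuffle=True,
    random_state=seed,
)

counts = subset["class"].value_counts()
print(len(subset), counts.to_dict())
```

Passing an integer `test_size` asks for an absolute number of rows rather than a fraction, which is how the subsample size is capped at `nrows_max`.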

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| V115 | numeric | 550 | 0 |
| V150 | numeric | 335 | 0 |
| V256 | numeric | 809 | 0 |
| V338 | numeric | 698 | 0 |
| V340 | numeric | 464 | 0 |
| V383 | numeric | 540 | 0 |
| V493 | numeric | 620 | 0 |
| V549 | numeric | 194 | 0 |
| V567 | numeric | 720 | 0 |
| V589 | numeric | 588 | 0 |
| V702 | numeric | 382 | 0 |
| V733 | numeric | 581 | 0 |
| V736 | numeric | 990 | 0 |
| V753 | numeric | 681 | 0 |
| V770 | numeric | 833 | 0 |
| V817 | numeric | 327 | 0 |
| V855 | numeric | 915 | 0 |
| V882 | numeric | 299 | 0 |
| V926 | numeric | 481 | 0 |
| V928 | numeric | 614 | 0 |
| V950 | numeric | 601 | 0 |
| V953 | numeric | 431 | 0 |
| V1143 | numeric | 728 | 0 |
| V1192 | numeric | 730 | 0 |
| V1288 | numeric | 414 | 0 |
| V1427 | numeric | 691 | 0 |
| V1489 | numeric | 755 | 0 |
| V1548 | numeric | 783 | 0 |
| V1551 | numeric | 799 | 0 |
| V1564 | numeric | 818 | 0 |
| V1572 | numeric | 243 | 0 |
| V1585 | numeric | 600 | 0 |
| V1643 | numeric | 633 | 0 |
| V1731 | numeric | 474 | 0 |
| V1818 | numeric | 681 | 0 |
| V1908 | numeric | 188 | 0 |
| V1966 | numeric | 156 | 0 |
| V1971 | numeric | 877 | 0 |
| V2014 | numeric | 654 | 0 |
| V2035 | numeric | 815 | 0 |
| V2053 | numeric | 438 | 0 |
| V2097 | numeric | 684 | 0 |
| V2100 | numeric | 666 | 0 |
| V2114 | numeric | 643 | 0 |
| V2117 | numeric | 500 | 0 |
| V2126 | numeric | 631 | 0 |
| V2127 | numeric | 166 | 0 |
| V2148 | numeric | 731 | 0 |
| V2213 | numeric | 586 | 0 |
| V2224 | numeric | 780 | 0 |
| V2278 | numeric | 791 | 0 |
| V2294 | numeric | 418 | 0 |
| V2304 | numeric | 725 | 0 |
| V2427 | numeric | 610 | 0 |
| V2447 | numeric | 927 | 0 |
| V2453 | numeric | 771 | 0 |
| V2482 | numeric | 317 | 0 |
| V2495 | numeric | 871 | 0 |
| V2555 | numeric | 478 | 0 |
| V2580 | numeric | 621 | 0 |
| V2639 | numeric | 789 | 0 |
| V2732 | numeric | 712 | 0 |
| V2824 | numeric | 1 | 0 |
| V2833 | numeric | 719 | 0 |
| V2848 | numeric | 768 | 0 |
| V2888 | numeric | 951 | 0 |
| V2963 | numeric | 341 | 0 |
| V2988 | numeric | 582 | 0 |
| V3049 | numeric | 732 | 0 |
| V3099 | numeric | 365 | 0 |
| V3299 | numeric | 524 | 0 |
| V3333 | numeric | 1 | 0 |
| V3377 | numeric | 509 | 0 |
| V3452 | numeric | 812 | 0 |
| V3464 | numeric | 834 | 0 |
| V3636 | numeric | 535 | 0 |
| V3674 | numeric | 395 | 0 |
| V3701 | numeric | 731 | 0 |
| V3760 | numeric | 325 | 0 |
| V3789 | numeric | 579 | 0 |
| V3797 | numeric | 626 | 0 |
| V3806 | numeric | 578 | 0 |
| V3831 | numeric | 393 | 0 |
| V3931 | numeric | 579 | 0 |
| V3932 | numeric | 611 | 0 |
| V3951 | numeric | 742 | 0 |
| V3959 | numeric | 935 | 0 |
| V3969 | numeric | 615 | 0 |
| V3998 | numeric | 349 | 0 |
| V4003 | numeric | 380 | 0 |
| V4075 | numeric | 754 | 0 |
| V4076 | numeric | 753 | 0 |
| V4079 | numeric | 686 | 0 |
| V4099 | numeric | 1233 | 0 |
| V4103 | numeric | 1234 | 0 |
| V4132 | numeric | 1230 | 0 |
| V4160 | numeric | 1225 | 0 |
| V4175 | numeric | 1232 | 0 |
| V4253 | numeric | 1231 | 0 |
| V4272 | numeric | 1235 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of nominal attributes. | 0.99 |
| Average class difference between consecutive instances. | 0.53 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of missing values. | 0 |
| Percentage of instances having missing values. | 0 |
| Percentage of binary attributes. | 0.99 |
| Number of binary attributes. | 1 |
| Number of instances belonging to the least frequent class. | 800 |
| Percentage of instances belonging to the least frequent class. | 40 |
| Number of instances belonging to the most frequent class. | 1200 |
| Percentage of instances belonging to the most frequent class. | 60 |
| Number of attributes divided by the number of instances. | 0.05 |
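
Several of the properties above are simple ratios of the raw counts. As a quick consistency check, the sketch below recomputes them from the listed counts. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays values.

```python
# Raw counts as listed in the properties above.
n_instances = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
minority_class_size = 800
majority_class_size = 1200

# Derived ratios, rounded to two decimals (assumed display precision).
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_minority = round(100 * minority_class_size / n_instances, 2)
pct_majority = round(100 * majority_class_size / n_instances, 2)
dimensionality = round(n_attributes / n_instances, 2)

print(pct_nominal, pct_numeric, pct_minority, pct_majority, dimensionality)
```

The two class sizes sum to the 2000 instances, and the numeric and nominal attribute counts sum to the 101 attributes, so the listed counts are internally consistent.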

0 tasks
