guillermo_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.
Subsampling of the dataset guillermo (41159) with seed=0 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
# Imports this excerpt relies on (the method belongs to a Dataset class).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: the filter below uses `classes`, not `selected_classes`,
        # so as written no class is dropped here.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
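The row-subsampling step in the source code relies on scikit-learn's `train_test_split` with `stratify=...`. As a rough, dependency-free illustration of what stratification means, the sketch below draws a class-proportional sample using only the standard library. It is not the code that produced this dataset: the function name `stratified_subsample`, the toy labels `"a"`/`"b"`, and the toy population size are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional sample of size n_rows."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    # Largest-remainder allocation: floor each class quota, then give the
    # leftover rows to the classes with the largest fractional parts.
    exact = {c: n_rows * len(rows) / total for c, rows in by_class.items()}
    quota = {c: int(q) for c, q in exact.items()}
    leftover = n_rows - sum(quota.values())
    ranked = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in ranked[:leftover]:
        quota[c] += 1

    # Sample without replacement inside each class.
    chosen = []
    for c, rows in by_class.items():
        chosen.extend(rng.sample(rows, quota[c]))
    return sorted(chosen)


# Toy labels with a 60/40 class split (hypothetical data, not guillermo itself).
labels = ["a"] * 6000 + ["b"] * 4000
subset = stratified_subsample(labels, n_rows=2000, seed=0)
counts = Counter(labels[i] for i in subset)
print(counts["a"], counts["b"])  # 1200 800
```

With a 60/40 population, a stratified draw of 2000 rows keeps the same split, which matches the 1200/800 class counts listed in the properties of this dataset.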

101 features

class (target): nominal, 2 unique values, 0 missing
V12: numeric, 344 unique values, 0 missing
V23: numeric, 721 unique values, 0 missing
V36: numeric, 385 unique values, 0 missing
V70: numeric, 192 unique values, 0 missing
V94: numeric, 591 unique values, 0 missing
V121: numeric, 854 unique values, 0 missing
V142: numeric, 641 unique values, 0 missing
V173: numeric, 539 unique values, 0 missing
V209: numeric, 657 unique values, 0 missing
V312: numeric, 760 unique values, 0 missing
V317: numeric, 690 unique values, 0 missing
V341: numeric, 584 unique values, 0 missing
V361: numeric, 703 unique values, 0 missing
V378: numeric, 650 unique values, 0 missing
V527: numeric, 848 unique values, 0 missing
V576: numeric, 393 unique values, 0 missing
V737: numeric, 718 unique values, 0 missing
V743: numeric, 555 unique values, 0 missing
V975: numeric, 784 unique values, 0 missing
V1090: numeric, 834 unique values, 0 missing
V1093: numeric, 800 unique values, 0 missing
V1132: numeric, 735 unique values, 0 missing
V1134: numeric, 626 unique values, 0 missing
V1170: numeric, 741 unique values, 0 missing
V1269: numeric, 405 unique values, 0 missing
V1294: numeric, 651 unique values, 0 missing
V1325: numeric, 133 unique values, 0 missing
V1378: numeric, 692 unique values, 0 missing
V1407: numeric, 367 unique values, 0 missing
V1448: numeric, 411 unique values, 0 missing
V1530: numeric, 784 unique values, 0 missing
V1602: numeric, 761 unique values, 0 missing
V1615: numeric, 683 unique values, 0 missing
V1620: numeric, 866 unique values, 0 missing
V1630: numeric, 275 unique values, 0 missing
V1657: numeric, 721 unique values, 0 missing
V1664: numeric, 495 unique values, 0 missing
V1679: numeric, 729 unique values, 0 missing
V1709: numeric, 526 unique values, 0 missing
V1791: numeric, 434 unique values, 0 missing
V1806: numeric, 796 unique values, 0 missing
V1959: numeric, 930 unique values, 0 missing
V2038: numeric, 663 unique values, 0 missing
V2075: numeric, 223 unique values, 0 missing
V2120: numeric, 774 unique values, 0 missing
V2147: numeric, 408 unique values, 0 missing
V2161: numeric, 543 unique values, 0 missing
V2231: numeric, 758 unique values, 0 missing
V2242: numeric, 803 unique values, 0 missing
V2272: numeric, 673 unique values, 0 missing
V2291: numeric, 683 unique values, 0 missing
V2292: numeric, 667 unique values, 0 missing
V2341: numeric, 623 unique values, 0 missing
V2361: numeric, 752 unique values, 0 missing
V2446: numeric, 638 unique values, 0 missing
V2468: numeric, 589 unique values, 0 missing
V2545: numeric, 743 unique values, 0 missing
V2554: numeric, 367 unique values, 0 missing
V2613: numeric, 565 unique values, 0 missing
V2664: numeric, 771 unique values, 0 missing
V2674: numeric, 257 unique values, 0 missing
V2675: numeric, 363 unique values, 0 missing
V2733: numeric, 614 unique values, 0 missing
V2747: numeric, 394 unique values, 0 missing
V2769: numeric, 828 unique values, 0 missing
V2831: numeric, 632 unique values, 0 missing
V2845: numeric, 767 unique values, 0 missing
V2878: numeric, 562 unique values, 0 missing
V2917: numeric, 363 unique values, 0 missing
V2932: numeric, 783 unique values, 0 missing
V2999: numeric, 709 unique values, 0 missing
V3065: numeric, 840 unique values, 0 missing
V3071: numeric, 24 unique values, 0 missing
V3073: numeric, 321 unique values, 0 missing
V3077: numeric, 498 unique values, 0 missing
V3081: numeric, 852 unique values, 0 missing
V3084: numeric, 449 unique values, 0 missing
V3232: numeric, 536 unique values, 0 missing
V3245: numeric, 679 unique values, 0 missing
V3260: numeric, 515 unique values, 0 missing
V3421: numeric, 764 unique values, 0 missing
V3423: numeric, 334 unique values, 0 missing
V3442: numeric, 517 unique values, 0 missing
V3571: numeric, 590 unique values, 0 missing
V3578: numeric, 441 unique values, 0 missing
V3579: numeric, 825 unique values, 0 missing
V3606: numeric, 901 unique values, 0 missing
V3620: numeric, 502 unique values, 0 missing
V3652: numeric, 518 unique values, 0 missing
V3730: numeric, 767 unique values, 0 missing
V3800: numeric, 868 unique values, 0 missing
V3818: numeric, 798 unique values, 0 missing
V3841: numeric, 840 unique values, 0 missing
V3943: numeric, 646 unique values, 0 missing
V3993: numeric, 449 unique values, 0 missing
V4043: numeric, 521 unique values, 0 missing
V4088: numeric, 648 unique values, 0 missing
V4171: numeric, 1222 unique values, 0 missing
V4239: numeric, 1221 unique values, 0 missing
V4296: numeric, 1217 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.52
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0.99
Number of binary attributes: 1
Number of instances belonging to the least frequent class: 800
Percentage of instances belonging to the least frequent class: 40
Number of instances belonging to the most frequent class: 1200
Percentage of instances belonging to the most frequent class: 60
Number of attributes divided by the number of instances: 0.05
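The derived properties above follow directly from the raw counts. The short check below recomputes them, assuming (as the listed values suggest) that percentages and ratios are rounded to two decimals; the variable names are illustrative and not OpenML's own quality identifiers.

```python
# Raw counts taken from the property list on this page.
n_rows = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
minority, majority = 800, 1200

# Sanity checks: the counts should partition the attributes and the rows.
assert n_numeric + n_nominal == n_attributes
assert minority + majority == n_rows

# Derived values, rounded to two decimals (an assumption about the display).
pct_nominal = round(100 * n_nominal / n_attributes, 2)   # 0.99
pct_numeric = round(100 * n_numeric / n_attributes, 2)   # 99.01
pct_minority = round(100 * minority / n_rows, 2)         # 40.0
pct_majority = round(100 * majority / n_rows, 2)         # 60.0
dimensionality = round(n_attributes / n_rows, 2)         # 0.05

print(pct_nominal, pct_numeric, pct_minority, pct_majority, dimensionality)
```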

0 tasks
