Bioresponse_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.
Subsampling of the dataset Bioresponse (4134) with seed=3 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
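A note on the class-sampling step in the source code above: it draws `selected_classes` but then filters rows with `y.isin(classes)`, which keeps every class. It also passes `p=vcs / sum(vcs)`, so the draw is weighted by class frequency rather than uniform. For this dataset the target has only 2 classes, below `nclasses_max=10`, so that branch is never reached. The following is a minimal sketch of what filtering on the sampled classes might look like. The helper name `select_classes` and the toy labels are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd


def select_classes(y: pd.Series, nclasses_max: int, seed: int) -> pd.Series:
    """Keep only rows whose label is among `nclasses_max` sampled classes."""
    rng = np.random.default_rng(seed)
    vcs = y.value_counts()
    if len(vcs) <= nclasses_max:
        # Nothing to drop: the target already has few enough classes.
        return y
    # Sample classes without replacement, weighted by class frequency.
    # Taking labels and weights from the same Series keeps them aligned.
    selected = rng.choice(
        vcs.index.to_numpy(),
        size=nclasses_max,
        replace=False,
        p=(vcs / vcs.sum()).to_numpy(),
    )
    # Filter on the sampled classes, not on the full class list.
    return y[y.isin(selected)]


# Toy labels with four classes (hypothetical data).
y = pd.Series(list("aaaabbbccd"))
kept = select_classes(y, nclasses_max=2, seed=3)
```

Boolean-mask indexing is used here instead of `iloc` with index labels, which would only be equivalent when the index is a default `RangeIndex`.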

101 features

target (target): nominal, 2 unique values, 0 missing
D3: numeric, 13 unique values, 0 missing
D9: numeric, 1998 unique values, 0 missing
D53: numeric, 154 unique values, 0 missing
D56: numeric, 185 unique values, 0 missing
D67: numeric, 338 unique values, 0 missing
D75: numeric, 348 unique values, 0 missing
D133: numeric, 10 unique values, 0 missing
D144: numeric, 8 unique values, 0 missing
D159: numeric, 12 unique values, 0 missing
D182: numeric, 542 unique values, 0 missing
D193: numeric, 11 unique values, 0 missing
D240: numeric, 3 unique values, 0 missing
D271: numeric, 3 unique values, 0 missing
D295: numeric, 4 unique values, 0 missing
D302: numeric, 2 unique values, 0 missing
D305: numeric, 4 unique values, 0 missing
D330: numeric, 4 unique values, 0 missing
D362: numeric, 5 unique values, 0 missing
D385: numeric, 5 unique values, 0 missing
D397: numeric, 6 unique values, 0 missing
D398: numeric, 5 unique values, 0 missing
D424: numeric, 11 unique values, 0 missing
D434: numeric, 5 unique values, 0 missing
D442: numeric, 17 unique values, 0 missing
D448: numeric, 13 unique values, 0 missing
D486: numeric, 8 unique values, 0 missing
D503: numeric, 6 unique values, 0 missing
D513: numeric, 13 unique values, 0 missing
D514: numeric, 9 unique values, 0 missing
D522: numeric, 11 unique values, 0 missing
D542: numeric, 8 unique values, 0 missing
D548: numeric, 11 unique values, 0 missing
D561: numeric, 14 unique values, 0 missing
D576: numeric, 6 unique values, 0 missing
D651: numeric, 6 unique values, 0 missing
D665: numeric, 12 unique values, 0 missing
D682: numeric, 5 unique values, 0 missing
D700: numeric, 15 unique values, 0 missing
D715: numeric, 5 unique values, 0 missing
D732: numeric, 5 unique values, 0 missing
D733: numeric, 5 unique values, 0 missing
D757: numeric, 7 unique values, 0 missing
D768: numeric, 16 unique values, 0 missing
D810: numeric, 14 unique values, 0 missing
D816: numeric, 10 unique values, 0 missing
D823: numeric, 2 unique values, 0 missing
D854: numeric, 5 unique values, 0 missing
D879: numeric, 4 unique values, 0 missing
D903: numeric, 6 unique values, 0 missing
D907: numeric, 7 unique values, 0 missing
D981: numeric, 2 unique values, 0 missing
D1000: numeric, 2 unique values, 0 missing
D1012: numeric, 2 unique values, 0 missing
D1050: numeric, 2 unique values, 0 missing
D1053: numeric, 2 unique values, 0 missing
D1060: numeric, 2 unique values, 0 missing
D1063: numeric, 2 unique values, 0 missing
D1102: numeric, 2 unique values, 0 missing
D1111: numeric, 2 unique values, 0 missing
D1115: numeric, 2 unique values, 0 missing
D1135: numeric, 2 unique values, 0 missing
D1145: numeric, 2 unique values, 0 missing
D1148: numeric, 2 unique values, 0 missing
D1151: numeric, 2 unique values, 0 missing
D1158: numeric, 2 unique values, 0 missing
D1166: numeric, 2 unique values, 0 missing
D1171: numeric, 2 unique values, 0 missing
D1194: numeric, 2 unique values, 0 missing
D1204: numeric, 2 unique values, 0 missing
D1228: numeric, 2 unique values, 0 missing
D1245: numeric, 2 unique values, 0 missing
D1259: numeric, 2 unique values, 0 missing
D1267: numeric, 2 unique values, 0 missing
D1292: numeric, 2 unique values, 0 missing
D1300: numeric, 2 unique values, 0 missing
D1340: numeric, 2 unique values, 0 missing
D1342: numeric, 2 unique values, 0 missing
D1344: numeric, 2 unique values, 0 missing
D1348: numeric, 2 unique values, 0 missing
D1361: numeric, 2 unique values, 0 missing
D1447: numeric, 2 unique values, 0 missing
D1459: numeric, 2 unique values, 0 missing
D1463: numeric, 2 unique values, 0 missing
D1483: numeric, 2 unique values, 0 missing
D1491: numeric, 2 unique values, 0 missing
D1508: numeric, 2 unique values, 0 missing
D1509: numeric, 2 unique values, 0 missing
D1540: numeric, 2 unique values, 0 missing
D1555: numeric, 2 unique values, 0 missing
D1612: numeric, 2 unique values, 0 missing
D1613: numeric, 2 unique values, 0 missing
D1620: numeric, 2 unique values, 0 missing
D1625: numeric, 2 unique values, 0 missing
D1634: numeric, 2 unique values, 0 missing
D1675: numeric, 2 unique values, 0 missing
D1691: numeric, 2 unique values, 0 missing
D1739: numeric, 2 unique values, 0 missing
D1740: numeric, 2 unique values, 0 missing
D1744: numeric, 2 unique values, 0 missing
D1771: numeric, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.5
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0.99
Number of binary attributes: 1
Number of instances belonging to the least frequent class: 915
Percentage of instances belonging to the least frequent class: 45.75
Number of instances belonging to the most frequent class: 1085
Percentage of instances belonging to the most frequent class: 54.25
Number of attributes divided by the number of instances: 0.05
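
The percentage and ratio properties above follow directly from the listed counts. As a quick consistency check, the sketch below recomputes them, assuming values are rounded to two decimals. The variable names are illustrative and not OpenML identifiers.

```python
# Counts taken from the property list above.
n_rows = 2000
n_cols = 101
n_numeric = 100
n_nominal = 1
minority = 915
majority = 1085

# Attribute-type percentages are relative to the total column count.
pct_nominal = round(100 * n_nominal / n_cols, 2)
pct_numeric = round(100 * n_numeric / n_cols, 2)

# Class percentages are relative to the total row count.
pct_minority = round(100 * minority / n_rows, 2)
pct_majority = round(100 * majority / n_rows, 2)

# Attributes divided by instances.
dimensionality = round(n_cols / n_rows, 2)

print(pct_nominal, pct_numeric, pct_minority, pct_majority, dimensionality)
```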

0 tasks
