Bioresponse_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 17-11-2022 by David Wilson


Subsampling of the dataset Bioresponse (4134) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
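The class-sampling step in the listing is not exercised for this dataset, because the target has only 2 classes and `nclasses_max` is 10. For datasets with more classes, the listing filters rows with `classes` rather than `selected_classes`, and passes `value_counts()` probabilities alongside labels taken from `unique()`, which may be ordered differently. The following is a minimal standalone sketch of that step, assuming the intent is to keep only rows whose label is among the sampled classes and to weight each class by its frequency. The helper name `sample_classes` is hypothetical and is not part of the original source.

```python
import numpy as np
import pandas as pd


def sample_classes(y: pd.Series, nclasses_max: int, seed: int) -> pd.Series:
    """Keep rows whose label is among at most `nclasses_max` sampled classes.

    Hypothetical helper: an assumption about the intended behaviour,
    not the code that generated this dataset.
    """
    rng = np.random.default_rng(seed)
    counts = y.value_counts()  # index = class labels, values = frequencies
    if len(counts) <= nclasses_max:
        return y  # nothing to drop, e.g. for a 2-class target

    # Labels and probabilities both come from `counts`, so they are aligned.
    selected = rng.choice(
        counts.index.to_numpy(),
        size=nclasses_max,
        replace=False,
        p=(counts / counts.sum()).to_numpy(),
    )
    # Filter on the sampled classes only.
    return y[y.isin(selected)]


# Toy target with four classes, reduced to two.
y = pd.Series([0, 0, 0, 1, 1, 2, 3, 3, 3, 3], name="target")
kept = sample_classes(y, nclasses_max=2, seed=1)
print(kept.nunique())  # 2
```

Because the sketch returns a boolean-filtered `Series`, the surviving index labels can be used to select the matching rows of the feature frame with `x.loc[kept.index]`.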

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| target (target) | nominal | 2 | 0 |
| D35 | numeric | 78 | 0 |
| D47 | numeric | 1803 | 0 |
| D59 | numeric | 21 | 0 |
| D70 | numeric | 63 | 0 |
| D96 | numeric | 22 | 0 |
| D107 | numeric | 1959 | 0 |
| D110 | numeric | 7 | 0 |
| D146 | numeric | 33 | 0 |
| D160 | numeric | 16 | 0 |
| D202 | numeric | 938 | 0 |
| D212 | numeric | 368 | 0 |
| D213 | numeric | 67 | 0 |
| D230 | numeric | 3 | 0 |
| D243 | numeric | 11 | 0 |
| D263 | numeric | 4 | 0 |
| D279 | numeric | 3 | 0 |
| D349 | numeric | 7 | 0 |
| D379 | numeric | 6 | 0 |
| D420 | numeric | 17 | 0 |
| D435 | numeric | 5 | 0 |
| D451 | numeric | 15 | 0 |
| D459 | numeric | 10 | 0 |
| D462 | numeric | 8 | 0 |
| D480 | numeric | 3 | 0 |
| D483 | numeric | 6 | 0 |
| D507 | numeric | 3 | 0 |
| D510 | numeric | 3 | 0 |
| D518 | numeric | 11 | 0 |
| D526 | numeric | 10 | 0 |
| D560 | numeric | 12 | 0 |
| D562 | numeric | 11 | 0 |
| D609 | numeric | 6 | 0 |
| D634 | numeric | 7 | 0 |
| D646 | numeric | 4 | 0 |
| D656 | numeric | 3 | 0 |
| D691 | numeric | 7 | 0 |
| D693 | numeric | 6 | 0 |
| D715 | numeric | 5 | 0 |
| D732 | numeric | 3 | 0 |
| D740 | numeric | 22 | 0 |
| D748 | numeric | 10 | 0 |
| D771 | numeric | 9 | 0 |
| D775 | numeric | 6 | 0 |
| D794 | numeric | 4 | 0 |
| D795 | numeric | 8 | 0 |
| D804 | numeric | 3 | 0 |
| D806 | numeric | 8 | 0 |
| D836 | numeric | 5 | 0 |
| D858 | numeric | 1 | 0 |
| D859 | numeric | 3 | 0 |
| D862 | numeric | 2 | 0 |
| D897 | numeric | 7 | 0 |
| D901 | numeric | 8 | 0 |
| D905 | numeric | 5 | 0 |
| D915 | numeric | 7 | 0 |
| D927 | numeric | 28 | 0 |
| D932 | numeric | 5 | 0 |
| D937 | numeric | 8 | 0 |
| D968 | numeric | 2 | 0 |
| D1028 | numeric | 2 | 0 |
| D1045 | numeric | 2 | 0 |
| D1071 | numeric | 2 | 0 |
| D1087 | numeric | 2 | 0 |
| D1091 | numeric | 2 | 0 |
| D1128 | numeric | 2 | 0 |
| D1192 | numeric | 2 | 0 |
| D1253 | numeric | 2 | 0 |
| D1258 | numeric | 2 | 0 |
| D1259 | numeric | 2 | 0 |
| D1268 | numeric | 2 | 0 |
| D1280 | numeric | 2 | 0 |
| D1290 | numeric | 2 | 0 |
| D1293 | numeric | 2 | 0 |
| D1335 | numeric | 2 | 0 |
| D1344 | numeric | 2 | 0 |
| D1351 | numeric | 2 | 0 |
| D1355 | numeric | 2 | 0 |
| D1386 | numeric | 2 | 0 |
| D1391 | numeric | 2 | 0 |
| D1399 | numeric | 2 | 0 |
| D1419 | numeric | 2 | 0 |
| D1424 | numeric | 2 | 0 |
| D1456 | numeric | 2 | 0 |
| D1467 | numeric | 2 | 0 |
| D1469 | numeric | 2 | 0 |
| D1484 | numeric | 2 | 0 |
| D1501 | numeric | 2 | 0 |
| D1529 | numeric | 2 | 0 |
| D1549 | numeric | 2 | 0 |
| D1597 | numeric | 2 | 0 |
| D1598 | numeric | 2 | 0 |
| D1602 | numeric | 2 | 0 |
| D1604 | numeric | 2 | 0 |
| D1660 | numeric | 2 | 0 |
| D1670 | numeric | 2 | 0 |
| D1684 | numeric | 2 | 0 |
| D1691 | numeric | 2 | 0 |
| D1738 | numeric | 2 | 0 |
| D1740 | numeric | 2 | 0 |
| D1759 | numeric | 2 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 101 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 100 |
| Number of nominal attributes. | 1 |
| Percentage of nominal attributes. | 0.99 |
| Average class difference between consecutive instances. | 0.51 |
| Percentage of numeric attributes. | 99.01 |
| Percentage of missing values. | 0 |
| Percentage of instances having missing values. | 0 |
| Percentage of binary attributes. | 0.99 |
| Number of binary attributes. | 1 |
| Number of instances belonging to the least frequent class. | 915 |
| Percentage of instances belonging to the least frequent class. | 45.75 |
| Number of instances belonging to the most frequent class. | 1085 |
| Percentage of instances belonging to the most frequent class. | 54.25 |
| Number of attributes divided by the number of instances. | 0.05 |
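The percentage and ratio properties follow directly from the raw counts listed above. The short check below recomputes them, assuming values are rounded to two decimals; the variable and function names are illustrative only.

```python
# Raw counts reported in the property list.
n_rows = 2000
n_cols = 101  # 100 numeric features + 1 nominal target
n_nominal = 1
n_numeric = 100
minority = 915
majority = 1085


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


props = {
    "pct_nominal": pct(n_nominal, n_cols),      # 0.99
    "pct_numeric": pct(n_numeric, n_cols),      # 99.01
    "pct_minority": pct(minority, n_rows),      # 45.75
    "pct_majority": pct(majority, n_rows),      # 54.25
    "dimensionality": round(n_cols / n_rows, 2),  # 0.05
}

for name, value in props.items():
    print(f"{name}: {value}")
```

Note that the "percentage" properties are already expressed in percent, so 0.99 means 0.99% of the attributes (1 of 101), not 99%.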

0 tasks
