Bioresponse_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by David Wilson.
Subsampling of the dataset Bioresponse (4134) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
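A note on the class-selection step in the source code above: it draws `selected_classes` but then filters rows with `y.isin(classes)`, and it pairs labels from `y.unique()` (order of appearance) with weights from `y.value_counts()` (sorted by frequency). This dataset has only two target classes and `nclasses_max` is 10, so that branch is never taken and the data here is unaffected. The sketch below shows one way the step might be written so that labels and weights stay aligned. `select_classes` is a hypothetical helper name, not part of the original code, and whether it matches the author's intent is an assumption.

```python
import numpy as np
import pandas as pd


def select_classes(x: pd.DataFrame, y: pd.Series, nclasses_max: int, seed: int):
    """Hypothetical helper: keep rows whose label is among the sampled classes."""
    rng = np.random.default_rng(seed)
    vcs = y.value_counts()
    if len(vcs) <= nclasses_max:
        # Nothing to do: the target already has few enough classes
        return x, y

    # Take labels and weights from the same Series so they stay aligned
    selected = rng.choice(
        vcs.index.to_numpy(),
        size=nclasses_max,
        replace=False,
        p=(vcs / vcs.sum()).to_numpy(),
    )

    # A boolean mask avoids assuming a 0..n-1 integer index
    mask = y.isin(selected)
    return x[mask], y[mask]


# Toy data: three classes, keep two of them
y_demo = pd.Series(["a"] * 5 + ["b"] * 3 + ["c"] * 2, name="target")
x_demo = pd.DataFrame({"f": range(10)})
x_sub, y_sub = select_classes(x_demo, y_demo, nclasses_max=2, seed=4)
print(sorted(y_sub.unique()))
```

The boolean mask is a deliberate change from the `y.index[...]` plus `iloc` pattern, which only works when the index is a default range index.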

101 features

target (target): nominal, 2 unique values, 0 missing
D48: numeric, 512 unique values, 0 missing
D62: numeric, 116 unique values, 0 missing
D103: numeric, 1743 unique values, 0 missing
D137: numeric, 13 unique values, 0 missing
D139: numeric, 26 unique values, 0 missing
D156: numeric, 6 unique values, 0 missing
D203: numeric, 366 unique values, 0 missing
D227: numeric, 7 unique values, 0 missing
D230: numeric, 3 unique values, 0 missing
D238: numeric, 5 unique values, 0 missing
D287: numeric, 3 unique values, 0 missing
D296: numeric, 8 unique values, 0 missing
D302: numeric, 2 unique values, 0 missing
D305: numeric, 4 unique values, 0 missing
D317: numeric, 3 unique values, 0 missing
D337: numeric, 10 unique values, 0 missing
D349: numeric, 9 unique values, 0 missing
D358: numeric, 4 unique values, 0 missing
D373: numeric, 9 unique values, 0 missing
D382: numeric, 8 unique values, 0 missing
D389: numeric, 5 unique values, 0 missing
D392: numeric, 6 unique values, 0 missing
D468: numeric, 7 unique values, 0 missing
D478: numeric, 8 unique values, 0 missing
D522: numeric, 8 unique values, 0 missing
D574: numeric, 4 unique values, 0 missing
D607: numeric, 10 unique values, 0 missing
D631: numeric, 4 unique values, 0 missing
D632: numeric, 3 unique values, 0 missing
D636: numeric, 5 unique values, 0 missing
D639: numeric, 4 unique values, 0 missing
D646: numeric, 4 unique values, 0 missing
D679: numeric, 5 unique values, 0 missing
D702: numeric, 10 unique values, 0 missing
D733: numeric, 5 unique values, 0 missing
D765: numeric, 16 unique values, 0 missing
D796: numeric, 3 unique values, 0 missing
D812: numeric, 12 unique values, 0 missing
D835: numeric, 4 unique values, 0 missing
D846: numeric, 6 unique values, 0 missing
D853: numeric, 7 unique values, 0 missing
D860: numeric, 4 unique values, 0 missing
D861: numeric, 3 unique values, 0 missing
D866: numeric, 8 unique values, 0 missing
D873: numeric, 8 unique values, 0 missing
D900: numeric, 6 unique values, 0 missing
D908: numeric, 5 unique values, 0 missing
D923: numeric, 3 unique values, 0 missing
D932: numeric, 5 unique values, 0 missing
D948: numeric, 6 unique values, 0 missing
D982: numeric, 2 unique values, 0 missing
D983: numeric, 2 unique values, 0 missing
D1002: numeric, 2 unique values, 0 missing
D1010: numeric, 2 unique values, 0 missing
D1016: numeric, 2 unique values, 0 missing
D1025: numeric, 2 unique values, 0 missing
D1045: numeric, 2 unique values, 0 missing
D1059: numeric, 2 unique values, 0 missing
D1116: numeric, 2 unique values, 0 missing
D1145: numeric, 2 unique values, 0 missing
D1147: numeric, 2 unique values, 0 missing
D1167: numeric, 2 unique values, 0 missing
D1185: numeric, 2 unique values, 0 missing
D1211: numeric, 2 unique values, 0 missing
D1218: numeric, 2 unique values, 0 missing
D1219: numeric, 2 unique values, 0 missing
D1253: numeric, 2 unique values, 0 missing
D1345: numeric, 2 unique values, 0 missing
D1356: numeric, 2 unique values, 0 missing
D1357: numeric, 2 unique values, 0 missing
D1423: numeric, 2 unique values, 0 missing
D1425: numeric, 2 unique values, 0 missing
D1477: numeric, 2 unique values, 0 missing
D1480: numeric, 2 unique values, 0 missing
D1527: numeric, 2 unique values, 0 missing
D1532: numeric, 2 unique values, 0 missing
D1566: numeric, 2 unique values, 0 missing
D1569: numeric, 2 unique values, 0 missing
D1581: numeric, 2 unique values, 0 missing
D1583: numeric, 2 unique values, 0 missing
D1588: numeric, 2 unique values, 0 missing
D1591: numeric, 2 unique values, 0 missing
D1622: numeric, 2 unique values, 0 missing
D1626: numeric, 2 unique values, 0 missing
D1633: numeric, 2 unique values, 0 missing
D1636: numeric, 2 unique values, 0 missing
D1643: numeric, 2 unique values, 0 missing
D1645: numeric, 2 unique values, 0 missing
D1657: numeric, 2 unique values, 0 missing
D1661: numeric, 2 unique values, 0 missing
D1679: numeric, 2 unique values, 0 missing
D1706: numeric, 2 unique values, 0 missing
D1721: numeric, 2 unique values, 0 missing
D1726: numeric, 2 unique values, 0 missing
D1729: numeric, 2 unique values, 0 missing
D1733: numeric, 2 unique values, 0 missing
D1741: numeric, 2 unique values, 0 missing
D1750: numeric, 2 unique values, 0 missing
D1762: numeric, 2 unique values, 0 missing
D1764: numeric, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 100
Number of nominal attributes: 1
Percentage of nominal attributes: 0.99
Average class difference between consecutive instances: 0.51
Percentage of numeric attributes: 99.01
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0.99
Number of binary attributes: 1
Number of instances belonging to the least frequent class: 915
Percentage of instances belonging to the least frequent class: 45.75
Number of instances belonging to the most frequent class: 1085
Percentage of instances belonging to the most frequent class: 54.25
Number of attributes divided by the number of instances: 0.05
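
The percentage and ratio properties follow directly from the raw counts. The sketch below recomputes them as a consistency check, assuming the listed values are rounded to two decimals. The variable and key names are illustrative and are not OpenML identifiers.

```python
# Raw counts as listed on the page
n_rows = 2000
n_attributes = 101
n_numeric = 100
n_nominal = 1
n_minority = 915
n_majority = 1085


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals (assumed)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_nominal": pct(n_nominal, n_attributes),    # listed as 0.99
    "pct_numeric": pct(n_numeric, n_attributes),    # listed as 99.01
    "pct_minority": pct(n_minority, n_rows),        # listed as 45.75
    "pct_majority": pct(n_majority, n_rows),        # listed as 54.25
    "attrs_per_instance": round(n_attributes / n_rows, 2),  # listed as 0.05
}

# The two classes account for every row, and the attribute types add up
assert n_minority + n_majority == n_rows
assert n_numeric + n_nominal == n_attributes

for name, value in derived.items():
    print(f"{name}: {value}")
```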

0 tasks