jasmine_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True


Status: active. Format: ARFF. Visibility: public. Uploaded 17-11-2022 by David Wilson.
Subsampling of the dataset jasmine (41143) with seed=2 args.nrows=2000 args.ncols=100 args.nclasses=10 args.no_stratify=True

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
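The row-subsampling step in the description relies on stratification, meaning each class keeps roughly the share of rows it had before sampling. The following is a minimal standard-library sketch of that idea, not the code that produced this dataset (which delegates to scikit-learn's `train_test_split`). The helper name `stratified_subsample` and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_samples, seed):
    """Return sorted row indices of a subsample that keeps class proportions.

    Hypothetical helper for illustration only; it is not the code that
    generated this dataset.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    chosen = []
    for label in sorted(by_class):
        idxs = by_class[label]
        # Give each class a quota proportional to its frequency.
        # Rounding means the total can differ slightly from n_samples
        # when the proportions do not divide evenly.
        quota = round(n_samples * len(idxs) / total)
        chosen.extend(rng.sample(idxs, min(quota, len(idxs))))
    return sorted(chosen)


# Toy data (assumed): 6000 rows with two equally frequent classes.
labels = ["0", "1"] * 3000
subset = stratified_subsample(labels, n_samples=2000, seed=2)
counts = {c: sum(labels[i] == c for i in subset) for c in ("0", "1")}
print(len(subset), counts)  # 2000 {'0': 1000, '1': 1000}
```

With two equally frequent classes, a 2000-row stratified subsample contains 1000 rows per class, which matches the class counts listed in the properties of this dataset.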

101 features

class (target): nominal, 2 unique values, 0 missing
V4: nominal, 2 unique values, 0 missing
V5: nominal, 2 unique values, 0 missing
V6: nominal, 2 unique values, 0 missing
V8: nominal, 2 unique values, 0 missing
V10: nominal, 2 unique values, 0 missing
V11: nominal, 2 unique values, 0 missing
V13: numeric, 105 unique values, 0 missing
V15: nominal, 2 unique values, 0 missing
V16: nominal, 2 unique values, 0 missing
V17: nominal, 2 unique values, 0 missing
V18: nominal, 2 unique values, 0 missing
V20: nominal, 2 unique values, 0 missing
V21: nominal, 2 unique values, 0 missing
V22: nominal, 2 unique values, 0 missing
V24: nominal, 2 unique values, 0 missing
V27: nominal, 2 unique values, 0 missing
V29: nominal, 2 unique values, 0 missing
V30: nominal, 2 unique values, 0 missing
V31: nominal, 2 unique values, 0 missing
V32: nominal, 2 unique values, 0 missing
V33: nominal, 2 unique values, 0 missing
V35: nominal, 2 unique values, 0 missing
V37: nominal, 2 unique values, 0 missing
V38: nominal, 2 unique values, 0 missing
V41: nominal, 2 unique values, 0 missing
V43: numeric, 74 unique values, 0 missing
V45: numeric, 14 unique values, 0 missing
V47: nominal, 2 unique values, 0 missing
V48: nominal, 2 unique values, 0 missing
V49: nominal, 2 unique values, 0 missing
V50: nominal, 2 unique values, 0 missing
V51: nominal, 2 unique values, 0 missing
V52: nominal, 2 unique values, 0 missing
V53: nominal, 2 unique values, 0 missing
V55: nominal, 2 unique values, 0 missing
V56: numeric, 1148 unique values, 0 missing
V57: nominal, 2 unique values, 0 missing
V58: nominal, 2 unique values, 0 missing
V59: numeric, 110 unique values, 0 missing
V61: nominal, 2 unique values, 0 missing
V62: nominal, 2 unique values, 0 missing
V63: nominal, 2 unique values, 0 missing
V66: nominal, 2 unique values, 0 missing
V67: nominal, 2 unique values, 0 missing
V68: nominal, 2 unique values, 0 missing
V69: nominal, 2 unique values, 0 missing
V70: nominal, 2 unique values, 0 missing
V71: nominal, 2 unique values, 0 missing
V72: nominal, 2 unique values, 0 missing
V73: nominal, 2 unique values, 0 missing
V74: nominal, 2 unique values, 0 missing
V75: nominal, 2 unique values, 0 missing
V76: nominal, 2 unique values, 0 missing
V77: nominal, 2 unique values, 0 missing
V78: nominal, 2 unique values, 0 missing
V79: nominal, 2 unique values, 0 missing
V80: nominal, 2 unique values, 0 missing
V81: nominal, 2 unique values, 0 missing
V82: nominal, 2 unique values, 0 missing
V83: nominal, 2 unique values, 0 missing
V86: nominal, 2 unique values, 0 missing
V87: nominal, 2 unique values, 0 missing
V88: nominal, 2 unique values, 0 missing
V89: nominal, 2 unique values, 0 missing
V90: nominal, 2 unique values, 0 missing
V91: nominal, 2 unique values, 0 missing
V92: nominal, 2 unique values, 0 missing
V93: nominal, 2 unique values, 0 missing
V94: nominal, 2 unique values, 0 missing
V96: nominal, 2 unique values, 0 missing
V99: nominal, 2 unique values, 0 missing
V100: nominal, 2 unique values, 0 missing
V101: nominal, 2 unique values, 0 missing
V102: nominal, 2 unique values, 0 missing
V104: nominal, 2 unique values, 0 missing
V106: nominal, 2 unique values, 0 missing
V107: nominal, 2 unique values, 0 missing
V108: nominal, 2 unique values, 0 missing
V109: nominal, 2 unique values, 0 missing
V111: nominal, 2 unique values, 0 missing
V112: nominal, 2 unique values, 0 missing
V113: nominal, 2 unique values, 0 missing
V117: nominal, 2 unique values, 0 missing
V119: nominal, 2 unique values, 0 missing
V120: nominal, 2 unique values, 0 missing
V122: nominal, 2 unique values, 0 missing
V124: nominal, 2 unique values, 0 missing
V125: nominal, 2 unique values, 0 missing
V126: numeric, 102 unique values, 0 missing
V127: nominal, 2 unique values, 0 missing
V128: nominal, 2 unique values, 0 missing
V130: nominal, 2 unique values, 0 missing
V132: nominal, 2 unique values, 0 missing
V133: nominal, 2 unique values, 0 missing
V136: nominal, 2 unique values, 0 missing
V137: nominal, 2 unique values, 0 missing
V140: nominal, 2 unique values, 0 missing
V142: nominal, 2 unique values, 0 missing
V143: nominal, 2 unique values, 0 missing
V144: nominal, 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 6
Number of nominal attributes: 95
Percentage of nominal attributes: 94.06
Average class difference between consecutive instances: 0.5
Percentage of numeric attributes: 5.94
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 94.06
Number of binary attributes: 95
Number of instances belonging to the least frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the most frequent class: 50
Number of attributes divided by the number of instances: 0.05
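The percentage and ratio properties listed here follow directly from the raw counts. As a quick consistency check, the arithmetic can be reproduced as below. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Counts as listed on this page.
n_rows = 2000
n_attributes = 101  # 100 features plus the target
n_numeric = 6
n_nominal = 95      # includes the nominal target `class`
n_minority = 1000   # instances in the least frequent class

# Derived properties, rounded to two decimals (assumed display precision).
pct_nominal = round(100 * n_nominal / n_attributes, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_minority = round(100 * n_minority / n_rows, 2)
dimensionality = round(n_attributes / n_rows, 2)

print(pct_nominal, pct_numeric, pct_minority, dimensionality)
# 94.06 5.94 50.0 0.05
```

The numeric and nominal counts sum to 101, so the two percentages sum to 100.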

0 tasks
