madeline_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active · Format: ARFF · Visibility: public · Uploaded 17-11-2022 by David Wilson
0 likes · downloaded by 0 people (0 total downloads) · 0 issues · 0 downvotes

Subsampling of the dataset madeline (41144) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.
Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the generator's `Dataset` class (class definition not shown)
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> "Dataset":
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes with probability proportional to their frequency
    vcs = y.value_counts()
    classes = vcs.index  # same order as vcs, so `p` lines up with `classes`
    if len(classes) > nclasses_max:
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / vcs.sum(),
        )

        # Keep only the rows whose class was selected
        mask = y.isin(selected_classes)
        x = x[mask]
        y = y[mask]

    # Uniformly sample columns if required, keeping their original order
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Note: `stratified` is not consulted; the split always stratifies on y
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical mask aligned with the selected columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
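To make the column- and row-sampling steps of `subsample` concrete, here is a minimal standalone sketch on hypothetical toy data rather than madeline itself. The 3000-row, 20-column frame, the column names `V1`..`V20`, the limits of 5 columns and 1000 rows, and the 70/30 class split are all assumptions chosen for illustration. It reuses the same calls (`Generator.choice` for columns, a stratified `train_test_split` for rows) and shows that the kept columns stay in their original order and the class proportions survive subsampling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 3
n_rows, n_cols = 3000, 20          # hypothetical toy sizes
ncols_max, nrows_max = 5, 1000     # hypothetical subsampling limits

rng = np.random.default_rng(SEED)

# Hypothetical toy data: numeric columns V1..V20 and a 70/30 binary target
x = pd.DataFrame(
    rng.normal(size=(n_rows, n_cols)),
    columns=[f"V{i}" for i in range(1, n_cols + 1)],
)
y = pd.Series(np.where(np.arange(n_rows) < 2100, "1", "0"), name="class")

# Column step: draw column positions without replacement, then sort them
# so the kept columns stay in their original order
col_idxs = sorted(rng.choice(n_cols, size=ncols_max, replace=False))
x = x.iloc[:, col_idxs]

# Row step: keep a stratified sample of exactly nrows_max rows
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=nrows_max,
    stratify=data["class"],
    shuffle=True,
    random_state=SEED,
)

print(list(subset.columns))
print(subset["class"].value_counts(normalize=True))
```

Because `test_size` is an integer, `train_test_split` returns exactly that many rows in the second split, which is why `subsample` keeps that split and discards the first one.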

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| class (target) | nominal | 2 | 0 |
| V1 | numeric | 189 | 0 |
| V2 | numeric | 123 | 0 |
| V6 | numeric | 30 | 0 |
| V7 | numeric | 156 | 0 |
| V9 | numeric | 111 | 0 |
| V14 | numeric | 161 | 0 |
| V16 | numeric | 147 | 0 |
| V21 | numeric | 163 | 0 |
| V27 | numeric | 225 | 0 |
| V28 | numeric | 137 | 0 |
| V29 | numeric | 229 | 0 |
| V30 | numeric | 91 | 0 |
| V33 | numeric | 206 | 0 |
| V39 | numeric | 73 | 0 |
| V41 | numeric | 192 | 0 |
| V47 | numeric | 99 | 0 |
| V48 | numeric | 85 | 0 |
| V52 | numeric | 59 | 0 |
| V53 | numeric | 255 | 0 |
| V55 | numeric | 202 | 0 |
| V56 | numeric | 205 | 0 |
| V57 | numeric | 86 | 0 |
| V58 | numeric | 121 | 0 |
| V59 | numeric | 167 | 0 |
| V62 | numeric | 158 | 0 |
| V63 | numeric | 204 | 0 |
| V65 | numeric | 222 | 0 |
| V70 | numeric | 156 | 0 |
| V71 | numeric | 217 | 0 |
| V75 | numeric | 214 | 0 |
| V78 | numeric | 127 | 0 |
| V80 | numeric | 50 | 0 |
| V82 | numeric | 149 | 0 |
| V83 | numeric | 165 | 0 |
| V94 | numeric | 188 | 0 |
| V95 | numeric | 123 | 0 |
| V98 | numeric | 96 | 0 |
| V101 | numeric | 50 | 0 |
| V103 | numeric | 229 | 0 |
| V107 | numeric | 223 | 0 |
| V110 | numeric | 20 | 0 |
| V115 | numeric | 125 | 0 |
| V118 | numeric | 83 | 0 |
| V122 | numeric | 128 | 0 |
| V124 | numeric | 73 | 0 |
| V127 | numeric | 140 | 0 |
| V128 | numeric | 217 | 0 |
| V130 | numeric | 211 | 0 |
| V131 | numeric | 9 | 0 |
| V133 | numeric | 166 | 0 |
| V137 | numeric | 191 | 0 |
| V138 | numeric | 113 | 0 |
| V140 | numeric | 182 | 0 |
| V144 | numeric | 188 | 0 |
| V145 | numeric | 80 | 0 |
| V146 | numeric | 203 | 0 |
| V149 | numeric | 211 | 0 |
| V151 | numeric | 112 | 0 |
| V152 | numeric | 30 | 0 |
| V155 | numeric | 55 | 0 |
| V156 | numeric | 9 | 0 |
| V160 | numeric | 91 | 0 |
| V162 | numeric | 42 | 0 |
| V164 | numeric | 330 | 0 |
| V167 | numeric | 193 | 0 |
| V168 | numeric | 47 | 0 |
| V170 | numeric | 57 | 0 |
| V172 | numeric | 64 | 0 |
| V173 | numeric | 124 | 0 |
| V175 | numeric | 71 | 0 |
| V183 | numeric | 137 | 0 |
| V187 | numeric | 93 | 0 |
| V188 | numeric | 105 | 0 |
| V191 | numeric | 167 | 0 |
| V193 | numeric | 36 | 0 |
| V198 | numeric | 480 | 0 |
| V201 | numeric | 190 | 0 |
| V202 | numeric | 71 | 0 |
| V203 | numeric | 189 | 0 |
| V204 | numeric | 141 | 0 |
| V210 | numeric | 119 | 0 |
| V211 | numeric | 198 | 0 |
| V212 | numeric | 120 | 0 |
| V217 | numeric | 226 | 0 |
| V219 | numeric | 192 | 0 |
| V221 | numeric | 9 | 0 |
| V222 | numeric | 210 | 0 |
| V223 | numeric | 116 | 0 |
| V227 | numeric | 220 | 0 |
| V228 | numeric | 64 | 0 |
| V230 | numeric | 228 | 0 |
| V234 | numeric | 173 | 0 |
| V236 | numeric | 94 | 0 |
| V240 | numeric | 75 | 0 |
| V246 | numeric | 69 | 0 |
| V249 | numeric | 84 | 0 |
| V252 | numeric | 214 | 0 |
| V253 | numeric | 405 | 0 |
| V254 | numeric | 217 | 0 |
| V259 | numeric | 63 | 0 |
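The per-feature statistics in the table above (type, number of distinct values, missing count) can be approximated with pandas. OpenML computes them server-side, so the helper below is only a sketch: `feature_summary` is a hypothetical name, and it assumes numeric dtypes map to "numeric" while everything else (such as a categorical target) maps to "nominal". The two-column `toy` frame is made-up example data.

```python
import numpy as np
import pandas as pd


def feature_summary(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: type, distinct-value count and missing count per column."""
    rows = []
    for col in df.columns:
        # Assumption: numeric dtypes are "numeric", anything else is "nominal"
        kind = "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "nominal"
        rows.append(
            {
                "feature": f"{col} (target)" if col == target else col,
                "type": kind,
                "unique": df[col].nunique(dropna=True),  # NaN is not counted
                "missing": int(df[col].isna().sum()),
            }
        )
    return pd.DataFrame(rows)


# Made-up example: one numeric column with a missing value, one nominal target
toy = pd.DataFrame(
    {
        "V1": [0.5, 1.0, 1.0, np.nan],
        "class": pd.Categorical(["1", "2", "1", "2"]),
    }
)
print(feature_summary(toy, target="class"))
```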

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 0 |
| Number of instances with at least one value missing | 0 |
| Number of numeric attributes | 100 |
| Number of nominal attributes | 1 |
| Percentage of nominal attributes | 0.99 |
| Average class difference between consecutive instances | 0.5 |
| Percentage of numeric attributes | 99.01 |
| Percentage of missing values | 0 |
| Percentage of instances having missing values | 0 |
| Percentage of binary attributes | 0.99 |
| Number of binary attributes | 1 |
| Number of instances belonging to the least frequent class | 994 |
| Percentage of instances belonging to the least frequent class | 49.7 |
| Number of instances belonging to the most frequent class | 1006 |
| Percentage of instances belonging to the most frequent class | 50.3 |
| Number of attributes divided by the number of instances | 0.05 |
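The percentage and ratio properties are plain arithmetic on the raw counts. A short check, assuming (as the values suggest) that attribute percentages are taken over all 101 attributes including the target, class percentages over all 2000 instances, and that the page rounds to two decimals; `pct` and `derived` are illustrative names only.

```python
# Raw counts copied from the property list above
n_instances = 2000
n_attributes = 101          # 100 numeric features + the nominal target
n_numeric, n_nominal, n_binary = 100, 1, 1
minority, majority = 994, 1006


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals (assumed display rounding)."""
    return round(100 * part / whole, 2)


derived = {
    "Percentage of numeric attributes": pct(n_numeric, n_attributes),
    "Percentage of nominal attributes": pct(n_nominal, n_attributes),
    "Percentage of binary attributes": pct(n_binary, n_attributes),
    "Percentage of least frequent class": pct(minority, n_instances),
    "Percentage of most frequent class": pct(majority, n_instances),
    # 101 / 2000 = 0.0505, displayed on the page as 0.05
    "Attributes / instances": n_attributes / n_instances,
}

for name, value in derived.items():
    print(f"{name}: {value}")
```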

0 tasks