christine_seed_1_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 17-11-2022 by David Wilson.


Subsampling of the dataset christine (41142) with seed=1, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        # NOTE: as written, this filters on `classes` (every class), not on
        # `selected_classes`, so the class selection above has no effect.
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
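The row-subsampling step can be tried in isolation. The sketch below is not the uploader's pipeline: it applies the same `train_test_split` call to a small synthetic table (the 5,000-row size, the feature names `a`, `b`, `c`, and the balanced labels are all made up for illustration) to show how an integer `test_size` combined with `stratify` yields a fixed-size subset that keeps the class proportions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the christine dataset):
# 5,000 rows, 3 numeric features, a balanced binary target.
rng = np.random.default_rng(1)
n_rows = 5_000
x = pd.DataFrame(rng.normal(size=(n_rows, 3)), columns=["a", "b", "c"])
y = pd.Series(np.repeat(["0", "1"], n_rows // 2), name="class")

# Same pattern as the row-subsampling step in the source code:
# the "test" part of the split is kept as the subsample.
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=2_000,          # integer => absolute number of rows
    stratify=data["class"],   # preserve class proportions
    shuffle=True,
    random_state=1,
)

counts = subset["class"].value_counts()
print(len(subset))
print(counts.to_dict())
```

With a balanced input, the 2,000-row subset contains 1,000 rows per class, which matches the class counts reported in the properties of this dataset.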

101 features

class (target) (nominal): 2 unique values, 0 missing
V32 (numeric): 60 unique values, 0 missing
V43 (numeric): 34 unique values, 0 missing
V54 (numeric): 351 unique values, 0 missing
V64 (numeric): 475 unique values, 0 missing
V89 (numeric): 197 unique values, 0 missing
V98 (numeric): 367 unique values, 0 missing
V101 (numeric): 390 unique values, 0 missing
V134 (numeric): 264 unique values, 0 missing
V147 (numeric): 210 unique values, 0 missing
V186 (numeric): 539 unique values, 0 missing
V194 (numeric): 503 unique values, 0 missing
V195 (numeric): 58 unique values, 0 missing
V211 (numeric): 541 unique values, 0 missing
V223 (numeric): 329 unique values, 0 missing
V242 (numeric): 325 unique values, 0 missing
V257 (nominal): 1 unique value, 0 missing
V321 (numeric): 193 unique values, 0 missing
V349 (numeric): 449 unique values, 0 missing
V386 (numeric): 435 unique values, 0 missing
V399 (numeric): 395 unique values, 0 missing
V414 (numeric): 264 unique values, 0 missing
V423 (numeric): 252 unique values, 0 missing
V424 (numeric): 22 unique values, 0 missing
V441 (numeric): 88 unique values, 0 missing
V444 (numeric): 602 unique values, 0 missing
V466 (numeric): 369 unique values, 0 missing
V469 (numeric): 514 unique values, 0 missing
V475 (numeric): 260 unique values, 0 missing
V483 (numeric): 428 unique values, 0 missing
V515 (numeric): 382 unique values, 0 missing
V516 (numeric): 196 unique values, 0 missing
V561 (numeric): 136 unique values, 0 missing
V583 (numeric): 327 unique values, 0 missing
V595 (numeric): 197 unique values, 0 missing
V603 (numeric): 386 unique values, 0 missing
V634 (numeric): 518 unique values, 0 missing
V636 (numeric): 80 unique values, 0 missing
V656 (numeric): 460 unique values, 0 missing
V673 (numeric): 335 unique values, 0 missing
V680 (numeric): 272 unique values, 0 missing
V688 (nominal): 1 unique value, 0 missing
V708 (numeric): 213 unique values, 0 missing
V712 (numeric): 294 unique values, 0 missing
V728 (numeric): 337 unique values, 0 missing
V731 (numeric): 429 unique values, 0 missing
V740 (numeric): 377 unique values, 0 missing
V742 (numeric): 374 unique values, 0 missing
V768 (numeric): 247 unique values, 0 missing
V788 (numeric): 497 unique values, 0 missing
V792 (numeric): 381 unique values, 0 missing
V825 (numeric): 275 unique values, 0 missing
V830 (numeric): 375 unique values, 0 missing
V833 (nominal): 1 unique value, 0 missing
V840 (numeric): 250 unique values, 0 missing
V853 (numeric): 468 unique values, 0 missing
V855 (numeric): 568 unique values, 0 missing
V861 (numeric): 318 unique values, 0 missing
V892 (numeric): 278 unique values, 0 missing
V947 (numeric): 347 unique values, 0 missing
V962 (numeric): 467 unique values, 0 missing
V985 (numeric): 139 unique values, 0 missing
V999 (numeric): 465 unique values, 0 missing
V1000 (numeric): 409 unique values, 0 missing
V1038 (numeric): 418 unique values, 0 missing
V1098 (numeric): 149 unique values, 0 missing
V1151 (numeric): 157 unique values, 0 missing
V1157 (numeric): 475 unique values, 0 missing
V1158 (numeric): 275 unique values, 0 missing
V1163 (numeric): 207 unique values, 0 missing
V1174 (numeric): 374 unique values, 0 missing
V1185 (numeric): 445 unique values, 0 missing
V1188 (numeric): 335 unique values, 0 missing
V1229 (nominal): 1 unique value, 0 missing
V1234 (numeric): 240 unique values, 0 missing
V1242 (numeric): 490 unique values, 0 missing
V1246 (numeric): 422 unique values, 0 missing
V1247 (numeric): 405 unique values, 0 missing
V1270 (numeric): 435 unique values, 0 missing
V1277 (numeric): 388 unique values, 0 missing
V1283 (numeric): 255 unique values, 0 missing
V1306 (numeric): 131 unique values, 0 missing
V1307 (numeric): 482 unique values, 0 missing
V1341 (numeric): 209 unique values, 0 missing
V1345 (numeric): 360 unique values, 0 missing
V1348 (numeric): 343 unique values, 0 missing
V1366 (numeric): 198 unique values, 0 missing
V1382 (numeric): 169 unique values, 0 missing
V1407 (numeric): 267 unique values, 0 missing
V1422 (numeric): 452 unique values, 0 missing
V1464 (numeric): 293 unique values, 0 missing
V1465 (numeric): 175 unique values, 0 missing
V1472 (numeric): 64 unique values, 0 missing
V1476 (numeric): 299 unique values, 0 missing
V1526 (numeric): 101 unique values, 0 missing
V1533 (numeric): 475 unique values, 0 missing
V1549 (numeric): 5 unique values, 0 missing
V1554 (numeric): 232 unique values, 0 missing
V1581 (numeric): 88 unique values, 0 missing
V1600 (numeric): 427 unique values, 0 missing
V1631 (numeric): 250 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 101
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 96
Number of nominal attributes: 5
Percentage of nominal attributes: 4.95
Average class difference between consecutive instances: 0.49
Percentage of numeric attributes: 95.05
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 1.98
Number of binary attributes: 2
Number of instances belonging to the least frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the most frequent class: 50
Number of attributes divided by the number of instances: 0.05
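Several of these properties are simple ratios of the counts, so they can be cross-checked by hand. The sketch below recomputes them from the reported counts; the variable names and the rounding to two decimals are assumptions made for illustration, not OpenML's own implementation.

```python
# Counts as reported in the properties of this dataset.
n_instances = 2000
n_attributes = 101
n_numeric = 96
n_nominal = 5
n_binary = 2
minority_class_size = 1000
majority_class_size = 1000


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals (assumed rounding)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_binary": pct(n_binary, n_attributes),
    "pct_minority": pct(minority_class_size, n_instances),
    "pct_majority": pct(majority_class_size, n_instances),
    "attrs_per_instance": round(n_attributes / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed ones (4.95, 95.05, 1.98, 50, 50 and 0.05), and the numeric and nominal counts sum to the 101 attributes.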

0 tasks
