MiniBooNE_seed_2_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 17-11-2022 by David Wilson.


Subsampling of the dataset MiniBooNE (44128) with seed=2, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
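The class-selection step in the source code above can be illustrated in isolation. The following is a minimal sketch, not the generator's actual code: the helper name `select_classes` and the toy labels are hypothetical, and plain NumPy arrays stand in for the pandas objects. It shows that `rng.choice` with `p=` draws classes weighted by their frequency, without replacement. Note that the listing passes `classes` to `isin`, whereas this sketch filters on the drawn subset, which is an assumption about the intent. For this dataset the branch is not taken anyway, since the target has only 2 classes.

```python
import numpy as np


def select_classes(labels, nclasses_max, seed):
    """Draw at most nclasses_max distinct classes, weighted by class frequency.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    if len(classes) <= nclasses_max:
        # Nothing to drop: every class is kept.
        return classes
    # Frequent classes are more likely to be drawn; replace=False keeps them distinct.
    return rng.choice(
        classes, size=nclasses_max, replace=False, p=counts / counts.sum()
    )


# Toy labels with four classes of unequal size (made up for this example).
labels = np.array(["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d"] * 5)

selected = select_classes(labels, nclasses_max=2, seed=2)

# Keep only the rows whose label is one of the drawn classes.
mask = np.isin(labels, selected)
kept = labels[mask]
```

With the same seed the draw is reproducible, which is what lets a name such as `seed_2` identify one specific subsample.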

51 features

| Feature | Type | Unique values | Missing values |
|---|---|---|---|
| signal (target) | nominal | 2 | 0 |
| ParticleID_26 | numeric | 1918 | 0 |
| ParticleID_25 | numeric | 1998 | 0 |
| ParticleID_27 | numeric | 1978 | 0 |
| ParticleID_28 | numeric | 1985 | 0 |
| ParticleID_29 | numeric | 1998 | 0 |
| ParticleID_30 | numeric | 1995 | 0 |
| ParticleID_31 | numeric | 1985 | 0 |
| ParticleID_32 | numeric | 1997 | 0 |
| ParticleID_33 | numeric | 1994 | 0 |
| ParticleID_34 | numeric | 1989 | 0 |
| ParticleID_35 | numeric | 1997 | 0 |
| ParticleID_36 | numeric | 1997 | 0 |
| ParticleID_37 | numeric | 1995 | 0 |
| ParticleID_38 | numeric | 1973 | 0 |
| ParticleID_39 | numeric | 1992 | 0 |
| ParticleID_40 | numeric | 1979 | 0 |
| ParticleID_41 | numeric | 1998 | 0 |
| ParticleID_42 | numeric | 1999 | 0 |
| ParticleID_43 | numeric | 1999 | 0 |
| ParticleID_44 | numeric | 618 | 0 |
| ParticleID_45 | numeric | 1995 | 0 |
| ParticleID_46 | numeric | 1998 | 0 |
| ParticleID_47 | numeric | 1997 | 0 |
| ParticleID_48 | numeric | 1998 | 0 |
| ParticleID_49 | numeric | 1988 | 0 |
| ParticleID_13 | numeric | 1966 | 0 |
| ParticleID_1 | numeric | 1996 | 0 |
| ParticleID_2 | numeric | 1994 | 0 |
| ParticleID_3 | numeric | 1981 | 0 |
| ParticleID_4 | numeric | 1373 | 0 |
| ParticleID_5 | numeric | 1690 | 0 |
| ParticleID_6 | numeric | 1989 | 0 |
| ParticleID_7 | numeric | 1988 | 0 |
| ParticleID_8 | numeric | 1974 | 0 |
| ParticleID_9 | numeric | 1990 | 0 |
| ParticleID_10 | numeric | 1967 | 0 |
| ParticleID_11 | numeric | 1994 | 0 |
| ParticleID_12 | numeric | 1994 | 0 |
| ParticleID_0 | numeric | 1991 | 0 |
| ParticleID_14 | numeric | 1989 | 0 |
| ParticleID_15 | numeric | 1994 | 0 |
| ParticleID_16 | numeric | 1997 | 0 |
| ParticleID_17 | numeric | 1998 | 0 |
| ParticleID_18 | numeric | 1880 | 0 |
| ParticleID_19 | numeric | 1996 | 0 |
| ParticleID_20 | numeric | 1998 | 0 |
| ParticleID_21 | numeric | 1871 | 0 |
| ParticleID_22 | numeric | 1995 | 0 |
| ParticleID_23 | numeric | 1995 | 0 |
| ParticleID_24 | numeric | 1974 | 0 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 2000 |
| Number of attributes (columns) of the dataset. | 51 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 50 |
| Number of nominal attributes. | 1 |
| Percentage of nominal attributes. | 1.96 |
| Average class difference between consecutive instances. | 0.5 |
| Percentage of numeric attributes. | 98.04 |
| Percentage of missing values. | 0 |
| Percentage of instances having missing values. | 0 |
| Percentage of binary attributes. | 1.96 |
| Number of binary attributes. | 1 |
| Number of instances belonging to the least frequent class. | 1000 |
| Percentage of instances belonging to the least frequent class. | 50 |
| Number of instances belonging to the most frequent class. | 1000 |
| Percentage of instances belonging to the most frequent class. | 50 |
| Number of attributes divided by the number of instances. | 0.03 |
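The ratio-type properties above follow directly from the listed counts. As a quick arithmetic check, the sketch below recomputes them from the raw numbers; the variable names and the class labels `class_0` and `class_1` are hypothetical, since the page reports only the per-class counts, and rounding to two decimals is assumed to match the displayed precision.

```python
# Counts as reported in the properties list.
n_rows = 2000
n_attrs = 51
n_numeric = 50
n_nominal = 1

# Hypothetical class labels; only the counts (1000 and 1000) come from the page.
class_counts = {"class_0": 1000, "class_1": 1000}

pct_nominal = round(100 * n_nominal / n_attrs, 2)   # share of nominal columns
pct_numeric = round(100 * n_numeric / n_attrs, 2)   # share of numeric columns
minority_pct = 100 * min(class_counts.values()) / n_rows
majority_pct = 100 * max(class_counts.values()) / n_rows
dimensionality = round(n_attrs / n_rows, 2)         # attributes per instance

print(pct_nominal, pct_numeric, minority_pct, majority_pct, dimensionality)
# prints: 1.96 98.04 50.0 50.0 0.03
```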

0 tasks
