OpenML
MiniBooNE_seed_0_nrows_2000_nclasses_10_ncols_100_stratify_True

active, ARFF, Public Domain (CC0), visibility: public. Uploaded 17-11-2022 by David Wilson.
0 likes, downloaded by 0 people, 0 total downloads, 0 issues, 0 downvotes

Subsample of the MiniBooNE dataset (44128), drawn with seed=0, args.nrows=2000, args.ncols=100, args.nclasses=10 and args.no_stratify=True. It was generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> "Dataset":
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes, weighted by their frequency, if there are too many
    vcs = y.value_counts()
    if len(vcs) > nclasses_max:
        selected_classes = rng.choice(
            vcs.index,  # class labels, aligned with the counts in `vcs`
            size=nclasses_max,
            replace=False,
            p=vcs / vcs.sum(),
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x[mask]
        y = y[mask]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(len(x.columns), size=ncols_max, replace=False)
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify on the target only if requested
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name] if stratified else None,
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical mask aligned with the selected columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same dataset, but it is where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
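For this dataset only the row step has any effect: the result has 50 feature columns and 2 classes, both under the ncols=100 and nclasses=10 limits. The sketch below runs the column and row steps in isolation on a synthetic table (the 10,000 rows, 60 columns, column names and balanced target are all made up, not MiniBooNE data). With a balanced input, a stratified `train_test_split` with `test_size=2_000` keeps exactly 1000 rows per class, the same 1000/1000 split reported in the properties below.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 0
rng = np.random.default_rng(SEED)

# Hypothetical stand-in for a larger source table: 10,000 rows,
# 60 numeric columns and a balanced binary target named "signal".
n_rows, n_cols = 10_000, 60
x = pd.DataFrame(
    rng.normal(size=(n_rows, n_cols)),
    columns=[f"ParticleID_{i}" for i in range(n_cols)],
)
y = pd.Series(np.repeat([0, 1], n_rows // 2), name="signal")

# Column step: draw 50 column positions without replacement, then sort them
# so that the kept columns stay in their original order.
col_idxs = sorted(rng.choice(n_cols, size=50, replace=False))
x = x[list(x.columns[col_idxs])]

# Row step: a stratified split whose "test" part has exactly 2,000 rows;
# only that part is kept.
data = pd.concat((x, y), axis="columns")
_, subset = train_test_split(
    data,
    test_size=2_000,
    stratify=data["signal"],
    shuffle=True,
    random_state=SEED,
)

counts = subset["signal"].value_counts()
print(subset.shape)          # (2000, 51)
print(counts[0], counts[1])  # 1000 1000
```

Sorting the sampled column positions is what keeps the surviving columns in their source order, so a per-column mask (such as `categorical_mask` above) can be indexed with the same positions.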

51 features

signal (target): nominal, 2 unique values, 0 missing
ParticleID_26: numeric, 1939 unique values, 0 missing
ParticleID_25: numeric, 1994 unique values, 0 missing
ParticleID_27: numeric, 1971 unique values, 0 missing
ParticleID_28: numeric, 1986 unique values, 0 missing
ParticleID_29: numeric, 1994 unique values, 0 missing
ParticleID_30: numeric, 1995 unique values, 0 missing
ParticleID_31: numeric, 1978 unique values, 0 missing
ParticleID_32: numeric, 1990 unique values, 0 missing
ParticleID_33: numeric, 1989 unique values, 0 missing
ParticleID_34: numeric, 1992 unique values, 0 missing
ParticleID_35: numeric, 1990 unique values, 0 missing
ParticleID_36: numeric, 1994 unique values, 0 missing
ParticleID_37: numeric, 1990 unique values, 0 missing
ParticleID_38: numeric, 1977 unique values, 0 missing
ParticleID_39: numeric, 1989 unique values, 0 missing
ParticleID_40: numeric, 1976 unique values, 0 missing
ParticleID_41: numeric, 1995 unique values, 0 missing
ParticleID_42: numeric, 1994 unique values, 0 missing
ParticleID_43: numeric, 1988 unique values, 0 missing
ParticleID_44: numeric, 604 unique values, 0 missing
ParticleID_45: numeric, 1990 unique values, 0 missing
ParticleID_46: numeric, 1994 unique values, 0 missing
ParticleID_47: numeric, 1993 unique values, 0 missing
ParticleID_48: numeric, 1995 unique values, 0 missing
ParticleID_49: numeric, 1990 unique values, 0 missing
ParticleID_13: numeric, 1964 unique values, 0 missing
ParticleID_1: numeric, 1988 unique values, 0 missing
ParticleID_2: numeric, 1994 unique values, 0 missing
ParticleID_3: numeric, 1985 unique values, 0 missing
ParticleID_4: numeric, 1362 unique values, 0 missing
ParticleID_5: numeric, 1697 unique values, 0 missing
ParticleID_6: numeric, 1980 unique values, 0 missing
ParticleID_7: numeric, 1986 unique values, 0 missing
ParticleID_8: numeric, 1970 unique values, 0 missing
ParticleID_9: numeric, 1984 unique values, 0 missing
ParticleID_10: numeric, 1962 unique values, 0 missing
ParticleID_11: numeric, 1993 unique values, 0 missing
ParticleID_12: numeric, 1993 unique values, 0 missing
ParticleID_0: numeric, 1990 unique values, 0 missing
ParticleID_14: numeric, 1986 unique values, 0 missing
ParticleID_15: numeric, 1990 unique values, 0 missing
ParticleID_16: numeric, 1993 unique values, 0 missing
ParticleID_17: numeric, 1993 unique values, 0 missing
ParticleID_18: numeric, 1875 unique values, 0 missing
ParticleID_19: numeric, 1993 unique values, 0 missing
ParticleID_20: numeric, 1994 unique values, 0 missing
ParticleID_21: numeric, 1879 unique values, 0 missing
ParticleID_22: numeric, 1994 unique values, 0 missing
ParticleID_23: numeric, 1992 unique values, 0 missing
ParticleID_24: numeric, 1979 unique values, 0 missing
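Each entry above gives a column's type, its number of distinct values and its number of missing values. A minimal pandas sketch of how comparable per-column counts can be computed, using a small made-up frame (not this dataset's data) and assuming that a categorical dtype marks a nominal column:

```python
import numpy as np
import pandas as pd

# Made-up frame: two numeric columns (one with a missing value) and a nominal target
df = pd.DataFrame(
    {
        "ParticleID_0": [0.1, 0.2, 0.2, np.nan],
        "ParticleID_1": [1.0, 2.0, 3.0, 4.0],
        "signal": pd.Categorical(["a", "b", "a", "b"]),
    }
)

summary = pd.DataFrame(
    {
        # "nominal" for categorical columns, "numeric" for everything else
        "type": [
            "nominal" if isinstance(t, pd.CategoricalDtype) else "numeric"
            for t in df.dtypes
        ],
        # Distinct non-missing values per column (nunique drops NaN by default)
        "unique values": df.nunique(),
        # Missing values per column
        "missing": df.isna().sum(),
    }
)
print(summary)
```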

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 51
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 50
Number of nominal attributes: 1
Percentage of nominal attributes: 1.96
Average class difference between consecutive instances: 0.48
Percentage of numeric attributes: 98.04
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 1.96
Number of binary attributes: 1
Number of instances belonging to the least frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the most frequent class: 50
Number of attributes divided by the number of instances: 0.03
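The percentages and the attribute-to-instance ratio in this list follow directly from the raw counts. A small sketch that recomputes them; the helper `pct` is hypothetical, and rounding to two decimals is an assumption about how the values are displayed:

```python
# Base counts copied from the properties listed above
n_instances = 2000
n_attributes = 51
n_numeric = 50
n_nominal = 1
n_binary = 1
n_least_frequent = 1000


def pct(part: int, whole: int) -> float:
    """Hypothetical helper: percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


pct_nominal = pct(n_nominal, n_attributes)               # 1.96
pct_numeric = pct(n_numeric, n_attributes)               # 98.04
pct_binary = pct(n_binary, n_attributes)                 # 1.96
pct_least_frequent = pct(n_least_frequent, n_instances)  # 50.0
attrs_per_instance = n_attributes / n_instances          # 0.0255, listed above as 0.03

print(pct_nominal, pct_numeric, pct_binary, pct_least_frequent, attrs_per_instance)
```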

0 tasks
