MiniBooNE_seed_3_nrows_2000_nclasses_10_ncols_100_stratify_True

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 17-11-2022 by David Wilson.


Subsampling of the dataset MiniBooNE (44128) with seed=3, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True.

Generated with the following source code:

```python
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,
) -> Dataset:
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Uniformly sample
    classes = y.unique()
    if len(classes) > nclasses_max:
        vcs = y.value_counts()
        selected_classes = rng.choice(
            classes,
            size=nclasses_max,
            replace=False,
            p=vcs / sum(vcs),
        )

        # Select the indices where one of these classes is present
        idxs = y.index[y.isin(classes)]
        x = x.iloc[idxs]
        y = y.iloc[idxs]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # We need to convert categorical columns to string for openml
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
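The row-reduction step in the source code relies on a stratified split, which keeps each class at roughly its original share of the rows. The following is a minimal standard-library sketch of that idea, not the code that produced this dataset: the helper `stratified_subsample` and the toy labels are hypothetical, and the original delegates this step to scikit-learn's `train_test_split` with `stratify=`.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, n_rows, seed):
    """Return sorted row indices of a class-proportional subsample.

    Hypothetical helper for illustration only; it mimics the effect of a
    stratified split but is not the implementation used for this dataset.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    chosen = []
    for label, idxs in by_class.items():
        # Each class keeps roughly its share of the requested rows.
        # Rounding means the total can differ slightly from n_rows in general.
        quota = round(n_rows * len(idxs) / total)
        chosen.extend(rng.sample(idxs, min(quota, len(idxs))))
    return sorted(chosen)


# Toy target (assumed data): 6000 rows of class "0", 4000 rows of class "1".
labels = ["0"] * 6000 + ["1"] * 4000
rows = stratified_subsample(labels, n_rows=2000, seed=3)
counts = {c: sum(labels[i] == c for i in rows) for c in ("0", "1")}
print(len(rows), counts)
```

With the toy labels above, the 60/40 class ratio is preserved in the 2000 selected rows, and reusing the same seed reproduces the same selection.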

51 features

signal (target): nominal, 2 unique values, 0 missing
ParticleID_26: numeric, 1933 unique values, 0 missing
ParticleID_25: numeric, 1996 unique values, 0 missing
ParticleID_27: numeric, 1972 unique values, 0 missing
ParticleID_28: numeric, 1987 unique values, 0 missing
ParticleID_29: numeric, 1996 unique values, 0 missing
ParticleID_30: numeric, 1994 unique values, 0 missing
ParticleID_31: numeric, 1983 unique values, 0 missing
ParticleID_32: numeric, 1994 unique values, 0 missing
ParticleID_33: numeric, 1989 unique values, 0 missing
ParticleID_34: numeric, 1985 unique values, 0 missing
ParticleID_35: numeric, 1991 unique values, 0 missing
ParticleID_36: numeric, 1995 unique values, 0 missing
ParticleID_37: numeric, 1993 unique values, 0 missing
ParticleID_38: numeric, 1979 unique values, 0 missing
ParticleID_39: numeric, 1992 unique values, 0 missing
ParticleID_40: numeric, 1986 unique values, 0 missing
ParticleID_41: numeric, 1996 unique values, 0 missing
ParticleID_42: numeric, 1996 unique values, 0 missing
ParticleID_43: numeric, 1996 unique values, 0 missing
ParticleID_44: numeric, 621 unique values, 0 missing
ParticleID_45: numeric, 1990 unique values, 0 missing
ParticleID_46: numeric, 1997 unique values, 0 missing
ParticleID_47: numeric, 1994 unique values, 0 missing
ParticleID_48: numeric, 1995 unique values, 0 missing
ParticleID_49: numeric, 1980 unique values, 0 missing
ParticleID_13: numeric, 1961 unique values, 0 missing
ParticleID_1: numeric, 1988 unique values, 0 missing
ParticleID_2: numeric, 1994 unique values, 0 missing
ParticleID_3: numeric, 1988 unique values, 0 missing
ParticleID_4: numeric, 1384 unique values, 0 missing
ParticleID_5: numeric, 1692 unique values, 0 missing
ParticleID_6: numeric, 1988 unique values, 0 missing
ParticleID_7: numeric, 1985 unique values, 0 missing
ParticleID_8: numeric, 1960 unique values, 0 missing
ParticleID_9: numeric, 1991 unique values, 0 missing
ParticleID_10: numeric, 1972 unique values, 0 missing
ParticleID_11: numeric, 1996 unique values, 0 missing
ParticleID_12: numeric, 1996 unique values, 0 missing
ParticleID_0: numeric, 1993 unique values, 0 missing
ParticleID_14: numeric, 1994 unique values, 0 missing
ParticleID_15: numeric, 1997 unique values, 0 missing
ParticleID_16: numeric, 1995 unique values, 0 missing
ParticleID_17: numeric, 1995 unique values, 0 missing
ParticleID_18: numeric, 1849 unique values, 0 missing
ParticleID_19: numeric, 1993 unique values, 0 missing
ParticleID_20: numeric, 1994 unique values, 0 missing
ParticleID_21: numeric, 1855 unique values, 0 missing
ParticleID_22: numeric, 1996 unique values, 0 missing
ParticleID_23: numeric, 1993 unique values, 0 missing
ParticleID_24: numeric, 1972 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 2000
Number of attributes (columns) of the dataset: 51
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 50
Number of nominal attributes: 1
Percentage of nominal attributes: 1.96
Average class difference between consecutive instances: 0.51
Percentage of numeric attributes: 98.04
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 1.96
Number of binary attributes: 1
Number of instances belonging to the least frequent class: 1000
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the most frequent class: 1000
Percentage of instances belonging to the most frequent class: 50
Number of attributes divided by the number of instances: 0.03
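
Several of the properties above are simple ratios of the listed counts. The short sketch below recomputes them as a sanity check; the variable names and the class labels in `class_counts` are placeholders chosen for illustration, and only the counts come from this page.

```python
# Counts as listed in the properties section.
n_instances = 2000
n_attributes = 51
n_numeric = 50
n_nominal = 1
# Placeholder class labels (assumption); only the two counts are from the page.
class_counts = {"class_a": 1000, "class_b": 1000}

# Attribute-type shares, as percentages of all 51 attributes.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)

# Class balance, as percentages of all 2000 rows.
pct_minority = 100 * min(class_counts.values()) / n_instances
pct_majority = 100 * max(class_counts.values()) / n_instances

# Attributes per instance; the unrounded value is about 0.0255,
# which the page lists as 0.03.
attrs_per_instance = n_attributes / n_instances

print(pct_numeric, pct_nominal, pct_minority, pct_majority, attrs_per_instance)
```

The numeric and nominal shares sum to 100 because the 50 numeric features and the single nominal target together account for all 51 attributes.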

0 tasks
