KDDCup09-Upselling_seed_4_nrows_2000_nclasses_10_ncols_100_stratify_True

active, ARFF, Publicly available, Visibility: public, Uploaded 17-11-2022 by David Wilson
0 likes, downloaded by 0 people, 0 total downloads, 0 issues, 0 downvotes


Subsampling of the dataset KDDCup09-Upselling (43072) with seed=4, args.nrows=2000, args.ncols=100, args.nclasses=10, args.no_stratify=True. Generated with the following source code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


# Method of the generator's Dataset class (class not shown); `self` exposes
# x (DataFrame), y (Series), categorical_mask and dataset.
def subsample(
    self,
    seed: int,
    nrows_max: int = 2_000,
    ncols_max: int = 100,
    nclasses_max: int = 10,
    stratified: bool = True,  # unused: rows are always stratified below
) -> "Dataset":
    rng = np.random.default_rng(seed)

    x = self.x
    y = self.y

    # Sample classes without replacement, weighted by class frequency
    vcs = y.value_counts()
    vcs = vcs[vcs > 0]  # drop unused categorical levels
    if len(vcs) > nclasses_max:
        selected_classes = rng.choice(
            vcs.index,
            size=nclasses_max,
            replace=False,
            p=(vcs / vcs.sum()).to_numpy(),  # aligned with vcs.index
        )

        # Keep only the rows whose label is one of the selected classes
        mask = y.isin(selected_classes)
        x = x[mask]
        y = y[mask]

    # Uniformly sample columns if required
    if len(x.columns) > ncols_max:
        columns_idxs = rng.choice(
            list(range(len(x.columns))), size=ncols_max, replace=False
        )
        sorted_column_idxs = sorted(columns_idxs)
        selected_columns = list(x.columns[sorted_column_idxs])
        x = x[selected_columns]
    else:
        sorted_column_idxs = list(range(len(x.columns)))

    if len(x) > nrows_max:
        # Stratify accordingly: keep the "test" split of exactly nrows_max rows
        target_name = y.name
        data = pd.concat((x, y), axis="columns")
        _, subset = train_test_split(
            data,
            test_size=nrows_max,
            stratify=data[target_name],
            shuffle=True,
            random_state=seed,
        )
        x = subset.drop(target_name, axis="columns")
        y = subset[target_name]

    # Keep the categorical mask aligned with the selected columns
    categorical_mask = [self.categorical_mask[i] for i in sorted_column_idxs]
    columns = list(x.columns)

    return Dataset(
        # Technically this is not the same but it's where it was derived from
        dataset=self.dataset,
        x=x,
        y=y,
        categorical_mask=categorical_mask,
        columns=columns,
    )
```
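The column and row steps of the code above can be exercised on their own, outside the generator's `Dataset` class. The sketch below is a hypothetical standalone helper (`subsample_frame` is not part of the original source) run on made-up data, not on the real KDDCup09 table. It assumes NumPy, pandas and scikit-learn are installed, and it illustrates that the stratified `train_test_split` keeps the class ratio while the selected columns keep their original order.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def subsample_frame(x: pd.DataFrame, y: pd.Series, seed: int,
                    nrows_max: int = 2_000, ncols_max: int = 100):
    """Hypothetical helper mirroring the column and row steps of `subsample`."""
    rng = np.random.default_rng(seed)

    # Draw ncols_max distinct column positions uniformly, then restore their order
    if x.shape[1] > ncols_max:
        col_idxs = np.sort(rng.choice(x.shape[1], size=ncols_max, replace=False))
        x = x.iloc[:, col_idxs]

    # Keep the stratified "test" split of exactly nrows_max rows
    if len(x) > nrows_max:
        _, x, _, y = train_test_split(
            x, y, test_size=nrows_max, stratify=y, shuffle=True, random_state=seed
        )
    return x, y


# Toy data (not the real KDDCup09 table): 10,000 rows, 200 columns, 7.35% positives
data_rng = np.random.default_rng(0)
x_full = pd.DataFrame(data_rng.normal(size=(10_000, 200)),
                      columns=[f"Var{i}" for i in range(200)])
y_full = pd.Series(np.r_[np.ones(735, dtype=int), np.zeros(9_265, dtype=int)],
                   name="upselling")

x_sub, y_sub = subsample_frame(x_full, y_full, seed=4)
print(x_sub.shape)             # (2000, 100)
print(int((y_sub == 1).sum())) # 147
```

Because the split is stratified, a 7.35% positive rate in the toy input comes out as 147 positives out of 2000 rows, the same ratio this dataset reports for its least frequent class.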

101 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| upselling (target) | nominal | 2 | 0 |
| Var403 | numeric | 3 | 0 |
| Var520 | numeric | 2 | 0 |
| Var900 | numeric | 2 | 0 |
| Var1181 | numeric | 2 | 0 |
| Var1201 | numeric | 4 | 0 |
| Var1344 | numeric | 2 | 0 |
| Var1720 | numeric | 1 | 0 |
| Var1908 | numeric | 1 | 0 |
| Var1985 | numeric | 16 | 1984 |
| Var2072 | numeric | 2 | 0 |
| Var2455 | numeric | 10 | 1947 |
| Var2548 | numeric | 7 | 0 |
| Var2592 | numeric | 2 | 0 |
| Var2643 | numeric | 3 | 0 |
| Var2681 | numeric | 1 | 0 |
| Var2846 | numeric | 1 | 0 |
| Var2990 | numeric | 14 | 0 |
| Var3090 | numeric | 6 | 0 |
| Var3235 | numeric | 2 | 0 |
| Var3262 | numeric | 78 | 0 |
| Var3303 | numeric | 2 | 0 |
| Var3332 | numeric | 3 | 0 |
| Var3990 | numeric | 3 | 0 |
| Var4201 | numeric | 12 | 0 |
| Var4519 | numeric | 1 | 0 |
| Var5022 | numeric | 2 | 0 |
| Var5210 | numeric | 1 | 0 |
| Var5399 | numeric | 1529 | 0 |
| Var5420 | numeric | 5 | 0 |
| Var5482 | numeric | 2 | 0 |
| Var5495 | numeric | 88 | 0 |
| Var5588 | numeric | 3 | 0 |
| Var5707 | numeric | 1 | 0 |
| Var6070 | numeric | 13 | 0 |
| Var6394 | numeric | 2 | 0 |
| Var6728 | numeric | 2 | 0 |
| Var6859 | numeric | 7 | 0 |
| Var6897 | numeric | 2 | 0 |
| Var7086 | numeric | 3 | 0 |
| Var7095 | numeric | 42 | 0 |
| Var7145 | numeric | 2 | 0 |
| Var7329 | numeric | 1 | 0 |
| Var7346 | numeric | 2 | 0 |
| Var7408 | numeric | 1 | 0 |
| Var7418 | numeric | 2 | 0 |
| Var7423 | numeric | 3 | 0 |
| Var7445 | numeric | 1 | 0 |
| Var7585 | numeric | 2 | 0 |
| Var7755 | numeric | 1 | 0 |
| Var7774 | numeric | 4 | 0 |
| Var7959 | numeric | 2 | 0 |
| Var8026 | numeric | 1 | 0 |
| Var8077 | numeric | 1680 | 0 |
| Var8523 | numeric | 2 | 0 |
| Var8575 | numeric | 2 | 0 |
| Var8621 | numeric | 306 | 0 |
| Var8669 | numeric | 2 | 0 |
| Var8760 | numeric | 2 | 0 |
| Var9010 | numeric | 20 | 0 |
| Var9050 | numeric | 1 | 0 |
| Var9303 | numeric | 1 | 0 |
| Var9546 | numeric | 2 | 0 |
| Var9863 | numeric | 1 | 0 |
| Var9898 | numeric | 2 | 0 |
| Var10032 | numeric | 65 | 0 |
| Var10070 | numeric | 624 | 0 |
| Var10323 | numeric | 7 | 0 |
| Var10478 | numeric | 2 | 0 |
| Var10768 | numeric | 20 | 0 |
| Var10885 | numeric | 2 | 0 |
| Var11489 | numeric | 1 | 0 |
| Var11719 | numeric | 9 | 0 |
| Var11902 | numeric | 1 | 0 |
| Var12004 | numeric | 2 | 0 |
| Var12073 | numeric | 4 | 0 |
| Var12731 | numeric | 1 | 0 |
| Var12940 | numeric | 2 | 0 |
| Var13071 | numeric | 14 | 0 |
| Var13155 | numeric | 1 | 0 |
| Var13197 | numeric | 1 | 0 |
| Var13333 | numeric | 1 | 0 |
| Var13391 | numeric | 2 | 0 |
| Var13399 | numeric | 12 | 0 |
| Var13809 | numeric | 38 | 1936 |
| Var13817 | numeric | 1 | 0 |
| Var13855 | numeric | 2 | 0 |
| Var13950 | numeric | 2 | 0 |
| Var13986 | numeric | 1 | 0 |
| Var13988 | numeric | 71 | 0 |
| Var14022 | numeric | 10 | 0 |
| Var14270 | numeric | 1 | 0 |
| Var14332 | numeric | 5 | 0 |
| Var14367 | numeric | 1 | 0 |
| Var14397 | numeric | 16 | 0 |
| Var14403 | numeric | 1 | 0 |
| Var14484 | numeric | 1 | 0 |
| Var14530 | numeric | 1 | 0 |
| Var14625 | numeric | 2 | 0 |
| Var14882 | nominal | 5 | 1872 |
| Var14896 | nominal | 14 | 1687 |
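
A per-feature summary of this kind (type, distinct values, missing count) can be approximated locally with pandas. The sketch below is an assumption about how such columns could be computed, not OpenML's server-side implementation: `feature_summary` is a hypothetical helper, columns are treated as nominal when their dtype is non-numeric (e.g. `category`), and missing values are excluded from the distinct-value count, which is consistent with rows such as Var1985 (16 unique values, 1984 missing) but not confirmed by the page.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Type, distinct non-missing values and missing count for each column."""
    kinds = pd.Series(
        {col: "numeric" if is_numeric_dtype(df[col]) else "nominal" for col in df.columns}
    )
    return pd.DataFrame({
        "type": kinds,
        "unique_values": df.nunique(dropna=True),  # NaN not counted as a value
        "missing": df.isna().sum(),
    })


# Tiny made-up frame with the same kinds of columns as the table above
df = pd.DataFrame({
    "upselling": pd.Categorical(["-1", "1", "-1", "-1"]),
    "Var403": [0.0, 1.0, 1.0, np.nan],
    "Var14882": pd.Categorical(["a", None, "b", "a"]),
})
summary = feature_summary(df)
print(summary)
```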

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 2000 |
| Number of attributes (columns) of the dataset | 101 |
| Number of distinct values of the target attribute (if it is nominal) | 2 |
| Number of missing values in the dataset | 9426 |
| Number of instances with at least one value missing | 2000 |
| Number of numeric attributes | 98 |
| Number of nominal attributes | 3 |
| Percentage of nominal attributes | 2.97 |
| Average class difference between consecutive instances | 0.86 |
| Percentage of numeric attributes | 97.03 |
| Percentage of missing values | 4.67 |
| Percentage of instances having missing values | 100 |
| Percentage of binary attributes | 0.99 |
| Number of binary attributes | 1 |
| Number of instances belonging to the least frequent class | 147 |
| Percentage of instances belonging to the least frequent class | 7.35 |
| Number of instances belonging to the most frequent class | 1853 |
| Percentage of instances belonging to the most frequent class | 92.65 |
| Number of attributes divided by the number of instances | 0.05 |
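
Most of the percentages above follow directly from the raw counts on this page. The sketch below recomputes them as a consistency check; the inputs are copied from the feature and property listings, the two-decimal rounding is an assumption about how the page displays values, and the "average class difference" quality is not reproduced because its exact definition is not given here.

```python
# Counts copied from this page; everything else is derived from them
n_instances, n_attributes = 2000, 101
n_numeric, n_nominal, n_binary = 98, 3, 1
missing_per_feature = {
    "Var1985": 1984, "Var2455": 1947, "Var13809": 1936,
    "Var14882": 1872, "Var14896": 1687,
}
least_frequent, most_frequent = 147, 1853

n_missing = sum(missing_per_feature.values())


def pct(part: float, whole: float) -> float:
    """Percentage rounded to two decimals (assumed display rounding)."""
    return round(100 * part / whole, 2)


derived = {
    "missing values": n_missing,
    "% missing values": pct(n_missing, n_instances * n_attributes),
    "% nominal attributes": pct(n_nominal, n_attributes),
    "% numeric attributes": pct(n_numeric, n_attributes),
    "% binary attributes": pct(n_binary, n_attributes),
    "% least frequent class": pct(least_frequent, n_instances),
    "% most frequent class": pct(most_frequent, n_instances),
    "attributes / instances": n_attributes / n_instances,  # 0.0505, shown as 0.05
}
for name, value in derived.items():
    print(f"{name}: {value}")
```

The five features with missing values account for all 9426 missing cells, i.e. 9426 / (2000 × 101) ≈ 4.67% of the table.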

0 tasks
