Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark. Original description:
Author: UCI
Source: [original](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets)
Please cite:
This is the poker dataset, retrieved 2013-11-14 from the LIBSVM site. In addition to the preprocessing done there (see the LIBSVM site for details), this dataset was created as follows:
-join test and train datasets (non-scaled versions)
-relabel classes: 0 = positive class and 1,2,...,9 = negative class
-normalize each file columnwise according to the following rules:
-If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
-If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
-If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
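
The relabeling and columnwise rules above can be sketched in NumPy as follows. This is a minimal illustration, not the original preprocessing script: the function names `relabel` and `normalize_columns` are hypothetical, the +1/-1 label encoding is an assumption (the description only says "positive" and "negative"), the population standard deviation (`ddof=0`) is an assumption, and a tie in a binary column is resolved arbitrarily in favor of the smaller value.

```python
import numpy as np

def relabel(y):
    """Map class 0 to the positive label and classes 1..9 to the negative label."""
    # Assumption: +1 / -1 encoding; the description does not state the values used.
    return np.where(np.asarray(y) == 0, 1, -1)

def normalize_columns(X):
    """Apply the three columnwise rules to a dense 2-D array."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        values, counts = np.unique(col, return_counts=True)
        if len(values) == 1:
            # Constant feature: set to zero.
            out[:, j] = 0.0
        elif len(values) == 2:
            # Binary feature: more frequent value -> 0, the other -> 1.
            # On a tie, argmax picks the smaller value (an arbitrary choice).
            majority = values[np.argmax(counts)]
            out[:, j] = (col != majority).astype(float)
        else:
            # Multinary/real feature: divide by the standard deviation.
            # Assumption: population standard deviation (ddof=0).
            out[:, j] = col / col.std()
    return out
```

Note that the third rule only rescales the column; it does not center it, so the resulting feature has unit standard deviation but generally a nonzero mean.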
NOTE: Please keep in mind that poker has a mild redundancy: roughly 0.2% of the data points within each file (train, test) are duplicates. These duplicated points have not been removed!
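
One way to check the amount of redundancy yourself is to compare the number of distinct rows with the total number of rows. The sketch below is only an illustration: `duplicate_fraction` is a hypothetical helper, and it counts every extra copy of a row as a duplicate, which may not match exactly how the 0.2% figure above was measured.

```python
import numpy as np

def duplicate_fraction(X):
    """Return the share of rows that are extra copies of another row."""
    X = np.asarray(X)
    # np.unique with axis=0 keeps one representative of each distinct row.
    unique_rows = np.unique(X, axis=0)
    return 1.0 - len(unique_rows) / len(X)
```

Whether labels are included in `X` changes the result, since two rows with identical features but different labels count as duplicates only when the label column is left out.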