cv | Determines the cross-validation splitting strategy.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- an integer, to specify the number of folds in a `(Stratified)KFold`,
- an object to be used as a cross-validation generator,
- an iterable yielding train, test splits.
For integer/None inputs, if the estimator is a classifier and ``y`` is
either binary or multiclass, :class:`StratifiedKFold` is used. In all
other cases, :class:`KFold` is used.
Refer to the :ref:`User Guide ` for the various
cross-validation strategies that can be used here. | default: null |
error_score | Value to assign to the score if an error occurs in estimator fitting.
If set to 'raise', the error is raised. If a numeric value is given,
that value is used as the score and a FitFailedWarning is raised. This
parameter does not affect the refit step, which will always raise the error. | default: "raise" |
estimator | An object of that type is instantiated for each sampled parameter setting.
This is assumed to implement the scikit-learn estimator interface.
Either the estimator must provide a ``score`` function,
or ``scoring`` must be passed. | default: {"oml-python:serialized_object": "component_reference", "value": {"key": "estimator", "step_name": null}} |
fit_params | Parameters to pass to the fit method | |
iid | If True, the data is assumed to be identically distributed across
the folds, and the loss minimized is the total loss per sample,
and not the mean loss across the folds | default: true |
n_iter | Number of parameter settings that are sampled. n_iter trades
off runtime vs quality of the solution | default: 10 |
n_jobs | Number of jobs to run in parallel | default: 1 |
param_distributions | Dictionary with parameter names (strings) as keys and distributions
or lists of parameters to try as values. Distributions must provide a ``rvs``
method for sampling (such as those from scipy.stats.distributions).
If a list is given, it is sampled uniformly. | default: {"bootstrap": [true, false], "criterion": ["gini", "entropy"], "max_depth": [3, null], "max_features": [1, 2, 3, 4]} |
pre_dispatch | Controls the number of jobs that get dispatched during parallel
execution. Reducing this number can be useful to avoid an
explosion of memory consumption when more jobs get dispatched
than CPUs can process. This parameter can be:
- None, in which case all the jobs are immediately
created and spawned. Use this for lightweight and
fast-running jobs, to avoid delays due to on-demand
spawning of the jobs
- An int, giving the exact number of total jobs that are
spawned
- A string, giving an expression as a function of n_jobs,
as in '2*n_jobs' | default: "2*n_jobs" |
random_state | Pseudo random number generator state used for random uniform sampling
from lists of possible values instead of scipy.stats distributions | default: 42 |
refit | Refit the best estimator with the entire dataset.
If ``False``, it is impossible to make predictions using
this RandomizedSearchCV instance after fitting. | default: true |
return_train_score | If ``False``, the ``cv_results_`` attribute will not include training
scores. | default: true |
scoring | A string (see model evaluation documentation) or
a scorer callable object / function with signature
``scorer(estimator, X, y)``.
If ``None``, the ``score`` method of the estimator is used. | default: null |
verbose | Controls the verbosity: the higher, the more messages | default: 0 |
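
The sampling rule described for ``param_distributions``, ``n_iter`` and ``random_state`` can be illustrated with a small standard-library sketch. This is not scikit-learn's implementation: ``UniformInt`` and ``sample_settings`` are hypothetical names introduced here, ``UniformInt`` only mimics the ``rvs`` method that the table requires of a distribution, and the sketch always samples with replacement, which the library may not do when every entry is a list.

```python
import random


class UniformInt:
    """Hypothetical stand-in for a distribution: it only exposes ``rvs``."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, random_state=None):
        # Assumption: random_state is a random.Random instance (or None).
        rng = random_state if random_state is not None else random
        return rng.randint(self.low, self.high)  # both bounds inclusive


def sample_settings(param_distributions, n_iter, random_state=None):
    """Draw n_iter parameter settings (illustrative helper, not a library API)."""
    rng = random.Random(random_state)
    settings = []
    for _ in range(n_iter):
        setting = {}
        # Iterate in sorted key order so a fixed seed gives a fixed result.
        for name in sorted(param_distributions):
            spec = param_distributions[name]
            if hasattr(spec, "rvs"):
                # Distribution-like object: ask it for one sample.
                setting[name] = spec.rvs(random_state=rng)
            else:
                # List of candidate values: pick one uniformly at random.
                setting[name] = rng.choice(list(spec))
        settings.append(setting)
    return settings


# Mirrors the default shown in the table, with max_features swapped for a
# distribution-like object to exercise the ``rvs`` branch.
param_distributions = {
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
    "max_depth": [3, None],
    "max_features": UniformInt(1, 4),
}

settings = sample_settings(param_distributions, n_iter=10, random_state=42)
for setting in settings:
    print(setting)
```

Each of the ``n_iter`` settings would then be evaluated with the ``cv`` splitting strategy, so a larger ``n_iter`` explores more of the space at proportionally higher runtime. Fixing ``random_state`` makes the drawn settings repeatable across runs.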