API
-
class AutoMLClassifier(time_bound=15, metric='accuracy_score', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)
Create an AutoMLClassifier object.
- Parameters
time_bound – Time limit, in minutes, for the whole search.
metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable.
split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for a user-defined metric, this parameter must be set explicitly.
metric_kwargs – Additional arguments for metric.
split_strategy_kwargs – Additional arguments for split_strategy.
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory used to load and save checkpoints. If None, the default checkpoints are used and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: auto, fork or spawn.
verbose – The logging level.
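The link between metric and score_sorting is easiest to see with a user-defined metric. The sketch below is only an illustration: error_rate and build_search are hypothetical names, and the (y_true, y_pred) signature of the callable is an assumption, since the parameter list above only says that a callable is accepted. The AutoML class is passed in as an argument, so the block assumes nothing about the import path.

```python
def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (lower is better)."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def build_search(automl_cls, metric="accuracy_score", score_sorting="auto", **kwargs):
    """Instantiate automl_cls with a consistent metric/score_sorting pair.

    automl_cls is any class accepting the constructor parameters listed
    above, for example AutoMLClassifier.
    """
    if callable(metric) and score_sorting == "auto":
        # Per the parameter list, 'auto' only covers the built-in metrics.
        raise ValueError(
            "pass score_sorting='ascending' or 'descending' with a user-defined metric"
        )
    return automl_cls(metric=metric, score_sorting=score_sorting, **kwargs)
```

With the class documented here, a call could look like build_search(AutoMLClassifier, metric=error_rate, score_sorting='ascending', time_bound=10). Choosing ascending for a lower-is-better metric assumes that ascending ranks the lowest score first, which should be checked against the leaderboard.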
-
add_primitives(new_primitives)
Add new primitives to the search space.
- Parameters
new_primitives – Set of new primitives, given as tuples of name and primitive object. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.
-
blacklist_primitives(exclude_primitives)
Remove primitives from the search space.
- Parameters
exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
-
export_pipeline_code(pipeline_id)
Export a pipeline to an executable .py file.
- Parameters
pipeline_id – Id of a pipeline
-
fit(X, y)
Search for pipelines and fit the best pipeline.
- Parameters
X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
-
fit_pipeline(pipeline_id)
Fit a pipeline given its id.
- Parameters
pipeline_id – Id of a pipeline
-
get_leaderboard()
Return the leaderboard.
- Returns
The leaderboard
-
get_pipeline(pipeline_id=None)
Return a pipeline given its id. If pipeline_id is None, return the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
- Returns
A Pipeline object
-
plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)
Plot the PipelineProfiler visualization.
- Parameters
precomputed_pipelines – Pre-calculated list of pipelines
precomputed_primitive_types – Pre-calculated list of primitive types
-
plot_leaderboard(use_print=False)
Plot the leaderboard.
- Parameters
use_print – Whether or not to use a regular print
- Returns
The leaderboard
-
plot_pipeline(pipeline_id=None, use_print=False)
Plot a pipeline. If pipeline_id is None, plot the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
use_print – Whether or not to use a regular print
-
predict(X)
Predict classes for X using the best pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
- Returns
The predictions
-
predict_pipeline(X, pipeline_id)
Predict classes for X given the id of a pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – Id of a pipeline
- Returns
The predictions
-
score(X, y)
Return the performance (using the chosen metric) of the best pipeline on the given test data and labels.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
- Returns
A dict with metric and performance
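Taken together, fit, predict and score form the usual workflow. A minimal sketch, assuming automl is an already constructed AutoMLClassifier and the data was split into training and test parts beforehand; search_and_evaluate is a hypothetical helper name, and only methods documented in this section are called.

```python
def search_and_evaluate(automl, X_train, y_train, X_test, y_test):
    """Run the search, then evaluate the best pipeline on held-out data."""
    # Search for pipelines and fit the best one.
    automl.fit(X_train, y_train)
    # Predictions of the best pipeline for unseen samples.
    predictions = automl.predict(X_test)
    # Dict with the metric and the performance, as documented for score().
    performance = automl.score(X_test, y_test)
    return predictions, performance
```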
-
score_pipeline(X, y, pipeline_id)
Return the performance (using the chosen metric) of a given pipeline on the given test data and labels.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – Id of a pipeline
- Returns
A dict with metric and performance
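The *_pipeline methods do the same for any pipeline found during the search, not only the best one. The sketch below assumes that fit has already run, that pipeline_id was read from the leaderboard (its exact format is not specified here), and that a pipeline should be fitted with fit_pipeline before it is used for prediction. evaluate_pipeline is a hypothetical helper name.

```python
def evaluate_pipeline(automl, pipeline_id, X_test, y_test):
    """Fit one specific pipeline and evaluate it on held-out data."""
    # Assumption: pipelines other than the best one need an explicit fit.
    automl.fit_pipeline(pipeline_id)
    predictions = automl.predict_pipeline(X_test, pipeline_id)
    # Dict with the metric and the performance for this pipeline.
    performance = automl.score_pipeline(X_test, y_test, pipeline_id)
    return predictions, performance
```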
-
whitelist_primitives(include_primitives)
Restrict the search space to the given primitives.
- Parameters
include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
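Both whitelist_primitives and blacklist_primitives take a list of (primitive type, primitive ID) tuples. A small sketch of building such a list with a sanity check on the type name: PRIMITIVE_TYPES simply copies the names listed under add_primitives, on the assumption that the same names are valid here, and make_primitive_filter is a hypothetical helper.

```python
# Names copied from the add_primitives parameter description.
PRIMITIVE_TYPES = {
    "IMPUTER", "CATEGORICAL_ENCODER", "DATETIME_ENCODER", "TEXT_ENCODER",
    "IMAGE_ENCODER", "FEATURE_GENERATOR", "FEATURE_SCALER", "FEATURE_SELECTOR",
    "CLASSIFICATION_SINGLE_ENSEMBLER", "CLASSIFICATION_MULTI_ENSEMBLER",
    "REGRESSION_SINGLE_ENSEMBLER", "REGRESSION_MULTI_ENSEMBLER",
    "CLASSIFIER", "REGRESSOR", "CLUSTERER", "TIME_SERIES_FORECASTER",
    "SEMISUPERVISED_SELFTRAINER", "SEMISUPERVISED_LABELPROPAGATOR",
}


def make_primitive_filter(primitive_type, primitive_ids):
    """Build the list of (primitive type, primitive ID) tuples."""
    if primitive_type not in PRIMITIVE_TYPES:
        raise ValueError(f"unknown primitive type: {primitive_type!r}")
    return [(primitive_type, primitive_id) for primitive_id in primitive_ids]


# The primitive ID is the one used in the example above.
only_random_forest = make_primitive_filter(
    "CLASSIFIER", ["sklearn.ensemble.RandomForestClassifier"]
)
```

The resulting list can be passed to automl.whitelist_primitives(only_random_forest) to keep only those primitives, or to automl.blacklist_primitives(only_random_forest) to exclude them, presumably before calling fit.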
-
class AutoMLRegressor(time_bound=15, metric='mean_absolute_error', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)
Create an AutoMLRegressor object.
- Parameters
time_bound – Time limit, in minutes, for the whole search.
metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable.
split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for a user-defined metric, this parameter must be set explicitly.
metric_kwargs – Additional arguments for metric.
split_strategy_kwargs – Additional arguments for split_strategy.
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory used to load and save checkpoints. If None, the default checkpoints are used and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: auto, fork or spawn.
verbose – The logging level.
-
add_primitives(new_primitives)
Add new primitives to the search space.
- Parameters
new_primitives – Set of new primitives, given as tuples of name and primitive object. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.
-
blacklist_primitives(exclude_primitives)
Remove primitives from the search space.
- Parameters
exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
-
export_pipeline_code(pipeline_id)
Export a pipeline to an executable .py file.
- Parameters
pipeline_id – Id of a pipeline
-
fit(X, y)
Search for pipelines and fit the best pipeline.
- Parameters
X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
-
fit_pipeline(pipeline_id)
Fit a pipeline given its id.
- Parameters
pipeline_id – Id of a pipeline
-
get_leaderboard()
Return the leaderboard.
- Returns
The leaderboard
-
get_pipeline(pipeline_id=None)
Return a pipeline given its id. If pipeline_id is None, return the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
- Returns
A Pipeline object
-
plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)
Plot the PipelineProfiler visualization.
- Parameters
precomputed_pipelines – Pre-calculated list of pipelines
precomputed_primitive_types – Pre-calculated list of primitive types
-
plot_leaderboard(use_print=False)
Plot the leaderboard.
- Parameters
use_print – Whether or not to use a regular print
- Returns
The leaderboard
-
plot_pipeline(pipeline_id=None, use_print=False)
Plot a pipeline. If pipeline_id is None, plot the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
use_print – Whether or not to use a regular print
-
predict(X)
Predict target values for X using the best pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
- Returns
The predictions
-
predict_pipeline(X, pipeline_id)
Predict target values for X given the id of a pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – Id of a pipeline
- Returns
The predictions
-
score(X, y)
Return the performance (using the chosen metric) of the best pipeline on the given test data and targets.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
- Returns
A dict with metric and performance
-
score_pipeline(X, y, pipeline_id)
Return the performance (using the chosen metric) of a given pipeline on the given test data and targets.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – Id of a pipeline
- Returns
A dict with metric and performance
-
whitelist_primitives(include_primitives)
Restrict the search space to the given primitives.
- Parameters
include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
-
class AutoMLTimeSeries(time_bound=15, metric='mean_squared_error', split_strategy='timeseries', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20, date_column=None, target_column=None)
Create an AutoMLTimeSeries object.
- Parameters
time_bound – Time limit, in minutes, for the whole search.
metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable.
split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for a user-defined metric, this parameter must be set explicitly.
metric_kwargs – Additional arguments for metric.
split_strategy_kwargs – Additional arguments for TimeSeriesSplit, e.g. n_splits and test_size (int).
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory used to load and save checkpoints. If None, the default checkpoints are used and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: auto, fork or spawn.
verbose – The logging level.
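For AutoMLTimeSeries, split_strategy_kwargs carries the arguments for TimeSeriesSplit. The sketch below only assembles constructor arguments; forecaster_kwargs is a hypothetical helper. date_column and target_column appear in the signature above but are not described in the parameter list, so passing column names to them is an assumption.

```python
def forecaster_kwargs(date_column, target_column, n_splits=5, test_size=None):
    """Assemble keyword arguments for a time series search."""
    if n_splits < 2:
        # scikit-learn's TimeSeriesSplit requires at least 2 splits.
        raise ValueError("n_splits must be at least 2")
    split_kwargs = {"n_splits": n_splits}
    if test_size is not None:
        if not isinstance(test_size, int) or test_size < 1:
            # The parameter list documents test_size as an int here.
            raise ValueError("test_size must be a positive int")
        split_kwargs["test_size"] = test_size
    return {
        "split_strategy": "timeseries",
        "split_strategy_kwargs": split_kwargs,
        # Assumption: these are the names of the date and target columns.
        "date_column": date_column,
        "target_column": target_column,
    }
```

A call such as AutoMLTimeSeries(**forecaster_kwargs('date', 'sales', n_splits=3, test_size=30)) would then use three splits with 30 samples in each test fold; the column names 'date' and 'sales' are made up for the illustration.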
-
fit(X, y=None)
Search for pipelines and fit the best pipeline.
- Parameters
X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
-
add_primitives(new_primitives)
Add new primitives to the search space.
- Parameters
new_primitives – Set of new primitives, given as tuples of name and primitive object. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.
-
blacklist_primitives(exclude_primitives)
Remove primitives from the search space.
- Parameters
exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
-
export_pipeline_code(pipeline_id)
Export a pipeline to an executable .py file.
- Parameters
pipeline_id – Id of a pipeline
-
fit_pipeline(pipeline_id)
Fit a pipeline given its id.
- Parameters
pipeline_id – Id of a pipeline
-
get_leaderboard()
Return the leaderboard.
- Returns
The leaderboard
-
get_pipeline(pipeline_id=None)
Return a pipeline given its id. If pipeline_id is None, return the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
- Returns
A Pipeline object
-
plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)
Plot the PipelineProfiler visualization.
- Parameters
precomputed_pipelines – Pre-calculated list of pipelines
precomputed_primitive_types – Pre-calculated list of primitive types
-
plot_leaderboard(use_print=False)
Plot the leaderboard.
- Parameters
use_print – Whether or not to use a regular print
- Returns
The leaderboard
-
plot_pipeline(pipeline_id=None, use_print=False)
Plot a pipeline. If pipeline_id is None, plot the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
use_print – Whether or not to use a regular print
-
predict(X)
Predict values for X using the best pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
- Returns
The predictions
-
predict_pipeline(X, pipeline_id)
Predict values for X given the id of a pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – Id of a pipeline
- Returns
The predictions
-
score(X, y)
Return the performance (using the chosen metric) of the best pipeline on the given test data and targets.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
- Returns
A dict with metric and performance
-
score_pipeline(X, y, pipeline_id)
Return the performance (using the chosen metric) of a given pipeline on the given test data and targets.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – Id of a pipeline
- Returns
A dict with metric and performance
-
whitelist_primitives(include_primitives)
Restrict the search space to the given primitives.
- Parameters
include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
-
class AutoMLSemiSupervisedClassifier(time_bound=15, metric='accuracy_score', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)
Create an AutoMLSemiSupervisedClassifier object.
- Parameters
time_bound – Time limit, in minutes, for the whole search.
metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable.
split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for a user-defined metric, this parameter must be set explicitly.
metric_kwargs – Additional arguments for metric.
split_strategy_kwargs – Additional arguments for split_strategy. In the semi-supervised case, n_splits and test_size (test proportion from 0 to 1) can be passed to the splitter.
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory used to load and save checkpoints. If None, the default checkpoints are used and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: auto, fork or spawn.
verbose – The logging level.
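In the semi-supervised case test_size is a proportion rather than a count. A minimal sketch that checks the range before building split_strategy_kwargs; semi_supervised_split_kwargs is a hypothetical helper and its default values are arbitrary. Whether the bounds 0 and 1 themselves are accepted is not stated above, so the sketch conservatively rejects them.

```python
def semi_supervised_split_kwargs(n_splits=5, test_size=0.25):
    """Build split_strategy_kwargs for the semi-supervised splitter."""
    if not 0 < test_size < 1:
        # test_size is documented as a test proportion from 0 to 1.
        raise ValueError("test_size must be a proportion between 0 and 1")
    if n_splits < 1:
        raise ValueError("n_splits must be a positive integer")
    return {"n_splits": n_splits, "test_size": test_size}
```

The returned dict is meant to be passed as split_strategy_kwargs when constructing AutoMLSemiSupervisedClassifier.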
-
add_primitives(new_primitives)
Add new primitives to the search space.
- Parameters
new_primitives – Set of new primitives, given as tuples of name and primitive object. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.
-
blacklist_primitives(exclude_primitives)
Remove primitives from the search space.
- Parameters
exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
-
export_pipeline_code(pipeline_id)
Export a pipeline to an executable .py file.
- Parameters
pipeline_id – Id of a pipeline
-
fit(X, y)
Search for pipelines and fit the best pipeline.
- Parameters
X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
-
fit_pipeline(pipeline_id)
Fit a pipeline given its id.
- Parameters
pipeline_id – Id of a pipeline
-
get_leaderboard()
Return the leaderboard.
- Returns
The leaderboard
-
get_pipeline(pipeline_id=None)
Return a pipeline given its id. If pipeline_id is None, return the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
- Returns
A Pipeline object
-
plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)
Plot the PipelineProfiler visualization.
- Parameters
precomputed_pipelines – Pre-calculated list of pipelines
precomputed_primitive_types – Pre-calculated list of primitive types
-
plot_leaderboard(use_print=False)
Plot the leaderboard.
- Parameters
use_print – Whether or not to use a regular print
- Returns
The leaderboard
-
plot_pipeline(pipeline_id=None, use_print=False)
Plot a pipeline. If pipeline_id is None, plot the best pipeline.
- Parameters
pipeline_id – Id of a pipeline
use_print – Whether or not to use a regular print
-
predict(X)
Predict classes for X using the best pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
- Returns
The predictions
-
predict_pipeline(X, pipeline_id)
Predict classes for X given the id of a pipeline.
- Parameters
X – The input samples to predict, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – Id of a pipeline
- Returns
The predictions
-
score(X, y)
Return the performance (using the chosen metric) of the best pipeline on the given test data and labels.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
- Returns
A dict with metric and performance
-
score_pipeline(X, y, pipeline_id)
Return the performance (using the chosen metric) of a given pipeline on the given test data and labels.
- Parameters
X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – Id of a pipeline
- Returns
A dict with metric and performance
-
whitelist_primitives(include_primitives)
Restrict the search space to the given primitives.
- Parameters
include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]