API

class AutoMLClassifier(time_bound=15, metric='accuracy_score', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)

Create an AutoMLClassifier object.

Parameters

time_bound – Time limit, in minutes, for the whole pipeline search.
metric – The metric used to score pipelines: either a string naming a built-in metric (see the documentation for the list of available metrics) or a callable.
split_strategy – Strategy used to evaluate each pipeline: 'holdout', 'cross_validation', or an instance of scikit-learn's BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to rank the scores: 'auto', 'ascending' or 'descending'. 'auto' applies to the built-in metrics; for user-defined metrics, this parameter must be set to 'ascending' or 'descending'.
metric_kwargs – Additional keyword arguments passed to the metric.
split_strategy_kwargs – Additional keyword arguments passed to the split strategy.
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory from which checkpoints are loaded and to which they are saved. If None, the default checkpoints are loaded and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: 'auto', 'fork' or 'spawn'.
verbose – Logging level (the default, 20, is the value of logging.INFO).
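A user-defined metric can be passed as a callable. The sketch below is a minimal example that assumes the callable receives the true and predicted labels in the style of scikit-learn metric functions, metric(y_true, y_pred); that signature is an assumption, so check the metrics documentation before relying on it. The name balanced_accuracy is illustrative. Because higher values are better for this metric, score_sorting has to be set to 'descending'.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall. Higher is better, so rank scores 'descending'.

    Assumed signature: metric(y_true, y_pred) -> float, as in scikit-learn.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    # Average the recall of every class that appears in y_true.
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Usage following the constructor signature above (not executed here):
# automl = AutoMLClassifier(metric=balanced_accuracy, score_sorting='descending')

print(balanced_accuracy(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # 0.75
```

Class "a" is recalled 1 time out of 2 and class "b" 2 times out of 2, so the mean recall is (0.5 + 1.0) / 2 = 0.75.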

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters: new_primitives – Set of new primitives, given as tuples of (name, primitive object). Valid names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.
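The sketch below shows one way to build the new_primitives argument and check its names against the list above before calling add_primitives. The helper check_new_primitives and the class MyFeatureScaler are hypothetical. Whether the library expects a primitive class or an instance, and which methods a primitive must implement, are assumptions here; a scikit-learn-style fit/transform interface is used only for illustration.

```python
# Valid primitive type names, copied from the add_primitives documentation.
VALID_NAMES = {
    "IMPUTER", "CATEGORICAL_ENCODER", "DATETIME_ENCODER", "TEXT_ENCODER",
    "IMAGE_ENCODER", "FEATURE_GENERATOR", "FEATURE_SCALER", "FEATURE_SELECTOR",
    "CLASSIFICATION_SINGLE_ENSEMBLER", "CLASSIFICATION_MULTI_ENSEMBLER",
    "REGRESSION_SINGLE_ENSEMBLER", "REGRESSION_MULTI_ENSEMBLER", "CLASSIFIER",
    "REGRESSOR", "CLUSTERER", "TIME_SERIES_FORECASTER",
    "SEMISUPERVISED_SELFTRAINER", "SEMISUPERVISED_LABELPROPAGATOR",
}


class MyFeatureScaler:
    """Hypothetical custom primitive (assumed fit/transform interface)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


def check_new_primitives(new_primitives):
    """Hypothetical helper: reject tuples whose name is not a valid type."""
    invalid = sorted(name for name, _ in new_primitives if name not in VALID_NAMES)
    if invalid:
        raise ValueError(f"Unknown primitive type(s): {invalid}")
    return new_primitives


new_primitives = {("FEATURE_SCALER", MyFeatureScaler)}
check_new_primitives(new_primitives)
# automl.add_primitives(new_primitives)  # automl: an AutoML object created earlier
```

Validating the names up front turns a typo such as "SCALER" into an immediate, readable error instead of a silently ignored primitive (whether the library itself raises in that case is not stated in this reference).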

blacklist_primitives(exclude_primitives)

Remove primitives from the search space.

Parameters: exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
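To make the (primitive type, primitive ID) format concrete, the sketch below removes blacklisted pairs from a search space modeled as a dict that maps each primitive type to a list of primitive IDs. Both the dict layout and the apply_blacklist helper are hypothetical; they illustrate what exclude_primitives means, not how the library stores its search space.

```python
def apply_blacklist(search_space, exclude_primitives):
    """Hypothetical helper: return a copy without the excluded (type, ID) pairs."""
    excluded = set(exclude_primitives)
    return {
        ptype: [pid for pid in ids if (ptype, pid) not in excluded]
        for ptype, ids in search_space.items()
    }


# Hypothetical search space: primitive type -> list of primitive IDs.
search_space = {
    "CLASSIFIER": [
        "sklearn.ensemble.RandomForestClassifier",
        "sklearn.linear_model.LogisticRegression",
    ],
    "IMPUTER": ["sklearn.impute.SimpleImputer"],
}

pruned = apply_blacklist(
    search_space, [("CLASSIFIER", "sklearn.ensemble.RandomForestClassifier")]
)
print(pruned["CLASSIFIER"])  # ['sklearn.linear_model.LogisticRegression']
```

Matching on the full (type, ID) pair means that only the named primitive is dropped; other classifiers and other primitive types are left in place.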

export_pipeline_code(pipeline_id)

Export a pipeline to an executable .py file.

Parameters: pipeline_id – ID of the pipeline

fit(X, y)

Search for pipelines and fit the best pipeline.

Parameters

X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

fit_pipeline(pipeline_id)

Fit the pipeline with the given ID.

Parameters: pipeline_id – ID of the pipeline

get_leaderboard()

Return the leaderboard.

Returns: The leaderboard

get_pipeline(pipeline_id=None)

Return the pipeline with the given ID. If pipeline_id is None, return the best pipeline.

Parameters: pipeline_id – ID of the pipeline
Returns: A Pipeline object
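When pipeline_id is None, "best" depends on the score order set with score_sorting. The sketch below illustrates that selection on a hypothetical list of (pipeline ID, score) pairs; the helpers rank_pipelines and select_pipeline_id are illustrative and do not reflect the library's internal leaderboard format.

```python
def rank_pipelines(scores, score_sorting):
    """Sort (pipeline_id, score) pairs best-first.

    'descending' puts the highest score first (e.g. accuracy);
    'ascending' puts the lowest score first (e.g. an error metric).
    'auto' is assumed to have been resolved to one of these already.
    """
    if score_sorting not in ("ascending", "descending"):
        raise ValueError("score_sorting must be 'ascending' or 'descending'")
    return sorted(scores, key=lambda item: item[1], reverse=score_sorting == "descending")


def select_pipeline_id(scores, score_sorting, pipeline_id=None):
    """Mimic the documented default: no ID means 'the best pipeline'."""
    if pipeline_id is not None:
        return pipeline_id
    return rank_pipelines(scores, score_sorting)[0][0]


# Hypothetical pipeline IDs and scores.
scores = [("p1", 0.81), ("p2", 0.93), ("p3", 0.88)]
print(select_pipeline_id(scores, "descending"))  # p2
print(select_pipeline_id(scores, "ascending"))   # p1
```

The same scores yield a different "best" pipeline under each order, which is why a user-defined metric needs an explicit score_sorting.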

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot a PipelineProfiler visualization that compares the pipelines.

Parameters

precomputed_pipelines – Precomputed list of pipelines
precomputed_primitive_types – Precomputed list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters: use_print – Whether to display the leaderboard with a regular print
Returns: The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot the pipeline with the given ID. If pipeline_id is None, plot the best pipeline.

Parameters

pipeline_id – ID of the pipeline
use_print – Whether to display the pipeline with a regular print

predict(X)

Predict classes for X using the best pipeline.

Parameters: X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
Returns: The predictions

predict_pipeline(X, pipeline_id)

Predict classes for X using the pipeline with the given ID.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – ID of the pipeline

Returns

The predictions

score(X, y)

Return the performance of the best pipeline on the given test data and labels, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with the metric and the performance

score_pipeline(X, y, pipeline_id)

Return the performance of the pipeline with the given ID on the given test data and labels, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – ID of the pipeline

Returns

A dict with the metric and the performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters: include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
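The sketch below illustrates the include_primitives format on a search space modeled as a dict from primitive type to a list of primitive IDs. It rests on an assumption this reference does not confirm: that whitelisting restricts only the primitive types named in the list and leaves the other types untouched. The dict layout and the apply_whitelist helper are hypothetical.

```python
def apply_whitelist(search_space, include_primitives):
    """Hypothetical helper: keep only whitelisted IDs for each named type.

    Assumption: primitive types absent from include_primitives are unchanged.
    """
    allowed = {}
    for ptype, pid in include_primitives:
        allowed.setdefault(ptype, set()).add(pid)
    return {
        ptype: [pid for pid in ids if ptype not in allowed or pid in allowed[ptype]]
        for ptype, ids in search_space.items()
    }


# Hypothetical search space: primitive type -> list of primitive IDs.
search_space = {
    "CLASSIFIER": [
        "sklearn.ensemble.RandomForestClassifier",
        "sklearn.linear_model.LogisticRegression",
    ],
    "IMPUTER": ["sklearn.impute.SimpleImputer"],
}

restricted = apply_whitelist(
    search_space, [("CLASSIFIER", "sklearn.ensemble.RandomForestClassifier")]
)
print(restricted["CLASSIFIER"])  # ['sklearn.ensemble.RandomForestClassifier']
print(restricted["IMPUTER"])     # ['sklearn.impute.SimpleImputer']
```

If the library instead treats the whitelist as the complete search space, the IMPUTER entry above would be dropped as well; consult the library's own examples to confirm which behavior applies.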

class AutoMLRegressor(time_bound=15, metric='mean_absolute_error', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)

Create an AutoMLRegressor object.

Parameters

time_bound – Time limit, in minutes, for the whole pipeline search.
metric – The metric used to score pipelines: either a string naming a built-in metric (see the documentation for the list of available metrics) or a callable.
split_strategy – Strategy used to evaluate each pipeline: 'holdout', 'cross_validation', or an instance of scikit-learn's BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to rank the scores: 'auto', 'ascending' or 'descending'. 'auto' applies to the built-in metrics; for user-defined metrics, this parameter must be set to 'ascending' or 'descending'.
metric_kwargs – Additional keyword arguments passed to the metric.
split_strategy_kwargs – Additional keyword arguments passed to the split strategy.
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory from which checkpoints are loaded and to which they are saved. If None, the default checkpoints are loaded and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: 'auto', 'fork' or 'spawn'.
verbose – Logging level (the default, 20, is the value of logging.INFO).
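For regression, most user-defined metrics are errors, where lower is better, so score_sorting has to be 'ascending'. The sketch below assumes the callable follows the scikit-learn style metric(y_true, y_pred); that signature is an assumption, and the name median_abs_error is illustrative.

```python
from statistics import median


def median_abs_error(y_true, y_pred):
    """Median of |y_true - y_pred|. Lower is better, so rank scores 'ascending'.

    Assumed signature: metric(y_true, y_pred) -> float, as in scikit-learn.
    """
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Usage following the constructor signature above (not executed here):
# automl = AutoMLRegressor(metric=median_abs_error, score_sorting='ascending')

print(median_abs_error([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0]))  # 0.75
```

The absolute errors are 0.5, 0.0, 1.5 and 1.0; with an even count, the median is the mean of the two middle values, (0.5 + 1.0) / 2 = 0.75. Unlike the mean absolute error, this metric is insensitive to a few very large errors.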

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters: new_primitives – Set of new primitives, given as tuples of (name, primitive object). Valid names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.

blacklist_primitives(exclude_primitives)

Remove primitives from the search space.

Parameters: exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

export_pipeline_code(pipeline_id)

Export a pipeline to an executable .py file.

Parameters: pipeline_id – ID of the pipeline

fit(X, y)

Search for pipelines and fit the best pipeline.

Parameters

X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

fit_pipeline(pipeline_id)

Fit the pipeline with the given ID.

Parameters: pipeline_id – ID of the pipeline

get_leaderboard()

Return the leaderboard.

Returns: The leaderboard

get_pipeline(pipeline_id=None)

Return the pipeline with the given ID. If pipeline_id is None, return the best pipeline.

Parameters: pipeline_id – ID of the pipeline
Returns: A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot a PipelineProfiler visualization that compares the pipelines.

Parameters

precomputed_pipelines – Precomputed list of pipelines
precomputed_primitive_types – Precomputed list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters: use_print – Whether to display the leaderboard with a regular print
Returns: The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot the pipeline with the given ID. If pipeline_id is None, plot the best pipeline.

Parameters

pipeline_id – ID of the pipeline
use_print – Whether to display the pipeline with a regular print

predict(X)

Predict target values for X using the best pipeline.

Parameters: X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
Returns: The predictions

predict_pipeline(X, pipeline_id)

Predict target values for X using the pipeline with the given ID.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – ID of the pipeline

Returns

The predictions

score(X, y)

Return the performance of the best pipeline on the given test data and targets, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with the metric and the performance

score_pipeline(X, y, pipeline_id)

Return the performance of the pipeline with the given ID on the given test data and targets, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – ID of the pipeline

Returns

A dict with the metric and the performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters: include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

class AutoMLTimeSeries(time_bound=15, metric='mean_squared_error', split_strategy='timeseries', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20, date_column=None, target_column=None)

Create an AutoMLTimeSeries object.

Parameters

time_bound – Time limit, in minutes, for the whole pipeline search.
metric – The metric used to score pipelines: either a string naming a built-in metric (see the documentation for the list of available metrics) or a callable.
split_strategy – Strategy used to evaluate each pipeline. The default, 'timeseries', uses a time-series split; 'holdout', 'cross_validation', or an instance of scikit-learn's BaseCrossValidator, BaseShuffleSplit or RepeatedSplits can also be passed.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to rank the scores: 'auto', 'ascending' or 'descending'. 'auto' applies to the built-in metrics; for user-defined metrics, this parameter must be set to 'ascending' or 'descending'.
metric_kwargs – Additional keyword arguments passed to the metric.
split_strategy_kwargs – Additional keyword arguments passed to TimeSeriesSplit, e.g. n_splits and test_size (an int).
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory from which checkpoints are loaded and to which they are saved. If None, the default checkpoints are loaded and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: 'auto', 'fork' or 'spawn'.
verbose – Logging level (the default, 20, is the value of logging.INFO).
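To see what n_splits and test_size control, the sketch below reimplements the index logic of scikit-learn's TimeSeriesSplit (expanding training window, gap=0, no max_train_size) in plain Python. It assumes that the "timeseries" strategy forwards split_strategy_kwargs to that class, as the description above suggests; time_series_splits is an illustrative name, not part of the library.

```python
def time_series_splits(n_samples, n_splits=5, test_size=None):
    """Yield (train_indices, test_indices) for expanding-window splits.

    Mirrors scikit-learn's TimeSeriesSplit with gap=0 and no max_train_size:
    each test fold comes strictly after all of its training samples.
    """
    if test_size is None:
        # Default used by TimeSeriesSplit when test_size is not given.
        test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    if test_size < 1 or first_test_start < 1:
        raise ValueError("Too many splits or too large test_size for n_samples")
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in time_series_splits(10, n_splits=3, test_size=2):
    print(f"train up to {train[-1]}, test {test}")
# train up to 3, test [4, 5]
# train up to 5, test [6, 7]
# train up to 7, test [8, 9]
```

Because the training window only grows forward in time, no fold ever trains on observations that come after its test period, which a shuffled holdout would not guarantee.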

fit(X, y=None)

Search for pipelines and fit the best pipeline.

Parameters

X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]. Optional (defaults to None).

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters: new_primitives – Set of new primitives, given as tuples of (name, primitive object). Valid names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.

blacklist_primitives(exclude_primitives)

Remove primitives from the search space.

Parameters: exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

export_pipeline_code(pipeline_id)

Export a pipeline to an executable .py file.

Parameters: pipeline_id – ID of the pipeline

fit_pipeline(pipeline_id)

Fit the pipeline with the given ID.

Parameters: pipeline_id – ID of the pipeline

get_leaderboard()

Return the leaderboard.

Returns: The leaderboard

get_pipeline(pipeline_id=None)

Return the pipeline with the given ID. If pipeline_id is None, return the best pipeline.

Parameters: pipeline_id – ID of the pipeline
Returns: A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot a PipelineProfiler visualization that compares the pipelines.

Parameters

precomputed_pipelines – Precomputed list of pipelines
precomputed_primitive_types – Precomputed list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters: use_print – Whether to display the leaderboard with a regular print
Returns: The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot the pipeline with the given ID. If pipeline_id is None, plot the best pipeline.

Parameters

pipeline_id – ID of the pipeline
use_print – Whether to display the pipeline with a regular print

predict(X)

Predict values for X using the best pipeline.

Parameters: X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
Returns: The predictions

predict_pipeline(X, pipeline_id)

Predict values for X using the pipeline with the given ID.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – ID of the pipeline

Returns

The predictions

score(X, y)

Return the performance of the best pipeline on the given test data and targets, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with the metric and the performance

score_pipeline(X, y, pipeline_id)

Return the performance of the pipeline with the given ID on the given test data and targets, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target values, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – ID of the pipeline

Returns

A dict with the metric and the performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters: include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

class AutoMLSemiSupervisedClassifier(time_bound=15, metric='accuracy_score', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)

Create an AutoMLSemiSupervisedClassifier object.

Parameters

time_bound – Time limit, in minutes, for the whole pipeline search.
metric – The metric used to score pipelines: either a string naming a built-in metric (see the documentation for the list of available metrics) or a callable.
split_strategy – Strategy used to evaluate each pipeline: 'holdout', 'cross_validation', or an instance of scikit-learn's BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.
time_bound_run – Time limit, in minutes, for scoring a single pipeline.
score_sorting – Order used to rank the scores: 'auto', 'ascending' or 'descending'. 'auto' applies to the built-in metrics; for user-defined metrics, this parameter must be set to 'ascending' or 'descending'.
metric_kwargs – Additional keyword arguments passed to the metric.
split_strategy_kwargs – Additional keyword arguments passed to the split strategy. In the semi-supervised case, n_splits and test_size (the test proportion, between 0 and 1) can be passed to the splitter.
output_folder – Path to the output directory. If None, a temporary folder is created automatically.
checkpoints_folder – Path to the directory from which checkpoints are loaded and to which they are saved. If None, the default checkpoints are loaded and new checkpoints are saved in output_folder.
num_cpus – Number of CPUs to use.
start_mode – Start method for the multiprocessing library: 'auto', 'fork' or 'spawn'.
verbose – Logging level (the default, 20, is the value of logging.INFO).
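In the semi-supervised case, test_size is a proportion rather than a count. This reference does not name the splitter it is passed to, so the sketch below assumes shuffle-split semantics modeled on scikit-learn's ShuffleSplit: n_splits independent random splits, with the number of test samples rounded up from test_size * n_samples. The function shuffle_splits and its seed parameter are illustrative, not part of the library.

```python
import math
import random


def shuffle_splits(n_samples, n_splits=5, test_size=0.25, seed=0):
    """Yield n_splits independent random (train, test) index splits.

    test_size is a proportion between 0 and 1; the number of test samples is
    rounded up, as scikit-learn's ShuffleSplit does for float test sizes.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be a proportion between 0 and 1")
    n_test = math.ceil(test_size * n_samples)
    rng = random.Random(seed)  # seeded so the splits are reproducible
    indices = list(range(n_samples))
    for _ in range(n_splits):
        rng.shuffle(indices)
        # Slicing copies, so later shuffles do not alter earlier splits.
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


for train, test in shuffle_splits(8, n_splits=2, test_size=0.25):
    print(len(train), len(test))  # 6 2
```

With 10 samples and test_size=0.25, for instance, 2.5 test samples round up to 3, leaving 7 for training.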

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters: new_primitives – Set of new primitives, given as tuples of (name, primitive object). Valid names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR.

blacklist_primitives(exclude_primitives)

Remove primitives from the search space.

Parameters: exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

export_pipeline_code(pipeline_id)

Export a pipeline to an executable .py file.

Parameters: pipeline_id – ID of the pipeline

fit(X, y)

Search for pipelines and fit the best pipeline.

Parameters

X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

fit_pipeline(pipeline_id)

Fit the pipeline with the given ID.

Parameters: pipeline_id – ID of the pipeline

get_leaderboard()

Return the leaderboard.

Returns: The leaderboard

get_pipeline(pipeline_id=None)

Return the pipeline with the given ID. If pipeline_id is None, return the best pipeline.

Parameters: pipeline_id – ID of the pipeline
Returns: A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot a PipelineProfiler visualization that compares the pipelines.

Parameters

precomputed_pipelines – Precomputed list of pipelines
precomputed_primitive_types – Precomputed list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters: use_print – Whether to display the leaderboard with a regular print
Returns: The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot the pipeline with the given ID. If pipeline_id is None, plot the best pipeline.

Parameters

pipeline_id – ID of the pipeline
use_print – Whether to display the pipeline with a regular print

predict(X)

Predict classes for X using the best pipeline.

Parameters: X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
Returns: The predictions

predict_pipeline(X, pipeline_id)

Predict classes for X using the pipeline with the given ID.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]
pipeline_id – ID of the pipeline

Returns

The predictions

score(X, y)

Return the performance of the best pipeline on the given test data and labels, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with the metric and the performance

score_pipeline(X, y, pipeline_id)

Return the performance of the pipeline with the given ID on the given test data and labels, measured with the chosen metric.

Parameters

X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]
y – The true target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]
pipeline_id – ID of the pipeline

Returns

A dict with the metric and the performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters: include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space.

For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]