API

class AutoMLClassifier(time_bound=15, metric='accuracy_score', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)

Create/instantiate an AutoMLClassifier object.

Parameters
  • time_bound – Time limit, in minutes, for the whole search.

  • metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable object/function.

  • split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.

  • time_bound_run – Time limit, in minutes, for scoring a single pipeline.

  • score_sorting – The order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for user-defined metrics, this parameter must be passed explicitly.

  • metric_kwargs – Additional arguments for metric.

  • split_strategy_kwargs – Additional arguments for split_strategy.

  • output_folder – Path to the output directory. If it is None, a temporary folder is created automatically.

  • checkpoints_folder – Path to the directory used to load and save the checkpoints. If it is None, the default checkpoints are used and the new checkpoints are saved in output_folder.

  • num_cpus – Number of CPUs to be used.

  • start_mode – The start method for the multiprocessing library: auto, fork or spawn.

  • verbose – The logging level.
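As a rough illustration of the metric and score_sorting parameters, the sketch below defines a user-defined metric and collects the constructor arguments in a dict. The error_rate function, its (y_true, y_pred) calling convention, and the reading of ascending as "lower scores rank first" are assumptions made for illustration. Treating verbose as a level from Python's logging module is also an assumption (the default 20 equals logging.INFO). The constructor call is left commented out because it needs the library installed.

```python
import logging


def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (lower is better)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Keyword arguments mirroring the constructor signature documented above.
automl_kwargs = {
    "time_bound": 10,              # minutes for the whole search
    "time_bound_run": 2,           # minutes for scoring one pipeline
    "metric": error_rate,          # user-defined callable (hypothetical)
    "score_sorting": "ascending",  # assumption: smaller scores rank first
    "split_strategy": "holdout",
    "verbose": logging.WARNING,    # assumption: verbose takes logging levels
}

# automl = AutoMLClassifier(**automl_kwargs)  # requires the library

# One of four predictions is wrong, so the error rate is 1/4.
print(error_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```

Because error_rate is not a built-in metric, score_sorting is set explicitly rather than left at auto.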

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters

new_primitives – Set of new primitives, tuples of name and object primitive. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR

blacklist_primitives(exclude_primitives)

Exclude primitives from the search space.

Parameters

exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]
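A small helper can catch malformed entries before they reach blacklist_primitives. The sketch below is illustrative only: make_primitive_filter is a hypothetical helper, it assumes the primitive types are the same names listed under add_primitives, and sklearn.svm.SVC is used as an example ID without any claim that it belongs to the default search space.

```python
# Names listed under add_primitives; assumed here to also be the valid
# "primitive type" values for blacklist_primitives.
PRIMITIVE_TYPES = {
    "IMPUTER", "CATEGORICAL_ENCODER", "DATETIME_ENCODER", "TEXT_ENCODER",
    "IMAGE_ENCODER", "FEATURE_GENERATOR", "FEATURE_SCALER", "FEATURE_SELECTOR",
    "CLASSIFICATION_SINGLE_ENSEMBLER", "CLASSIFICATION_MULTI_ENSEMBLER",
    "REGRESSION_SINGLE_ENSEMBLER", "REGRESSION_MULTI_ENSEMBLER",
    "CLASSIFIER", "REGRESSOR", "CLUSTERER", "TIME_SERIES_FORECASTER",
    "SEMISUPERVISED_SELFTRAINER", "SEMISUPERVISED_LABELPROPAGATOR",
}


def make_primitive_filter(entries):
    """Validate (primitive type, primitive ID) pairs and return them as a list."""
    checked = []
    for primitive_type, primitive_id in entries:
        if primitive_type not in PRIMITIVE_TYPES:
            raise ValueError(f"Unknown primitive type: {primitive_type!r}")
        if not isinstance(primitive_id, str) or not primitive_id:
            raise ValueError("Primitive ID must be a non-empty string")
        checked.append((primitive_type, primitive_id))
    return checked


exclude = make_primitive_filter([
    ("CLASSIFIER", "sklearn.ensemble.RandomForestClassifier"),
    ("CLASSIFIER", "sklearn.svm.SVC"),  # example ID, not confirmed by the docs
])

# automl.blacklist_primitives(exclude)  # requires an AutoML instance
print(exclude)
```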

export_pipeline_code(pipeline_id)

Export Pipeline to executable .py file.

Parameters

pipeline_id – Id of a pipeline

fit(X, y)

Search for pipelines and fit the best pipeline.

Parameters
  • X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

fit_pipeline(pipeline_id)

Fit a pipeline given its id.

Parameters

pipeline_id – Id of a pipeline

get_leaderboard()

Return the leaderboard.

Returns

The leaderboard

get_pipeline(pipeline_id=None)

Return a pipeline given its pipeline id. If pipeline_id is None, return the best pipeline.

Parameters

pipeline_id – Id of a pipeline

Returns

A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot PipelineProfiler visualization.

Parameters
  • precomputed_pipelines – Pre-calculated list of pipelines

  • precomputed_primitive_types – Pre-calculated list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters

use_print – Whether or not to use a regular print

Returns

The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot a pipeline. If pipeline_id is None, plot the best pipeline.

Parameters
  • pipeline_id – Id of a pipeline

  • use_print – Whether or not to use a regular print

predict(X)

Predict classes for X using the best pipeline.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

Returns

The predictions

predict_pipeline(X, pipeline_id)

Predict classes for X given the id of a pipeline.

Parameters
  • X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • pipeline_id – Id of a pipeline

Returns

The predictions

score(X, y)

Return the performance (using the chosen metric) on the given test data and labels using the best pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with metric and performance
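The fit, predict and score methods are typically called in sequence. The following sketch shows that sequence under stated assumptions: _StandInAutoML is a hypothetical stand-in (it only predicts the most frequent training label) so the snippet runs without the library, and the keys of the dict it returns from score are invented for illustration, since the reference only says the dict holds the metric and the performance.

```python
class _StandInAutoML:
    """Hypothetical stand-in exposing the documented fit/predict/score names."""

    def fit(self, X, y):
        # Remember the most frequent label as a trivial "best pipeline".
        labels = list(y)
        self._label = max(set(labels), key=labels.count)

    def predict(self, X):
        return [self._label for _ in X]

    def score(self, X, y):
        predictions = self.predict(X)
        accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
        # Key names are assumptions, not the library's documented keys.
        return {"metric": "accuracy_score", "score": accuracy}


def search_and_evaluate(automl, X_train, y_train, X_test, y_test):
    """Run the search, then predict and score on held-out data."""
    automl.fit(X_train, y_train)
    predictions = automl.predict(X_test)
    performance = automl.score(X_test, y_test)
    return predictions, performance


X_train = [[0.0], [0.1], [0.9], [1.0], [0.2]]
y_train = ["a", "a", "b", "a", "a"]
X_test = [[0.05], [0.95]]
y_test = ["a", "b"]

predictions, performance = search_and_evaluate(
    _StandInAutoML(), X_train, y_train, X_test, y_test
)
print(predictions, performance)
```

With a real AutoMLClassifier instance in place of the stand-in, search_and_evaluate would use only the method names documented in this section.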

score_pipeline(X, y, pipeline_id)

Return the performance (using the chosen metric) on the given test data and labels using a given pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

  • pipeline_id – Id of a pipeline

Returns

A dict with metric and performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters

include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

class AutoMLRegressor(time_bound=15, metric='mean_absolute_error', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)

Create/instantiate an AutoMLRegressor object.

Parameters
  • time_bound – Time limit, in minutes, for the whole search.

  • metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable object/function.

  • split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.

  • time_bound_run – Time limit, in minutes, for scoring a single pipeline.

  • score_sorting – The order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for user-defined metrics, this parameter must be passed explicitly.

  • metric_kwargs – Additional arguments for metric.

  • split_strategy_kwargs – Additional arguments for split_strategy.

  • output_folder – Path to the output directory. If it is None, a temporary folder is created automatically.

  • checkpoints_folder – Path to the directory used to load and save the checkpoints. If it is None, the default checkpoints are used and the new checkpoints are saved in output_folder.

  • num_cpus – Number of CPUs to be used.

  • start_mode – The start method for the multiprocessing library: auto, fork or spawn.

  • verbose – The logging level.

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters

new_primitives – Set of new primitives, tuples of name and object primitive. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR

blacklist_primitives(exclude_primitives)

Exclude primitives from the search space.

Parameters

exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

export_pipeline_code(pipeline_id)

Export Pipeline to executable .py file.

Parameters

pipeline_id – Id of a pipeline

fit(X, y)

Search for pipelines and fit the best pipeline.

Parameters
  • X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

fit_pipeline(pipeline_id)

Fit a pipeline given its id.

Parameters

pipeline_id – Id of a pipeline

get_leaderboard()

Return the leaderboard.

Returns

The leaderboard

get_pipeline(pipeline_id=None)

Return a pipeline given its pipeline id. If pipeline_id is None, return the best pipeline.

Parameters

pipeline_id – Id of a pipeline

Returns

A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot PipelineProfiler visualization.

Parameters
  • precomputed_pipelines – Pre-calculated list of pipelines

  • precomputed_primitive_types – Pre-calculated list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters

use_print – Whether or not to use a regular print

Returns

The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot a pipeline. If pipeline_id is None, plot the best pipeline.

Parameters
  • pipeline_id – Id of a pipeline

  • use_print – Whether or not to use a regular print

predict(X)

Predict values for X using the best pipeline.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

Returns

The predictions

predict_pipeline(X, pipeline_id)

Predict values for X given the id of a pipeline.

Parameters
  • X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • pipeline_id – Id of a pipeline

Returns

The predictions

score(X, y)

Return the performance (using the chosen metric) on the given test data and labels using the best pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with metric and performance

score_pipeline(X, y, pipeline_id)

Return the performance (using the chosen metric) on the given test data and labels using a given pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

  • pipeline_id – Id of a pipeline

Returns

A dict with metric and performance
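Since score_pipeline takes a pipeline id, several pipelines can be compared on the same test data by calling it in a loop. The sketch below is only an illustration: _StandInRegressor is a hypothetical stand-in whose "pipelines" predict a constant, the ids pipeline-1 and pipeline-2 are made up, and the dict keys are assumptions because the reference does not name them.

```python
def score_pipelines(automl, X, y, pipeline_ids):
    """Call score_pipeline once per id and collect the returned dicts."""
    return {pid: automl.score_pipeline(X, y, pid) for pid in pipeline_ids}


class _StandInRegressor:
    """Hypothetical stand-in; each 'pipeline' predicts one constant value."""

    def __init__(self, constants):
        self._constants = constants  # pipeline id -> constant prediction

    def score_pipeline(self, X, y, pipeline_id):
        constant = self._constants[pipeline_id]
        mae = sum(abs(target - constant) for target in y) / len(y)
        # Key names are assumptions, not the library's documented keys.
        return {"metric": "mean_absolute_error", "score": mae}


automl = _StandInRegressor({"pipeline-1": 2.0, "pipeline-2": 5.0})
X_test = [[1], [2], [3]]
y_test = [1.0, 2.0, 3.0]

results = score_pipelines(automl, X_test, y_test, ["pipeline-1", "pipeline-2"])

# Mean absolute error is an error metric, so the smallest score is the best.
best_id = min(results, key=lambda pid: results[pid]["score"])
print(best_id, results[best_id])
```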

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters

include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

class AutoMLTimeSeries(time_bound=15, metric='mean_squared_error', split_strategy='timeseries', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20, date_column=None, target_column=None)

Create/instantiate an AutoMLTimeSeries object.

Parameters
  • time_bound – Time limit, in minutes, for the whole search.

  • metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable object/function.

  • split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.

  • time_bound_run – Time limit, in minutes, for scoring a single pipeline.

  • score_sorting – The order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for user-defined metrics, this parameter must be passed explicitly.

  • metric_kwargs – Additional arguments for metric.

  • split_strategy_kwargs – Additional arguments for TimeSeriesSplit, e.g. n_splits and test_size (int).

  • output_folder – Path to the output directory. If it is None, a temporary folder is created automatically.

  • checkpoints_folder – Path to the directory used to load and save the checkpoints. If it is None, the default checkpoints are used and the new checkpoints are saved in output_folder.

  • num_cpus – Number of CPUs to be used.

  • start_mode – The start method for the multiprocessing library: auto, fork or spawn.

  • verbose – The logging level.
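To make n_splits and test_size concrete, the sketch below reproduces the fold layout of an expanding-window time series split in plain Python. It assumes that TimeSeriesSplit refers to scikit-learn's splitter with its default gap of 0 and no cap on the training size. time_series_folds is a hypothetical helper written for this illustration, not part of the library.

```python
def time_series_folds(n_samples, n_splits, test_size):
    """Return (train_indices, test_indices) pairs for an expanding window.

    Each test fold has test_size consecutive samples taken from the end of
    the series; each training fold contains every sample before its test fold.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("Not enough samples for the requested folds")
    folds = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        folds.append((train, test))
    return folds


# The same dict could be passed as split_strategy_kwargs.
split_strategy_kwargs = {"n_splits": 3, "test_size": 2}

folds = time_series_folds(10, **split_strategy_kwargs)
for train, test in folds:
    print("train:", train, "test:", test)
```

Training data always precedes the test data in time, which is why this kind of split is preferred over a shuffled holdout for forecasting.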

fit(X, y=None)

Search for pipelines and fit the best pipeline.

Parameters
  • X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters

new_primitives – Set of new primitives, tuples of name and object primitive. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR

blacklist_primitives(exclude_primitives)

Exclude primitives from the search space.

Parameters

exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

export_pipeline_code(pipeline_id)

Export Pipeline to executable .py file.

Parameters

pipeline_id – Id of a pipeline

fit_pipeline(pipeline_id)

Fit a pipeline given its id.

Parameters

pipeline_id – Id of a pipeline

get_leaderboard()

Return the leaderboard.

Returns

The leaderboard

get_pipeline(pipeline_id=None)

Return a pipeline given its pipeline id. If pipeline_id is None, return the best pipeline.

Parameters

pipeline_id – Id of a pipeline

Returns

A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot PipelineProfiler visualization.

Parameters
  • precomputed_pipelines – Pre-calculated list of pipelines

  • precomputed_primitive_types – Pre-calculated list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters

use_print – Whether or not to use a regular print

Returns

The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot a pipeline. If pipeline_id is None, plot the best pipeline.

Parameters
  • pipeline_id – Id of a pipeline

  • use_print – Whether or not to use a regular print

predict(X)

Predict values for X using the best pipeline.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

Returns

The predictions

predict_pipeline(X, pipeline_id)

Predict values for X given the id of a pipeline.

Parameters
  • X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • pipeline_id – Id of a pipeline

Returns

The predictions

score(X, y)

Return the performance (using the chosen metric) on the given test data and labels using the best pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with metric and performance

score_pipeline(X, y, pipeline_id)

Return the performance (using the chosen metric) on the given test data and labels using a given pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target values, array-like, shape = [n_samples] or [n_samples, n_outputs]

  • pipeline_id – Id of a pipeline

Returns

A dict with metric and performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters

include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

class AutoMLSemiSupervisedClassifier(time_bound=15, metric='accuracy_score', split_strategy='holdout', time_bound_run=5, score_sorting='auto', metric_kwargs=None, split_strategy_kwargs=None, output_folder=None, checkpoints_folder=None, num_cpus=None, start_mode='auto', verbose=20)

Create/instantiate an AutoMLSemiSupervisedClassifier object.

Parameters
  • time_bound – Time limit, in minutes, for the whole search.

  • metric – A str naming a built-in metric (see the list of available metrics in the documentation) or a callable object/function.

  • split_strategy – Method used to score the pipelines: holdout, cross_validation, or an instance of BaseCrossValidator, BaseShuffleSplit or RepeatedSplits.

  • time_bound_run – Time limit, in minutes, for scoring a single pipeline.

  • score_sorting – The order used to sort the scores: auto, ascending or descending. auto applies to the built-in metrics; for user-defined metrics, this parameter must be passed explicitly.

  • metric_kwargs – Additional arguments for metric.

  • split_strategy_kwargs – Additional arguments for split_strategy. In the semi-supervised case, n_splits and test_size (test proportion from 0 to 1) can be passed to the splitter.

  • output_folder – Path to the output directory. If it is None, a temporary folder is created automatically.

  • checkpoints_folder – Path to the directory used to load and save the checkpoints. If it is None, the default checkpoints are used and the new checkpoints are saved in output_folder.

  • num_cpus – Number of CPUs to be used.

  • start_mode – The start method for the multiprocessing library: auto, fork or spawn.

  • verbose – The logging level.

add_primitives(new_primitives)

Add new primitives to the search space.

Parameters

new_primitives – Set of new primitives, tuples of name and object primitive. Possible names are: IMPUTER, CATEGORICAL_ENCODER, DATETIME_ENCODER, TEXT_ENCODER, IMAGE_ENCODER, FEATURE_GENERATOR, FEATURE_SCALER, FEATURE_SELECTOR, CLASSIFICATION_SINGLE_ENSEMBLER, CLASSIFICATION_MULTI_ENSEMBLER, REGRESSION_SINGLE_ENSEMBLER, REGRESSION_MULTI_ENSEMBLER, CLASSIFIER, REGRESSOR, CLUSTERER, TIME_SERIES_FORECASTER, SEMISUPERVISED_SELFTRAINER, and SEMISUPERVISED_LABELPROPAGATOR

blacklist_primitives(exclude_primitives)

Exclude primitives from the search space.

Parameters

exclude_primitives – List of tuples (primitive type, primitive ID) to be removed from the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]

export_pipeline_code(pipeline_id)

Export Pipeline to executable .py file.

Parameters

pipeline_id – Id of a pipeline

fit(X, y)

Search for pipelines and fit the best pipeline.

Parameters
  • X – The training input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

fit_pipeline(pipeline_id)

Fit a pipeline given its id.

Parameters

pipeline_id – Id of a pipeline

get_leaderboard()

Return the leaderboard.

Returns

The leaderboard

get_pipeline(pipeline_id=None)

Return a pipeline given its pipeline id. If pipeline_id is None, return the best pipeline.

Parameters

pipeline_id – Id of a pipeline

Returns

A Pipeline object

plot_comparison_pipelines(precomputed_pipelines=None, precomputed_primitive_types=None)

Plot PipelineProfiler visualization.

Parameters
  • precomputed_pipelines – Pre-calculated list of pipelines

  • precomputed_primitive_types – Pre-calculated list of primitive types

plot_leaderboard(use_print=False)

Plot the leaderboard.

Parameters

use_print – Whether or not to use a regular print

Returns

The leaderboard

plot_pipeline(pipeline_id=None, use_print=False)

Plot a pipeline. If pipeline_id is None, plot the best pipeline.

Parameters
  • pipeline_id – Id of a pipeline

  • use_print – Whether or not to use a regular print

predict(X)

Predict classes for X using the best pipeline.

Parameters

X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

Returns

The predictions

predict_pipeline(X, pipeline_id)

Predict classes for X given the id of a pipeline.

Parameters
  • X – The input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • pipeline_id – Id of a pipeline

Returns

The predictions

score(X, y)

Return the performance (using the chosen metric) on the given test data and labels using the best pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

Returns

A dict with metric and performance

score_pipeline(X, y, pipeline_id)

Return the performance (using the chosen metric) on the given test data and labels using a given pipeline.

Parameters
  • X – The test input samples, array-like or sparse matrix of shape = [n_samples, n_features]

  • y – The target classes, array-like, shape = [n_samples] or [n_samples, n_outputs]

  • pipeline_id – Id of a pipeline

Returns

A dict with metric and performance

whitelist_primitives(include_primitives)

Whitelist primitives for the search space.

Parameters

include_primitives – List of tuples (primitive type, primitive ID) to be used in the search space. For example: [('CLASSIFIER', 'sklearn.ensemble.RandomForestClassifier'), …]