Getting Started

Alpha-AutoML is integrated with Jupyter Notebooks, which provide an interactive computing environment where you can generate models with Alpha-AutoML and explore them with PipelineProfiler, an interactive visualization tool for inspecting end-to-end machine learning pipelines in detail. Alpha-AutoML has two main components: model generation and model exploration.

First, import the AutoMLClassifier class (for classification problems). If you plan to use Alpha-AutoML for other ML tasks, please see the other examples.

[1]:
from alpha_automl import AutoMLClassifier

Model Generation

The model generation component provides methods to search for pipelines.

In this example, we generate pipelines for a CSV dataset, 299_libras_move. This dataset contains 15 classes, each corresponding to a type of hand movement in LIBRAS. LIBRAS, an acronym of the Portuguese name “LIngua BRAsileira de Sinais”, is the official Brazilian sign language.

[2]:
import pandas as pd

train_dataset = pd.read_csv('../../examples/datasets/299_libras_move/train_data.csv')
test_dataset = pd.read_csv('../../examples/datasets/299_libras_move/test_data.csv')

Remove the target column to obtain the features of the training dataset:

[3]:
target_column = 'class'
X_train = train_dataset.drop(columns=[target_column])
X_train
[3]:
xcoord1 ycoord1 xcoord2 ycoord2 xcoord3 ycoord3 xcoord4 ycoord4 xcoord5 ycoord5 ... xcoord41 ycoord41 xcoord42 ycoord42 xcoord43 ycoord43 xcoord44 ycoord44 xcoord45 ycoord45
0 0.82979 0.76620 0.82979 0.76620 0.82979 0.77083 0.82785 0.77083 0.82979 0.76620 ... 0.41199 0.45370 0.37524 0.43750 0.33269 0.43056 0.29787 0.44213 0.26886 0.47222
1 0.80271 0.54630 0.80077 0.54398 0.80271 0.54398 0.80271 0.54630 0.80271 0.54398 ... 0.20503 0.64583 0.20503 0.68056 0.20696 0.71296 0.21083 0.74537 0.21277 0.77315
2 0.78917 0.59028 0.79110 0.59028 0.79110 0.59028 0.79304 0.59028 0.79110 0.59028 ... 0.20503 0.57407 0.19149 0.57176 0.18569 0.57407 0.18956 0.57639 0.19149 0.56944
3 0.88395 0.61574 0.88201 0.61806 0.87234 0.62037 0.87041 0.61574 0.84526 0.61574 ... 0.27079 0.65972 0.26886 0.62731 0.27660 0.59259 0.27660 0.56250 0.27853 0.53009
4 0.60155 0.77315 0.59768 0.77315 0.59961 0.77315 0.59381 0.77083 0.58801 0.75000 ... 0.63830 0.47917 0.63250 0.52778 0.63830 0.57407 0.63830 0.63194 0.63830 0.68750
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
283 0.85300 0.57639 0.85493 0.57407 0.85300 0.57639 0.85300 0.57407 0.85493 0.56944 ... 0.19923 0.80787 0.17021 0.79630 0.14313 0.75231 0.12959 0.69213 0.12959 0.62037
284 0.66925 0.78009 0.66925 0.78009 0.66925 0.78009 0.67118 0.78009 0.66925 0.78009 ... 0.67311 0.28704 0.66925 0.26389 0.66731 0.24306 0.66731 0.22454 0.66538 0.20602
285 0.57060 0.65741 0.57060 0.65741 0.57060 0.65509 0.56867 0.65046 0.54739 0.64120 ... 0.36944 0.62500 0.42166 0.62269 0.47389 0.62269 0.52418 0.63194 0.56286 0.64120
286 0.62282 0.65278 0.62476 0.65046 0.62669 0.64815 0.61315 0.63426 0.55319 0.60185 ... 0.28433 0.66435 0.30754 0.63889 0.33462 0.61574 0.37331 0.58102 0.42747 0.54398
287 0.65571 0.51157 0.65571 0.51389 0.65571 0.51157 0.65571 0.51389 0.65571 0.51157 ... 0.64603 0.46296 0.65377 0.48843 0.65571 0.49074 0.64990 0.46296 0.65377 0.45833

288 rows × 90 columns

Select the target column of the training dataset:

[4]:
y_train = train_dataset[[target_column]]
y_train
[4]:
class
0 12
1 15
2 7
3 12
4 3
... ...
283 12
284 8
285 2
286 1
287 9

288 rows × 1 columns
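
The two cells above split one table into features and labels. As a minimal sketch of that split, the snippet below repeats it on a tiny made-up DataFrame (the values and the name `toy` are illustrative assumptions, not part of the dataset). It also shows why the output above reports "288 rows × 1 columns": indexing with double brackets returns a one-column DataFrame, whereas single brackets would return a Series.

```python
import pandas as pd

# Toy stand-in for train_dataset; the values are made up for illustration.
toy = pd.DataFrame({
    "xcoord1": [0.83, 0.80, 0.79],
    "ycoord1": [0.77, 0.55, 0.59],
    "class": [12, 15, 7],
})

target_column = "class"
X_toy = toy.drop(columns=[target_column])  # features only, target removed
y_frame = toy[[target_column]]             # double brackets: one-column DataFrame
y_series = toy[target_column]              # single brackets: Series

print(X_toy.shape, y_frame.shape, y_series.shape)
```

`drop` returns a new DataFrame, so the original table is left unchanged and can still be used to select the target afterwards.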

The AutoMLClassifier constructor takes the output path to be used and the maximum running time in minutes (time_bound); the cell below sets only time_bound. To search for pipelines, call the fit method, which receives the feature and label columns.

[5]:
automl = AutoMLClassifier(time_bound=1)
automl.fit(X_train, y_train)
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2638888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5555555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7777777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6805555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5555555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1388888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:12, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7361111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5972222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6805555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4166666666666667
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.09722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.20833333333333334
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:13, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.75
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3194444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.20833333333333334
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7916666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:14, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.75
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4583333333333333
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3472222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:15, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4166666666666667
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6666666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2638888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4166666666666667
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4305555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5555555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:16, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:18, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.375
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:18, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3194444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:18, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3888888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4305555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7361111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6111111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6805555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7638888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:19, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6944444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6111111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5555555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:20, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2916666666666667
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:21, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:21, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4027777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:21, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:21, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:22, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:22, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:22, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4583333333333333
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:22, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.06944444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4305555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2222222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3333333333333333
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4305555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4861111111111111
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:23, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5555555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:30, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5833333333333334
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:32, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:32, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7361111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:32, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:32, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1388888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:32, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4166666666666667
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3055555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.08333333333333333
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:34, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:40, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5833333333333334
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:40, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.027777777777777776
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:40, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.375
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:40, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2777777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:40, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1388888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5138888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7361111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5694444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4305555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3055555555555556
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4444444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:41, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.09722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5416666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5277777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3611111111111111
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6388888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.09722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7361111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2222222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7638888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:42, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.75
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:43, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6388888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:43, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6111111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:43, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6111111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:43, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.20833333333333334
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5972222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.08333333333333333
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6388888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.5277777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2361111111111111
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:44, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7638888888888888
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6111111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1388888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7916666666666666
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.4166666666666667
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6944444444444444
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.06944444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.09722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.6111111111111112
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:45, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3472222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:50, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3472222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:50, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.3472222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:53, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:53, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1527777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:56, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.19444444444444445
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:56, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.18055555555555555
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:56, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.09722222222222222
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:56, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1527777777777778
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:56, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.25
INFO:alpha_automl.automl_api:Found pipeline, time=0:01:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.1388888888888889
INFO:alpha_automl.automl_api:Found 162 pipelines
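
The search log is long, so it can help to summarize it rather than read it line by line. The sketch below is not part of Alpha-AutoML: `summarize_scores` is a hypothetical helper that assumes the log has been captured as text and that scored pipelines are reported in the "Scored pipeline, score=..." format shown above. It uses only the Python standard library.

```python
import re

# Matches lines such as:
# INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
SCORE_PATTERN = re.compile(r"Scored pipeline, score=([0-9.]+)")


def summarize_scores(log_text):
    """Return (count, best, mean) for the scores found in log_text.

    Hypothetical helper, assuming the log format shown in this example.
    """
    scores = [float(m.group(1)) for m in SCORE_PATTERN.finditer(log_text)]
    if not scores:
        return 0, None, None
    return len(scores), max(scores), sum(scores) / len(scores)


# A few lines copied from the log above.
sample_log = """\
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:03, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.125
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:04, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.2638888888888889
INFO:alpha_automl.automl_api:Found pipeline, time=0:00:05, scoring...
INFO:alpha_automl.automl_api:Scored pipeline, score=0.7777777777777778
"""

count, best, mean = summarize_scores(sample_log)
print(f"{count} scored pipelines, best={best:.3f}, mean={mean:.3f}")
```

For a ranked view of the pipelines themselves, the leaderboard is the more convenient tool; a summary like this is mainly useful for a quick check that the search produced reasonable scores.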

After the pipeline search is complete, we can display the leaderboard:

[6]:
automl.plot_leaderboard()
[6]:
ranking pipeline accuracy_score
1 StandardScaler, ExtraTreesClassifier 0.792
2 StandardScaler, LogisticRegression 0.792
3 MaxAbsScaler, ExtraTreesClassifier 0.778
4 MaxAbsScaler, SVC 0.764
5 SVC 0.764
6 StandardScaler, RandomForestClassifier 0.764
7 ExtraTreesClassifier 0.750
8 RobustScaler, ExtraTreesClassifier 0.750
9 StandardScaler, SVC 0.750
10 MaxAbsScaler, LinearSVC 0.736
11 MaxAbsScaler, RandomForestClassifier 0.736
12 LinearSVC 0.736
13 StandardScaler, LinearSVC 0.736
14 RandomForestClassifier 0.736
15 MaxAbsScaler, BaggingClassifier 0.694
16 StandardScaler, BaggingClassifier 0.694
17 MaxAbsScaler, KNeighborsClassifier 0.681
18 MaxAbsScaler, LogisticRegression 0.681
19 StandardScaler, KNeighborsClassifier 0.681
20 KNeighborsClassifier 0.667
21 LogisticRegression 0.639
22 BaggingClassifier 0.639
23 StandardScaler, SGDClassifier 0.639
24 MaxAbsScaler, LGBMClassifier 0.611
25 MaxAbsScaler, XGBClassifier 0.611
26 XGBClassifier 0.611
27 LGBMClassifier 0.611
28 StandardScaler, LGBMClassifier 0.611
29 StandardScaler, XGBClassifier 0.611
30 MaxAbsScaler, DecisionTreeClassifier 0.597
31 StandardScaler, PassiveAggressiveClassifier 0.597
32 GradientBoostingClassifier 0.583
33 StandardScaler, GradientBoostingClassifier 0.583
34 DecisionTreeClassifier 0.569
35 MaxAbsScaler, LinearDiscriminantAnalysis 0.556
36 MaxAbsScaler, GradientBoostingClassifier 0.556
37 LinearDiscriminantAnalysis 0.556
38 StandardScaler, LinearDiscriminantAnalysis 0.556
39 RobustScaler, LinearDiscriminantAnalysis 0.556
40 MaxAbsScaler, GaussianNB 0.542
41 GaussianNB 0.542
42 StandardScaler, GaussianNB 0.542
43 RobustScaler, GaussianNB 0.542
44 MaxAbsScaler, MultinomialNB 0.542
45 MaxAbsScaler, PassiveAggressiveClassifier 0.542
46 MultinomialNB 0.542
47 SGDClassifier 0.542
48 StandardScaler, BernoulliNB 0.528
49 StandardScaler, DecisionTreeClassifier 0.528
50 MaxAbsScaler, SelectKBest, RandomForestClassifier 0.514
51 StandardScaler, SelectKBest, ExtraTreesClassifier 0.500
52 RobustScaler, SelectKBest, ExtraTreesClassifier 0.500
53 MaxAbsScaler, SelectPercentile, XGBClassifier 0.486
54 MaxAbsScaler, SelectKBest, XGBClassifier 0.472
55 PassiveAggressiveClassifier 0.472
56 MaxAbsScaler, SelectKBest, ExtraTreesClassifier 0.458
57 MaxAbsScaler, SelectPercentile, DecisionTreeClassifier 0.458
58 MaxAbsScaler, SelectPercentile, ExtraTreesClassifier 0.444
59 SelectPercentile, ExtraTreesClassifier 0.444
60 SelectKBest, ExtraTreesClassifier 0.444
61 MaxAbsScaler, SelectPercentile, RandomForestClassifier 0.444
62 MaxAbsScaler, SelectPercentile, LGBMClassifier 0.444
63 MaxAbsScaler, SelectKBest, DecisionTreeClassifier 0.444
64 MaxAbsScaler, SelectKBest, BaggingClassifier 0.444
65 RobustScaler, SelectPercentile, ExtraTreesClassifier 0.431
66 MaxAbsScaler, SGDClassifier 0.431
67 MaxAbsScaler, SelectPercentile, QuadraticDiscriminantAnalysis 0.431
68 MaxAbsScaler, SelectPercentile, BaggingClassifier 0.431
69 MaxAbsScaler, SelectKBest, LGBMClassifier 0.431
70 MaxAbsScaler, SelectPercentile, KNeighborsClassifier 0.417
71 StandardScaler, SelectPercentile, ExtraTreesClassifier 0.417
72 MaxAbsScaler, SelectKBest, KNeighborsClassifier 0.417
73 MaxAbsScaler, SelectKBest, GradientBoostingClassifier 0.417
74 StandardScaler, QuadraticDiscriminantAnalysis 0.417
75 SelectPercentile, KNeighborsClassifier 0.403
76 MaxAbsScaler, QuadraticDiscriminantAnalysis 0.389
77 MaxAbsScaler, SelectPercentile, GradientBoostingClassifier 0.375
78 MaxAbsScaler, SelectKBest, QuadraticDiscriminantAnalysis 0.375
79 QuadraticDiscriminantAnalysis 0.361
80 MaxAbsScaler, SelectKBest, GaussianNB 0.347
81 SelectKBest, GaussianNB 0.347
82 StandardScaler, SelectKBest, GaussianNB 0.347
83 RobustScaler, SelectKBest, GaussianNB 0.347
84 MaxAbsScaler, SelectPercentile, SVC 0.333
85 MaxAbsScaler, SelectPercentile, LinearDiscriminantAnalysis 0.319
86 MaxAbsScaler, SelectKBest, LinearDiscriminantAnalysis 0.319
87 MaxAbsScaler, SelectKBest, LinearSVC 0.306
88 MaxAbsScaler, SelectKBest, SVC 0.306
89 MaxAbsScaler, SelectPercentile, LinearSVC 0.292
90 MaxAbsScaler, SelectKBest, LogisticRegression 0.278
91 MaxAbsScaler, GenericUnivariateSelect, ExtraTreesClassifier 0.264
92 RobustScaler, GenericUnivariateSelect, ExtraTreesClassifier 0.264
93 MaxAbsScaler, GenericUnivariateSelect, DecisionTreeClassifier 0.250
94 MaxAbsScaler, SelectPercentile, GaussianNB 0.250
95 GenericUnivariateSelect, ExtraTreesClassifier 0.250
96 StandardScaler, GenericUnivariateSelect, ExtraTreesClassifier 0.250
97 SelectPercentile, GaussianNB 0.250
98 StandardScaler, SelectPercentile, GaussianNB 0.250
99 RobustScaler, SelectPercentile, GaussianNB 0.250
100 MaxAbsScaler, SelectPercentile, LogisticRegression 0.250
101 GenericUnivariateSelect, DecisionTreeClassifier 0.250
102 StandardScaler, GenericUnivariateSelect, DecisionTreeClassifier 0.250
103 RobustScaler, GenericUnivariateSelect, DecisionTreeClassifier 0.250
104 GenericUnivariateSelect, BaggingClassifier 0.236
105 MaxAbsScaler, SelectPercentile, SGDClassifier 0.222
106 GenericUnivariateSelect, RandomForestClassifier 0.222
107 MaxAbsScaler, GenericUnivariateSelect, LGBMClassifier 0.208
108 MaxAbsScaler, GenericUnivariateSelect, BaggingClassifier 0.208
109 GenericUnivariateSelect, LGBMClassifier 0.208
110 MaxAbsScaler, GenericUnivariateSelect, GaussianNB 0.194
111 MaxAbsScaler, GenericUnivariateSelect, GradientBoostingClassifier 0.194
112 MaxAbsScaler, GenericUnivariateSelect, QuadraticDiscriminantAnalysis 0.194
113 MaxAbsScaler, GenericUnivariateSelect, RandomForestClassifier 0.194
114 MaxAbsScaler, GenericUnivariateSelect, XGBClassifier 0.194
115 GenericUnivariateSelect, GaussianNB 0.194
116 StandardScaler, GenericUnivariateSelect, GaussianNB 0.194
117 RobustScaler, GenericUnivariateSelect, GaussianNB 0.194
118 StandardScaler, GenericUnivariateSelect, KNeighborsClassifier 0.194
119 GenericUnivariateSelect, GradientBoostingClassifier 0.194
120 StandardScaler, GenericUnivariateSelect, GradientBoostingClassifier 0.194
121 RobustScaler, GenericUnivariateSelect, GradientBoostingClassifier 0.194
122 GenericUnivariateSelect, QuadraticDiscriminantAnalysis 0.194
123 GenericUnivariateSelect, XGBClassifier 0.194
124 StandardScaler, GenericUnivariateSelect, QuadraticDiscriminantAnalysis 0.194
125 MaxAbsScaler, GenericUnivariateSelect, KNeighborsClassifier 0.181
126 MaxAbsScaler, GenericUnivariateSelect, SGDClassifier 0.181
127 MaxAbsScaler, GenericUnivariateSelect, SVC 0.181
128 GenericUnivariateSelect, KNeighborsClassifier 0.181
129 RobustScaler, GenericUnivariateSelect, KNeighborsClassifier 0.181
130 GenericUnivariateSelect, SVC 0.181
131 StandardScaler, GenericUnivariateSelect, SVC 0.181
132 StandardScaler, GenericUnivariateSelect, LogisticRegression 0.153
133 RobustScaler, GenericUnivariateSelect, LogisticRegression 0.153
134 MaxAbsScaler, GenericUnivariateSelect, LinearSVC 0.139
135 GenericUnivariateSelect, LinearSVC 0.139
136 MaxAbsScaler, SelectKBest, SGDClassifier 0.139
137 StandardScaler, GenericUnivariateSelect, LinearSVC 0.139
138 RobustScaler, GenericUnivariateSelect, LinearSVC 0.139
139 MaxAbsScaler, GenericUnivariateSelect, LinearDiscriminantAnalysis 0.125
140 MaxAbsScaler, GenericUnivariateSelect, PassiveAggressiveClassifier 0.125
141 GenericUnivariateSelect, LinearDiscriminantAnalysis 0.125
142 StandardScaler, GenericUnivariateSelect, LinearDiscriminantAnalysis 0.125
143 RobustScaler, GenericUnivariateSelect, LinearDiscriminantAnalysis 0.125
144 MaxAbsScaler, SelectPercentile, PassiveAggressiveClassifier 0.125
145 GenericUnivariateSelect, SGDClassifier 0.125
146 MaxAbsScaler, GenericUnivariateSelect, LogisticRegression 0.097
147 MaxAbsScaler, SelectKBest, PassiveAggressiveClassifier 0.097
148 GenericUnivariateSelect, LogisticRegression 0.097
149 StandardScaler, GenericUnivariateSelect, SGDClassifier 0.097
150 RobustScaler, GenericUnivariateSelect, SGDClassifier 0.097
151 MaxAbsScaler, SelectKBest, MultinomialNB 0.083
152 GenericUnivariateSelect, PassiveAggressiveClassifier 0.083
153 MaxAbsScaler, SelectPercentile, MultinomialNB 0.069
154 StandardScaler, GenericUnivariateSelect, BernoulliNB 0.069
155 MaxAbsScaler, GenericUnivariateSelect, BernoulliNB 0.028
156 MaxAbsScaler, GenericUnivariateSelect, MultinomialNB 0.028
157 MaxAbsScaler, BernoulliNB 0.028
158 MaxAbsScaler, SelectPercentile, BernoulliNB 0.028
159 BernoulliNB 0.028
160 GenericUnivariateSelect, BernoulliNB 0.028
161 MaxAbsScaler, SelectKBest, BernoulliNB 0.028
162 GenericUnivariateSelect, MultinomialNB 0.028

Model Exploration

To explore the produced pipelines, we can use PipelineProfiler, a visualization tool that lets users compare and explore the pipelines generated by AutoML systems.

After the pipeline search process is completed, we can use PipelineProfiler with:

Note

You can partially interact with this visualization. Try it in Jupyter Notebook to get full access to all features.

[7]:
automl.plot_comparison_pipelines()

PipelineProfiler shows the produced pipelines as a matrix, where the pipelines are represented as rows, and primitives as columns.

PipelineProfiler matrix view
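
The matrix layout can be approximated outside the widget. The following is a minimal sketch and not PipelineProfiler's own code: it assumes the leaderboard is held in a pandas DataFrame whose column names (`summary`, `score`) are chosen here for illustration, and it copies four rows from the leaderboard above.

```python
import pandas as pd

# Four rows copied from the leaderboard above. The column names
# "summary" and "score" are assumptions made for this sketch.
leaderboard = pd.DataFrame(
    {
        "summary": [
            "QuadraticDiscriminantAnalysis",
            "SelectKBest, GaussianNB",
            "MaxAbsScaler, SelectPercentile, SVC",
            "BernoulliNB",
        ],
        "score": [0.361, 0.347, 0.333, 0.028],
    },
    index=[79, 81, 84, 159],
)

# One row per pipeline, one column per primitive; 1 marks a primitive in use.
matrix = leaderboard["summary"].str.get_dummies(sep=", ")
print(matrix)
```

Each row of `matrix` corresponds to one pipeline and each column to one primitive, which mirrors the rows-and-columns arrangement described above.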

The score view displays performance metrics (e.g. accuracy, F1) of the evaluated pipelines. It can also visualize the training time of each pipeline.

PipelineProfiler performance view

The Primitive Contribution view shows the correlation between primitive usage and the pipeline scores.

PipelineProfiler primitive contribution
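
To make the idea of primitive contribution concrete, the sketch below correlates a 0/1 usage indicator for each primitive with the pipeline score. It assumes plain Pearson correlation, which may differ from the estimator PipelineProfiler actually uses, and it works on six rows copied from the leaderboard above. The column names `summary` and `score` are assumptions made for this sketch.

```python
import pandas as pd

# Six rows copied from the leaderboard above (ranks 78, 81, 94, 115, 155, 161).
leaderboard = pd.DataFrame(
    {
        "summary": [
            "MaxAbsScaler, SelectKBest, QuadraticDiscriminantAnalysis",
            "SelectKBest, GaussianNB",
            "MaxAbsScaler, SelectPercentile, GaussianNB",
            "GenericUnivariateSelect, GaussianNB",
            "MaxAbsScaler, GenericUnivariateSelect, BernoulliNB",
            "MaxAbsScaler, SelectKBest, BernoulliNB",
        ],
        "score": [0.375, 0.347, 0.250, 0.194, 0.028, 0.028],
    },
    index=[78, 81, 94, 115, 155, 161],
)

# Binary usage matrix: 1 if the pipeline contains the primitive, else 0.
usage = leaderboard["summary"].str.get_dummies(sep=", ")

# Pearson correlation between each primitive's usage and the score
# (an assumed stand-in for the contribution measure, not the tool's own code).
contribution = usage.corrwith(leaderboard["score"]).sort_values(ascending=False)
print(contribution)
```

In this small sample, `BernoulliNB` receives a negative value because both pipelines that use it score 0.028, while `SelectKBest` receives a positive one. With so few rows the numbers are only illustrative.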

The Pipeline Comparison view highlights the differences between selected pipelines. It presents a node-link representation of the selected pipelines. Multiple pipelines can be selected by shift-clicking the matrix rows.

PipelineProfiler graph comparison
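
A rough textual analogue of this comparison is a set difference over the primitives of two pipelines. The helper below, `compare_pipelines`, is a hypothetical function written for this sketch and is not part of Alpha-AutoML or PipelineProfiler. Unlike the node-link view, it ignores the order and connections of the steps.

```python
def compare_pipelines(first, second):
    """Split the primitives of two pipeline summaries into shared and unique sets."""
    a = {name.strip() for name in first.split(",")}
    b = {name.strip() for name in second.split(",")}
    return {"shared": a & b, "only_first": a - b, "only_second": b - a}


# Pipelines ranked 84 and 88 in the leaderboard above.
diff = compare_pipelines(
    "MaxAbsScaler, SelectPercentile, SVC",
    "MaxAbsScaler, SelectKBest, SVC",
)
print(diff)
```

Here the two pipelines share the scaler and the classifier and differ only in the feature selection step.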

For more information about how to use PipelineProfiler, click here. There is also a video demo available here.

Separating the features and target columns for the test dataset.

[8]:
X_test = test_dataset.drop(columns=[target_column])
y_test = test_dataset[[target_column]]

The best pipeline’s predictions are accessed with:

[9]:
y_pred = automl.predict(X_test)
y_pred
[9]:
array([ 2,  9, 12,  3, 14, 11, 14, 14,  9,  4,  1,  7,  1,  8,  5, 12, 12,
        5, 11, 11,  3, 11, 11,  7, 15,  9, 13,  3, 15, 12, 10, 12, 15,  8,
        9,  8, 12,  7, 11,  4,  2, 10,  3,  6, 15, 13,  6, 10,  9, 14,  9,
        2,  1,  3, 10,  9,  5,  8, 15,  7, 13,  5, 15,  6, 15,  2,  4,  5,
        7,  6, 14,  2])

The best pipeline can be evaluated against a held-out dataset with the following call:

[10]:
automl.score(X_test, y_test)
INFO:alpha_automl.automl_api:Metric: accuracy_score, Score: 0.8333333333333334
[10]:
{'metric': 'accuracy_score', 'score': 0.8333333333333334}
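
Accuracy is the fraction of predictions that match the true labels. With the 72 test predictions shown earlier, a score of 0.8333 is consistent with 60 correct predictions. The sketch below shows the computation on a tiny sample: the predictions are the first six values of `y_pred` above, while the true labels are hypothetical and invented only for illustration.

```python
import numpy as np

# First six predictions from the y_pred output above.
y_pred = np.array([2, 9, 12, 3, 14, 11])
# Hypothetical true labels, invented for illustration; not the real test labels.
y_true = np.array([2, 9, 12, 3, 14, 7])

# Accuracy: share of positions where prediction and true label agree.
accuracy = float(np.mean(y_true == y_pred))
print(accuracy)
```

Five of the six sample predictions match, so the sketch yields the same 5/6 ratio as the reported score.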

Download this example as a Jupyter notebook file (.ipynb).