Operating configuration spaces (basic)
A configuration space can represent the hyperparameters of a single component (an algorithm) or the hyperparameters of all components contained in a pipeline.
We call a workflow the union of the configuration spaces of different algorithms that, combined, can produce many types of machine learning pipelines.
You can create workflows with the following configuration space operators:
- Chain: concatenates configuration spaces in a fixed sequence
- Shuffle: concatenates configuration spaces in a random order
- Select: selects exactly one of the given configuration spaces
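The following sketch shows how the three operators combine. It ignores hyperparameters entirely and only enumerates which component sequences a workflow can produce.

The nested-tuple encoding and the `pipelines` function are hypothetical, written for illustration only; they are not the pjautoml API. Following the descriptions on this page:
- Chain is modelled as a Cartesian product.
- Select is modelled as a union.
- Shuffle is modelled as the union of the chains over every ordering of its operands.

```python
from itertools import permutations

# Hypothetical expression encoding (NOT the pjautoml API):
#   a plain string is a single component, e.g. "PCA";
#   ("chain", a, b, ...), ("shuffle", a, b, ...) and ("select", a, b, ...)
#   combine sub-expressions. Hyperparameters are ignored.


def _concat_all(subs):
    """Cartesian product of the sub-results, concatenating the sequences."""
    result = [()]
    for sub in subs:
        result = [left + right for left in result for right in sub]
    return result


def pipelines(expr):
    """Return every component sequence that an expression can produce."""
    if isinstance(expr, str):
        return [(expr,)]
    op, *args = expr
    subs = [pipelines(arg) for arg in args]
    if op == "select":
        # A branch: the result is the union of the alternatives.
        return [p for sub in subs for p in sub]
    if op == "chain":
        # A fixed order: Cartesian product in the given sequence.
        return _concat_all(subs)
    if op == "shuffle":
        # Union of the chains obtained from every ordering of the operands.
        result = []
        for order in permutations(subs):
            result.extend(_concat_all(list(order)))
        return result
    raise ValueError(f"unknown operator: {op}")


workflow = ("chain", ("shuffle", "PCA", "MinMax"), ("select", "SVMC", "DT"))
for p in pipelines(workflow):
    print(p)
print(len(pipelines(workflow)))  # 4
```

Under these rules, Shuffle over n single-component spaces yields n! orderings, while Select over the same spaces yields only n alternatives.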
Importing the required packages
import numpy as np
from pjautoml.cs.operator.free.chain import Chain
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
Using Chain
The Chain is a configuration space operator that concatenates other spaces in a sequence. Intuitively, you can see it as the Cartesian product of two or more search spaces.
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 59.28450253804001,
"kernel": "linear",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 0.001,
"cache_size": 200,
"class_weight": "balanced",
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovo",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": null,
"max_depth": 647,
"min_samples_split": 0.1312767257915965,
"min_samples_leaf": 0.2675320084616231,
"min_weight_fraction_leaf": 0.2890988281503088,
"min_impurity_decrease": 0.07668830376515555,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 81.21689166067644,
"kernel": "linear",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 100,
"cache_size": 200,
"class_weight": "balanced",
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovo",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": "auto",
"max_depth": 89,
"min_samples_split": 0.006066499013700276,
"min_samples_leaf": 0.24978612104453587,
"min_weight_fraction_leaf": 0.23344702528495515,
"min_impurity_decrease": 0.17400242964936385,
"seed": 0
}
},
"enhance": true,
"model": true
}
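The following sketch makes the Cartesian-product intuition behind Chain concrete, using plain Python and `itertools.product`. It runs over two small, hypothetical discretised spaces: `scaler_space` and `classifier_space` are invented for illustration, whereas the real spaces in this example are continuous and far larger. The sketch only enumerates combinations and does not use the pjautoml API.

```python
from itertools import product

# Hypothetical, discretised hyperparameter grids (illustration only).
scaler_space = [{"feature_range": (0, 1)}, {"feature_range": (-1, 1)}]
classifier_space = [
    {"kernel": kernel, "C": c}
    for kernel in ("linear", "rbf")
    for c in (1.0, 10.0)
]

# Chaining the two spaces: every pipeline takes one configuration from the
# first space followed by one from the second, always in that order.
chained = list(product(scaler_space, classifier_space))

print(len(chained))  # 8 = 2 * 4
print(chained[0])    # ({'feature_range': (0, 1)}, {'kernel': 'linear', 'C': 1.0})
```

The size of a chained space is the product of the sizes of its parts, which is why chaining several components grows the search space quickly.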
Using Shuffle
The Shuffle is a configuration space operator that concatenates configuration spaces in a sequence, but the order is not maintained. Intuitively, you can see it as the union of the Cartesian products obtained from every ordering of the given configuration spaces.
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.978618342232764
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
-1,
1
]
}
},
"enhance": true,
"model": true
}
You can also use the Python operator @:
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.46147936225293185
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
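The following sketch shows how sampling from a Shuffle could work. It assumes each sub-space can be represented by a function that draws one random configuration.

`sample_pca`, `sample_minmax` and `sample_shuffle` are hypothetical helpers, not pjautoml functions. Their value ranges are only loosely modelled on the output above.

```python
import random


# Hypothetical samplers standing in for the PCA and MinMax spaces.
def sample_pca(rng):
    return ("PCA", {"n": rng.uniform(0.0, 1.0)})


def sample_minmax(rng):
    return ("MinMax", {"feature_range": rng.choice([(0, 1), (-1, 1)])})


def sample_shuffle(samplers, rng):
    """Draw a random order for the sub-spaces, then sample each one in that order."""
    order = list(samplers)  # copy so the caller's list is not reordered
    rng.shuffle(order)
    return [sampler(rng) for sampler in order]


rng = random.Random(0)
for _ in range(3):
    pipeline = sample_shuffle([sample_pca, sample_minmax], rng)
    # Both components are always present; only their order varies.
    print([name for name, _ in pipeline])
```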
Using Select
The Select is a configuration space operator that works like a bifurcation: only one of the given spaces is selected. Intuitively, you can see it as a branch in your search space, where a random factor enables one configuration space or the other.
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 53.737369207578126,
"kernel": "sigmoid",
"degree": 3,
"gamma": 58.201983387312794,
"coef0": 72.06326547259168,
"shrinking": true,
"probability": false,
"tol": 10000,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovo",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": null,
"max_features": null,
"max_depth": 737,
"min_samples_split": 0.06496588977695714,
"min_samples_leaf": 0.04056631680346222,
"min_weight_fraction_leaf": 0.09724230233796423,
"min_impurity_decrease": 0.029934973436736637,
"seed": 0
}
},
"enhance": true,
"model": true
}
You can also use the Python operator +:
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 38.64895946368809,
"kernel": "linear",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 1000,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovr",
"break_ties": false,
"random_state": null
}
},
"enhance": true,
"model": true
}
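The following sketch illustrates Select: each draw takes exactly one branch, so a sample contains a single classifier.

`sample_svmc`, `sample_dt` and `sample_select` are hypothetical helpers, not pjautoml functions. The parameter ranges are illustrative only, and the branch is picked uniformly, which is an assumption of this sketch.

```python
import random
from collections import Counter


# Hypothetical samplers standing in for the SVMC and DT spaces.
def sample_svmc(rng):
    return ("SVMC", {"C": rng.uniform(0.0, 100.0),
                     "kernel": rng.choice(["linear", "rbf", "sigmoid"])})


def sample_dt(rng):
    return ("DT", {"criterion": rng.choice(["gini", "entropy"]),
                   "max_depth": rng.randint(1, 1000)})


def sample_select(samplers, rng):
    """Pick one branch uniformly at random and sample only that branch."""
    branch = rng.choice(samplers)
    return [branch(rng)]


rng = random.Random(0)
draws = [sample_select([sample_svmc, sample_dt], rng) for _ in range(1000)]
# Each draw holds exactly one component; over many draws both branches appear.
print(Counter(draw[0][0] for draw in draws))
```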
Using them all
Using these simple operators, you can create diverse kinds of configuration spaces to represent an end-to-end AutoML problem.
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.43703195379934145
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": null,
"max_depth": 654,
"min_samples_split": 0.05127370463122841,
"min_samples_leaf": 0.10744629193869054,
"min_weight_fraction_leaf": 0.22520584236553687,
"min_impurity_decrease": 0.12156613374309355,
"seed": 0
}
},
"enhance": true,
"model": true
}
You can also use the Python operators:
Out:
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.038425426472734725
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "gini",
"splitter": "best",
"class_weight": null,
"max_features": "auto",
"max_depth": 653,
"min_samples_split": 0.19051802702219556,
"min_samples_leaf": 0.2985898750037986,
"min_weight_fraction_leaf": 0.17455509883156028,
"min_impurity_decrease": 0.08287371764527376,
"seed": 0
}
},
"enhance": true,
"model": true
}
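The following toy model shows how operator syntax like this can work through Python's operator overloading. The classes `Space`, `Component`, `ChainSpace`, `ShuffleSpace` and `SelectSpace` are hypothetical stand-ins, not pjautoml classes.

- Mapping `@` to Shuffle and `+` to Select follows the sections above.
- Mapping `*` to Chain is an assumption made for this sketch.

```python
import random


class Space:
    """Hypothetical base class: Python operators build composite spaces."""

    def __mul__(self, other):     # assumption: A * B -> chain
        return ChainSpace(self, other)

    def __matmul__(self, other):  # A @ B -> shuffle
        return ShuffleSpace(self, other)

    def __add__(self, other):     # A + B -> select
        return SelectSpace(self, other)


class Component(Space):
    def __init__(self, name):
        self.name = name

    def sample(self, rng):
        return [self.name]


class Composite(Space):
    def __init__(self, *spaces):
        self.spaces = spaces


class ChainSpace(Composite):
    def sample(self, rng):
        # Sample every sub-space in the given order.
        return [name for s in self.spaces for name in s.sample(rng)]


class ShuffleSpace(Composite):
    def sample(self, rng):
        # Sample every sub-space, in a random order.
        order = list(self.spaces)
        rng.shuffle(order)
        return [name for s in order for name in s.sample(rng)]


class SelectSpace(Composite):
    def sample(self, rng):
        # Sample exactly one randomly chosen sub-space.
        return rng.choice(self.spaces).sample(rng)


workflow = (Component("PCA") @ Component("MinMax")) * (Component("SVMC") + Component("DT"))
print(workflow.sample(random.Random(0)))
```

Python gives `*` and `@` the same precedence, higher than `+`, so the parentheses around the Select branch are required. Both operators are also left-associative, so `a @ b @ c` builds `ShuffleSpace(ShuffleSpace(a, b), c)`. That nested form cannot produce every ordering of the three operands, so a fuller implementation would likely flatten such nests.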
Total running time of the script: (0 minutes 0.324 seconds)