Operating configuration spaces (basic)

A configuration space can represent the hyperparameters of a single component (an algorithm) or the hyperparameters of all components in a pipeline.

A workflow is the union of the configuration spaces of different algorithms, which together can produce many types of machine learning pipelines.

You can create workflows using the following configuration space operators:

  • Chain –> It creates a sequential chain of configuration spaces
  • Shuffle –> It shuffles the order of the configuration spaces
  • Select –> It selects one of the given configuration spaces

Importing the required packages

import numpy as np

from pjautoml.cs.operator.free.chain import Chain
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax

np.random.seed(0)

Using Chain

Chain is a configuration space operator that concatenates other spaces in a sequence. Intuitively, you can see it as the Cartesian product of two or more search spaces.

exp = Chain(SVMC, DT)
print(exp.sample())

# You can also use the python operator ``*``

exp = SVMC * DT
print(exp.sample())

Out:

{
    "info": {
        "_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
        "config": {
            "C": 59.28450253804001,
            "kernel": "linear",
            "degree": 3,
            "gamma": "scale",
            "coef0": 0.0,
            "shrinking": true,
            "probability": false,
            "tol": 0.001,
            "cache_size": 200,
            "class_weight": "balanced",
            "verbose": false,
            "max_iter": 1000000,
            "decision_function_shape": "ovo",
            "break_ties": false,
            "random_state": null,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "DT@pjpy.modeling.supervised.classifier.dt",
        "config": {
            "criterion": "entropy",
            "splitter": "best",
            "class_weight": "balanced",
            "max_features": null,
            "max_depth": 647,
            "min_samples_split": 0.1312767257915965,
            "min_samples_leaf": 0.2675320084616231,
            "min_weight_fraction_leaf": 0.2890988281503088,
            "min_impurity_decrease": 0.07668830376515555,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
        "config": {
            "C": 81.21689166067644,
            "kernel": "linear",
            "degree": 3,
            "gamma": "scale",
            "coef0": 0.0,
            "shrinking": true,
            "probability": false,
            "tol": 100,
            "cache_size": 200,
            "class_weight": "balanced",
            "verbose": false,
            "max_iter": 1000000,
            "decision_function_shape": "ovo",
            "break_ties": false,
            "random_state": null,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "DT@pjpy.modeling.supervised.classifier.dt",
        "config": {
            "criterion": "entropy",
            "splitter": "best",
            "class_weight": "balanced",
            "max_features": "auto",
            "max_depth": 89,
            "min_samples_split": 0.006066499013700276,
            "min_samples_leaf": 0.24978612104453587,
            "min_weight_fraction_leaf": 0.23344702528495515,
            "min_impurity_decrease": 0.17400242964936385,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}
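
The Cartesian-product intuition behind Chain can be sketched with plain Python. This is a minimal illustration under stated assumptions, not how pjautoml implements Chain: the toy spaces and the `chain` helper below are hypothetical, and each space is reduced to a short list of candidate configurations.

```python
import itertools

# Hypothetical toy spaces: each one is a finite list of candidate configurations.
svm_space = [{"algo": "SVMC", "kernel": k} for k in ("linear", "rbf")]
dt_space = [{"algo": "DT", "criterion": c} for c in ("gini", "entropy")]


def chain(*spaces):
    """Build every pipeline that takes one configuration from each space, in the given order."""
    return [list(combo) for combo in itertools.product(*spaces)]


pipelines = chain(svm_space, dt_space)
for pipeline in pipelines:
    print(pipeline)
```

With two candidates per space, the chained space holds 2 x 2 = 4 pipelines, and every one of them has the SVMC step before the DT step.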

Using Shuffle

Shuffle is a configuration space operator that concatenates configuration spaces in a sequence, but the order is not fixed. Intuitively, you can see it as the union of the Cartesian products over all orderings of the given configuration spaces.

exp = Shuffle(PCA, MinMax)
print(exp.sample())

Out:

{
    "info": {
        "_id": "PCA@pjpy.processing.feature.reductor.pca",
        "config": {
            "n": 0.978618342232764
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "MinMax@pjpy.processing.feature.scaler.minmax",
        "config": {
            "feature_range": [
                -1,
                1
            ]
        }
    },
    "enhance": true,
    "model": true
}

You can also use the Python operator @

exp = PCA @ MinMax
print(exp.sample())

Out:

{
    "info": {
        "_id": "PCA@pjpy.processing.feature.reductor.pca",
        "config": {
            "n": 0.46147936225293185
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "MinMax@pjpy.processing.feature.scaler.minmax",
        "config": {
            "feature_range": [
                0,
                1
            ]
        }
    },
    "enhance": true,
    "model": true
}
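
Shuffle's order-free concatenation can be pictured as a union over every ordering of the given spaces. The sketch below is an assumption-laden toy, not pjautoml code: the `shuffle` helper and the two small spaces are hypothetical names introduced only for illustration.

```python
import itertools

# Hypothetical toy spaces with two candidate configurations each.
pca_space = [{"algo": "PCA", "n": n} for n in (0.5, 0.9)]
minmax_space = [{"algo": "MinMax", "feature_range": r} for r in ((0, 1), (-1, 1))]


def shuffle(*spaces):
    """Union, over every ordering of the spaces, of the Cartesian product in that ordering."""
    pipelines = []
    for ordering in itertools.permutations(spaces):
        pipelines.extend(list(combo) for combo in itertools.product(*ordering))
    return pipelines


shuffled = shuffle(pca_space, minmax_space)
# Collect the distinct step orders that appear in the shuffled space.
orders = {tuple(step["algo"] for step in pipeline) for pipeline in shuffled}
print(len(shuffled))   # 8: two orderings times four combinations each
print(sorted(orders))  # [('MinMax', 'PCA'), ('PCA', 'MinMax')]
```

This is why one sample may list PCA before MinMax and another may list MinMax first.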

Using Select

Select is a configuration space operator that works like a branch point: only one of the given spaces is selected. Intuitively, you can see it as a fork in your search space, where a random draw enables one configuration space or another.

exp = Chain(SVMC, DT)
print(exp.sample())

Out:

{
    "info": {
        "_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
        "config": {
            "C": 53.737369207578126,
            "kernel": "sigmoid",
            "degree": 3,
            "gamma": 58.201983387312794,
            "coef0": 72.06326547259168,
            "shrinking": true,
            "probability": false,
            "tol": 10000,
            "cache_size": 200,
            "class_weight": null,
            "verbose": false,
            "max_iter": 1000000,
            "decision_function_shape": "ovo",
            "break_ties": false,
            "random_state": null,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "DT@pjpy.modeling.supervised.classifier.dt",
        "config": {
            "criterion": "entropy",
            "splitter": "best",
            "class_weight": null,
            "max_features": null,
            "max_depth": 737,
            "min_samples_split": 0.06496588977695714,
            "min_samples_leaf": 0.04056631680346222,
            "min_weight_fraction_leaf": 0.09724230233796423,
            "min_impurity_decrease": 0.029934973436736637,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}

To select one of the two spaces instead of chaining them as above, use Select(SVMC, DT) or the Python operator +

exp = SVMC + DT
print(exp.sample())

Out:

{
    "info": {
        "_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
        "config": {
            "C": 38.64895946368809,
            "kernel": "linear",
            "degree": 3,
            "gamma": "scale",
            "coef0": 0.0,
            "shrinking": true,
            "probability": false,
            "tol": 1000,
            "cache_size": 200,
            "class_weight": null,
            "verbose": false,
            "max_iter": 1000000,
            "decision_function_shape": "ovr",
            "break_ties": false,
            "random_state": null
        }
    },
    "enhance": true,
    "model": true
}
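
The branching behaviour of Select can be sketched as a union of alternatives from which a single one is drawn. Everything below is hypothetical: the helpers `select` and `sample_select` are illustrative names, and picking the branch uniformly at random is an assumption, since the text above does not say how pjautoml weights the branches.

```python
import random

# Hypothetical toy spaces: each one is a finite list of candidate configurations.
svm_space = [{"algo": "SVMC", "kernel": k} for k in ("linear", "rbf")]
dt_space = [{"algo": "DT", "criterion": c} for c in ("gini", "entropy")]


def select(*spaces):
    """Union of the spaces: each pipeline holds one configuration from exactly one space."""
    return [[config] for space in spaces for config in space]


def sample_select(rng, *spaces):
    """Pick one branch at random (assumed uniform), then one configuration inside it."""
    branch = rng.choice(spaces)
    return rng.choice(branch)


alternatives = select(svm_space, dt_space)
picked = sample_select(random.Random(0), svm_space, dt_space)
print(len(alternatives))  # 4: two SVMC candidates plus two DT candidates
print(picked)
```

Unlike a chain, the size of the selected space is the sum of the branch sizes rather than their product, and each sample contains a single component.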

Using them all

Using these simple operations, you can create diverse kinds of configuration spaces to represent an end-to-end AutoML problem.

exp = Chain(Shuffle(PCA, MinMax), Select(SVMC, DT))
print(exp.sample())

Out:

{
    "info": {
        "_id": "PCA@pjpy.processing.feature.reductor.pca",
        "config": {
            "n": 0.43703195379934145
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "MinMax@pjpy.processing.feature.scaler.minmax",
        "config": {
            "feature_range": [
                0,
                1
            ]
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "DT@pjpy.modeling.supervised.classifier.dt",
        "config": {
            "criterion": "entropy",
            "splitter": "best",
            "class_weight": "balanced",
            "max_features": null,
            "max_depth": 654,
            "min_samples_split": 0.05127370463122841,
            "min_samples_leaf": 0.10744629193869054,
            "min_weight_fraction_leaf": 0.22520584236553687,
            "min_impurity_decrease": 0.12156613374309355,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}

You can also use Python operators

exp = PCA @ MinMax * (SVMC + DT)
print(exp.sample())

Out:

{
    "info": {
        "_id": "MinMax@pjpy.processing.feature.scaler.minmax",
        "config": {
            "feature_range": [
                0,
                1
            ]
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "PCA@pjpy.processing.feature.reductor.pca",
        "config": {
            "n": 0.038425426472734725
        }
    },
    "enhance": true,
    "model": true
}
{
    "info": {
        "_id": "DT@pjpy.modeling.supervised.classifier.dt",
        "config": {
            "criterion": "gini",
            "splitter": "best",
            "class_weight": null,
            "max_features": "auto",
            "max_depth": 653,
            "min_samples_split": 0.19051802702219556,
            "min_samples_leaf": 0.2985898750037986,
            "min_weight_fraction_leaf": 0.17455509883156028,
            "min_impurity_decrease": 0.08287371764527376,
            "seed": 0
        }
    },
    "enhance": true,
    "model": true
}
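
When the operators are mixed in one expression, Python's own precedence rules decide how it is grouped: `@` and `*` share the same precedence and associate left to right, while `+` binds more loosely. The sketch below uses a hypothetical `Space` class (not the pjautoml classes) whose operators only record the resulting structure, mapping `*` to Chain, `@` to Shuffle and `+` to Select as described in this tutorial.

```python
class Space:
    """Hypothetical stand-in for a configuration space; it only records how it was built."""

    def __init__(self, label):
        self.label = label

    def __mul__(self, other):     # `*` plays the role of Chain
        return Space(f"Chain({self.label}, {other.label})")

    def __matmul__(self, other):  # `@` plays the role of Shuffle
        return Space(f"Shuffle({self.label}, {other.label})")

    def __add__(self, other):     # `+` plays the role of Select
        return Space(f"Select({self.label}, {other.label})")

    def __repr__(self):
        return self.label


pca, minmax, svmc, dt = (Space(name) for name in ("PCA", "MinMax", "SVMC", "DT"))

with_parens = pca @ minmax * (svmc + dt)
without_parens = pca @ minmax * svmc + dt

print(with_parens)     # Chain(Shuffle(PCA, MinMax), Select(SVMC, DT))
print(without_parens)  # Select(Chain(Shuffle(PCA, MinMax), SVMC), DT)
```

The parentheses around the `+` term are therefore what keep the selection between the two classifiers as the last step of the chain.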
