Welcome to pjautoml's documentation!¶
Install¶
The pjautoml package is available on PyPI. You can install it via pip as follows:
pip install -U pjautoml
You can also install the development version directly from GitHub:
pip install -U git+https://github.com/end-to-end-data-science/pjautoml.git
If you prefer, you can clone the repository and run the setup.py file. Use the following commands to get a copy from GitHub and install all dependencies:
git clone git@github.com:end-to-end-data-science/pjautoml.git
cd pjautoml
pip install .
Test and coverage¶
If you want to run the tests and check coverage before installing:
$ make install-dev
$ make test-cov
Or:
$ make install-dev
$ pytest --cov=pjautoml/ tests/
API Documentation¶
This is the full API documentation of the pjautoml package.
pjautoml.cs: Configuration Space¶
Operand¶
graph.graph.Graph([name, path, nodes]) | TODO.
graph.node.Node([params, children]) | Partial settings for a component.
list.flist.ListCS(*css) | Finite Config Space (FCS) is a representation of a discrete CS.
list.flist.CList(*css) |
list.flist.FList(*css) |
Operator¶
Data-driven configuration space operator¶
optimization.modelfree.best.Best(listcs[, …]) |
optimization.modelfree.random.RandomSearch(cs) |
Configuration space operators¶
container.Container(*args, seed, name, path, …) | TODO.
map.Map(*args[, seed]) | TODO.
multi.Multi(*args[, seed]) | TODO.
sample.Sample(cs[, n]) | TODO.
chain.Chain(*css, **kwargs) | TODO.
select.Select(*css, **kwargs) | TODO.
shuffle.Shuffle(*css, **kwargs) | A permutation is sampled.
pjautoml.util: Util Classes and Functions¶
parameter.Param(function, **kwargs) | Base class for all kinds of algorithm (hyper)parameters.
parameter.CatP(function, **kwargs) |
parameter.IntP(function, **kwargs) |
parameter.FixedP(value) |
parameter.OrdP(function, **kwargs) |
parameter.RealP(function, **kwargs) |
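The parameter kinds listed above (categorical, integer, fixed, ordinal, real) can be illustrated with a minimal sketch in plain Python. The class names mirror the table, but the constructors and sampling logic are illustrative assumptions, not the pjautoml API:

```python
import random

class FixedP:
    """A parameter frozen at a single value."""
    def __init__(self, value):
        self.value = value
    def sample(self):
        return self.value

class CatP:
    """A categorical parameter: one of a finite, unordered set."""
    def __init__(self, *choices):
        self.choices = choices
    def sample(self):
        return random.choice(self.choices)

class IntP:
    """An integer parameter drawn uniformly from [low, high]."""
    def __init__(self, low, high):
        self.low, self.high = low, high
    def sample(self):
        return random.randint(self.low, self.high)

class RealP:
    """A real-valued parameter drawn uniformly from [low, high)."""
    def __init__(self, low, high):
        self.low, self.high = low, high
    def sample(self):
        return random.uniform(self.low, self.high)

random.seed(0)
# A config resembling the SVMC samples shown later in the examples.
config = {
    "kernel": CatP("linear", "rbf", "sigmoid").sample(),
    "max_iter": FixedP(1000000).sample(),
    "degree": IntP(1, 5).sample(),
    "C": RealP(0.01, 100.0).sample(),
}
print(config)
```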
pjautoml.abs: Abstract Classes and Mixins¶
The pjautoml.abs submodule contains abstract classes and mixins.
mixin.asoperand.AsOperandCS |
The pjautoml example gallery¶
The pjautoml library aims to provide easy tools to create AutoML systems from scratch. It adds new and elegant ways to deal and operate with configuration spaces. Moreover, it provides some AutoML systems based on the literature.
Below we present a gallery with examples of use:
Introductory Examples¶
Introductory examples of pjautoml package.
Operating configuration spaces (basic)¶
A configuration space can represent the hyperparameters of a single component (an algorithm) or the hyperparameters of all components contained in the pipeline.
A workflow represents the union of configuration spaces of different algorithms, which together can create a multitude of machine learning pipeline types.
You can create workflows using the following configuration space operators:
- Chain –> It creates a sequential chain of configuration spaces
- Shuffle –> It shuffles the order of the configuration spaces
- Select –> It selects one of the given configuration spaces
Importing the required packages
import numpy as np
from pjautoml.cs.operator.free.chain import Chain
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
Using Chain¶
The Chain is a configuration space operator that concatenates other spaces in a sequence. Intuitively, you can see it as a Cartesian product between two or more search spaces.
exp = Chain(SVMC, DT)
print(exp.sample())
# You can also use the python operator ``*``
exp = SVMC * DT
print(exp.sample())
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 59.28450253804001,
"kernel": "linear",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 0.001,
"cache_size": 200,
"class_weight": "balanced",
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovo",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": null,
"max_depth": 647,
"min_samples_split": 0.1312767257915965,
"min_samples_leaf": 0.2675320084616231,
"min_weight_fraction_leaf": 0.2890988281503088,
"min_impurity_decrease": 0.07668830376515555,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 81.21689166067644,
"kernel": "linear",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 100,
"cache_size": 200,
"class_weight": "balanced",
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovo",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": "auto",
"max_depth": 89,
"min_samples_split": 0.006066499013700276,
"min_samples_leaf": 0.24978612104453587,
"min_weight_fraction_leaf": 0.23344702528495515,
"min_impurity_decrease": 0.17400242964936385,
"seed": 0
}
},
"enhance": true,
"model": true
}
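The Cartesian-product intuition behind Chain can be sketched with plain Python over two tiny discrete spaces (illustrative only; real pjautoml configuration spaces are typically continuous):

```python
from itertools import product

# Two hypothetical discrete configuration spaces.
svmc_space = [{"kernel": k} for k in ("linear", "rbf")]
dt_space = [{"criterion": c} for c in ("gini", "entropy")]

# Chaining two spaces yields every pair: one point from each space.
chained = list(product(svmc_space, dt_space))
print(len(chained))  # 2 * 2 = 4 pipelines
```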
Using Shuffle¶
The Shuffle is a configuration space operator that concatenates configuration spaces in a sequence, but the order is not maintained. Intuitively, you can see it as the union of the Cartesian products of all orderings of the configuration spaces.
exp = Shuffle(PCA, MinMax)
print(exp.sample())
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.978618342232764
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
-1,
1
]
}
},
"enhance": true,
"model": true
}
You can also use the Python operator ``@``
exp = PCA @ MinMax
print(exp.sample())
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.46147936225293185
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
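The permutation behaviour of Shuffle can be sketched as sampling one point from each space and emitting them in a random order (a sketch, not the pjautoml implementation):

```python
import random

def shuffle_sample(*spaces):
    """Sample one point from each space, in a random order."""
    order = random.sample(range(len(spaces)), len(spaces))
    return [spaces[i]() for i in order]

random.seed(0)
# Hypothetical samplers standing in for PCA and MinMax spaces.
pca = lambda: {"_id": "PCA", "n": random.random()}
minmax = lambda: {"_id": "MinMax", "feature_range": random.choice([(0, 1), (-1, 1)])}

pipeline = shuffle_sample(pca, minmax)
print([step["_id"] for step in pipeline])  # both steps, in some order
```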
Using Select¶
The Select is a configuration space operator that works like a bifurcation, where only one of the spaces will be selected. Intuitively, you can see it as a branch created in your search space in which a random factor enables one or another configuration space.
exp = Select(SVMC, DT)
print(exp.sample())
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 53.737369207578126,
"kernel": "sigmoid",
"degree": 3,
"gamma": 58.201983387312794,
"coef0": 72.06326547259168,
"shrinking": true,
"probability": false,
"tol": 10000,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovo",
"break_ties": false,
"random_state": null,
"seed": 0
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": null,
"max_features": null,
"max_depth": 737,
"min_samples_split": 0.06496588977695714,
"min_samples_leaf": 0.04056631680346222,
"min_weight_fraction_leaf": 0.09724230233796423,
"min_impurity_decrease": 0.029934973436736637,
"seed": 0
}
},
"enhance": true,
"model": true
}
You can also use the Python operator ``+``
exp = SVMC + DT
print(exp.sample())
Out:
{
"info": {
"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc",
"config": {
"C": 38.64895946368809,
"kernel": "linear",
"degree": 3,
"gamma": "scale",
"coef0": 0.0,
"shrinking": true,
"probability": false,
"tol": 1000,
"cache_size": 200,
"class_weight": null,
"verbose": false,
"max_iter": 1000000,
"decision_function_shape": "ovr",
"break_ties": false,
"random_state": null
}
},
"enhance": true,
"model": true
}
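The branching behaviour of Select can be sketched as picking exactly one of the given spaces at random and sampling from it (again, a sketch, not the pjautoml implementation):

```python
import random

def select_sample(*spaces):
    """Pick one of the spaces at random and sample from it."""
    return random.choice(spaces)()

random.seed(0)
# Hypothetical samplers standing in for the SVMC and DT spaces.
svmc = lambda: {"_id": "SVMC", "C": random.uniform(0.01, 100)}
dt = lambda: {"_id": "DT", "max_depth": random.randint(1, 1000)}

step = select_sample(svmc, dt)
print(step["_id"])  # exactly one of "SVMC" or "DT"
```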
Using them all¶
Using these simple operations, you can create diverse kinds of configuration spaces to represent an end-to-end AutoML problem.
exp = Chain(Shuffle(PCA, MinMax), Select(SVMC + DT))
print(exp.sample())
Out:
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.43703195379934145
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": null,
"max_depth": 654,
"min_samples_split": 0.05127370463122841,
"min_samples_leaf": 0.10744629193869054,
"min_weight_fraction_leaf": 0.22520584236553687,
"min_impurity_decrease": 0.12156613374309355,
"seed": 0
}
},
"enhance": true,
"model": true
}
You can also use Python operators
exp = PCA @ MinMax * (SVMC + DT)
print(exp.sample())
Out:
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.038425426472734725
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "gini",
"splitter": "best",
"class_weight": null,
"max_features": "auto",
"max_depth": 653,
"min_samples_split": 0.19051802702219556,
"min_samples_leaf": 0.2985898750037986,
"min_weight_fraction_leaf": 0.17455509883156028,
"min_impurity_decrease": 0.08287371764527376,
"seed": 0
}
},
"enhance": true,
"model": true
}
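How the three operators compose can be sketched by treating each space as a sampler and each operator as a higher-order function over samplers (all names here are illustrative, not the pjautoml API):

```python
import random

random.seed(0)
# Hypothetical leaf samplers.
pca = lambda: {"_id": "PCA"}
minmax = lambda: {"_id": "MinMax"}
svmc = lambda: {"_id": "SVMC"}
dt = lambda: {"_id": "DT"}

def shuffle(*spaces):
    """One point from each space, in a random order."""
    return lambda: [s() for s in random.sample(spaces, len(spaces))]

def select(*spaces):
    """One point from exactly one randomly chosen space."""
    return lambda: [random.choice(spaces)()]

def chain(*spaces):
    """Concatenate the samples of each space in sequence."""
    return lambda: [step for s in spaces for step in s()]

# Mirrors: Chain(Shuffle(PCA, MinMax), Select(SVMC + DT))
exp = chain(shuffle(pca, minmax), select(svmc, dt))
pipeline = exp()
print([step["_id"] for step in pipeline])
```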
Total running time of the script: ( 0 minutes 0.324 seconds)
Searching for good pipelines from a workflow¶
Let’s run a random search to find a good machine learning pipeline for a given problem.
Importing the required packages
import numpy as np
from pjautoml.cs.operator.datadriven.optimization.modelfree.random import RandomSearch
from pjautoml.cs.operator.free.map import Map
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjautoml.cs.workflow import Workflow
from pjml.data.communication.report import Report
from pjml.data.evaluation.metric import Metric
from pjml.data.flow.file import File
from pjml.stream.expand.partition import Partition
from pjml.stream.reduce.reduce import Reduce
from pjml.stream.reduce.summ import Summ
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
First, we must create a workflow.
workflow = Workflow(
File("../data/iris.arff"),
Partition(),
Map(Shuffle(PCA, MinMax), Select(SVMC + DT), Metric()),
Summ(function="mean"),
Reduce(),
Report("Mean S: $S"),
)
Now, we run the random search over the workflow we created. We will get the best pipeline found after 30 samples. The test performance of each sampled pipeline will be printed.
rs1 = RandomSearch(workflow, sample=30)
print(len(rs1.datas))
print(len(rs1.components))
Out:
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94]])
[model] Mean S: array([[0.93333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.97333333]])
[model] Mean S: array([[0.38]])
[model] Mean S: array([[0.00666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.95333333]])
[model] Mean S: array([[0.94]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.96]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.93333333]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.96]])
[model] Mean S: array([[0.95333333]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.96]])
[model] Mean S: array([[0.94]])
1
1
The best pipeline found is:
res_train, res_test = rs1.datas[0]
print("Train result: ", res_train)
print("test result: ", res_test)
Out:
Train result: <pjdata.content.data.Data object at 0x7f6f3fcdae80>
test result: <pjdata.content.data.Data object at 0x7f6f3fc846d0>
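Stripped of the pjautoml machinery, the random-search pattern used here is just sample, evaluate, keep the best. A generic sketch with a toy objective (the function names and the objective are illustrative assumptions, not pjautoml's API):

```python
import random

random.seed(0)

def sample_pipeline():
    """Stand-in for workflow.sample(): draw one hyperparameter config."""
    return {"C": random.uniform(0.01, 100)}

def evaluate(config):
    """Stand-in for cross-validated accuracy (toy objective peaking at C=50)."""
    return 1.0 - abs(config["C"] - 50) / 100

best_config, best_score = None, float("-inf")
for _ in range(30):  # sample=30, as in the example above
    config = sample_pipeline()
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_score)
```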
Total running time of the script: ( 0 minutes 3.518 seconds)
Creating an end-to-end workflow¶
Let’s create an end-to-end machine learning workflow.
Importing the required packages
import numpy as np
from pjautoml.cs.operator.free.chain import Chain
from pjautoml.cs.operator.free.map import Map
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjautoml.cs.workflow import Workflow
from pjml.data.communication.report import Report
from pjml.data.evaluation.metric import Metric
from pjml.data.flow.file import File
from pjml.stream.expand.partition import Partition
from pjml.stream.reduce.reduce import Reduce
from pjml.stream.reduce.summ import Summ
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
First, we create a machine learning expression.
exp = Chain(Shuffle(PCA, MinMax), Select(SVMC + DT))
It represents a configuration space. Let’s get a sample from it.
print(exp.sample())
Out:
{
"info": {
"_id": "MinMax@pjpy.processing.feature.scaler.minmax",
"config": {
"feature_range": [
0,
1
]
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "PCA@pjpy.processing.feature.reductor.pca",
"config": {
"n": 0.7151893663724195
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "DT@pjpy.modeling.supervised.classifier.dt",
"config": {
"criterion": "entropy",
"splitter": "best",
"class_weight": "balanced",
"max_features": null,
"max_depth": 425,
"min_samples_split": 0.1937685880258838,
"min_samples_leaf": 0.1312767257915965,
"min_weight_fraction_leaf": 0.2675319002346239,
"min_impurity_decrease": 0.19273255210020587,
"seed": 0
}
},
"enhance": true,
"model": true
}
Having defined our machine learning expression, we will create an end-to-end workflow.
workflow = Workflow(
File("../data/iris.arff"),
Partition(),
Map(exp, Metric()),
Summ(function="mean"),
Reduce(),
Report("Mean S: $S"),
)
Or, using only Python operators:
workflow = (
File("../data/iris.arff")
* Partition()
* Map(exp * Metric())
* Summ(function="mean")
* Reduce()
* Report("Mean S: $S")
)
This workflow represents the union of all configuration spaces. Let’s get a sample from it:
spl = workflow.sample()
print(spl)
Out:
{
"info": {
"_id": "File@pjml.data.flow.file",
"config": {
"name": "../data/iris.arff",
"path": "./",
"description": "No description.",
"hashes": {
"X": "0ǏǍɽĊũÊүȏŵҖSîҕ",
"Y": "0ЄϒɐĵǏȂϗƽўýÎʃȆ",
"Xd": "5ɫңɖŇǓήʼnÝʑΏƀЀǔ",
"Yd": "5mϛǖͶƅĞOȁЎžʛѲƨ",
"Xt": "5ȥΔĨӑËҭȨƬδſΧȰɩ",
"Yt": "5έēPaӹЄźգǩȱɟǟǹ"
}
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Partition@pjml.stream.expand.partition",
"config": {
"split_type": "cv",
"partitions": 10,
"seed": 0,
"fields": "X,Y"
}
},
"enhance": true,
"model": true
}
Map>>
{"info": {"_id": "Pipeline@pjml.operator.pipeline","config": {"components": [{"info": {"_id": "MinMax@pjpy.processing.feature.scaler.minmax","config": {"feature_range": [0,1],"model": true,"enhance": true}},"enhance": true,"model": true},{"info": {"_id": "PCA@pjpy.processing.feature.reductor.pca","config": {"n": 0.7917250380826646,"model": true,"enhance": true}},"enhance": true,"model": true}],"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "DT@pjpy.modeling.supervised.classifier.dt","config": {"criterion": "gini","splitter": "best","class_weight": null,"max_features": "sqrt","max_depth": 926,"min_samples_split": 0.021311746423307888,"min_samples_leaf": 0.026139702781162514,"min_weight_fraction_leaf": 0.006065519232097715,"min_impurity_decrease": 0.1665239691095876,"seed": 0,"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "Metric@pjml.data.evaluation.metric","config": {"functions": ["accuracy"],"target": "Y","prediction": "Z","model": true,"enhance": true}},"enhance": true,"model": true}
<<Map
{
"info": {
"_id": "Summ@pjml.stream.reduce.summ",
"config": {
"field": "R",
"function": "mean"
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Reduce@pjml.stream.reduce.reduce",
"config": {}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Report@pjml.data.communication.report",
"config": {
"text": "Mean S: $S"
}
},
"enhance": true,
"model": true
}
Total running time of the script: ( 0 minutes 0.041 seconds)
Advanced Examples¶
Advanced examples of pjautoml package.
Creating random search with configuration spaces’ operations¶
Let’s create a random search from scratch using the other two essential configuration space operations: Best and Sample.
Importing the required packages
import numpy as np
from pjautoml.cs.operator.datadriven.optimization.modelfree.best import Best
from pjautoml.cs.operator.free.map import Map
from pjautoml.cs.operator.free.sample import Sample
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjautoml.cs.workflow import Workflow
from pjml.data.communication.report import Report
from pjml.data.evaluation.metric import Metric
from pjml.data.flow.file import File
from pjml.stream.expand.partition import Partition
from pjml.stream.reduce.reduce import Reduce
from pjml.stream.reduce.summ import Summ
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
This is the workflow we will work on. The workflow is also a representation of our configuration space, i.e., it represents all machine learning pipelines that can be achieved.
workflow = Workflow(
File("../data/iris.arff"),
Partition(),
Map(Shuffle(PCA, MinMax), Select(SVMC + DT), Metric()),
Summ(function="mean"),
Reduce(),
Report("Mean S: $S"),
)
Using Sample:
The operation Sample will sample n different pipelines. It transforms an infinite configuration space into a finite one.
spl = Sample(workflow, n=10)
print(len(spl))
Out:
10
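Sample's role, materializing a finite list of n draws from an infinite space, can be sketched as (illustrative names, not the pjautoml API):

```python
import random

random.seed(0)

def workflow_sample():
    """Stand-in for workflow.sample(): one point of a continuous space."""
    return {"n_components": random.random()}

def sample(space_sampler, n=10):
    """Materialize a finite configuration space of n sampled points."""
    return [space_sampler() for _ in range(n)]

spl = sample(workflow_sample, n=10)
print(len(spl))  # 10
```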
Using Best:
The operation Best will return the pipelines with the best performance (highest value in data.S). By default, the best is the highest value, but you can also define the best as the lowest value.
best_2 = Best(spl, n=2)
print(len(best_2.datas))
print(len(best_2.components))
Out:
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94]])
[model] Mean S: array([[0.93333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.97333333]])
[model] Mean S: array([[0.38]])
2
2
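Best's behaviour, keeping the n configurations with the best score (with lowest-is-best as an option), can be sketched with a sort. The scores below are made-up illustrations:

```python
# Hypothetical (config, score) pairs, e.g. mean accuracies from evaluation.
scored = [
    ({"C": 10}, 0.33),
    ({"C": 55}, 0.97),
    ({"C": 42}, 0.94),
    ({"C": 3}, 0.38),
]

def best(scored_configs, n=2, maximize=True):
    """Return the n configs with the best score."""
    ranked = sorted(scored_configs, key=lambda cs: cs[1], reverse=maximize)
    return ranked[:n]

best_2 = best(scored, n=2)
print([score for _, score in best_2])  # [0.97, 0.94]
```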
Total running time of the script: ( 0 minutes 1.186 seconds)
Creating an end-to-end AutoML from scratch¶
Let’s create an AutoML from scratch.
Importing the required packages
import numpy as np
from pjautoml.cs.operator.datadriven.optimization.modelfree.random import RandomSearch
from pjautoml.cs.operator.free.map import Map
from pjautoml.cs.operator.free.select import Select
from pjautoml.cs.operator.free.shuffle import Shuffle
from pjautoml.cs.workflow import Workflow
from pjml.data.communication.report import Report
from pjml.data.evaluation.metric import Metric
from pjml.data.flow.file import File
from pjml.stream.expand.partition import Partition
from pjml.stream.reduce.reduce import Reduce
from pjml.stream.reduce.summ import Summ
from pjpy.modeling.supervised.classifier.dt import DT
from pjpy.modeling.supervised.classifier.svmc import SVMC
from pjpy.processing.feature.reductor.pca import PCA
from pjpy.processing.feature.scaler.minmax import MinMax
np.random.seed(0)
First, we define a workflow. Notice we do not add a File. Then, we use random search as the optimization process to select the best pipeline. Finally, we should also give it a name. The name of my AutoML, of course, is my_automl :)
def my_automl(data):
workflow = Workflow(
Partition(),
Map(Shuffle(PCA, MinMax), Select(SVMC + DT), Metric()),
Summ(function="mean"),
Reduce(),
Report("Mean S: $S"),
)
rs = RandomSearch(workflow, sample=30, train=data, test=data)
return rs.components[0]
Now, let’s find a good pipeline for the iris dataset:
data = File("../data/iris.arff").data
best_pipeline = my_automl(data)
print(best_pipeline)
Out:
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94]])
[model] Mean S: array([[0.93333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.97333333]])
[model] Mean S: array([[0.38]])
[model] Mean S: array([[0.00666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.95333333]])
[model] Mean S: array([[0.94]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.96]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.93333333]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.96666667]])
[model] Mean S: array([[0.94666667]])
[model] Mean S: array([[0.96]])
[model] Mean S: array([[0.95333333]])
[model] Mean S: array([[0.33333333]])
[model] Mean S: array([[0.96]])
[model] Mean S: array([[0.94]])
{
"info": {
"_id": "Partition@pjml.stream.expand.partition",
"config": {
"split_type": "cv",
"partitions": 10,
"seed": 0,
"fields": "X,Y"
}
},
"enhance": true,
"model": true
}
Map>>
{"info": {"_id": "Pipeline@pjml.operator.pipeline","config": {"components": [{"info": {"_id": "MinMax@pjpy.processing.feature.scaler.minmax","config": {"feature_range": [-1,1],"model": true,"enhance": true}},"enhance": true,"model": true},{"info": {"_id": "PCA@pjpy.processing.feature.reductor.pca","config": {"n": 0.9764594650133958,"model": true,"enhance": true}},"enhance": true,"model": true}],"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "SVMC@pjpy.modeling.supervised.classifier.svmc","config": {"C": 97.6761111429249,"kernel": "linear","degree": 3,"gamma": "scale","coef0": 0.0,"shrinking": false,"probability": false,"tol": 0.1,"cache_size": 200,"class_weight": "balanced","verbose": false,"max_iter": 1000000,"decision_function_shape": "ovo","break_ties": false,"random_state": null,"seed": 0,"model": true,"enhance": true}},"enhance": true,"model": true}
{"info": {"_id": "Metric@pjml.data.evaluation.metric","config": {"functions": ["accuracy"],"target": "Y","prediction": "Z","model": true,"enhance": true}},"enhance": true,"model": true}
<<Map
{
"info": {
"_id": "Summ@pjml.stream.reduce.summ",
"config": {
"field": "R",
"function": "mean"
}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Reduce@pjml.stream.reduce.reduce",
"config": {}
},
"enhance": true,
"model": true
}
{
"info": {
"_id": "Report@pjml.data.communication.report",
"config": {
"text": "Mean S: $S"
}
},
"enhance": true,
"model": true
}
Total running time of the script: ( 0 minutes 3.250 seconds)
Getting started¶
Information to install, test, and contribute to the package.
API Documentation¶
In this section, we document expected types, functions, classes, and parameters available for AutoML building. We also describe our own AutoML systems.
Examples¶
A set of examples illustrating the use of the pjautoml package. In this section you will learn how pjautoml works, along with patterns, tips, and more.
What’s new?¶
Log of the pjautoml history.