
Workflow to visualize Tpot results #337

Open
TheodoreGalanos opened this issue Jan 3, 2017 · 21 comments
@TheodoreGalanos

Hello everyone,

I hope this issue hasn't been discussed already. I tried a search through the issues with no luck.

I was wondering if anyone knows a way to quickly visualize the results coming out of TPOT. I feel that performance information across different solvers, and across different parameter settings of individual solvers or families of solvers, can be extremely useful for all users, especially new users like me.

I wonder if this could be a feature of TPOT. It could output information about the performance of different solvers (perhaps visualizing ranges, colored scatter plots, etc.) and the impact of different parameters on specific solvers in a way that can be easily visualized.

Let me know what you think.

Kind regards,
Theodore.

@sskarkhanis

Hello

Just adding my question here, as I thought this issue was closest to mine (let me know otherwise and I can raise a separate one).

I've just discovered TPOT this week and I'm having fun learning it so far!

In the description on http://rhiever.github.io/tpot/, it's mentioned:
"TPOT will automate the most tedious part of machine learning by intelligently exploring thousands of possible pipelines to find the best one for your data."

How can I find out which modeling algorithms were considered in the solution?

Thanks!

@weixuanfu
Contributor

weixuanfu commented May 11, 2017

Hi, do you mean checking which modeling algorithms/pipelines were evaluated during the TPOT optimization process?

You can easily find these pipelines in the dictionary tpot_obj._evaluated_individuals (where tpot_obj is a TPOT object); see the [source code](https://github.com/rhiever/tpot/blob/master/tpot/base.py#L259-260). The keys of the dictionary are pipeline strings, and the values are the operator counts and the pipelines' CV scores.

To make this clearer, below is a small demo:

from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
from deap import creator
from sklearn.model_selection import cross_val_score

# Iris flower classification
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
    iris.target.astype(np.float64), train_size=0.75, test_size=0.25)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
# print part of pipeline dictionary
print(dict(list(tpot._evaluated_individuals.items())[0:2]))
# print a pipeline and its values
pipeline_str = list(tpot._evaluated_individuals.keys())[0]
print(pipeline_str)
print(tpot._evaluated_individuals[pipeline_str])
# convert the pipeline string to a deap object
optimized_pipeline = creator.Individual.from_string(pipeline_str, tpot._pset)
# compile it into a scikit-learn pipeline object
fitted_pipeline = tpot._toolbox.compile(expr=optimized_pipeline)
# print scikit-learn pipeline object
print(fitted_pipeline)
# fix the random state where an operator allows it (optional), just to get a consistent CV score
tpot._set_param_recursive(fitted_pipeline.steps, 'random_state', 42)
# CV scores from scikit-learn
scores = cross_val_score(fitted_pipeline, X_train, y_train, cv=5, scoring='accuracy', verbose=0)
print(np.mean(scores))
print(tpot._evaluated_individuals[pipeline_str][1])

@weixuanfu
Contributor

Update: just fixed a small bug in the demo above.

@sskarkhanis

Great, that's what I was looking for.
Thank you for your response!

@zhuangyanbuaa

Hi, thanks for your answer.
It seems like your code doesn't work now.
When I run tpot_obj._evaluated_individuals, the error is: 'TPOTClassifier' object has no attribute '_evaluated_individuals'

@weixuanfu
Contributor

weixuanfu commented Dec 7, 2017 via email

@sergiolucero

How do I determine the winning individual? The scores in every value of that dict are all over the place. Can verbosity help? More insight is needed in the docs. I'll dive into the code now.
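Edit: one way to pick the winner seems to be taking the entry with the highest internal_cv_score. A minimal sketch, with made-up pipeline strings and scores standing in for evaluated_individuals_:

```python
# Hypothetical stand-in for tpot.evaluated_individuals_: keys are TPOT pipeline
# strings, values are dicts that include the pipeline's internal CV score.
evaluated = {
    "GaussianNB(input_matrix)":
        {"operator_count": 1, "internal_cv_score": 0.91},
    "BernoulliNB(input_matrix, BernoulliNB__alpha=0.1)":
        {"operator_count": 1, "internal_cv_score": 0.87},
    "DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__max_depth=3)":
        {"operator_count": 1, "internal_cv_score": 0.94},
}

# The "winning" individual is the entry with the highest CV score.
best_str, best_info = max(evaluated.items(),
                          key=lambda kv: kv[1]["internal_cv_score"])
print(best_str)                        # -> the DecisionTreeClassifier pipeline string
print(best_info["internal_cv_score"])  # -> 0.94
```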

@mikesaclgithubaccount

@weixuanfu, is there a way to turn an evaluated_individuals_ key into an sklearn Pipeline without having a fitted TPOTClassifier or TPOTRegressor? I'd like to record the key strings from evaluated_individuals_ and load them as sklearn Pipeline objects in later sessions, after I've fit my TPOTClassifier/Regressor and no longer have access to the original object that held the evaluated_individuals_.

@weixuanfu
Contributor

weixuanfu commented Mar 7, 2019

@miguelehernandez Please try the demo below to convert key strings to Pipeline objects:

from deap import creator
from tpot.export_utils import generate_pipeline_code, expr_to_tree

# print a pipeline and its values
pipeline_str = list(tpot.evaluated_individuals_.keys())[0]
print(pipeline_str)
print(tpot.evaluated_individuals_[pipeline_str])
for pipeline_string in sorted(tpot.evaluated_individuals_.keys()):
    # convert the pipeline string to a deap object
    deap_pipeline = creator.Individual.from_string(pipeline_string, tpot._pset)
    # compile it into a scikit-learn pipeline object
    sklearn_pipeline = tpot._toolbox.compile(expr=deap_pipeline)
    # print the scikit-learn pipeline code string
    sklearn_pipeline_str = generate_pipeline_code(expr_to_tree(deap_pipeline, tpot._pset), tpot.operators)
    print(sklearn_pipeline_str)

@mikesaclgithubaccount

mikesaclgithubaccount commented Mar 7, 2019

@weixuanfu , thanks so much for the quick reply!

Two follow up questions:

  1. Is there an API way to do this that doesn't rely on private methods of the tpot_obj?
  2. How does one initialize the _pset field when converting a pipeline_string into an sklearn Pipeline in a session where the original tpot_obj is not available? For example, say I write the pipeline_string to a file and want to read it back into an sklearn Pipeline in another session.

If there is currently no API way to do this, would it be possible for me to contribute something along those lines?

@weixuanfu
Contributor

@miguelehernandez So far there is no API way, and no way without using the tpot_obj. You're welcome to contribute this functionality!

@utopianpallu

utopianpallu commented Jul 7, 2020

@weixuanfu Can we access the intermediate pipelines that have already completed, without waiting for the fit function to finish evaluating all pipelines before reading evaluated_individuals_ on the TPOT object?
Are there any log files that keep this information, as in auto-sklearn?

@weixuanfu
Contributor

You may interrupt the fit function (e.g. with Ctrl+C); TPOT should then store all intermediate pipelines into evaluated_individuals_.

@utopianpallu

utopianpallu commented Jul 7, 2020

@weixuanfu Thanks for the reply. But if we keep interrupting, it will take too much time to complete.
Can you suggest any other workaround for this?

@weixuanfu
Contributor

You could fit the TPOT object in a loop with warm_start=True and generations=1, and save the evaluated_individuals_ dictionary every generation/loop.
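That loop pattern can be sketched generically. The helper below takes any fit-one-generation callable, so the demo runs without TPOT; with TPOT you would pass something like `lambda: tpot.fit(X_train, y_train)` and `lambda: tpot.evaluated_individuals_` (a sketch, not tied to a specific TPOT version):

```python
import copy

def fit_in_stages(fit_once, get_evaluated, n_generations):
    """Run one generation per call and snapshot the evaluated-pipelines dict
    after each one. With TPOT, create the object with warm_start=True and
    generations=1, then pass fit_once=lambda: tpot.fit(X_train, y_train)
    and get_evaluated=lambda: tpot.evaluated_individuals_."""
    snapshots = []
    for _ in range(n_generations):
        fit_once()
        # deep-copy so later generations don't mutate the saved snapshot
        snapshots.append(copy.deepcopy(get_evaluated()))
    return snapshots

# Demo with a stand-in optimizer (no TPOT needed): each "fit" adds one pipeline.
state, step = {}, [0]
def fake_fit():
    step[0] += 1
    state["Pipeline_%d(input_matrix)" % step[0]] = {"internal_cv_score": 0.80 + 0.01 * step[0]}

snaps = fit_in_stages(fake_fit, lambda: state, 3)
print([len(s) for s in snaps])  # [1, 2, 3] -- the saved dict grows every generation
```

Each snapshot could also be dumped to disk (e.g. with json) per generation, so nothing is lost if a later generation is interrupted.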

@utopianpallu

> You could fit the TPOT object in a loop with warm_start=True and generations=1, and save the evaluated_individuals_ dictionary every generation/loop.

@weixuanfu Thank you.

@utopianpallu

utopianpallu commented Jul 9, 2020

@weixuanfu is this the right way? In generation 1 I am getting only 7 pipelines, but it should generate 8:
population_size + generations * offspring_size = 4 + 4 = 8. Please clarify.
[Screenshot 2020-07-09 at 3.56.33 PM]

@weixuanfu
Contributor

weixuanfu commented Jul 9, 2020

Hmm, it is not right. Please put tp = TPOTClassifier(...) outside the loop; also, random_state=0 means no random_state in TPOT (I think we need to change this in the future).

I am not sure why there are only 7 pipelines per loop in the stdout of your notebook. evaluated_individuals_ should gain 4 pipelines each loop unless there is a duplicated pipeline. Please check this demo for reference.

@utopianpallu

> Hmm, it is not right. Please put tp = TPOTClassifier(...) outside the loop; also, random_state=0 means no random_state in TPOT (I think we need to change this in the future).
>
> I am not sure why there are only 7 pipelines per loop in the stdout of your notebook. It should add 4 pipelines each loop unless there is a duplicated pipeline. Please check this demo for reference.

Okay @weixuanfu Thanks for the clarification :)

@ahmedafnouch816

AttributeError: 'TPOTClassifier' object has no attribute '_optimized_pipeline'

@wayneking517

I'd like to re-ask the original question. How can one output information about the performance of different solvers (perhaps visualizing ranges, colored scatter plots, etc.) and the impact of different parameters on specific solvers in a way that can be easily visualized? I can't see how to pick tpot.evaluated_individuals_ apart.

Here is part of tpot.evaluated_individuals_:

{'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=100.0, GaussianProcessRegressor__kernel=1**2 * RationalQuadratic(alpha=0.1, length_scale=0.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=False, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 'internal_cv_score': 0.8067110730819081}, 'GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=0.01, GaussianProcessRegressor__kernel=1**2 * ExpSineSquared(length_scale=0.5, periodicity=3) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)': {'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=5e-09, GaussianProcessRegressor__kernel=1**2 * ExpSineSquared(length_scale=0.5, periodicity=3) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=False, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 'internal_cv_score': 0.7751851811051406}, 'GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=10.0, GaussianProcessRegressor__kernel=1**2 * Matern(length_scale=0.5, nu=1.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)': {'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=1.0, GaussianProcessRegressor__kernel=1**2 * Matern(length_scale=0.5, nu=1.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 'internal_cv_score': -0.013497379992263215}, 'GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=1.0, GaussianProcessRegressor__kernel=1**2 * 
RBF(length_scale=0.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=False, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)': {'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=5e-09, GaussianProcessRegressor__kernel=1**2 * RationalQuadratic(alpha=0.1, length_scale=0.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 'internal_cv_score': 0.7965992910211398}, 'GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=0.001, GaussianProcessRegressor__kernel=1**2 * ExpSineSquared(length_scale=0.5, periodicity=3) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=False, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)': {'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=100.0, GaussianProcessRegressor__kernel=0.316**2 * DotProduct(sigma_0=1) ** 2 + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 'internal_cv_score': 0.7844064509089768}, 'GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=0.001, GaussianProcessRegressor__kernel=1**2 * Matern(length_scale=0.5, nu=1.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)': {'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=100.0, GaussianProcessRegressor__kernel=1**2 * ExpSineSquared(length_scale=0.5, periodicity=3) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=True, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 
'internal_cv_score': 0.8176275484733164}, 'GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=0.01, GaussianProcessRegressor__kernel=1**2 * ExpSineSquared(length_scale=0.5, periodicity=3) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=False, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)': {'generation': 1, 'mutation_count': 1, 'crossover_count': 0, 'predecessor': ('GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=100.0, GaussianProcessRegressor__kernel=1**2 * RationalQuadratic(alpha=0.1, length_scale=0.5) + WhiteKernel(noise_level=0.1), GaussianProcessRegressor__normalize_y=False, GaussianProcessRegressor__optimizer=fmin_l_bfgs_b)',), 'operator_count': 1, 'internal_cv_score': 0.7799099422133781}}
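One way to start pulling it apart: the outermost operator name is everything before the first "(" in each key, and the score lives under 'internal_cv_score', so you can tabulate (solver, score) pairs for plotting. A sketch over hand-made entries shaped like the dump above (the keys and scores here are invented):

```python
# Hypothetical entries shaped like a real evaluated_individuals_ dump: keys are
# full TPOT pipeline strings, values are dicts carrying the internal CV score.
evaluated = {
    "GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=100.0)":
        {"generation": 1, "internal_cv_score": 0.8067},
    "GaussianProcessRegressor(input_matrix, GaussianProcessRegressor__alpha=0.01)":
        {"generation": 1, "internal_cv_score": 0.7752},
    "RandomForestRegressor(input_matrix, RandomForestRegressor__n_estimators=100)":
        {"generation": 1, "internal_cv_score": 0.7844},
}

# Group CV scores by the outermost operator ("solver") name.
scores_by_solver = {}
for pipeline_str, info in evaluated.items():
    solver = pipeline_str.split("(", 1)[0]
    scores_by_solver.setdefault(solver, []).append(info["internal_cv_score"])

# Per-solver summaries, ready to feed a box plot or colored scatter plot.
for solver, scores in sorted(scores_by_solver.items()):
    print(solver, len(scores), max(scores))
```

Rows like these drop straight into a pandas DataFrame for seaborn or matplotlib plotting, which would cover the ranges-and-scatter-plots idea from the original post.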
