Visualize constructed features and get best pipeline found. #459
For the first question, I think this is the right way to use it for the current version and the next version (0.8). You can also find an example in the unit test test_evaluated_individuals, in case the usage is updated in future versions. For the second one: for now, TPOT cannot provide a ranking of feature importances like Figure 5 in the paper. The feature importances on page 12 were estimated using the Random Forest method.
Hi @axelroy, if you want to access the best model from the TPOT run, you can access it with the `_fitted_pipeline` attribute. In terms of presenting feature importances, those are limited to specific models. In the case of the paper you linked, those were decision trees and random forests, so I was displaying tree-based feature importances. If TPOT discovers a pipeline for you that uses a decision tree or other tree-based method as the final classifier, for example, then you could access those feature importances with the following code:

```python
# The first index of -1 gets the last step in the pipeline
# The second index of 1 gets the actual classifier object
tpot._fitted_pipeline.steps[-1][1].feature_importances_
```

which is an array of feature importances that you can then match with the feature names. The same applies with linear models, except you'd access the `coef_` attribute.
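To illustrate the matching step, here is a minimal sketch of ranking feature names by importance. The feature names and importance values below are invented for the example; in practice the array would come from the fitted pipeline's final step as shown above.

```python
# Sketch: ranking feature names by a tree-based model's importance scores.
# `importances` stands in for the array returned by
# tpot._fitted_pipeline.steps[-1][1].feature_importances_ (values made up here).
feature_names = ["age", "glucose", "bmi", "pedigree"]
importances = [0.12, 0.45, 0.30, 0.13]

# Pair each name with its score, then sort by score, highest first
ranked = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```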
Thank you very much for the responses, I'll test this as soon as possible.
Closing this issue for now. Please feel free to re-open if you have any more questions or comments.
I'm using TPOT and loving it, but am struggling to join the names of the features I provide TPOT with the list of feature importances that I extract using `feature_importances_`.
Here's an example pipeline to which I would like to apply such a method:
Apologies if I missed this being addressed previously.
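A minimal illustration of the mismatch, assuming the synthetic columns occupy the leading positions (all feature names and values below are invented for the example):

```python
# Illustration: when a pipeline prepends synthetic features, the importances
# array is longer than the original feature list, so the original names
# align with the trailing entries. Values here are made up.
original_names = ["age", "glucose", "bmi"]
importances = [0.40, 0.05, 0.20, 0.25, 0.10]  # 2 synthetic + 3 original columns

# Any extra leading entries belong to constructed (synthetic) features
n_synthetic = len(importances) - len(original_names)
names = [f"synthetic_{i}" for i in range(n_synthetic)] + original_names
for name, score in zip(names, importances):
    print(name, score)
```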
Hmm, I think those synthetic features should be in the first (left) columns, but they usually have very high importance scores in the last operator of the pipeline. For now, TPOT does not provide an option for disabling synthetic feature generation.
Please check #152 for more details; we are working on a more advanced pipeline configuration option.
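One workaround worth sketching: TPOT accepts a `config_dict` parameter that restricts the operator search space, so a configuration containing only classifiers and selectors (no feature constructors) should avoid synthetic features entirely. The operator names and hyperparameter grids below are a hypothetical, pared-down example, not TPOT's defaults.

```python
# Sketch (assumes TPOT's `config_dict` parameter, which maps operator names
# to hyperparameter grids): a restricted search space with no feature
# constructors, so no synthetic features are built.
restricted_config = {
    "sklearn.tree.DecisionTreeClassifier": {
        "criterion": ["gini", "entropy"],
        "max_depth": range(1, 11),
    },
    "sklearn.feature_selection.SelectPercentile": {
        "percentile": range(1, 100),
    },
}

# Usage (not run here):
# from tpot import TPOTClassifier
# tpot = TPOTClassifier(config_dict=restricted_config, generations=5)
```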
Thanks @weixuanfu! For purposes of transparency, explainability, and trust, it would be lovely to have the ability to connect TPOT to something like eli5 for feature importance inspection and exploration. This may not be so important for biological work (I don't really know), but for public safety work, it's quite important to be able to explain, if only very roughly, how a model works.
@weixuanfu I'm using TPOT and I want to extract the feature importance for every evaluated individual, not just the best pipeline. I am able to access all the pipelines using `tpot._evaluated_individuals`.
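For iterating over all evaluated pipelines, a sketch of the shape of that data: in recent TPOT versions the attribute is the public `evaluated_individuals_` dictionary, keyed by the pipeline string, with per-pipeline metadata such as the internal cross-validation score. The toy dictionary below mimics that structure; the pipeline strings and scores are made up.

```python
# Sketch: iterating over evaluated pipelines. The dict below stands in for
# tpot.evaluated_individuals_ (pipeline string -> metadata); values are made up.
evaluated = {
    "DecisionTreeClassifier(input_matrix, criterion=gini)": {"internal_cv_score": 0.91},
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.88},
}

# Pick the pipeline with the highest internal CV score
best = max(evaluated.items(), key=lambda kv: kv[1]["internal_cv_score"])
print(best[0])
```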
Greetings,
First of all, thank you for the amazing job you did on this project. I'm trying to use TPOT in a research context, and after a few tests, I have some questions about how to use it:
I've seen in the issue Workflow to visualize Tpot results #337 that we can retrieve the explored pipelines with the
`tpot._evaluated_individuals`
set. Is this a good way to use it, or could it change over the versions? I want to be able to retrieve the best model, the features, and the parameters to store them in a DB. Is there any way to retrieve the best features, as shown in this paper on page 12, and to know which original features the constructed ones are based on?
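On storing the best pipeline in a database, one possible sketch: TPOT's documented `export()` method writes the best pipeline's Python source to a file, which can then be stored as text. The `TPOTClassifier` calls are shown in comments rather than run, and `pipeline_source` below is a made-up stand-in for the exported code.

```python
# Sketch: persisting a best-pipeline definition in SQLite.
# In practice the source would come from tpot.export("best_pipeline.py")
# after fitting; here a stand-in string is used instead.
import sqlite3

# from tpot import TPOTClassifier
# tpot = TPOTClassifier(generations=5, population_size=20)
# tpot.fit(X_train, y_train)
# tpot.export("best_pipeline.py")
pipeline_source = "pipeline = DecisionTreeClassifier(criterion='gini')"  # stand-in

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pipelines (id INTEGER PRIMARY KEY, source TEXT)")
conn.execute("INSERT INTO pipelines (source) VALUES (?)", (pipeline_source,))
row = conn.execute("SELECT source FROM pipelines").fetchone()
print(row[0])
```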
Thank you for your help,
Kind regards,
Axel.