What does this PR do?
Some bug fixes
Edited ColumnOneHotEncoder to mimic the behavior of OneHotEncoder. It now automatically selects columns with fewer than 10 unique values and one-hot encodes them (same behavior as TPOT1). The original OneHotEncoder is not compatible with pandas DataFrames, but this one should be. Replaced OneHotEncoder with ColumnOneHotEncoder in the tpot2 search space. We could later make the unique-value threshold a searchable parameter.
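For reviewers, the column-selection rule described above can be sketched roughly as follows. This is not the actual ColumnOneHotEncoder code; the helper names and the `max_unique` parameter are hypothetical, and `pandas.get_dummies` stands in for the encoder itself.

```python
import pandas as pd


def select_low_cardinality_columns(df, max_unique=10):
    """Return names of columns with fewer than `max_unique` distinct values.

    `max_unique` is a hypothetical parameter name; the PR hard-codes 10.
    """
    counts = df.nunique()
    return [col for col in df.columns if counts[col] < max_unique]


def one_hot_encode_low_cardinality(df, max_unique=10):
    """One-hot encode only the automatically selected columns."""
    cols = select_low_cardinality_columns(df, max_unique)
    # get_dummies leaves the unselected columns untouched.
    return pd.get_dummies(df, columns=cols)


df = pd.DataFrame({
    "color": ["red", "green", "red", "blue"] * 3,   # 3 unique values
    "measurement": [float(i) for i in range(12)],   # 12 unique values
})
encoded = one_hot_encode_low_cardinality(df)
```

Here only `color` falls under the threshold, so `measurement` passes through unchanged.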
A bug in the initial pipeline generator caused all initial pipelines to be of size 1 when leaf_config_dict was not set. Added a check so that, when leaf_config_dict is None, the initial population pipelines include more nodes from inner_config_dict.
A typo prevented the complexity scorer from recursively searching sklearn Pipeline classes. Fixed the typo so that the estimator is correctly passed to the recursive function. Previously, the whole tuple from pipeline.steps was passed in, rather than the estimator stored as the second element of that tuple.
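A minimal sketch of the recursion fix, assuming a toy complexity measure. `Leaf`, `MiniPipeline`, and `complexity` are hypothetical stand-ins, not the tpot2 scorer; `MiniPipeline` only mirrors the `steps` layout of sklearn's Pipeline, a list of (name, estimator) tuples.

```python
class Leaf:
    """Hypothetical stand-in for an estimator with a known complexity."""

    def __init__(self, complexity):
        self.complexity = complexity


class MiniPipeline:
    """Hypothetical stand-in mirroring the `steps` layout of sklearn's Pipeline."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples


def complexity(est):
    """Sum leaf complexities, descending into nested pipelines."""
    if hasattr(est, "steps"):
        # Each step is a (name, estimator) tuple: recurse on the estimator,
        # not on the tuple itself (passing the tuple was the bug).
        return sum(complexity(step_est) for _name, step_est in est.steps)
    return est.complexity


inner = MiniPipeline([("a", Leaf(2)), ("b", Leaf(3))])
outer = MiniPipeline([("inner", inner), ("c", Leaf(1))])
total = complexity(outer)
```

Passing the step tuple instead of `step_est` would skip the `steps` check, so a nested pipeline would never be expanded.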