FeatureSetSelector only works when set as the first step of a template. When no template is used, or when it is set to be in the middle of a template, the behavior is not well defined. It would be helpful if the FSS could be used without a template. For example, TPOT can set the FSS to be the first step of the pipeline, but then have the rest of the pipeline be unrestricted.
For example, the following works normally:
template='FeatureSetSelector-Transformer-Classifier')
However, "Transformer-FeatureSetSelector-Classifier" does not work, nor does the base model without a template. There are two issues:
1. When using string column names: the names are not preserved by the other transformations, so when FSS is not first it cannot look up the features by name and crashes.
2. When using column indexes: the column ordering is not guaranteed to be preserved by the transformations in the other steps. This leads to FSS picking out a different subset than intended, while also discarding the rest of the data produced up to that point.
For example, let's say subset 1 is indexes 0 and 1, and our pipeline is Some Transformer-FSS-Classifier.
Our data is [0, 1, 8, 9].
The first transformation adds two columns.
The data is now [7, 7, 0, 1, 8, 9].
FSS will now select [7, 7] rather than the intended [0, 1], and discard everything else (in general, this can also throw away output of the preceding transformation).
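The index shift described above can be reproduced with a small sketch. It uses plain Python lists rather than TPOT itself; `prepend_two_columns` and `select_by_index` are hypothetical stand-ins for "Some Transformer" and FSS, not TPOT functions.

```python
def prepend_two_columns(rows):
    """Hypothetical transformer: puts two new columns in front of the originals."""
    return [[7, 7] + row for row in rows]


def select_by_index(rows, indexes):
    """Hypothetical stand-in for FSS: keep only the columns at the given positions."""
    return [[row[i] for i in indexes] for row in rows]


subset_1 = [0, 1]        # the subset is defined against the ORIGINAL column order
data = [[0, 1, 8, 9]]    # one sample, four original columns

# FSS as the first step: the intended columns are selected.
fss_first = select_by_index(data, subset_1)

# FSS after a transformer: the same indexes now point at the new columns.
transformed = prepend_two_columns(data)
fss_second = select_by_index(transformed, subset_1)

print(fss_first)    # [[0, 1]]
print(transformed)  # [[7, 7, 0, 1, 8, 9]]
print(fss_second)   # [[7, 7]]
```

The subset definition never changes, but its meaning depends on where in the pipeline the selection runs.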
An additional useful feature, though it may be more difficult to implement, would be to have FSS pass different data to different "branches". For example:
This could be possible by using FeatureUnion to group the outputs of two branches. TPOT could be initialized with a FeatureUnion containing a user-specified number of branches that each begin with FSS, with the rest of each branch determined through GP.
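A rough sketch of what such a branched pipeline might look like when written by hand in scikit-learn. This is only an illustration of the idea, not how TPOT builds pipelines: `take_columns` and `subset_selector` are hypothetical stand-ins for FSS, and `StandardScaler` and `PCA` are arbitrary placeholders for the steps GP would choose.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def take_columns(X, indexes):
    """Keep only the columns at the given positions."""
    return X[:, indexes]


def subset_selector(indexes):
    """Hypothetical stand-in for FSS with a fixed subset of column indexes."""
    return FunctionTransformer(take_columns, kw_args={"indexes": indexes})


# Each branch starts with its own selector, so the indexes always refer
# to the original columns; the remaining steps are placeholders.
branches = FeatureUnion([
    ("branch_a", Pipeline([
        ("select", subset_selector([0, 1])),
        ("scale", StandardScaler()),
    ])),
    ("branch_b", Pipeline([
        ("select", subset_selector([2, 3])),
        ("pca", PCA(n_components=1)),
    ])),
])

X = np.array([
    [0.0, 1.0, 8.0, 9.0],
    [2.0, 3.0, 6.0, 5.0],
    [4.0, 5.0, 7.0, 7.0],
])

combined = branches.fit_transform(X)
print(combined.shape)  # (3, 3): two scaled columns plus one PCA component
```

Because every selector sits at the head of its branch, neither of the two issues above applies, and a classifier can be appended after the union as usual.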