As a general rule, TPOT enforces imputation so that the input and output data contain only real values, as scikit-learn requires. XGBoost is a special case: it accepts NaN values in its input.
Context of the issue
I am trying to optimise XGBoost specifically, using a data set with quite a lot of holes in it. I do not want to perform imputation because it affects the results. I looked in base.py and quickly modified the _check_data function to ignore NaN values and skip imputation, but I was wondering whether TPOT can be modified to accommodate this scenario with XGBoost?
A 'no_imputation' keyword could be added to TPOTBase.__init__, for example, to prevent imputation.
Example Edits:
    else:
        if not self._imputed and np.any(np.isnan(features)):
            self._imputed = True
            features = self._impute_values(features)
    try:
        if target is not None:
            X, y = check_X_y(features, target, accept_sparse=True, dtype=np.float64,
                             force_all_finite='allow-nan')
            return X, y
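To make the proposed behaviour concrete, here is a minimal sketch in plain Python. It is not TPOT's code: `check_features`, its `no_imputation` argument, and the column-median fill are stand-ins assumed for illustration only.

```python
import math
from statistics import median


def check_features(rows, no_imputation=False):
    """Hypothetical stand-in for the data check: returns (rows, imputed_flag).

    NaNs are filled with column medians unless no_imputation is set.
    """
    has_nan = any(math.isnan(v) for row in rows for v in row)
    if no_imputation or not has_nan:
        # Pass the data through untouched, NaNs included.
        return [list(row) for row in rows], False

    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: a column with no observed values falls back to 0.0.
        medians.append(median(observed) if observed else 0.0)

    filled = [
        [medians[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]
    return filled, True
```

With the flag set, the NaNs would reach the estimator unchanged, which only makes sense when every operator in the pipeline accepts them, as XGBoost does.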
TPOT enforces imputation on datasets with NaN because most operators in the TPOT configuration do not support NaN. We may need another configuration if this no_imputation option is added.
I understand; it is a very specific case that I'm working on. It's just that there is currently no way to escape imputation with TPOT unless you modify the source. It is not a necessity, perhaps more of a nice-to-have.
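For reference, the separate configuration mentioned above might look something like the dictionary below, which lists XGBoost as the only operator. It follows the custom configuration format TPOT accepts through its config_dict argument; the name `xgboost_only_config` and the hyperparameter grids are illustrative assumptions, not recommended values.

```python
# Hypothetical configuration restricted to XGBoost, so that no operator
# in the search space would reject NaN values.
# The grids below are illustrative assumptions, not tuned values.
xgboost_only_config = {
    "xgboost.XGBClassifier": {
        "n_estimators": [100],
        "max_depth": list(range(1, 11)),
        "learning_rate": [0.001, 0.01, 0.1, 0.5, 1.0],
        "min_child_weight": list(range(1, 21)),
    }
}
```

On its own this dictionary does not switch off imputation; it only narrows the search space so that skipping imputation would be safe.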