Detect ID columns #27

gfournier · 2019-10-28T09:09:08Z

Add a way to detect "unique identifier" columns; they should be removed before training.
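As a rough illustration of the request, the sketch below flags columns whose values are all distinct and drops them before training. The helper name `detect_id_columns`, its `min_unique_ratio` parameter, and the rule of skipping float columns are assumptions made for this example; they are not part of the project's API.

```python
import pandas as pd


def detect_id_columns(df, min_unique_ratio=1.0):
    """Return names of columns that look like unique identifiers.

    Hypothetical helper: a column is flagged when the share of distinct
    values among all rows reaches `min_unique_ratio`.
    """
    id_columns = []
    n_rows = len(df)
    if n_rows == 0:
        return id_columns
    for name in df.columns:
        column = df[name]
        # Assumption: float columns are measurements, not identifiers,
        # even when every value happens to be distinct.
        if pd.api.types.is_float_dtype(column):
            continue
        # nunique() ignores missing values by default.
        unique_ratio = column.nunique() / n_rows
        if unique_ratio >= min_unique_ratio:
            id_columns.append(name)
    return id_columns


df = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 104],
        "city": ["Paris", "Lyon", "Paris", "Nice"],
        "amount": [10.5, 20.0, 7.25, 3.0],
    }
)

id_columns = detect_id_columns(df)
train_df = df.drop(columns=id_columns)
```

Here only `customer_id` is flagged, so `train_df` keeps `city` and `amount`. A ratio slightly below 1.0 would tolerate a few duplicated identifiers, at the cost of more false positives on high-cardinality categorical columns.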

* bump version to 0.1.3
* bump version to 0.1.4-dev
* massive black reformating (#18)
* fix bug when type_of_problem is setted (#21)
* fix bug when type_of_problem is setted
* add default
* add specific test
* Fix numericalencoder (#22)
* fix NumericalEncoder with default values
* fix NumericalEncoder with default values
* Fix doc (#26)
* ignore .bat file to create doc
* fix doc
* comment in english
* clean test
* requirements pandas >= 0.23 (#27)
* node -> nodes (was deprecated and is now absent) (#31)
* Add test picklable (#23)
* test numerical encoder is picklable
* test numerical encoder is picklable
* test target encoder is picklable
* black
* black
* remove warning printing
* add test : unpickled object behave like original object
* improve auto ml doc (#30)
* re-index 'fit_params' that are indexable
* change version 0.1.5
* conversion model to json (#36)
* v0.1.6-dev
* add conversion model to json : 'param_from_sklearn_model' + corresponding tests
* add new numpy type to be cast to python type
* test if object can be json serialized
* refactoring of columns selection (#29)
* add conversion model to json : 'param_from_sklearn_model' + corresponding tests
* change wrapper, 'drop_used_columns' and 'drop_unused_columns'
* temp : remove useless attribute
* temp : fix test
* allow selector to select of type of variable among TypeOfVariables.CAT / TEXT / NUM
* change default for numerical encoder
* change test
* temp : new test
* renamming
* comments
* clean docstring
* change text models
* change 'base' models
* change corresponding tests
* add numpy array support
* fix tests
* fix test
* fix random_forest_addins columns_to_use
* fix Targetencoder
* fix special case when no column to pass to the model
* typo
* fix get_feature_names
* allow not to raise when shape between fit and transform differs
* corresponding tests
* cleanning
* fix doc + default
* rename
* fix registration
* black reformat
* clean
* add helper method
* temp add fitting test
* clean
* add test : try to fit model
* add custom default hyper-parameters
* fix inf
* clean
* add test not inf CdfScaler
* cap number of component to nb of rows
* fix seed by default
* clean
* make CdfScaler to very small, almost equal values
* test very close and very small values
* cleanning
* divers
* add min_count param
* more data in test
* remove cast of string that can be parsed
* corresponding test
* change version 1.0.0
* dev version 1.0.1-dev
* change version 0.2.0
* dev 0.2.1-dev

gfournier added the enhancement New feature or request label Oct 28, 2019

gfournier pushed a commit that referenced this issue Feb 28, 2020

requirements pandas >= 0.23 (#27)

8a93376
