CHANGELOG.md

unreleased

  • improved: performance of the hashing encoder (about twice as fast)
    • deprecate the `max_sample` parameter, which no longer has any effect
    • add process_creation_method parameter
    • use concurrent.futures.ProcessPoolExecutor instead of hand-managed queues
    • optimisations to hashlib calls, remove python 2 checks, fork instead of spawn
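The hashing encoder changes above can be illustrated with a minimal, stdlib-only sketch of the hashing trick run through `concurrent.futures.ProcessPoolExecutor`. This is not the library's implementation: the function names (`hash_row`, `hash_chunk`, `hash_rows_parallel`) and the `start_method` argument are hypothetical stand-ins, the latter loosely mirroring the idea of a configurable process creation method.

```python
import hashlib
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def hash_row(row, n_components=8, hash_method="md5"):
    """Hashing trick: count how many values of a row land in each bucket."""
    buckets = [0] * n_components
    for value in row:
        digest = hashlib.new(hash_method, str(value).encode("utf-8")).hexdigest()
        buckets[int(digest, 16) % n_components] += 1
    return buckets


def hash_chunk(rows):
    """Encode a chunk of rows; defined at module level so it can be pickled."""
    return [hash_row(row) for row in rows]


def hash_rows_parallel(rows, max_workers=2, start_method=None):
    """Split rows into chunks and encode them in worker processes.

    start_method is a hypothetical parameter ("fork", "spawn", or None for
    the platform default); "fork" is not available on Windows.
    """
    chunk_size = max(1, -(-len(rows) // max_workers))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    context = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=context) as pool:
        # map() preserves chunk order, so the output rows stay aligned with the input
        results = list(pool.map(hash_chunk, chunks))
    return [encoded for chunk in results for encoded in chunk]
```

When the start method is `spawn`, call `hash_rows_parallel` from inside an `if __name__ == "__main__":` guard so that worker processes can import the module safely.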

v2.6.3

  • fixed: issue 424 - pandas NaNs
  • fixed: pandas deprecations and ruff linter suggestions
  • fixed: issue 421 - detect pandas categorical type as categorical column

v2.6.2

  • fixed: issue 414 - broken link
  • fixed: issue 412 - timestamp types in ordinal encoder
  • fixed: importlib instead of pkg_resources

v2.6.1

  • added: ignore option for one-hot-encoding
  • fixed: external dependency in unit test
  • fixed: gaps in ordinal encoding if nan values are present
  • fixed: sklearn compliance: add feature_names_in_ attribute
  • fixed: sklearn compliance: get_feature_names_out function has the correct signature
  • fixed: add RankHotEncoder in documentation
  • fixed: return correct mapping in one hot encoder category_mapping property (issue #256)
  • refactor: quadratic runtime in ordinal encoder (issue #407)

v2.6.0

  • added gray encoder
  • added thermometer / rank-hot encoder
  • introduce compatibility with sklearn 1.2
    • compatibility with feature_names_out_
    • remove boston housing dataset
    • drop support for dataframes with non-homogeneous data types in column names (i.e. having both string and integer column names)
  • improve performance of hashing encoder
  • improve catboost documentation
  • fix inverse transform in baseN with special character column names (issue 392)
  • fix inverse transform of ordinal encoder with custom mapping (issue 202)
  • fix re-fittable polynomial wrapper (issue 313)
  • fix numerical stability for target encoding (issue 377)
  • change default parameters of target encoding (issue 327)
  • drop support for sklearn 0.x
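For readers unfamiliar with the two encoders added in this release, the following is a rough sketch of the underlying ideas, assuming categories have already been mapped to non-negative ordinal integers. The helper names are hypothetical, and the library's actual column layout and handling of unknown or missing values may differ.

```python
def to_gray_bits(ordinal, n_bits):
    """Return the reflected Gray code of an ordinal as a list of bits (MSB first)."""
    gray = ordinal ^ (ordinal >> 1)
    return [(gray >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]


def to_thermometer(ordinal, n_levels):
    """Return a thermometer (rank-hot) vector: the first `ordinal` positions are 1."""
    return [1 if i < ordinal else 0 for i in range(n_levels)]
```

Consecutive ordinals differ in exactly one bit under Gray coding, while the thermometer vector preserves rank order by switching on one additional position per level.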

v2.5.1.post0

  • fix pypi sdist

v2.5.1

  • Added base class for contrast coding schemes in order to make them more maintainable
  • Added hierarchical column feature in target encoder
  • Fixed maximum recursion depth bug in hashing encoder

v2.5.0

  • Introduce base class for encoders
  • Introduce tagging system on encoders and use it to parametrize tests
  • Drop support for python 3.5 and python 3.6
  • Require pandas >=1.0
  • Introduce f-strings
  • Make BinaryEncoder a BaseNEncoder for base=2
  • FutureWarning for TargetEncoder's default parameters
  • Made all encoders re-fittable on different datasets (cf. issue 122)
  • Introduced tox.ini file for easier version testing
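The relationship between binary and base-N encoding noted above can be sketched as follows: an ordinal integer is written as a fixed number of base-N digits, and binary encoding is the special case of base 2. The function `to_base_n` is a hypothetical illustration restricted to bases of 2 or more, not the library's code.

```python
def to_base_n(ordinal, base, n_digits):
    """Write a non-negative ordinal as n_digits digits in the given base (MSB first)."""
    if base < 2:
        raise ValueError("this sketch only covers base >= 2")
    digits = []
    for _ in range(n_digits):
        ordinal, remainder = divmod(ordinal, base)
        digits.append(remainder)
    return digits[::-1]


def to_binary(ordinal, n_digits):
    """Binary encoding is base-N encoding with base=2."""
    return to_base_n(ordinal, base=2, n_digits=n_digits)
```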

v2.4.1

  • Fixed a bug with categorical data type in LeaveOneOut encoder
  • Do not install examples as a package on its own

v2.4.0

  • improved documentation
  • fix bug in CatBoost encoder
  • fix future warnings with pandas
  • added tests for python 3.9 and 3.10 in pipeline
  • fix treating np.NaN and python None equal
  • only build docs on release
  • unified the conversion of inputs into the pandas objects used internally, including some bugfixes
  • added quantile encoder and summary encoder

v2.3.0

  • many bugfixes
  • added count encoder

v2.2.2

  • Added generalized linear mixed model encoder
  • Added cross-validation wrapper
  • Added multi-class wrapper
  • Support for pandas >= 1.0.1
  • Moved CI to github actions

v2.1.0

  • Added experimental support for multithreading in hashing encoder
  • Support for pandas >=0.24
  • Removed support for missing values represented by None due to changes in Pandas 0.24. Use numpy.NaN
  • Changed the default setting of Helmert encoder for handle_missing and handle_unknown
  • Fixed wrong calculation in m-estimate encoder
  • Fixed missing value handling in CatBoost encoder

v2.0.0

  • Added James-Stein, CatBoost and m-estimate encoders
  • Added get_feature_names method
  • Refactored treatment of missing and new values
  • Speed up the encoders with vectorization
  • Improved compatibility with Pandas Series and Numpy Arrays

v1.3.0

  • Added Weight of Evidence encoder

v1.2.8

  • Critical bugfix in hashing encoder

v1.2.7

  • Bugfixes related to missing value imputation
  • Category names optionally added to encoded column names for some encoders
  • Documentation updates
  • statsmodels pinned to avoid errors
  • Performance enhancements

v1.2.6

  • Release for zenodo DOI
  • Inverse transform implemented for some encoders

v1.2.5

  • Onehot transform returns same columns always
  • Missing value and unknown handling now configurable in all relevant encoders

v1.2.4

  • Added more sophisticated missing value or unknown category handling to ordinal
  • Passing through missing value config from onehot into ordinal
  • Onehot will return an extra column when unknown categories are passed in if impute is used.
  • Added BaseNEncoder to allow for more flexible alternatives to ordinal, onehot and binary.

v1.2.3

  • Full support for numpy arrays as input, not just dataframes.

v1.2.2

  • All encoders handle missing values and are tested for their handling
  • Created a onehot encoder that follows the same conventions as the rest of the library instead of using sklearn's.
  • Did some basic benchmarking for data compression and memory usage, and made some performance improvements
  • Changed all docstrings to numpy style and added more documentation
  • Moved all logic methods into staticmethods of the transformer classes themselves.
  • Added more detailed checks for type and shape of input data in fit and transform
  • Support input as list of lists, alongside numpy arrays and pandas dataframes.

v1.2.1

  • Better handling for missing values in hashing encoder

v1.2.0

  • Testing enhancements
  • Hash type in hashing encoder now defaults to md5 using hashlib, but can be set to any valid hashlib hash

v1.1.2

  • Added optional parameter to return a numpy array rather than a dataframe from all transformers.

v1.1.1

  • Immediately return if cols is empty.

v1.1.0

  • Optionally pass drop_invariant to any encoder to consistently drop columns with 0 variance from the output (based on training set data in fit())
  • If None is passed as the cols param, every string column will be encoded (pandas type = object).
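The `drop_invariant` behaviour described above (decide which columns to drop from the training data in `fit()`, then drop the same columns consistently afterwards) can be sketched without any dependencies. The class below is a hypothetical illustration that works on rows given as dictionaries; it is not the library's implementation, which operates on dataframes.

```python
class InvariantDropper:
    """Hypothetical helper: remember zero-variance columns seen during fit()."""

    def fit(self, rows):
        # rows is a list of dicts mapping column name -> value
        columns = list(rows[0].keys()) if rows else []
        # A column is invariant if the training data holds a single distinct value
        self.invariant_cols_ = [
            col for col in columns if len({row[col] for row in rows}) <= 1
        ]
        return self

    def transform(self, rows):
        # Drop the columns chosen at fit time, whatever values the new rows hold
        drop = set(self.invariant_cols_)
        return [{k: v for k, v in row.items() if k not in drop} for row in rows]
```

Because the decision is made once in `fit`, new data always yields the same output columns, even if a dropped column happens to vary there.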

v1.0.5

  • Changed setup.py to not explicitly force reinstalls of other packages

v1.0.4

  • Bugfixes

v1.0.0

  • First real usable release, includes sklearn compatible encoders.

v0.0.1

  • Basic library of encoders, no automated testing.