- One-hot transform now always returns the same columns
- Missing-value and unknown-category handling is now configurable in all relevant encoders
- Added more sophisticated missing-value and unknown-category handling to the ordinal encoder
- The one-hot encoder now passes its missing-value configuration through to the underlying ordinal encoder
- The one-hot encoder returns an extra column for unknown categories seen at transform time when impute is used.
- Added BaseNEncoder as a more flexible alternative to the ordinal, one-hot and binary encoders (a base of 1 behaves like one-hot, a base of 2 like binary).
- Full support for numpy arrays as input, not just dataframes.
- All encoders handle missing values, and tests cover that handling
- Created a one-hot encoder that follows the same conventions as the rest of the library instead of using sklearn's.
- Ran basic benchmarks of data compression and memory usage, and made some performance improvements
- Changed all docstrings to NumPy style and added more documentation
- Moved all logic methods into staticmethods of the transformer classes themselves.
- Added more detailed checks for type and shape of input data in fit and transform
- Support for input as a list of lists, alongside NumPy arrays and pandas DataFrames.
- Better handling for missing values in hashing encoder
- Testing enhancements
- The hash type in the hashing encoder now defaults to md5 (via hashlib) but can be set to any hash algorithm hashlib supports
- Added an optional parameter to all transformers to return a NumPy array rather than a DataFrame.
- Encoders now return immediately if cols is empty.
- Any encoder accepts an optional drop_invariant parameter that consistently drops zero-variance columns from the output; the columns to drop are determined from the training data in fit()
- If cols is None, every string column (pandas dtype object) is encoded.
- Changed setup.py to not explicitly force reinstalls of other packages
- Bugfixes
- First usable release; includes sklearn-compatible encoders.
- Basic library of encoders, no automated testing.
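The hashing-encoder entries mention md5 as the default via hashlib and allow any hashlib algorithm. The following is a minimal sketch of the general hashing trick under that setup. It is not the library's implementation: the helper name `hash_encode_row`, the `n_components` bucket count, and the choice to hash missing values as the string "None" are all illustrative assumptions.

```python
import hashlib


def hash_encode_row(values, n_components=8, hash_name="md5"):
    """Hypothetical helper: fold a row of categorical values into a fixed-length count vector."""
    if n_components < 1:
        raise ValueError("n_components must be a positive integer")
    counts = [0] * n_components
    for value in values:
        # hashlib.new accepts any algorithm name hashlib supports and raises
        # ValueError for unknown names.
        h = hashlib.new(hash_name)
        # Assumption: missing values (None) are hashed as the string "None",
        # i.e. treated as one more category.
        h.update(str(value).encode("utf-8"))
        # Interpret the hex digest as an integer and map it to one bucket.
        bucket = int(h.hexdigest(), 16) % n_components
        counts[bucket] += 1
    return counts


row = ["red", "small", None]
print(hash_encode_row(row))
print(hash_encode_row(row, n_components=4, hash_name="sha256"))
```

The output width stays fixed regardless of how many distinct categories appear, at the cost of possible collisions between categories that land in the same bucket.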
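Two entries describe column selection (cols=None means every object-dtype column) and drop_invariant (zero-variance columns are chosen from the training data in fit() and then dropped consistently). The following pandas sketch shows that behavior. It is not the library's code: `default_cols` and `InvariantDropper` are hypothetical names, and the use of population variance (ddof=0) is an assumption.

```python
import pandas as pd


def default_cols(df):
    """Hypothetical helper: mimic cols=None by selecting every object-dtype column."""
    return [c for c in df.columns if df[c].dtype == object]


class InvariantDropper:
    """Hypothetical sketch of drop_invariant on an already-encoded numeric frame."""

    def __init__(self):
        self.drop_cols_ = []

    def fit(self, encoded_train):
        # Assumption: population variance (ddof=0), so a one-row training set
        # yields 0 rather than NaN.
        variances = encoded_train.var(ddof=0)
        self.drop_cols_ = [c for c in encoded_train.columns if variances[c] == 0]
        return self

    def transform(self, encoded):
        # Drop exactly the columns learned in fit(), whatever their variance
        # in this batch, so the output columns are always the same.
        return encoded.drop(columns=self.drop_cols_)
```

For example, fitting on a frame where `bias` is constant and then transforming a single row keeps `color_0` and `color_1` even though they have zero variance within that one row.

```python
train = pd.DataFrame({"color_0": [1, 0, 1], "color_1": [0, 1, 0], "bias": [1, 1, 1]})
dropper = InvariantDropper().fit(train)
print(dropper.drop_cols_)
print(dropper.transform(pd.DataFrame({"color_0": [0], "color_1": [1], "bias": [5]})))
```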