
Fix defaultdict behavior on NumericalEncoder #21

Closed
wants to merge 13 commits into from

gfournier
Member

No description provided.

gfournier and others added 13 commits August 7, 2019 18:48
* fix bug when no elements to iterator on

* remove useless space
* add failing test for categories

* - add a function that can replace categorical columns by object columns

- recognize 'category' as a CAT type of variable

* add get_rid_of_categories

modify the NumericalEncoder and TargetEncoder transformers
add a test of guess_type_of_variables

* - add a get_rid_of_categories in the fit_transform of targetencoder

- add test of targetencoder with categorical dtype
- add test of numericalencoder with categorical dtype

* modify test_guesss_type_of_variable

* add a test checking that the NumericalEncoder does not convert categorical columns containing ints into numerical columns.

for now, the test fails

* change the code so that NumericalEncoder and TargetEncoder work correctly

add tests

* changes addressing the pull request comments

* remaining changes for the pull request

* clean commit
* Block Search + other (#2)

* add make_pipeline function (works like sklearn)

* fix type "_if_fitted" -> "_already_fitted"

*  * add handling of columns_to_encode == "--object--" in target encoder

 * corresponding test

* add Numerical encoder test for "columns_to_encode == '--object--' "

* expose command argument parser outside, to be able to add new arguments.

* change WordVectorizer in char mode distributions

+ fix bug in HyperRangeBetaInt

* change default behavior : encode "columns_to_encode == '--object--' "

* remove 'bug' (double return)

* allow text preprocessors to concat their inputs

* add 'RandomTrainTestCv' and 'IndexTrainCv' cv-like objects.

* same api as a regular cv object ...
* ... but only one split

* add 'use_for_block_search' attribute + filter models based on that

* * add block search iterator

* automl config : models_to_keep_block_search

* fix typo in test

* ignore Warning in test

* move 'function_has_named_argument' from .transformers.model_wrapper to .tools.helper_functions

* cleaning

* dispatch and split the groups variable to the estimator

* add groups to methods + dispatch it to estimators within the pipeline

* test on cross validation and pipeline to check the passing of groups

* remove useless import

* remove useless

* fix X -> lastX

* debug help

* fix after merge

* make sure benchmark can be computed

* impute np.inf as well as np.nan

* spaces

* don't split and tokenize if not needed

* new auto-ml tests for data with only numerical values

* allow scoring to return multiple values

* allow cross_validation to run in parallel

# Conflicts:
#	aikit/cross_validation.py

* add a custom CV for groups

*  * freeze init params

 * allow additional functions to be computed

* read additional results

* allow guiding to be done on an "additional metric"

* typo

* add name of excel print

* test whether column names have changed
* remove config.json

* fix loading

* remove nltk additional path
* accelerate code using map and dict

* accelerate concatenation code

* Update categories
* fix seed
* new test CdfScaler
* * new helper functions (merge node and subbranch search)

* fix ordering in graph from edges

* * generalize the notion of model graph

* change name representation

* Block Search + other (#2)

* fix type : TransformToBlockManager

* add utility function for the number of outputs

* spaces

* new tests with impossible graphs

* fix merged

* fix notebook error

* add list test

* remove useless import

* spaces

* fix docstring

* merge 2 loops

* remove duplicate edge
* add a few plotting functions

* add assert
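The commits above add `RandomTrainTestCv` and `IndexTrainCv`, cv-like objects with "the same api as a regular cv object ... but only one split". Their code is not shown on this page, so the following is only a minimal sketch, assuming the scikit-learn splitter protocol (`split(X, y=None, groups=None)` yielding `(train, test)` index pairs, plus `get_n_splits`). `SingleRandomSplit` is a hypothetical name, not aikit's class.

```python
import random


class SingleRandomSplit:
    """Hypothetical cv-like object (illustration only, not aikit's RandomTrainTestCv).

    Mirrors the split/get_n_splits protocol of scikit-learn splitters,
    but always yields exactly one (train, test) pair of index lists.
    """

    def __init__(self, test_size=0.2, random_state=None):
        if not 0 < test_size < 1:
            raise ValueError("test_size must be strictly between 0 and 1")
        self.test_size = test_size
        self.random_state = random_state

    def get_n_splits(self, X=None, y=None, groups=None):
        # A regular cv object may return k here; this one only ever splits once.
        return 1

    def split(self, X, y=None, groups=None):
        n = len(X)
        indices = list(range(n))
        # A dedicated Random instance leaves the global random state untouched
        # and makes the split reproducible when random_state is set.
        random.Random(self.random_state).shuffle(indices)
        n_test = max(1, round(self.test_size * n))
        # Yield (train, test), sorted for readability.
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


cv = SingleRandomSplit(test_size=0.25, random_state=0)
splits = list(cv.split(list(range(8))))
print(cv.get_n_splits(), len(splits))  # 1 1
train, test = splits[0]
print(len(train), len(test))           # 6 2
```

Because `split` is a generator, code that loops over `cv.split(X, y, groups)` works the same way whether the object yields one split or k of them, which is what lets a single-split object stand in for a regular cv object.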
gfournier closed this on Oct 2, 2019
gfournier pushed a commit that referenced this pull request Feb 28, 2020
* fix bug when type_of_problem is set

* add default

* add specific test
gfournier added a commit that referenced this pull request Feb 28, 2020
* bump version to 0.1.3

* bump version to 0.1.4-dev

* massive black reformatting (#18)

* fix bug when type_of_problem is set (#21)

* fix bug when type_of_problem is set

* add default

* add specific test

* Fix numericalencoder (#22)

* fix NumericalEncoder with default values

* fix NumericalEncoder with default values

* Fix doc (#26)

* ignore .bat file to create doc

* fix doc

* comment in english

* clean test

* requirements pandas >= 0.23 (#27)

* node -> nodes (was deprecated and is now absent) (#31)

* Add test picklable (#23)

* test numerical encoder is picklable

* test numerical encoder is picklable

* test target encoder is picklable

* black

* black

* remove warning printing

* add test : unpickled object behaves like original object

* improve auto ml doc (#30)

* re-index 'fit_params' that are indexable

* change version 0.1.5

* conversion model to json (#36)

* v0.1.6-dev

* add conversion model to json : 'param_from_sklearn_model' + corresponding tests

* add new NumPy types to be cast to Python types

* test if object can be json serialized

* refactoring of columns selection (#29)

* add conversion model to json : 'param_from_sklearn_model' + corresponding tests

* change wrapper, 'drop_used_columns' and 'drop_unused_columns'

* temp : remove useless attribute

* temp : fix test

* allow the selector to select by type of variable among TypeOfVariables.CAT / TEXT / NUM

* change default for numerical encoder

* change test

* temp : new test

* renaming

* comments

* clean docstring

* change text models

* change 'base' models

* change corresponding tests

* add numpy array support

* fix tests

* fix test

* fix random_forest_addins columns_to_use

* fix Targetencoder

* fix special case when no column to pass to the model

* typo

* fix get_feature_names

* allow not raising when the shape differs between fit and transform

* corresponding tests

* cleaning

* fix doc + default

* rename

* fix registration

* black reformat

* clean

* add helper method

* temp add fitting test

* clean

* add test : try to fit model

* add custom default hyper-parameters

* fix inf

* clean

* add test not inf CdfScaler

* cap the number of components to the number of rows

* fix seed by default

* clean

* make CdfScaler handle very small, almost equal values

* test very close and very small values

* cleaning

* miscellaneous

* add min_count param

* more data in test

* * remove cast of string that can be parsed

* corresponding test

* change version 1.0.0

* dev version 1.0.1-dev

* change version 0.2.0

* dev 0.2.1-dev
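The PR title refers to defaultdict behavior in `NumericalEncoder`, and the later commits add picklability tests for the encoders; the actual diff is not shown on this page. As a hedged illustration of two well-known `collections.defaultdict` pitfalls for a fitted encoder (reading an unseen key inserts it, and a `lambda` `default_factory` cannot be pickled), here is a sketch that uses a plain `dict` with `.get()` instead. `SimpleCategoryEncoder` and `is_picklable` are hypothetical helpers, not aikit's API, and this is not claimed to be what the PR changed.

```python
import pickle
from collections import defaultdict


def is_picklable(obj):
    """Hypothetical helper: True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


class SimpleCategoryEncoder:
    """Hypothetical category -> integer encoder (illustration only, not aikit's NumericalEncoder)."""

    def __init__(self, unknown_code=-1):
        self.unknown_code = unknown_code

    def fit(self, values):
        # Plain dict built in order of first appearance: "a" -> 0, "b" -> 1, ...
        self.mapping_ = {}
        for v in values:
            if v not in self.mapping_:
                self.mapping_[v] = len(self.mapping_)
        return self

    def transform(self, values):
        # dict.get returns the fallback code for unseen categories
        # WITHOUT inserting them, so transform never mutates the fitted state.
        return [self.mapping_.get(v, self.unknown_code) for v in values]


# Pitfall 1: reading an unseen key from a defaultdict inserts it.
dd = defaultdict(lambda: -1, {"a": 0, "b": 1})
_ = dd["z"]
print(sorted(dd))                  # ['a', 'b', 'z']

# Pitfall 2: a lambda default_factory cannot be pickled.
print(is_picklable(dd))            # False

enc = SimpleCategoryEncoder().fit(["a", "b", "a"])
print(enc.transform(["b", "z"]))   # [1, -1]
print(sorted(enc.mapping_))        # ['a', 'b'] (unchanged by transform)
print(is_picklable(enc.mapping_))  # True
```

Keeping the fitted state in a plain `dict` and applying the default only at lookup time keeps `transform` side-effect free and keeps the fitted object picklable, which is the kind of property the "unpickled object behaves like original object" test checks.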

1 participant