This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Dev sklearn preprocess (#157)
* age/employment dummies (#104)

* added diff features

* New handcrafted features (#102)

* Dynamic features

* Smart features (#61)

* Update README.md

* Update README.md

* Update

* Smart features update

* More descriptive transformer name

* Reading all data in main

* More application features

* Transformer for cleaning

* Multiinput data dictionary

* Fix (#63)

* fixed configs

* dropped redundant steps, moved stuff to cleaning, refactored groupby (#64)

* dropped redundant steps, moved stuff to cleaning, refactored groupby

* restructured, added stacking + CV

* Fix format string

* Update pipeline_manager.py

clipped prediction -> prediction

* added stratified kfold option (#77)

* Update config (#79)

* dropped redundant steps, moved stuff to cleaning, refactored groupby

* restructured, added stacking + CV

* Update pipeline_config.py

* Dev review (#81)

* dropped feature by type split, refactored pipeline_config

* dropped feature by type split method

* explored application features

* trash

* reverted refactor of aggs

* fixed/updated bureau features

* cleared notebooks

* agg features added to notebook bureau

* credit card cleaned

* added other feature notebooks

* added rank mean

* updated model arch

* reverted to old params

* fixed rank mean calculations

* ApplicationCleaning update (#84)

* Cleaning - application

* Clear output in notebook

* cleaned names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85)

* local trash

* External sources notebook (#86)

* Update

* External sources notebook

* Dev lgbm params (#88)

* local trash

* updated configs

* dropped comment

* updated lgb params

* Dev app agg fix (#90)

* dropped app_aggs

* app agg features fixed

* cleaned leftovers

* dropped fast read-in for debug

* External_sources statistics (#89)

* Speed-up ext_src notebook

* external_sources statistics

* Weighted mean and notebook fix

* application notebook update

* clear notebook output

* Fix auto submission (#95)

* CreditCardBalance monthly diff mean

* POSCASH remaining installments

* POSCASH completed_contracts

* notebook update

* Resolve conflicts

* Fix

* Update neptune.yaml

* Update neptune_random_search.yaml

* Split static and dynamic features - credit card balance

* Dev nan count (#105)

* added nan_count

* added nan count with parameter

* Dev fe installments (#106)

* added simple features, parallel groupby, last-installment features

* refactored last_installment features

* added features for the very last installment

* Dev fe instalments dynamic (#107)

* added dynamic-trend features

* formatted configs

* added skew/iqr features

* added number of credit agreement change features (#109)

* added number of credit agreement change features

* reverted sample size

* Dynamic features - previous application (#108)

* previous_application handcrafted features

* previous application cleaning

* Update neptune.yaml

* code improvement

* Update notebook

* Notebook - feature importance (#112)

* Dev speed up (#111)

* refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby)

* sped up all hand crafted

* fixed bureau worker errors

* fixed installment names

* fixed installment names

* fixed bureau and prev_app naming bugs

* reverted to vectorized where possible

* updated hyperparams

* updated early stopping params to meet convergence

* reverted to old fallback neptune file

* updated paths

* updated paths, explored prev-app features

* dropped duplicated agg

* POS_CASH added features

* added second level models (#126)

* POS CASH features added

* Family features (#128)

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Features - family - test

* Features - family - aggregate

* Features - family - aggregate 2

* Features - family - aggregate 3

* Features - family - aggregate 4

* Update pipeline_config.py

* POS_CASH_balance feature cleaning

* Yaml adjustment

* Data cleaning and two new features (previous application) (#129)

* new previous application features

* Data cleaning

* update application notebook

* credit card cleaning

* Data cleaning - groupby agg

* Include suggested changes

* Data cleaning - fix (#130)

* new previous application features

* Data cleaning

* update application notebook

* credit card cleaning

* Data cleaning - groupby agg

* Include suggested changes

* Fix

* Dev fractions (#132)

* added fraction features to eda and feature extraction, updated configs

* updated hyperparams

* Path change

* Dev (#134)

* age/employment dummies (#104)

* added diff features

* New handcrafted features (#102)

* Dynamic features

* Smart features (#61)

* Update README.md

* Update README.md

* Update

* Smart features update

* More descriptive transformer name

* Reading all data in main

* More application features

* Transformer for cleaning

* Multiinput data dictionary

* Fix (#63)

* fixed configs

* dropped redundant steps, moved stuff to cleaning, refactored groupby (#64)

* dropped redundant steps, moved stuff to cleaning, refactored groupby

* restructured, added stacking + CV

* Fix format string

* Update pipeline_manager.py

clipped prediction -> prediction

* added stratified kfold option (#77)

* Update config (#79)

* dropped redundant steps, moved stuff to cleaning, refactored groupby

* restructured, added stacking + CV

* Update pipeline_config.py

* Dev review (#81)

* dropped feature by type split, refactored pipeline_config

* dropped feature by type split method

* explored application features

* trash

* reverted refactor of aggs

* fixed/updated bureau features

* cleared notebooks

* agg features added to notebook bureau

* credit card cleaned

* added other feature notebooks

* added rank mean

* updated model arch

* reverted to old params

* fixed rank mean calculations

* ApplicationCleaning update (#84)

* Cleaning - application

* Clear output in notebook

* cleaned names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85)

* local trash

* External sources notebook (#86)

* Update

* External sources notebook

* Dev lgbm params (#88)

* local trash

* updated configs

* dropped comment

* updated lgb params

* Dev app agg fix (#90)

* dropped app_aggs

* app agg features fixed

* cleaned leftovers

* dropped fast read-in for debug

* External_sources statistics (#89)

* Speed-up ext_src notebook

* external_sources statistics

* Weighted mean and notebook fix

* application notebook update

* clear notebook output

* Fix auto submission (#95)

* CreditCardBalance monthly diff mean

* POSCASH remaining installments

* POSCASH completed_contracts

* notebook update

* Resolve conflicts

* Fix

* Update neptune.yaml

* Update neptune_random_search.yaml

* Split static and dynamic features - credit card balance

* Dev nan count (#105)

* added nan_count

* added nan count with parameter

* Dev fe installments (#106)

* added simple features, parallel groupby, last-installment features

* refactored last_installment features

* added features for the very last installment

* Dev fe instalments dynamic (#107)

* added dynamic-trend features

* formatted configs

* added skew/iqr features

* added number of credit agreement change features (#109)

* added number of credit agreement change features

* reverted sample size

* Dynamic features - previous application (#108)

* previous_application handcrafted features

* previous application cleaning

* Update neptune.yaml

* code improvement

* Update notebook

* Notebook - feature importance (#112)

* Dev speed up (#111)

* refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby)

* sped up all hand crafted

* fixed bureau worker errors

* fixed installment names

* fixed installment names

* fixed bureau and prev_app naming bugs

* reverted to vectorized where possible

* updated hyperparams

* updated early stopping params to meet convergence

* reverted to old fallback neptune file

* updated paths

* updated paths, explored prev-app features

* dropped duplicated agg

* POS_CASH added features

* POS CASH features added

* POS_CASH_balance feature cleaning

* Yaml adjustment

* Path change

* fix misinterpretations

'<' instead of '>'

* fix misinterpretations

'<' instead of '>'

* Add cleaning in application_groupby_agg (#137)

* application agg cleaning

* update neptune.yaml

* New branch

* Notebook dev

* q

* Sklearn models modified

* Minor bug fix

* Whatever

* Space refactor

* Old forgotten merge

* Final refactor

* Minor update

* last k features with fraction removal

* Fix PR issues
karol.strzalkowski authored and jakubczakon committed Jul 26, 2018
1 parent 40af415 commit 64466d9
Showing 12 changed files with 608 additions and 198 deletions.
3 changes: 2 additions & 1 deletion configs/neptune.yaml
@@ -124,7 +124,8 @@ parameters:
  rf__max_features: 0.2
  rf__min_samples_split: 10
  rf__min_samples_leaf: 5
- rf__class_weight: 1
+ rf__max_leaf_nodes: None
+ rf__class_weight: balanced

  # Logistic regression
  lr_random_search_runs: 0
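This change replaces the integer `rf__class_weight: 1` with scikit-learn's `balanced` mode and adds `rf__max_leaf_nodes: None`. The sketch below is only an illustration, assuming the `rf__` prefix is stripped to obtain estimator keyword arguments; the helpers `params_for` and `balanced_class_weights` are hypothetical, not code from this repository. One caveat it handles: a standard YAML loader reads the bare word `None` as the string `'None'` (YAML's null is `null` or `~`), so the sketch converts it explicitly. The second helper reproduces the documented `balanced` rule, `n_samples / (n_classes * np.bincount(y))`, in pure Python.

```python
from collections import Counter


def params_for(prefix, parameters):
    """Collect keys like 'rf__max_features' into {'max_features': ...} (hypothetical helper)."""
    kwargs = {}
    for key, value in parameters.items():
        if key.startswith(prefix + "__"):
            name = key[len(prefix) + 2:]
            # YAML loads the bare word None as the string 'None', not as null.
            kwargs[name] = None if value == "None" else value
    return kwargs


def balanced_class_weights(y):
    """Per-class weights as in class_weight='balanced': n_samples / (n_classes * count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in sorted(counts.items())}


# Values copied from the new configs/neptune.yaml hunk.
parameters = {
    "rf__max_features": 0.2,
    "rf__min_samples_split": 10,
    "rf__min_samples_leaf": 5,
    "rf__max_leaf_nodes": "None",
    "rf__class_weight": "balanced",
    "lr_random_search_runs": 0,
}

print(params_for("rf", parameters))
# Imbalanced binary target: 8 negatives, 2 positives.
print(balanced_class_weights([0] * 8 + [1] * 2))  # {0: 0.625, 1: 2.5}
```

With scikit-learn installed, the resulting dict could be passed as `RandomForestClassifier(**params_for("rf", parameters))`; there, `max_leaf_nodes=None` means an unlimited number of leaf nodes, and `balanced` upweights the minority class instead of giving every class weight 1.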
5 changes: 3 additions & 2 deletions configs/neptune_random_search.yaml
@@ -124,15 +124,16 @@ parameters:
  rf__max_features: '[0.01, 0.5, "uniform"]'
  rf__min_samples_split: '[2, 50]'
  rf__min_samples_leaf: '[1, 50]'
- rf__class_weight: '[None, "balanced_subsample", "balanced", "list"]'
+ rf__max_leaf_nodes: None
+ rf__class_weight: balanced

  # Logistic regression
  lr_random_search_runs: 50
  lr__penalty: '["l2", "l1", "list"]'
  lr__tol: '[0.00001, 0.01, "log-uniform"]'
  lr__C: '[0.1, 100, "log-uniform"]'
  lr__fit_intercept: '[0, 1, "list"]'
- lr__class_weight: '[None, "balanced", "list"]'
+ lr__class_weight: balanced
  lr__solver: '["liblinear", "saga", "list"]'
  lr__max_iter: '[100, 1000, 10000, 50000, "list"]'

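In `neptune_random_search.yaml` each searchable parameter is a string holding a list whose last element appears to name the sampling mode (`"uniform"`, `"log-uniform"`, `"list"`), while `'[2, 50]'` carries no tag. This commit takes `class_weight` out of the search for both models by pinning it to `balanced`. The diff does not show how the search strings are interpreted, so the following is a sketch under stated assumptions: two untagged numbers are treated as an inclusive integer range, `"list"` as a choice among the preceding values, and plain values such as `balanced` are returned unchanged. `sample_param` is a hypothetical name.

```python
import ast
import math
import random


def sample_param(spec, rng):
    """Draw one value from a search-space string such as '[0.1, 100, "log-uniform"]'.

    Hypothetical interpretation of the config format, not the project's own sampler.
    """
    if not isinstance(spec, str) or not spec.startswith("["):
        return spec  # fixed value, e.g. 'balanced'
    values = ast.literal_eval(spec)  # parses None, numbers and quoted strings safely
    mode = values[-1] if isinstance(values[-1], str) else "int-range"
    if mode == "list":
        return rng.choice(values[:-1])
    if mode == "uniform":
        low, high = values[:2]
        return rng.uniform(low, high)
    if mode == "log-uniform":
        # Uniform in log space, so each order of magnitude is equally likely.
        low, high = values[:2]
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    if mode == "int-range":
        low, high = values
        return rng.randint(low, high)  # inclusive on both ends
    raise ValueError(f"unknown sampling mode: {mode!r}")


rng = random.Random(0)  # seeded so a trial can be reproduced
search_space = {
    "rf__max_features": '[0.01, 0.5, "uniform"]',
    "rf__min_samples_split": '[2, 50]',
    "lr__C": '[0.1, 100, "log-uniform"]',
    "lr__penalty": '["l2", "l1", "list"]',
    "lr__max_iter": '[100, 1000, 10000, 50000, "list"]',
    "lr__class_weight": "balanced",
}
trial = {name: sample_param(spec, rng) for name, spec in search_space.items()}
print(trial)
```

Sampling `C` and `tol` log-uniformly is a common choice for parameters spanning several orders of magnitude; under this reading, pinning `class_weight` simply removes one dimension from each random-search run.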
3 changes: 2 additions & 1 deletion configs/neptune_stacking.yaml
@@ -124,7 +124,8 @@ parameters:
  rf__max_features: 0.2
  rf__min_samples_split: 10
  rf__min_samples_leaf: 5
- rf__class_weight: 1
+ rf__max_leaf_nodes: None
+ rf__class_weight: balanced

  # Logistic regression
  lr_random_search_runs: 0