This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Dev time correction #144

Merged
merged 41 commits into from
Jul 20, 2018

Conversation

kstrzala

Code contributions

  1. Corrected time features for Installments and POS_CASH.
  2. Added bureau_balance features, including time features.
  3. Added period fraction features for POS_CASH.

Sorry for combining two major changes in one PR; I will try to do better in the future!

jakubczakon and others added 30 commits July 4, 2018 09:45
* Dynamic features

* Smart features (minerva-ml#61)

* Update README.md

* Update README.md

* Update

* Smart features update

* More descriptive transformer name

* Reading all data in main

* More application features

* Transformer for cleaning

* Multiinput data dictionary

* Fix (minerva-ml#63)

* fixed configs

* dropped redundand steps, moved stuff to cleaning, refactored groupby (minerva-ml#64)

* dropped redundand steps, moved stuff to cleanining, refactored groupby

* restructured, added stacking + CV

* Fix format string

* Update pipeline_manager.py

clipped prediction -> prediction

* added stratified kfold option (minerva-ml#77)

* Update config (minerva-ml#79)

* dropped redundand steps, moved stuff to cleanining, refactored groupby

* restructured, added stacking + CV

* Update pipeline_config.py

* Dev review (minerva-ml#81)

* dropped feature by type split, refactored pipleine_config

* dropped feature by type split method

* explored application features

* trash

* reverted refactor of aggs

* fixed/updated bureau features

* cleared notebooks

* agg features added to notebook bureau

* credit card cleaned

* added other feature notebooks

* added rank mean

* updated model arch

* reverted to old params

* fixed rank mean calculations

* ApplicationCleaning update (minerva-ml#84)

* Cleaning - application

* Clear output in notebook

* clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (minerva-ml#85)

* local trash

* External sources notebook (minerva-ml#86)

* Update

* External sources notebook

* Dev lgbm params (minerva-ml#88)

* local trash

* updated configs

* dropped comment

* updated lgb params

* Dev app agg fix (minerva-ml#90)

* dropped app_aggs

* app agg features fixed

* cleaned leftovers

* dropped fast read-in for debug

* External_sources statistics (minerva-ml#89)

* Speed-up ext_src notebook

* exernal_sources statistics

* Weighted mean and notebook fix

* application notebook update

* clear notebook output

* Fix auto submission (minerva-ml#95)

* CreditCardBalance monthly diff mean

* POSCASH remaining installments

* POSCASH completed_contracts

* notebook update

* Resolve conflicts

* Fix

* Update neptune.yaml

* Update neptune_random_search.yaml

* Split static and dynamic features - credit card balance
* added nan_count

* added nan count with parameter
* added simple features, parallel groupby, last-installment features

* refactored last_installment features

* added features for the very last installment
* added dynamic-trend features

* formated configs

* added skew/iqr features
* added number of credit agreement change features

* reverted sample size
* previous_application handcrafted features

* previous application cleaning

* Update neptune.yaml

* code improvement

* Update notebook
* refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby)

* sped up all hand crafted

* fixed bureau worker errors

* fixed isntallment names

* fixed isntallment names

* fixed bureau and prev_app naming bugs

* reverted to vectorized where possible

* updated hyperparams

* updated early stopping params to meet convergence

* reverted to old fallback neptune file

* updated paths

* updated paths, explored prev-app features
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Features - family - test

* Features - family - aggregate

* Features - family - aggregate 2

* Features - family - aggregate 3

* Features - family - aggregate 4

* Update pipeline_config.py
…#129)

* new previous application features

* Data cleaning

* update application notebook

* credit card cleaning

* Data cleaning - groupby agg

* Include suggested changes
* new previous application features

* Data cleaning

* update application notebook

* credit card cleaning

* Data cleaning - groupby agg

* Include suggested changes

* Fix
* added fraction features to eda and feature extraction, updated configs

* updated hyperparams
* age/employment dummies (minerva-ml#104)

* added diff features

* New handcrafted features (minerva-ml#102)

* Dynamic features

* Smart features (minerva-ml#61)

* Update README.md

* Update README.md

* Update

* Smart features update

* More descriptive transformer name

* Reading all data in main

* More application features

* Transformer for cleaning

* Multiinput data dictionary

* Fix (minerva-ml#63)

* fixed configs

* dropped redundand steps, moved stuff to cleaning, refactored groupby (minerva-ml#64)

* dropped redundand steps, moved stuff to cleanining, refactored groupby

* restructured, added stacking + CV

* Fix format string

* Update pipeline_manager.py

clipped prediction -> prediction

* added stratified kfold option (minerva-ml#77)

* Update config (minerva-ml#79)

* dropped redundand steps, moved stuff to cleanining, refactored groupby

* restructured, added stacking + CV

* Update pipeline_config.py

* Dev review (minerva-ml#81)

* dropped feature by type split, refactored pipleine_config

* dropped feature by type split method

* explored application features

* trash

* reverted refactor of aggs

* fixed/updated bureau features

* cleared notebooks

* agg features added to notebook bureau

* credit card cleaned

* added other feature notebooks

* added rank mean

* updated model arch

* reverted to old params

* fixed rank mean calculations

* ApplicationCleaning update (minerva-ml#84)

* Cleaning - application

* Clear output in notebook

* clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (minerva-ml#85)

* local trash

* External sources notebook (minerva-ml#86)

* Update

* External sources notebook

* Dev lgbm params (minerva-ml#88)

* local trash

* updated configs

* dropped comment

* updated lgb params

* Dev app agg fix (minerva-ml#90)

* dropped app_aggs

* app agg features fixed

* cleaned leftovers

* dropped fast read-in for debug

* External_sources statistics (minerva-ml#89)

* Speed-up ext_src notebook

* exernal_sources statistics

* Weighted mean and notebook fix

* application notebook update

* clear notebook output

* Fix auto submission (minerva-ml#95)

* CreditCardBalance monthly diff mean

* POSCASH remaining installments

* POSCASH completed_contracts

* notebook update

* Resolve conflicts

* Fix

* Update neptune.yaml

* Update neptune_random_search.yaml

* Split static and dynamic features - credit card balance

* Dev nan count (minerva-ml#105)

* added nan_count

* added nan count with parameter

* Dev fe installments (minerva-ml#106)

* added simple features, parallel groupby, last-installment features

* refactored last_installment features

* added features for the very last installment

* Dev fe instalments dynamic (minerva-ml#107)

* added dynamic-trend features

* formated configs

* added skew/iqr features

* added number of credit agreement change features (minerva-ml#109)

* added number of credit agreement change features

* reverted sample size

* Dynamic features - previous application (minerva-ml#108)

* previous_application handcrafted features

* previous application cleaning

* Update neptune.yaml

* code improvement

* Update notebook

* Notebook - feature importance (minerva-ml#112)

* Dev speed up (minerva-ml#111)

* refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby)

* sped up all hand crafted

* fixed bureau worker errors

* fixed isntallment names

* fixed isntallment names

* fixed bureau and prev_app naming bugs

* reverted to vectorized where possible

* updated hyperparams

* updated early stopping params to meet convergence

* reverted to old fallback neptune file

* updated paths

* updated paths, explored prev-app features

* dropped duplicated agg

* POS_CASH added features

* POS CASH features added

* POS_CASH_balance feature cleaning

* Yaml adjustment

* Path change
'<' instead of '>'
'<' instead of '>'
* application agg cleaning

* update neptune.yaml
Karol Strzałkowski and others added 10 commits July 17, 2018 12:06
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Features - family - test

* Features - family - aggregate

* Features - family - aggregate 2

* Features - family - aggregate 3

* Features - family - aggregate 4

* Update pipeline_config.py

* Features - family - added new cols to agg

* Features - interaction features

* Features - interaction features - fix

* Added is_unbalance to configs
@kstrzala kstrzala requested review from Ninoko and jakubczakon July 20, 2018 11:29
@@ -346,10 +346,106 @@ def fit(self, bureau, **kwargs):
features['bureau_overdue_debt_ratio'] = \
features['bureau_total_customer_overdue'] / features['bureau_total_customer_debt']

features = features.merge(g, on='SK_ID_CURR', how='left')
Contributor

@kstrzala it seems that you are merging features with the very same g twice.
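
To illustrate why a repeated merge matters, here is a minimal sketch. The two frames and the `bureau_debt_mean` column are hypothetical stand-ins for `features` and `g`, not the data from this PR. With pandas defaults, merging the same right-hand frame a second time does not fail; it quietly duplicates every non-key column under `_x`/`_y` suffixes.

```python
import pandas as pd

# Hypothetical stand-ins for the `features` and `g` frames in the diff.
features = pd.DataFrame({"SK_ID_CURR": [1, 2, 3]})
g = pd.DataFrame({"SK_ID_CURR": [1, 2], "bureau_debt_mean": [10.0, 20.0]})

# First merge: adds the aggregate column as intended.
once = features.merge(g, on="SK_ID_CURR", how="left")

# Second merge with the very same `g`: the overlapping column is duplicated
# and both copies are renamed with pandas' default suffixes.
twice = once.merge(g, on="SK_ID_CURR", how="left")

print(list(once.columns))   # ['SK_ID_CURR', 'bureau_debt_mean']
print(list(twice.columns))  # ['SK_ID_CURR', 'bureau_debt_mean_x', 'bureau_debt_mean_y']
```

The duplicated columns carry identical values, so the model sees redundant features and any later code that looks up the original column name no longer finds it.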

return self

@staticmethod
def _status_to_int(status):
Contributor

@kstrzala generally, static methods are used when you want to call something as a plain function (likely outside the class it belongs to), while private methods are meant to be used within the scope of the class. I don't think I've ever seen a private static method. Personally, I would simply go with a private method here.
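
A minimal sketch of the suggested shape, assuming a simplified transformer. The class name `BureauBalanceFeatures`, the `encode` helper, and the status mapping are hypothetical; the mapping actually used in this PR is not visible in the diff.

```python
class BureauBalanceFeatures:
    """Hypothetical, stripped-down transformer used only for illustration."""

    # Assumed encoding for non-numeric STATUS codes (not taken from the PR).
    _STATUS_CODES = {"C": 0, "X": 0}

    def _status_to_int(self, status):
        """Private instance method: map a STATUS code to an integer."""
        if status in self._STATUS_CODES:
            return self._STATUS_CODES[status]
        return int(status)

    def encode(self, statuses):
        """Public entry point that uses the private helper internally."""
        return [self._status_to_int(s) for s in statuses]


encoded = BureauBalanceFeatures().encode(["C", "X", "0", "3"])
print(encoded)  # [0, 0, 0, 3]
```

The helper is only reached through `self`, which matches the reviewer's point that a leading underscore already signals class-internal use, so `@staticmethod` adds nothing here.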

new_name_chunk = '_{}by{}_fraction_'.format(short_period, long_period)
fraction_feature_name = short_feature.replace(old_name_chunk, new_name_chunk)
fraction_features[fraction_feature_name] = features[short_feature] / features[long_feature]
return fraction_features.fillna(0.0)
Contributor

@jakubczakon jakubczakon Jul 20, 2018

@kstrzala this is a decision that we should make explicitly in a dedicated fillna step, not silently here.
Remember that lgbm handles np.nan values on its own terms.
I think it is important to make this distinction. I guess this is a legacy of the safe_div method, so that is actually my fault.
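
One way to separate the two concerns, sketched under assumptions: `period_fraction`, `fill_missing`, and the column names below are hypothetical and are not the functions from this PR. The ratio step leaves missing values as NaN, and imputation happens in a separate step that a pipeline can enable or skip.

```python
import numpy as np
import pandas as pd


def period_fraction(short, long):
    """Divide two feature columns and keep undefined results as NaN."""
    ratio = short / long
    # In pandas, x / 0 gives +/-inf and 0 / 0 gives NaN; treat both as missing.
    return ratio.replace([np.inf, -np.inf], np.nan)


def fill_missing(frame, value=0.0):
    """Explicit, separately configurable imputation step."""
    return frame.fillna(value)


# Hypothetical short-period and long-period aggregates.
features = pd.DataFrame({
    "pos_cash_last_3_count": [1.0, 0.0, 2.0],
    "pos_cash_last_12_count": [4.0, 0.0, 0.0],
})

fractions = pd.DataFrame({
    "pos_cash_3by12_fraction_count": period_fraction(
        features["pos_cash_last_3_count"], features["pos_cash_last_12_count"]
    )
})

# Tree models that handle NaN natively can consume `fractions` directly;
# other models can opt in to imputation explicitly.
filled = fill_missing(fractions, value=0.0)
```

Note that `fillna(0.0)` alone would leave the `inf` produced by `2.0 / 0.0` in place, which is one more reason to make the missing-value handling explicit rather than folding it into the division.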

Author

In this case it is just a kind of imitation of safe_div, because the NaN information is stored elsewhere anyway.

@jakubczakon jakubczakon merged commit 9c2662d into minerva-ml:dev Jul 20, 2018
This was referenced Jul 23, 2018
@kstrzala kstrzala deleted the dev_time_correction branch July 26, 2018 13:43

4 participants