* Dynamic features * Smart features (minerva-ml#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (minerva-ml#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (minerva-ml#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (minerva-ml#77) * Update config (minerva-ml#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (minerva-ml#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (minerva-ml#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (minerva-ml#85) * local trash * External sources notebook (minerva-ml#86) * Update * External sources notebook * Dev lgbm params (minerva-ml#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (minerva-ml#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (minerva-ml#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission 
(minerva-ml#95) * CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance
* added nan_count * added nan count with parameter
* added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment
* added dynamic-trend features * formated configs * added skew/iqr features
* added number of credit agreement change features * reverted sample size
* previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook
* refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features
* Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py
…#129) * new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes
* new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Fix
* added fraction features to eda and feature extraction, updated configs * updated hyperparams
* age/employment dummies (minerva-ml#104) * added diff features * New handcrafted features (minerva-ml#102) * Dynamic features * Smart features (minerva-ml#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (minerva-ml#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (minerva-ml#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (minerva-ml#77) * Update config (minerva-ml#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (minerva-ml#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (minerva-ml#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (minerva-ml#85) * local trash * External sources notebook (minerva-ml#86) * Update * External sources notebook * Dev lgbm params (minerva-ml#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (minerva-ml#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (minerva-ml#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted 
mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (minerva-ml#95) * CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (minerva-ml#105) * added nan_count * added nan count with parameter * Dev fe installments (minerva-ml#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (minerva-ml#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (minerva-ml#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (minerva-ml#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (minerva-ml#112) * Dev speed up (minerva-ml#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * POS_CASH added features * POS CASH features added * POS_CASH_balance feature cleaning * Yaml adjustment * Path change
'<' instead of '>'
'<' instead of '>'
* application agg cleaning * update neptune.yaml
* Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py * Features - family - added new cols to agg * Features - interaction features * Features - interaction features - fix * Added is_unbalance to configs
src/feature_extraction.py
Outdated
@@ -346,10 +346,106 @@ def fit(self, bureau, **kwargs):
features['bureau_overdue_debt_ratio'] = \
    features['bureau_total_customer_overdue'] / features['bureau_total_customer_debt']

features = features.merge(g, on='SK_ID_CURR', how='left')
@kstrzala it seems that you are merging features with the very same g twice.
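To illustrate the point above, here is a minimal sketch of what a repeated merge does in pandas. The frames and their values are made up for illustration; only `features`, `g`, `SK_ID_CURR` and the `merge` call mirror the diff.

```python
import pandas as pd

# Toy stand-ins for the real frames; the values are invented for illustration.
features = pd.DataFrame({'SK_ID_CURR': [1, 2, 3]})
g = pd.DataFrame({'SK_ID_CURR': [1, 2],
                  'bureau_total_customer_debt': [100.0, 250.0]})

# First merge: adds the aggregate column as intended.
once = features.merge(g, on='SK_ID_CURR', how='left')

# Second merge with the same g: the non-key column now collides,
# so pandas keeps both copies and renames them with _x / _y suffixes.
twice = once.merge(g, on='SK_ID_CURR', how='left')

once_columns = list(once.columns)
twice_columns = list(twice.columns)
print(once_columns)
print(twice_columns)
```

After the second merge the original column name no longer exists, so any later lookup of `bureau_total_customer_debt` would fail.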
src/feature_extraction.py
Outdated
return self

@staticmethod
def _status_to_int(status):
@kstrzala generally, static methods are used when you want to be able to use something as a function (likely outside of the class it belongs to). Private methods are meant to be used within the scope of the class. I don't think I've ever seen a private static method. I would personally simply go with a private method here.
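A minimal sketch of the suggested alternative, with the helper as a plain private method. The class name and the mapping inside `_status_to_int` are assumptions made for illustration; the diff shows only the method's signature, not its body.

```python
class BureauBalanceSketch:
    """Hypothetical class; only the name `_status_to_int` comes from the diff."""

    def _status_to_int(self, status):
        # Assumed mapping, for illustration only:
        # digit strings become ints, any other status becomes 0.
        return int(status) if str(status).isdigit() else 0

    def transform(self, statuses):
        # The helper is called only from inside the class,
        # which is what the leading underscore signals.
        return [self._status_to_int(s) for s in statuses]


converted = BureauBalanceSketch().transform(['0', '3', 'C', 'X'])
print(converted)
```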
src/feature_extraction.py
Outdated
new_name_chunk = '_{}by{}_fraction_'.format(short_period, long_period)
fraction_feature_name = short_feature.replace(old_name_chunk, new_name_chunk)
fraction_features[fraction_feature_name] = features[short_feature] / features[long_feature]
return fraction_features.fillna(0.0)
@kstrzala this is a decision that we should make explicitly in some fillna step and not silently here.
Remember that lgbm deals with np.nans on its own terms.
I think it is important to make this distinction. I guess this is a legacy of the safe_div method, so that is actually my fault.
In this case it is just a kind of imitation of safe_div, because this NaN information is stored elsewhere anyway.
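The distinction discussed in this thread can be sketched as follows: the fraction helper leaves NaN untouched, and filling happens in a separate, visible step. The function name, column names and data are hypothetical; only the division and the `fillna(0.0)` call come from the diff.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and values, for illustration only.
features = pd.DataFrame({
    'installment_last_1_sum': [10.0, np.nan, 5.0],
    'installment_last_5_sum': [40.0, 20.0, np.nan],
})


def fraction_feature(features, short_feature, long_feature, name):
    """Return the ratio of two columns, leaving missing values as NaN."""
    out = pd.DataFrame(index=features.index)
    out[name] = features[short_feature] / features[long_feature]
    return out


raw = fraction_feature(features,
                       'installment_last_1_sum',
                       'installment_last_5_sum',
                       'installment_1by5_fraction_sum')

# Explicit, separate decision: fill only if the downstream model needs it.
filled = raw.fillna(0.0)

nan_count_raw = int(raw['installment_1by5_fraction_sum'].isna().sum())
nan_count_filled = int(filled['installment_1by5_fraction_sum'].isna().sum())
print(nan_count_raw, nan_count_filled)
```

Keeping the fill outside the helper means the same features can be passed unfilled to a model that handles missing values itself, and filled for one that does not.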
Code contributions
Sorry that there are two major issues in one PR, I will try to improve in the future!