feature_extraction.py
Outdated
        super().__init__()

    @property
    def application_names(self):
@pknut If there is no calculation in the method, why not just have
self.application_names = ['A', 'B']
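A minimal sketch of this suggestion, assuming the list of names is static. `BaseTransformer` below is only a stand-in for the project's base class, `ApplicationFeatures` is a hypothetical class name, and `['A', 'B']` are the placeholder values from the comment:

```python
class BaseTransformer:
    """Stand-in for the project's base class (assumption, for illustration only)."""

    def __init__(self):
        pass


class ApplicationFeatures(BaseTransformer):
    """Hypothetical transformer that stores its feature names as a plain attribute."""

    def __init__(self):
        super().__init__()
        # No computation is needed, so a plain attribute replaces the @property.
        self.application_names = ['A', 'B']


transformer = ApplicationFeatures()
print(transformer.application_names)
```

A property would still be the better fit if the names had to be derived from other state at access time.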
feature_extraction.py
Outdated
@@ -140,3 +140,215 @@ def transform(self, X):
                          how='left')

        return {'numerical_features': X[self.groupby_aggregations_names].astype(np.float32)}


class Application(BaseTransformer):
@pknut this name is not very descriptive. What is the operation (logic) that this transformer performs?
feature_extraction.py
Outdated
                                  'PAYMENT_RATE']

    def transform(self, X, y=None):
        X['DAYS_EMPLOYED'].replace(365243, np.nan, inplace=True)
@pknut I think the clean-up, where we deal with missing values, outliers, etc., should be done in a separate step (or steps).
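One way the separate clean-up step could look, as a rough sketch. The class name `ApplicationCleaning`, its constructor parameter, and the `{'X': ...}` return shape are assumptions for illustration; only the `DAYS_EMPLOYED` column and the `365243` placeholder value come from the diff under review:

```python
import numpy as np
import pandas as pd


class ApplicationCleaning:
    """Hypothetical cleaning step that runs before any feature extraction."""

    def __init__(self, days_employed_placeholder=365243):
        # 365243 is the placeholder value replaced in the reviewed diff.
        self.days_employed_placeholder = days_employed_placeholder

    def transform(self, X):
        X = X.copy()  # leave the caller's frame untouched
        X['DAYS_EMPLOYED'] = X['DAYS_EMPLOYED'].replace(
            self.days_employed_placeholder, np.nan)
        return {'X': X}


raw = pd.DataFrame({'DAYS_EMPLOYED': [-1200, 365243, -300]})
cleaned = ApplicationCleaning().transform(raw)['X']
print(cleaned)
```

Feature transformers downstream can then assume the placeholder has already been mapped to NaN and contain only feature logic.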
feature_extraction.py
Outdated

    def fit(self, X):
        bureau = pd.read_csv(self.filepath)
        bureau['AMT_CREDIT_SUM'].fillna(0, inplace=True)
@pknut I'd rather put NA handling in a separate transformer
feature_extraction.py
Outdated
        bureau['bureau_active_loans_percentage'] = bureau.groupby(
            by=['SK_ID_CURR'])['bureau_credit_active_binary'].agg('mean').reset_index()['bureau_credit_active_binary']

        # AVERAGE NUMBER OF DAYS BETWEEN SUCCESSIVE PAST APPLICATIONS FOR EACH CUSTOMER
@pknut I prefer putting a piece of logic like this in a method with a descriptive name. That makes the comment obsolete and makes the code easier to read.
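As an illustration of this point, the commented block could become a small function whose name carries the meaning of the comment. This is only a sketch: the function name is hypothetical, and the `DAYS_CREDIT` column is an assumption about which column holds the application date offset (the diff excerpt does not show it):

```python
import pandas as pd


def average_days_between_successive_applications(bureau):
    """Mean gap, in days, between successive past applications per customer."""
    ordered = bureau.sort_values(['SK_ID_CURR', 'DAYS_CREDIT'])
    # diff() within each customer gives the gap to the previous application.
    gaps = ordered.groupby('SK_ID_CURR')['DAYS_CREDIT'].diff()
    return gaps.groupby(ordered['SK_ID_CURR']).mean()


bureau = pd.DataFrame({
    'SK_ID_CURR': [1, 1, 1, 2, 2],
    'DAYS_CREDIT': [-300, -100, -200, -50, -20],
})
result = average_days_between_successive_applications(bureau)
print(result)
```

The call site then reads as a single descriptive line, and the upper-case comment is no longer needed.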
feature_extraction.py
Outdated
        ]

    def fit(self, X):
        bureau = pd.read_csv(self.filepath)
@pknut I don't like that we read data in the fit method. It clearly violates the single responsibility principle. I don't mind reading data in a separate transformer, but I would rather read it in main.py and pass the objects to the pipeline.
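A sketch of what this separation might look like, under stated assumptions: `read_tables` is a hypothetical helper that would live in main.py, the `BureauFeatures` body and the `bureau_credit_sum_mean` feature are invented for illustration, and in-memory buffers stand in for the real CSV files:

```python
import io

import pandas as pd


def read_tables(sources):
    """All I/O happens here (e.g. in main.py); maps table name -> DataFrame."""
    return {name: pd.read_csv(source) for name, source in sources.items()}


class BureauFeatures:
    """Hypothetical transformer that receives a DataFrame instead of a filepath."""

    def fit(self, bureau):
        self.credit_sum_mean_ = (
            bureau.groupby('SK_ID_CURR')['AMT_CREDIT_SUM']
            .mean()
            .rename('bureau_credit_sum_mean')
            .reset_index())
        return self

    def transform(self, X):
        return X.merge(self.credit_sum_mean_, on='SK_ID_CURR', how='left')


# In-memory buffers stand in for the CSV files on disk.
tables = read_tables({
    'application': io.StringIO("SK_ID_CURR\n1\n2\n"),
    'bureau': io.StringIO("SK_ID_CURR,AMT_CREDIT_SUM\n1,100\n1,300\n"),
})
features = BureauFeatures().fit(tables['bureau']).transform(tables['application'])
print(features)
```

With this split the transformer can be tested on small hand-built frames, and the reading logic can change without touching feature code.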
feature_extraction.py
Outdated
        ]

    def fit(self, X):
        credit_card = pd.read_csv(self.filepath)
@pknut same here
pipeline_blocks.py
Outdated
 def _bureau(config, train_mode, **kwargs):
     if train_mode:
         bureau = Step(name='bureau',
-                      transformer=fe.GroupbyAggregationFromFile(**config.bureau),
+                      transformer=fe.Bureau(**config.bureau),
@pknut BureauAggregations or BureauFeatures is more descriptive
pipelines.py
Outdated
@@ -76,6 +76,7 @@ def sklearn_main(config, ClassifierClass, clf_name, train_mode, normalize=False)
                       cache_output=True,
                       load_persisted_output=True)

+
@pknut you can drop this line :)
@@ -140,40 +140,89 @@ def classifier_sklearn(sklearn_features, ClassifierClass, full_config, clf_name,
def feature_extraction(config, train_mode, **kwargs):
@pknut @jakubczakon I think that we need a feature_extraction refactor. Right now, the user cannot try multiple models with fewer features. In solution-1 we had 122 features, in solution-2 we have 2.5k features, and here we are adding even more. IMHO, which feature sets the user wants to use in training should be parametrizable. For example, pick only basic_features and bureau features.
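One possible shape for such a refactor, sketched with plain dictionaries in place of real pipeline steps. Everything here is hypothetical: the builder functions, the `FEATURE_SET_BUILDERS` registry, and the `feature_sets` argument are illustrative names, not the project's API:

```python
def _basic_features(config):
    return {'name': 'basic_features', 'params': config.get('basic_features', {})}


def _bureau(config):
    return {'name': 'bureau', 'params': config.get('bureau', {})}


def _credit_card(config):
    return {'name': 'credit_card', 'params': config.get('credit_card', {})}


# Registry of available feature sets; each builder returns one "step".
FEATURE_SET_BUILDERS = {
    'basic_features': _basic_features,
    'bureau': _bureau,
    'credit_card': _credit_card,
}


def feature_extraction(config, feature_sets):
    """Build only the feature-set steps the user asked for, in the given order."""
    unknown = [name for name in feature_sets if name not in FEATURE_SET_BUILDERS]
    if unknown:
        raise ValueError('Unknown feature sets: {}'.format(unknown))
    return [FEATURE_SET_BUILDERS[name](config) for name in feature_sets]


steps = feature_extraction({'bureau': {'table_name': 'bureau'}},
                           feature_sets=['basic_features', 'bureau'])
print([step['name'] for step in steps])
```

The list of enabled feature sets could then come from the experiment config, so a run with only basic_features and bureau features needs no code changes.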
pipeline_config.py
Outdated
@@ -22,6 +22,7 @@
TARGET_COLUMN = 'TARGET'

TIMESTAMP_COLUMNS = []
@pknut @jakubczakon I think we should drop it, since we do not use it.
main.py
Outdated
@@ -79,8 +79,20 @@ def _train(pipeline_name, dev_mode):
    if dev_mode:
        logger.info('running in "dev-mode". Sample size is: {}'.format(cfg.DEV_SAMPLE_SIZE))
@pknut Since there is a lot of repetition here, I would probably go with if dev_mode: nrows=cfg.DEV_SAMPLE_SIZE else nrows=None and then just pass nrows=nrows to every read.
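A short sketch of this idea. `DEV_SAMPLE_SIZE` stands in for `cfg.DEV_SAMPLE_SIZE`, `read_table` is a hypothetical helper, and the in-memory CSV replaces the real files; the only library behaviour relied on is that `pd.read_csv` reads the whole file when `nrows=None`:

```python
import io

import pandas as pd

DEV_SAMPLE_SIZE = 2  # stand-in for cfg.DEV_SAMPLE_SIZE

CSV_TEXT = "SK_ID_CURR,TARGET\n1,0\n2,1\n3,0\n"


def read_table(source, dev_mode):
    """Read a table, limiting the number of rows only in dev mode."""
    # nrows=None tells pandas to read the whole file.
    nrows = DEV_SAMPLE_SIZE if dev_mode else None
    return pd.read_csv(source, nrows=nrows)


dev_frame = read_table(io.StringIO(CSV_TEXT), dev_mode=True)
full_frame = read_table(io.StringIO(CSV_TEXT), dev_mode=False)
print(len(dev_frame), len(full_frame))
```

Computing `nrows` once removes the duplicated dev/non-dev branches around each read call.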
pipeline_blocks.py
Outdated
 def _bureau(config, train_mode, **kwargs):
     if train_mode:
         bureau = Step(name='bureau',
-                      transformer=fe.GroupbyAggregationFromFile(**config.bureau),
+                      transformer=fe.BureauFeatures(**config.bureau),
                       input_data=['input'],
@pknut I think it would be better to have input_data=['bureau', 'main'] and then in the adapter map X: ('main', 'X') and bureau: ('bureau', 'X'). It shows the benefit of the multi-input data dictionary that is passed to the step in the main.py file.
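To make the adapter idea concrete, here is a library-independent sketch of how such a mapping could be resolved against a multi-input data dictionary. It is not the pipeline library's actual implementation; `resolve_adapter` is a hypothetical helper, and strings stand in for the real DataFrames:

```python
def resolve_adapter(adapter, data):
    """Pick each transformer argument from the (source, key) pair in `adapter`."""
    return {argument: data[source][key]
            for argument, (source, key) in adapter.items()}


# Data dictionary as it might be assembled in main.py: one entry per input.
data = {
    'main': {'X': 'application frame', 'y': 'target vector'},
    'bureau': {'X': 'bureau frame'},
}

adapter = {
    'X': ('main', 'X'),
    'bureau': ('bureau', 'X'),
}

transformer_kwargs = resolve_adapter(adapter, data)
print(transformer_kwargs)
```

The transformer then receives `X` and `bureau` as separate arguments, which makes it explicit where each input comes from.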
* Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * updated best model name * changed best model path * corrections
* Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * updated best model name * changed best model path * added groupby diff features * dropped unreasonable agg diffs
* Smart features (minerva-ml#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (minerva-ml#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (minerva-ml#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (minerva-ml#77) * Update config (minerva-ml#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (minerva-ml#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (minerva-ml#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (minerva-ml#85) * local trash * External sources notebook (minerva-ml#86) * Update * External sources notebook * Dev lgbm params (minerva-ml#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (minerva-ml#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (minerva-ml#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (minerva-ml#95) * 
CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix
* Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * 
Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance
* age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff 
mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * notebook - feature importance - small fixes (#124) * Notebook - feature importance * Notebook - feature importance - search by text * Notebook - feature importance - search by text * Notebook - feature importance - Plots description * fixed typo in feature adding (affected installments)
* age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff 
mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * POS_CASH added features * POS CASH features added * POS_CASH_balance feature cleaning * Yaml adjustment * Path change
* added second level models (#126) * Family features (#128) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py * Data cleaning and two new features (previous application) (#129) * new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Data cleaning - fix (#130) * new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Fix * Dev fractions (#132) * added fraction features to eda and feature extraction, updated configs * updated hyperparams * Dev (#134) * age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features 
added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * 
sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * POS_CASH added features * POS CASH features added * POS_CASH_balance feature cleaning * Yaml adjustment * Path change * fix misinterpretations '<' instead of '>' * fix misinterpretations '<' instead of '>' * Add cleaning in application_groupby_agg (#137) * application agg cleaning * update neptune.yaml * Interaction features (#139) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py * Features - family - added new cols to agg * Features - interaction features * Features - interaction features - fix * Added is_unbalance to configs * updated paths, added corr prints in pos cash balance * dropped unused dependencies * updated pandas version
* age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff 
mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * POS_CASH added features * added second level models (#126) * POS CASH features added * Family features (#128) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py * POS_CASH_balance feature cleaning * Yaml adjustment * Data cleaning and two new features (previous application) (#129) * 
new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Data cleaning - fix (#130) * new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Fix * Initail bureau_balance features * Dev fractions (#132) * added fraction features to eda and feature extraction, updated configs * updated hyperparams * Dev (#134) * age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * 
External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped 
duplicated agg * POS_CASH added features * POS CASH features added * POS_CASH_balance feature cleaning * Yaml adjustment * Path change * fix misinterpretations '<' instead of '>' * fix misinterpretations '<' instead of '>' * Code cleanup * Bug fix * NaN handling * Add cleaning in application_groupby_agg (#137) * application agg cleaning * update neptune.yaml * Interaction features (#139) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py * Features - family - added new cols to agg * Features - interaction features * Features - interaction features - fix * Added is_unbalance to configs * Time correction * Full time count correction * Time features correction and bureau_balance features * Bug fixing * Bug fixing
* age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff 
mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * POS_CASH added features * added second level models (#126) * POS CASH features added * Family features (#128) * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Features - family - test * Features - family - aggregate * Features - family - aggregate 2 * Features - family - aggregate 3 * Features - family - aggregate 4 * Update pipeline_config.py * POS_CASH_balance feature cleaning * Yaml adjustment * Data cleaning and two new features (previous application) (#129) * 
new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Data cleaning - fix (#130) * new previous application features * Data cleaning * update application notebook * credit card cleaning * Data cleaning - groupby agg * Include suggested changes * Fix * Dev fractions (#132) * added fraction features to eda and feature extraction, updated configs * updated hyperparams * Path change * Dev (#134) * age/employment dummies (#104) * added diff features * New handcrafted features (#102) * Dynamic features * Smart features (#61) * Update README.md * Update README.md * Update * Smart features update * More descriptive transformer name * Reading all data in main * More application features * Transformer for cleaning * Multiinput data dictionary * Fix (#63) * fixed configs * dropped redundand steps, moved stuff to cleaning, refactored groupby (#64) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Fix format string * Update pipeline_manager.py clipped prediction -> prediction * added stratified kfold option (#77) * Update config (#79) * dropped redundand steps, moved stuff to cleanining, refactored groupby * restructured, added stacking + CV * Update pipeline_config.py * Dev review (#81) * dropped feature by type split, refactored pipleine_config * dropped feature by type split method * explored application features * trash * reverted refactor of aggs * fixed/updated bureau features * cleared notebooks * agg features added to notebook bureau * credit card cleaned * added other feature notebooks * added rank mean * updated model arch * reverted to old params * fixed rank mean calculations * ApplicationCleaning update (#84) * Cleaning - application * Clear output in notebook * clenaed names in steps, refactored mergeaggregate transformer, changed caching/saving specs (#85) * local trash * External sources 
notebook (#86) * Update * External sources notebook * Dev lgbm params (#88) * local trash * updated configs * dropped comment * updated lgb params * Dev app agg fix (#90) * dropped app_aggs * app agg features fixed * cleaned leftovers * dropped fast read-in for debug * External_sources statistics (#89) * Speed-up ext_src notebook * exernal_sources statistics * Weighted mean and notebook fix * application notebook update * clear notebook output * Fix auto submission (#95) * CreditCardBalance monthly diff mean * POSCASH remaining installments * POSCASH completed_contracts * notebook update * Resolve conflicts * Fix * Update neptune.yaml * Update neptune_random_search.yaml * Split static and dynamic features - credit card balance * Dev nan count (#105) * added nan_count * added nan count with parameter * Dev fe installments (#106) * added simple features, parallel groupby, last-installment features * refactored last_installment features * added features for the very last installment * Dev fe instalments dynamic (#107) * added dynamic-trend features * formated configs * added skew/iqr features * added number of credit agreement change features (#109) * added number of credit agreement change features * reverted sample size * Dynamic features - previous application (#108) * previous_application handcrafted features * previous application cleaning * Update neptune.yaml * code improvement * Update notebook * Notebook - feature importance (#112) * Dev speed up (#111) * refactored aggs to calculate only once per training, sped up installment and credit card (only single index groupby) * sped up all hand crafted * fixed bureau worker errors * fixed isntallment names * fixed isntallment names * fixed bureau and prev_app naming bugs * reverted to vectorized where possible * updated hyperparams * updated early stopping params to meet convergence * reverted to old fallback neptune file * updated paths * updated paths, explored prev-app features * dropped duplicated agg * 
POS_CASH added features * POS CASH features added * POS_CASH_balance feature cleaning * Yaml adjustment * Path change * fix misinterpretations '<' instead of '>' * fix misinterpretations '<' instead of '>' * Add cleaning in application_groupby_agg (#137) * application agg cleaning * update neptune.yaml * New branch * Notebook dev * q * Sklearn models modified * Minor bug fix * Whatever * Space refactor * Old forgotten merge * Final refactor * Minor update * last k features with fraction removal * Fix PR issues
* Fillna bug fix