san_cus_trans_pre

Data and code for the Santander Customer Transaction Prediction competition, a very interesting Kaggle competition.

https://www.kaggle.com/c/santander-customer-transaction-prediction/kernels?sortBy=voteCount&group=everyone&pageSize=20&competitionId=10385

The challenges in the data:

-- Imbalanced data set. The class distribution is 10% vs 90%.

-- High dimensionality: there are 200 features.

-- Interestingly, all the features are uncorrelated, which suggests that they were preprocessed with an algorithm such as principal component analysis.

-- Shuffling observations within each feature does not affect the accuracy, which indicates that the features may be categorical even though they look like real numbers, probably as a result of the principal component analysis.
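The checks behind these observations can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the competition data and not the code in this repository. The helper names (`class_ratio`, `max_abs_offdiag_corr`, `shuffle_within_class`) are hypothetical, and the sketch assumes the shuffle is done separately within each class, which the notes above do not state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data (assumption): 2,000 rows,
# 200 independent features, roughly 10% positive labels.
n_rows, n_features = 2000, 200
y = (rng.random(n_rows) < 0.10).astype(int)
X = rng.normal(size=(n_rows, n_features)) + 0.3 * y[:, None]


def class_ratio(labels):
    """Fraction of positive labels."""
    return float(np.mean(labels))


def max_abs_offdiag_corr(features):
    """Largest absolute Pearson correlation between two distinct columns."""
    corr = np.corrcoef(features, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    return float(np.max(np.abs(corr)))


def shuffle_within_class(features, labels, rng):
    """Permute each column independently among rows that share a label."""
    shuffled = features.copy()
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        for j in range(features.shape[1]):
            shuffled[idx, j] = features[rng.permutation(idx), j]
    return shuffled


X_shuffled = shuffle_within_class(X, y, rng)

print("positive class ratio:", class_ratio(y))
print("max |corr| between features:", max_abs_offdiag_corr(X))
```

A within-class shuffle keeps each feature's per-class distribution intact and only breaks the row-wise link between features. A model can then be trained on `X` and on `X_shuffled` to compare scores.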

Libraries used:

LightGBM, Keras, scikit-learn, SciPy, NumPy, pandas

Interesting Observations/Lessons learnt

-- Binning actually worked very well on this data.

-- For binning, the training and test data were combined, which introduces some leakage. But it worked.

-- The test data contained some "unreal" data points that may affect the results, so they need to be removed. This excellent kernel removes the fake test data:

https://www.kaggle.com/yag320/list-of-fake-samples-and-public-private-lb-split
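The binning described above can be sketched as follows. This is a minimal illustration that assumes equal-frequency (quantile) bins; the notes do not say which binning scheme was used. The helpers `quantile_bin_edges` and `bin_feature` and the toy columns are hypothetical. Setting `combine=True` computes the bin edges on train and test together, which is the source of the leakage mentioned above.

```python
import numpy as np


def quantile_bin_edges(values, n_bins):
    """Interior cut points for n_bins roughly equal-frequency bins."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.unique(np.quantile(values, quantiles))


def bin_feature(train_col, test_col, n_bins=10, combine=True):
    """Bin one feature; edges come from train+test if combine, else train only."""
    source = np.concatenate([train_col, test_col]) if combine else train_col
    edges = quantile_bin_edges(source, n_bins)
    # np.digitize maps each value to a bin index in 0..len(edges)
    return np.digitize(train_col, edges), np.digitize(test_col, edges), edges


rng = np.random.default_rng(0)
train_col = rng.normal(0.0, 1.0, size=1000)
test_col = rng.normal(0.5, 1.0, size=1000)  # toy test column with a shifted mean

train_bins, test_bins, edges = bin_feature(train_col, test_col, n_bins=10)
_, _, train_only_edges = bin_feature(train_col, test_col, n_bins=10, combine=False)

print("combined edges:  ", np.round(edges, 2))
print("train-only edges:", np.round(train_only_edges, 2))
```

When the test distribution differs from the training distribution, the combined edges shift toward the test data. That is how information about the test set ends up in the training features.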

Files in the repository:

README.md
all training_prediction.py
deep_learning_division.py
deep_learning_sing;e.py
feature_selection.py
feature_selection_dl.py
feature_selection_rfe.py
gradient_boost_no_division_for_imbalaced.py
gradient_boost_single_grid_search.py
gradient_boost_single_grid_search_FE.py
gradient_boost_trail.py
gradient_boost_trail_1_subs.py
lsq_trail_1_subs.py
lsq_trail_1_subs_reverse.py
submission.py
submission_single.py
submisson_svm.py
submisson_svm_batch.py
svm_all.py
svm_division.py
svm_grid_search.py
svm_trail_1_subs.py
svm_tree_submission.py
svm_trees.py
svm_trees_columsn.py
trtr.txt

Repository: gungor2/san_cus_trans_pre
