Machine learning algorithms implemented in Python 3: the goal is a clear, modular, easy-to-use-and-modify machine learning library. All the machine learning algorithms are rewritten as classes with the same clear interface, and a common dataset class is implemented that can easily be used with any algorithm. As this is a simplified implementation, accuracy is not the main concern; the reported results can be taken as baselines, and better accuracy is usually possible by tuning the training hyper-parameters.
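To make the "same clear interface" idea concrete, here is a minimal sketch of what such a shared base class could look like. The method names follow the usage examples later in this README (train / evaluation / predict_single / save / load), but the body below is only an assumption for illustration, not the repo's actual code.

```python
# Minimal sketch of the shared interface (assumed, not the repo's actual base class).
# Every algorithm class takes (feats, labels) in __init__ and exposes
# train / evaluation / predict_single / save / load.
import pickle
import numpy as np

class BaseModel:
    def __init__(self, feats, labels):
        self.feats = np.asarray(feats)
        self.labels = np.asarray(labels)
        self.trained = False

    def train(self):
        raise NotImplementedError          # each algorithm implements its own training

    def predict_single(self, sample):
        raise NotImplementedError          # each algorithm implements its own prediction

    def evaluation(self, test_feats, test_labels):
        # accuracy over a test set, built on predict_single()
        preds = [self.predict_single(f) for f in test_feats]
        return float(np.mean(np.array(preds) == np.asarray(test_labels)))

    def save(self, path):
        # persist the whole model state with pickle
        with open(path, 'wb') as f:
            pickle.dump(self.__dict__, f)

    def load(self, path):
        # restore a previously saved model state
        with open(path, 'rb') as f:
            self.__dict__.update(pickle.load(f))
```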
- 2019/07/08 add gbdt algorithm
- 2019/07/06 add cart reg algorithm
- 2019/07/04 add ada boost algorithm
- 2019/07/02 add random forest algorithm
- 2019/06/27 add cart algorithm
- 2019/06/26 add naive bayes algorithm
- 2019/06/25 add kdtree algorithm
- 2019/06/21 add svm algorithm
- 2019/06/15 add perceptron algorithm
- 2019/06/14 add softmax regression algorithm
- 2019/06/12 add logistic regression algorithm
- 2019/06/10 add knn regression algorithm
- 2019/06/03 reconstruct this repo
- pure Python code implementing all the algorithms.
- all the algorithms are implemented as classes, easy to use and modify.
- all the datasets are implemented as classes, easy to use and modify.
- all the algorithms are validated on several datasets (mainly the datasets that ship with sklearn, including the digits dataset).
- support multi-class classification via a multi-class model wrapper on top of a two-class classifier (see the sketch after this list).
- support modifying the training hyper-parameters: batch size, learning rate, model save and load.
- visualization of the training process: text logs and loss-curve generation.
- detailed code explanations.
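The multi-class wrapper mentioned in the list above could, for example, follow a one-vs-rest scheme: train one two-class classifier per class and predict the class whose classifier is most confident. The sketch below assumes a binary model class with the interface shown earlier and a predict_single() that returns a score; the wrapper name is hypothetical, not the repo's actual multi-class model wrapper.

```python
# Hypothetical one-vs-rest wrapper around a two-class classifier (illustration only).
import numpy as np

class OneVsRestWrapper:
    def __init__(self, binary_model_cls, feats, labels):
        self.classes = np.unique(labels)
        self.models = {}
        for c in self.classes:
            # relabel: 1 for the current class, 0 for everything else
            binary_labels = (np.asarray(labels) == c).astype(int)
            self.models[c] = binary_model_cls(feats, binary_labels)

    def train(self):
        for model in self.models.values():
            model.train()

    def predict_single(self, sample):
        # pick the class whose binary model gives the highest score
        # (assumes predict_single() returns a confidence/probability)
        scores = {c: m.predict_single(sample) for c, m in self.models.items()}
        return max(scores, key=scores.get)
```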
- prepare the main dataset: MNIST (from Kaggle); the other datasets are either provided by sklearn or located in the ./dataset/simple/ folder.
```
python3 setup.sh
```
- train a model (knn/kdtree do not need training)
```python
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
```
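Here `feats` and `labels` are plain feature and label arrays. As a quick example of how such arrays could be obtained (using the sklearn digits dataset mentioned above, independent of the repo's own dataset classes):

```python
# Load a small dataset to feed a model; sklearn's digits set is one of the
# validation datasets mentioned in this README.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
feats, test_feats, labels, test_labels = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)
```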
- evaluate on a dataset (supported by all models)
```python
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.load(path='./softmax_reg_weight_2019-5-1_150341.pkl')
sm.evaluation(test_feats, test_labels)
```
- predict a single sample (supported by all models)
```python
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.load(path='./softmax_reg_weight_2019-5-1_150341.pkl')
sm.predict_single([-1, 8.5])
```
- visualize the linear separating hyperplane (only supported by logistic_reg/perceptron)
```python
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
sm.vis_points_line()
```
- visualize the prediction boundary (supported by all models)
```python
from core.softmax_reg_lib import SoftmaxReg
sm = SoftmaxReg(feats, labels)
sm.train()
sm.vis_boundary()
```
- save a model (supported by all models)
```python
sm.save('save_folder_path')
```
- load a model (supported by all models)
```python
sm.load('model_path')
```
features:
- no model weights
- supports two-class and multi-class classification.
- supports linearly separable and non-linearly separable features.
test code: test_knn.
source code: knn_reg_lib.
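The "no model weights" point reflects that k-nearest-neighbors simply memorizes the training set and votes at prediction time. A standalone sketch of that prediction step (illustration only, not the code in knn_reg_lib):

```python
# Plain k-NN vote: distance to every stored sample, majority label among the k nearest.
import numpy as np
from collections import Counter

def knn_predict(feats, labels, sample, k=5):
    dists = np.linalg.norm(np.asarray(feats) - np.asarray(sample), axis=1)
    nearest = np.argsort(dists)[:k]                      # indices of the k closest points
    return Counter(np.asarray(labels)[nearest]).most_common(1)[0][0]
```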
features:
- with model weights of shape (n_feat+1, 1).
- only supports two-class classification.
- supports linearly separable features.
test code: test_logistic_reg.
source code: logistic_reg_lib.
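The weight shape (n_feat+1, 1) is the usual trick of appending a constant 1 to each sample so the bias lives inside the weight vector. A sketch of the corresponding forward pass and gradient step (illustration only, not logistic_reg_lib itself):

```python
# Logistic regression with the bias folded into the weights: each sample x is
# extended with a constant 1, so w has shape (n_feat + 1, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(feats, labels, lr=0.1, n_epoch=200):
    X = np.hstack([np.asarray(feats, float), np.ones((len(feats), 1))])  # (n, n_feat+1)
    y = np.asarray(labels, float).reshape(-1, 1)                         # labels in {0, 1}
    w = np.zeros((X.shape[1], 1))
    for _ in range(n_epoch):
        p = sigmoid(X @ w)                  # predicted probabilities
        w -= lr * X.T @ (p - y) / len(X)    # gradient of the cross-entropy loss
    return w
```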
features:
- with model weights of shape (n_feat+1, n_class).
- supports two-class and multi-class classification.
- supports linearly separable features.
test code: test_softmax_reg.
source code: softmax_reg_lib.
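Likewise, the (n_feat+1, n_class) weight gives one score column per class; prediction takes the argmax over the softmax-normalized scores. A sketch under that layout (illustration only, not the repo's code):

```python
# Softmax scores: extend x with a bias 1, multiply by W of shape (n_feat+1, n_class),
# normalize with softmax, and predict the argmax class.
import numpy as np

def softmax_predict(W, sample):
    x = np.append(np.asarray(sample, float), 1.0)   # (n_feat+1,)
    scores = x @ W                                  # (n_class,)
    scores -= scores.max()                          # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return int(np.argmax(probs)), probs
```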
features:
- with model weights of shape (n_feat+1, 1).
- only supports two-class classification.
- supports linearly separable features.
test code: test_perceptron.
source code: perceptron_lib.
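The perceptron updates its (n_feat+1, 1) weight only on misclassified samples. A sketch of the classic update rule with labels in {-1, +1} (illustration only, not perceptron_lib):

```python
# Perceptron learning rule: if y * (w . x) <= 0 the sample is misclassified,
# so move w toward y * x.
import numpy as np

def train_perceptron(feats, labels, lr=1.0, n_epoch=100):
    X = np.hstack([np.asarray(feats, float), np.ones((len(feats), 1))])
    y = np.asarray(labels, float)            # labels expected in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(n_epoch):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:           # misclassified (or on the boundary)
                w += lr * yi * xi
    return w
```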
features:
- with model weights.
- only supports two-class classification.
- supports linearly separable and non-linearly separable features.
test code: test_svm.
source code: svm_lib.
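Support for non-linearly separable features comes from kernels. A sketch of the standard SVM decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b with an RBF kernel; the alpha and b values would come from training (e.g. SMO), and this is not svm_lib itself:

```python
# SVM decision function with an RBF kernel K(a, b) = exp(-gamma * ||a - b||^2).
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def svm_decision(support_vecs, support_labels, alphas, b, sample, gamma=0.5):
    ks = np.array([rbf_kernel(sv, sample, gamma) for sv in support_vecs])
    score = np.sum(np.asarray(alphas) * np.asarray(support_labels) * ks) + b
    return 1 if score >= 0 else -1
```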
features:
- no model weights
- supports two-class and multi-class classification.
- supports linearly separable and non-linearly separable features.
test code: test_knn.
source code: knn_reg_lib.
features:
- no model weights
- supports two-class and multi-class classification.
- supports linearly separable and non-linearly separable features (but strongly restricted by the feature distribution).
- supports continuous and discrete features
test code: test_cart.
source code: cart_lib.
features:
- supports two-class and multi-class classification.
- supports linearly separable and non-linearly separable features.
- supports continuous and discrete features
test code: test_decision_tree.
source code: decision_tree_lib.
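Handling both continuous and discrete features typically comes down to scoring candidate splits and keeping the one that most reduces impurity. A sketch of a Gini-impurity evaluation for one threshold on one continuous feature (illustration only, not decision_tree_lib):

```python
# Gini impurity of a split: weighted impurity of the two child nodes produced by
# thresholding one feature; the tree keeps the (feature, threshold) with the lowest value.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(feats, labels, feat_idx, threshold):
    feats, labels = np.asarray(feats), np.asarray(labels)
    left = labels[feats[:, feat_idx] <= threshold]
    right = labels[feats[:, feat_idx] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```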
features:
- supports two-class and multi-class classification.
- supports linearly separable and non-linearly separable features.
- supports continuous and discrete features
test code: test_random_forest.
source code: random_forest_lib.
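A random forest is many such trees, each trained on a bootstrap resample of the data (often with random feature subsets as well), with a majority vote at prediction time. A sketch of those two pieces, assuming tree objects that expose a repo-style predict_single() (hypothetical helpers, not random_forest_lib):

```python
# Bagging: each tree sees a bootstrap resample; the forest predicts by majority vote.
import numpy as np
from collections import Counter

def bootstrap_sample(feats, labels, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.integers(0, len(feats), size=len(feats))   # sample with replacement
    return np.asarray(feats)[idx], np.asarray(labels)[idx]

def forest_predict(trees, sample):
    votes = [t.predict_single(sample) for t in trees]    # assumes tree.predict_single()
    return Counter(votes).most_common(1)[0][0]
```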
features:
- only supports two-class classification.
- supports linearly separable and non-linearly separable features.
- supports continuous and discrete features
test code: test_ada_boost.
source code: ada_boost_lib.
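The two-class restriction matches the classic AdaBoost formulation with labels in {-1, +1}: each round trains a weak learner on weighted samples, then re-weights so mistakes count more next round. A sketch of one round's weight update (illustration only, not ada_boost_lib):

```python
# One AdaBoost round: weak-learner error -> learner weight alpha -> new sample weights.
import numpy as np

def adaboost_round(sample_weights, preds, labels):
    w = np.asarray(sample_weights, float)
    y = np.asarray(labels, float)                    # true labels in {-1, +1}
    h = np.asarray(preds, float)                     # weak-learner predictions in {-1, +1}
    err = np.sum(w * (h != y)) / np.sum(w)           # weighted error rate
    alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
    w = w * np.exp(-alpha * y * h)                   # up-weight mistakes, down-weight hits
    return w / w.sum(), alpha
```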
features:
- supports two-class and multi-class classification.
- supports linearly separable and non-linearly separable features.
- supports continuous and discrete features
test code: test_gbdt.
source code: gbdt_lib.
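GBDT builds its ensemble by repeatedly fitting a new regression tree to the residuals of the current prediction (the negative gradient of the squared-error loss) and adding it with a shrinkage factor. A sketch of that loop, assuming a regression-tree class with fit/predict methods (hypothetical names, not gbdt_lib):

```python
# Gradient boosting with squared-error loss: each new tree fits the residuals y - F(x).
import numpy as np

def gbdt_fit(tree_cls, feats, labels, n_trees=50, lr=0.1):
    y = np.asarray(labels, float)
    base = y.mean()
    pred = np.full_like(y, base)              # start from the mean prediction
    trees = []
    for _ in range(n_trees):
        residual = y - pred                   # negative gradient of 0.5 * (y - F)^2
        tree = tree_cls()
        tree.fit(feats, residual)             # assumed fit/predict interface
        pred = pred + lr * tree.predict(feats)
        trees.append(tree)
    return base, trees
```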
to be updated ...
to be updated ...
to be updated ...
features:
- a wrapper that transforms a two-class classifier into a multi-class classifier; can be used with logistic_reg/svm/perceptron
test code: test_cart.
source code: cart_lib.
to be updated ...
to be updated ...
test code: test_decision_tree_regressor.
source code: decision_tree_lib.
to be updated ...
to be updated ...
- Machine Learning in Action, Peter Harrington
- Python Machine Learning Algorithms, Zhiyong Zhao
- Statistical Learning Methods, Hang Li