[0.29.0]
- Add transformer class for Sparse Principal Component Analysis.
  - SparsePCA
- Add transformer class for Local Tangent Space Alignment.
  - LocalTangentSpaceAlignment
- No changes, or minor changes in configuration files.
- No changes, or minor changes using RuboCop.
[0.28.1]
- Add transformer classes for Hessian Eigenmaps.
  - HessianEigenmaps
- No changes, or minor changes using RuboCop.
[0.28.0]
Breaking change
- Rewrite the native extension code in C++.
- Reimplement the stop_growing? private method of DecisionTreeRegressor with the native extension.
- Add classifier and regressor classes for Radial Basis Function (RBF) Network.
  - RBFClassifier
  - RBFRegressor
- Add classifier and regressor classes for Random Vector Functional Link (RVFL) Network.
  - RVFLClassifier
  - RVFLRegressor
- No changes, minor changes in configuration files, or minor refactoring using RuboCop.
- Add partial_fit method to SGDClassifier and SGDRegressor.
  - It performs one epoch of stochastic gradient descent. It supports only binary labels and single target variables.
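A minimal sketch of incremental training with partial_fit, assuming binary labels; the data and constructor arguments below are illustrative and not taken from the release notes.
require 'rumale'
x = Numo::DFloat[[-1.0, 1.0], [1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = Numo::Int32[0, 1, 1, 0]
estimator = Rumale::LinearModel::SGDClassifier.new(random_seed: 1)
# Each call to partial_fit runs one epoch of stochastic gradient descent.
10.times { estimator.partial_fit(x, y) }
estimator.predict(x)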
- Remove unnecessary array generation in native extension.
- No changes, or minor changes using RuboCop.
- Add cluster analysis class for mean-shift method.
  - MeanShift
- Add transformer classes for Locally Linear Embedding and Laplacian Eigenmaps.
  - LocallyLinearEmbedding
  - LaplacianEigenmaps
- Add transformer class for Local Fisher Discriminant Analysis.
  - LocalFisherDiscriminantAnalysis
- No changes, or only slight changes to configuration files.
Breaking change
- Add new SGDClassifier and SGDRegressor by extracting the stochastic gradient descent solver from each linear model.
- Change the optimization method of ElasticNet and Lasso to use the coordinate descent algorithm.
- Change the optimization method of SVC and SVR to use the L-BFGS method.
- Change the loss function of SVC to the squared hinge loss.
- Change the loss function of SVR to the squared epsilon-insensitive loss.
- Change weight initialization to not use a random vector.
- As a result of the above changes, keyword arguments such as learning_rate, decay, momentum, batch_size, and random_seed have been removed from LinearModel estimators.
- Fix reversed row and column vectors of the weight matrix in LinearRegression, Ridge, and NNLS.
- Fix missing require for Rumale::Utils in the PCA class. It is needed to initialize the principal components when optimizing with the fixed-point algorithm.
- Apply automatic correction for Style/ZeroLengthPredicate of RuboCop to ROCAUC class.
- No changes, or only modifications in test code or configuration.
- Divided into gems for each machine learning algorithm, with Rumale as the meta-gem.
- Changed the license of Rumale to the 3-Clause BSD License.
- Fix build failure with Xcode 14 and Ruby 3.1.x.
The Rumale project will be rebooted in version 0.24.0. This version is probably the last release of the series starting with version 0.8.0.
- Refactor some codes and configs.
- Deprecate VPTree class.
- Fix all estimators to return inference results in a contiguous narray.
- Fix to use until statement instead of recursive call on apply methods of tree estimators.
- Rename native extension files.
- Introduce clang-format for native extension codes.
- Change the automatically selected solver from 'sgd' to 'lbfgs' in LinearRegression and Ridge.
  - When 'auto' is given to the solver parameter, these estimators select the 'svd' solver if Numo::Linalg is loaded; otherwise, they select the 'lbfgs' solver.
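A brief sketch of the solver selection described above; the data values are placeholders.
require 'rumale'
# With solver: 'auto', 'svd' is used if Numo::Linalg is loaded; otherwise 'lbfgs' is used.
# require 'numo/linalg/autoloader'
x = Numo::DFloat[[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 7.0]]
y = Numo::DFloat[1.0, 2.0, 3.0, 4.0]
regressor = Rumale::LinearModel::Ridge.new(solver: 'auto', reg_param: 0.1)
regressor.fit(x, y)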
- Add transformer class for calculating kernel matrix.
- Add classifier class based on Ridge regression.
- Add supported kernel functions to Nystroem.
- Add parameter for specifying the number of features to load_libsvm_file.
- Add classifier and regressor classes for voting ensemble method.
- Refactor some codes.
- Fix some typos on API documentation.
- Add regressor class for non-negative least square method.
- Add lbfgs solver to Ridge and LinearRegression.
- In version 0.23.0, these classes will be changed to attempt to optimize with the 'svd' or 'lbfgs' solver if 'auto' is given to the solver parameter. If you use the 'sgd' solver, you need to specify it explicitly.
- Add GC guard to native extension codes.
- Update API documentation.
- Add classifier and regressor classes for stacking method.
- Refactor some codes with Rubocop.
- Add transformer class for MLKR, which implements Metric Learning for Kernel Regression.
- Refactor NeighbourhoodComponentAnalysis.
- Update API documentation.
- Add lbfgsb.rb gem to runtime dependencies. Rumale uses lbfgsb gem for optimization. This eliminates the need to require the mopti gem when using NeighbourhoodComponentAnalysis.
- Add lbfgs solver to LogisticRegression and make it the default solver.
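A minimal sketch, assuming the solver parameter accepts 'lbfgs' and 'sgd'; since 'lbfgs' is now the default, specifying it explicitly is optional.
require 'rumale'
classifier = Rumale::LinearModel::LogisticRegression.new(solver: 'lbfgs', reg_param: 1.0)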
- Change the default value of max_iter argument on LinearModel estimators to 1000.
- Fix to use automatic solver of PCA in NeighbourhoodComponentAnalysis.
- Refactor some codes with Rubocop.
- Update README.
- Add cross-validator class for time-series data.
- Add cross-validator classes that split data according to group labels.
- Fix the handling of fractional values for the number of samples in shuffle-split cross-validator classes.
- Refactor some codes with Rubocop.
- Delete deprecated estimators such as PolynomialModel, Optimizer, and BaseLinearModel.
- Add preprocessing class for Binarizer.
- Add preprocessing class for MaxNormalizer.
- Refactor some codes with Rubocop.
- Fix L2Normalizer to avoid division by zero.
- Add preprocessing class for L1Normalizer.
- Add transformer class for TfidfTransformer.
- Add cluster analysis class for mini-batch K-Means.
- Fix some typos.
- Change the mmh3 and mopti gems to non-runtime dependencies.
- The mmh3 gem is used in FeatureHasher. You only need to require the mmh3 gem when using FeatureHasher.
require 'mmh3'
require 'rumale'
encoder = Rumale::FeatureExtraction::FeatureHasher.new
- The mopti gem is used in NeighbourhoodComponentAnalysis. You only need to require the mopti gem when using NeighbourhoodComponentAnalysis.
require 'mopti'
require 'rumale'
transformer = Rumale::MetricLearning::NeighbourhoodComponentAnalysis.new
- Change the default value of solver parameter on PCA to 'auto'. If Numo::Linalg is loaded, 'evd' is selected for the solver, otherwise 'fpt' is selected.
- Deprecate PolynomialModel, Optimizer, and the estimators contained in them. They will be deleted in version 0.20.0.
- Many machine learning libraries do not contain factorization machine algorithms; they are provided by separate compatible libraries. In addition, there are no plans to implement new estimators in PolynomialModel. Thus, the author decided to deprecate PolynomialModel.
- Currently, the Optimizer classes are only used by PolynomialModel estimators. Therefore, they have been deprecated together with PolynomialModel.
- Fix to convert target_name to string array in classification_report method.
- Refactor some codes with Rubocop.
- Fix some configuration files.
- Update API documentation.
- Add functions for calculation of cosine similarity and distance to Rumale::PairwiseMetric.
- Refactor some codes with Rubocop.
- Fix API documentation on KNeighborsRegressor.
- Refactor rbf_kernel method.
- Delete unneeded marshal dump and load methods. The deletion work is complete.
- Change file composition of naive bayes classifiers.
- Add classifier class for ComplementNaiveBayes.
- Add classifier class for NegationNaiveBayes.
- Add module function for calculating confusion matrix.
- Delete unneeded marshal dump and load methods.
- Add module function for generating summary of classification performance.
- Delete marshal dump and load methods used only for documentation. The marshal methods were written in estimator classes to indicate in the API documentation that a learned model can be saved with Marshal. Even without these methods, Marshal can save the learned model, so they are being deleted sequentially.
- Add transformer class for FisherDiscriminantAnalysis.
- Add transformer class for NeighbourhoodComponentAnalysis.
- Add module function for hold-out validation.
- Add pipeline class for FeatureUnion.
- Fix to use mmh3 gem for generating hash value on FeatureHasher.
- Add transformer class for kernel approximation with Nystroem method.
- Delete array validation in the Pipeline class, considering that an array of hashes is given to HashVectorizer.
- Add transformer class for PolynomialFeatures.
- Add verbose and tol parameters to FactorizationMachineClassifier and FactorizationMachineRegressor.
- Fix bug where factor elements of Factorization Machines estimators were not learned, caused by initializing the factors to zero.
- Fix all linear model estimators to use the new abstract class (BaseSGD) introduced in version 0.16.1. The major differences from the old abstract class are that the optimizer of LinearModel estimators is fixed to mini-batch SGD with a momentum term, the max_iter parameter indicates the number of epochs instead of the maximum number of iterations, the fit_bias parameter is true by default, and elastic-net style regularization can be used. Note that there are additions and changes to hyperparameters. Existing trained linear models may need to be re-trained with adjusted hyperparameters.
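A hedged sketch of constructing a linear model under the new interface; only keyword arguments mentioned above are shown, and their values are illustrative.
require 'rumale'
# max_iter now counts epochs, and fit_bias defaults to true.
svc = Rumale::LinearModel::SVC.new(reg_param: 1.0, max_iter: 200)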
- Change the default value of solver parameter on LinearRegression and Ridge to 'auto'. If Numo::Linalg is loaded, 'svd' is selected for the solver, otherwise 'sgd' is selected.
- The meaning of the max_iter parameter of the factorization machine estimators has been changed from the maximum number of iterations to the number of epochs.
- Add regressor class for ElasticNet.
- Add new linear model abstract class.
- In version 0.17.0, all LinearModel estimators will be changed to use this new abstract class. The major differences from the existing abstract class are that the optimizer of LinearModel estimators is fixed to mini-batch SGD with momentum term, the max_iter parameter indicates the number of epochs instead of the maximum number of iterations, the fit_bias parameter is true by default, and elastic-net style regularization can be used.
- The meaning of the max_iter parameter of the multi-layer perceptron estimators has been changed from the maximum number of iterations to the number of epochs. The number of epochs is how many times the whole data set is given to the training process. As a future plan, similar changes will be applied to other estimators that use stochastic gradient descent, such as SVC and Lasso.
- Add feature extractor classes.
  - FeatureHasher
  - HashVectorizer
- Fix to suppress deprecation warning about keyword argument in Ruby 2.7.
- Add metric parameter that specifies distance metric to KNeighborsClassifier and KNeighborsRegressor.
- Add algorithm parameter that specifies nearest neighbor search algorithm to KNeighborsClassifier and KNeighborsRegressor.
- Add nearest neighbor search class with vantage point tree.
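A hedged sketch of the new parameters on KNeighborsClassifier; the accepted values shown here ('euclidean' for metric and 'vptree' for algorithm) are assumptions based on these entries.
require 'rumale'
x = Numo::DFloat[[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
y = Numo::Int32[0, 0, 1, 1]
classifier = Rumale::NearestNeighbors::KNeighborsClassifier.new(n_neighbors: 1, metric: 'euclidean', algorithm: 'vptree')
classifier.fit(x, y)
classifier.predict(Numo::DFloat[[0.05, 0.1]])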
- Fix documents of GradientBoosting, RandomForest, and ExtraTrees.
- Refactor gaussian mixture clustering with Rubocop.
- Refactor specs.
- Refactor extension codes of decision tree estimators.
- Refactor specs.
- Fix bug where MDS optimization is not performed when the tol parameter is given.
- Refactor specs.
- Add classifier and regressor class with multi-layer perceptron.
- Refactor specs.
- Change the predict method of the SVC, LogisticRegression, and FactorizationMachineClassifier classes to return the original labels instead of -1 or 1 labels for binary classification problems.
- Fix hyperparameter validation to check whether the given value is of Numeric type.
- Fix array validation for samples, labels, and target values to accept Ruby Array.
require 'rumale'
samples = [[-1, 1], [1, 1], [1, -1], [-1, -1]]
labels = [0, 1, 1, 0]
svc = Rumale::LinearModel::SVC.new(reg_param: 1, batch_size: 1, random_seed: 1)
svc.fit(samples, labels)
svc.predict([[-1, 0], [1, 0]])
# => Numo::Int32#shape=[2]
# [0, 1]
- Add module function for generating artificial dataset with gaussian blobs.
- Add documents about Rumale::SVM.
- Refactor specs.
- Add some evaluator classes for clustering.
- Add transformer class for Factor Analysis.
- Add covariance_type parameter to Rumale::Clustering::GaussianMixture.
- Add cluster analysis class for HDBSCAN.
- Add cluster analysis class for spectral clustering.
- Refactor power iteration clustering.
- Several documentation improvements.
- Add transformer class for Kernel PCA.
- Add regressor class for Kernel Ridge.
- Add preprocessing class for label binarization.
- Fix to use LabelBinarizer instead of OneHotEncoder.
- Fix bug where OneHotEncoder leaves elements for values that do not occur in the training data.
- Add class for Shared Nearest Neighbor clustering.
- Add function for calculation of manhattan distance to Rumale::PairwiseMetric.
- Add metric parameter that specifies distance metric to Rumale::Clustering::DBSCAN.
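A hedged sketch of passing a precomputed distance matrix via the metric parameter; the 'precomputed' value and the data below are assumptions for illustration.
require 'rumale'
x = Numo::DFloat[[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1]]
distance_mat = Rumale::PairwiseMetric.euclidean_distance(x)
analyzer = Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 2, metric: 'precomputed')
cluster_labels = analyzer.fit_predict(distance_mat)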
- Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::LinearRegression.
- Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::Ridge.
- Fix bug where the ndim of the NArray of 1-dimensional principal components is not 1.
- Introduce Numo::Linalg to use linear algebra algorithms in the optimization.
- Add the solver parameter that specifies the optimization algorithm to Rumale::Decomposition::PCA.
require 'rumale'
# Loading Numo::Linalg enables features based on linear algebra algorithms.
require 'numo/linalg/autoloader'
samples = Numo::DFloat.new(10, 5).rand # placeholder data, not part of the original example
decomposer = Rumale::Decomposition::PCA.new(n_components: 2, solver: 'evd')
low_dimensional_samples = decomposer.fit_transform(samples)
- Add class for K-Medoids clustering.
- Fix extension codes of decision tree regressor for using Numo::NArray.
- Fix bug that fails to build and install on Windows again. Fix extconf to add Numo::NArray libraries to $lib.
- Fix bug that fails to build and install on Windows. Add search for Numo::NArray static library path to extconf.
- Fix extension codes of decision tree classifier and gradient tree regressor for using Numo::NArray.
- Fix random number generator initialization on gradient boosting estimators to obtain the same result with and without parallel option.
- Add class for multidimensional scaling.
- Fix parameter description on artificial dataset generation method.
- Add class for Power Iteration clustering.
- Add classes for artificial dataset generation.
- Add class for cluster analysis with Gaussian Mixture Model.
- Add encoder class for categorical features.
- Refactor kernel support vector classifier.
- Refactor random sampling on tree estimators.
- For reproducibility, Rumale no longer repeatedly uses the same random number generator within the same estimator. In the training phase, estimators use a copy of the random number generator created in the initialize method. Even with the same algorithm and the same data, the order of random number generation may cause slight differences in learning results. With this change, even if the fit method is executed multiple times, the same learning result is obtained when the same data is given.
x = Numo::DFloat.new(20, 2).rand # placeholder training data, not part of the original example
y = Numo::Int32.cast([0, 1].cycle.take(20)) # placeholder binary labels
svc = Rumale::LinearModel::SVC.new(random_seed: 0)
svc.fit(x, y)
a = svc.weight_vec
svc.fit(x, y)
b = svc.weight_vec
err = ((a - b)**2).mean
# In version 0.11.0 or earlier, false may be output,
# but from this version, true is always output.
puts(err < 1e-4)
- Introduce Parallel gem to improve execution speed for one-vs-the-rest and bagging methods.
- Add the n_jobs parameter that specifies the number of jobs for parallel processing to some estimators belonging to Rumale::LinearModel, Rumale::PolynomialModel, and Rumale::Ensemble.
- The n_jobs parameter is valid only when the parallel gem is loaded.
require 'rumale'
require 'parallel'
svc = Rumale::LinearModel::SVC.new(n_jobs: -1)
- Add class for t-distributed Stochastic Neighborhood Embedding.
- Fix bug of zero division on min-max scaling class.
- Add class for Gradient tree boosting classifier.
- Add class for Gradient tree boosting regressor.
- Add class for discretizing feature values.
- Refactor extra-trees estimators.
- Refactor decision tree base class.
- Fix some typos on document (#6).
- Add class for Extra-Trees classifier.
- Add class for Extra-Trees regressor.
- Refactor extension modules of decision tree estimators for improving performance.
- Decide to introduce Ruby extensions for improving performance.
- Fix to find split point on decision tree estimators using extension modules.
- Remove unused parameter on Nadam.
- Fix the condition for stopping tree growth in decision tree estimators.
- Add optimizer class for AdaGrad.
- Add evaluator class for ROC AUC.
- Add class for scaling with maximum absolute value.
- Add class for Adam optimizer.
- Add data splitter classes for random permutation cross validation.
- Add accessor method for number of splits to K-fold splitter classes.
- Add execution result of example script on README (#3).
- Add some evaluator classes.
  - MeanSquaredLogError
  - MedianAbsoluteError
  - ExplainedVarianceScore
  - AdjustedRandScore
  - MutualInformation
- Refactor normalized mutual information evaluator.
- Fix typo on document (#2).
- Rename SVMKit to Rumale.
- Rename SGDLinearEstimator class to BaseLinearModel class.
- Add data type option to load_libsvm_file method. By default, the method represents the feature with Numo::DFloat.
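A hedged sketch of the data type option; the keyword name (dtype) and the file name are assumptions for illustration.
require 'rumale'
samples, labels = Rumale::Dataset.load_libsvm_file('dataset.t', dtype: Numo::SFloat)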
- Refactor factorization machine estimators.
- Refactor decision tree estimators.
- Add class for grid search performing hyperparameter optimization.
- Add argument validations to Pipeline.
- Add class for Pipeline that constructs chain of transformers and estimators.
- Fix some typos on document (#1).
- Fix to use CSV class in parsing libsvm format file.
- Refactor ensemble estimators.
- Add class for AdaBoost classifier.
- Add class for AdaBoost regressor.
- Fix bug on setting random seed and max_features parameter of Random Forest estimators.
- Refactor decision tree classes for improving performance.
- Add abstract class for linear estimators with stochastic gradient descent.
- Refactor linear estimators to use the linear estimator abstract class.
- Refactor decision tree classes to avoid unneeded type conversion.
- Add class for Principal Component Analysis.
- Add class for Non-negative Matrix Factorization.
- Add class for DBSCAN clustering.
- Fix bug on class probability calculation of DecisionTreeClassifier.
- Add class for K-Means clustering.
- Add class for evaluating purity.
- Add class for evaluating normalized mutual information.
- Add class for linear regressor.
- Add class for SGD optimizer.
- Add class for RMSProp optimizer.
- Add class for YellowFin optimizer.
- Fix to be able to select the optimizer on estimators of LinearModel and PolynomialModel.
SVMKit introduces optimizer algorithms that calculate learning rates adaptively on each iteration of stochastic gradient descent (SGD). While Pegasos SGD runs fast, it sometimes fails to optimize complicated models like Factorization Machine. To solve this problem, in version 0.3.3, SVMKit introduced optimization with RMSProp on FactorizationMachineRegressor, Ridge, and Lasso. This attempt realized stable optimization of those estimators. Following this success, the author decided to use modern optimizer algorithms with all SGD optimizations in SVMKit. Through some preliminary experiments, the author chose Nadam as the default optimizer. SVMKit plans to add other optimizer algorithms sequentially so that users can select them.
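A hedged sketch of selecting an optimizer; the class names under SVMKit::Optimizer and the keyword and argument names are assumptions based on this entry.
require 'svmkit'
# Pass an optimizer instance to an SGD-based estimator instead of the default Nadam.
optimizer = SVMKit::Optimizer::RMSProp.new(learning_rate: 0.01)
regressor = SVMKit::LinearModel::Ridge.new(optimizer: optimizer, max_iter: 500, random_seed: 1)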
- Fix to use Nadam for optimization on SVC, SVR, LogisticRegression, Ridge, Lasso, and Factorization Machine estimators.
- Combine reg_param_weight and reg_param_bias parameters on Factorization Machine estimators into the unified parameter named reg_param_linear.
- Remove init_std parameter on Factorization Machine estimators.
- Remove learning_rate, decay, and momentum parameters on Ridge, Lasso, and FactorizationMachineRegressor.
- Remove normalize parameter on SVC, SVR, and LogisticRegression.
- Add class for Ridge regressor.
- Add class for Lasso regressor.
- Fix bug on gradient calculation of FactorizationMachineRegressor.
- Fix some documents.
- Add class for Factorization Machine regressor.
- Add class for Decision Tree regressor.
- Add class for Random Forest regressor.
- Fix to support loading and dumping libsvm file with multi-target variables.
- Fix to require DecisionTreeClassifier on RandomForestClassifier.
- Fix some mistakes on document.
- Fix bug on decision function calculation of FactorizationMachineClassifier.
- Fix bug on weight updating process of KernelSVC.
- Add class for Support Vector Regression.
- Add class for K-Nearest Neighbor Regression.
- Add class for evaluating coefficient of determination.
- Add class for evaluating mean squared error.
- Add class for evaluating mean absolute error.
- Fix to use min method instead of sort and first methods.
- Fix cross validation class to be able to use for regression problem.
- Fix some typos on document.
- Rename spec filename for Factorization Machine classifier.
- Add predict_proba method to SVC and KernelSVC.
- Add class for evaluating logarithmic loss.
- Add classes for Label- and One-Hot- encoding.
- Add some validators.
- Fix bug on training data score calculation of cross validation.
- Fix fit method of SVC for performance.
- Fix criterion calculation on Decision Tree for performance.
- Fix data structure of Decision Tree for performance.
- Fix bug on gradient calculation of Logistic Regression.
- Fix to change accessor of params of estimators to read only.
- Add parameter validation.
- Fix to support multiclass classification in LinearSVC, LogisticRegression, KernelSVC, and FactorizationMachineClassifier.
- Add class for Decision Tree classifier.
- Add class for Random Forest classifier.
- Fix to use frozen string literal.
- Refactor marshal dump method on some classes.
- Introduce Coveralls to confirm test coverage.
- Add classes for Naive Bayes classifier.
- Fix decision function method on Logistic Regression class.
- Fix method visibility on RBF kernel approximation class.
- Add class for Factorization Machine classifier.
- Add classes for evaluation measures.
- Fix the method for prediction of class probability in Logistic Regression.
- Add class for cross validation.
- Add specs for base modules.
- Fix validation of the number of splits when a negative label is given.
- Add data splitter classes for K-fold cross validation.
- Add class for K-nearest neighbors classifier.
- Migrate the linear algebra library to Numo::NArray.
- Add module for loading and saving libsvm format file.
- Add class for Kernel Support Vector Machine with Pegasos algorithm.
- Add module for calculating pairwise kernel functions and euclidean distances.
- Add the function of learning a model with a bias term to the PegasosSVC and LogisticRegression classes.
- Rewrite the document with yard notation.
- Add class for Logistic Regression with SGD optimization.
- Fix some mistakes on the document.
- Add basic classes.
- Add a utility module.
- Add class for RBF kernel approximation.
- Add class for Support Vector Machine with Pegasos algorithm.
- Add class that performs multiclass classification with one-vs.-rest strategy.
- Add classes for preprocessing such as min-max scaling, standardization, and L2 normalization.