RSVM is a Ruby gem for Support Vector Machine classification and regression. It is an FFI wrapper around libsvm.
problem = Svm::Problem.new
# These are the training samples. The first element in each array is the label
# for the sample, the rest is the sample coordinates.
problem.data = [
[1, 1, 0, 1],
[-1, -1, 0, -1]
]
# Generate a model from this problem
model = problem.generate_model(:kernel_type => :linear, :c => 10)
# And make predictions
model.predict([-1, 0, -1]) # -1
model.predict([1, 0, 1]) # 1
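# Models can be saved to a file (a Tempfile is used here for illustration;
# it needs require 'tempfile', and any writable path should work)
file = Tempfile.new('model')
model.save(file.path)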
loaded_model = Svm::Model.load(file.path)
loaded_model.predict([-1, 0, -1]) # -1
loaded_model.predict([1, 0, 1]) # 1
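The sample layout described above (label first, then the coordinates) can be taken apart in plain Ruby. This is only a minimal sketch: `split_rows` is a hypothetical helper, not part of RSVM, and it assumes every row follows that layout.

```ruby
# Hypothetical helper, not part of RSVM: split rows laid out as
# [label, feature_1, feature_2, ...] into labels and feature vectors.
def split_rows(rows)
  labels = rows.map(&:first)
  features = rows.map { |row| row.drop(1) }
  [labels, features]
end

rows = [
  [1, 1, 0, 1],
  [-1, -1, 0, -1]
]
labels, features = split_rows(rows)
# labels   => [1, -1]
# features => [[1, 0, 1], [-1, 0, -1]]
```

The feature vectors have the same shape as the arrays passed to `predict` in the example above, which take no label.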
# Problems can also be loaded from a CSV file
csv_path = File.join(File.dirname(__FILE__), '..', 'fixtures', 'heart_scale.csv')
problem = Svm::Problem.load_from_csv(csv_path)
For the Support Vector Machine to perform well, the features in the sample data must be of the same order of magnitude. RSVM can scale your data linearly to the [-1, 1] range.
data = [
[1, 12.0, -7.6, 100_000, 0],
[2, 30.0, 0, -100_000, 0],
[3, 36.0, 7.6, 0, 0]
]
problem = Svm::Problem.new(data, scale: true)
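To make the idea of linear scaling concrete, here is a minimal pure-Ruby sketch of per-feature min-max scaling to [-1, 1]. `scale_features` is a hypothetical helper, not RSVM's implementation, and mapping a constant column to 0.0 is an assumption made here; RSVM may treat such columns differently.

```ruby
# Hypothetical helper, not RSVM's implementation: linearly rescale every
# feature column to [-1, 1], leaving the label (first element) untouched.
def scale_features(rows)
  labels = rows.map(&:first)
  columns = rows.map { |row| row.drop(1) }.transpose

  scaled_columns = columns.map do |column|
    min, max = column.minmax
    range = (max - min).to_f
    column.map do |value|
      # Assumption: a constant column carries no information, so map it to 0.0.
      range.zero? ? 0.0 : 2.0 * (value - min) / range - 1.0
    end
  end

  labels.zip(scaled_columns.transpose).map { |label, features| [label, *features] }
end

data = [
  [1, 12.0, -7.6, 100_000, 0],
  [2, 30.0, 0, -100_000, 0],
  [3, 36.0, 7.6, 0, 0]
]
scale_features(data)
# => [[1, -1.0, -1.0, 1.0, 0.0],
#     [2, 0.5, 0.0, -1.0, 0.0],
#     [3, 1.0, 1.0, 0.0, 0.0]]
```

After scaling, the third feature no longer dwarfs the first two, even though its raw values were several orders of magnitude larger.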
You can also estimate probabilities for the different labels.
problem.estimate_probabilities = true
model = problem.generate_model
sample = [60.0, 1.0, 3.0, 140.0, 185.0, 0.0, 2.0, 155.0, 0.0, 3.0, 2.0, 0.0, 3.0]
probs = model.predict_probabilities(sample)
# Returns a hash mapping each label to its estimated probability for the sample
# {1=>0.4443737921739047, -1=>0.5556262078260953}
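A hash of this shape is easy to post-process in plain Ruby. The sketch below picks the most likely label; `most_likely_label` is a hypothetical helper, not part of RSVM, and the hash is copied from the example output above.

```ruby
# Hash copied from the example output above.
probs = { 1 => 0.4443737921739047, -1 => 0.5556262078260953 }

# Hypothetical helper: return the label with the highest estimated probability.
def most_likely_label(probs)
  label, _probability = probs.max_by { |_label, probability| probability }
  label
end

most_likely_label(probs) # => -1
```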
If you are not sure which parameters to use for your problem, RSVM can run a simple grid search to find the parameters that perform best under cross-validation.
problem = Svm::Problem.load_from_csv(UNBALANCED_CSV)
n_folds = 3
# This will perform a grid search over each combination of c from 2^1 up to 2^14
# and gamma from 2^-13 up to 2^-1. Each combination is evaluated with
# 3-fold cross-validation.
options = problem.find_best_parameters(n_folds)
# Result:
# {:c=>64, :gamma=>(1/128)}
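The general shape of such a search can be sketched in plain Ruby. This is not RSVM's implementation: `grid_search` and `toy_score` are hypothetical names, stepping the exponents by 1 is an assumption, and the toy scorer merely stands in for a cross-validated accuracy.

```ruby
# Hypothetical sketch, not RSVM's implementation: exhaustive grid search over
# powers of two. The block receives (c, gamma) and returns a score to maximise,
# for example a cross-validated accuracy.
def grid_search(c_exponents: 1..14, gamma_exponents: -13..-1)
  cs = c_exponents.map { |e| 2**e }
  gammas = gamma_exponents.map { |e| 2**e } # negative exponents give Rationals

  best_c, best_gamma = cs.product(gammas).max_by { |c, gamma| yield(c, gamma) }
  { c: best_c, gamma: best_gamma }
end

# Toy scorer standing in for cross-validation; it peaks at c = 2^6, gamma = 2^-7.
toy_score = lambda do |c, gamma|
  -((Math.log2(c) - 6).abs + (Math.log2(gamma) + 7).abs)
end

best = grid_search { |c, gamma| toy_score.call(c, gamma) }
# best[:c] is 64 and best[:gamma] is the Rational (1/128)
```

With the default ranges this evaluates 14 x 13 = 182 combinations, and a real scorer would train and validate once per fold for each of them, so the cost grows quickly with the number of folds.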