GitHub - moygit/c_rbf: C port of Random Binary Forest (nearest neighbors)

A "random binary forest" is a hybrid between kd-trees and random forests. For nearest-neighbor search, this ends up being similar to MinHash forests and to Spotify's Annoy library.

We build a forest of roughly-binary search trees, with each tree being built as follows: pick a random subset of features at each split, look for the "best" feature (see below), split on that feature, and then recurse.
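As a rough illustration of the recursion described above, here is a minimal sketch in C. It is not the code in rbf_train.c: the `dataset` type, the function names, and the `k <= 16` limit are all hypothetical, and the "best" criterion is a placeholder (the sampled feature whose mean divides the rows most evenly).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical row-major dataset of integer features (not the library's types). */
typedef struct {
    const int *data;   /* n_rows * n_feats values */
    size_t n_feats;
} dataset;

static int value_at(const dataset *ds, size_t row, size_t feat) {
    return ds->data[row * ds->n_feats + feat];
}

/* Draw k distinct feature indices with a partial Fisher-Yates shuffle. */
static void sample_features(size_t n_feats, size_t k, size_t *out) {
    size_t *pool = malloc(n_feats * sizeof *pool);
    if (pool == NULL) abort();
    for (size_t i = 0; i < n_feats; i++) pool[i] = i;
    for (size_t i = 0; i < k; i++) {
        size_t j = i + (size_t)rand() % (n_feats - i);
        size_t tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
        out[i] = pool[i];
    }
    free(pool);
}

/* Count the rows whose value for `feat` lies strictly below the node mean. */
static size_t count_below_mean(const dataset *ds, const size_t *rows, size_t n,
                               size_t feat, double *mean_out) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += value_at(ds, rows[i], feat);
    *mean_out = sum / (double)n;
    size_t left = 0;
    for (size_t i = 0; i < n; i++)
        if (value_at(ds, rows[i], feat) < *mean_out) left++;
    return left;
}

/* Recursively split rows[0..n). Each row's leaf id is written to leaf_of[row];
   *n_leaves counts the leaves created so far. */
static void build(const dataset *ds, size_t *rows, size_t n, size_t k,
                  size_t leaf_size, size_t *leaf_of, size_t *n_leaves) {
    size_t best_left = 0, best_gap = n + 1, best_feat = 0;
    double best_mean = 0.0;
    assert(k >= 1 && k <= 16 && k <= ds->n_feats);
    if (n > leaf_size) {
        size_t cand[16];
        sample_features(ds->n_feats, k, cand);   /* random subset of features */
        for (size_t c = 0; c < k; c++) {
            double mean;
            size_t left = count_below_mean(ds, rows, n, cand[c], &mean);
            size_t right = n - left;
            size_t gap = left > right ? left - right : right - left;
            if (left > 0 && right > 0 && gap < best_gap) {
                best_gap = gap; best_left = left;
                best_feat = cand[c]; best_mean = mean;
            }
        }
    }
    if (best_left == 0) {   /* small node, or no usable split: make a leaf */
        for (size_t i = 0; i < n; i++) leaf_of[rows[i]] = *n_leaves;
        (*n_leaves)++;
        return;
    }
    /* Partition in place: rows below the mean move to the front. */
    size_t lo = 0;
    for (size_t i = 0; i < n; i++) {
        if (value_at(ds, rows[i], best_feat) < best_mean) {
            size_t tmp = rows[lo]; rows[lo] = rows[i]; rows[i] = tmp;
            lo++;
        }
    }
    build(ds, rows, lo, k, leaf_size, leaf_of, n_leaves);
    build(ds, rows + lo, n - lo, k, leaf_size, leaf_of, n_leaves);
}
```

Calling `build` once per tree on a fresh row-index array gives a forest. Because each node samples its own feature subset, the trees differ from one another even on identical data.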

For the best search speed we want each split to be close to the median, since this gives trees that are almost binary. For accuracy, however, we want to split on the feature with the most variance. For example, given two features A = [5, 5, 5, 6, 6, 6] and B = [0, 0, 0, 10, 10, 10], we want to choose B, so that noisy data is less likely to fall on the wrong side of the split.
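To make the A versus B example concrete, the following sketch computes the population variance of each feature and counts how many values fall below a threshold. The helper names `variance` and `count_below` are made up for this illustration and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Population variance of n integer values (n > 0). */
static double variance(const int *v, size_t n) {
    assert(n > 0);
    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += v[i];
    mean /= (double)n;
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = v[i] - mean;
        ss += d * d;
    }
    return ss / (double)n;
}

/* Number of values strictly below `threshold`. */
static size_t count_below(const int *v, size_t n, double threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] < threshold) count++;
    return count;
}
```

With A and B as above, `count_below(A, 6, 5.5)` and `count_below(B, 6, 5.0)` both return 3, so either feature splits the rows evenly. The variances are 0.25 for A and 25 for B. The two sides are 1 unit apart in A and 10 units apart in B, so a small perturbation is much more likely to push a point across the split in A.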

These two goals can conflict, so right now we just use a simple split function that splits closest to the median. This has the added advantage that you don't need to normalize features to have similar distributions.
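One way to read "splits closest to the median" is sketched below: for each candidate feature, find the threshold that divides the rows most evenly, then keep the feature with the smallest imbalance. This is an assumption about the idea, not a copy of the split function in rbf_train.c, and the names `most_balanced_split` and `pick_feature` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* For one feature column, find the threshold t (values < t go left) whose
   left and right sides are as even as possible. Returns |left - right| for
   that threshold, or n if all values are equal and no split exists. */
static size_t most_balanced_split(const int *col, size_t n, int *threshold_out) {
    assert(n > 0);
    int *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL) abort();
    memcpy(sorted, col, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_int);
    size_t best_gap = n;
    for (size_t i = 1; i < n; i++) {
        if (sorted[i] == sorted[i - 1]) continue;   /* not a value boundary */
        size_t left = i, right = n - i;
        size_t gap = left > right ? left - right : right - left;
        if (gap < best_gap) {
            best_gap = gap;
            *threshold_out = sorted[i];
        }
    }
    free(sorted);
    return best_gap;
}

/* Among n_cols candidate columns of length n, return the index of the one
   with the most balanced split (first one wins ties), or n_cols if none of
   them can be split. */
static size_t pick_feature(const int *const *cols, size_t n_cols, size_t n,
                           int *threshold_out) {
    size_t best_col = n_cols, best_gap = n;
    for (size_t c = 0; c < n_cols; c++) {
        int t = 0;
        size_t gap = most_balanced_split(cols[c], n, &t);
        if (gap < best_gap) {
            best_gap = gap;
            best_col = c;
            *threshold_out = t;
        }
    }
    return best_col;
}
```

The score depends only on how many rows land on each side, never on the feature's scale: multiplying every value of a feature by 100 leaves its imbalance unchanged. That is consistent with the point above that features do not need to be normalized.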

We have another split function that takes variance into account, but this is currently unused.

