- A modern compiler that supports C++11: G++ 4.7, Intel compiler 14, Clang 3.4, or Visual Studio 14 (version 12 can probably be used as well, but the project files need to be downgraded).
- 64-bit Linux is recommended, but most of our code builds on 64-bit Windows and macOS as well.
- Only for Linux/macOS: CMake (GNU make is also required).
- An Intel or AMD processor that supports SSE 4.2 is recommended.
- The extended version of the library requires development versions of the following libraries: Boost, GNU Scientific Library, and Eigen3.
To install the additional prerequisite packages on Ubuntu, type the following:
sudo apt-get install libboost-all-dev libgsl0-dev libeigen3-dev
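To check whether a Linux machine meets the SSE 4.2 recommendation, one option is to look for the corresponding CPU flag. The sketch below assumes that the kernel lists CPU features in /proc/cpuinfo and reports SSE 4.2 under the flag name sse4_2; the helper name has_cpu_flag is ours and is not part of the library. It does not apply to macOS or Windows, where /proc/cpuinfo does not exist.

```shell
# Succeeds if the given CPU flag appears as a whole word in a cpuinfo-style file.
# Usage: has_cpu_flag flag [cpuinfo_file]   (file defaults to /proc/cpuinfo)
has_cpu_flag() {
    grep -qw -- "$1" "${2:-/proc/cpuinfo}"
}

# Example: report whether SSE 4.2 is listed for this machine.
if has_cpu_flag sse4_2 2>/dev/null; then
    echo "SSE 4.2: supported"
else
    echo "SSE 4.2: not detected"
fi
```

A "not detected" result only means that the flag was not found in the file; the library may still build, since SSE 4.2 is recommended rather than required.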
To compile, go to the directory similarity_search and type:
cmake .
make
To build the extended version (requires the extra libraries listed above):
cmake . -DWITH_EXTRAS=1
make
The compiler is chosen by setting two environment variables: CXX and CC. In the case of GNU C++ (version 8), you may need to type:
export CXX=g++-8 CC=gcc-8
To create makefiles for a release version of the code, type:
cmake -DCMAKE_BUILD_TYPE=Release .
If you did not create any makefiles before, you can use the shortcut:
cmake .
To create makefiles for a debug version of the code, type:
cmake -DCMAKE_BUILD_TYPE=Debug .
When makefiles are created, just type:
make
Important note: the shortcut command:
cmake .
(re)creates makefiles for the previously created build. When you type cmake . for the first time, it creates release makefiles. However, if you create debug makefiles and then type cmake . again, this will not lead to the creation of release makefiles!
To prevent this, you need to delete the cmake cache and makefiles before
running cmake. For example, you can do the following (assuming the
current directory is similarity_search):
rm -rf `find . -name CMakeFiles -o -name CMakeCache.txt`
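Before deleting anything, it can help to check which build type the cache currently holds. The sketch below assumes the usual CMake cache layout, in which the setting is stored in CMakeCache.txt as a line such as CMAKE_BUILD_TYPE:STRING=Debug; the helper name cached_build_type is ours and is not part of the library.

```shell
# Print the build type recorded in a CMake cache.
# Usage: cached_build_type [build_dir]   (build_dir defaults to the current directory)
cached_build_type() {
    cache="${1:-.}/CMakeCache.txt"
    if [ -f "$cache" ]; then
        # Strip the "CMAKE_BUILD_TYPE:<TYPE>=" prefix and print only the value.
        sed -n 's/^CMAKE_BUILD_TYPE:[A-Z]*=//p' "$cache"
    else
        echo "no CMake cache found in ${1:-.}" >&2
        return 1
    fi
}
```

Running cached_build_type in similarity_search after creating debug makefiles should print Debug. An empty result means that no build type was stored in the cache, in which case the project's default applies.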
Also note that, for some reason, cmake might sometimes ignore the environment
variables CXX and CC. In this unlikely case, you can specify the compiler directly
through cmake arguments. For example, in the case of GNU C++ and the
release build, this can be done as follows:
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++-8 \
-DCMAKE_C_COMPILER=gcc-8 .
Finally, if cmake cannot find the Boost libraries, their location can be specified manually as follows:
export BOOST_ROOT=$HOME/boost_download_dir
You can download and install nmslib using the vcpkg dependency manager on both Windows and Mac/Linux. In a command line do the following:
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg install nmslib
The nmslib port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.
Building on Windows requires Visual Studio 2019 Express for Desktop. Some earlier versions may work as well, but we had trouble using Visual Studio 2022, which failed to initialize the working environment properly.
Generally, building Python bindings/binaries from the command line is pretty straightforward. It requires:
- Initializing the working environment by calling, e.g.:
"c:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
- Installing the necessary Python packages and running the following command (see the AppVeyor config file for details):
python setup.py build_ext
AppVeyor builds only the Python bindings. To create other binaries, one needs CMake for Windows. First, generate a Visual Studio solution file for the 64-bit architecture using CMake:
cmake -G "Visual Studio 16 2019" -A x64
Note that you have to specify both the platform and the version of Visual Studio. Then, the generated solution can be built using Visual Studio.
If building on Windows is not working for some reason, please also try using vcpkg as explained above.
We have two main testing utilities: bunit and test_integr (experiment.exe and test_integr.exe on Windows).
Both utilities accept a single optional argument: the name of the log file.
If the log file is not specified, a lot of informational messages are printed to the screen.
The utility bunit verifies some basic functionality, akin to unit testing.
In particular, it checks that an optimized version of a distance function (e.g., the Euclidean distance)
returns results that are very similar to the results returned by a simpler, unoptimized version.
The utility bunit is expected to always run without errors.
The utility test_integr runs complete implementations of many methods
and checks if several effectiveness and efficiency characteristics
meet the expectations.
The expectations are encoded as an array of instances of the class MethodTestCase
(see the code here).
For example, we expect that the recall falls in a certain pre-recorded range.
Because almost all our methods are randomized, there is a great deal of variance in the observed performance characteristics. Thus, some tests may fail infrequently, e.g., if the actual recall value is slightly lower than the expected minimum or slightly higher than the expected maximum. From the error message, it should be clear whether the discrepancy is substantial (i.e., something went wrong) or not (i.e., we observe an unlikely outcome due to randomization).
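One informal way to tell a random fluctuation from a real problem is to repeat a run several times and count the failures. The sketch below is only an illustration: the helper name count_failures is ours, and it assumes that the command being repeated signals failure through a non-zero exit status, which we have not verified for test_integr. The path and log file name in the usage comment are hypothetical.

```shell
# Run a command several times and print how many of the runs failed.
# The command's own output is redirected to stderr so that stdout carries only the count.
# Usage: count_failures max_runs command [args...]
count_failures() {
    max_runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        "$@" 1>&2 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (hypothetical path and log file name; adjust to your build):
# count_failures 5 ./test_integr integr.log
```

A failure in one run out of several is consistent with randomization, whereas a failure in every run suggests that something went wrong. Note that each run may overwrite the same log file.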