# Testbed for Learned Indexes (TLI)

TLI is a testbed for comparing (learned) indexes on various datasets and workloads. It consists of three components: workload generation, hyper-parameter tuning, and performance evaluation. The system is built on the well-known SOSD framework, and we use perf and pmu-tools to measure micro-architectural metrics.

## Dependencies

One dependency worth emphasizing is Intel MKL, which is used when testing the performance of XIndex and SIndex. Detailed installation steps can be found here.

Generally, the dependencies can be installed as follows.

```shell
$ cd /tmp
$ wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
$ rm GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB

$ sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
$ apt-get update
$ apt-get install -y intel-mkl-2019.0-045

$ apt -y install zstd python3-pip m4 cmake clang libboost-all-dev
$ pip3 install --user numpy scipy
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
```

After installation, the following two lines in CMakeLists.txt may need to be adjusted to match the MKL install location on your system.

```cmake
set(MKL_LINK_DIRECTORY "/opt/intel/mkl/lib/intel64")
set(MKL_INCLUDE_DIRECTORY "/opt/intel/mkl/include")
```
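If MKL lives under a different prefix, the two lines can be patched in place instead of edited by hand. This is a sketch, not part of the testbed: the `MKL_PREFIX` value is an example, and the `sed` substitutions assume the two `set(...)` lines keep the exact form shown above.

```shell
# Rewrite both MKL paths in CMakeLists.txt to a non-default prefix.
# MKL_PREFIX is an example value; adjust it for your system.
MKL_PREFIX=/opt/intel/mkl
# The guard keeps this snippet safe to run outside the repository root.
if [ -f CMakeLists.txt ]; then
  sed -i \
    -e "s|set(MKL_LINK_DIRECTORY \".*\")|set(MKL_LINK_DIRECTORY \"${MKL_PREFIX}/lib/intel64\")|" \
    -e "s|set(MKL_INCLUDE_DIRECTORY \".*\")|set(MKL_INCLUDE_DIRECTORY \"${MKL_PREFIX}/include\")|" \
    CMakeLists.txt
fi
```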

## Running the testbed

We provide a number of scripts to automate things. Each is located in the `scripts` directory, but should be executed from the repository root.

- `./scripts/download.sh` downloads and stores the required data from the Internet.
- `./scripts/build_rmis.sh` compiles and builds the RMIs for each dataset. If you run into the error message `error: no override and no default toolchain set`, try running `rustup install stable`.
- `./scripts/download_rmis.sh` downloads pre-built RMIs instead, which may be faster. You will still need to run `build_rmis.sh` if you want to measure build times on your platform.
- `./scripts/prepare.sh` constructs the single-thread workloads and compiles the testbed; `./scripts/prepare_multithread.sh` does the same for the concurrency workloads.
- `./scripts/execute.sh`, `execute_latency.sh`, `execute_errors.sh`, and `execute_perf.sh` execute the testbed on the single-thread workloads, storing the results in `results`; `./scripts/execute_multithread.sh` does the same for the concurrency workloads.

Build times can be long, as we make aggressive use of templates to ensure we do not accidentally measure vtable lookup time.

## Results

The results in `results/through-results` are obtained on the single-thread workloads, those in `results/multithread-results` on the concurrency workloads, and those in `results/string-results` on the string indexes. They use the following format.

```
(index name) (bulk loading time) (index size) (throughput) (hyper-parameters)
```
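A line in this format can be split on whitespace to pull out individual fields. The sample line below is hypothetical (the index name, numbers, and hyper-parameter string are made up), and the space delimiter is an assumption to verify against your actual result files.

```shell
# Hypothetical result line in the throughput format; all values are made up.
line='RMI 1.23 104857600 4.5e6 branch=64'
# Field 1 is the index name, field 4 the throughput.
echo "$line" | awk '{print $1, "->", $4}'
```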

The results in `results/latency-results` are obtained by measuring latencies on the single-thread workloads, and use the following format.

```
(index name) (bulk loading time) (index size) (average, P50, P99, P99.9, max, standard deviation of latency) (hyper-parameters)
```

The results in `results/errors-results` are obtained by measuring position searches, and use the following format.

```
(index name) (bulk loading time) (index size) (average, P50, P99, P99.9, max, standard deviation of latency) (average position search overhead) (position search latency per operation) (average prediction error) (hyper-parameters)
```

The filenames of the CSV files in `results` mainly follow this rule, where parenthesized parts are optional.

```
{dataset}_ops_{operation count}_{range query ratio}_{negative lookup ratio}_{insert ratio}_({insert pattern}_)({hotspot ratio}_)({thread number}_)(mix_)({loaded block number}_)({bulk-loaded data size}_)results_table.csv
```
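As an illustration, the mandatory prefix of this rule splits cleanly on underscores. The filename below is hypothetical (a "books" dataset with 10M operations, 10% range queries, 20% negative lookups, and no inserts), and it omits all of the optional fields, so it only demonstrates the fixed leading part of the pattern.

```shell
# Hypothetical filename with only the mandatory fields of the naming rule.
f='books_ops_10000000_0.1_0.2_0.0_results_table.csv'
# Split on underscores: field 1 = dataset, 3 = operation count,
# 4 = range query ratio, 5 = negative lookup ratio, 6 = insert ratio.
echo "$f" | awk -F_ '{print "dataset:", $1, "ops:", $3, "range:", $4, "neg:", $5, "insert:", $6}'
```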

The results in `results/perf-results` are obtained by measuring micro-architectural metrics.