
WIP: Benchmarking #83

Open · andrewosh wants to merge 17 commits into master
Conversation

@andrewosh (Collaborator) commented Mar 21, 2018

Hey all,

Here's an initial stab at a benchmarking system that should help us get some solid numbers. Each benchmark runs against 4 databases, with a customizable number of trials per benchmark (the default is 5). The initial set of databases is (and perhaps we want to add to this? see the setup sketch after the list):

  1. hyperdb on disk
  2. hyperdb in memory
  3. leveldb on disk
  4. leveldb in memory
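
For concreteness, here's a minimal sketch of how those four targets could be constructed (my assumption about the setup; the actual helpers in bench/ may differ):

```js
// Sketch of the four benchmark targets. hyperdb accepts a path or a
// random-access-storage factory; leveldb is wrapped via levelup.
const hyperdb = require('hyperdb')
const ram = require('random-access-memory')
const levelup = require('levelup')
const leveldown = require('leveldown')
const memdown = require('memdown')

const targets = {
  'hyperdb (disk)': () => hyperdb('./bench-data/hyperdb'),
  'hyperdb (memory)': () => hyperdb(ram),
  'leveldb (disk)': () => levelup(leveldown('./bench-data/leveldb')),
  'leveldb (memory)': () => levelup(memdown())
}
```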

The initial set of benchmarks is very simple: large batch writes, many single writes, and iteration over various subsets of a large db. The database has a single writer and is entirely local, so this set will surely need to be expanded to reflect real-world use cases.
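
To give a feel for the shape of these, here's roughly what one write benchmark looks like in stock nanobench (a sketch only; the actual code in bench/ uses my modified fork, mentioned below):

```js
const bench = require('nanobench')
const hyperdb = require('hyperdb')
const ram = require('random-access-memory')

bench('batch write 100k random key/value pairs (hyperdb, memory)', function (b) {
  const db = hyperdb(ram, { valueEncoding: 'utf-8' })

  // Build the op log up front so only the batch itself is timed.
  const ops = []
  for (let i = 0; i < 100000; i++) {
    ops.push({ type: 'put', key: 'bench/' + i, value: Math.random().toString(16).slice(2) })
  }

  db.ready(function () {
    b.start()
    db.batch(ops, function (err) {
      if (err) throw err
      b.end()
    })
  })
})
```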

Speaking of real-world use cases, all the data so far is randomly generated. @mafintosh suggested a dictionary as a more realistic dataset. Any other ideas for fixtures?
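
For the sake of discussion, the two fixture flavors might look like this (hypothetical helpers, not code from this PR; the dictionary path is an assumption):

```js
const crypto = require('crypto')
const fs = require('fs')

// Random keys, as the current benchmarks use.
function randomKeys (n) {
  const keys = []
  for (let i = 0; i < n; i++) keys.push(crypto.randomBytes(8).toString('hex'))
  return keys
}

// Dictionary-backed keys, along the lines of @mafintosh's suggestion.
// /usr/share/dict/words is present on most Unix systems.
function dictionaryKeys (n) {
  const words = fs.readFileSync('/usr/share/dict/words', 'utf-8').trim().split('\n')
  return words.slice(0, n)
}
```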

At the end of a benchmarking run, results are dumped into CSV files in bench/stats. Here are some examples of what those look like, from a recent run:
https://github.com/andrewosh/hyperdb/blob/benchmarking-2/bench/stats/writes-random-data.csv
https://github.com/andrewosh/hyperdb/blob/benchmarking-2/bench/stats/reads-random-data.csv
(Timings are in nanoseconds, so some post-processing is required to make them readable.)
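
If you want to eyeball the numbers, here's a quick post-processing sketch (it assumes the elapsed time is the last CSV column; adjust to the actual headers):

```js
const fs = require('fs')

const rows = fs.readFileSync('bench/stats/writes-random-data.csv', 'utf-8')
  .trim().split('\n').map(line => line.split(','))

const [header, ...data] = rows
console.log(header.join(','))
for (const row of data) {
  // Convert the final column from nanoseconds to milliseconds.
  const ns = Number(row[row.length - 1])
  console.log(row.slice(0, -1).concat((ns / 1e6).toFixed(3) + 'ms').join(','))
}
```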

A few things of note:

  1. I'm currently using a modified version of nanobench, because I started abusing it and I'm not sure whether my changes belong upstream. Before merging, that dependency (on my nanobench fork) will have to be changed.
  2. The generated prefixes in the current read tests (reflected in the benchmarks above) aren't yet split into path components -- oops. I'm unsure whether this affects performance, but it's worth noting.
  3. Currently the maximum number of keys for any benchmark is 100k, since beyond that I get consistent heap memory errors in the batch write (one possible workaround is sketched after this list).
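
On item 3, one workaround worth trying (an idea, not something this PR implements) is to split an oversized batch into fixed-size chunks, so the whole op array is never processed at once:

```js
// Apply `ops` to `db` in chunks of `size` rather than one giant batch.
// Note each chunk becomes its own atomic batch, so this trades batch
// atomicity for bounded memory.
function chunkedBatch (db, ops, size, cb) {
  let i = 0
  function next (err) {
    if (err) return cb(err)
    if (i >= ops.length) return cb(null)
    const chunk = ops.slice(i, i + size)
    i += size
    db.batch(chunk, next)
  }
  next(null)
}

// Usage: chunkedBatch(db, ops, 10000, function (err) { ... })
```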

Review comment on package.json:

@@ -20,14 +20,20 @@
    "varint": "^5.0.0"
  },
  "devDependencies": {
    "@andrewosh/nanobench": "^2.2.0",
mafintosh (Owner) commented:

there are two nanobench entries in the deps

@mafintosh (Owner) commented:

@andrewosh what's missing for landing this? Would be a cool addition.
