
Commit

links to benchmark results
dangermike committed Feb 2, 2022
1 parent 250bdb5 commit e182705
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion generic/sharded/README.md
@@ -107,7 +107,7 @@ Generics are not as powerful as C++ templates or generics in some other language

## Performance

- The reason to shard is to reduce lock contention. However, that assumes that lock contention is the problem. The purpose of LazyLRU was to reduce exclusive locks, and thus lock contention. Benchmarks were run on my laptop (8-core MacBook Pro M1 14" 2021) and on a Google Cloud N2 server (N2 "Ice Lake" 64 cores at 2.60GHz). If we compare the unsharded vs. 64-way sharded performance with 1 thread and 64 threads, we should get some sense of the trade-offs. We will also compare oversized (guaranteed evictions), undersized (no evictions, no shuffles), and equal-sized (no evictions, shuffles) caches to see how the "laziness" may save us some writes. Because the sharded caches use the regular LazyLRU behind the scenes, the capacity of each shard is `total/shard_count`, so we aren't unfairly advantaging the sharded versions.
+ The reason to shard is to reduce lock contention. However, that assumes that lock contention is the problem. The purpose of LazyLRU was to reduce exclusive locks, and thus lock contention. Benchmarks were run on my [laptop (8-core MacBook Pro M1 14" 2021)](benchmark_results_macbook_pro_m1_8.txt) and on a [Google Cloud N2 server (N2 "Ice Lake" 64 cores at 2.60GHz)](benchmark_results_n2-highcpu-64_64.txt). If we compare the unsharded vs. 64-way sharded performance with 1 thread and 64 threads, we should get some sense of the trade-offs. We will also compare oversized (guaranteed evictions), undersized (no evictions, no shuffles), and equal-sized (no evictions, shuffles) caches to see how the "laziness" may save us some writes. Because the sharded caches use the regular LazyLRU behind the scenes, the capacity of each shard is `total/shard_count`, so we aren't unfairly advantaging the sharded versions.

All times are in nanoseconds/operation, so lower is better. Mac testing was not done in a clean environment, so some variability is to be expected. The first set of numbers is from the Mac. The numbers after the gutter are from the server -- tables in markdown aren't great.
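The sharding scheme the paragraph describes -- hash each key to pick a shard, give each shard `total/shard_count` capacity, and lock only that shard -- can be sketched roughly as below. This is a minimal illustration, not the library's actual implementation: the type names (`ShardedCache`, `shard`) and the plain map-with-mutex shard body are placeholders standing in for a real LazyLRU shard, and no eviction or shuffle logic is shown.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shard is a stand-in for one LazyLRU instance: a mutex-protected
// map with a fixed capacity (eviction logic omitted for brevity).
type shard struct {
	mu    sync.Mutex
	cap   int
	items map[string]string
}

// ShardedCache spreads keys across shards so that concurrent callers
// usually contend on different locks.
type ShardedCache struct {
	shards []*shard
}

// NewShardedCache splits the total capacity evenly: each shard holds
// total/shardCount entries, the same split the benchmarks use so the
// sharded versions aren't unfairly advantaged.
func NewShardedCache(total, shardCount int) *ShardedCache {
	c := &ShardedCache{shards: make([]*shard, shardCount)}
	for i := range c.shards {
		c.shards[i] = &shard{
			cap:   total / shardCount,
			items: make(map[string]string),
		}
	}
	return c
}

// shardFor hashes the key to pick a shard deterministically.
func (c *ShardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return c.shards[int(h.Sum32())%len(c.shards)]
}

func (c *ShardedCache) Set(key, value string) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.items) < s.cap { // a real LRU would evict instead
		s.items[key] = value
	}
}

func (c *ShardedCache) Get(key string) (string, bool) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.items[key]
	return v, ok
}

func main() {
	c := NewShardedCache(1024, 64) // 64 shards of 16 entries each
	c.Set("a", "1")
	v, ok := c.Get("a")
	fmt.Println(v, ok)
}
```

The trade-off the benchmarks probe falls out of this structure: each `Get`/`Set` pays an extra hash to find its shard, but two goroutines touching different shards never block each other.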


