Sharding for the lazylru cache #15
Conversation
```go
for _, s := range slru.shards {
	if s.IsRunning() {
		return true
	}
}
return false
```
Does atomicity matter here?
The individual `IsRunning` calls are locked internally. Since we don't need an exact count, I don't think we need a global lock. Honestly, I'm not sure anyone will ever use this function in the real world. You'd start up the cache(s), do your thing, then shut down the process -- graceful shutdown won't really be a thing. If that seemed like a more likely use case, I'd probably make the `Close` call block.
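If blocking shutdown ever did become a real use case, this is roughly what it might look like. A hedged sketch only: the `ShardedLazyLRU` type name and its fields here are illustrative assumptions, not the actual types in this PR.

```go
package lazylru

import "sync"

// Illustrative only: the type name and fields are assumptions, not the
// actual sharded type in this PR.
type ShardedLazyLRU[K comparable, V any] struct {
	shards []*LazyLRU[K, V] // assumed per-shard cache type
	wg     sync.WaitGroup   // Add(1) per shard worker at startup; Done() on exit
}

// Close signals every shard to stop, then blocks until all of their
// background workers have exited.
func (slru *ShardedLazyLRU[K, V]) Close() {
	for _, s := range slru.shards {
		s.Close() // ask the shard to shut down its reaper goroutine
	}
	slru.wg.Wait() // block until every shard worker has finished
}
```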
```go
for k, v := range slru.shards[shardIx].MGet(skeys...) {
	retval[k] = v
}
```
Just curious - Did you benchmark this in separate goroutines?
Yeah, the benchmarks are all concurrent. See the `Run` function in lazylru_benchmark_test.go.
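For anyone skimming without the repo open, here is a minimal sketch of a concurrent benchmark in that spirit. This is not the actual `Run` helper from lazylru_benchmark_test.go, just an illustration built on `b.RunParallel`, which spins up `GOMAXPROCS` goroutines by default.

```go
package lazylru_test

import (
	"strconv"
	"testing"
	"time"

	lazylru "github.com/TriggerMail/lazylru/generic"
)

// A minimal concurrent benchmark sketch, not the repo's actual Run
// helper: a pool of goroutines hammers the cache with reads.
func BenchmarkConcurrentGet(b *testing.B) {
	lru := lazylru.NewT[string, int](1000, time.Minute)
	defer lru.Close()
	for i := 0; i < 1000; i++ {
		lru.Set(strconv.Itoa(i), i) // pre-populate the cache
	}
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			lru.Get(strconv.Itoa(i % 1000)) // concurrent reads
			i++
		}
	})
}
```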
Sorry, that wasn't clear - I think (it's been a while, haha) what I meant was: did you compare performance between this way (accessing each shard serially in a loop) vs spawning a new goroutine for each shard (with appropriate locking around the map write, of course)?
(Of course, performance probably changes with the expected number of shards - if it's relatively small I wouldn't expect goroutines to be a win.)
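To make the comparison concrete, here is a hedged sketch of the fan-out variant being suggested, reusing the illustrative `ShardedLazyLRU` type from the sketch above (so `keysByShard`, the method name, and a generic `MGet` are all assumptions, not the PR's actual code; a `sync` import is assumed):

```go
// Hypothetical fan-out variant: one goroutine per shard, with a mutex
// serializing writes into the shared result map.
func (slru *ShardedLazyLRU[K, V]) mgetParallel(keysByShard map[int][]K) map[K]V {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		retval = make(map[K]V)
	)
	for shardIx, skeys := range keysByShard {
		wg.Add(1)
		go func(shardIx int, skeys []K) {
			defer wg.Done()
			found := slru.shards[shardIx].MGet(skeys...) // per-shard lookup
			mu.Lock()
			for k, v := range found {
				retval[k] = v
			}
			mu.Unlock()
		}(shardIx, skeys)
	}
	wg.Wait()
	return retval
}
```

Whether this wins depends on the shard count and the per-shard batch size; for a small number of shards, the goroutine and mutex overhead can easily outweigh the serial loop.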
The current version of the cache uses `string` as the key and `interface{}` as the value. That fits the use case for which it was designed, but it is not as flexible as it could be. Go 1.18 generics created an opportunity to change that. The [container/heap](https://pkg.go.dev/container/heap) package in the standard library doesn't support generics. I'm sure it will at some point, but for now, the source code was copied from the standard library and made generic. The `pq` and `lazylru` components were copied into a subpackage, `generic`. While the internals of the library (and especially the tests) are littered with type annotations, the external interface is pretty clean.

Previously, the cache would be used like so:

```go
// import "github.com/TriggerMail/lazylru"

lru := lazylru.New(maxItems, ttl)
lru.Set("key", "value")
if v, ok := lru.Get("key"); ok {
	vstr, ok := v.(string)
	if !ok {
		panic("something terrible has happened")
	}
	fmt.Println(vstr)
}
```

The new version is a bit cleaner:

```go
// import "github.com/TriggerMail/lazylru/generic"

lru := lazylru.NewT[string, string](maxItems, ttl)
lru.Set("key", "value")
if v, ok := lru.Get("key"); ok {
	fmt.Println(v)
}
```

It's expected that the cache is going to be created at the start of a program and accessed many times, so the real win is the lack of casting on the `Get`. It is easy to put in a value when you mean a pointer or a pointer when you mean a value, but the generic version prevents that problem. The `panic` in the sample code above is maybe overkill, but the caller is likely to do _something_ to deal with the type. There is a performance impact to the casting, but it doesn't appear to be huge.

In terms of caching performance, there was an improvement in all cases. I tested the original, interface-based implementation as well as a generic implementation of `[string, interface{}]` to mimic the interface type as closely as possible, and a generic implementation of `[string, int]` to see what the improvement would be. Tests were run on an Apple MacBook Pro M1. An excerpt of the benchmark is listed below:

```text
                           1% W, 99% R   99% W, 1% R
-------------------------  ------------  ------------
interface-based             60.94 ns/op  107.80 ns/op
generic[string,interface]   54.21 ns/op   87.76 ns/op
generic[string,int]         53.24 ns/op   93.80 ns/op
```

* Separate interface and generic versions to allow the consumer to select the generic version
* Make testing work under go 1.18
* golang:rc-buster image
* go fmt
* installing go-junit-report properly
* adding test-results to .gitignore
* Building with Earthly to make life easier on myself
* Using revive rather than golangci-lint because golangci-lint doesn't work with go 1.18 yet
* Publishing coverage results to coveralls
  * On interface (top-level) only because goveralls doesn't like submodules
* README badges for coverage
* benchmark
  * results from n2-highcpu-64, 64 cores
  * results from MacBook Pro M1 14" 2021
* thread-local math/rand - Using the shared RNG in math/rand has some locking in it that dominates the performance test at high thread counts (see the sketch below)
* README updates
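The thread-local `math/rand` bullet is worth a concrete illustration. This is a hedged sketch of the idea rather than the benchmark's actual code: each worker goroutine builds its own `rand.Rand`, so random key selection doesn't serialize on the shared source's lock.

```go
package main

import (
	"math/rand"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			// Per-goroutine RNG: rand.Rand is not safe for concurrent
			// use, but giving each worker its own avoids the global
			// lock inside the package-level rand functions.
			rng := rand.New(rand.NewSource(seed))
			for i := 0; i < 1000; i++ {
				_ = rng.Intn(1000) // e.g., pick a random key index
			}
		}(time.Now().UnixNano() + int64(w))
	}
	wg.Wait()
}
```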
```go
for k, v := range slru.shards[shardIx].MGet(skeys...) {
	retval[k] = v
}
```
Sorry, that wasn't clear - I think (it's been a while, haha) what I meant was: did you compare performance between this way (accessing each shard serially in a loop) vs spawning a new goroutine for each shard (with appropriate locking around the map write, of course)?
(Of course, performance probably changes with the expected number of shards - if it's relatively small I wouldn't expect goroutines to be a win.)
Please see sharding/README.md for details.
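For readers who don't follow that link: the central trick under review is routing each key to a shard, typically by hashing. Here is a minimal, hedged sketch of that routing; the hash choice and function name are assumptions for illustration, not necessarily what sharding/README.md describes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor is an illustrative guess at key-to-shard routing: hash the
// key, then take the hash modulo the shard count. The actual scheme
// lives in sharding/README.md and may differ.
func shardFor(key string, nShards int) int {
	h := fnv.New64a()
	h.Write([]byte(key)) // FNV-1a over the key bytes
	return int(h.Sum64() % uint64(nShards))
}

func main() {
	for _, k := range []string{"alpha", "beta", "gamma"} {
		fmt.Printf("%q -> shard %d\n", k, shardFor(k, 8))
	}
}
```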