Use atomic operations and a read lock instead of a write lock #945
Conversation
LGTM
Nice catch noticing that this can all be done with atomic operations, so the write lock isn't necessary. I'm still surprised by how large the difference is that you measured.
I haven't looked at this in depth yet, but FWIW I'm pretty sure you can't just do atomic writes to integers while reading them non-atomically elsewhere (even when holding the read lock, which in this case isn't relevant).
We may want to pursue tracking partition/lastUpdate in a separate structure that is entirely based on atomics; just thinking out loud.
It seems that it would be easy enough to create an AtomicUint32 (and the whole family) to encapsulate these actions. The trick is that MetricDefinition would need to use it all over...
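For illustration, a minimal sketch of what such a wrapper could look like (the type and method names are hypothetical, not from this PR; the idea is that MetricDefinition fields would be declared as AtomicUint32 so every access goes through sync/atomic):

```go
package idx

import "sync/atomic"

// AtomicUint32 wraps a uint32 so that every read and write goes through
// sync/atomic, making it impossible to accidentally mix atomic and
// non-atomic access to the same field.
type AtomicUint32 struct {
	v uint32
}

// Load returns the current value with an atomic read.
func (a *AtomicUint32) Load() uint32 {
	return atomic.LoadUint32(&a.v)
}

// Store sets the value with an atomic write.
func (a *AtomicUint32) Store(val uint32) {
	atomic.StoreUint32(&a.v, val)
}

// Swap stores val and returns the previous value.
func (a *AtomicUint32) Swap(val uint32) uint32 {
	return atomic.SwapUint32(&a.v, val)
}
```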
Related: the in-memory index (including its types and MetricDefinition) is due for a make-over anyway.
Force-pushed from 3f91288 to 426815f
So, we were running the "unsafe" version of this (mixing atomic/non-atomic accesses) for over a month and didn't see issues, but I recently rolled out a new build without this branch because I didn't want race conditions in there and I wasn't ready to commit to this change. However, without this change I saw about a 30% slowdown in backfill processing. So, I decided to just bite the bullet and look through all the
Opened #969 for the failed test.
Force-pushed from 0c15f36 to f4b7793
-	if existing.LastUpdate < int64(point.Time) {
-		existing.LastUpdate = int64(point.Time)
+	if atomic.LoadInt64(&existing.LastUpdate) < int64(point.Time) {
+		atomic.SwapInt64(&existing.LastUpdate, int64(point.Time))
Note that this is racy.
Let's say existing.LastUpdate is very old (30 days ago).
Then a point comes in for 29 days ago, and concurrently another one for a day ago via a different kafka partition, and then no more points.
In that case, we can have concurrent Update calls, resulting in the LastUpdate field being updated to 29 days ago, but never to a day ago.
Note that for any given kafka partition, carbon stream or prometheus POST we never have overlap in our Update calls.
So in practice this doesn't seem like an issue, but perhaps we should document it under "tradeoffs and extremely rare edge cases" or something.
Or we can solve it by either:
- doing CompareAndSwap in a loop until we're able to swap in the value we wanted to swap (see the sketch below)
- confirming the swapped-out value (the return value of SwapInt64) is smaller than what we swapped in; if not, put the old value back, check that we didn't swap out an even higher value (placed by a concurrent Update call), etc.
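A sketch of the first option, assuming a hypothetical helper (setIfNewer is not code from this PR): raise LastUpdate monotonically with a CompareAndSwap loop, so a newer timestamp stored by a concurrent Update is never overwritten by an older one.

```go
package idx

import "sync/atomic"

// setIfNewer atomically raises *lastUpdate to ts, but never lowers it.
func setIfNewer(lastUpdate *int64, ts int64) {
	for {
		cur := atomic.LoadInt64(lastUpdate)
		if ts <= cur {
			return // an equal or newer timestamp is already stored
		}
		if atomic.CompareAndSwapInt64(lastUpdate, cur, ts) {
			return // our newer timestamp won
		}
		// cur changed between the load and the CAS; reload and retry
	}
}
```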
This operation is racy at that point even with a write lock. It depends on the order in which individual threads hit that lock call, which (with the hypothetical assumption that data can come in for the same series from different threads) can be out of order relative to the kafka ingest step.
I'm hesitant to add anything overly complex to Update for no real-world benefit, but I'll defer to your preference.
I don't understand this. If you use a write lock, you can lock, check the value, update if we have a larger one, then unlock. This works regardless of the order between two concurrent update operations (the most recent will always survive).
I think my proposal above will also solve it, and at almost no additional cost.
I was extending it to the Partition update as well. Perhaps Partition should only be updated when we have a newer timestamp as well?
The partition property is really only there to shard the index in cassandra, so nodes on startup know which metrics they're responsible for.
I'm not sure if we even properly support live partition changes of metrics (i.e. whether after the change we properly display both the old and new data).
Under concurrent updates it's probably OK for the partition to be ambiguous ("under transition"), but once update operations are serialized, the later one should probably always win, even when data is sent out of order. I think MT's behavior in these scenarios is so undefined that probably either way works.
Yes, this PR definitely got us back to where we wanted to be. Ingest rate is not very consistent, but we average ~90k dp/s/core (on 8 cores; we see spikes up to 1M dp/s but average ~700k). Without this change, we were barely breaking 400k dp/s. We have been running this in production for about 2 months now with no noticeable issues. I really look forward to seeing if it benefits your speeds as well (the trade-off, I suppose, is greater CPU usage during ingest).
I have a new branch:
Then I filled up kafka with some MetricData and tested ingestion with each version (twice).
https://snapshot.raintank.io/dashboard/snapshot/hMzSp4LGcBaJ5iKDrvrWMueMWzkxTUL6?orgId=2 CPU difference looks fine (tiny; if anything, proportional to the increased ingest, but perhaps even less). Sound good @shanson7?
Yeah, looks great to me! I'm excited to see if you see a difference in ingest speed as pronounced as we did.
This PR was an attempt to reduce the exclusive lock section in the ingest path.
The idea is that the map isn't truly being modified, so we don't need to hold a write lock. The behavioral change is that if the same point is Updated by two threads, the partition/LastUpdate is not guaranteed to match. In practice, I believe that LastUpdate should pretty much be near realtime (and is mostly heuristic anyway). The partition shouldn't change frequently anyway and should be eventually consistent.
Similar changes could be made to AddOrUpdate (optimistically acquiring a read lock), but I wasn't sure how many calls to AddOrUpdate actually resulted in a write.
In our setup, we saw a 30%-40% bump in our backlog processing from this change.
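To illustrate the pattern the description refers to, here is a minimal, simplified sketch (not the PR's actual code; MemoryIdx, defsById and the field layout are placeholders): the read lock plus atomic field updates cover the common case of an existing series, and the write lock is only taken when a new series must be inserted into the map.

```go
package idx

import (
	"sync"
	"sync/atomic"
)

type MetricDefinition struct {
	LastUpdate int64
	Partition  int32
}

type MemoryIdx struct {
	sync.RWMutex
	defsById map[string]*MetricDefinition
}

func (m *MemoryIdx) AddOrUpdate(id string, ts int64, partition int32) {
	// Optimistic path: most points hit an existing series, so a read
	// lock plus atomic field updates is enough.
	m.RLock()
	def, ok := m.defsById[id]
	m.RUnlock()
	if ok {
		if atomic.LoadInt64(&def.LastUpdate) < ts {
			atomic.SwapInt64(&def.LastUpdate, ts)
		}
		atomic.StoreInt32(&def.Partition, partition)
		return
	}

	// Slow path: the series is new, so the map itself changes and we
	// need the write lock. Re-check under the lock in case another
	// goroutine inserted the series in the meantime.
	m.Lock()
	defer m.Unlock()
	if def, ok := m.defsById[id]; ok {
		if atomic.LoadInt64(&def.LastUpdate) < ts {
			atomic.SwapInt64(&def.LastUpdate, ts)
		}
		return
	}
	m.defsById[id] = &MetricDefinition{LastUpdate: ts, Partition: partition}
}
```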