Robert/addr memory leak #8717

robert-zaremba · 2021-02-26T20:21:36Z

Description

In #8694, a cache for AccAddress.String() was added as a simple map, with no bound on its size and no estimate of how it would grow. This leads to unbounded memory growth, which could put a node at risk.

In this PR, we:

add a benchmark for the Address.String method
replace the map with an LRU cache
remove deferred calls
optimize the Address.Empty function

Before:

BenchmarkAccAddressString-4   	 4533326	       268 ns/op	     144 B/op	       3 allocs/op
BenchmarkAccAddressString-4   	 4592979	       283 ns/op	     144 B/op	       3 allocs/op

After:

BenchmarkAccAddressString-4   	 4180477	       279 ns/op	     144 B/op	       3 allocs/op
BenchmarkAccAddressString-4   	 4633738	       274 ns/op	     144 B/op	       3 allocs/op
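
To illustrate why an LRU bounds memory where a plain map does not, here is a minimal sketch of a fixed-capacity LRU built on the standard library's container/list. This is not the hashicorp/golang-lru implementation the PR uses; the names lru, newLRU and entry are hypothetical, and the sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached key/value pair stored in the recency list.
type entry struct {
	key, value string
}

// lru is a tiny fixed-capacity least-recently-used cache (not goroutine-safe).
type lru struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the key as recently used.
func (c *lru) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Add inserts or updates a key, evicting the oldest entry when over capacity.
func (c *lru) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Len reports how many entries are currently cached.
func (c *lru) Len() int { return c.order.Len() }

func main() {
	c := newLRU(2)
	c.Add("a", "1")
	c.Add("b", "2")
	c.Add("c", "3") // over capacity: evicts "a", the least recently used key
	_, ok := c.Get("a")
	fmt.Println(c.Len(), ok) // prints "2 false"
}
```

However many distinct addresses pass through, the sketch never holds more than capacity entries, which is the property the plain map lacked.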

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (see CONTRIBUTING.md)
Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
Code follows the module structure standards.
Wrote unit and integration tests
Updated relevant documentation (docs/) or specification (x/<module>/spec/)
Added relevant godoc comments.
Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
Re-reviewed Files changed in the Github PR explorer
Review Codecov Report in the comment section below once CI passes

robert-zaremba · 2021-02-26T20:24:45Z

BTW: we could add the same thing for ValAddress and ConsAddress.
However, to be honest, I don't see much sense in adding a cache for Address. There are tonnes of other, better places for optimization that don't need a cache. Since it was added, I wanted to fix the memory issue.

codecov · 2021-02-26T20:33:23Z

Codecov Report

Merging #8717 (ecc288e) into master (30f58b5) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #8717      +/-   ##
==========================================
- Coverage   61.41%   61.37%   -0.04%     
==========================================
  Files         674      671       -3     
  Lines       38399    38336      -63     
==========================================
- Hits        23581    23529      -52     
- Misses      12335    12348      +13     
+ Partials     2483     2459      -24

Impacted Files	Coverage Δ
types/address.go	`65.99% <100.00%> (-0.94%)`	⬇️
x/ibc/core/02-client/abci.go	`60.00% <0.00%> (-24.62%)`	⬇️
client/keys/migrate.go	`43.47% <0.00%> (-12.86%)`	⬇️
x/upgrade/keeper/grpc_query.go	`50.00% <0.00%> (-8.83%)`	⬇️
x/upgrade/keeper/keeper.go	`68.10% <0.00%> (-6.90%)`	⬇️
x/upgrade/types/plan.go	`83.78% <0.00%> (-3.18%)`	⬇️
x/ibc/core/02-client/types/codec.go	`77.52% <0.00%> (-2.48%)`	⬇️
store/cachekv/store.go	`87.12% <0.00%> (-1.05%)`	⬇️
x/auth/tx/builder.go	`75.59% <0.00%> (-0.79%)`	⬇️
simapp/app.go	`85.46% <0.00%> (ø)`
... and 23 more

alessio · 2021-02-26T21:01:41Z

types/address.go

-
-	aa2 := AccAddress{}
-	return bytes.Equal(aa.Bytes(), aa2.Bytes())
+	return aa == nil || len(aa) == 0
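
A side note on the new expression, as a hedged sketch: in Go, len of a nil slice is 0, so the len check alone already covers the nil case. The AccAddress type below only mirrors the shape of the SDK type (a byte slice) for illustration.

```go
package main

import "fmt"

// AccAddress mirrors the shape of the SDK type (a byte slice); illustration only.
type AccAddress []byte

// Empty reports whether the address holds no bytes.
// len of a nil slice is 0, so no separate nil check is needed.
func (aa AccAddress) Empty() bool {
	return len(aa) == 0
}

func main() {
	var nilAddr AccAddress
	fmt.Println(nilAddr.Empty(), AccAddress{}.Empty(), AccAddress{0x01}.Empty())
}
```

Unlike the old bytes.Equal version, this does not construct a second address just to compare against it.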


types/address.go

types/address_bench_test.go

alessio · 2021-02-26T21:07:33Z

types/address.go

 var addMu sync.Mutex
-var addrStrMemoize = make(map[string]string)
+var addrStrMemoize, _ = simplelru.NewLRU(1000, nil)


This initialization should be in an init() function, so that we can check the error returned and panic() if it is != nil.

I was following the current design. That being said I agree with you - I will use init.
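
The suggestion above can be sketched as follows. The boundedCache type and newBoundedCache constructor are hypothetical stand-ins for the LRU type and its constructor; the only assumption carried over is that the constructor returns an error that should not be silently discarded.

```go
package main

import (
	"errors"
	"fmt"
)

// boundedCache is a hypothetical stand-in for an LRU cache type.
type boundedCache struct{ size int }

// newBoundedCache is a hypothetical constructor that rejects non-positive sizes.
func newBoundedCache(size int) (*boundedCache, error) {
	if size <= 0 {
		return nil, errors.New("cache size must be positive")
	}
	return &boundedCache{size: size}, nil
}

// addrCache is populated in init so the constructor error can be checked.
var addrCache *boundedCache

func init() {
	var err error
	if addrCache, err = newBoundedCache(1000); err != nil {
		panic(err) // fail fast at startup instead of running with a nil cache
	}
}

func main() {
	fmt.Println(addrCache.size)
}
```

Compared with `var c, _ = New(...)`, a misconfigured size surfaces immediately at program start rather than as a nil pointer dereference later.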

alessio

ACK on the concept from me

types/address.go

alessio

ACK

odeke-em

Thank you for this change @robert-zaremba! My apologies for the late reply, I was swamped with a couple of things.

A few things:
a) Please make the []byte->string conversion zero cost so that it does not incur allocations
b) Please revert to using defers; the Go compiler made defer dirt cheap as of Go1.13 in 2019, so these workarounds are no longer required. Please see https://go.googlesource.com/proposal/+/refs/heads/master/design/34481-opencoded-defers.md which was also publicized for that release
c) To get a really good reason to benchmark the LRU, remember that even if each Bech32 address is at most 90 characters (as per https://wiki.trezor.io/Bech32#:~:text=A%20Bech32%20string%20is%20at,The%20human%2Dreadable%20part.&text=The%20separator%2C%20which%20is%20always%20%221%22), consuming 1GiB of RAM would require processing ~12M Bech32 addresses, i.e. (1<<30)/90; I am not sure how often this will happen
d) The LRU's contention should be checked. This can be done by writing a benchmark with, say, 1006 (>LRU_SIZE) bech32 values, running the benchmark before and after this change, and then running benchstat before.txt after.txt

Thank you!
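
The estimate in point (c) can be reproduced with a few lines of Go. This is only the arithmetic from the comment; it counts the string bytes alone and ignores map keys, string headers and allocator overhead, so the real number of addresses per GiB would be lower. The helper name is hypothetical.

```go
package main

import "fmt"

// addressesPerBudget returns how many strings of entrySize bytes fit into
// budget bytes, ignoring all per-entry overhead.
func addressesPerBudget(budget, entrySize int64) int64 {
	return budget / entrySize
}

func main() {
	const gib = int64(1) << 30
	fmt.Println(addressesPerBudget(gib, 90)) // prints 11930464, i.e. ~12M
}
```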

types/address.go

odeke-em · 2021-03-01T01:33:28Z

types/address.go

+		addrMu.Lock()
+		addrCache.Add(key, "")
+		addrMu.Unlock()


There is no need to add this, I'd move all this into the defer as it was before.

Defer is doing extra allocation ;)
If we don't add a lock here, the race detector will complain.

types/address_bench_test.go

types/address.go

odeke-em · 2021-03-01T01:42:44Z

types/address_bench_test.go

+	var str string
+	for i := 0; i < b.N; i++ {
+		str = aa.String()
+	}
+	require.NotEmpty(b, str)


To better get a good benchmark in which code won't be optimized away, use a global variable that's an interface e.g.

	var sink, revert interface{}
	...
	for i := 0; i < b.N; i++ {
		result := ...
		sink = result
	}
	if sink == nil {
		b.Fatal("benchmark never ran")
	}
	sink = revert

You mean a variable outside of any function? We are not using it in other benchmarks.

Well, just because it wasn't done correctly before doesn't mean we shouldn't do it correctly now. I have certainly submitted many other benchmarks where we set to a global sink and check that.

I checked it locally - it doesn't change anything. Are you sure we need to trick the benchmark code?
My only objection is that we make the whole thing less readable and more complex without any observable effect.

I am sure about code at times getting optimized away and that being the only way to block that. Humbly, for a bit of clarity, I work on the Go project, across almost all the packages and I am bringing this advice from the many years of working on benchmarking related code :-)

I see. But why obfuscate further if there is no difference (I verified it)? The require.NotEmpty already ensures that the value is not ignored.
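
For reference, the sink pattern discussed in this thread can be written as a small standalone program. This is a sketch under assumptions: render is a hypothetical stand-in for the String() method under test, and testing.Benchmark is used only so the example runs outside of go test. Whether the compiler would actually eliminate the call without the sink depends on the code and the Go version.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the compiler cannot prove the results are unused.
var sink interface{}

// render is a hypothetical stand-in for an expensive String() method.
func render(b []byte) string {
	return strings.ToUpper(fmt.Sprintf("%x", b))
}

// benchmarkRender stores every result in sink and checks it afterwards.
func benchmarkRender(b *testing.B) {
	addr := []byte{0xde, 0xad, 0xbe, 0xef}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = render(addr)
	}
	if sink == nil {
		b.Fatal("benchmark never ran")
	}
	sink = nil // reset so later benchmarks start from a clean state
}

func main() {
	testing.Init() // registers testing flags when running outside of go test
	res := testing.Benchmark(benchmarkRender)
	fmt.Println(res.String(), res.MemString())
}
```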

robert-zaremba · 2021-03-01T13:43:53Z

We can make more sophisticated benchmarks later (feel free to create a task for that).
I will add the unsafe conversion - but I don't think this is needed (especially if we want to use defer - which will create a new function stack)
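
A minimal sketch of the kind of unsafe conversion mentioned here, assuming the classic pointer-cast idiom; the function name is hypothetical and this is not necessarily how the SDK helper is written. The conversion avoids copying, so the caller must never modify the byte slice afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytesToStr reinterprets b as a string without copying the bytes.
// The returned string aliases b's memory: mutating b afterwards would break
// the immutability that the rest of the program assumes for strings.
func unsafeBytesToStr(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	key := []byte("cosmos1example")
	fmt.Println(unsafeBytesToStr(key))
}
```

This matters for a cache lookup because string(b) allocates a fresh copy on every call, while the cast only builds a string header over the existing bytes.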

odeke-em

Thank you for the next round of review, @robert-zaremba!

We can make more sophisticated benchmarks later (feel free to create a task for that).

Am not sure what you mean by "sophisticated benchmarks", we need to get conclusive data on the performance and results. I did describe above how to do it. This change is adding in an LRU, replacing a global cache, whose size will only matter if we hit millions of accounts. If the performance of the LRU is worse than for the global map at scale, then we perhaps have other problems.

For benchmarks, after you run them before and after, please use benchstat (https://pkg.go.dev/golang.org/x/tools/cmd/benchstat) to give a summary of the comparisons; otherwise humans can't easily tell what changed. Also, to ensure you get good consistent runs, please use -count=5 or more, up to something like -count=20.

The benchmark that I had suggested earlier on will give us conclusive results on the impacts and changes plus what performance improvements or adverse effects that this change brings.

Thanks for the PR.

types/address.go

robert-zaremba · 2021-03-02T14:03:27Z

@odeke-em - adding more benchmarks is going out of scope, and we should have a good real-world scenario here. We can do it in another issue.

The intention of this PR was to remove the memory leak. I think it's better to have this update than to roll back the cache. I suggest creating a new issue with further benchmarks.

I added one more, non-trivial address to the benchmark.

types/address.go

amaury1093

utACK

robert-zaremba · 2021-03-03T03:31:36Z

@odeke-em your lock on this PR is still active. Please comment if there is something wrong in this PR. Adding more benchmarks is, imho, out of scope - we would need to go into use-case / scenario benchmarking to bring more valuable information.

odeke-em · 2021-03-03T03:36:06Z

Thanks for the ping, Robert! I’ll remove my review to avoid blocking you but I definitely think that without a proper and conclusive benchmark, we might not get the right results, as the point of using an LRU was to counteract hogging up memory that won’t get evicted. If the cost of using an LRU is worse that’s a problem.

Differing opinions so unblocking author

robert-zaremba · 2021-03-03T03:41:07Z

If the cost of using an LRU is worse that’s a problem.

The benchmark (which uses the cache extensively) shows that it's not worse. Moreover, the new code uses fewer allocations, thanks to avoiding defer and optimizing the "empty" case.

colin-axner · 2021-03-03T11:27:48Z

The race-detector tests appear to have started failing with the merge of this PR. Maybe it is due to something else, but I thought I'd mention it here.

robert-zaremba · 2021-03-03T12:50:27Z

@colin-axner What exactly is failing, and where do you see it?
Here, in the merge checks, nothing was failing.
When running make test-race I see the following things failing:

FAIL: TestListSnapshots (137.46s)
FAIL github.com/cosmos/cosmos-sdk/baseapp 468.134s

It's hard to digest the output of make test-race. There are tonnes of lines related to leveldb...

The only interesting thing I found is the following warning:

WARNING: DATA RACE
Write at 0x00c000fbf470 by goroutine 43:
  container/list.(*List).move()
      /usr/lib/go-1.15/src/container/list/list.go:123 +0x28a
  container/list.(*List).MoveToFront()
      /usr/lib/go-1.15/src/container/list/list.go:188 +0x23f
  github.com/hashicorp/golang-lru/simplelru.(*LRU).Get()
      /home/robert/go/pkg/mod/github.com/hashicorp/golang-lru@v0.5.4/simplelru/lru.go:75 +0xed
  github.com/cosmos/cosmos-sdk/types.AccAddress.String()
      /home/robert/go/src/github.com/cosmos/cosmos-sdk/types/address.go:271 +0x152

This access is OK - because we are fine with a concurrent read, even if it's done during the read - we write data only once, and it never depends on what we read. I can add a mutex there if we want to mute that warning. What do you think?
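
The stack trace shows why the detector flags this path: Get on an LRU calls MoveToFront, so a lookup also writes to the shared recency list. A minimal sketch of guarding both operations with one mutex follows, assuming a simplified cache built on container/list with eviction omitted; the safeCache and pair names are hypothetical, and this is not the code from the follow-up PR.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// pair is one cached key/value entry.
type pair struct{ key, value string }

// safeCache guards a recency list and its index with a single mutex.
// Eviction is omitted to keep the sketch short.
type safeCache struct {
	mu    sync.Mutex
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element in order
}

func newSafeCache() *safeCache {
	return &safeCache{order: list.New(), items: make(map[string]*list.Element)}
}

// Add inserts or updates a key under the lock.
func (c *safeCache) Add(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value = pair{key, value}
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(pair{key, value})
}

// Get also takes the lock: it reorders the list, so it is a write.
func (c *safeCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(pair).value, true
}

func main() {
	c := newSafeCache()
	c.Add("a", "1")
	c.Add("b", "2")
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				c.Get("a")
				c.Get("b")
			}
		}()
	}
	wg.Wait()
	v, ok := c.Get("a")
	fmt.Println(v, ok)
}
```

Running a program like this with go run -race should stay quiet, whereas dropping the lock from Get would let concurrent lookups mutate the list's pointers at the same time.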

colin-axner · 2021-03-03T14:40:26Z

Ah, I got a little confused by not looking at the full logs

I also see this (and similar errors in a couple other places):

2021-03-03T03:52:25.9594698Z --- FAIL: TestIntegrationTestSuite (102.02s)
2021-03-03T03:52:25.9595884Z     --- FAIL: TestIntegrationTestSuite/TestCLIMultisign (11.95s)
2021-03-03T03:52:25.9597152Z     --- FAIL: TestIntegrationTestSuite/TestCLISignBatch (4.64s)
2021-03-03T03:52:25.9598103Z     testing.go:1038: race detected during execution of test
2021-03-03T03:52:25.9598655Z FAIL
2021-03-03T03:52:25.9599405Z FAIL	github.com/cosmos/cosmos-sdk/x/auth/client/cli	102.710s

My main concern is having the race-detector become stuck on always failing. I never check the logs, so I imagine it could be dangerous to have a test which always fails. If someone does introduce a race condition, it could go unnoticed

I have little context on this PR, so I will defer to the decisions of others here. Maybe the other test-race tests are good enough to ensure race conditions don't get introduced

robert-zaremba · 2021-03-03T14:51:32Z

Yeah, the logs are huuuuge. I will add a check and let's see what will happen.

alessio · 2021-03-03T14:56:50Z

Oh wow - why was mergify allowed to merge something whose tests are not all green in the first place?

@marbar3778 can we do something to prevent this from happening again?

alessio · 2021-03-03T14:59:17Z

I think I know why. The race-detector-report job was not mandatory for PRs to be merged in.

EDIT: I fixed the branch configuration

colin-axner · 2021-03-03T15:02:54Z

It should be noted that I don't think any pr's can be merged until the race detector is fixed

alessio · 2021-03-03T15:06:53Z

It should be noted that I don't think any pr's can be merged until the race detector is fixed

The branch configuration is now fixed, so PRs will be blocked until the race is fixed. Does that work for you?

colin-axner · 2021-03-03T15:13:20Z

It is fine with me, but it might be blocking a 0.42.0 final release.

I'm hoping to remove IBC #8735 this week (so it is included in 0.42). That pr is just waiting on approval of #8624 which I guess now will wait on a race-detector fix

Overall though, I am most certainly in favor of fixing tests before merging more pr's and making the problem worse. Just hope we can do so soon

robert-zaremba · 2021-03-03T15:15:05Z

PR: #8767

robert-zaremba added 3 commits February 26, 2021 21:03

address: adding cache address.String() cache benchmark

b69e19b

address: use LRU cache for .String()

280168c

optimize address.Empty

60e6d71

robert-zaremba requested review from amaury1093, odeke-em and alexanderbez February 26, 2021 20:21

robert-zaremba requested review from aaronc and alessio as code owners February 26, 2021 20:21

robert-zaremba added the T:Bug label Feb 26, 2021

robert-zaremba added this to the v0.42 milestone Feb 26, 2021

robert-zaremba requested a review from tac0turtle February 26, 2021 20:22

robert-zaremba mentioned this pull request Feb 26, 2021

types: intern AccAddress.String() to gut wasted expensive recomputations #8694

Merged

9 tasks

alessio reviewed Feb 26, 2021

types/address.go

👍

alessio reviewed Feb 26, 2021

types/address.go

alessio reviewed Feb 26, 2021

types/address_bench_test.go

alessio reviewed Feb 26, 2021

tac0turtle reviewed Feb 27, 2021

types/address.go

Merge branch 'master' into robert/addr-memory-leak

ecc288e

alessio approved these changes Feb 27, 2021

sahith-narahari approved these changes Feb 27, 2021

move cache initialization to init function

991169a

odeke-em previously requested changes Mar 1, 2021

robert-zaremba added 4 commits March 1, 2021 17:58

Merge branch 'master' into robert/addr-memory-leak

fbe6b78

Use UnsafeBytesToStr convertion with Addr cache

2294920

add cache for other address String() methods

7e577b9

fix linter issue

0a3ff3f

robert-zaremba requested a review from odeke-em March 1, 2021 20:44

odeke-em reviewed Mar 2, 2021

types/address.go

types/address.go

add a non trivial address for Address.String benchmark

a02c0d4

robert-zaremba requested a review from odeke-em March 2, 2021 14:03

Merge branch 'master' into robert/addr-memory-leak

a890b1e

amaury1093 reviewed Mar 2, 2021

types/address.go

add comment about cache size and update cashe size to 60k

4d26f57

amaury1093 approved these changes Mar 2, 2021

fix syntax

cbbf8b3

Merge branch 'master' into robert/addr-memory-leak

f71691f

robert-zaremba added the A:automerge Automatically merge PR once all prerequisites pass. label Mar 3, 2021

mergify bot merged commit 2864eb6 into master Mar 3, 2021

mergify bot deleted the robert/addr-memory-leak branch March 3, 2021 03:53

robert-zaremba mentioned this pull request Mar 3, 2021

Lock mutex on getting element from cache #8767

Merged

9 tasks

odeke-em mentioned this pull request Mar 18, 2021

data race detected #8923

Closed

4 tasks
