Bug Fix: Saving big payloads by cache CouchDB #1909
Conversation
@przemyslaw - Thanks for the PR. Looks like you need to sign-off on your commit - you likely just need to do:
But the change looks pretty good to me
Already done
Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
Nice catch. We did in fact discuss this 64 KB limit (listed indirectly in the comparison table -- https://docs.google.com/document/d/1Rxczkwni5oG6MBif1s0KN6_PR29tM4WP9IEOvbLMTu8/edit?usp=sharing). However, the possibility of the value size changing and an old version being returned didn't occur to us. @manish-sethi @denyeart what's your opinion on the maximum allowed size? Should we allow any arbitrary size or keep an absolute maximum? A few larger entries might reduce the number of entries in the cache. As it is workload dependent, we could make it configurable. If we want simplicity, we can go with any size (as done in this PR) without introducing any config parameter.
@cendhu I would suggest keeping it simple as is done in this PR, with no new config options.
@@ -64,9 +65,12 @@ func (c *cache) getState(chainID, namespace, key string) (*CacheValue, error) {
 	}

 	cacheKey := constructCacheKey(chainID, namespace, key)
-	valBytes := cache.Get(nil, cacheKey)
+	valBytes := cache.GetBig(nil, cacheKey)
To clear one doubt, I had a quick look into the code of this library and, unfortunately, my suspicion turned out to be correct. Blindly looking up a key using the GetBig API can cause unintended consequences. I was able to construct one example that leads to a panic.
import (
	"fmt"
	"testing"

	"github.com/VictoriaMetrics/fastcache"
)

func TestDoesItWork(t *testing.T) {
	c := fastcache.New(64 * 1024 * 1024)
	c.Set([]byte("key"), []byte("sixteen-byte-val"))
	// GetBig on a key that was stored via Set misreads the 16-byte value
	// as SetBig's internal metadata, which can panic
	fmt.Printf("[%s]\n", c.GetBig(nil, []byte("key")))
}
It appears that the library assumes that the application remembers which keys were stored using the Set API versus the SetBig API.
In order to fix the bug, my suggestion would be to check the value size and, if it is greater than the size limit, delete the existing key from the cache if present.
func setImplicitValue(c *fastcache.Cache, cacheKey []byte, value []byte) {
	if cap(value) > fastCacheValueSizeLimit {
		c.SetBig(cacheKey, value)
	} else {
		c.Set(cacheKey, value)
	}
}
What would happen if the old value was big but the new value is not, and vice versa? Will the old big value be deleted before storing the new small value, and vice versa? It isn't clear from the documentation or the code.
c := fastcache.New(64 * 1024 * 1024)

c.Set([]byte("key1"), []byte("value1"))
val := c.Get(nil, []byte("key1"))
fmt.Printf("Get(): Key is set with small value: length [%d], value [%s]\n", len(val), string(val))

big := make([]byte, 2*64*1024)
rand.Read(big)
c.SetBig([]byte("key1"), big)
val = c.GetBig(nil, []byte("key1"))
fmt.Printf("GetBig(): Key is set with big value: length [%d]\n", len(val))
val = c.Get(nil, []byte("key1"))
fmt.Printf("Get(): Key is set with big value: length [%d], value [%s]\n", len(val), string(val))

c.Set([]byte("key1"), []byte("value1"))
val = c.Get(nil, []byte("key1"))
fmt.Printf("Get(): Key is set with small value: length [%d], value [%s]\n", len(val), string(val))
val = c.GetBig(nil, []byte("key1"))
fmt.Printf("GetBig(): Key is set with big value: length [%d]\n", len(val))
Here, after setting a big value on an existing key that held a small value, Get() does not return nil but some random bytes. Though GetBig() is always used first in this PR before Get(), there are a lot of unspecified assumptions and hence the code looks fragile.
There are two options:
1. We can always use GetBig() and SetBig() irrespective of the size limit, but we would need to benchmark the performance impact for values < 64 KB.
2. What @manish-sethi suggested here -- https://github.com/hyperledger/fabric/pull/1909/files#r492786264

I would prefer 2. If needed, we can quantify the impact of always using the big APIs and make a separate PR for 1.
Just to take a note: when looking for alternate ways, I would add one more approach, in which we could remember in the cache itself (in a separate namespace) whether the key was saved using the SetBig api, and then evaluate the overhead compared to always using the big APIs.
As far as this bug fix is concerned, my inclination also would be to keep it simple, as we need to backport this as well. We can maintain the existing behavior (i.e., not storing bigger values) and just fix the bug.
I have already upgraded fastcache; take a look at the file changes. Currently we can use the Has(<key>) method from fastcache, and the problem is getting the payload from the cache (whether it was stored via the big or the small API). So I changed this to first use GetBig(), and if the length of the bytes is less than the fastcache capacity, I then use Get(). This prevents getting an incorrect byte array.
Feel free to offer any advice/correction/refactor.
This still does not solve the problem that I highlighted in my previous comment. I encourage you to run locally the sample test that I posted in my comment to understand the problem with this fix. I am not sure that upgrading fastcache solves the issue; the problem is the wrong usage of the APIs. To elaborate the case: there could be a key that we previously set using the Set() api. Now, in your fix, you first look up the key using the GetBig() api and, as highlighted in my test, this can lead to a panic if the value of the key (set previously) has certain properties.
For fixing the bug, as we discussed before, in the function putState we can check the size of the value and, if it is greater than the limit, not call the Set api and instead call the Del api in order to remove the previous version of the key from the cache. This maintains the current behavior (i.e., we don't store big values in the cache) and just fixes the bug.
To allow bigger values later, we discussed a couple of options above; they can be tried and evaluated in a separate PR.
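The suggested fix can be sketched with a toy cache that mimics fastcache's behavior of silently dropping oversized values. All names here (toyCache, putState, sizeLimit) are illustrative stand-ins, not the real Fabric or fastcache APIs:

```go
package main

import "fmt"

// sizeLimit stands in for the real 64 KB fastcache entry limit.
const sizeLimit = 8

// toyCache mimics fastcache's Set: values above the limit are
// silently ignored rather than stored.
type toyCache map[string][]byte

func (c toyCache) Set(key string, val []byte) {
	if len(val) > sizeLimit {
		return // oversized values are silently dropped
	}
	c[key] = val
}

func (c toyCache) Get(key string) []byte { return c[key] }

func (c toyCache) Del(key string) { delete(c, key) }

// putState sketches the suggested fix: when the new value exceeds the
// limit, delete any previous version instead of calling Set, so that a
// later Get misses and the caller falls back to CouchDB.
func putState(c toyCache, key string, val []byte) {
	if len(val) > sizeLimit {
		c.Del(key)
		return
	}
	c.Set(key, val)
}

func main() {
	c := toyCache{}
	putState(c, "k1", []byte("small"))
	fmt.Printf("after small put: %q\n", c.Get("k1")) // "small"
	putState(c, "k1", []byte("value-over-limit"))
	fmt.Printf("after big put:   %q\n", c.Get("k1")) // "": cache miss, read the DB
}
```

Without the Del call, the stale small value would remain visible after a big update, which is exactly the bug being fixed.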
In the updated PR, the usage of Has(), GetBig(), Get(), and the comparison of valueSize and sizeLimit complicate the simple cache usage. Let's go with the following suggestion for a bug fix.
For fixing the bug, as we discussed before, in the function putState, we can check the size of the value and if greater than the limit, don't call Set api and instead call Del api in order to remove the previous version of the key from the cache. This maintains the current behavior i.e., we don't store the big values in the cache and just fix the bug.
As storing > 64 KB is a new feature, let's do that separately with less complication.
I've already understood that you won't take care of bigValues in the cache:
- You want to save them only in CouchDB
- When GetState is called, statecouchdb tries to get the value from the cache; if it is nil, it gets it directly from CouchDB

Am I correct?
In connection with the new version, I thought that using the provided methods like Has and HasGet would be nice. You can see the changes in the UpdateStates and getState methods.
I did what @manish-sethi suggested to me.
Not ready yet
Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
Tests pass locally. Any hints on how to get them passing on CI?
Discover, report, and fix flakes. The issues you're hitting have probably been around for a long time.
Actually, this is a newer flake and I'm the one guilty of introducing it: https://jira.hyperledger.org/browse/FAB-18238 Just created a PR with a fix: #1916
Thanks @przemyslaw. The logic of deleting the existing key looks good overall. However, I noticed one issue when I looked closely: you would need to do this unconditionally. A few minor cleanup comments as well.
// If the payload is bigger than fastcache can store, delete the previous key.
// When GetState() is called for the deleted key, it will query the database
// instead of fastcache.
if cap(valBytes) > fastCacheValueSizeLimit {
- Use len(valBytes) instead of cap(valBytes).
- Looking closely, unfortunately, even this is not going to be a sufficient condition, because of the way this library encodes the data internally: uint64(len(kvLenBuf) + len(k) + len(v)) is the length that matters, not the length of the value alone. Here, kvLenBuf is a fixed size of 4. But I do not prefer to depend on so much of the internal logic of a library. So, my suggestion would be to always delete the key irrespective of the size condition.
So, my suggestion would be to always delete the key irrespective of the size condition.
@przemyslaw A clarification on the above suggestion. I think @manish-sethi suggested removing fastCacheValueSizeLimit altogether, along with all checks associated with it. We simply delete the existing entry before storing a new one, without checking the size. Hence, before every cache.Set(), we can simply do cache.Del() (while cache.Has() is optional). Bigger values would be ignored by cache.Set() anyway. I have verified this with @manish-sethi as I had some confusion. I am sharing it here so that the suggestion is explicit.
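The simplified suggestion (always delete before setting, with no size check) might look like the following sketch. A toy map-based cache again stands in for fastcache, whose Set silently ignores oversized values; all names are hypothetical:

```go
package main

import "fmt"

// limit stands in for fastcache's internal per-entry cap.
const limit = 8

// toy mimics fastcache: Set drops values above the limit.
type toy map[string][]byte

func (c toy) Set(k string, v []byte) {
	if len(v) <= limit {
		c[k] = v
	}
}
func (c toy) Get(k string) []byte { return c[k] }
func (c toy) Del(k string)        { delete(c, k) }

// putState always deletes first; correctness no longer depends on knowing
// how the library measures entry size internally.
func putState(c toy, key string, val []byte) {
	c.Del(key) // unconditional: a Has() check is optional
	c.Set(key, val)
}

func main() {
	c := toy{}
	putState(c, "k", []byte("old"))
	putState(c, "k", []byte("too-big-for-cache"))
	fmt.Printf("%q\n", c.Get("k")) // "": no stale "old" left behind
}
```

The unconditional Del removes the dependency on the library's internal size accounting entirely, which is why it is preferred over the len/cap check.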
+1
if cache.Has(cacheKey) {
	cache.Del(cacheKey)
}
This is a critical piece for this bug fix, but it is not covered in the test.
// test GetState
v, err := cache.getState("ch1", "ns1", "k1")
require.NoError(t, err)
require.Nil(t, v)
Add the data before, so that lines 96-98 in cache.go get tested.
@manish-sethi take a look one more time. I found that my previous commit didn't work. The current PR is ready for review.
@przemyslaw - My review comments are applicable to your latest commit. If you can address these today, we will be able to backport this PR to the previous release as well.
@przemyslaw To clarify the timing: we are expecting to release a v2.2.1 tomorrow. If this PR can be merged in time, we can also quickly backport to release-2.2 so that it gets into the v2.2.1 release.
Corrections are ready for review (@manish-sethi, @cendhu).
@przemyslaw - Thanks. This looks good to me now. I have a few minor comments on the test code though.
v, err := cache.getState("ch1", "ns1", "k1")
require.NoError(t, err)
require.Nil(t, v)
This is not relevant for this test
expectedValue := &CacheValue{Value: []byte("value")}
require.NoError(t, cache.putState("ch1", "ns1", "k1", expectedValue))
After this, you could have added the following to verify the initial condition for the test:
v, err = cache.getState("ch1", "ns1", "k1")
require.NoError(t, err)
require.True(t, proto.Equal(expectedValue, v))
require.NoError(t, cache.putState("ch1", "ns1", "k1", expectedValue1))
Similar to this, you could have added a test for the function cache.UpdateStates, as that is not covered in this test.
FYI the fix for the integration test flake you hit (again) here was just merged into master. Rebase on master to pull in that commit and you should be good to go.
Thanks @przemyslaw for your patience and consistent improvement on this PR.
Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
@Mergifyio backport release-2.2
* Bug Fix: Saving big payloads by cache CouchDB
  Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
* Update fastcache and use new API from fastcache
  Signed-off-by: przemyslaw <przemo.wasala@gmail.com>
* Cache support only small payloads
  Signed-off-by: przemyslaw <przemo.wasala@gmail.com>

Signed-off-by: manish <manish.sethi@gmail.com>
Type of change
Description
Big payloads (bigger than 64 KB) are not saved in the cache.
Additional details
When saving big payloads of more than 64 KB, the changes are updated/saved in CouchDB, but they are not visible in the cache (nothing changes there). Then, when we execute stub.getState("someID"), either nothing is returned or an old revision (smaller than 64 KB) is returned.
The fastcache library has methods for updating/saving/getting big payloads: GetBig and SetBig.
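The reported symptom can be illustrated with a toy size-capped cache; the constant and function names below are invented for illustration and do not appear in Fabric:

```go
package main

import "fmt"

// capBytes stands in for fastcache's real 64 KB per-entry limit.
const capBytes = 10

// cacheSet mimics fastcache's Set: payloads over the limit are
// silently dropped instead of being stored.
func cacheSet(cache map[string][]byte, key string, val []byte) {
	if len(val) > capBytes {
		return
	}
	cache[key] = val
}

func main() {
	cache := map[string][]byte{}
	cacheSet(cache, "someID", []byte("rev-1"))   // small first revision: cached
	cacheSet(cache, "someID", make([]byte, 100)) // big second revision: dropped
	// CouchDB now holds the new revision, but the cache still serves rev-1:
	fmt.Printf("%q\n", cache["someID"]) // "rev-1": stale read
}
```

This is the stale-read behavior described above: the database is updated while the cache keeps returning the old small revision.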
Release Note
FAB-18245