
Fix memory issue & improve processing speed #212

Merged
merged 12 commits into master on Aug 29, 2022
Conversation

mga-chka
Collaborator

@mga-chka mga-chka commented Aug 25, 2022

Description

There is a huge memory overhead when caching data in either redis or fscache (a regression introduced by the redis cache feature).
It was raised by 2 people, in this issue
#206
and this PR
#210

The aim of this PR is to fix it and improve (if possible) the query processing speed.

Here is a summary of the improvements.
All the scenarios are improved when cache is on:

  • scenario 1: file system cache on cache miss
  • scenario 2: file system cache on cache hit
  • scenario 3: redis cache on cache miss
  • scenario 4: redis cache on cache hit

And here are the results of a benchmark on my laptop (it consists of sequentially running 20 queries that each fetch 64MB of data from ClickHouse).
Before the fix:

  • scenario 1: heap size: 150MB | processing speed: 220MB/sec then* 70MB/sec
  • scenario 2: heap size: 150MB | processing speed: 1.2GB/sec
  • scenario 3: heap size: 750MB | processing speed: 120MB/sec then* 80MB/sec
  • scenario 4: heap size: 350MB | processing speed: 140MB/sec then* 70MB/sec

After the fix:

  • scenario 1: heap size: 10MB** | processing speed: 220MB/sec then* 70MB/sec
  • scenario 2: heap size: 10MB** | processing speed: 3.1GB/sec
  • scenario 3: heap size: 15MB** | processing speed: 140MB/sec
  • scenario 4: heap size: 17MB** | processing speed: 1.2GB/sec

** The heap size at startup is 7MB
* The throughput decreases a lot after 5-8 queries, likely because my SSD's speed degrades on bursts (because of some internal caches, I guess) or because of the turbo boost of Intel CPUs.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Checklist

  • Linter passes correctly
  • Add tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary

Does this introduce a breaking change?

  • Yes
  • No
    This PR makes objects stored in the fs & redis caches by previous versions unreachable, because the key generation has changed (the cached format is different)

Further comments

cache/cache.go Outdated
@@ -24,7 +24,7 @@ type ContentMetadata struct {

type CachedData struct {
ContentMetadata
Data io.Reader
Data io.ReadCloser
Contributor

I think it'd be worth leaving a comment here explaining why a client of this object is responsible for closing the reader

Comment on lines 103 to 106
conditionalStr := ""
if len(value) > 30 {
conditionalStr = "..."
}
Contributor

It's odd; I guess it's there to show that we only display a fraction of the data. Is it needed? If you think so, maybe we can extract 30 into a variable and rename conditionalStr to logSuffix?

Collaborator Author

it's needed because I added a test with a 4MB string; if that test fails, the logs would contain a 4MB string...
I'll do the changes requested
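Putting the reviewed snippet together with the requested renames, the truncation helper might look like this (the constant name is a hypothetical choice, not from the PR):

```go
package main

import "fmt"

// maxLoggedValueLength is a hypothetical name for the extracted constant
// the reviewer suggests instead of the literal 30.
const maxLoggedValueLength = 30

// truncateForLog logs at most maxLoggedValueLength bytes of a value and
// marks the cut with "...", so a failing test with a 4MB payload doesn't
// flood the logs.
func truncateForLog(value string) string {
	logSuffix := ""
	if len(value) > maxLoggedValueLength {
		logSuffix = "..."
		value = value[:maxLoggedValueLength]
	}
	return value + logSuffix
}

func main() {
	fmt.Println(truncateForLog("short"))
	fmt.Println(truncateForLog("a very long cached value that would otherwise flood the test logs"))
}
```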

@@ -91,11 +83,11 @@ func (r *redisCache) nbOfBytes() uint64 {
return uint64(cacheSize)
}

/*
Contributor

We should remove the comment

Collaborator Author

my bad, good catch

Comment on lines 131 to 135
if err != nil {
log.Errorf("failed to decode payload: %s , due to: %v ", payload.Payload, err)
log.Errorf("failed to get key %s with error: %s", key.String(), err)
return nil, ErrMissing
}
stringKey := key.String()
Contributor

You could move it before the err handling and use stringKey inside the log

func (m io_reader_decorator) Close() error {
return nil
}
func (r *redisCache) stringToBytes(s string) []byte {
Contributor

I think it's more than stringToBytes. The way I understand it, this function:

  • encodes the length of the string in the first 4 bytes
  • appends the string to the slice

Can we call it encodeString and leave a comment? Reading this function takes a moment

Contributor

Actually I just realised that it's the same code as in the filesystem cache, isn't it? (both encode and decode; there they're called writeHeader and readHeader, which is maybe better naming, except that there we write to a writer and read from a reader)

Collaborator Author

it really is transforming a string into a []byte for serialization purposes. I'll rename it to encodeString (though we lose the information that the string becomes a byte array). Yes, the logic is close to writeHeader/readHeader, except that it's more granular, since we're only dealing with a single string and not the whole headers.
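The length-prefix scheme described above (4-byte length, then the raw string bytes, with the decoder returning the value and the `4 + n` bytes consumed) can be sketched as follows. The byte order is an assumption for the sketch; the PR's exact encoding isn't shown in this excerpt.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeString writes a 4-byte big-endian length prefix followed by the
// raw string bytes, matching the scheme the reviewer describes.
func encodeString(s string) []byte {
	b := make([]byte, 4+len(s))
	binary.BigEndian.PutUint32(b, uint32(len(s)))
	copy(b[4:], s)
	return b
}

// decodeString is the inverse; it returns the string and the number of
// bytes consumed (4 + n), mirroring the stringFromBytes signature above.
func decodeString(b []byte) (string, int) {
	n := binary.BigEndian.Uint32(b)
	return string(b[4 : 4+n]), int(4 + n)
}

func main() {
	enc := encodeString("hello")
	s, consumed := decodeString(enc)
	fmt.Println(s, consumed)
}
```

Because neither function touches redis, they can live in a standalone marshalling file, as suggested below.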

return b
}

func (r *redisCache) stringFromBytes(bytes []byte) (string, int) {
Contributor

And then we could call that one decodeString. Also, I think neither of them is tightly coupled to the redis cache; I'd extract them out to a marshalling file or something like that

return string(s), int(4 + n)
}

func (r *redisCache) metadataToByte(contentMetadata *ContentMetadata) []byte {
Contributor

I'd suggest adding a method receiver on ContentMetadata instead of keeping it in the redis cache

Collaborator Author

what do you mean?
FYI I renamed this function to encodeMetadata in order to be consistent with encodeString
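What the reviewer means by "a method receiver on ContentMetadata" can be sketched like this; the field set and encoding are assumptions based on the diff context, not the PR's actual layout:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ContentMetadata fields are assumed from the surrounding diff context
// (Length is referenced later in the Get flow shown in this review).
type ContentMetadata struct {
	Length uint64
	Type   string
}

// encode attaches the serialization to ContentMetadata itself, instead of
// keeping a metadataToByte helper on redisCache: length, then a 4-byte
// length-prefixed Type string.
func (m *ContentMetadata) encode() []byte {
	b := make([]byte, 8+4+len(m.Type))
	binary.BigEndian.PutUint64(b, m.Length)
	binary.BigEndian.PutUint32(b[8:], uint32(len(m.Type)))
	copy(b[12:], m.Type)
	return b
}

func main() {
	m := ContentMetadata{Length: 64 << 20, Type: "text/csv"}
	fmt.Println(len(m.encode()))
}
```

The benefit is that any cache backend (fs or redis) can reuse the same serialization without duplicating it.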

}
// this struct is here because CachedData requires an io.ReadCloser
// but the logic in the Get function generates only an io.Reader
type io_reader_decorator struct {
Contributor

to be more golang compliant

Suggested change
type io_reader_decorator struct {
type ioReaderDecorator struct {

return r.expire, nil
}

func (r *redisCache) Name() string {
return r.name
}

func toBytes(stream io.Reader) ([]byte, error) {
buf := new(bytes.Buffer)
type RedisStreamReader struct {
Contributor

it can be private I think

Suggested change
type RedisStreamReader struct {
type redisStreamReader struct {

}
}

func (r *RedisStreamReader) Read(destBuf []byte) (n int, err error) {
Contributor

Very smart! :)

}

func TestRedisCacheMiss(t *testing.T) {
c := generateRedisClientAndServer(t)
cacheMissHelper(t, c)
}
func TestStringFromToByte(t *testing.T) {
c := generateRedisClientAndServer(t)
Contributor

It's not needed to test the encode and decode functions (once they're decoupled from the redis cache)

Collaborator Author

good point. I asked myself this question before writing this test, but I thought a dedicated test that only focuses on this part would make debugging easier if it contains a problem (it was at least very helpful while coding this part).
Even if it's redundant with other tests, it's still worth keeping IMHO

// But before that, since consuming the reader could take time and the object in redis could disappear between 2 fetches,
// we need to make sure the TTL is long enough to avoid nasty side effects
// nb: it would be better to retry the flow if such a failure happened, but this requires a huge refactoring of proxy.go
if ttl <= 5*time.Second {
Contributor

Can you extract it to a const next to the other timeouts?

Comment on lines 178 to 179
func (r *redisCache) decodeMetadata(b []byte) (*ContentMetadata, int) {
cLength := uint64(b[7]) | (uint64(b[6]) << 8) | (uint64(b[5]) << 16) | (uint64(b[4]) << 24) | uint64(b[3])<<32 | (uint64(b[2]) << 40) | (uint64(b[1]) << 48) | (uint64(b[0]) << 56)
Contributor

I think we're missing error handling here. What if a malicious actor tampers with payloads in redis? It would make chproxy panic. We could handle that more securely by ignoring the cached data. WDYT?

}

func (e *RedisCacheCorruptionError) Error() string {
return "chproxy can't decode the cached result from redis, it seems to have been corrupted"
Contributor

Maybe it'd be worth adding info about the key, in order to ease analysing the payload
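Carrying the key in the error could look like this; the Key field is an assumption about how it might be threaded through, not the PR's final shape:

```go
package main

import "fmt"

// RedisCacheCorruptionError includes the cache key, per the suggestion
// above, so the corrupted payload can be located and analysed in redis.
type RedisCacheCorruptionError struct {
	Key string
}

func (e *RedisCacheCorruptionError) Error() string {
	return fmt.Sprintf("chproxy can't decode the cached result for key %q from redis, it seems to have been corrupted", e.Key)
}

func main() {
	err := &RedisCacheCorruptionError{Key: "query-42"}
	fmt.Println(err.Error())
}
```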

@@ -126,17 +133,37 @@ func (r *redisCache) Get(key *Key) (*CachedData, error) {
}
// since the cached results in redis can be too big, we can't fetch them all at once because of the memory overhead.
// We create an io.Reader that fetches from redis chunk by chunk to reduce memory usage.
redisStreamreader := newRedisStreamReader(uint64(offset), r.client, stringKey, metadata.Length)
Contributor

maybe we could instantiate it after the check whether the ttl is small?

Collaborator Author

I don't see what it would change, since we need the redisStreamreader whether the TTL is small (it will be used to write into the tmp file) or long (it will be used to write the http response)

Contributor

oh sorry, I misread it. Forget about the comment :)

Contributor

@gontarzpawel left a comment

Thank you @mga-chka , great enhancement 💪

@mga-chka mga-chka merged commit 3f3288a into master Aug 29, 2022
@mga-chka mga-chka deleted the fix-memory-issue branch August 29, 2022 09:48