
Back pressure incoming responses #204

Merged
merged 5 commits into release/0.6.x from feat/memory_pressure_incoming_messages on Aug 23, 2021

Conversation

hannahhoward
Collaborator

Goals

Prevent out of memory errors due to slow processing of requests or missing block traversals

Implementation

  • Add a new memory allocator that back-pressures reading from incoming streams until blocks are verified (see the sketch after this list).
  • When a message comes in, process it, but if the total in-memory data held in the unverified blockstore goes above a certain threshold, hold off on processing the next message until blocks from the unverified blockstore have been verified.
  • We need to think about whether this is the best solution. It does have the advantage of limiting on a per-peer basis rather than per unverified blockstore (i.e. per request).

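
A rough sketch of the idea, using a weighted semaphore as the memory budget (the names and this implementation are illustrative only, not the actual graphsync Allocator API):

package example

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// blockMemoryBudget caps how many bytes of unverified block data may be held
// in memory at once. Message readers reserve bytes before buffering blocks and
// release them only after the blocks are verified and persisted.
type blockMemoryBudget struct {
	sem *semaphore.Weighted
}

func newBlockMemoryBudget(maxBytes int64) *blockMemoryBudget {
	return &blockMemoryBudget{sem: semaphore.NewWeighted(maxBytes)}
}

// reserve blocks the caller until size bytes fit under the budget (or ctx is
// cancelled), which in turn stops us from reading further messages off the
// peer's stream.
func (b *blockMemoryBudget) reserve(ctx context.Context, size int64) error {
	return b.sem.Acquire(ctx, size)
}

// release returns size bytes to the budget once the corresponding blocks have
// been verified and moved out of the unverified blockstore.
func (b *blockMemoryBudget) release(size int64) {
	b.sem.Release(size)
}

Because reading from a peer's stream only resumes once verification frees budget, a slow or stalled traversal turns into back pressure on the wire instead of unbounded in-memory buildup.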

Block reading incoming responses to avoid memory pressure buildup
// load from response cache
data, err := responseCache.AttemptLoad(requestID, link)
if data != nil {
Member

I am curious whether we are intentionally checking this before checking err? If so, would it make sense to add a comment here to document the rationale?

Contributor

Suggested change
-if data != nil {
+if err != nil {
+	return types.AsyncLoadResult{Err: err}
+}
+if data != nil {

requestmanager/asyncloader/asyncloader.go (resolved review thread)
Contributor

@gammazero left a comment

Mostly minor stuff, and my comments concerning context [ab]use are probably out of scope for this PR.

blks []blocks.Block) {
totalMemoryAllocated := uint64(0)
for _, blk := range blks {
totalMemoryAllocated += uint64(len(blk.RawData()))
Contributor

nit: Avoid copying each block into blk by using index instead:

for i := range blks {
    totalMemoryAllocated += uint64(len(blks[i].RawData()))

Collaborator Author

just curious -- why?

-func New(ctx context.Context, loader ipld.Loader, storer ipld.Storer) *AsyncLoader {
-	responseCache, loadAttemptQueue := setupAttemptQueue(loader, storer)
+func New(ctx context.Context, loader ipld.Loader, storer ipld.Storer, allocator Allocator) *AsyncLoader {
+	responseCache, loadAttemptQueue := setupAttemptQueue(loader, storer, allocator)
	ctx, cancel := context.WithCancel(ctx)
	return &AsyncLoader{
		ctx: ctx,
Contributor

Can we stop abusing contexts? Maybe in a follow-up PR?

Collaborator Author

this is a much larger issue. See comment below.

}
select {
case <-al.allocator.AllocateBlockMemory(p, totalMemoryAllocated):
case <-al.ctx.Done():
Contributor

Probably out of scope for this review, but ctx should be passed into function as argument, not kept inside the data structure. If it is necessary to wait on something to detect shutdown, then use a channel. Also, is there a real concern that AllocateBlockMemory will not return and that we need to stop waiting for it?

Collaborator Author

there's a difference here: ctx is the overall shutdown for the whole module. It's passed down from the ctx used to construct graphsync. Essentially we're using context to manage shutting down the whole module when we're done.

as for the larger concern re: abusing contexts, the coding style of graphsync is adopted from go-bitswap, which generally follows patterns from go-ipfs, which was developed pretty early on in golang's development. I think at the time it was "use contexts for managing shutdowns".

I totally do not love this way of managing shutdowns. In fact, I would love to do something much more structured like a supervision tree. I've brought this up several times during my several years at PL but it never seems to get enough traction to do anything.

Graphsync does distinguish frequently between module context (the one here) vs request context (used elsewhere, usually passed into the function)

I'd love to have this as a larger discussion but it's definitely outside the scope of this PR
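
For illustration only (hypothetical names, not graphsync's actual types), the split described above looks roughly like this: the module context is captured at construction and only signals whole-module shutdown, while each request's context is passed in as an argument:

package example

import "context"

type manager struct {
	ctx context.Context // module-wide shutdown, derived from the ctx passed to the constructor
}

func newManager(ctx context.Context) (*manager, context.CancelFunc) {
	ctx, cancel := context.WithCancel(ctx)
	return &manager{ctx: ctx}, cancel
}

// processRequest receives the per-request context as an argument; the stored
// module context is consulted only to detect that the whole module is
// shutting down.
func (m *manager) processRequest(reqCtx context.Context, work func() error) error {
	done := make(chan error, 1)
	go func() { done <- work() }()
	select {
	case err := <-done:
		return err
	case <-reqCtx.Done(): // this one request was cancelled
		return reqCtx.Err()
	case <-m.ctx.Done(): // the whole module is shutting down
		return m.ctx.Err()
	}
}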

// load from response cache
data, err := responseCache.AttemptLoad(requestID, link)
if data != nil {
Contributor

Suggested change
-if data != nil {
+if err != nil {
+	return types.AsyncLoadResult{Err: err}
+}
+if data != nil {

-responses: make(chan map[graphsync.RequestID]metadata.Metadata, 1),
-blks:      make(chan []blocks.Block, 1),
+responses: make(chan map[graphsync.RequestID]metadata.Metadata, 10),
+blks:      make(chan []blocks.Block, 10),
Contributor

Why buffer of 10? Is there a need for a specific number of slots?

Collaborator Author

oh this is just some test hackiness to avoid a channel block.

Collaborator Author

basically, we need to be able to absorb a bunch of responses without blocking now
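
To make the buffering rationale concrete (this is just an illustration, not the test itself): with a capacity of 10 the test hook can queue ten responses immediately, even before anything reads them, whereas a capacity of 1 would block on the second send.

package example

import "fmt"

func main() {
	// Capacity 10: all ten sends complete without a receiver being ready.
	// With capacity 1, the second send here would block forever.
	responses := make(chan int, 10)
	for i := 0; i < 10; i++ {
		responses <- i
	}
	fmt.Println("queued", len(responses), "responses without blocking")
}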

rm.processTerminations(filteredResponses)
select {
case <-rm.ctx.Done():
case prm.response <- nil:
Contributor

If this channel is guaranteed to be buffered, and only one response is written to it, then is there any chance that it would block here? If no chance to block, then no need to wait for rm.ctx.Done(). Also, consider closing the channel instead if there will not be any other writers, as that will also not require handling blocking.

Collaborator Author

It's just a habit. I always assume a channel could block. I guess defensive programming.
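
Sketching the two options under discussion (hypothetical helper names, not the actual requestmanager code):

package example

import "context"

// sendGuarded is the defensive pattern: never assume the send can't block,
// so pair it with a shutdown case.
func sendGuarded(ctx context.Context, response chan<- error, err error) {
	select {
	case response <- err:
	case <-ctx.Done():
	}
}

// closeWhenDone is the reviewer's alternative: if this is the only writer and
// success carries no value, closing the channel never blocks and the reader
// observes a nil error immediately.
func closeWhenDone(response chan error) {
	close(response)
}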

@@ -274,16 +274,15 @@ type processResponseMessage struct {
p peer.ID
responses []gsmsg.GraphSyncResponse
blks []blocks.Block
response chan error
Contributor

I think this can be constrained to write-only: response chan<- error

Collaborator Author

Yeah, probably. But the others aren't at the moment, so I'd rather maintain the consistency for now.

@@ -370,7 +384,8 @@ func withLoader(st *store, exec func(ctx context.Context, asyncLoader *AsyncLoad
	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
-	asyncLoader := New(ctx, st.loader, st.storer)
+	allocator := allocator.NewAllocator(256*(1<<20), 16*(1<<20))
Contributor

Even though this is test code, it would be nice to define these with const values
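
Something like the following would do it (the constant names are mine, and the meaning of the two arguments is my assumption):

// In the test helper above (sketch):
const (
	testTotalMemoryLimit   = 256 * (1 << 20) // overall budget for unverified block data
	testPerPeerMemoryLimit = 16 * (1 << 20)  // budget per peer
)

allocator := allocator.NewAllocator(testTotalMemoryLimit, testPerPeerMemoryLimit)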

@hannahhoward (Collaborator Author) commented Aug 21, 2021

@gammazero I'm recording most of your comments as useful suggestions worthy of deeper consideration, because they actually should be changed across the codebase. I'm not absorbing them now, as this fix is tied to a fairly immediate need to ship a release branch in Lotus.

It is nice to get other folks looking at this code base and giving me this kind of input.

@hannahhoward marked this pull request as ready for review August 21, 2021 04:49
@hannahhoward
Collaborator Author

I've added a bunch of testground improvements to this PR -- these were done to verify experimentally that this actually resolves the memory backpressure problem.

These are some mid-test memory dumps from running the memory-stress.toml composition, which transfers ten 1GB files at the same time.

Here is the before: [memory profile image: profile001]

Here is the after: [memory profile image: profile001]

You'll see that there's a large 1GB backup in the before and not in the after. This is the exact same backup witnessed in filecoin-project/lotus#7077

@aarshkshah1992 merged commit 2532859 into release/0.6.x Aug 23, 2021
@mvdan deleted the feat/memory_pressure_incoming_messages branch December 15, 2021 14:17
marten-seemann pushed a commit that referenced this pull request Mar 2, 2023
…e graphsync message (#204)

* feat: use different extension names to fit multiple payloads in the same message

* remove logs in test

* add comments, update tests and loop over extension names

* add default extension name for each hook

* add comment

* simplify extension names loop

* trigger OnResponseReceived for multiple extensions

* use processExtension + use var instead of prop for ext names

Co-authored-by: Hannah Howard <hannah@hannahhoward.net>