@masaori335
Contributor

Arena is used as a temporary buffer in some HPACK functions, but each arena is destroyed immediately after use. This adds memory allocation overhead. The arena should be reused.

@masaori335 masaori335 added this to the 10.0.0 milestone Mar 10, 2020
@masaori335 masaori335 self-assigned this Mar 10, 2020
@masaori335 masaori335 linked an issue Mar 10, 2020 that may be closed by this pull request
12 tasks
@masaori335 masaori335 force-pushed the h2-perf-hpack-arena branch from 687b2eb to 93b9d1d Compare March 10, 2020 06:05
@maskit
Member

maskit commented Mar 10, 2020

I'm worried about the size of the internal buffer and the length of the block chain in Arena. Using an arena for a long time might cause an issue similar to the one we have with HdrHeap.

@SolidWallOfCode SolidWallOfCode changed the title from "Perf: Fix mis-usage of Arena in HAPCK" to "Perf: Fix mis-usage of Arena in HPACK" Mar 10, 2020
@SolidWallOfCode
Member

What kind of memory behavior is required? How long do the encoded strings need to persist?

@masaori335
Contributor Author

All usage of Arena in HPACK is for allocating temporary strings. When the required buffer is small, we could use a buffer on the stack instead of Arena (that's another approach). The stored strings live until the end of the function at the longest.

IIUC, unlike HdrHeap, memory allocated for Arena is reused. The problem with HdrHeap is that when a string in the HdrHeap is freed, the underlying buffer is not freed. Can something similar happen with Arena?

@maskit
Member

maskit commented Mar 11, 2020

I didn't read the code closely, so I may be wrong. The space seems to be freed, but the blocks are not. Once the chain gets long it never gets shorter, IIUC.

@zwoop zwoop added the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Mar 11, 2020
@masaori335
Contributor Author

A block is reused if possible, so the worst case is when the requested string length keeps growing.
I checked this scenario with a small program like the one below.

void
test_arena(Arena &arena, int max)
{
  for (int i = 1; i < max; ++i) {
    char *c = arena.str_alloc(i);
    arena.str_free(c);
  }
}

When the requested string length grows from 1 byte to 128KB, the total allocated memory is 568KB (in 13 blocks). (128KB is the default value of proxy.config.http2.max_header_list_size.)
https://gist.github.com/masaori335/3001bb477b29170eb21545d58d734e8e
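
To make the "worst case is a growing request" argument concrete, here is a minimal toy model. It is a sketch under assumptions, not the real Arena: ToyArena, its power-of-two block sizing, and its release rule are all hypothetical, so it will not reproduce the 568KB / 13 blocks figure above. It only shows that blocks are reused when they fit and that a new block is appended to the chain only when a request outgrows every existing one.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy arena, NOT the Traffic Server Arena implementation.
class ToyArena
{
public:
  static constexpr std::size_t DEFAULT_BLOCK_SIZE = 1024; // assumed default

  char *
  alloc(std::size_t n)
  {
    // Reuse the first block that still has enough free space.
    for (auto &b : _blocks) {
      if (b.capacity - b.used >= n) {
        char *p = b.data.get() + b.used;
        b.used += n;
        return p;
      }
    }
    // Otherwise append a new block; sizes are rounded up by doubling (assumption).
    std::size_t cap = DEFAULT_BLOCK_SIZE;
    while (cap < n) {
      cap *= 2;
    }
    _blocks.push_back(Block{std::make_unique<char[]>(cap), cap, n});
    return _blocks.back().data.get();
  }

  void
  release(char *p, std::size_t n)
  {
    // Space is reclaimed only if p was the most recent allocation in its block.
    for (auto &b : _blocks) {
      if (b.used >= n && p == b.data.get() + (b.used - n)) {
        b.used -= n;
        return;
      }
    }
  }

  std::size_t
  block_count() const
  {
    return _blocks.size();
  }

  std::size_t
  total_capacity() const
  {
    std::size_t sum = 0;
    for (const auto &b : _blocks) {
      sum += b.capacity;
    }
    return sum;
  }

private:
  struct Block {
    std::unique_ptr<char[]> data;
    std::size_t capacity;
    std::size_t used;
  };
  std::vector<Block> _blocks;
};
```

Running the same alloc/free loop against this toy with sizes from 1 byte up to 128KB leaves a short chain whose blocks are never returned, which is the accumulation concern raised above. With a constant request size the chain stays at a single block.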

@masaori335
Contributor Author

@SolidWallOfCode Is Arena deprecated in favor of MemArena?

@masaori335 masaori335 mentioned this pull request Mar 13, 2020
12 tasks
@SolidWallOfCode
Member

Well, not the MemArena in TS. I have a much better version elsewhere, but haven't had the support to bring it back to TS. For Arena, as you noted, the memory is freed only if it's the most recently allocated block. The newer MemArena provides direct access to unallocated space for use as a temporary. This does mean you need to be careful to use it for only a single string at a time, but it is guaranteed not to accumulate.

@masaori335
Contributor Author

Temporary allocation will cover this case well. No accumulation sounds good. Let's use it once it's backported.

For now, let's land this change. Or, if we are really worried about the total size of the Arena, we can take another approach, like switching between alloca and malloc depending on the required buffer size.
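
A rough sketch of what switching between alloca and malloc by size could look like. This is an assumption-laden illustration: lowercase_copy and STACK_LIMIT are hypothetical, the lowercasing stands in for the real HPACK work, and alloca is non-standard (here taken from <alloca.h>, as on glibc and macOS).

```cpp
#include <alloca.h> // non-standard; assumed available (glibc, macOS)
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical threshold below which the temporary buffer lives on the stack.
constexpr std::size_t STACK_LIMIT = 1024;

// Hypothetical helper: builds a lowercased copy through a temporary buffer.
std::string
lowercase_copy(const char *src, std::size_t len)
{
  const bool on_heap = len > STACK_LIMIT;
  char *buf;
  if (on_heap) {
    buf = static_cast<char *>(std::malloc(len));
    if (buf == nullptr) {
      return {};
    }
  } else {
    // alloca memory is released when this function returns, not at the end of this block.
    buf = static_cast<char *>(alloca(len));
  }

  for (std::size_t i = 0; i < len; ++i) {
    buf[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(src[i])));
  }
  std::string result(buf, len);

  // Only the heap path needs an explicit free.
  if (on_heap) {
    std::free(buf);
  }
  return result;
}
```

Small inputs never touch the allocator, and large inputs pay for one malloc/free pair, so nothing outlives the call in either case.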

if (len == -1) {
if (use_huffman && value_len) {
arena.str_free(data);
}
Contributor


Contributor Author


I didn't know we had this. I was thinking of introducing a "defer" like this.

Contributor


Dr. Zret picked the name PostScript.
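
For readers unfamiliar with the "defer" idea, here is a minimal scope-guard sketch. ScopeExit, release(), and encode_or_fail are hypothetical names chosen for illustration; this is not the PostScript class from the tree, whose exact interface is not shown in this thread. The guard runs its callable when the scope ends, so every early return frees the temporary without a manual call on each path.

```cpp
#include <utility>

// Hypothetical scope guard: runs the stored callable on scope exit unless released.
template <typename F> class ScopeExit
{
public:
  explicit ScopeExit(F f) : _f(std::move(f)) {}

  ScopeExit(const ScopeExit &)            = delete;
  ScopeExit &operator=(const ScopeExit &) = delete;

  ~ScopeExit()
  {
    if (_armed) {
      _f();
    }
  }

  // Cancel the deferred action, e.g. when ownership has been handed off.
  void
  release()
  {
    _armed = false;
  }

private:
  F _f;
  bool _armed = true;
};

// Hypothetical caller: 'frees' counts cleanups and stands in for arena.str_free(data).
int
encode_or_fail(bool fail, int &frees)
{
  auto cleanup = [&frees] { ++frees; };
  ScopeExit<decltype(cleanup)> guard(cleanup);

  if (fail) {
    return -1; // cleanup still runs here
  }
  return 0; // and here
}
```

With a guard like this, the explicit free inside the error branch of the diff above would no longer be needed.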

}

return p - buf_start;
}
Contributor


An alternative to using an arena here would be something like:

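#include <cstddef>

// Uses the in-object buffer for small sizes and falls back to the heap otherwise.
template <std::size_t EstSizeBound = 1024>
class LocalBuffer
{
public:
  LocalBuffer(std::size_t size) : _ptr(size > EstSizeBound ? new char[size] : _buf) {}

  // Copying would free a heap buffer twice.
  LocalBuffer(const LocalBuffer &) = delete;
  LocalBuffer &operator=(const LocalBuffer &) = delete;

  char *get() const { return _ptr; }

  ~LocalBuffer()
  {
    // Only the heap fallback needs to be released.
    if (_ptr != _buf) {
      delete[] _ptr;
    }
  }

private:
  char _buf[EstSizeBound];
  char *const _ptr;
};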

(Where 1024 is the default arena block size.) I'm guessing it would perform better in the most likely scenarios.

Contributor Author


This looks smarter than the alternative approach I mentioned (switching between alloca and malloc).
I agree that performance would be better in most cases. I'll try it this way.

This was referenced Mar 18, 2020
@masaori335
Contributor Author

masaori335 commented Mar 19, 2020

#6536 looks much better on performance and has no concerns about memory accumulation.

@masaori335 masaori335 closed this Mar 19, 2020
@masaori335 masaori335 removed this from the 10.0.0 milestone Mar 19, 2020
@zwoop zwoop removed the OnDocs This is for PR currently running, or will run, on the Docs ATS server label Jun 9, 2020
@masaori335 masaori335 removed a link to an issue Jul 2, 2020
12 tasks

5 participants