We found that a shuffle server under high load easily hits java.lang.OutOfMemoryError: Java heap space, even when we allocate more JVM heap memory and lower rss.server.buffer.capacity.
The steps that lead to the exception above:
- When the shuffle server is under high load, a requireBufferId easily expires, and the shuffle server releases the corresponding usedMemory.
- A client calls sendShuffleData using the expired requireBufferId.
- The shuffle server receives the shuffle data and stores it in the RPC queue (this part of the memory usage is not added to usedMemory).
- Other clients' requireBuffer calls succeed because usedMemory still shows enough free space, so real heap usage can grow beyond what usedMemory tracks.
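
The accounting gap in the steps above can be illustrated with a small, simplified model. This is only a sketch, not the shuffle server's actual code: the class BufferAccountingSketch, its methods, and the untrackedRpcBytes counter are hypothetical names introduced for illustration, and the model assumes that data sent with an expired requireBufferId occupies heap in the RPC queue without being counted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the accounting gap; not the real shuffle server classes.
class BufferAccountingSketch {
    private final long capacity;         // stands in for rss.server.buffer.capacity
    private long usedMemory = 0;         // what the server believes is in use
    private long untrackedRpcBytes = 0;  // bytes held in the RPC queue but not counted
    private long nextId = 0;
    // requireBufferId -> reserved size in bytes
    private final Map<Long, Long> pending = new HashMap<>();

    BufferAccountingSketch(long capacity) {
        this.capacity = capacity;
    }

    // Reserves memory and returns a requireBufferId, or -1 if usedMemory leaves no room.
    long requireBuffer(long size) {
        if (usedMemory + size > capacity) {
            return -1;
        }
        usedMemory += size;
        long id = nextId++;
        pending.put(id, size);
        return id;
    }

    // Expiry drops the reservation and releases its usedMemory.
    void expire(long id) {
        Long size = pending.remove(id);
        if (size != null) {
            usedMemory -= size;
        }
    }

    // The payload lands on the heap whether or not the id is still valid.
    void sendShuffleData(long id, long size) {
        if (pending.remove(id) == null) {
            // Expired id (assumption): the bytes sit in the RPC queue, outside usedMemory.
            untrackedRpcBytes += size;
        }
        // Valid id: the bytes were already counted when the buffer was required.
    }

    long getUsedMemory() {
        return usedMemory;
    }

    // Heap actually occupied in this model: tracked plus untracked bytes.
    long actualHeapBytes() {
        return usedMemory + untrackedRpcBytes;
    }
}
```

With a capacity of 100, a client that requires 80 bytes, lets the id expire, and then sends the 80 bytes anyway leaves usedMemory at 0. A second client can then require another 80 bytes, so the model reports usedMemory of 80 while actualHeapBytes() is 160, which is above the configured capacity.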