Skip to content

Conversation

@summaryzb
Copy link
Contributor

@summaryzb summaryzb commented Jul 22, 2022

What changes were proposed in this pull request?

the behavior of buffer management and memory allocate
1.when adding a huge record to a partition where the buffer is not initialized, we only allocate bufferSegmentSize memory, however release the real huge size memory after the buffer is sent
2.byte array will be fully used as possible to save memory

Why are the changes needed?

fix memory leak & improve buffer management

Does this PR introduce any user-facing change?

no

How was this patch tested?

pass testing

} else {
requestMemory(bufferSegmentSize);
WriterBuffer wb = new WriterBuffer(bufferSegmentSize);
wb.addRecord(serializedData, serializedDataLength);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For memory leak, how about:

int bufferSize = bufferSegmentSize
if (serializedDataLength > bufferSize) {
  bufferSize = serializedDataLength
}
requestMemory(bufferSize);
WriterBuffer wb = new WriterBuffer(bufferSize);
..........
..........

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's almost the same as my implementation, however if we use serializedDataLength as the construct parameter of WriterBuffer, the following record added to this partition will compare length with the first record serializedDataLength, that's not a elegant way

@colinmjj
Copy link

colinmjj commented Jul 25, 2022

@summaryzb can you describe more detail about how to save memory for buffer management?
And I prefer to split this PR into 2 PRs for 'Bug fix' and 'Improvement'

@jerqi
Copy link
Contributor

jerqi commented Jul 25, 2022

If this is a bug fix, Could we add some cases ?

@codecov-commenter
Copy link

Codecov Report

Merging #67 (93c5ea1) into master (174e6bb) will decrease coverage by 0.94%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master      #67      +/-   ##
============================================
- Coverage     56.41%   55.47%   -0.95%     
+ Complexity     1173     1098      -75     
============================================
  Files           149      140       -9     
  Lines          7968     7593     -375     
  Branches        761      731      -30     
============================================
- Hits           4495     4212     -283     
+ Misses         3231     3147      -84     
+ Partials        242      234       -8     
Impacted Files Coverage Δ
...storage/handler/impl/DataSkippableReadHandler.java 81.25% <0.00%> (-3.13%) ⬇️
...org/apache/uniffle/server/ShuffleFlushManager.java 76.70% <0.00%> (-1.71%) ⬇️
.../apache/uniffle/coordinator/ClientConfManager.java 91.54% <0.00%> (-1.41%) ⬇️
...pache/spark/shuffle/writer/WriteBufferManager.java
.../org/apache/spark/shuffle/writer/WriterBuffer.java
...org/apache/spark/shuffle/RssSparkShuffleUtils.java
...ava/org/apache/spark/shuffle/RssShuffleHandle.java
...e/spark/shuffle/reader/RssShuffleDataIterator.java
...che/spark/shuffle/writer/BufferManagerOptions.java
...org/apache/spark/shuffle/writer/AddBlockEvent.java
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

@jerqi
Copy link
Contributor

jerqi commented Aug 5, 2022

@summaryzb Do you want to continue pr?

@summaryzb
Copy link
Contributor Author

@summaryzb can you describe more detail about how to save memory for buffer management? And I prefer to split this PR into 2 PRs for 'Bug fix' and 'Improvement'

previously, we add WrappedBuffer to buffers in WriterBuffer class, but the byte[] in WrappedBuffer is not fully used.
for example, the length of every record is 2k, every time we add a record, we create a buffer with 3k length by default and wrap the previous buffer as WrappedBuffer which resulting in wasting 1k memory in every WrappedBuffer.

@summaryzb
Copy link
Contributor Author

@summaryzb Do you want to continue pr?

yeah I'll move on this pr, sorry for reply not in time

@summaryzb
Copy link
Contributor Author

If this is a bug fix, Could we add some cases ?

follow this suggesting

@summaryzb
Copy link
Contributor Author

@summaryzb can you describe more detail about how to save memory for buffer management? And I prefer to split this PR into 2 PRs for 'Bug fix' and 'Improvement'

follow this suggesting, 'Bug fix' and 'Improvement' is on the way

@summaryzb
Copy link
Contributor Author

fix memory leak:#145

@summaryzb
Copy link
Contributor Author

fix memory leak:#145

I'll post the improvement pr after this pr merged

@summaryzb
Copy link
Contributor Author

Improvement of writer buffer to save memory:#157

@summaryzb summaryzb closed this Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants