[SPARK-37593][CORE] Reduce default page size by LONG_ARRAY_OFFSET if G1GC and ON_HEAP are used
#34846
Conversation
@kiszk @cloud-fan @maropu @JoshRosen Can you please help review this?
I am afraid that this line may throw an exception in the future due to specification changes.
How about wrapping this in try ... catch?
what if we always pick this as the page size?
I suppose we lose a few bytes in the allocation, and maybe that makes some nice power-of-two data structure not fit, but I wonder if that's pretty rare and if we can just go with this always indeed.
You mean even if it is not a humongous allocation?
Yes. I think the current logic is good, just wondering if it matters much if we do it in all cases. Maybe it's bad for small allocations, but is the offset ever significant relative to the allocation size? Probably not. I wonder if there are future cases, different GCs, that we're not checking here that also need this treatment.
I revisited this; there's really no need to restrict it to only humongous allocations.
Do we need a flag for this? When would I not want it?
Yeah, there's indeed no need. Removed it.
Can this be a val? I don't think it will change
Likewise, could you just try to get G1HeapRegionSize once here? If it's not G1GC, just store None. Then you don't check it each time, because I think this can't change during the JVM's lifetime.
Then you don't even need the utility method below.
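(For reference, a rough sketch of that "resolve it once per JVM" idea — the GcInfo name is hypothetical, and it reuses the HotSpotDiagnosticMXBean lookup that comes up later in this thread, wrapped in Try in the spirit of the try ... catch suggestion above:)

```scala
import java.lang.management.ManagementFactory
import scala.util.Try
import com.sun.management.HotSpotDiagnosticMXBean

object GcInfo {
  // Resolved once per JVM: Some(regionSizeInBytes) under G1GC, None otherwise
  // (or when the diagnostic bean is unavailable on this JVM).
  val g1HeapRegionSize: Option[Long] = Try {
    val diagnostic = ManagementFactory.getPlatformMXBean(classOf[HotSpotDiagnosticMXBean])
    if (diagnostic.getVMOption("UseG1GC").getValue == "true") {
      Some(diagnostic.getVMOption("G1HeapRegionSize").getValue.toLong)
    } else {
      None
    }
  }.toOption.flatten
}
```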
ok
nit. Indentation. We need two more spaces.
I didn't get this, do you mean line 261?
Yes.
Could you update the function description with this new addition?
updated
dongjoon-hyun
left a comment
@WangGuangxin . Do you think you can add one specific example about the case to the PR description?
Hi @dongjoon-hyun , we can demo it using the piece of code and JVM params reproduced in the PR description at the bottom of this page, comparing the runs with optimized = false and optimized = true.
@WangGuangxin . Thank you. Please include #34846 (comment) in the PR description. It will be a permanent commit log.
Also, cc @sunchao , @viirya , @huaxingao , too.
I'm just curious whether this adjustment can be done in HeapMemoryAllocator.allocate instead, since it's closer to the actual logic?
I prefer not to put it in HeapMemoryAllocator.allocate because it may break the semantics. When we call HeapMemoryAllocator.allocate(size) we expect to get memory of the specified size or an OOM, but we would internally change it to another value (size - Platform.LONG_ARRAY_OFFSET), which may crash the caller's code or bring confusion.
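(For reference, a minimal sketch of the placement being argued for here — the helper name and signature are hypothetical, not the actual patch: allocate(size) keeps meaning "exactly size bytes", and the G1 adjustment is applied where the default page size is chosen:)

```scala
import org.apache.spark.unsafe.Platform

def adjustedDefaultPageSize(chosenPageSize: Long, isG1GC: Boolean, onHeap: Boolean): Long = {
  if (isG1GC && onHeap) {
    // Leave room for the long[] object header so that page + header still
    // fits the G1 region-sized power of two.
    chosenPageSize - Platform.LONG_ARRAY_OFFSET
  } else {
    chosenPageSize
  }
}
```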
attilapiros
left a comment
Just a few questions.
When I looked for how to find out what the garbage collector type is, I bumped into this several times:
```java
HotSpotDiagnosticMXBean diagnostic = ManagementFactoryHelper.getDiagnosticMXBean();
VMOption option = diagnostic.getVMOption("UseG1GC");
if (option.getValue().equals("false")) {
  ...
}
```
For example in the OpenJDK tests.
Is there any reason why a different solution has been chosen here?
emm, seems your solution is more elegant. I'll update it
Thanks!
If I get this right, in case of G1GC the best would be to choose a pageSize where the following holds:
G1HeapRegionSize % (pageSize + Platform.LONG_ARRAY_OFFSET) == 0
And when BUFFER_PAGESIZE is not set, we are free to choose it as:
pageSize = G1HeapRegionSize - Platform.LONG_ARRAY_OFFSET;
And with the above code we just try to calculate G1HeapRegionSize in our own way. But what about accessing this value directly?
Like the same way used in the OpenJDK tests:
```java
HotSpotDiagnosticMXBean diagnostic = ManagementFactoryHelper.getDiagnosticMXBean();
option = diagnostic.getVMOption("G1HeapRegionSize");
```
As I see, the OpenJDK test was executed like:
```
run main/othervm -Xmx64m TestG1HeapRegionSize 1048576
```
So diagnostic.getVMOption("G1HeapRegionSize") gives back the calculated region size.
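(For reference, a concrete instance of that condition, using the 4 MB region and the 16-byte long[] array header from the PR description:)

```
G1HeapRegionSize               = 4194304  (4 MB)
pageSize                       = 4194304 - 16 = 4194288
pageSize + LONG_ARRAY_OFFSET   = 4194304
G1HeapRegionSize % 4194304     = 0   -> each long[] page fits exactly one region
```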
Yes, it's better to make sure G1HeapRegionSize % (pageSize + Platform.LONG_ARRAY_OFFSET) == 0.
But when BUFFER_PAGESIZE is not set, I'm not quite sure if it's reasonable to choose it as pageSize = G1HeapRegionSize - Platform.LONG_ARRAY_OFFSET, which seems a bit different from the current logic to get the default page size.
Following the current logic to calculate the default page size and then subtracting Platform.LONG_ARRAY_OFFSET can also make sure G1HeapRegionSize % (pageSize + Platform.LONG_ARRAY_OFFSET) == 0.
> But when BUFFER_PAGESIZE is not set, I'm not quite sure if it's reasonable to choose it as pageSize = G1HeapRegionSize - Platform.LONG_ARRAY_OFFSET, which seems a bit different from the current logic to get the default page size.

How does the current logic for the default handle a case where a custom -XX:G1HeapRegionSize is given as extra Java options?
I see what you mean, you are right
@attilapiros Since -XX:G1HeapRegionSize can only be set to a power of 2, and the default size calculated here is also a power of 2, it can be guaranteed that G1HeapRegionSize % (pageSize + Platform.LONG_ARRAY_OFFSET) == 0 or (pageSize + Platform.LONG_ARRAY_OFFSET) % G1HeapRegionSize == 0, right?
Such a change has little effect on the current logic.
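(For reference, a quick check of both directions of that divisibility, assuming the allocated size pageSize + Platform.LONG_ARRAY_OFFSET is kept a power of 2:)

```
allocation = 1 MB, region = 4 MB  ->  4 MB % 1 MB = 0  (four such long[] pages can pack one region)
allocation = 8 MB, region = 4 MB  ->  8 MB % 4 MB = 0  (one humongous allocation spans exactly two regions)
```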
ok to test
attilapiros
left a comment
It looks good to me, but I am interested in the others' opinions.
lazy val
Would lazy do anything here? It's a local inside what's evaluated for a val. I could see declaring it lazily outside this block instead. Then even being lazy doesn't matter.
It would do:
```scala
scala> lazy val foo = {
     |   println("Initialized")
     |   1
     | }
foo: Int = <lazy>

scala> "test"
res0: String = test

scala> None.getOrElse(foo)
Initialized
res1: Int = 1
```
But having it outside is even better!
G1HeapRegionSize -> maybeG1HeapRegionSize?
Hi, @WangGuangxin .
You are reporting a G1GC issue, and G1GC is enabled by default in Java 11.
Please double check if your code is running with Java 11 and Java 17.
According to the latest change, it seems that you are working with Java 8 only.
cc @attilapiros too about the above Java 8 test case example.
@WangGuangxin This is not a Java 11 and Java 17 API.
Great findings! We need to test this on other JVM versions too.
@dongjoon-hyun Thanks for your findings. I've updated and checked against JDK 8, 11, and 17. Also cc @attilapiros.
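(For reference, one quick, generic way to confirm which collector a given JDK enables by default — just a local check, not part of this PR:)

```
java -XX:+PrintFlagsFinal -version | grep -E "UseG1GC|UseParallelGC"
```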
@srowen @dongjoon-hyun @attilapiros
Because both the page size chosen by Spark and the G1GC region size are powers of 2, choosedPageSize and heapRegionSize must be multiples of each other. There are three cases:
Compared to the existing defaultPageSize, the diff is only
Thank you for the updates, @WangGuangxin .
srowen
left a comment
Functionally I think this looks OK
```scala
val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor)
val default = math.min(maxPageSize, math.max(minPageSize, size))
conf.get(BUFFER_PAGESIZE).getOrElse(default)
val choosedPageSize = math.min(maxPageSize, math.max(minPageSize, size))
```
Total nit, but choosed -> chosen
done
```scala
/**
 * Return whether we are using G1GC or not
 */
val isG1GC: Boolean = {
```
This is OK here, though if it's only used here, maybe leave it in MemoryManager until it has reason to be shared? Utils is quite big already
ok, updated
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @WangGuangxin and all.
cc @viirya , @sunchao , @cloud-fan , @kiszk , @attilapiros , @HyukjinKwon again
cc @rednaxelafx too, FYI
Looks like it makes sense to me.
rednaxelafx
left a comment
Since this behavior is guarded on G1 GC in use, I'm okay with it. LGTM.
A few side comments:
- The same behavior affects the Shenandoah GC as well since it reuses the G1 region-based heap structure. Not to complicate this PR, but perhaps consider a follow-up change to also use the same logic for Shenandoah GC (which is also available in OpenJDK)
- If we're using a very big G1 heap region size, the current logic seems to be a bit too conservative, i.e. if the chosen page size is less than 1/2 of the G1 region size, then we don't necessarily have to subtract the array header size
- The current logic doesn't take object size alignment into account. The default is 8-byte aligned but if a different value is chosen, the size to subtract might not have to be the array header size... just another detail, this PR doesn't have to deal with it.
I have a question for those who are more familiar with Spark internals though: are there any places inside of Spark that implicitly depend on the page size being a power of two? Do we run any risk that such an assumption is not checked at runtime, which could lead to out-of-bounds access in boundary cases?
As far as I know there is no such constraint. And Spark allows users to set a custom page size via conf.
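(For example, assuming BUFFER_PAGESIZE corresponds to the spark.buffer.pageSize key, a user-supplied page size would look something like this; the 2m value is purely illustrative:)

```
spark-submit --conf spark.buffer.pageSize=2m ...
```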
Thank you, @WangGuangxin and all! As the above comments note, I agree that there are more combinations, like new GCs and different alignments. However, this PR is very narrowly scoped to G1GC and OnHeap, and the adjusted amount is small. I believe we can merge this and move forward on top of it. I revised the PR title according to the PR content because we have more chances of optimization. Merged to master for Apache Spark 3.3.0.
[SPARK-37593][CORE] Reduce default page size by LONG_ARRAY_OFFSET if `G1GC` and `ON_HEAP` are used (apache#1372)
Closes apache#34846 from WangGuangxin/g1_humongous_optimize.
Authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit e81333c)
(cherry picked from commit 92fd5bb)
Co-authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>

What changes were proposed in this pull request?
Spark's tungsten memory model usually tries to allocate memory one `page` at a time, allocated as `long[pageSizeBytes/8]` in `HeapMemoryAllocator.allocate`. Remember that a Java long array needs an extra object header (usually 16 bytes on a 64-bit system), so the bytes actually allocated are `pageSize + 16`.

Assume that `G1HeapRegionSize` is 4M and `pageSizeBytes` is 4M as well. Since every allocation needs 4M + 16 bytes of memory, two regions are used, with one region occupied by only 16 bytes. That means about **50%** memory waste. It can happen under different combinations of G1HeapRegionSize (varying from 1M to 32M) and pageSizeBytes (varying from 1M to 64M).

We can demo it using the following piece of code.
```java
public static void bufferSizeTest(boolean optimize) {
  long totalAllocatedSize = 0L;
  int blockSize = 1024 * 1024 * 4; // 4m
  if (optimize) {
    blockSize -= 16;
  }
  List<long[]> buffers = new ArrayList<>();
  while (true) {
    long[] arr = new long[blockSize/8];
    buffers.add(arr);
    totalAllocatedSize += blockSize;
    System.out.println("Total allocated size: " + totalAllocatedSize);
  }
}
```
Run it using the following JVM params:
```
java -Xmx100m -XX:+UseG1GC -XX:G1HeapRegionSize=4m -XX:-UseGCOverheadLimit -verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xss4m -XX:+ExitOnOutOfMemoryError -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4
```
With optimized = false:
```
Total allocated size: 46137344
[GC pause (G1 Humongous Allocation) (young) 44M->44M(100M), 0.0007091 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark)-- 48M->48M(100M), 0.0021528 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000021 secs]
[GC concurrent-mark-start]
[GC pause (G1 Evacuation Pause) (young) 48M->48M(100M), 0.0011289 secs]
[Full GC (Allocation Failure) 48M->48M(100M), 0.0017284 secs]
[Full GC (Allocation Failure) 48M->48M(100M), 0.0013437 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
```
With optimized = true:
```
Total allocated size: 96468624
[GC pause (G1 Humongous Allocation) (young)-- 92M->92M(100M), 0.0024416 secs]
[Full GC (Allocation Failure) 92M->92M(100M), 0.0019883 secs]
[GC pause (G1 Evacuation Pause) (young) (initial-mark) 96M->96M(100M), 0.0004282 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0000040 secs]
[GC concurrent-mark-start]
[GC pause (G1 Evacuation Pause) (young) 96M->96M(100M), 0.0003269 secs]
[Full GC (Allocation Failure) 96M->96M(100M), 0.0012409 secs]
[Full GC (Allocation Failure) 96M->96M(100M), 0.0012607 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
```
This PR tries to optimize the pageSize to avoid memory waste.

This case exists not only in `MemoryManager`, but also in other places such as `TorrentBroadcast.blockSize`. I would like to submit a follow-up PR if this modification is reasonable.

Why are the changes needed?
To avoid memory waste in G1 GC.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
Existing UT.