-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the number of memory allocations in def-use #4904
Conversation
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
locations are immutable otherwise Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
…field Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Overall, the pass manager is pretty hostile to GC as it forces roots to be held for very long time. Here is why:
|
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
@@ -0,0 +1,273 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can push #4713 forward if flat_map is required for this PR. What about other, currently available implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I used flat_map
with inlined storage for WithFieldsLocation
(https://github.com/p4lang/p4c/pull/4904/files#diff-f9cda7b4f8eff14d76f4290655d1678d9a25e18f55c1b24cded8239cab74eea6R124) Here we were able to have space for 4 map entries inline (w/o additional memory allocation) while keeping sizeof(WithFieldsLocation) == 112)
. WithFieldsLocation
are used essentially for structs and usually structs do not contain lots of entries (headers might contain much more, true, but still they are tens not thousands). I do not see any performance implication from this change, but we reduced memory allocations here for quite some amount.
I am not aware about viable alternatives currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall
friend class StorageFactory; | ||
WithFieldsLocation(const IR::Type *type, cstring name) : StorageLocation(type, name) {} | ||
flat_map<cstring, const StorageLocation *, std::less<>, | ||
absl::InlinedVector<std::pair<cstring, const StorageLocation *>, 4>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very surprised that a flat_map is faster than an hvec_map here -- for that to be true, almost all the structs/headers in the program have no more than 2-3 fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not about "fast". It is about memory consumption. Here we do not allocate memory at all for any structs with 4 or less fields. And we resort to memory allocation for anything more. But yes, normally small flat maps have comparable speed to hash maps and, as far as I can see, most structs are small. There are certainly large ones but they are much rare. And we normally do not have struct larger than, say, 16 fields.
As stated in #4872 we are seeing high peak memory usage during def-use run. def-use creates lots of temporary objects and expects GC to clean them. However, it seems this does not happen reliably and therefore the peak memory usage could be very high. Especially given that def-use internal state objects (
AllDefinitions
) could be quite large. This PR tries to improve this situation in many aspects:As a result:
AllDefinitions
being cleared)gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512
testcase:gtestp4c-gc-main --gtest_filter=P4CParserUnroll.switch_20160512
gtestp4c-gc --gtest_filter=P4CParserUnroll.switch_20160512
gtestp4c-nogc-main --gtest_filter=P4CParserUnroll.switch_20160512
gtestp4c-nogc --gtest_filter=P4CParserUnroll.switch_20160512
But notice that GC takes 1.5 seconds out of overall 4.5 seconds runtime (!)
Before:
After:
Notice that # of allocations reduced from 27.6M down to 25.7M, the peak memory consumption also reduced from 3.22 GB down to 3.15 Gb