-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
erts: Refactor bitstring (binary) handling #7828
Conversation
CT Test Results 4 files 143 suites 47m 49s ⏱️ Results for commit e376711. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
d821128
to
6f923c1
Compare
17632e9
to
6478b00
Compare
6478b00
to
55affda
Compare
5a8657f
to
edcd6d3
Compare
edcd6d3
to
9b0b1ee
Compare
9e3f6d5
to
01f1d0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks very good to me. I will continue to look at the code little by little every day. I will not post any more nitpicky comments; instead I will push corrections in fixup commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice change that makes handling of binaries a lot simpler. I've looked through most of the code and couldn't find any obvious mistakes/design issues.
Is it only the ErlBinMatchBuffer
you plan to change to ErlSubBits
? The ErlBinMatchState
will be left as is?
No, |
31587be
to
166dc67
Compare
The reason I asked was because this comment above /** @brief This structure represents a binary to be matched, we plan to replace
* this with ErlSubBits in the near future. */
typedef struct erl_bin_match_buffer { Anyway, wont you need to store the |
Yes, offsets are stored as regular integer terms since OTP 22. There's a slight optimization for 32-bit that still uses a single slot, but it's hardly worth the bother so I'm going to remove it. |
166dc67
to
7b714b8
Compare
af8e9ed
to
6c455fe
Compare
2890f9e
to
38c203e
Compare
e376711
to
e37d0f2
Compare
This assertion crashes during the property test suites, which expect an exception to be raised.
Debugging differences between the calculated and actual size is no fun at all when certain kinds of data is overestimated, as a simple `ASSERT(after_encode <= &before_encode[size])` check will often succeed when it shouldn't (e.g. if part of one object is overestimated while another part is underestimated). This commit fixes the differences that I've noticed, and adds an assertion that the encoded size should be exactly equal to the calculated size so that no new differences can fly under the radar from now on.
By reducing the difference between match states and sub-binaries, this commit sets the stage for massive improvements in the bit syntax implementation, where we plan to allow returning matched tails from functions without any loss of performance relative to continuation-passing-style. This commit also simplifies the handling of off-heap Binary objects. ProcBin (now called BinRef) is no longer exposed directly as a term, with off-heap bitstrings instead being represented by an ErlSubBits that references the BinRef. While this results in slightly more on-heap usage, it reduces complexity and makes it easy to determine which regions in a binary a process refers to during a GC, giving us the opportunity to shed references or shrink them to fit.
e37d0f2
to
24ef4cb
Compare
By reducing the difference between match states and sub-binaries, this PR sets the stage for massive improvements in the bit syntax implementation, where we plan to allow returning matched tails from functions without any loss of performance relative to continuation-passing-style.
This PR also simplifies the handling of off-heap
Binary
objects.ProcBin
(now calledBinRef
) is no longer exposed directly as a term, with off-heap bitstrings instead being represented by anErlSubBits
that references theBinRef
. While this results in slightly more on-heap usage, it reduces complexity and makes it easy to determine which regions in a binary a process refers to during a GC, giving us the opportunity to shed references or shrink them to fit.Needless to say this is a massive diff, the bulk of it in a single commit that proved very difficult to break into small components that still made sense. External review would be much appreciated. :-)