erts: Refactor bitstring (binary) handling #7828

Merged: 3 commits, Nov 23, 2023

@jhogberg (Contributor) commented Nov 6, 2023

By reducing the difference between match states and sub-binaries, this PR sets the stage for massive improvements in the bit syntax implementation, where we plan to allow returning matched tails from functions without any loss of performance relative to continuation-passing-style.

This PR also simplifies the handling of off-heap Binary objects. ProcBin (now called BinRef) is no longer exposed directly as a term, with off-heap bitstrings instead being represented by an ErlSubBits that references the BinRef. While this results in slightly more on-heap usage, it reduces complexity and makes it easy to determine which regions in a binary a process refers to during a GC, giving us the opportunity to shed references or shrink them to fit.
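A rough C sketch of the representation described above may help. All names here (`OffHeapBuffer`, `BinRefSketch`, `SubBitsSketch`, `referenced_bytes`) are hypothetical stand-ins, not the actual ERTS definitions of `Binary`, `BinRef`, or `ErlSubBits`. The sketch only assumes that a sub-bitstring records a bit offset and a bit size into the buffer its reference points at, which is enough for a GC to work out which bytes a process actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted off-heap buffer. */
typedef struct {
    size_t refc;                /* number of references to this buffer */
    size_t orig_size;           /* size of the buffer in bytes */
    const unsigned char *bytes; /* the binary data itself */
} OffHeapBuffer;

/* Hypothetical stand-in for a BinRef: an on-heap handle to the buffer. */
typedef struct {
    OffHeapBuffer *val;
} BinRefSketch;

/* Hypothetical stand-in for an ErlSubBits: a window of `size` bits
 * starting `offset` bits into the buffer that `ref` points at. */
typedef struct {
    const BinRefSketch *ref;
    size_t offset; /* start, in bits */
    size_t size;   /* length, in bits */
} SubBitsSketch;

/* Compute the half-open byte range [*first, *end) that `sb` touches.
 * A GC could combine these ranges over all live sub-bitstrings sharing
 * one buffer to see how much of that buffer is still reachable. */
static void referenced_bytes(const SubBitsSketch *sb,
                             size_t *first, size_t *end)
{
    *first = sb->offset / 8;
    *end = (sb->offset + sb->size + 7) / 8; /* round up to a whole byte */
    assert(*end <= sb->ref->val->orig_size);
}
```

For example, a sub-bitstring starting at bit 12 with a size of 20 bits touches only bytes 1 through 3 of the buffer, so a GC following this scheme would know the rest of the buffer is not needed by that reference.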

Needless to say, this is a massive diff; the bulk of it is in a single commit that proved very difficult to break into smaller pieces that still made sense. External review would be much appreciated. :-)

@jhogberg added the team:VM, testing, and review_help_wanted labels Nov 6, 2023
@jhogberg self-assigned this Nov 6, 2023
github-actions bot commented Nov 6, 2023

CT Test Results

4 files, 143 suites, 47m 49s ⏱️
1 660 tests: 1 600 ✔️ passed, 60 💤 skipped, 0 ❌ failed
2 449 runs: 2 243 ✔️ passed, 206 💤 skipped, 0 ❌ failed

Results for commit e376711.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

// Erlang/OTP Github Action Bot

@jhogberg force-pushed the john/erts/bitsized-binaries branch from d821128 to 6f923c1 on November 6, 2023 12:35
@jhogberg removed the testing label Nov 7, 2023
@jhogberg force-pushed the john/erts/bitsized-binaries branch from 17632e9 to 6478b00 on November 7, 2023 10:05
@jhogberg added the testing label Nov 7, 2023
@jhogberg force-pushed the john/erts/bitsized-binaries branch from 6478b00 to 55affda on November 7, 2023 11:13
@garazdawi force-pushed the john/erts/bitsized-binaries branch 3 times, most recently from 5a8657f to edcd6d3 on November 9, 2023 08:49
@jhogberg force-pushed the john/erts/bitsized-binaries branch from edcd6d3 to 9b0b1ee on November 10, 2023 10:52
@jhogberg removed the testing label Nov 13, 2023
@jhogberg force-pushed the john/erts/bitsized-binaries branch from 9e3f6d5 to 01f1d0f on November 13, 2023 17:46
erts/emulator/beam/bif.c (outdated, resolved)
erts/emulator/beam/bif.c (outdated, resolved)
@bjorng (Contributor) left a comment

Overall looks very good to me. I will continue to look at the code little by little every day. I will not post any more nitpicky comments; instead I will push corrections in fixup commits.

erts/emulator/beam/erl_gc.h (resolved)
erts/emulator/beam/erl_nif.c (outdated, resolved)
erts/emulator/beam/erl_gc.c (outdated, resolved)
@garazdawi (Contributor) left a comment

Very nice change that makes handling of binaries a lot simpler. I've looked through most of the code and couldn't find any obvious mistakes/design issues.

Is it only the ErlBinMatchBuffer you plan to change to ErlSubBits? The ErlBinMatchState will be left as is?

@jhogberg (Contributor Author)

> Is it only the ErlBinMatchBuffer you plan to change to ErlSubBits? The ErlBinMatchState will be left as is?

No, ErlBinMatchState will be replaced by ErlSubBits (which will change shape altogether in that PR). From the point of view of the compiler and the runtime system, a match state will effectively just be a sub-binary that happens to be mutable, instead of the strange beast it is now.
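To make "a sub-binary that happens to be mutable" concrete, here is a minimal C sketch under assumed names (`SubBitsSketch`, `match_uint`); it is not how ERTS implements matching. The same pointer/offset/size triple is an ordinary sub-binary when left alone and acts as a match state when a match updates it in place, so the unmatched tail is always just the current value of the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of a sub-bitstring over a byte buffer. */
typedef struct {
    const unsigned char *base;
    size_t offset; /* start position, in bits */
    size_t size;   /* remaining length, in bits */
} SubBitsSketch;

/* Match an unsigned big-endian integer of `bits` bits (1..64) from the
 * front of `sb` and advance `sb` past it. Returns 0 on failure, in which
 * case `sb` is left untouched. */
static int match_uint(SubBitsSketch *sb, unsigned bits, uint64_t *out)
{
    uint64_t acc = 0;
    assert(bits >= 1 && bits <= 64);
    if (bits > sb->size)
        return 0;
    for (unsigned i = 0; i < bits; i++) {
        size_t pos = sb->offset + i;
        unsigned bit = (sb->base[pos / 8] >> (7 - pos % 8)) & 1u;
        acc = (acc << 1) | bit;
    }
    /* Mutating offset/size in place is what turns the sub-binary
     * into a match state. */
    sb->offset += bits;
    sb->size -= bits;
    *out = acc;
    return 1;
}
```

Matching 4 bits and then 8 bits from the bytes `0xAB 0xCD` yields `0xA` and then `0xBC`, leaving a 4-bit tail that is itself a valid sub-bitstring.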

@jhogberg force-pushed the john/erts/bitsized-binaries branch from 31587be to 166dc67 on November 14, 2023 14:57
@garazdawi (Contributor)

> No, ErlBinMatchState will be replaced by ErlSubBits (which will change shape altogether in that PR). From the compiler and runtime system's point of view a match state will effectively just be a sub-binary that happens to be mutable, instead of the strange beast it is now.

The reason I asked was this comment above ErlBinMatchBuffer:

/** @brief This structure represents a binary to be matched, we plan to replace
 * this with ErlSubBits in the near future. */
typedef struct erl_bin_match_buffer {

Anyway, won't you need to store the save_offsets somewhere? Or will those be placed in register/stack slots?

@jhogberg (Contributor Author)

> Anyway, won't you need to store the save_offsets somewhere? Or will those be placed in register/stack slots?

Yes, offsets have been stored as regular integer terms since OTP 22. There's a slight optimization for 32-bit that still uses a single slot, but it's hardly worth the bother, so I'm going to remove it.
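The sketch below, using hypothetical names (`MatchPosSketch`, `save_offset`, `restore_offset`, `skip_bits`), illustrates why no dedicated save slots are needed once positions are plain integers: saving a position copies the bit offset, and restoring it rewinds the offset and grows the remaining size by the same amount. In the VM the saved value would be an integer term in a register or stack slot; here it is simply assumed to be a `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal match position over a bitstring. */
typedef struct {
    const unsigned char *base;
    size_t offset; /* current position, in bits */
    size_t size;   /* remaining length, in bits */
} MatchPosSketch;

/* Saving a position is just copying the offset into an integer. */
static size_t save_offset(const MatchPosSketch *ms)
{
    return ms->offset;
}

/* Restoring rewinds to an earlier saved offset. */
static void restore_offset(MatchPosSketch *ms, size_t saved)
{
    assert(saved <= ms->offset);
    ms->size += ms->offset - saved;
    ms->offset = saved;
}

/* Skip `bits` bits; returns 0 (and does nothing) if too few remain. */
static int skip_bits(MatchPosSketch *ms, size_t bits)
{
    if (bits > ms->size)
        return 0;
    ms->offset += bits;
    ms->size -= bits;
    return 1;
}
```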

@jhogberg force-pushed the john/erts/bitsized-binaries branch from 166dc67 to 7b714b8 on November 14, 2023 15:42
@jhogberg added the testing label Nov 15, 2023
@jhogberg force-pushed the john/erts/bitsized-binaries branch from af8e9ed to 6c455fe on November 16, 2023 10:40
@garazdawi force-pushed the john/erts/bitsized-binaries branch 2 times, most recently from 2890f9e to 38c203e on November 17, 2023 12:37
@jhogberg force-pushed the john/erts/bitsized-binaries branch 5 times, most recently from e376711 to e37d0f2 on November 22, 2023 11:35
This assertion crashes during the property test suites, which
expect an exception to be raised.

Debugging differences between the calculated and actual size is no
fun at all when certain kinds of data is overestimated, as a simple
`ASSERT(after_encode <= &before_encode[size])` check will often
succeed when it shouldn't (e.g. if part of one object is
overestimated while another part is underestimated).
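The following hypothetical two-pass encoder illustrates the difference between the two assertions; `calc_size`, `encode`, `weak_check`, and `strict_check` are made-up names, not functions from erts. A `<=` check passes whenever the calculated size is at least the actual size, so overestimates go unnoticed, whereas an exact comparison flags any net difference. Errors that cancel out exactly would still slip past both checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical format: each item is a 1-byte length prefix followed by
 * the string's bytes (so strings are limited to 255 bytes). */
static size_t calc_size(const char *const *items, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += 1 + strlen(items[i]); /* prefix + payload */
    return total;
}

static unsigned char *encode(unsigned char *ptr,
                             const char *const *items, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(items[i]);
        assert(len <= 255);
        *ptr++ = (unsigned char)len;
        memcpy(ptr, items[i], len);
        ptr += len;
    }
    return ptr;
}

/* Only catches a calculated size that is too small overall. */
static int weak_check(const unsigned char *before,
                      const unsigned char *after, size_t size)
{
    return after <= before + size;
}

/* Catches any net mismatch, including overestimates. */
static int strict_check(const unsigned char *before,
                        const unsigned char *after, size_t size)
{
    return after == before + size;
}
```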

This commit fixes the differences that I've noticed, and adds an
assertion that the encoded size should be exactly equal to the
calculated size so that no new differences can fly under the radar
from now on.

By reducing the difference between match states and sub-binaries,
this commit sets the stage for massive improvements in the bit
syntax implementation, where we plan to allow returning matched
tails from functions without any loss of performance relative to
continuation-passing-style.

This commit also simplifies the handling of off-heap Binary
objects. ProcBin (now called BinRef) is no longer exposed
directly as a term, with off-heap bitstrings instead being
represented by an ErlSubBits that references the BinRef. While
this results in slightly more on-heap usage, it reduces complexity
and makes it easy to determine which regions in a binary a process
refers to during a GC, giving us the opportunity to shed references
or shrink them to fit.
@jhogberg force-pushed the john/erts/bitsized-binaries branch from e37d0f2 to 24ef4cb on November 23, 2023 10:43
@jhogberg merged commit 5d7d146 into erlang:master on Nov 23, 2023
3 checks passed
Labels: review_help_wanted (Help with review wanted), team:VM (Assigned to OTP team VM), testing (currently being tested, tag is used by OTP internal CI)
4 participants