Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract out usages of Ruby IDs #2175

Merged
merged 3 commits into from
Jan 9, 2025
Merged

Conversation

amomchilov
Copy link
Contributor

@amomchilov amomchilov commented Dec 18, 2024

This PR extracts out an rbs_constant_pool interface, which mimics the pm_constant_pool from Prism.

I use it to wrap all usages of ID as location child keys, instead of directly depending Ruby APIs like rb_intern(), ID2SYM, SYM2ID, etc. This includes the keys used to identify the children in a rb_loc, and the keys used in the parser's typevar tables.

For now, this is just a dummy implementation which delegates to those same Ruby APIs, but will eventually be replaced with Prism's real logic, which doesn't depend on ruby.h at all.

For easier reviewing, have a look at each commit separately.

@amomchilov amomchilov changed the title Extract out usages of Ruby Symbols and IDs Extract out usages of Ruby IDs Dec 18, 2024
@amomchilov amomchilov mentioned this pull request Dec 18, 2024
7 tasks
@amomchilov amomchilov force-pushed the ruby-based-constant-pool branch 2 times, most recently from 4ee0431 to fa6470f Compare December 18, 2024 05:16
@@ -221,9 +239,9 @@ static VALUE rbs_new_location_from_loc_range(VALUE buffer, rbs_loc_range rg) {
static VALUE location_aref(VALUE self, VALUE name) {
rbs_loc *loc = rbs_check_location(self);

ID id = SYM2ID(name);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this used to leak memory, since SYM2ID would allocate a static symbol.

@@ -1023,8 +1031,13 @@ static VALUE parse_simple(parserstate *state) {
return parse_symbol(state);
}
case tUIDENT: {
ID name = INTERN_TOKEN(state, state->current_token);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This INTERN_TOKEN macro was calling rb_intern3(), which was slowly leaking memory by creating static ("immortal") symbols.

The replacement rbs_constant_pool_find() function uses rb_check_id_cstr() instead, which checks for a match without intern a new symbol if there wasn't one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if you plan to introduce another helper function instead of writing the token manipulation code everywhere in the codebase? (maybe later.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, there's a lot of repetition in how the current token is handled. I'll add a helper eventually, it's up in our WIP branch :)


/** The overall constant pool, which stores constants found while parsing. */
typedef struct {
void *dummy; // Workaround for structs not being allowed to be empty.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, this will be replaced with the real storage that will store the constant pool, instead of the one built into the Ruby VM.

@amomchilov amomchilov force-pushed the ruby-based-constant-pool branch 2 times, most recently from 0b4f69e to 3ae33c6 Compare December 18, 2024 06:12
@amomchilov amomchilov marked this pull request as ready for review December 18, 2024 06:16
@amomchilov amomchilov force-pushed the ruby-based-constant-pool branch 2 times, most recently from 7fdab83 to 6464270 Compare January 6, 2025 18:31
Comment on lines +14 to +22
_Static_assert(
__builtin_types_compatible_p(ID, rbs_constant_id_t),
"rbs_constant_id_t must be the same as a Ruby ID for now."
);

_Static_assert(
__builtin_types_compatible_p(VALUE, rbs_constant_t),
"rbs_constant_t must be the same as a Ruby Symbol for now."
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions confirm that our temporary definitions of rbs_constant_id_t and rbs_constant_t are memory-compatible with ID and VALUE (Symbol), so that we can safely cast back/forth between them. They'll go away once I add in the pure C implementation of the constant pool.

@amomchilov amomchilov force-pushed the ruby-based-constant-pool branch from 6464270 to 4153580 Compare January 6, 2025 18:36
@soutaro
Copy link
Member

soutaro commented Jan 8, 2025

@amomchilov Thanks!

Let me confirm that: all other usage of Ruby symbols will finally be replaced to rbs_constant_pool to make the implementation independent to MRI C API, right?

(I'm a bit confused that some code continue using INTERN_TOKEN and other Ruby APIs instead of the constant pool.)

rbs__init_constants();
rbs__init_location();
rbs__init_parser();

rbs_constant_pool_init(RBS_GLOBAL_CONSTANT_POOL, 0);
ruby_vm_at_exit(Deinit_rbs_extension);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very sure if we need the deinit function. How can a Ruby vm exit without process termination? (Maybe there’s a reason I’m not aware of…)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Embedding CRuby?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added this to minimize the amount of leaked memory at the time the process terminates, so I could more easily find other leaks with tooling like valgrind.

We can remove it and just rely on process termination to clean up everything, but that has the downside of adding noise to any memory-related tooling. Perhaps we could add a DEBUG flag, and only do the cleanup when it's true?

How can a Ruby vm exit without process termination?

Not that I know of, but it seemed like the exact API for this (ther's also atexit())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could more easily find other leaks with tooling like valgrind.

It makes sense! Thanks for the explanation.

@amomchilov
Copy link
Contributor Author

amomchilov commented Jan 8, 2025

@soutaro

Let me confirm that: all other usage of Ruby symbols will finally be replaced to rbs_constant_pool to make the implementation independent to MRI C API, right?

Eventually... yes! But that requires we introduce the C data structures (to replace the Ruby objects) and translation layer (to translate to Ruby objects, for the Ruby API), which aren't ready to ship yet.

This PR chips away by migrating all usages related to location child names and typevars, and leaves the rest as-is for now.

Copy link
Member

@soutaro soutaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 🎉

@soutaro soutaro added this pull request to the merge queue Jan 9, 2025
Merged via the queue into ruby:master with commit 991bacb Jan 9, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants