Fix stack overflow for large projects #1484

Skipants · 2023-02-27T22:26:41Z

Description

When a large project has many missing objects in the first pass of parsing Ruby files, we run into stack overflows. This happens because, whenever a parsed Ruby file references a missing object, we recursively call #parse_remaining_files. With enough missing objects the call stack grows until it overflows.

We fix this by keeping a list of files that we want to retry and re-parsing them in a later pass. Once a pass can no longer resolve any more files, we break out of the loop.
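For readers who want the shape of the change without reading the diff, here is a minimal sketch of the pass-based retry idea. It is not YARD's code: the FILES table, the parse_in_passes method, and the "a file needs one constant" model are all hypothetical stand-ins for real parsing. Only the name files_to_retry echoes the PR.

```ruby
# Hypothetical stand-in for parsing: each "file" defines one constant and
# may need another constant that must already be defined.
FILES = {
  "c.rb" => { defines: "C", needs: "B" },
  "b.rb" => { defines: "B", needs: "A" },
  "a.rb" => { defines: "A", needs: nil },
  "d.rb" => { defines: "D", needs: "Missing" } # can never be resolved
}.freeze

# Returns [resolved_constants, unresolved_files].
# Uses a loop over passes instead of recursing once per missing object,
# so the call stack stays flat no matter how many files need a retry.
def parse_in_passes(files)
  resolved = []
  pending = files.keys

  loop do
    files_to_retry = []

    pending.each do |name|
      info = files.fetch(name)
      if info[:needs].nil? || resolved.include?(info[:needs])
        resolved << info[:defines]
      else
        files_to_retry << name # defer to the next pass
      end
    end

    # Stop when nothing is left, or when a whole pass made no progress.
    done = files_to_retry.empty? || files_to_retry.size == pending.size
    break [resolved, files_to_retry] if done

    pending = files_to_retry
  end
end

resolved, unresolved = parse_in_passes(FILES)
```

The "no progress" check is what guarantees termination: a file whose dependency never appears is reported as unresolved rather than retried forever.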

Fixes #1375

Completed Tasks

I have read the Contributing Guide.
The pull request is complete (implemented / written).
Git commits have been cleaned up (squash WIP / revert commits).
I wrote tests and ran bundle exec rake locally (if code is attached to PR).

It's difficult to write a test for this because the failure only shows up once a project has a very large number of files.

I tested this by running yard doc on my company's internal monolith, where it now completes successfully, whereas the current release of yard raises SystemStackError.

Skipants · 2023-02-27T22:33:44Z

lib/yard/handlers/c/base.rb

-            parser.parse_remaining_files
-            retries += 1
+          if globals.ordered_parser
+            retryable_file = parser.file == "(stdin)" ? StringIO.new("void Init_Foo() { #{statement.source} }") : parser.file


Forgive my sin here. I wrote it this way because some tests fail when we only use parser.file, as is done in lib/yard/handlers/base.rb.

An example test that fails is yard/spec/handlers/c/class_handler_spec.rb, line 70 in e91d41c:

it "resolves namespace variable names across multiple files" do

Here's what happens, and the reasoning behind this weird hack:

Foo::Bar is not resolved in the first string of that test, so it is retried instead.

Because this comes from stdin and not a file, parser.file is (stdin).

If we push the string (stdin) onto globals.ordered_parser.files_to_retry, OrderedParser#parse is not able to parse it. It should instead be a StringIO with the C code contents.

The C code contents from statement.source are missing the wrapper void Init_Foo() { ... }, which needs to be added manually.

An alternative was to save the contents in globals, but that felt hacky as well, for a couple of reasons:

I was afraid that pushing the contents of files onto globals could result in runaway memory usage (though now that I think about it, each new file probably clobbers the last one).

This is only useful in the niche case of C code from STDIN, so having a global for that felt like overkill and potentially confusing code.

lsegal · 2023-05-11T17:24:23Z

This PR is definitely not mergeable with a hack like this, especially one that artificially causes a failing test to pass targeted specifically for that test.

Notably, (stdin) parsing is entirely common and not specific to C. What you're suggesting here is this PR does not support this use case. That would be a problem and highlights a possible breaking change.

Skipants · 2023-05-24

Excellent feedback, thank you. I'll let this one simmer a bit and see if I can come up with a less-hacky patch.

lsegal · 2023-05-11

Sorry to get around to this so late. To be honest, the use of the specific workaround to C parsing is definitely concerning and makes me wonder if this will cause a breaking change, and thus I have not looked too deeply at merging.

Unrolling the recursive loop might be useful here, but it would have to be done in a way that respects the existing order of operations and API capabilities. Supporting StringIOs is definitely one of those API capabilities.

I think this PR would need another pass to make it compatible.
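To make the StringIO concern concrete, here is a rough sketch of a retry queue whose entries may be either file paths or IO-like objects. Everything in it is an assumption for illustration: read_retry_entry is a hypothetical helper, not part of YARD, and the "(stdin)" label simply mirrors the value of parser.file mentioned in the thread.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: turn a retry-queue entry into [label, source].
# IO-like entries (e.g. StringIO) are rewound first, because a stream that
# was read in an earlier pass sits at EOF and would otherwise read as "".
def read_retry_entry(entry)
  if entry.respond_to?(:read)
    entry.rewind if entry.respond_to?(:rewind)
    ["(stdin)", entry.read]
  else
    [entry, File.read(entry)]
  end
end

stdin_like = StringIO.new("void Init_Foo() {}")
stdin_like.read # simulate the first pass consuming the stream

sources = nil
Tempfile.create(["example", ".c"]) do |file|
  file.write("void Init_Bar() {}")
  file.flush

  # The queue mixes both kinds of entry, in their original order.
  files_to_retry = [stdin_like, file.path]
  sources = files_to_retry.map { |entry| read_retry_entry(entry) }
end
```

Treating both kinds of entry uniformly at the point of re-parsing would avoid special-casing C input, though whether that fits the existing parser API is exactly the open question raised above.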



dduugg mentioned this pull request Mar 3, 2023: Merge sigs in RBI files with existing documentation dduugg/yard-sorbet#170 (Merged)


Skipants force-pushed the fix-stack-overflow branch from ec5be9e to 8a14e13 · July 26, 2023 17:27
