Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ripper compatibility layer #2354

Closed
kddnewton opened this issue Feb 2, 2024 · 4 comments
Closed

Ripper compatibility layer #2354

kddnewton opened this issue Feb 2, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@kddnewton
Copy link
Collaborator

We want to replace Ripper with prism within the community as the default choice when a developer needs access to a parser. It's possible that given enough time we could replace all of the public uses ourselves, but we won't be able to migrate private repositories/applications that are using it. Even though Ripper has been marked as "early alpha" since it was first introduced, it has been around a long enough time that people have been relying on it for years.

We want to provide a compatibility layer such that people can continue to use Ripper for as long as they need to, but have the parsing be backed by prism. This means replacing all of Ripper's surface area, which is no small task. That surface area can loosely be broken up into two different groups: Ripper core and the Ripper API.

Ripper core is a SAX-style Ruby parser. This means it calls out to Ruby methods whenever a rule is reduced and uses the results of calling those methods as arguments to more methods higher in the tree. For example:

class Foo < Ripper
  def on_binary(left, op, right) = [:binary, left, op, right]
  def on_int(value) = [:@int, value]
  def on_stmts_new = []
  def on_stmts_add(stmts, stmt) = stmts << stmt
  def on_program(stmts) = stmts
end

Using Ripper.new("1 + 2").parse you will get back [[:binary, [:@int, "1"], :+, [:@int, "2"]]]. The results of our on_int method get passed to on_binary, which get passed to on_stmts_add, which get passed up to on_program.

We're not going to put hooks into prism's parsing for every rule reduction. We would not be able to maintain it, and it would clutter the entire codebase. Instead, we're going to replicate Ripper by parsing the entire tree with prism and then walking the tree in the same order as Ripper to generate the same output. Note that we're going to be calling out to methods that are user-defined, not providing the implementation of those methods ourselves. The order of the methods that are called therefore very much matters. You can see the beginnings of this approach here: https://github.com/ruby/prism/blob/main/lib/prism/ripper_compat.rb.

Unfortunately, because Ripper exposes so much of the internals of the Ruby parser, you can also access various state on the parser. This includes the current line number, column, and parser state. We're not going to be able to replicate the parser state very accurately, so I don't think we should try. The lines and columns we can though. This will involve changing the line and column numbers to be in a specific state before calling out to the user-defined hooks.

The second half of this work is the ripper API. This is Ripper::SexpBuilder, Ripper::SexpBuilderPP, and Ripper::Lexer. Because all of these are just Ruby libraries, they can come after Ripper core and we can mostly copy-paste. We'll see if there are other blockers later, but it looks like it will be manageable.

Success/deliverables for this ticket can come in multiple stages. In loose order, they would be:

  • Parsing various codebases (start with prism itself, then ruby/ruby, etc.) with Prism::RipperCompat::SexpBuilder and Ripper::SexpBuilder and comparing the outputs.
  • Ensuring all of the fixtures in prism pass with the same comparison.
  • Running ruby-syntax-tree/syntax_tree's test suite against Prism::RipperCompat until it passes (this is the biggest project I know of based on ripper).
@kddnewton kddnewton added the enhancement New feature or request label Feb 2, 2024
@eregon
Copy link
Member

eregon commented Feb 2, 2024

This would be awesome!
Also for TruffleRuby which currently reuses the Ripper C code, which is slow and uses too much internals (including two copies of parse.y, it's not a proper C extension but basically CRuby core C code behind a require).

@noahgibbs
Copy link
Contributor

Mentioned so we don't forget: in #2439, I broke up Translation::Ripper into two files, one inheriting from the other. The encapsulation is pretty messy -- it's mostly the same as it was before in terms of everything seeing everything. We can't break it up in quite the way we'd like (ref: #2439 (comment)) by doing it this way. The hard part is the on_* methods, which are used from the compiler to emit Ripper output, but we want to redefine them to change where that output goes (e.g. SexpBuilder).

I think we could fix this by having a RipperEmitter (or some other name) that turns on_* methods into Ripper output. Then the RipperCompiler would just know how to turn Prism nodes into Ripper output names, but we could separate out things like combining lists of raw ripper (SexpBuilderPP) or just emitting as arrays (SexpBuilder) as an emitter. I think this would yield a cleaner design. In any case, we'd like a cleaner design.

@noahgibbs
Copy link
Contributor

Right now we primarily test Translations::Ripper output against Ripper's own output using sexp_raw. That's not a bad start. Tom Enebo of JRuby appreciates that we've found at least one incompatibility in their Ripper implementation. Benoit Daloze would rather just drop Translations::Ripper into TruffleRuby to replace their Ripper implementation as soon as reasonable, and so isn't as bothered about fixing minor Ripper incompatibilities. Fair.

We still want Prism and Translations::Ripper to run well on both JRuby and TruffleRuby. But especially for Truffle it makes sense to collect Ripper fixtures from at least Ruby 3.3 and 3.4. This would allow us to test against reference Ripper on TruffleRuby, and against specific CRuby-version Ripper on different CRuby versions. That may also let us enable tests on more TruffleRuby and/or JRuby versions since we can test against known Ripper output.

@noahgibbs noahgibbs removed their assignment Feb 27, 2024
@kddnewton
Copy link
Collaborator Author

This is now on main. There are a couple of incompatibilities that I'll work through, but it should work for the vast majority of applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants