Stop generating more methods than necessary #293

paracycle · 2021-04-23T19:26:35Z

Motivation

We used to treat methods that didn't have a source location the same as methods that explicitly were not defined in the current gem. That resulted in Tapioca creating more method definitions than necessary.

We would only skip method generation for a method if the constant it was on was an ignored type (i.e. a built-in type), so that we wouldn't keep redefining methods for built-in types. However, for all other types, especially types that come from other gems, we would just keep on generating all the methods regardless of if they were defined by this gem or not.

Moreover, the source location check was happening at the wrong location, before unwrapping the method signature. Thus, many methods with signatures would not be generated when the previous problem was fixed, since our code would see them as being defined in Sorbet runtime.

Implementation

The fix is to return a more fine-grained result from method_in_gem? which signals yes/no/don't-have-source-location. Based on that we can skip generating don't-have-source-location cases if they are for built-in types and totally ignore the methods that have a source location and are definitely not defined in the current gem.

Additionally, if we try to unwrap the method signature and we get an exception, that means the signature block raised an error. If we continue with the method as is, the source location checks would think the method definition does not belong in the gem (since the method is still wrapped), and we would thus skip the method generation. To avoid that, the signature_for method is now raising a custom exception to signal that exceptional case, so that we can at least continue generating "a" method definition.

Tests

Updated existing tests.

vinistock · 2021-04-23T19:47:56Z

lib/tapioca/compilers/symbol_table/symbol_generator.rb


-          gem.contains_path?(source_location)
+          source_location == "(eval)" || gem.contains_path?(source_location)


When is the source_location equal to (eval)?

Ah, forgot to mention that in the PR description, good catch. That happens when:

class Foo module_eval("def foo; end") end Foo.instance_method(:foo).source_location # => ["(eval)", 1]

Most usages of eval and friends do:

class Foo module_eval("def bar; end", __FILE__, __LINE__) end puts Foo.instance_method(:bar).source_location # => ["foo.rb", 2]

though, so I would expect this to be an edge-case situation.

Still, I think erring on the side of caution for eval defined methods is the right call.

mutecipher · 2024-08-09T16:57:44Z

We used to treat methods that didn't have a source location the same as methods that explicitly were not defined in the current gem. That resulted in Tapioca creating more method definitions than necessary. We would only skip method generation for a method if the constant it was on was an ignored type (i.e. a built-in type), so that we wouldn't keep redefining methods for built-in types. However, for all other types, especially types that come from other gems, we would just keep on generating all the methods regardless of if they were defined by this gem or not. Moreover, the source location check was happening at the wrong location, before unwrapping the method signature. Thus, many methods with signatures would not be generated when the previous problem was fixed, since our code would see them as being defined in Sorbet runtime. The fix is to return a more fine-grained result from `method_in_gem?` which signals yes/no/don't-have-source-location. Based on that we can skip generating don't-have-source-location cases if they are for built-in types and totally ignore the methods that have a source location and are definitely not defined in the current gem. Additionally, if we try to unwrap the method signature and we get an exception, that means the signature block raised an error. If we continue with the method as is, the source location checks would think the method definition does not belong in the gem (since the method is still wrapped), and we would thus skip the method generation. To avoid that, the `signature_for` method is now raising a custom exception to signal that exceptional case, so that we can at least continue generating "a" method definition.

paracycle · 2024-08-09T17:56:48Z

Unfortunately, while the CI is green, this work is still not complete, since the implementation ends up removing too many methods from generated RBI files.

In general there are 2 ways in which "gem B" can add methods to a constant Foo from "gem A":

"gem B" reopens Foo and adds a method to it:
```
# gem A
class Foo
  def foo
  end
end

# gem B
class Foo
  def bar
  end
end
```
This PR correctly attributes Foo#foo to "gem A" and Foo#bar to "gem B", and they are generated in the respective gem's RBI file. So, this use-case works properly today against this PR.
"gem B" does not reopen the Foo constant from "gem A" but defines methods on Foo in respond to some mechanism triggered by the loading of constant Foo. For example:
```
# gem A
class Foo
  extend ModuleFromB

  def foo
  end

  add_methods_to_me :bar
end

# gem B
module ModuleFromB
  def add_methods_to_me(*names)
    names.each do |name|
      define_method(name) { 42 }
    end
  end
end
```
In this case, the method Foo#bar will exist on Foo at runtime, but its source location will point to the file in "gem B". Thus, this PR will filter that method from the RBI file of "gem A". But, the method will not be generated in "gem B"'s RBI file either, since the Foo constant is never visited in "gem B". Thus, Tapioca ends up missing the definition of Foo#bar altogether.

To do this properly, we need to attribute such dynamic method definitions to the file require that triggered them, just like how we did with mixin tracking.

Unlike mixin tracking, though, this is much harder to do, since we can't intercept def calls and analyze the backtrace. It won't be enough to intercept define_method calls (even though it will take us some of the way there), since methods can always be defined via class_eval/module_eval/eval using def.

egiurleo · 2024-08-09T21:00:17Z

I've taken a stab at fixing the problem above. I have a failing test demonstrating Ufuk's use case and some half-working code. Here's my WIP PR, I'll return to it next week.

amomchilov · 2024-08-11T22:01:13Z

Unlike mixin tracking, though, this is much harder to do, since we can't intercept def calls and analyze the backtrace. It won't be enough to intercept define_method calls (even though it will take us some of the way there), since methods can always be defined via class_eval/module_eval/eval using def.

Does hooking method_added solve that? @paracycle

paracycle · 2024-08-11T22:28:25Z

Unlike mixin tracking, though, this is much harder to do, since we can't intercept def calls and analyze the backtrace. It won't be enough to intercept define_method calls (even though it will take us some of the way there), since methods can always be defined via class_eval/module_eval/eval using def.

Does hooking method_added solve that? @paracycle

That is exactly what @egiurleo's PR in the comment above is doing, but in general I am afraid it will be brittle since you need one constant to forget to call super from their method_added implementation and you never get your one called.

…ectly attributed

This commit adds a `MethodDefinition` tracker which properly attributes methods definitions to the gem that triggers them. This is useful for tracking which gem is responsible for a method definition, and is used in the `Tapioca::Gem::Pipeline` to check if a method should be generated for the given gem or not.

By tracking line numbers for method definitions, we can generate more accurate source locations for methods in RBI files. This will help with tools that rely on source locations, such as Ruby LSP.

When we are dealing with C-extension/native methods, we have been relying on the fact that `source_location` method returning `nil` for those methods. When we switched to using the method definition tracker, we ended up breaking this and started generating method definitions for all the stdlib gem constant methods. This commit tries to catch the case the same case by looking at the previous frame and seeing if it is a `require`, which means that something was required but there are no more Ruby frames, which implies that the attribution should be to a C-extension/native method.

Aliased methods seem to have equal `Method` objects associated with them, so we can't use the method object the a key in a hash. Instead, we should store the method definitions in a hash keyed by the owner of the method, and then the method name. This way, we can disambiguate between aliased methods.

By checking the same flag that Sorbet runtime uses to skip method_added handling, we can avoid processing method redefinitions that are triggered by Sorbet runtime.

KaanOzkan · 2024-08-16T22:53:37Z

Gemfile

@@ -7,7 +7,7 @@ gemspec
 CURRENT_RAILS_VERSION = "7.1"
 rails_version = ENV.fetch("RAILS_VERSION", CURRENT_RAILS_VERSION)

-gem "minitest"
+gem "minitest", "< 5.25.0" # minitest 5.25.0+ is incompatible with minitest-hooks


I think this can be removed minitest/minitest@6d83843

Sure, I'll clean up this branch quite a lot, but I needed to make sure the tests weren't failing.

Previously we were doing two things wrong: 1. Looking up the method object from the class that the `{singleton_}method_added` call was happening on and indexing method definitions keyed by that was mostly working. However, it was failing in an interesting way when a method was an alias to another method. In that case, the method object would be the same, but the method definition would be different. For example, `alias_method :foo, :bar` would cause the lookup of `foo` method to return the definition of `bar` method. That would cause us to fail generating `foo` method for a gem, if `bar` was defined in another gem. 2. We were storing a single definition for a method, but that proved to be not enough. For example, if a method was defined in a gem and then redefined in application code, we would only store the definition from the application location, and never be able to attribute the method to the gem it was originally defined in. That lead us to not generate that method for the gem. Instead, we now store all definitions of a method in `MethodDefinition` tracker. When looking up a method definition, we now look up all definitions. If there are no definitions, that means the method is probably a C-method, so we return `nil`. If there are definitions, we see if we can find one that matches the gem location. If not, we check for `(eval)` locations, in which case, we include the method but can't return a source location. If all fails, we return `false` to signal that we couldn't find a definition. If we are able to find a definition that matches the gem location, we return that definition. This new logic is used both in `Tapioca::Gem::Listeners::Methods` and `Tapioca::Gem::Listeners::SourceLocation` listeners. The former uses it to check if the method should be included in the gem RBI, and the latter uses it to add more correct source location information in the comments.

amomchilov

Fantastic work!

amomchilov · 2024-08-19T13:31:49Z

lib/tapioca/gem/pipeline.rb

+      sig { params(method_name: Symbol, owner: Module).returns(T.any([String, Integer], NilClass, T::Boolean)) }
+      def method_definition_in_gem(method_name, owner)


The name is less-than-obvious, but I can't think of a better one. How about we add a doc comment? The return value is particularly tricky to grok

Suggested change

sig { params(method_name: Symbol, owner: Module).returns(T.any([String, Integer], NilClass, T::Boolean)) }

def method_definition_in_gem(method_name, owner)

# @returns - The source file and line number, if this method was defined in this pipeline's gem

- `nil` if the source location is unknown

- `true` if the method was defined via `eval` without a known file/line number

- `false` if this method wasn't defined by this gem

sig { params(method_name: Symbol, owner: Module).returns(T.any([String, Integer], NilClass, T::Boolean)) }

def method_definition_in_gem(method_name, owner)

I wonder if we could use a class to represent the result, maybe accompanied with an enum to state the found/not found?

Or we can go full blown inheritance:

class MethodLocationLookupResult abstract! end # The method doesn't seem to exist class MethodLocationUnknown < MethodLocationLookupResult; end # The method exists but doesn't have a source location class MethodLocationNotSpecified < MethodLocationLookupResult; end # The method exists and has a source location class MethodLocationLookupInGem < MethodLocationLookupResult @location: [String, Integer] end

This would make handling the return result much easier.

amomchilov · 2024-08-19T13:36:25Z

lib/tapioca/runtime/reflection.rb

+      # this looks something like:
+      # "(eval at /path/to/file.rb:123)"
+      # and we are just interested in the "/path/to/file.rb" part
+      EVAL_SOURCE_FILE_PATTERN = T.let(/\(eval at (.+):\d+\)/, Regexp)


Can we anchor the start or end of this with ^ and/or $?

amomchilov · 2024-08-19T13:36:47Z

lib/tapioca/runtime/reflection.rb

+
+        file, line = Object.const_source_location(constant_name)
+
+        # Ruby 3.3 adds automatic definition of source location for evals if


amomchilov · 2024-08-19T13:37:05Z

lib/tapioca/runtime/reflection.rb

+        # `file` and `line` arguments are not provided. This results in the source
+        # file being something like `(eval at /path/to/file.rb:123)`. We try to parse
+        # this string to get the actual source file.
+        file = file&.sub(EVAL_SOURCE_FILE_PATTERN, "\\1")


Suggested change

file = file&.sub(EVAL_SOURCE_FILE_PATTERN, "\\1")

file = file&.sub(EVAL_SOURCE_FILE_PATTERN, '\1')

Morriar · 2024-08-20T14:37:02Z

lib/tapioca/gem/pipeline.rb


        gem.contains_path?(source_file)
      end

-      sig { params(method: UnboundMethod).returns(T::Boolean) }
+      sig { params(method: UnboundMethod).returns(T.nilable(T::Boolean)) }


Let's add a comment about nil meaning no source location

Morriar · 2024-08-20T14:38:22Z

lib/tapioca/runtime/reflection.rb

@@ -129,15 +129,19 @@ def qualified_name_of(constant)
        end
      end

+      SignatureBlockError = Class.new(StandardError)


Should this be a Tapioca::Error?

Morriar · 2024-08-20T14:49:12Z

lib/tapioca/runtime/reflection.rb

+        resolved_loc = locations.find { |loc| loc.absolute_path == resolved_loc.absolute_path }
+        return ["", 0] unless resolved_loc
+
+        [resolved_loc.absolute_path || "", resolved_loc.lineno]


Is there the risk of absolute_path being nil yet lineno is not?

Morriar · 2024-08-20T14:57:07Z

lib/tapioca/gem/listeners/source_location.rb

+          definition = @pipeline.method_definition_in_gem(event.method.name, event.constant)
+
+          case definition
+          when NilClass, FalseClass, TrueClass


Yeah this is unfortunate with the return type of method_definition_in_gem, I think we can do a bit more elegant, see my comment on the method itself.

Morriar · 2024-08-20T15:01:12Z

lib/tapioca/gem/pipeline.rb

+      sig { params(method_name: Symbol, owner: Module).returns(T.any([String, Integer], NilClass, T::Boolean)) }
+      def method_definition_in_gem(method_name, owner)


I wonder if we could use a class to represent the result, maybe accompanied with an enum to state the found/not found?

Or we can go full blown inheritance:

class MethodLocationLookupResult abstract! end # The method doesn't seem to exist class MethodLocationUnknown < MethodLocationLookupResult; end # The method exists but doesn't have a source location class MethodLocationNotSpecified < MethodLocationLookupResult; end # The method exists and has a source location class MethodLocationLookupInGem < MethodLocationLookupResult @location: [String, Integer] end

This would make handling the return result much easier.

Morriar · 2024-08-20T15:06:13Z

lib/tapioca/runtime/trackers/method_definition.rb

@@ -10,14 +10,14 @@ module MethodDefinition

        @method_definitions = T.let(
          {}.compare_by_identity,
-          T::Hash[Module, T::Hash[Symbol, T.nilable([String, Integer])]],
+          T::Hash[Module, T::Hash[Symbol, T::Array[[String, Integer]]]],


We're really using this tuple a lot, I wonder if we should extract it as a FileLocation:

class FileLocation @path: String @line: Integer end

paracycle requested a review from a team April 23, 2021 19:26

vinistock approved these changes Apr 23, 2021

View reviewed changes

Morriar approved these changes Apr 26, 2021

View reviewed changes

mutecipher approved these changes Apr 26, 2021

View reviewed changes

RyanBrushett approved these changes May 12, 2021

View reviewed changes

paracycle marked this pull request as draft May 12, 2021 18:58

KaanOzkan assigned paracycle Sep 20, 2021

Morriar self-assigned this Jan 25, 2022

Morriar added the enhancement New feature or request label Jan 25, 2022

Morriar added this to the Tapioca 1.0 milestone Feb 10, 2022

sambostock mentioned this pull request Jul 8, 2022

Implement Rubocop DSL RBI compiler #1046

Open

1 task

Morriar removed this from the Tapioca 1.0 milestone Jan 18, 2023

This was referenced Mar 18, 2024

Bump the minor-and-patch group with 7 updates #1828

Closed

Bump the minor-and-patch group with 8 updates #1833

Merged

This was referenced Apr 1, 2024

Bump the minor-and-patch group with 24 updates jeffcarbs/tapioca#156

Closed

Bump the minor-and-patch group with 24 updates jeffcarbs/tapioca#157

Closed

dependabot bot mentioned this pull request Apr 15, 2024

Bump the minor-and-patch group with 24 updates jeffcarbs/tapioca#158

Closed

paracycle force-pushed the uk-method-generation-improvements branch from 593bfa7 to da53327 Compare August 9, 2024 16:47

paracycle marked this pull request as ready for review August 9, 2024 16:51

paracycle force-pushed the uk-method-generation-improvements branch from da53327 to 0d40ba9 Compare August 9, 2024 17:07

egiurleo and others added 4 commits August 13, 2024 19:23

Write failing test for case where dynamic method definition is incorr…

d385919

…ectly attributed

Unlock the rbtrace version

ceb831f

Start tracking line numbers for method definitions

f0094a4

By tracking line numbers for method definitions, we can generate more accurate source locations for methods in RBI files. This will help with tools that rely on source locations, such as Ruby LSP.

paracycle added 3 commits August 14, 2024 00:35

Add method definition tracking for singleton methods

0e73b24

paracycle force-pushed the uk-method-generation-improvements branch from dcd91bd to deda787 Compare August 14, 2024 20:38

egiurleo mentioned this pull request Aug 15, 2024

RuboCop RBI appears to be pulling in rubocop-rspec when typechecking #1982

Open

paracycle added 3 commits August 16, 2024 01:39

Do not process method redefinitions triggered by Sorbet runtime

2e771ae

By checking the same flag that Sorbet runtime uses to skip method_added handling, we can avoid processing method redefinitions that are triggered by Sorbet runtime.

Regenerate gem RBI files

c1d05c8

Pin minitest to < 5.25.0

a11f4ce

paracycle force-pushed the uk-method-generation-improvements branch from 6ee7756 to a11f4ce Compare August 15, 2024 22:39

KaanOzkan reviewed Aug 16, 2024

View reviewed changes

paracycle force-pushed the uk-method-generation-improvements branch from 330b100 to 862f422 Compare August 16, 2024 23:03

p-linnane mentioned this pull request Aug 19, 2024

build(deps-dev): bump rubocop from 1.65.0 to 1.65.1 in /Library/Homebrew Homebrew/brew#17928

Closed

amomchilov approved these changes Aug 19, 2024

View reviewed changes

Morriar reviewed Aug 20, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop generating more methods than necessary #293

Stop generating more methods than necessary #293

paracycle commented Apr 23, 2021 •

edited

Loading

vinistock Apr 23, 2021

paracycle Apr 23, 2021 •

edited

Loading

paracycle Apr 23, 2021

mutecipher commented Aug 9, 2024

paracycle commented Aug 9, 2024 •

edited

Loading

egiurleo commented Aug 9, 2024 •

edited

Loading

amomchilov commented Aug 11, 2024

paracycle commented Aug 11, 2024

KaanOzkan Aug 16, 2024

paracycle Aug 16, 2024

amomchilov left a comment

amomchilov Aug 19, 2024

Morriar Aug 20, 2024

amomchilov Aug 19, 2024

amomchilov Aug 19, 2024

amomchilov Aug 19, 2024

Morriar Aug 20, 2024

Morriar Aug 20, 2024

Morriar Aug 20, 2024

Morriar Aug 20, 2024

Morriar Aug 20, 2024

Morriar Aug 20, 2024


		gem.contains_path?(source_location)
		source_location == "(eval)" \|\| gem.contains_path?(source_location)

		sig { params(method_name: Symbol, owner: Module).returns(T.any([String, Integer], NilClass, T::Boolean)) }
		def method_definition_in_gem(method_name, owner)


		file, line = Object.const_source_location(constant_name)

		# Ruby 3.3 adds automatic definition of source location for evals if

	file = file&.sub(EVAL_SOURCE_FILE_PATTERN, "\\1")
	file = file&.sub(EVAL_SOURCE_FILE_PATTERN, '\1')

Stop generating more methods than necessary #293

Are you sure you want to change the base?

Stop generating more methods than necessary #293

Conversation

paracycle commented Apr 23, 2021 • edited Loading

Motivation

Implementation

Tests

Choose a reason for hiding this comment

paracycle Apr 23, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mutecipher commented Aug 9, 2024

paracycle commented Aug 9, 2024 • edited Loading

egiurleo commented Aug 9, 2024 • edited Loading

amomchilov commented Aug 11, 2024

paracycle commented Aug 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amomchilov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paracycle commented Apr 23, 2021 •

edited

Loading

paracycle Apr 23, 2021 •

edited

Loading

paracycle commented Aug 9, 2024 •

edited

Loading

egiurleo commented Aug 9, 2024 •

edited

Loading