Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intern global map_each functions in VectorArgs #23685

Closed
wants to merge 1 commit into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Sep 19, 2024

Intern global Starlark functions in the VectorArg as they can be reused for all rule instances (and possibly even across multiple Args.add_all calls).

@fmeum fmeum requested review from a team, brandjon and tetromino as code owners September 19, 2024 16:50
@fmeum fmeum requested review from gregestren and justinhorvitz and removed request for a team, brandjon, tetromino and gregestren September 19, 2024 16:50
@github-actions github-actions bot added team-Configurability platforms, toolchains, cquery, select(), config transitions awaiting-review PR is awaiting review from an assigned reviewer labels Sep 19, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 1, 2024

@justinhorvitz Friendly ping

Copy link
Contributor

@justinhorvitz justinhorvitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saves 8MB (0.1%) on a benchmark build.

@brandjon
Copy link
Member

brandjon commented Oct 2, 2024

What's the motivation for using reference equality in VectorArg#equals? The comment says:

Do not use equals for StarlarkFunction as it is implemented in terms of SymbolGenerator.Symbol, which may not change if the function itself does.

While the statement about SymbolGenerator.Symbol is true, how is the function object expected to change? A Starlark re-evaluation of a .bzl file or other context would create a new StarlarkFunction object, so reference equality doesn't avoid that (not that I understand why avoiding that situation is needed).

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 2, 2024

While the statement about SymbolGenerator.Symbol is true, how is the function object expected to change? A Starlark re-evaluation of a .bzl file or other context would create a new StarlarkFunction object, so reference equality doesn't avoid that (not that I understand why avoiding that situation is needed).

I may have misunderstood how SymbolGenerator.Symbol works, but my train of thought was: When a .bzl file containing a StarlarkFunction is reevaluated, the result will be a new StarlarkFunction object that is (obviously) not reference equal to the previous one, but it would still be equal according to its equals implementation (which delegates to the SymbolGenerator.Symbol token). Thus if we used equals here, the interner may return the old StarlarkFunction object when looking up the new one, thus potentially returning a stale implementation.

@brandjon
Copy link
Member

brandjon commented Oct 2, 2024

Ok, I think I understand. So currently the interner interns the VectorArg object, based on its equals(), allowing for sharing across multiple executions of the same (or even different) add_all() line. This change pushes the map_each callback into the VectorArg so that it can benefit from the same interning instead of occupying an additional slot in the arguments sequence. But we have to maintain the property that VectorArgs that have distinct semantics never compare equal under their equals() method, over the lifetime of the interner (which is static).

The general concern with reference equality in Starlark values is that it introduces the possibility that what should be an identical re-evaluation of Starlark code produces behaviorally different results. Since VectorArg itself is not the value of a Starlark expression, this is probably not an issue here. If a Starlark re-evaluation occurs, the new VectorArg won't be equal to the old and will occupy its own slot in the interner, but I don't think that's a problem.

We also use reference equality in CommandLineItemMapEachAdaptor#equals, so it's not like this is unprecedented.

@justinhorvitz
Copy link
Contributor

Thus if we used equals here, the interner may return the old StarlarkFunction object when looking up the new one, thus potentially returning a stale implementation.

I refer to this phenomenon as "interner resurrection," which I coined when working on https://cs.opensource.google/bazel/bazel/+/f6344ffcacdea6c4a61e112d0f60beda8068eac5. We can keep a weakly-referenced object alive by getting an interner hit before a GC collects it, and then we have duplicates. I will update the code comment to make the intent more clear.

@copybara-service copybara-service bot closed this in 53e7910 Oct 2, 2024
@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Oct 2, 2024
@fmeum fmeum deleted the intern-map-each-2 branch October 2, 2024 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Configurability platforms, toolchains, cquery, select(), config transitions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants