Skip to content

Conversation

@dylwil3
Copy link
Collaborator

@dylwil3 dylwil3 commented May 5, 2025

This PR implements template strings (t-strings) in the parser and formatter for Ruff.

I have attempted to arrange the commits for easy review commit-by-commit. They are labeled by what piece of the code they touch, and I have written extended commit messages for some of them to indicate the important bits.

Please let me know if there's anything that can be made clearer or if I can help make the review process smoother!

@dylwil3 dylwil3 added the python314 Related to Python 3.14 label May 5, 2025
@dylwil3 dylwil3 mentioned this pull request May 5, 2025
1 task
@github-actions
Copy link
Contributor

github-actions bot commented May 5, 2025

mypy_primer results

No ecosystem changes detected ✅

@dylwil3 dylwil3 force-pushed the template-strings branch 2 times, most recently from 63a58bf to e870485 Compare May 8, 2025 19:18
@github-actions
Copy link
Contributor

github-actions bot commented May 8, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dylwil3 dylwil3 force-pushed the template-strings branch 6 times, most recently from cf4dcbf to ba629ee Compare May 14, 2025 22:45
@dylwil3 dylwil3 marked this pull request as ready for review May 14, 2025 22:45
@carljm
Copy link
Contributor

carljm commented May 15, 2025

So cool!

I'm assuming the ty bits here are just the minimum to not panic, and we can handle the ty comments I'd have (e.g. some tests, maybe some code sharing between infer_fstring_expression and the f-string portion of infer_tstring_expression) in a separate ty-focused PR later on. Going to leave review on this one to people who know Ruff.

@carljm carljm removed their request for review May 15, 2025 01:34
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Exciting to seeing this come together.

I've only reviewed the parser, ast and parts of the linter at this point. Overall, my impression is that we went too far with copy pasting. We should consider more carefully where it is important to have different representations for t- and f-strings and where it is fine to share the same structs or parametrizing functions if they're identical except for e.g. some token kinds. Sharing the same structures should help with reusing more code as well.

@dylwil3
Copy link
Collaborator Author

dylwil3 commented May 15, 2025

Overall, my impression is that we went too far with copy pasting

That's fair - I will see what can be done about merging more logic! Thank you!

@AlexWaygood
Copy link
Member

(I'll also leave this mostly to Micha/Dhruv, but please feel free to ping me if there's something specific you want my opinion on!)

@dhruvmanila
Copy link
Member

dhruvmanila commented May 15, 2025

Looking at Micha's review, I'll wait for the follow-up changes to de-duplicate and then review but I agree that we should parameterize the methods and do minimal changes where f-strings and t-strings are different.

@dylwil3 dylwil3 force-pushed the template-strings branch from ba629ee to 73a2e16 Compare May 16, 2025 19:08
@dylwil3
Copy link
Collaborator Author

dylwil3 commented May 16, 2025

What's the motivation for having different nodes (not questioning whether we should have different FString and TString root nodes but do all nodes have to be different?)

Just to respond in general to the code duplication (I will still work on merging logic):

My main motivation for keeping so many things separate (e.g. the AST nodes) was to mimic to some extent what is being done in the CPython implementation. Granted, much of their parser is generated, so it's possible they are more comfortable with repetition than we ought to be, but it seemed like, down the line, if changes are introduced such that the behavior of t-strings and f-strings diverges further, this would be easier to maintain. However, I understand if this is too cautious/"YAGNI"-thinking. Anyway, just wanted to put on the record that I wasn't just being lazy for laziness sake 😄

@dylwil3
Copy link
Collaborator Author

dylwil3 commented May 16, 2025

Two quick questions as I continue to merge various things:

  1. Is there a standard pattern in Rust to get around the fact that we don't have dependent types? There are several places where merging functions would be possible except that the return types are different. So like, we have a function of signature (data) -> TStringThing and a function of signature (data) -> FStringThing. I want instead to do (data, enum variant?) -> ?? // FStringThing/TStringThing depending on enum variant? I guess am I supposed to do (data) -> T and parameterize by T instead where T is allowed to either be FStringThing or TStringThing? Hopefully this question makes sense...
  2. When I made both FStringElement and TStringElement share a node type (FTStringLiteralElement) (I know you hate the name!), this broke the code generation. Is there some other way I'm supposed to do this in ast.toml or should I modify the script? (Maybe a question for @dcreager ?)

@MichaReiser
Copy link
Member

MichaReiser commented May 17, 2025

For 1. Hard to say without seeing a concrete example. The alternative is to use a trait with an associated type:

trait InterpolatedString {
	type Element;
}

fn with_string<S>(string: S) -> S::Element where S: InterpolatedString {}

My main motivation for keeping so many things separate (e.g. the AST nodes) was to mimic to some extent what is being done in the CPython implementation.

I'm fine not sharing some AST nodes, e.g., if they're handled differently by most visitors anyway. I mainly wanted to make sure it was an intentional decision (maybe it was?).

@dylwil3 dylwil3 force-pushed the template-strings branch from 0ccfe03 to f026a01 Compare May 22, 2025 17:46
@dylwil3 dylwil3 requested a review from MichaReiser May 29, 2025 21:19
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for working on this!

This seems pretty close to landing! I've a few comments but they're mostly around avoiding duplication so I'll leave it up to you.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Thank you for addressign all my feedback.

My only concern left is that we use different terminology for the same thing in different places. Sometimes we use FT, sometimes we use InterpolatedString. We should make sure to only use one of the two.

@dylwil3 dylwil3 merged commit 9bbf498 into astral-sh:main May 30, 2025
35 checks passed
@dylwil3
Copy link
Collaborator Author

dylwil3 commented May 30, 2025

Thank you so much @MichaReiser and @dhruvmanila !!

@dylwil3 dylwil3 deleted the template-strings branch May 30, 2025 20:01
@AlexWaygood
Copy link
Member

🥳🥳 Congrats @dylwil3!!

dcreager added a commit that referenced this pull request Jun 2, 2025
…aration

* origin/main:
  [ty] Treat lambda functions as instances of types.FunctionType (#18431)
  [ty] Fix false positives for legacy `ParamSpec`s inside `Callable` type expressions (#18426)
  [ty] Improve diagnostics if the user attempts to import a stdlib module that does not exist on their configured Python version (#18403)
  Update taiki-e/install-action action to v2.52.4 (#18420)
  Update docker/build-push-action action to v6.18.0 (#18422)
  [ty] Fix server hang after shutdown request (#18414)
  Update Rust crate libcst to v1.8.0 (#18424)
  Update Rust crate clap to v4.5.39 (#18419)
  Update cargo-bins/cargo-binstall action to v1.12.6 (#18416)
  Update dependency mdformat-mkdocs to v4.3.0 (#18421)
  Update pre-commit dependencies (#18418)
  Update dependency ruff to v0.11.12 (#18417)
  [ty] Ensure `Literal` types are considered assignable to anything their `Instance` supertypes are assignable to (#18351)
  [ty] Promote projects to good that now no longer hang (#18370)
  Sync vendored typeshed stubs (#18407)
  [ty] Fix multithreading related hangs and panics (#18238)
  Support relative `--ty-path` in ty-benchmark (#18385)
  [ty] Update docs for Python version inference (#18397)
  [ty] Infer the Python version from the environment if feasible (#18057)
  Implement template strings (#17851)
facebook-github-bot pushed a commit to facebook/pyrefly that referenced this pull request Jun 6, 2025
Summary:
I'm trying to pull in some latest changes from upstream `ruff_python` libraries to get a sense of what an upgrade would look like (potentially getting more upstream utilities that can be used).

The change I'm pulling from is 0.11.12 (released last week). There's a new release 0.11.13 this week which contains T-string support for Python 3.14 (astral-sh/ruff#17851), but the changes there introduced nontrivial downstream breakage (i.e. expr structure gets shuffled around) so I think it makes more sense to do a separate upgrade just for that one feature.

The main backward-incompatible change in this upgrade comes from this PR: astral-sh/ruff#17587. The main consequence is that `SourceLocation` now no longer directly contains line and column info for user-visible texts-- a new structure `LineColumn` is now used for that purpose, and `SourceLocation` now represents the "raw" line and character offset data in the original string. The reason why the "raw" numbers and "user-visible" numbers are different seem to come from Unicode's byte-offset-mark (BOM) character (I'm not super familiar with those -- read the original PR if you are interested in the details). For us, I think the main response there should be to rely on `LineColumn` instead of `SourceLocation` now. That's mostly a trivial thing to do, except there are cases where we want to convert line+column number back to a byte offset in the string and there we have to use `SourceLocation` -- technically speaking that conversion can't be made loseless so we need to be careful about where it happens. I think we perform that kind of conversion mostly in tests so we are fine. But I'll mark the place where we do it in prod to raise awareness that in certain cases it might be an issue.

There are also big changes in how the "semantic syntax checker" behaves. Good news is that a bunch of new checks were added so we can reliably detect more stuffs. Bad news is that many of the added checks require us to implement an AST visitor to track context and I don't think it's a trivial thing to do. Right now I'm just returning some dummy values to get the very basic checks working. But in the future we could come back and do more of the visit properly.

Reviewed By: ndmitchell

Differential Revision: D76156394

fbshipit-source-id: 0f55b5888259948d67400389a5efdff69c727dab
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python314 Related to Python 3.14

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants