Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor marker file logic #21182

Closed
wants to merge 2 commits into from
Closed

Refactor marker file logic #21182

wants to merge 2 commits into from

Conversation

Wyverald
Copy link
Member

@Wyverald Wyverald commented Feb 2, 2024

Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the TYPE:KEY VALUE format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in RepositoryFunction, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

  • The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
    • This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
  • Each type of recorded inputs is a subclass of RepoRecordedInput, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
  • We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.

Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.
@Wyverald Wyverald marked this pull request as ready for review February 2, 2024 01:23
@github-actions github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Feb 2, 2024
@Wyverald Wyverald changed the title [DRAFT] Refactor marker file logic Refactor marker file logic Feb 2, 2024
@Wyverald Wyverald requested review from meteorcloudy, fmeum and SalmaSamy and removed request for lberki, ted-xie and ahumesky February 2, 2024 01:24
@lberki
Copy link
Contributor

lberki commented Feb 2, 2024

Shouting in from the sidelines here, if you are heavily changing the format of the marker files, consider replacing it with some less ad-hoc file format (maybe JSON, or hell, even Starlark-but-only-data-structures)?

@fmeum
Copy link
Collaborator

fmeum commented Feb 2, 2024

I would go even further: I see value in aligning the repo marker file format with that of module extension lockfile entries. After all, there doesn't seem to be a conceptual reason for not supporting the same capabilities for module extensions eventually. We don't need to implement this now, but could switch over to Gson/JSON for serialization and possibly even use the same logic for verification of basic file accesses and recorded repo mapping entries.

@Wyverald
Copy link
Member Author

Wyverald commented Feb 2, 2024

it is indeed my plan to eventually switch to JSON. it's actually forwards compatible too, since the current marker file format treats the whole first line as the rule hash; so if the first line became {, older Bazel versions would simply think the rule hash changed and refetch.

And yeah, this PR paves the way to eventually merge this logic with the module extension up-to-date checks.

Otherwise, though, this PR is just a refactor -- it doesn't actually change the marker file format at all.

@meteorcloudy meteorcloudy added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Feb 5, 2024
@copybara-service copybara-service bot closed this in 5ab21e0 Feb 6, 2024
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Feb 6, 2024
@Wyverald Wyverald deleted the wyv-watch-api branch February 12, 2024 22:06
Wyverald added a commit that referenced this pull request Feb 20, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.

Closes #21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
Wyverald added a commit that referenced this pull request Feb 20, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards #20952 and #12227.

Closes #21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
rrbutani pushed a commit to rrbutani/bazel that referenced this pull request Feb 28, 2024
Marker files today store all predeclared inputs as one hash (on the first line of the file), and then each recorded input as a following line in the `TYPE:KEY VALUE` format. This commit refactors the parsing/stringification logic of recorded inputs so that they're not all clumped in big methods in `RepositoryFunction`, to pave the way for more recorded input types (watching directories, etc) and more places to write recorded input data (for the true repo cache).

- The StarlarkSemantics object is no longer treated as a recorded input (only recorded for Starlark repo rules, ignored for native repo rules), but as a predeclared input instead (i.e. hashed on the first line).
  - This slightly simplifies logic, and since the existing native repo rules are either local (local_repository, new_local_repository, local_config_platform) or being Starlarkified (the two Android repo rules), it will have minimal visible impact.
- Each type of recorded inputs is a subclass of `RepoRecordedInput`, which knows how to stringify itself, verify its own up-to-date-ness, and how to parse itself from a string.
- We also try to collect as many SkyKeys needed to verify up-to-date-ness as possible in one go and do a mass Skyframe evaluation. This avoids a fair amount of Skyframe restarts (unlikely to have super big impact on performance, but is a nice thing to do).

Work towards bazelbuild#20952 and bazelbuild#12227.

Closes bazelbuild#21182.

PiperOrigin-RevId: 604522692
Change-Id: Idc18ab202adb601cda47914c48642a6c9039da40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants