Log filegroups only once in execution_log_json_file and reference them in all the spawns where they are used #19628

Closed
abdulsattar opened this issue Sep 25, 2023 · 1 comment
Labels: team-Performance (Issues for Performance teams), type: feature request

Comments

@abdulsattar

Description of the feature request:

Filegroups are an easy way to group files together and reuse them across several spawns. When a filegroup is used by many spawns (especially spawns executed thousands of times as persistent workers), its files are repeated for every spawn, which inflates execution_log_json_file.

We run into this when we use rules_js with:

js_binary(
    name = "bin",
    data = [
        "//:node_modules/react",
        "//:node_modules/eslint",
        "...",
    ],
    entry_point = "index.js",
)

Collectively, these node_modules can contain hundreds of thousands of files, which, multiplied by thousands of persistent worker invocations, easily runs into millions of lines in execution_log_json_file, all repeating the same inputs.
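To put rough numbers on that multiplication, here is a back-of-envelope sketch. The file and invocation counts are hypothetical placeholders picked to match the orders of magnitude mentioned above, not measurements from a real build, and the function name is made up for illustration.

```python
# Back-of-envelope estimate. Every number below is a hypothetical placeholder,
# not a measurement from a real build.

def repeated_input_entries(files_per_spawn: int, spawn_count: int) -> int:
    """Input entries written when every spawn lists all of its inputs in full."""
    return files_per_spawn * spawn_count


# (files per spawn, number of spawn invocations)
scenarios = [
    (1_000, 1_000),
    (10_000, 1_000),
    (100_000, 1_000),
]

estimates = [repeated_input_entries(files, spawns) for files, spawns in scenarios]

for (files, spawns), total in zip(scenarios, estimates):
    print(f"{files:>7,} files x {spawns:,} spawns = {total:,} input entries")
```

Even the smallest scenario already lands at a million repeated entries, which is why the log grows so quickly.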

One simple way to address this would be to wrap the inputs in a filegroup, provided that a filegroup is logged only once in the execution log and referenced from each spawn that uses it:

filegroup(
    name = "node_modules_group",
    srcs = [
        "//:node_modules/react",
        "//:node_modules/eslint",
        "...",
    ],
)

js_binary(
    name = "bin",
    data = [":node_modules_group"],
    entry_point = "index.js",
)
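To illustrate what "log once and reference" could look like, here is a toy model in Python. It is only a sketch of the idea, not Bazel's actual execution log schema: the field names (`mnemonic`, `inputs`, `input_set`, `input_sets`), the `Lint` mnemonic, and the file paths are all assumptions made up for this example.

```python
import json


def log_spawns_inline(spawns):
    """Toy log where each spawn entry repeats its full input list."""
    return [{"mnemonic": s["mnemonic"], "inputs": list(s["inputs"])} for s in spawns]


def log_spawns_with_references(spawns):
    """Toy log where each distinct input list is stored once and spawns
    refer to it by a small integer ID (hypothetical layout)."""
    input_sets = {}  # tuple of input paths -> assigned ID
    entries = []
    for s in spawns:
        key = tuple(s["inputs"])
        # Reuse the existing ID if this exact input list was seen before.
        set_id = input_sets.setdefault(key, len(input_sets))
        entries.append({"mnemonic": s["mnemonic"], "input_set": set_id})
    table = [{"id": i, "inputs": list(k)} for k, i in input_sets.items()]
    return {"input_sets": table, "spawns": entries}


# Hypothetical workload: 50 worker invocations sharing the same 1,000 inputs.
node_modules = [f"node_modules/pkg{i}/index.js" for i in range(1000)]
spawns = [{"mnemonic": "Lint", "inputs": node_modules} for _ in range(50)]

inline = log_spawns_inline(spawns)
referenced = log_spawns_with_references(spawns)

inline_size = len(json.dumps(inline))
referenced_size = len(json.dumps(referenced))
print(f"inline: {inline_size:,} bytes, with references: {referenced_size:,} bytes")
```

In this model the shared input list is serialized once regardless of how many spawns use it, so the log size grows with the number of distinct input sets rather than with the number of invocations.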

Which category does this issue belong to?

Performance

What underlying problem are you trying to solve with this feature?

Reduce the size of execution_log_json_file in cases where several filegroups are used.

Which operating system are you running Bazel on?

macOS

What is the output of bazel info release?

release 6.1.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@tjgq
Contributor

tjgq commented Sep 26, 2023

We are planning to address the shortcomings of the execution log format in Q4 (specifically, by designing a leaner format that avoids storing duplicate information). Therefore I'm going to close this in favor of #18643.

@tjgq closed this as not planned on Sep 26, 2023