
API for Skylark rules and genrules to declare ResourceSets #6477

Open
ghost opened this issue Oct 23, 2018 · 23 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Local-Exec Issues and PRs for the Execution (Local) team type: feature request

Comments

@ghost

ghost commented Oct 23, 2018

Description of the problem / feature request:

My project uses multithreaded code emitters written in Java. They may eat CPU and RAM, and when running concurrently may overwhelm the build host. It'd be great if I could somehow signal to Bazel's scheduler that these processes are heavier so that it could adjust accordingly.

Feature requests: what underlying problem are you trying to solve with this feature?

When porting a project from Maven to Bazel, I observe longer build times with Bazel. Looking at the critical path, I see that a few actions (most notably one of the code emitters) take significantly longer under Bazel (23s) than when run from the command line (15s) or Maven.

If I understand the code correctly (GenRuleAction.java), Bazel treats all genrules the same. They're assumed to consume ~300M of RAM and a single CPU. My tasks, however, typically occupy 2-4 CPUs, so I'd like Bazel to take this into account when scheduling actions running on my build host.

I'm still too new to Bazel to suggest a concrete API, but maybe something like

genrule(
    ...,
    resources=(512, 3.0, 0.0),  # Corresponds to ResourceSet: 512M, 3 CPUs, 0/default for I/O
)

# or

genrule(
    ...,
    resources_ram=512,
    resources_cpu=3.0,
)    

would be feasible? (Ditto for plain rules, too!)

What operating system are you running Bazel on?

Linux (CentOS 6.6)

What's the output of bazel info release?

$ bazel info release
release 0.16.1- (@non-git)

(This is 0.18.0rc8.)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

n/a. This is just a feature request.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

n/a. This is just a feature request.

Have you found anything relevant by searching the web?

#988 touches on a similar topic.

Any other information, logs, or outputs that you want to share?

@jin
Member

jin commented Oct 23, 2018

cc @ulfjack @jmmv

@jmmv
Contributor

jmmv commented Oct 24, 2018

I think this makes a lot of sense. (The need for this came up as well during the Bazel hackathon but I cannot remember the details right now.)

I don't like the first proposal of using a tuple: unnamed parameters are very hard to read. I prefer either a dictionary with named quantities, the separate attributes, or maybe better, a higher-level data type to represent the resource set with named fields (a la collections.namedtuple).

I'd also try to avoid floats even if the backing ResourceSet supports them (because I think we should get rid of them). For CPU reservations, floats are very hard to deal with, and for I/O, we can just represent them as an integer in the [0,100] range.

@ittaiz
Member

ittaiz commented Oct 24, 2018 via email

@ulfjack
Contributor

ulfjack commented Oct 24, 2018

We already have a mechanism to configure cpu, but not one to configure ram. I don't think adding a mechanism to configure I/O makes any sense, given that the use of the "IO" resource in Bazel is inconsistent and serves more to address one-off restriction cases than as any sort of general I/O model. Basically, what "I/O" means is completely undefined, and people have been abusing the setting to implement random restrictions. I think those are basically all wrong because they inherently don't scale. Bazel assumes a machine has "1.0 I/O", so a setting of 0.3 means "don't run more than 3 of these at the same time", regardless of whether I have an uber-powerful 52-core workstation with high-throughput memory and disks in a RAID-0 configuration, or a puny 2-core laptop with a spinning platter.

The way to configure cpu is to set tags = ["cpu:3"]. It's possible that this isn't properly processed everywhere, but that would be easy to fix. Arguably, using tags isn't ideal because it's not typesafe. However, adding this to all sorts of rules and actions as custom type-safe fields isn't ideal either, and I think this solution is a reasonable compromise. Certainly, if someone were to add "mem:" or similar, I would insist that the user must provide a unit at the end, e.g., "mem:5g" or "mem:3kb".
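A minimal BUILD sketch of the tag mechanism described above; target, file, and tool names are illustrative, and the `cpu:3` tag is only a hint for local scheduling:

```starlark
# BUILD sketch (illustrative names): reserve ~3 CPUs for a heavy genrule.
genrule(
    name = "heavy_codegen",
    srcs = ["spec.json"],
    outs = ["generated.cc"],
    cmd = "$(location :emitter) $(SRCS) > $@",
    tools = [":emitter"],
    tags = ["cpu:3"],  # tell the local scheduler this action occupies ~3 CPUs
)
```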

@meisterT meisterT added P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Execution and removed untriaged labels Nov 29, 2018
@jin jin added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Execution labels Jan 14, 2019
@dslomov dslomov removed the team-Performance Issues for Performance teams label Feb 15, 2019
@dslomov
Contributor

dslomov commented Feb 15, 2019

Please do not assign issues to more than one team.

@Globegitter

Globegitter commented Jul 3, 2019

@beasleyr-vmw I think this is great, but is it also possible to specify cpu and ram in a dynamic way, similar to how local_ram_resources and local_cpu_resources can be set dynamically? I.e. something like HOST_CPUS-1 etc. I'm used to configuring multi-threaded apps based on the number of available host CPUs rather than a fully static amount.
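For reference, the host-relative expressions mentioned here are what `--local_cpu_resources` and `--local_ram_resources` already accept for the machine-wide limits, e.g. in a .bazelrc (values illustrative):

```
# .bazelrc sketch (illustrative values)
build --local_cpu_resources=HOST_CPUS-1
build --local_ram_resources=HOST_RAM*.67
```

The open question in this thread is exposing the same kind of dynamic expression per target or per action.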

@kastiglione
Contributor

kastiglione commented Dec 30, 2019

Note that the cpu:n feature works only for tests, not other actions. From Ulf's comment above, I thought it was generally available, but it's not.

To do some local testing, I hacked in cpu:n support for Starlark actions, but in my initial testing of swift builds, the total build time was not better for our build (which has many small modules). I'm not sure if I'll continue pursuing it. kastiglione@0a1e78b

@IlyaNiklyaev

We have a big project with heavy compilations and heavy tests. Tests also require some resources which are not CPU or RAM (for example, GPU memory or a special CPU mode).
For RAM-heavy tests we initially set the exclusive tag. I've implemented a ram:<int> tag just like cpu:<int>, and it works like a charm. But there are some issues:

  • this option isn't flexible; I'd like to be able to multiply RAM requirements for all targets globally (for example, for sanitizer or code coverage runs)
  • this way of setting requirements by tags tends to become more complex if we add special resources (like GPU memory)

So I guess we need two things:

  • adjustable memory estimations for test sizes instead of hardcoded ones (via an option like --test_sizes=100M,500M,1G,5G)
  • some mechanism for describing resources/requirements in a more flexible way

As for the latter, I guess it may look something like this in rules:

cc_test(
  ...
  resources = {
    "cpu": "2",
    "ram": "2G",  # not an ideal solution, but may be useful
    "gpu_ram": "2",
    ...
  },
  ...
)

As for new types of resources like gpu_ram we can use some new workspace rule like this:

resource(
  name = "gpu_ram",
  capacity = 16,
)

I think we can contribute adjustable memory estimations for now. A mechanism of describing custom resources looks much bigger.

@juliexxia
Contributor

juliexxia commented Feb 25, 2020

I haven't looked too deeply into this comment thread, but the new Action Groups proposal might be of interest to you, specifically the bit on exec_properties.

@brentleyjones
Contributor

@kastiglione

To do some local testing, I hacked in cpu:n support for Starlark actions, but in my initial testing of swift builds, the total build time was not better for our build (which has many small modules). I'm not sure if I'll continue pursuing it.

Were you setting it to mimic what was being passed in for -j? I assume that having that value match what swift is using would be best. I also assume that we would want the value to be dynamic, to reflect that some compiles will use fewer jobs.

@IlyaNiklyaev

@juliexxia
Sorry for the late response. Yes, exec_properties would cover the case of "custom resources" like GPU memory (actually, even without the proposal you mentioned), but AFAIK Bazel's local ResourceManager doesn't respect exec_properties. Maybe you know about any plans/proposals of supporting exec_properties in ResourceManager?

BTW is it a good idea to do that?

@juliexxia
Contributor

Ah, you are correct. Apologies for not understanding the needs of this post more deeply earlier. exec_properties is meant to be read by remote execution machinery and isn't supported by the local ResourceManager. Bazel in general doesn't do any real parsing of exec_properties; it just forwards it along. I am not aware of any plans to make ResourceManager respect exec_properties, and agreed, I'm not sure it's a great idea to do that.

@emidln

emidln commented Aug 20, 2020

Asking for the underinformed, but why is it a problem if the local ResourceManager respects certain exec_properties? If this is a needed piece of functionality (and I'd argue that it is; Bazel habitually underestimates my static CppLink actions to the point that I might have to run a fork to make the memory estimate configurable), we'd need some way of specifying this. Is having a local_exec_properties and a remote_exec_properties better than declaring a single exec_properties which is interpreted to the best of a ResourceManager's abilities?

@uri-canva
Contributor

From the discussion so far it sounds like having ResourceManager interpret exec_properties would be exactly what we're looking for. @juliexxia what are the reasons why it might not be a good idea?

@ulfjack
Contributor

ulfjack commented Sep 29, 2020

I'm in favor of making ResourceManager obey exec_properties where appropriate. It seems like a good way to unify local and remote execution, rather than treating them as inherently separate.

IIRC, @katre said in another thread that exec_properties is purely for remote execution; adding him here for visibility.

@katre
Member

katre commented Sep 29, 2020

I think I wasn't clear in my previous emails with @ulfjack: exec_properties isn't only for remote execution, it's for any execution strategy that is interested in the execution platform (and with the Execution Platforms vs Strategies proposal in development, that's all of them).

However, Bazel does not currently set any meanings for the keys in exec_properties: that is up to the execution strategy. Currently the keys used by Bazel's Remote Execution API are defined in the API description, and are only meaningful in the context of that strategy.

We could have the local execution strategies use some of these keys as well, but this raises the problem that Google's internal remote execution uses a completely different set of keys. We'd need to figure out a way to parameterize those keys to work with the existing internal/external remote execution systems. This makes things a bit harder for those of us working with both systems, and I'm sure is frustrating to people only using Bazel, but it does need to be considered for us.

@IlyaNiklyaev

As far as I can see, we can go further in one of two ways:

  1. Design and support a predefined set of properties. In that case local or remote execution systems may support a subset of it, and each property should have a certain defined meaning.
  2. Let users describe resources in BUILD/WORKSPACE/bzl files. These descriptions may, for example, contain executable scripts to interact with hardware or something similar.

@benjaminp
Collaborator

d7f0724

@joeljeske
Contributor

@benjaminp and @wilwell thanks! That looks like a fantastic step in the right direction.

This looks like it will be very helpful for allowing rule authors to determine the resources needed for a specific action.

For generic rulesets, it does seem like it might be a bit more difficult to adopt. If I understand the constraints correctly, the rule itself must reference a global Starlark function that is callable within the rule at the time of declaring the run action. This makes it difficult for the consumer of a rule to determine what resources should be reserved for a particular action.

For example, rules_nodejs has a npm_package_bin generic rule that allows a consumer to execute an arbitrary NPM tool with arbitrary arguments and file inputs. I don't see a good way for npm_package_bin to allow its consumers to specify the resources that would be needed for this rule given the constraints of always requiring a global starlark function.

It does make sense why lambdas or nested functions could not be allowed. Would it be feasible to allow rule authors to specify the resource_set dictionary directly, as opposed to a function that returns it? That would make it feasible for these generic rules to accept some form of attrs so consumers can specify their constraints.
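For readers following along, the callback-based API under discussion looks roughly like this. The shape below (a top-level function receiving the OS name and number of inputs, returning a dict of cpu/memory amounts, passed as `resource_set` to `ctx.actions.run`) matches the documented Starlark parameter; the rule internals, attribute names, and numbers are illustrative:

```starlark
# Sketch of the resource_set callback discussed in this thread. The
# callback must be a top-level function (no lambdas or nested functions);
# it receives the OS name and the number of inputs, and returns a dict of
# resource amounts. Names and numbers here are illustrative.
def _emitter_resources(os, inputs_size):
    return {
        "cpu": 3.0,
        "memory": 512.0 + 10.0 * inputs_size,  # scale RAM with input count
    }

def _impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".out")
    ctx.actions.run(
        executable = ctx.executable._emitter,  # hypothetical tool attribute
        inputs = ctx.files.srcs,
        outputs = [out],
        arguments = [out.path],
        resource_set = _emitter_resources,
    )
    return [DefaultInfo(files = depset([out]))]
```

The top-level-function restriction is exactly what the comment above points out: a generic rule cannot easily build this callback from per-target attrs supplied by its consumers.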

@c-nuro

c-nuro commented Mar 24, 2022

+1 for @joeljeske's ask above.

Another scenario is rules_foreign_cc, which invokes make to compile. It needs the -j parameter to run in parallel explicitly, so the resource set could be determined directly by that parameter, but it's hard to infer from the input of the global function.

@cloudhan

cloudhan commented Mar 30, 2022

+1 for @joeljeske's ask above.
+1 for @c-nuro's case.

Why pass a callable when the callable cannot have a full overview of the action? Why not just let the rule author pass the resource set dict in directly, or even better, a ResourceSetInfo? At the moment actions.run is constructed, the rule author has almost all the information available; the so-called input_size is also available.

And using a callable is a weird design choice; Bazel is not designed around higher-order functions in any sense...

Also, what does the resource set mean? Does it impose a hard limit, or is it just a hint to the scheduler? Sometimes I still want to oversubscribe the CPU: not everything can be embarrassingly parallel, there is always something that must run serially, it takes time, and sometimes you cannot easily break it into multiple actions.run calls.
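On the hard-limit question: as I understand Bazel's local resource accounting, a declared resource set is a reservation counted at scheduling time, not an enforced cap on the process. A toy model of that semantics (an editorial sketch, not Bazel code; all names here are made up):

```python
import threading

# Toy model of reservation-style scheduling: an action only starts once
# its declared CPU share fits in the remaining capacity. Nothing stops
# the spawned process from using more CPU than it declared -- the
# reservation is a scheduling hint, not an enforced cap.
class ResourcePool:
    def __init__(self, total_cpus):
        self.available = total_cpus
        self.cond = threading.Condition()

    def acquire(self, cpus):
        with self.cond:
            while self.available < cpus:
                self.cond.wait()  # block until enough capacity is free
            self.available -= cpus

    def release(self, cpus):
        with self.cond:
            self.available += cpus
            self.cond.notify_all()

pool = ResourcePool(total_cpus=4)
completed = []

def run_action(name, cpus):
    pool.acquire(cpus)
    try:
        completed.append(name)  # stand-in for actually spawning the action
    finally:
        pool.release(cpus)

run_action("codegen", 3)  # fits: 3 <= 4 available
run_action("link", 2)     # runs once codegen has released its share
print(completed)          # ['codegen', 'link']
```

Under this model, oversubscription is still possible: the declared numbers only shape when actions start, not how much CPU they actually consume.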

@github-actions

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jun 13, 2023
@fmeum
Collaborator

fmeum commented Jun 13, 2023

@bazelbuild/triage not stale

@sgowroji sgowroji removed the stale Issues or PRs that are stale (no activity for 30 days) label Jun 13, 2023
@iancha1992 iancha1992 added the not stale Issues or PRs that are inactive but not considered stale label Jun 13, 2023