Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context manager for contiguous computations #7795

Open
mrocklin opened this issue Apr 20, 2023 · 7 comments
Open

Context manager for contiguous computations #7795

mrocklin opened this issue Apr 20, 2023 · 7 comments

Comments

@mrocklin
Copy link
Member

Motivation

For example, groups (like us!) run benchmarks and want to group sets of computations together into one cohesive job. We also want to record information about that computation that we know, for example the name of the benchmark, the hardware on which it was run, etc..

Example

with client.computation(tags={...}):
    df = df.persist()

    out1 = df.x.sum().compute()
    out2 = df.y.sum().compute()

This name computation is bad. We should find a better one.

Question: should this be allowable without a context manager? @crusaderky is mildly positive on this.

What should happen here?: these should not nest. That should raise an errors

    with client.computation():
        with client.computation():

@crusaderky @hendrikmakait @j-bennet

@ntabris
Copy link
Contributor

ntabris commented Apr 20, 2023

Why shouldn't they nest? From user-perspective, one could have a larger unit of compute which is made up of smaller units, and be interested in comparing larger units and smaller units.

@mrocklin
Copy link
Member Author

I agree with you that that seems semantically meaningful. Implementation-wise this is a one-to-many relationship with task groups currently. I'd suggest keeping that structure for a first pass and considering nesting (or other structures) afterwards if prioritization recommends it.

@fjetter
Copy link
Member

fjetter commented Apr 27, 2023

I wouldn't dismiss the nested structure either. If we went for nesting I think the direct prefix to computation mapping should still be 1to1.
The Computation tree would then be just a relation between individual computations and the prefixes are attached to the leafs.

Not a strong opinion but I could see this becoming very useful when combined with more instrumentation

@crusaderky
Copy link
Collaborator

The biggest reason against nesting is that, without it, "the current and most recent computation" is always the last element of the Scheduler.computations.deque. Everything more than it feels like overengineering.

@fjetter
Copy link
Member

fjetter commented May 10, 2023

Question: should this be allowable without a context manager?

From a UX perspective I could see this being useful without a context manager but sticking to context managers will simplify implementation significantly. I suggest to stick to contextmanagers (and decorators) for now.

The biggest reason against nesting is that, without it, "the current and most recent computation" is always the last element of the Scheduler.computations.deque. Everything more than it feels like overengineering.

If we stick to a context manager I don't think this ambiguity even exists. The contextmanager could generate an ID (or the user provides a name) that references the computation explicitly avoiding any kind of ambiguity. We wouldn't even need any "what is the most current/recent computation" mechanism, would we?

@crusaderky
Copy link
Collaborator

crusaderky commented May 10, 2023

If we stick to a context manager I don't think this ambiguity even exists. The contextmanager could generate an ID (or the user provides a name) that references the computation explicitly avoiding any kind of ambiguity. We wouldn't even need any "what is the most current/recent computation" mechanism, would we?

I'm lost. Are you suggesting to have on the scheduler a stack of "current computation" objects that the context managers add/remove from?

@crusaderky
Copy link
Collaborator

This should be superseded by #7860

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants