Context manager for contiguous computations #7795

mrocklin · 2023-04-20T15:49:14Z

Motivation

Groups (like us!) run benchmarks and want to group sets of computations together into one cohesive job. We also want to record what we know about that computation, for example the name of the benchmark, the hardware on which it was run, etc.

Example

with client.computation(tags={...}):
    df = df.persist()

    out1 = df.x.sum().compute()
    out2 = df.y.sum().compute()

The name "computation" is bad. We should find a better one.

Question: should this be allowable without a context manager? @crusaderky is mildly positive on this.

What should happen here? These should not nest; this should raise an error:

    with client.computation():
        with client.computation():
            ...
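To make the "should not nest" behaviour concrete, here is a minimal sketch. `SketchClient` and `NestedComputationError` are hypothetical names invented for illustration; this is not the `distributed.Client` API, only one way a guard against nesting could look.

```python
from contextlib import contextmanager


class NestedComputationError(RuntimeError):
    """Raised when a computation block is opened inside another one."""


class SketchClient:
    """Hypothetical stand-in for a client; not the real distributed.Client."""

    def __init__(self):
        self._active_tags = None  # tags of the open computation, if any
        self.finished = []        # tags of computations that have closed

    @contextmanager
    def computation(self, tags=None):
        # Refuse to open a second block while one is already active.
        if self._active_tags is not None:
            raise NestedComputationError("computation blocks must not nest")
        self._active_tags = dict(tags or {})
        try:
            yield self._active_tags
        finally:
            # Close the block even if the body raised.
            self.finished.append(self._active_tags)
            self._active_tags = None


client = SketchClient()
with client.computation(tags={"benchmark": "demo"}):
    try:
        with client.computation():
            pass
    except NestedComputationError as exc:
        nested_error = str(exc)
```

The inner `with` fails on entry, while the outer block stays open and closes normally.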

@crusaderky @hendrikmakait @j-bennet


ntabris · 2023-04-20T16:02:41Z

Why shouldn't they nest? From a user perspective, one could have a larger unit of compute made up of smaller units, and be interested in comparing both the larger and the smaller units.

mrocklin · 2023-04-22T10:37:00Z

I agree with you that that seems semantically meaningful. Implementation-wise this is a one-to-many relationship with task groups currently. I'd suggest keeping that structure for a first pass and considering nesting (or other structures) afterwards if prioritization recommends it.

fjetter · 2023-04-27T15:18:14Z

I wouldn't dismiss the nested structure either. If we went for nesting, I think the direct prefix-to-computation mapping should still be one-to-one.
The computation tree would then just be a relation between individual computations, with the prefixes attached to the leaves.

Not a strong opinion, but I could see this becoming very useful when combined with more instrumentation.
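As a rough illustration of that shape, the sketch below builds a small tree in which only leaves carry prefixes and each prefix belongs to exactly one leaf. `ComputationNode`, `prefix_owner`, and the prefix names are all made up for this example and do not correspond to existing scheduler classes.

```python
from dataclasses import dataclass, field


@dataclass
class ComputationNode:
    """Hypothetical node in a computation tree."""
    name: str
    children: list = field(default_factory=list)
    prefixes: set = field(default_factory=set)  # only populated on leaves

    def all_prefixes(self):
        """Collect prefixes from every leaf below (or at) this node."""
        if not self.children:
            return set(self.prefixes)
        found = set()
        for child in self.children:
            found |= child.all_prefixes()
        return found


def prefix_owner(root):
    """Map each prefix to the single leaf that owns it."""
    owners = {}
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.children:
            continue  # inner nodes carry no prefixes of their own
        for prefix in node.prefixes:
            if prefix in owners:
                raise ValueError(f"prefix {prefix!r} has two owners")
            owners[prefix] = node.name
    return owners


load = ComputationNode("load", prefixes={"read_parquet"})
reduce_ = ComputationNode("reduce", prefixes={"sum", "getitem"})
benchmark = ComputationNode("benchmark", children=[load, reduce_])
```

The larger unit (`benchmark`) can then be compared against its parts by aggregating over leaves, without any prefix ever being attached to more than one computation.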

crusaderky · 2023-05-04T16:21:55Z

The biggest reason against nesting is that, without it, "the current and most recent computation" is always the last element of the Scheduler.computations deque. Anything beyond that feels like overengineering.
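A toy version of that invariant, assuming a flat, bounded deque: the helper functions, the dict layout, and the `maxlen` value below are assumptions for illustration, not the scheduler's actual code.

```python
from collections import deque

# Flat history of computations; the bound of 3 is an arbitrary choice here.
computations = deque(maxlen=3)


def start_computation(name):
    """Open a new computation; it becomes the current one."""
    computations.append({"name": name, "groups": []})


def record_group(group):
    """Attach a task group to the current (last) computation."""
    computations[-1]["groups"].append(group)


start_computation("a")
record_group("x-1")
start_computation("b")
record_group("y-1")
record_group("y-2")
```

Without nesting, looking up "the current computation" never needs more than `computations[-1]`.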

fjetter · 2023-05-10T13:01:20Z

Question: should this be allowable without a context manager?

From a UX perspective I could see this being useful without a context manager, but sticking to context managers will simplify the implementation significantly. I suggest sticking to context managers (and decorators) for now.
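For what it's worth, the standard library already lets one object serve both roles. The sketch below uses `contextlib.ContextDecorator`; the `computation` class and its `log` argument are hypothetical and only record start and stop events.

```python
from contextlib import ContextDecorator


class computation(ContextDecorator):
    """Hypothetical marker usable as a context manager or a decorator."""

    def __init__(self, log, **tags):
        self.log = log    # list that receives (event, tags) tuples
        self.tags = tags

    def __enter__(self):
        self.log.append(("start", self.tags))
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(("stop", self.tags))
        return False  # never swallow exceptions from the body


events = []


@computation(events, benchmark="demo")
def run_benchmark():
    return sum(range(4))


result = run_benchmark()

with computation(events, benchmark="inline"):
    pass
```

Both forms go through the same `__enter__`/`__exit__` pair, so supporting decorators adds no separate code path.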

The biggest reason against nesting is that, without it, "the current and most recent computation" is always the last element of the Scheduler.computations deque. Anything beyond that feels like overengineering.

If we stick to a context manager, I don't think this ambiguity even exists. The context manager could generate an ID (or the user could provide a name) that references the computation explicitly, avoiding any kind of ambiguity. We wouldn't even need any "what is the most current/recent computation" mechanism, would we?
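A minimal sketch of the explicit-ID idea, assuming a plain dict as the registry: `registry`, `computation`, and `submit` are illustrative names, not existing APIs.

```python
import uuid
from contextlib import contextmanager

registry = {}  # computation id -> list of work recorded under it


@contextmanager
def computation(name=None):
    """Yield an explicit ID: the user's name, or a generated one."""
    comp_id = name if name is not None else uuid.uuid4().hex
    if comp_id in registry:
        raise ValueError(f"computation {comp_id!r} already exists")
    registry[comp_id] = []
    yield comp_id


def submit(comp_id, task):
    """Record work against a computation by ID, with no notion of 'current'."""
    registry[comp_id].append(task)


with computation("etl") as etl_id:
    submit(etl_id, "load")

with computation() as auto_id:
    submit(auto_id, "sum")
    # Work can still be attributed to "etl" explicitly from here.
    submit(etl_id, "late-task")
```

Because every piece of work names its computation, nothing has to ask which computation is the most recent one.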

crusaderky · 2023-05-10T17:31:40Z

If we stick to a context manager, I don't think this ambiguity even exists. The context manager could generate an ID (or the user could provide a name) that references the computation explicitly, avoiding any kind of ambiguity. We wouldn't even need any "what is the most current/recent computation" mechanism, would we?

I'm lost. Are you suggesting keeping a stack of "current computation" objects on the scheduler, which the context managers push onto and pop from?
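For reference, the stack-based reading of that question might look roughly like the sketch below. `SketchScheduler` and its attributes are hypothetical, and nothing here reflects how the real scheduler tracks computations.

```python
from contextlib import contextmanager


class SketchScheduler:
    """Hypothetical scheduler that keeps a stack of open computations."""

    def __init__(self):
        self.current = []   # stack of open computations; last is innermost
        self.assigned = {}  # task -> tuple of enclosing computation names

    @contextmanager
    def computation(self, name):
        self.current.append(name)  # push on enter
        try:
            yield
        finally:
            self.current.pop()     # pop on exit, even after an error

    def add_task(self, task):
        # Attribute the task to the full chain of open computations.
        self.assigned[task] = tuple(self.current)


scheduler = SketchScheduler()
with scheduler.computation("benchmark"):
    scheduler.add_task("load")
    with scheduler.computation("reduce"):
        scheduler.add_task("sum")
    scheduler.add_task("write")
```

This supports nesting, at the cost of the scheduler holding more state than a single "last element" lookup.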

crusaderky · 2023-05-25T12:23:51Z

This should be superseded by #7860

fjetter mentioned this issue Apr 27, 2023

Frame capture improvements #7780

Open

crusaderky mentioned this issue May 10, 2023

Computations meta-issue #7830

Open

This was referenced May 22, 2023

Worker crash causes computations to overlap #7825

Open

Fine performance metrics: apportion to Computations #7776

Closed
