Exclude IPython code from computations #7788

milesgranger · 2023-04-18T09:05:29Z

Tests added / passed
Passes pre-commit run --all-files

@ntabris wasn't sure how pedantic one should be to verify the run_code is indeed the IPython's run_code we want to ignore, so feel free to recommend adjustments. :)

milesgranger · 2023-04-18T09:43:42Z

Updated to ignore any frames in IPython.core.interactiveshell module, since other random functions from there also showed up and turned into whack-a-mole. Do you think anything within IPython ought to be ignored? Right now, the original issue will now give back:

In [1]: import dask
   ...: from distributed import Client
   ...: client = Client()
   ...: dask.config.set({"distributed.diagnostics.computations.nframes": 2})
   ...:
   ...: def foo(x): print(x); return x;
   ...: def zip(x): return foo(x);
   ...: N = client.map(zip, range(2))

In [2]: print(client.cluster.scheduler.computations[-1].code[0][0])
    def run_cell(
        self,
        raw_cell,
        store_history=False,
        silent=False,
        shell_futures=True,
        cell_id=None,
    ):
        """Run a complete IPython cell.

        Parameters
        ----------
        raw_cell : str
            The code (including IPython code such as %magic functions) to run.
        store_history : bool
            If True, the raw and translated cell will be stored in IPython's
            history. For user code calling back into IPython's machinery, this
            should be set to False.
        silent : bool
            If True, avoid side-effects, such as implicit displayhooks and
            and logging.  silent=True forces store_history=False.
        shell_futures : bool
            If True, the code will share future statements with the interactive
            shell. It will both be affected by previous __future__ imports, and
            any __future__ imports in the code will affect the shell. If False,
            __future__ imports are not shared in either direction.

        Returns
        -------
        result : :class:`ExecutionResult`
        """
        result = None
        try:
            result = self._run_cell(
                raw_cell, store_history, silent, shell_futures, cell_id
            )
        finally:
            self.events.trigger('post_execute')
            if not silent:
                self.events.trigger('post_run_cell', result)
        return result

Which is coming from a separate module in IPython.

github-actions · 2023-04-18T10:34:08Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

      20 files ±  0       20 suites ±0 11h 13m 8s ⏱️ - 42m 3s
  3 642 tests +  2   3 533 ✔️ +  2   108 💤 ±0 1 ❌ ±0
35 210 runs +20 33 447 ✔️ +16 1 762 💤 +4 1 ❌ ±0

For more details on these failures, see this check.

Results for commit f4cc4d0. ± Comparison against base commit 6b04277.

♻️ This comment has been updated with latest results.

ntabris · 2023-04-18T13:08:33Z

turned into whack-a-mole

Yeah, I briefly tried to address this by adding modules to the list at

distributed/distributed/distributed.yaml

Line 274 in b83c055

ignore-modules:

and new things kept showing up... including traitlets, so just doing IPython namespace also wasn't enough.

The goal here is just to collect code that's interesting and relevant to the Dask cluster is going to be doing. I don't think anything in IPython would count. I suspect that once you hit an IPython-related frame, then that and all higher frames would not be interesting and relevant. But I'm not certain about that... maybe IPython does weird things I wouldn't expect.

milesgranger · 2023-04-25T08:42:02Z

I suspect that once you hit an IPython-related frame, then that and all higher frames would not be interesting and relevant.

Went with this suspicion (b1b8585) and seems to work well enough for me in my trials.

For example, the original issue is as expected:

In [1]: import dask
   ...: from distributed import Client
   ...: client = Client()
   ...: dask.config.set({"distributed.diagnostics.computations.nframes": 2})
   ...:  
   ...: def foo(x): print(x); return x;
   ...: def zip(x): return foo(x);
   ...: N = client.map(zip, range(2))
   ...: import time
   ...: time.sleep(2)
In [2]: print(client.cluster.scheduler.computations[-1].code[0][0])
1
0
import dask
from distributed import Client
client = Client()
dask.config.set({"distributed.diagnostics.computations.nframes": 2})

def foo(x): print(x); return x;
def zip(x): return foo(x);
N = client.map(zip, range(2))
import time
time.sleep(2)
print(client.cluster.scheduler.computations[-1].code[0][0])

ntabris · 2023-04-25T13:04:19Z

Went with this suspicion (b1b8585) and seems to work well enough for me in my trials.

Are you assuming there's only a single frame of user code? (I gave your code a quick read and that's what it looks like). If so, that's not correct.

Here's another example where you can see this:

import dask
from distributed import Client
client = Client()
dask.config.set({"distributed.diagnostics.computations.nframes": 3})

def foo(x): print(x); return x;
def bar(x): return client.map(foo, range(x))

N = bar(3)

client.cluster.scheduler.computations[-1].code[0][2] is def bar(x): return client.map(foo, range(x)), [0][1] is import dask through N = bar(3), and [0][0] is ipython code.

milesgranger · 2023-05-08T10:10:04Z

Sorry for the delay, and you were (unsurprisingly) quite right. I've updated to take the first encountered ipython related frame onwards, so the example now works without taking just the last frame of user code

milesgranger · 2023-05-15T09:33:27Z

@ntabris when you have time to give this another look, I think it may be what we're after now.

distributed/client.py

ntabris · 2023-05-16T16:27:38Z

@milesgranger can you say more about what testing you've done / are expecting someone else to do? I have a few nits, but if this has been tested and works, I'd be happy to approve.

hendrikmakait · 2023-05-16T17:42:53Z

@milesgranger: Is there a way we can automatically test this? What additional packages would we need to install on CI?

milesgranger · 2023-05-22T09:38:50Z

@hendrikmakait @ntabris There exists no tests for the current IPython logic in this same function (AFAICT), so was kinda going off that being 'okay'.

Given that it is testing specifically if a frame is part of IPython.core.interactiveshell, seems maybe the only way to test it is to use mock to simulate an interactiveshell frame and check from there. Is this something you'd like to pursue?

hendrikmakait · 2023-05-22T14:13:43Z

@hendrikmakait @ntabris There exists no tests for the current IPython logic in this same function (AFAICT), so was kinda going off that being 'okay'.

I always get nervous around untested code in particular if I don't know how if this is more of a fringe case to handle or something that happens frequently.

Given that it is testing specifically if a frame is part of IPython.core.interactiveshell, seems maybe the only way to test it is to use mock to simulate an interactiveshell frame and check from there. Is this something you'd like to pursue?

Something we could explore would be executing code via https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.interactiveshell.html#IPython.core.interactiveshell.InteractiveShell.run_cell. It's less e2e compared to actually running an interactive shell session, but might get us most of the way without having to fiddle with subprocesses or something like that.

milesgranger · 2023-05-23T13:47:49Z

Got something together with testing ignoring IPython frames bb7f77a, but wasn't as nice as I'd hoped for. Seems like some conflict with the current event loop due to trying to use a client inside of this IPython InteractiveShell. Tried a number of things, including but not limited to:

Not including @gen_cluster results in failing run_cell not returning anything
Not running in ProcessPool then interferes with the current event loop

Will try a few more things later, but putting it here for now in case anyone has suggestions or wanted to try their hand at it.

hendrikmakait · 2023-05-23T15:31:23Z

Maybe @graingert can help with event-loop-related problems?

hendrikmakait · 2023-05-24T11:49:57Z

I've also been trying my hand at creating a test and this is the best version I could come up with:

@gen_cluster(client=False)
async def test_ignore_ipython(s, a, b):
    pytest.importorskip("IPython")
    from IPython.core.interactiveshell import InteractiveShell

    source_code = """
import time
import dask
from distributed import Client
dask.config.set({{"distributed.diagnostics.computations.nframes": 3}})
with Client("{}") as client:
    def foo(x): print(x); return x;
    def bar(x): return client.map(foo, range(x))

    N = client.gather(bar(3))
""".format(s.address)

    with concurrent.futures.ProcessPoolExecutor() as executor:
            result = await asyncio.get_running_loop().run_in_executor(
                executor,
                run_in_ipython,
                source_code,
            )
    assert result.success
    computations = s.computations

    assert len(computations) == 1  # 1 computation
    computation = computations[0]
    assert len(computation.code) == 1  # 1 code
    code = computation.code[0]
    assert len(code) == 2 # 2 frames, even with nframes 3 other is ipython code

    def normalize(s):
        return re.sub(r"\s+", " ", s).strip()

    assert normalize(code[0]) == normalize(source_code)
    assert normalize(code[1]) == "def bar(x): return client.map(foo, range(x))"

Not running IPython in a separate process kept leaking a thread for me, so I couldn't use the run_cell_async functionality instead.

distributed/tests/test_client.py

Co-authored-by: Hendrik Makait <hendrik@makait.com>

distributed/client.py

graingert · 2023-05-25T11:37:00Z

distributed/tests/test_client.py

+    # directly in InteractiveShell (and does) but requires `--reruns=1`
+    # due to some underlying lag in the asyncio if ran by itself, but will
+    # otherwise run fine in the suite of tests.
+    with concurrent.futures.ProcessPoolExecutor() as executor:


I think it's a good idea to pass mp_context=multiprocessing.get_context("spawn") as this defaults to fork on linux, which I'm surprised is working in asyncio

crusaderky

Ready to merge after CI finishes

milesgranger requested a review from fjetter as a code owner April 18, 2023 09:05

milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from 1ab1e79 to 889c0dc Compare April 18, 2023 09:44

milesgranger added 4 commits May 8, 2023 11:54

Exclude IPython code from computations

07bafa5

Fixup: ignore IPython's interactive module functions

5b3aaac

Return last frame/code when ipython

6988425

Update to take first ipython from onwards

bc660ac

milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from b1b8585 to bc660ac Compare May 8, 2023 10:08

hendrikmakait added the needs review Needs review from a contributor. label May 10, 2023

crusaderky mentioned this pull request May 10, 2023

Computations meta-issue #7830

Open

ntabris reviewed May 16, 2023

View reviewed changes

distributed/client.py Outdated Show resolved Hide resolved

ntabris reviewed May 16, 2023

View reviewed changes

distributed/client.py Outdated Show resolved Hide resolved

milesgranger added 2 commits May 22, 2023 10:43

Rename is_ipython_idx -> ipython_idx

4bfbbff

Exclude first ipython entry function

865b222

Impl IPython interactive test for ignoring frames

bb7f77a

milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from e90d56b to bb7f77a Compare May 23, 2023 13:39

Fixup: Adjust to run w/o new process

35956fd

milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from 611b9f9 to 35956fd Compare May 24, 2023 10:50

hendrikmakait reviewed May 24, 2023

View reviewed changes

distributed/tests/test_client.py Outdated Show resolved Hide resolved

milesgranger and others added 2 commits May 24, 2023 14:06

Update distributed/tests/test_client.py

0ca5805

Co-authored-by: Hendrik Makait <hendrik@makait.com>

Fix suggestion

5f4ca92

hendrikmakait reviewed May 24, 2023

View reviewed changes

distributed/client.py Outdated Show resolved Hide resolved

milesgranger added 2 commits May 24, 2023 21:19

Refactor: join IPython logic and remove indexing

a66be39

Fixup: Fix pattern reference before assignment

2e249bc

graingert reviewed May 25, 2023

View reviewed changes

crusaderky added 2 commits May 25, 2023 15:09

Merge branch 'main' into 7777-exclude-ipython-code-from-computations

85b8aa0

Code review

f4cc4d0

crusaderky approved these changes May 25, 2023

View reviewed changes

hendrikmakait merged commit ff6327f into dask:main May 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exclude IPython code from computations #7788

Exclude IPython code from computations #7788

milesgranger commented Apr 18, 2023 •

edited

Loading

milesgranger commented Apr 18, 2023

github-actions bot commented Apr 18, 2023 •

edited

Loading

ntabris commented Apr 18, 2023

milesgranger commented Apr 25, 2023

ntabris commented Apr 25, 2023

milesgranger commented May 8, 2023

milesgranger commented May 15, 2023

ntabris commented May 16, 2023

hendrikmakait commented May 16, 2023

milesgranger commented May 22, 2023

hendrikmakait commented May 22, 2023 •

edited

Loading

milesgranger commented May 23, 2023 •

edited

Loading

hendrikmakait commented May 23, 2023

hendrikmakait commented May 24, 2023 •

edited

Loading

graingert May 25, 2023

crusaderky left a comment

Exclude IPython code from computations #7788

Exclude IPython code from computations #7788

Conversation

milesgranger commented Apr 18, 2023 • edited Loading

milesgranger commented Apr 18, 2023

github-actions bot commented Apr 18, 2023 • edited Loading

Unit Test Results

ntabris commented Apr 18, 2023

milesgranger commented Apr 25, 2023

ntabris commented Apr 25, 2023

milesgranger commented May 8, 2023

milesgranger commented May 15, 2023

ntabris commented May 16, 2023

hendrikmakait commented May 16, 2023

milesgranger commented May 22, 2023

hendrikmakait commented May 22, 2023 • edited Loading

milesgranger commented May 23, 2023 • edited Loading

hendrikmakait commented May 23, 2023

hendrikmakait commented May 24, 2023 • edited Loading

graingert May 25, 2023

Choose a reason for hiding this comment

crusaderky left a comment

Choose a reason for hiding this comment

milesgranger commented Apr 18, 2023 •

edited

Loading

github-actions bot commented Apr 18, 2023 •

edited

Loading

hendrikmakait commented May 22, 2023 •

edited

Loading

milesgranger commented May 23, 2023 •

edited

Loading

hendrikmakait commented May 24, 2023 •

edited

Loading