Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude IPython code from computations #7788

Merged

Conversation

milesgranger
Copy link
Contributor

@milesgranger milesgranger commented Apr 18, 2023

Closes #7777

  • Tests added / passed
  • Passes pre-commit run --all-files

@ntabris wasn't sure how pedantic one should be to verify the run_code is indeed the IPython's run_code we want to ignore, so feel free to recommend adjustments. :)

@milesgranger milesgranger requested a review from fjetter as a code owner April 18, 2023 09:05
@milesgranger
Copy link
Contributor Author

Updated to ignore any frames in IPython.core.interactiveshell module, since other random functions from there also showed up and turned into whack-a-mole. Do you think anything within IPython ought to be ignored? Right now, the original issue will now give back:

In [1]: import dask
   ...: from distributed import Client
   ...: client = Client()
   ...: dask.config.set({"distributed.diagnostics.computations.nframes": 2})
   ...:
   ...: def foo(x): print(x); return x;
   ...: def zip(x): return foo(x);
   ...: N = client.map(zip, range(2))

In [2]: print(client.cluster.scheduler.computations[-1].code[0][0])
    def run_cell(
        self,
        raw_cell,
        store_history=False,
        silent=False,
        shell_futures=True,
        cell_id=None,
    ):
        """Run a complete IPython cell.

        Parameters
        ----------
        raw_cell : str
            The code (including IPython code such as %magic functions) to run.
        store_history : bool
            If True, the raw and translated cell will be stored in IPython's
            history. For user code calling back into IPython's machinery, this
            should be set to False.
        silent : bool
            If True, avoid side-effects, such as implicit displayhooks and
            and logging.  silent=True forces store_history=False.
        shell_futures : bool
            If True, the code will share future statements with the interactive
            shell. It will both be affected by previous __future__ imports, and
            any __future__ imports in the code will affect the shell. If False,
            __future__ imports are not shared in either direction.

        Returns
        -------
        result : :class:`ExecutionResult`
        """
        result = None
        try:
            result = self._run_cell(
                raw_cell, store_history, silent, shell_futures, cell_id
            )
        finally:
            self.events.trigger('post_execute')
            if not silent:
                self.events.trigger('post_run_cell', result)
        return result

Which is coming from a separate module in IPython.

@milesgranger milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from 1ab1e79 to 889c0dc Compare April 18, 2023 09:44
@github-actions
Copy link
Contributor

github-actions bot commented Apr 18, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       20 files  ±  0         20 suites  ±0   11h 13m 8s ⏱️ - 42m 3s
  3 642 tests +  2    3 533 ✔️ +  2     108 💤 ±0  1 ±0 
35 210 runs  +20  33 447 ✔️ +16  1 762 💤 +4  1 ±0 

For more details on these failures, see this check.

Results for commit f4cc4d0. ± Comparison against base commit 6b04277.

♻️ This comment has been updated with latest results.

@ntabris
Copy link
Contributor

ntabris commented Apr 18, 2023

turned into whack-a-mole

Yeah, I briefly tried to address this by adding modules to the list at

and new things kept showing up... including traitlets, so just doing IPython namespace also wasn't enough.

The goal here is just to collect code that's interesting and relevant to the Dask cluster is going to be doing. I don't think anything in IPython would count. I suspect that once you hit an IPython-related frame, then that and all higher frames would not be interesting and relevant. But I'm not certain about that... maybe IPython does weird things I wouldn't expect.

@milesgranger
Copy link
Contributor Author

I suspect that once you hit an IPython-related frame, then that and all higher frames would not be interesting and relevant.

Went with this suspicion (b1b8585) and seems to work well enough for me in my trials.

For example, the original issue is as expected:

In [1]: import dask
   ...: from distributed import Client
   ...: client = Client()
   ...: dask.config.set({"distributed.diagnostics.computations.nframes": 2})
   ...:  
   ...: def foo(x): print(x); return x;
   ...: def zip(x): return foo(x);
   ...: N = client.map(zip, range(2))
   ...: import time
   ...: time.sleep(2)
In [2]: print(client.cluster.scheduler.computations[-1].code[0][0])
1
0
import dask
from distributed import Client
client = Client()
dask.config.set({"distributed.diagnostics.computations.nframes": 2})

def foo(x): print(x); return x;
def zip(x): return foo(x);
N = client.map(zip, range(2))
import time
time.sleep(2)
print(client.cluster.scheduler.computations[-1].code[0][0])

@ntabris
Copy link
Contributor

ntabris commented Apr 25, 2023

Went with this suspicion (b1b8585) and seems to work well enough for me in my trials.

Are you assuming there's only a single frame of user code? (I gave your code a quick read and that's what it looks like). If so, that's not correct.

Here's another example where you can see this:

import dask
from distributed import Client
client = Client()
dask.config.set({"distributed.diagnostics.computations.nframes": 3})

def foo(x): print(x); return x;
def bar(x): return client.map(foo, range(x))

N = bar(3)

client.cluster.scheduler.computations[-1].code[0][2] is def bar(x): return client.map(foo, range(x)), [0][1] is import dask through N = bar(3), and [0][0] is ipython code.

@milesgranger milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from b1b8585 to bc660ac Compare May 8, 2023 10:08
@milesgranger
Copy link
Contributor Author

Sorry for the delay, and you were (unsurprisingly) quite right. I've updated to take the first encountered ipython related frame onwards, so the example now works without taking just the last frame of user code

@hendrikmakait hendrikmakait added the needs review Needs review from a contributor. label May 10, 2023
@milesgranger
Copy link
Contributor Author

@ntabris when you have time to give this another look, I think it may be what we're after now.

@ntabris
Copy link
Contributor

ntabris commented May 16, 2023

@milesgranger can you say more about what testing you've done / are expecting someone else to do? I have a few nits, but if this has been tested and works, I'd be happy to approve.

@hendrikmakait
Copy link
Member

@milesgranger: Is there a way we can automatically test this? What additional packages would we need to install on CI?

@milesgranger
Copy link
Contributor Author

@hendrikmakait @ntabris There exists no tests for the current IPython logic in this same function (AFAICT), so was kinda going off that being 'okay'.

Given that it is testing specifically if a frame is part of IPython.core.interactiveshell, seems maybe the only way to test it is to use mock to simulate an interactiveshell frame and check from there. Is this something you'd like to pursue?

@hendrikmakait
Copy link
Member

hendrikmakait commented May 22, 2023

@hendrikmakait @ntabris There exists no tests for the current IPython logic in this same function (AFAICT), so was kinda going off that being 'okay'.

I always get nervous around untested code in particular if I don't know how if this is more of a fringe case to handle or something that happens frequently.

Given that it is testing specifically if a frame is part of IPython.core.interactiveshell, seems maybe the only way to test it is to use mock to simulate an interactiveshell frame and check from there. Is this something you'd like to pursue?

Something we could explore would be executing code via https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.interactiveshell.html#IPython.core.interactiveshell.InteractiveShell.run_cell. It's less e2e compared to actually running an interactive shell session, but might get us most of the way without having to fiddle with subprocesses or something like that.

@milesgranger milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from e90d56b to bb7f77a Compare May 23, 2023 13:39
@milesgranger
Copy link
Contributor Author

milesgranger commented May 23, 2023

Got something together with testing ignoring IPython frames bb7f77a, but wasn't as nice as I'd hoped for. Seems like some conflict with the current event loop due to trying to use a client inside of this IPython InteractiveShell. Tried a number of things, including but not limited to:

  • Not including @gen_cluster results in failing run_cell not returning anything
  • Not running in ProcessPool then interferes with the current event loop

Will try a few more things later, but putting it here for now in case anyone has suggestions or wanted to try their hand at it.

@hendrikmakait
Copy link
Member

Maybe @graingert can help with event-loop-related problems?

@milesgranger milesgranger force-pushed the 7777-exclude-ipython-code-from-computations branch from 611b9f9 to 35956fd Compare May 24, 2023 10:50
@hendrikmakait
Copy link
Member

hendrikmakait commented May 24, 2023

I've also been trying my hand at creating a test and this is the best version I could come up with:

@gen_cluster(client=False)
async def test_ignore_ipython(s, a, b):
    pytest.importorskip("IPython")
    from IPython.core.interactiveshell import InteractiveShell

    source_code = """
import time
import dask
from distributed import Client
dask.config.set({{"distributed.diagnostics.computations.nframes": 3}})
with Client("{}") as client:
    def foo(x): print(x); return x;
    def bar(x): return client.map(foo, range(x))

    N = client.gather(bar(3))
""".format(s.address)

    with concurrent.futures.ProcessPoolExecutor() as executor:
            result = await asyncio.get_running_loop().run_in_executor(
                executor,
                run_in_ipython,
                source_code,
            )
    assert result.success
    computations = s.computations

    assert len(computations) == 1  # 1 computation
    computation = computations[0]
    assert len(computation.code) == 1  # 1 code
    code = computation.code[0]
    assert len(code) == 2 # 2 frames, even with nframes 3 other is ipython code

    def normalize(s):
        return re.sub(r"\s+", " ", s).strip()

    assert normalize(code[0]) == normalize(source_code)
    assert normalize(code[1]) == "def bar(x): return client.map(foo, range(x))"

Not running IPython in a separate process kept leaking a thread for me, so I couldn't use the run_cell_async functionality instead.

milesgranger and others added 2 commits May 24, 2023 14:06
Co-authored-by: Hendrik Makait <hendrik@makait.com>
# directly in InteractiveShell (and does) but requires `--reruns=1`
# due to some underlying lag in the asyncio if ran by itself, but will
# otherwise run fine in the suite of tests.
with concurrent.futures.ProcessPoolExecutor() as executor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to pass mp_context=multiprocessing.get_context("spawn") as this defaults to fork on linux, which I'm surprised is working in asyncio

Copy link
Collaborator

@crusaderky crusaderky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge after CI finishes

@hendrikmakait hendrikmakait merged commit ff6327f into dask:main May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs review Needs review from a contributor.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exclude ipython code from computations
5 participants