
gwas_tutorial.ipynb taking too long to run. #934

Open
benjeffery opened this issue Oct 17, 2022 · 10 comments · Fixed by #935

@benjeffery
Collaborator

benjeffery commented Oct 17, 2022

CI is currently failing (e.g. https://github.com/pystatgen/sgkit/actions/runs/3251065111/jobs/5362259020) because the GWAS tutorial notebook is timing out. (The default timeout is 30s; I've been running it locally for 5 min and it is still going.)

I assume this is a regression? Looking into it (I can't self-assign here yet).

@tomwhite
Collaborator

Thanks for opening this @benjeffery - I was just about to open the same issue! This is a regression - started on Friday.

I can reproduce locally and I get the following log:

Traceback (most recent call last):
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 730, in _async_poll_for_reply
    msg = await ensure_async(self.kc.shell_channel.get_msg(timeout=new_timeout))
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/util.py", line 96, in ensure_async
    result = await obj
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/jupyter_client/channels.py", line 230, in get_msg
    raise Empty
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/jupyter_cache/executors/utils.py", line 58, in single_nb_execution
    executenb(
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 1204, in execute
    return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/util.py", line 84, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/util.py", line 62, in just_run
    return loop.run_until_complete(coro)
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 663, in async_execute
    await self.async_execute_cell(
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 949, in async_execute_cell
    exec_reply = await self.task_poll_for_reply
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 754, in _async_poll_for_reply
    await self._async_handle_timeout(timeout, cell)
  File "/Users/tom/miniconda3/envs/sgkit-doc-3.8/lib/python3.8/site-packages/nbclient/client.py", line 801, in _async_handle_timeout
    raise CellTimeoutError.error_from_timeout_and_cell(
nbclient.exceptions.CellTimeoutError: A cell timed out while it was being executed, after 30 seconds.
The message was: Cell execution timed out.
Here is a preview of the cell contents:
-------------------
dp = ds.call_DP.where(ds.call_DP >= 0) # filter out missing
sample_dp_mean = dp.mean(dim="variants")
sample_dp_mean.attrs["long_name"] = "Mean Sample DP"
ds["sample_dp_mean"] = sample_dp_mean # add new data array to dataset
ds.plot.scatter(x="sample_dp_mean", y="sample_call_rate", size=8, s=10);
-------------------

Running the notebook manually doesn't cause the problem - that cell runs instantly.
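The slow cell does two things: a cheap reduction and the `ds.plot.scatter` call. As a hedged illustration of the reduction alone, here is a NumPy sketch of what the first two lines compute, assuming (as the `# filter out missing` comment implies) that missing depths are stored as negative integers in a `(variants, samples)` array. `sample_dp_mean_np` is a hypothetical helper, not an sgkit function; it mirrors xarray's default NaN-skipping `mean` on float data.

```python
import numpy as np


def sample_dp_mean_np(call_dp):
    """Per-sample mean depth, ignoring missing (negative) calls.

    Hypothetical NumPy analogue of:
        dp = ds.call_DP.where(ds.call_DP >= 0)
        dp.mean(dim="variants")
    call_dp is assumed to have shape (n_variants, n_samples).
    """
    call_dp = np.asarray(call_dp)
    # Replace missing entries with NaN, as DataArray.where does where the condition is False
    dp = np.where(call_dp >= 0, call_dp, np.nan)
    # Average over the variants axis, skipping NaN (xarray's default skipna behaviour)
    return np.nanmean(dp, axis=0)


# Toy input: 2 variants x 2 samples, with one missing call (-1)
print(sample_dp_mean_np([[10, -1], [20, 30]]))  # → [15. 30.]
```

A sample whose calls are all missing comes out as NaN (NumPy also emits a RuntimeWarning), which is consistent with the reduction itself being cheap and the plotting call being the suspect.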

@benjeffery
Collaborator Author

benjeffery commented Oct 17, 2022

> Running the notebook manually doesn't cause the problem - that cell runs instantly.

I'm not finding that! Locally the cell takes several minutes. (I'm on matplotlib==3.6.1, in case that makes any difference, since it seems to be plotting related.)

@benjeffery
Collaborator Author

Diffing the installed dependencies of the failing build against those of the last successful build shows that this is due to xarray==2022.10.0.
Locally, xarray==2022.9.0 completes the build in 44s. Looking into what changed.
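Diffing two environments can be done by eye, but a small helper makes it repeatable. A minimal sketch, assuming each build's dependencies were captured as `pip freeze`-style `name==version` lines (how sgkit's CI records them isn't shown in this thread); `parse_freeze` and `diff_freeze` are hypothetical names.

```python
def parse_freeze(text):
    """Map lower-cased package name -> version from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and unpinned entries such as '-e .' or 'pkg @ file://...'
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, ver = line.split("==", 1)
        pins[name.strip().lower()] = ver.strip()
    return pins


def diff_freeze(good, bad):
    """Return {name: (good_version, bad_version)} for every package that differs.

    A package present in only one listing has None on the other side.
    """
    old, new = parse_freeze(good), parse_freeze(bad)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# The two xarray versions compared in this thread
print(diff_freeze("xarray==2022.9.0\n", "xarray==2022.10.0\n"))
# → {'xarray': ('2022.9.0', '2022.10.0')}
```

Name normalisation here is only lower-casing; pip treats `-` and `_` as equivalent, so a fuller version would normalise those too.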

@tomwhite
Collaborator

You're right - I was running the wrong cell - I can reproduce it in the notebook now.

@tomwhite
Collaborator

From https://github.com/pydata/xarray/releases/tag/v2022.10.0: "This release brings numerous bugfixes, a change in minimum supported versions, and a new scatter plot method for DataArrays."

@benjeffery
Collaborator Author

Yeah, pydata/xarray#6778 completely replaced the scatter code.

@benjeffery
Collaborator Author

Heh, the output (after a while) is completely different!
[Screenshots: old output vs. new output]

@tomwhite
Collaborator

It's fine to pin xarray to an older version while we address this (if there's no obvious fix) - that would unblock the other issues.
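Alongside a pin such as `xarray<2022.10.0` (the exact constraint merged in #935 isn't shown in this thread), a runtime check can flag an environment that would hit the slow path. A minimal sketch, assuming plain three-part `YYYY.MM.PATCH` version strings and treating every release from 2022.10.0 onward as affected, since the thread doesn't identify a fixed release; the function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# First xarray release in which this thread observed the slow scatter plot.
AFFECTED_FROM = (2022, 10, 0)


def parse_calver(v):
    """Turn 'YYYY.MM.PATCH' into a tuple of ints; pre-release suffixes are not handled."""
    return tuple(int(part) for part in v.split(".")[:3])


def xarray_scatter_may_be_slow(v=None):
    """Return True if xarray version `v` (default: the installed one) is >= 2022.10.0."""
    if v is None:
        try:
            v = version("xarray")
        except PackageNotFoundError:
            return False  # xarray is not installed, so there is nothing to flag
    return parse_calver(v) >= AFFECTED_FROM


print(xarray_scatter_may_be_slow("2022.9.0"))   # → False
print(xarray_scatter_may_be_slow("2022.10.0"))  # → True
```

Comparing integer tuples avoids the string-ordering trap where `"2022.9.0" > "2022.10.0"`.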

benjeffery added a commit to benjeffery/sgkit that referenced this issue Oct 17, 2022
mergify bot closed this as completed in #935 Oct 17, 2022
mergify bot pushed a commit that referenced this issue Oct 17, 2022
tomwhite reopened this Sep 4, 2023
@tomwhite
Collaborator

tomwhite commented Sep 4, 2023

The underlying issue hasn't been fixed (see #1122), so it might be worth reporting upstream @benjeffery?

@jeromekelleher
Collaborator

Can we do the scatter plot with matplotlib or something to avoid the problem?
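One way to follow this suggestion is to call Matplotlib's `Axes.scatter` on the two per-sample arrays directly, bypassing `Dataset.plot.scatter`. A minimal sketch, not tested against the notebook: `plot_dp_vs_call_rate` is a hypothetical helper, the axis labels are assumptions, and `s=10` is carried over from the original cell. It takes an existing `Axes` so the caller controls figure size.

```python
def plot_dp_vs_call_rate(ax, sample_dp_mean, sample_call_rate, marker_size=10):
    """Scatter per-sample mean depth (x) against call rate (y) on a Matplotlib Axes.

    Calls Axes.scatter directly instead of xarray's Dataset.plot.scatter.
    NaN points (e.g. samples with no non-missing DP) are simply not drawn.
    """
    ax.scatter(sample_dp_mean, sample_call_rate, s=marker_size)
    ax.set_xlabel("Mean Sample DP")  # matches the long_name set in the notebook cell
    ax.set_ylabel("sample_call_rate")  # assumed label; the dataset's own attrs may differ
    return ax


# Hypothetical usage inside the tutorial notebook, where `ds` already exists:
#   import matplotlib.pyplot as plt
#   fig, ax = plt.subplots(figsize=(8, 8))
#   plot_dp_vs_call_rate(ax, ds["sample_dp_mean"].values, ds["sample_call_rate"].values)
```

Passing `.values` hands plain NumPy arrays to Matplotlib, so none of xarray's plotting machinery is involved in drawing the points.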
