Build is stuck for jupyter/scipy-notebook arm64 image #1539

Closed
mathbunnyru opened this issue Nov 26, 2021 · 37 comments · Fixed by #1593
Labels
type:Bug A problem with the definition of one of the docker images maintained here

Comments

@mathbunnyru
Member

There is a timeout and it happened several times already:
https://github.com/jupyter/docker-stacks/runs/4292319380?check_suite_focus=true

I think the reason might be in QEMU :(

@mathbunnyru mathbunnyru added the type:Bug label Nov 26, 2021
@mathbunnyru
Member Author

I have tried to restart manually several times - still no luck.

@mathbunnyru
Member Author

Stuck command:
https://github.com/jupyter/docker-stacks/runs/4292319380?check_suite_focus=true#step:7:7814

RUN mamba install --quiet --yes \
    'altair' \
    'beautifulsoup4' \
    'bokeh' \
    'bottleneck' \
    'cloudpickle' \
    'conda-forge::blas=*=openblas' \
    'cython' \
    'dask' \
    'dill' \
    'h5py' \
    'ipympl' \
    'ipywidgets' \
    'matplotlib-base' \
    'numba' \
    'numexpr' \
    'pandas' \
    'patsy' \
    'protobuf' \
    'pytables' \
    'scikit-image' \
    'scikit-learn' \
    'scipy' \
    'seaborn' \
    'sqlalchemy' \
    'statsmodels' \
    'sympy' \
    'widgetsnbextension' \
    'xlrd' && \
    mamba clean --all -f -y && \
    fix-permissions "/opt/conda" && \
    fix-permissions "/home/jovyan"

@mathbunnyru mathbunnyru changed the title from "Build is stuck for arm jupyter/scipy-notebook arm64" to "Build is stuck for jupyter/scipy-notebook arm64 image" Nov 26, 2021
@mathbunnyru mathbunnyru pinned this issue Dec 1, 2021
@mathbunnyru mathbunnyru unpinned this issue Dec 2, 2021
@mathbunnyru mathbunnyru reopened this Dec 3, 2021
@mathbunnyru mathbunnyru pinned this issue Dec 8, 2021
@mathbunnyru
Member Author

@consideRatio @minrk could you take a look please?
I do not have enough experience with qemu / arm to say what's going wrong.
We haven't been able to upload new images for 3 weeks already.

@minrk
Member

minrk commented Dec 9, 2021

I don't have any information or insight, but if it's a problem, I'd say disable the arm64 builds and try to fix it by re-enabling them in a PR. That should unstick things for the most part and reduce pressure on working out what changed (apparently between 6e246ea and 92ce0af which had no changes for the failing image).

FWIW, here's the diff in the mamba install output for the stuck stage:

--- 6e246ea4bbff1eb812b0520b01ab68a2a42b2dad.txt	2021-12-09 15:10:27.000000000 +0100
+++ 92ce0af9989fb5ca8997f744c0718ba20dc70cd8.txt	2021-12-09 15:11:15.000000000 +0100
@@ -1,5 +1,4 @@
 #14  [linux/arm64  3/7]  RUN  mamba  install  --quiet  --yes  'altair'  'beautifulsoup4'  'bokeh'  'bottleneck'  'cloudpickle'  'conda-forge::blas=*=openblas'  'cython'  'dask'  'dill'  'h5py'  'ipympl'  'ipywidgets'  'matplotlib-base'  'numba'  'numexpr'  'pandas'  'patsy'  'protobuf'  'pytables'  'scikit-image'  'scikit-learn'  'scipy'  'seaborn'  'sqlalchemy'  'statsmodels'  'sympy'  'widgetsnbextension'  'xlrd'  &&  mamba  clean  --all  -f  -y  &&  fix-permissions  "/opt/conda"  &&  fix-permissions  "/home/jovyan"
-#14  10.60  WARNING  conda.lock:touch(51):  Failed  to  create  lock,  do  not  run  conda  in  parallel  processes  [errno  13]
 #14  XX.XX  Package  Version  Build  Channel  Size
 #14  XX.XX  ───────────────────────────────────────────────────────────────────────────────────────────────────
 #14  XX.XX  Install:
@@ -18,18 +17,70 @@
 #14  XX.XX  +  brunsli  0.1  h01db608_0  conda-forge/linux-aarch64  196  KB
 #14  XX.XX  +  c-blosc2  2.0.4  h9a49097_1  conda-forge/linux-aarch64  216  KB
 #14  XX.XX  +  cached-property  1.5.2  hd8ed1ab_1  conda-forge/noarch  4  KB
+#14  XX.XX  +  cached_property  1.5.2  pyha770c72_1  conda-forge/noarch  11  KB
+#14  XX.XX  +  cfitsio  4.0.0  h152aa4d_0  conda-forge/linux-aarch64  1  MB
+#14  XX.XX  +  charls  2.2.0  h01db608_0  conda-forge/linux-aarch64  145  KB
+#14  XX.XX  +  click  8.0.3  py39ha65689a_1  conda-forge/linux-aarch64  147  KB
+#14  XX.XX  +  cloudpickle  2.0.0  pyhd8ed1ab_0  conda-forge/noarch  24  KB
+#14  XX.XX  +  cycler  0.11.0  pyhd8ed1ab_0  conda-forge/noarch  10  KB
+#14  XX.XX  +  cython  0.29.24  py39h99ab00b_1  conda-forge/linux-aarch64  2  MB
+#14  XX.XX  +  cytoolz  0.11.2  py39h14843e3_1  conda-forge/linux-aarch64  393  KB
+#14  XX.XX  +  dask  2021.11.2  pyhd8ed1ab_0  conda-forge/noarch  5  KB
+#14  XX.XX  +  dask-core  2021.11.2  pyhd8ed1ab_0  conda-forge/noarch  783  KB
+#14  XX.XX  +  dill  0.3.4  pyhd8ed1ab_0  conda-forge/noarch  62  KB
+#14  XX.XX  +  distributed  2021.11.2  py39ha65689a_0  conda-forge/linux-aarch64  1  MB
+#14  XX.XX  +  fonttools  4.28.1  py39h14843e3_0  conda-forge/linux-aarch64  2  MB
+#14  XX.XX  +  freetype  2.10.4  hdf53a3c_1  conda-forge/linux-aarch64  988  KB
+#14  XX.XX  +  fsspec  2021.11.0  pyhd8ed1ab_0  conda-forge/noarch  91  KB
+#14  XX.XX  +  giflib  5.2.1  hb9de7d4_2  conda-forge/linux-aarch64  78  KB
+#14  XX.XX  +  gmp  6.2.1  h7fd3ca4_0  conda-forge/linux-aarch64  737  KB
+#14  XX.XX  +  gmpy2  2.1.0rc1  py39hb332cb7_0  conda-forge/linux-aarch64  176  KB
+#14  XX.XX  +  h5py  3.4.0  nompi_py39hbdd1fc2_102  conda-forge/linux-aarch64  1  MB
+#14  XX.XX  +  hdf5  1.12.1  nompi_h774d4d8_101  conda-forge/linux-aarch64  4  MB
+#14  XX.XX  +  heapdict  1.0.1  py_0  conda-forge/noarch  7  KB
+#14  XX.XX  +  imagecodecs  2021.11.20  py39he536a17_0  conda-forge/linux-aarch64  8  MB
+#14  XX.XX  +  imageio  2.9.0  py_0  conda-forge/noarch  3  MB
+#14  XX.XX  +  ipympl  0.8.2  pyhd8ed1ab_0  conda-forge/noarch  43  KB
+#14  XX.XX  +  ipywidgets  7.6.5  pyhd8ed1ab_0  conda-forge/noarch  101  KB
+#14  XX.XX  +  jbig  2.1  hf897c2e_2003  conda-forge/linux-aarch64  44  KB
+#14  XX.XX  +  joblib  1.1.0  pyhd8ed1ab_0  conda-forge/noarch  210  KB
+#14  XX.XX  +  jpeg  9d  hfd2af3c_0  conda-forge/linux-aarch64  410  KB
+#14  XX.XX  +  jupyterlab_widgets  1.0.2  pyhd8ed1ab_0  conda-forge/noarch  130  KB
+#14  XX.XX  +  jxrlib  1.1  hf897c2e_2  conda-forge/linux-aarch64  247  KB
+#14  XX.XX  +  kiwisolver  1.3.2  py39hb300cb6_1  conda-forge/linux-aarch64  79  KB
+#14  XX.XX  +  lcms2  2.12  h012adcb_0  conda-forge/linux-aarch64  524  KB
+#14  XX.XX  +  lerc  3.0  h01db608_0  conda-forge/linux-aarch64  200  KB
+#14  XX.XX  +  libaec  1.0.6  h01db608_0  conda-forge/linux-aarch64  51  KB
+#14  XX.XX  +  libblas  3.9.0  12_linuxaarch64_openblas  conda-forge/linux-aarch64  12  KB
+#14  XX.XX  +  libbrotlicommon  1.0.9  hf897c2e_6  conda-forge/linux-aarch64  65  KB
+#14  XX.XX  +  libbrotlidec  1.0.9  hf897c2e_6  conda-forge/linux-aarch64  33  KB
+#14  XX.XX  +  libbrotlienc  1.0.9  hf897c2e_6  conda-forge/linux-aarch64  293  KB
+#14  XX.XX  +  libcblas  3.9.0  12_linuxaarch64_openblas  conda-forge/linux-aarch64  12  KB
+#14  XX.XX  +  libdeflate  1.8  hf897c2e_0  conda-forge/linux-aarch64  88  KB
+#14  XX.XX  +  libgfortran-ng  11.2.0  he9431aa_11  conda-forge/linux-aarch64  19  KB
+#14  XX.XX  +  libgfortran5  11.2.0  h440fb59_11  conda-forge/linux-aarch64  1  MB
+#14  XX.XX  +  liblapack  3.9.0  12_linuxaarch64_openblas  conda-forge/linux-aarch64  12  KB
+#14  XX.XX  +  liblapacke  3.9.0  12_linuxaarch64_openblas  conda-forge/linux-aarch64  12  KB
+#14  XX.XX  +  libllvm11  11.1.0  h6293a0b_2  conda-forge/linux-aarch64  29  MB
+#14  XX.XX  +  libopenblas  0.3.18  pthreads_h775ce2d_0  conda-forge/linux-aarch64  7  MB
+#14  XX.XX  +  libpng  1.6.37  hbd635b3_2  conda-forge/linux-aarch64  338  KB
+#14  XX.XX  +  libprotobuf  3.19.1  h469bdbd_0  conda-forge/linux-aarch64  2  MB
+#14  XX.XX  +  libtiff  4.3.0  h58d83a1_2  conda-forge/linux-aarch64  777  KB
+#14  XX.XX  +  libwebp-base  1.2.1  hf897c2e_0  conda-forge/linux-aarch64  861  KB
+#14  XX.XX  +  libzopfli  1.0.3  h01db608_0  conda-forge/linux-aarch64  167  KB
 #14  XX.XX  +  llvm-openmp  12.0.1  hd62202e_1  conda-forge/linux-aarch64  3  MB
-#14  XX.XX  +  llvmlite  0.37.0  py39h4ebc84c_0  conda-forge/linux-aarch64  3  MB
+#14  XX.XX  +  llvmlite  0.37.0  py39h4ebc84c_1  conda-forge/linux-aarch64  3  MB
 #14  XX.XX  +  locket  0.2.0  py_2  conda-forge/noarch  6  KB
-#14  XX.XX  +  matplotlib-base  3.4.3  py39hc6c69e0_2  conda-forge/linux-aarch64  7  MB
+#14  XX.XX  +  matplotlib-base  3.5.0  py39hc6c69e0_0  conda-forge/linux-aarch64  7  MB
 #14  XX.XX  +  mock  4.0.3  py39ha65689a_2  conda-forge/linux-aarch64  51  KB
 #14  XX.XX  +  mpc  1.2.1  h846f343_0  conda-forge/linux-aarch64  119  KB
 #14  XX.XX  +  mpfr  4.1.0  h719063d_1  conda-forge/linux-aarch64  435  KB
 #14  XX.XX  +  mpmath  1.2.1  pyhd8ed1ab_0  conda-forge/noarch  437  KB
 #14  XX.XX  +  msgpack-python  1.0.2  py39hb300cb6_2  conda-forge/linux-aarch64  91  KB
+#14  XX.XX  +  munkres  1.1.4  pyh9f0ad1d_0  conda-forge/noarch  12  KB
 #14  XX.XX  +  networkx  2.6.3  pyhd8ed1ab_1  conda-forge/noarch  1  MB
 #14  XX.XX  +  numba  0.54.1  py39h0974df5_0  conda-forge/linux-aarch64  4  MB
-#14  XX.XX  +  numexpr  2.7.3  py39h03da0bc_1  conda-forge/linux-aarch64  240  KB
+#14  XX.XX  +  numexpr  2.7.3  py39h54c71e5_2  conda-forge/linux-aarch64  236  KB
 #14  XX.XX  +  numpy  1.20.3  py39h43e3299_1  conda-forge/linux-aarch64  6  MB
 #14  XX.XX  +  olefile  0.46  pyh9f0ad1d_1  conda-forge/noarch  32  KB
 #14  XX.XX  +  openblas  0.3.18  pthreads_h9ce3df4_0  conda-forge/linux-aarch64  8  MB
@@ -58,8 +109,8 @@
 #14  XX.XX  +  threadpoolctl  3.0.0  pyh8a188c0_0  conda-forge/noarch  17  KB
 #14  XX.XX  +  tifffile  2021.11.2  pyhd8ed1ab_0  conda-forge/noarch  139  KB
 #14  XX.XX  +  toolz  0.11.2  pyhd8ed1ab_0  conda-forge/noarch  48  KB
-#14  XX.XX  +  typing_extensions  3.10.0.2  pyha770c72_0  conda-forge/noarch  28  KB
-#14  XX.XX  +  widgetsnbextension  3.5.2  py39ha65689a_0  conda-forge/linux-aarch64  1  MB
+#14  XX.XX  +  typing_extensions  4.0.0  pyha770c72_0  conda-forge/noarch  26  KB
+#14  XX.XX  +  widgetsnbextension  3.5.2  py39h4420490_1  conda-forge/linux-aarch64  1  MB
 #14  XX.XX  +  xlrd  2.0.1  pyhd8ed1ab_3  conda-forge/noarch  92  KB
 #14  XX.XX  +  zfp  0.5.5  h01db608_7  conda-forge/linux-aarch64  195  KB
 #14  XX.XX  +  zict  2.0.0  py_0  conda-forge/noarch  10  KB
@@ -72,10 +123,10 @@
 #14  XX.XX
 #14  XX.XX  Summary:
 #14  XX.XX
-#14  XX.XX  Install:  108  packages
+#14  XX.XX  Install:  110  packages
 #14  XX.XX  Change:  1  packages
 #14  XX.XX
-#14  XX.XX  Total  download:  204  MB
+#14  XX.XX  Total  download:  206  MB
 #14  XX.XX
 #14  XX.XX  ───────────────────────────────────────────────────────────────────────────────────────────────────
 #14  XX.XX

I don't understand where all the new dependencies are coming from. They don't seem to trace to any of the other changes that I can see.
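For anyone repeating this comparison: the diff above masks the per-line elapsed time as XX.XX so that only real changes show up. A minimal sketch of one way to do that follows. The file names and the sample timestamps are made up for the demo, and the two package lines are copied from the diff above; this is not necessarily how the original diff was produced.

```shell
#!/usr/bin/env bash
set -eu

# Replace the elapsed time that follows the build step number
# ("#14 10.60 ...") with a fixed token so it does not show up as noise.
normalize_log() {
    sed -E 's/^(#[0-9]+) +[0-9]+\.[0-9]+/\1 XX.XX/'
}

workdir="$(mktemp -d)"

# Two tiny stand-in logs. The timestamps are invented for the demo.
printf '%s\n' \
    '#14 41.10 + llvmlite 0.37.0 py39h4ebc84c_0' \
    '#14 41.11 + locket 0.2.0 py_2' > "${workdir}/good.txt"
printf '%s\n' \
    '#14 39.72 + llvmlite 0.37.0 py39h4ebc84c_1' \
    '#14 39.73 + locket 0.2.0 py_2' > "${workdir}/bad.txt"

normalize_log < "${workdir}/good.txt" > "${workdir}/good.norm"
normalize_log < "${workdir}/bad.txt" > "${workdir}/bad.norm"

# diff exits with status 1 when the inputs differ, which is expected here.
diff -u "${workdir}/good.norm" "${workdir}/bad.norm" || true
```

With the timestamps masked, only the llvmlite build string differs between the two stand-in logs.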

@minrk
Member

minrk commented Dec 9, 2021

Actually comparing the last successful and first failed build, it appears mamba was upgraded from 0.17 to 0.18. That seems very likely to be the cause, given that it's apparently a hang in mamba solve.

I'd try pinning mamba to '<0.18' to see if that helps.
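A rough sketch of what such a pin could look like, assuming it is recorded in conda's `conda-meta/pinned` file under the install prefix (`/opt/conda` in these images) and that the solver in use honors that file. The `pin_package` helper is hypothetical, and the demo writes to a scratch directory rather than a real prefix.

```shell
#!/usr/bin/env bash
set -eu

# Append a pin to <prefix>/conda-meta/pinned unless it is already there.
pin_package() {
    local prefix="$1" spec="$2"
    local pinned="${prefix}/conda-meta/pinned"
    mkdir -p "${prefix}/conda-meta"
    touch "${pinned}"
    grep -qxF "${spec}" "${pinned}" || echo "${spec}" >> "${pinned}"
}

# Demo against a scratch prefix; in the image this would be /opt/conda.
CONDA_PREFIX_DEMO="$(mktemp -d)"
pin_package "${CONDA_PREFIX_DEMO}" "mamba <0.18"
pin_package "${CONDA_PREFIX_DEMO}" "mamba <0.18"   # second call is a no-op
cat "${CONDA_PREFIX_DEMO}/conda-meta/pinned"
```

Passing the constraint on the command line instead (`mamba install 'mamba<0.18'`) only affects that one call, so a later install could upgrade mamba again. A pinned file is meant to persist across calls.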

@mathbunnyru
Member Author

Thanks @minrk!
I might have missed that :(

Created PR to try it out: #1545

@maresb
Contributor

maresb commented Feb 2, 2022

I wonder if this code block is still necessary?

In case it is still required, perhaps we should report this upstream to mamba-org/mamba. They would probably like to know.

@consideRatio
Collaborator

@maresb do you want to give it a test run by creating a PR and removing it - checking if our build gets stuck? I know mamba 0.20 was released quite recently.

@maresb
Contributor

maresb commented May 1, 2022

I hope we can close this. I just took a look at the current run, and it seems that it stalled on the docker push stage? Better than stalling at the build stage like before, but I wonder what's wrong? Maybe cosmic rays... 😂

@mathbunnyru
Member Author

I think the reason is that during the push step we are reusing an old image, not just a freshly built one.
Yes, it's really bad.
I will restart the build process a few times and hope the stuck build/push problem goes away.
Then I'll close this issue.

@mathbunnyru
Member Author

@maresb does running ./micromamba config set extract_threads 1 set this value for later mamba calls as well?
If it doesn't, we will still see stuck builds, because the original problems were with mamba install calls.

@mathbunnyru
Member Author

(base) jovyan@4ed975c01e37:~$ cat /home/jovyan/.condarc
extract_threads: 1

It seems that it does, so all should be good after several rebuilds.

@maresb
Contributor

maresb commented May 1, 2022

> I think the reason is that during the push step we are reusing an old image, not just a freshly built one.

I'm really confused about why build commands are running in the push step in the first place. I vaguely get that the various images seem to be building FROM the DockerHub images instead of the fresh images, and I'm very glad you're working on it.

But why build a second time? Am I missing something obvious?

Also, just an idea, sorry if this is way too naive, but would it simplify everything to make the various images into stages of a single Dockerfile? It seems like that way Docker would take care of the build dependency tree for you, so that you don't even have these problems in the first place.

@mathbunnyru
Member Author

> But why build a second time? Am I missing something obvious?

I don't know the reason myself :(
It might be because we build for the system arch first and then build for amd64+aarch64 using buildx.
Or maybe because we don't have enough disk space to store everything.

> Also, just an idea, sorry if this is way too naive, but would it simplify everything to make the various images into stages of a single Dockerfile? It seems like that way Docker would take care of the build dependency tree for you, so that you don't even have these problems in the first place.

This is a good suggestion. But we still have to deal with amd64/aarch64 differences (for example, we're not building everything under aarch64).
It also doesn't make sense to build everything and only then test everything. It makes sense to test as early as possible (otherwise you have to wait for all the builds even if the base image doesn't work).
Also, tagging is not an easy thing.
Overall, I can see some advantages, but I see many disadvantages as well.

> I'm very glad you're working on it

To be honest, I was hoping to merge #1631 one day, but there has been no progress.
Let's continue the discussion about how we should change the CI system and build process here: #1407, because this issue is about one particular problem (and I think you resolved it).

@mathbunnyru
Member Author

I ran a few rebuilds. I think this should be enough and the issue should finally be resolved.
Thanks for all your efforts and patience @maresb!

@mathbunnyru mathbunnyru unpinned this issue May 1, 2022
@mathbunnyru
Member Author

@maresb could you please take a look here?
https://github.com/jupyter/docker-stacks/runs/6253380468?check_suite_focus=true

The build step had been running for more than 3 h 30 min (it normally takes 2 h now), so I cancelled it.
The last operation was mamba install for aarch64 😢

In the beginning, we can see #6 [linux/arm64 1/7] FROM docker.io/jupyter/minimal-notebook@sha256:1979b6252a3f0e21d1bdc4654cd7482acc038baf301251a549d5f259d8c3fc33.
I took this exact image, pulled it, and checked that extract_threads: 1 was there.

@maresb
Contributor

maresb commented May 2, 2022

😢
I will look into this.

@mathbunnyru
Member Author

@mathbunnyru mathbunnyru reopened this May 2, 2022
@mathbunnyru mathbunnyru pinned this issue May 2, 2022
@mathbunnyru
Member Author

This is worse now than it was. It seems like it is always stuck now.

@maresb
Contributor

maresb commented May 2, 2022

I am absolutely baffled. I've been running docker run on the line where it hangs, but I'm unable to reproduce the problem locally. Previously I had no problem reproducing the issue locally.
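An intermittent hang like this can be probed by repeating the command under a deadline and counting how often the deadline is hit. Below is a minimal sketch, assuming GNU coreutils `timeout` is available. The `count_hangs` helper is hypothetical, and `sleep` and `true` stand in for the real `docker run ... mamba install ...` invocation.

```shell
#!/usr/bin/env bash
set -eu

# Run a command up to N times, each under a deadline, and count the hangs.
# GNU coreutils `timeout` exits with status 124 when the deadline expires.
count_hangs() {
    local attempts="$1" deadline="$2"
    shift 2
    local hangs=0 i=1 rc
    while [ "${i}" -le "${attempts}" ]; do
        rc=0
        timeout "${deadline}" "$@" >/dev/null 2>&1 || rc=$?
        if [ "${rc}" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "${hangs}"
}

# Stand-in commands so the sketch runs anywhere.
count_hangs 2 1s sleep 5    # both attempts exceed the deadline
count_hangs 2 1s true       # neither attempt hangs
```

When the wrapped command is `docker run`, killing the client on timeout may leave the container running, so a real loop would likely also need to name and remove the container.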

@mathbunnyru
Member Author

Did you pull the latest minimal image?

@maresb
Contributor

maresb commented May 2, 2022

I pulled by the SHA.

At least I do have one remaining trick, which is to set the environment variable G_SLICE=always-malloc before each RUN mamba install as per mamba-org/mamba#1611 (comment). It's a bit ugly though.
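To illustrate the "before each command" variant: a variable given as a prefix, or passed through `env`, is visible only to that one child process. In the sketch below the `with_always_malloc` wrapper is a hypothetical helper and `sh -c` stands in for `mamba install ...`.

```shell
#!/usr/bin/env bash
set -eu

# Run one command with G_SLICE=always-malloc in its environment only.
with_always_malloc() {
    env G_SLICE=always-malloc "$@"
}

# Stand-in for `mamba install ...`: show what the child process sees.
with_always_malloc sh -c 'echo "child sees G_SLICE=${G_SLICE}"'

# The calling shell is left untouched.
echo "parent sees G_SLICE=${G_SLICE:-<unset>}"
```

In a Dockerfile, the same per-command scoping would be written as a prefix inside the `RUN` line, whereas an `ENV` instruction would apply to every later step and to containers started from the image.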

I'm going to try a few more obscure possibilities to reproduce this locally.

Have you seen any failures with Micromamba or is it exclusively with Mamba on subsequent images?

@mathbunnyru
Member Author

I've seen it happen only on mamba install.

@mathbunnyru
Member Author

Is there a chance that extract_threads applies to micromamba only?

@maresb
Contributor

maresb commented May 2, 2022

I was wondering about that... they share libmamba and I believe the changes are all there, so theoretically I believe the answer is yes, but I wanted to double-check that.

@maresb
Contributor

maresb commented May 2, 2022

I'm a bit confused that extract_threads=1 is missing here:

$ docker run --rm -it --platform=amd64 docker.io/jupyter/minimal-notebook@sha256:1979b6252a3f0e21d1bdc4654cd7482acc038baf301251a549d5f259d8c3fc33 cat /opt/conda/.condarc
# Conda configuration see https://conda.io/projects/conda/en/latest/configuration.html

auto_update_conda: false
show_channel_urls: true
channels:
  - conda-forge

@mathbunnyru
Member Author

mathbunnyru commented May 2, 2022

It's not in this file, but in this one: /home/jovyan/.condarc
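Since the setting can live in more than one condarc file, here is a small sketch for checking each candidate location. The `condarc_value` helper is hypothetical and does a plain text match rather than real YAML parsing. The demo files mirror the contents quoted in this thread and are created in a scratch directory.

```shell
#!/usr/bin/env bash
set -eu

# Print the value of a top-level "key: value" entry in a condarc-style file.
# Prints nothing if the file or the key is missing.
condarc_value() {
    local file="$1" key="$2"
    [ -f "${file}" ] || return 0
    sed -n "s/^${key}:[[:space:]]*//p" "${file}"
}

# Demo files mirroring the two locations discussed above.
demo="$(mktemp -d)"
mkdir -p "${demo}/opt/conda" "${demo}/home/jovyan"
printf 'auto_update_conda: false\nshow_channel_urls: true\n' \
    > "${demo}/opt/conda/.condarc"
printf 'extract_threads: 1\n' > "${demo}/home/jovyan/.condarc"

for rc in "${demo}/opt/conda/.condarc" "${demo}/home/jovyan/.condarc"; do
    echo "${rc}: extract_threads=$(condarc_value "${rc}" extract_threads)"
done
```

Inside a real container the loop would point at `/opt/conda/.condarc` and `/home/jovyan/.condarc` directly.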

@mathbunnyru
Member Author

--platform=amd64: are you sure you want to run the amd64 image?
I think you should use aarch64 or something like that.
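For reference, a sketch of the same check against the arm64 variant of that image, reading the per-user condarc. It assumes Docker's platform string `linux/arm64` and, on an amd64 host, a working QEMU/binfmt setup. The `inspect_condarc` helper and the `DRY_RUN` switch are hypothetical. By default the command is only printed, so the sketch can be checked on a machine without Docker.

```shell
#!/usr/bin/env bash
set -eu

# Image digest quoted in the build log earlier in this thread.
IMAGE='docker.io/jupyter/minimal-notebook@sha256:1979b6252a3f0e21d1bdc4654cd7482acc038baf301251a549d5f259d8c3fc33'

# Print the docker invocation instead of running it unless DRY_RUN=0.
inspect_condarc() {
    local platform="$1"
    set -- docker run --rm --platform="${platform}" "${IMAGE}" \
        cat /home/jovyan/.condarc
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

inspect_condarc linux/arm64
```

Running the script with `DRY_RUN=0` executes the command for real.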

@maresb
Contributor

maresb commented May 2, 2022

Aaaahhhh! Very nice catch. I seem to be a bit flustered at the moment! Such a silly mistake!

@mathbunnyru
Member Author

mathbunnyru commented May 2, 2022

No worries ❤️
Please check how the arm images behave. I'm sure you can do this better than me because you have a lot of experience with this issue.
May I also ask: when you tested, did you only check that micromamba works fine, or did you check mamba as well?

@maresb
Contributor

maresb commented May 2, 2022

I got a bit mentally tripped up by the CI bug. I figured that I could test mamba by starting a batch of CI runs on my branch, as per your old suggestion. They all completed without hanging. Thus I mentally gave it a ✔️. Sometime later I noticed that the Mamba version was old, and then connected it up with the problem of CI pulling the old images. But I didn't mentally revoke the ✔️ until just now.

@mathbunnyru
Member Author

I think we can try ENV G_SLICE=always-malloc in all Dockerfiles (in all of them, to sidestep even the image-pulling problem) and see if it works.

@mathbunnyru
Member Author

Or we can figure out a way to make mamba respect the extract_threads setting (I think it doesn't now).

@maresb
Contributor

maresb commented May 2, 2022

Do you want to try out ENV G_SLICE=always-malloc while I do the local testing?

@mathbunnyru
Member Author

Yes, I will draft a PR.

@mathbunnyru
Member Author

#1696

I'm going to sleep and will take a look tomorrow.
I currently think one successful build would be a good thing 😆

@mathbunnyru
Member Author

Closing this one, finally 👍
I hope the upstream project will fix the issue so that we can get rid of the ugly code in our Dockerfiles.

4 participants