Docker builds using GitHub Actions #274

Closed


swelborn
Collaborator

This PR adds a GitHub Actions Docker build workflow to speed up development cycles, avoid Docker build timeouts on Docker Hub, and prepare for upcoming changes (the Python and Ubuntu versions currently in use will soon be out of support).

Instead of pip-installing everything, the new Docker images contain a conda environment. This makes it possible to add packages to stempy that are not pip-installable.

While migrating to newer versions, I first bumped to mpich 4 before changing Python. After bumping Python 3.7.3 to 3.8 or 3.9, a segfault occurs when running MPI counting. The backtrace is below. The segfault does not seem to occur in the non-MPI version.

The new images have been tested briefly, but they need more rigorous testing before they replace the images people are actively using. Thus, a separate job within the workflow builds the original images, with minor modifications to the Dockerfile (for some reason the COPY command was not recursively copying pybind11). New versions will be under different tags.

The following images will be built:

  • openchemistry/stempy:latest (python 3.7.3, uses original dockerfile)
  • openchemistry/stempy:latest-conda-jammy (python 3.9, doesn't need mpich)
  • openchemistry/stempy-ipykernel:latest (python 3.7.3, uses original dockerfile)
  • openchemistry/stempy-ipykernel:latest-conda-jammy (python 3.9, doesn't need mpich)
  • openchemistry/stempy-mpi:latest (mpich 3, python 3.7.3, uses original dockerfile)
  • openchemistry/stempy-mpi:latest-conda (mpich 4, python 3.7.3)
  • openchemistry/stempy-mpi:latest-conda-jammy (python 3.7.3, does not build mpich, uses apt-get mpich/libmpich-dev)

along with base images that serve as a "cache":

  • openchemistry/stempy-base:latest-conda-jammy
  • openchemistry/stempy-mpi-base:latest-conda
  • openchemistry/stempy-mpi-base:latest-conda-jammy

Here is the backtrace:

#0  0x0000000000580db8 in PyContextVar_Get (ovar=0x7fead4ec99a0, def=0x0, val=0x7fff33f9a400)
    at /usr/local/src/conda/python-3.9.15/Python/context.c:197
197	/usr/local/src/conda/python-3.9.15/Python/context.c: No such file or directory.
[Current thread is 1 (Thread 0x7fecef636740 (LWP 110556))]
(gdb) bt
#0  0x0000000000580db8 in PyContextVar_Get (ovar=0x7fead4ec99a0, def=0x0, val=0x7fff33f9a400)
    at /usr/local/src/conda/python-3.9.15/Python/context.c:197
#1  0x00007feceed14389 in PyDataMem_GetHandler ()
   from /opt/miniconda3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#2  0x00007feceed5c495 in PyArray_NewFromDescr_int ()
   from /opt/miniconda3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#3  0x00007feceed5c916 in PyArray_NewFromDescr ()
   from /opt/miniconda3/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
#4  0x00007feacf9462d7 in pybind11::array::array (this=0x7fff33f9a870, dt=..., shape=..., strides=...,
    ptr=0x0, base=...) at /source/stempy/thirdparty/pybind11/include/pybind11/numpy.h:736
#5  0x00007feacf9466d1 in pybind11::array::array<unsigned int> (base=..., ptr=0x0, strides=...,
    shape=..., this=0x7fff33f9a870) at /source/stempy/thirdparty/pybind11/include/pybind11/numpy.h:765
#6  pybind11::array::array<unsigned int> (this=0x7fff33f9a870, count=<optimized out>, ptr=0x0, base=...)
    at /source/stempy/thirdparty/pybind11/include/pybind11/numpy.h:773
#7  0x00007feacf94692f in stempy::as_pyarray<std::vector<unsigned int, std::allocator<unsigned int> > >
    (seq=...) at /source/stempy/python/image.cpp:42
#8  0x00007feacf946dcc in stempy::ElectronCountedDataPyArray::ElectronCountedDataPyArray (
    this=0x7fff33f9ab10, other=...) at /usr/include/c++/11/bits/move.h:104
#9  0x00007feacf947b5a in stempy::electronCount<stempy::SectorStreamMultiPassThreadedReader> (
    reader=0x8c16b0, options=...) at /source/stempy/python/image.cpp:227
#10 0x00007feacf958f64 in pybind11::detail::argument_loader<stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&>::call_impl<stempy::ElectronCountedDataPyArray, stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), 0ul, 1ul, pybind11::gil_scoped_release>(stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::gil_scoped_release&&) && (f=<optimized out>, f=<optimized out>, this=0x7fff33f9aae0)
    at /source/stempy/thirdparty/pybind11/include/pybind11/detail/type_caster_base.h:976
#11 pybind11::detail::argument_loader<stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&>::call<stempy::ElectronCountedDataPyArray, pybind11::gil_scoped_release, stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&)>(stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&)) && (f=<optimized out>, this=0x7fff33f9aae0)
    at /source/stempy/thirdparty/pybind11/include/pybind11/cast.h:1412
#12 pybind11::cpp_function::initialize<stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), stempy::ElectronCountedDataPyArray, stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release> >(stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), stempy::ElectronCountedDataPyArray (*)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (__closure=0x0, call=...)
    at /source/stempy/thirdparty/pybind11/include/pybind11/pybind11.h:248
#13 pybind11::cpp_function::initialize<stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), stempy::ElectronCountedDataPyArray, stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&, pybind11::name, pybind11::scope, pybind11::sibling, pybind11::call_guard<pybind11::gil_scoped_release> >(stempy::ElectronCountedDataPyArray (*&)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), stempy::ElectronCountedDataPyArray (*)(stempy::SectorStreamMultiPassThreadedReader*, stempy::ElectronCountOptionsPy const&), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, pybind11::call_guard<pybind11::gil_scoped_release> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () at /source/stempy/thirdparty/pybind11/include/pybind11/pybind11.h:223
#14 0x00007feacf953b7d in pybind11::cpp_function::dispatcher (self=<optimized out>,
    args_in=0x7fead2a72680, kwargs_in=0x0)
    at /source/stempy/thirdparty/pybind11/include/pybind11/pybind11.h:939
#15 0x0000000000507457 in cfunction_call (func=0x7fead2a47db0, args=<optimized out>,
    kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.15/Objects/methodobject.c:543
#16 0x00000000004f068c in _PyObject_MakeTpCall (tstate=0x76f8c0, callable=0x7fead2a47db0,
    args=<optimized out>, nargs=<optimized out>, keywords=0x0)
    at /usr/local/src/conda/python-3.9.15/Objects/call.c:191
#17 0x00000000004ec9fb in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
    args=0xd66720, callable=0x7fead2a47db0, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.15/Include/cpython/abstract.h:116
#18 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xd66720,
    callable=0x7fead2a47db0, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.15/Include/cpython/abstract.h:103
#19 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xd66720, callable=0x7fead2a47db0)
    at /usr/local/src/conda/python-3.9.15/Include/cpython/abstract.h:127
#20 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x76f8c0)
    at /usr/local/src/conda/python-3.9.15/Python/ceval.c:5077
#21 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xd664c0, throwflag=<optimized out>)
    at /usr/local/src/conda/python-3.9.15/Python/ceval.c:3489
#22 0x00000000004e689a in _PyEval_EvalFrame (throwflag=0, f=0xd664c0, tstate=0x76f8c0)
    at /usr/local/src/conda/python-3.9.15/Include/internal/pycore_ceval.h:40
#23 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x7fecef468a18,
    kwargs=0x7d1930, kwcount=<optimized out>, kwstep=1, defs=0x7fead3546d58, defcount=<optimized out>,
    kwdefs=0x0, closure=0x0, name=0x7fecef35ae70, qualname=0x7fecef35ae70)
    at /usr/local/src/conda/python-3.9.15/Python/ceval.c:4329
#24 0x00000000004f7c64 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>,
    nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.15/Objects/call.c:396
#25 0x00000000004e89d4 in _PyObject_VectorcallTstate (kwnames=0x7fecef468a00, nargsf=<optimized out>,
    args=<optimized out>, callable=0x7fead2a4e790, tstate=0x76f8c0)
    at /usr/local/src/conda/python-3.9.15/Include/cpython/abstract.h:118
#26 PyObject_Vectorcall (kwnames=0x7fecef468a00, nargsf=<optimized out>, args=<optimized out>,
    callable=0x7fead2a4e790) at /usr/local/src/conda/python-3.9.15/Include/cpython/abstract.h:127
#27 call_function (kwnames=0x7fecef468a00, oparg=<optimized out>, pp_stack=<synthetic pointer>,
    tstate=<optimized out>) at /usr/local/src/conda/python-3.9.15/Python/ceval.c:5077
#28 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7d17b0, throwflag=<optimized out>)
    at /usr/local/src/conda/python-3.9.15/Python/ceval.c:3537
#29 0x00000000004e689a in _PyEval_EvalFrame (throwflag=0, f=0x7d17b0, tstate=0x76f8c0)
    at /usr/local/src/conda/python-3.9.15/Include/internal/pycore_ceval.h:40
#30 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0,
    kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0,
    name=0x0, qualname=0x0) at /usr/local/src/conda/python-3.9.15/Python/ceval.c:4329
#31 0x00000000004e6527 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>,
    kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0,
    qualname=0x0) at /usr/local/src/conda/python-3.9.15/Python/ceval.c:4361
#32 0x00000000004e64d9 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>,
    locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>,
    kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0)
    at /usr/local/src/conda/python-3.9.15/Python/ceval.c:4377
#33 0x000000000059329b in PyEval_EvalCode (co=co@entry=0x7fecef4a6660,
    globals=globals@entry=0x7fecef513740, locals=locals@entry=0x7fecef513740)
    at /usr/local/src/conda/python-3.9.15/Python/ceval.c:828
#34 0x00000000005c0ad7 in run_eval_code_obj (tstate=0x76f8c0, co=0x7fecef4a6660,
    globals=0x7fecef513740, locals=0x7fecef513740)
    at /usr/local/src/conda/python-3.9.15/Python/pythonrun.c:1221
#35 0x00000000005bcb00 in run_mod (mod=<optimized out>, filename=<optimized out>,
    globals=0x7fecef513740, locals=0x7fecef513740, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.9.15/Python/pythonrun.c:1242
#36 0x00000000004566f4 in pyrun_file (fp=0x80e8a0, filename=0x7fecef356ae0, start=<optimized out>,
    globals=0x7fecef513740, locals=0x7fecef513740, closeit=1, flags=0x7fff33f9b5c8)
    at /usr/local/src/conda/python-3.9.15/Python/pythonrun.c:1140
#37 0x00000000005b67e2 in pyrun_simple_file (flags=0x7fff33f9b5c8, closeit=1, filename=0x7fecef356ae0,
    fp=0x80e8a0) at /usr/local/src/conda/python-3.9.15/Python/pythonrun.c:450
#38 PyRun_SimpleFileExFlags (fp=0x80e8a0, filename=<optimized out>, closeit=1, flags=0x7fff33f9b5c8)
    at /usr/local/src/conda/python-3.9.15/Python/pythonrun.c:483
#39 0x00000000005b3d5e in pymain_run_file (cf=0x7fff33f9b5c8, config=0x76de90)
    at /usr/local/src/conda/python-3.9.15/Modules/main.c:379

@swelborn swelborn added the dependencies Pull requests that update a dependency file label Jan 13, 2023
@swelborn swelborn force-pushed the update-dockerfile-staging branch 3 times, most recently from be0f032 to c835914 Compare January 14, 2023 00:14
- bump mpich version to 4
- after mpich bump, tried to bump python version
- this did not work, resulted in segfault in mpi version
@swelborn swelborn force-pushed the update-dockerfile-staging branch from c835914 to e2f5600 Compare January 14, 2023 00:20
- previous commits included bumping to py39.
- going back to py3.7.3 as bumping created segfault in mpi version
- added info on build readme
@swelborn swelborn force-pushed the update-dockerfile-staging branch from e2f5600 to eefba22 Compare January 14, 2023 00:21
- adds pull_request to actions
- use COPY instead of git clone, for testing branches
- builds using dockerhub org instead of "samwelborn" or "openchemistry"
- GH checkout action changed to recursively grab submodules
@swelborn swelborn force-pushed the update-dockerfile-staging branch from 2d3fa48 to 1cc8150 Compare January 18, 2023 20:50
@psavery
Collaborator

psavery commented Jan 20, 2023

It looks to me like the segmentation fault is occurring inside this constructor, which mainly converts the vector to a pybind11 array. I have two guesses for fixes:

  1. Maybe we also need to upgrade pybind11 if we upgrade Python. The version of pybind11 being used is ~2 years old, and we may need some changes for Python >= 3.8.
  2. Maybe that vectorToPyArray() function doesn't work anymore with newer versions of Python, and we need to modify it.

@swelborn Can you try also upgrading the version of pybind11 when you upgrade the version of Python, and see if you still encounter the error?

Upgrading pybind11, however, may also cause some compile errors due to API changes, but we'll see what happens!

@cjh1
Member

cjh1 commented Jan 20, 2023

@psavery I believe we already tried upgrading the pybind11 version.

@psavery
Collaborator

psavery commented Jan 24, 2023

I wonder if we need to acquire the GIL? Like adding this at the beginning of the function:

    py::gil_scoped_acquire acquire;

But then again, I don't know how the GIL could be a problem only for the MPI version.

@swelborn
Collaborator Author

swelborn commented Feb 1, 2023

@psavery @cjh1 I will open a new issue on upgrading the Python version. Here is a related GitHub issue: pybind/pybind11#1042. It seems that is where the vectorToPyArray() function came from. People do not seem to be complaining about this problem in that thread, but many of those posts were made a couple of years ago.

@cjh1
Member

cjh1 commented Feb 2, 2023

@swelborn Are the two failures expected?

@swelborn
Collaborator Author

swelborn commented Feb 7, 2023

@cjh1 The reason for that is likely that I have set ${{ vars.DOCKERHUB_ORG }} incorrectly. It should yield openchemistry, but it is appearing blank. Is that the correct name for the environment variable?
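
A blank organization would produce an image reference such as `/stempy:latest`, which cannot be pushed. The following is a minimal sketch of a guard that fails early with a clear message; `image_ref` is a hypothetical helper for illustration and is not part of this PR's workflow.

```python
def image_ref(org: str, repo: str, tag: str) -> str:
    """Compose 'org/repo:tag', refusing blank components.

    Hypothetical helper: the real workflow composes this string in YAML.
    """
    for name, value in (("org", org), ("repo", repo), ("tag", tag)):
        if not value or not value.strip():
            # A blank org is what an unset variable expands to.
            raise ValueError(f"{name} is blank; check that the variable is set")
    return f"{org.strip()}/{repo.strip()}:{tag.strip()}"


print(image_ref("openchemistry", "stempy", "latest"))
```

Failing in a dedicated check step makes the cause visible before the build and push steps run.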

pin h5py, don't use environment_after.yml
@swelborn swelborn force-pushed the update-dockerfile-staging branch from 139042d to b36cebc Compare April 14, 2023 19:18
swelborn added 4 commits May 19, 2023 16:01
- install dev packages (gdb, vim)
- add libarchive to fix mamba issue
- move Big MPI to base build
@swelborn
Collaborator Author

I made significant changes to this PR. It still provides the same functionality, but I think it is more coherent and avoids duplicated code. Here are the primary changes:

  1. Dockerfile.base and Dockerfile.stempy consolidated, with various fields for docker input arguments.
  2. apt-packages-common.txt and apt-packages-dev.txt for packages installed with apt-get.
  3. conda-mappings.json. This enables dynamic values for python version. I generated this from a simple web scraping program. I pulled most of the associated Dockerfile logic from here.
  4. Action in .github/actions/docker_setup that avoids code duplication.
  5. Actions in .github/workflows/docker.yml that set environment variables to be passed into Dockerfiles.
  6. -dev tag for added packages (gdb, vim)
  7. Commit hash to avoid :latest on tag.
  8. Tagging is generally different. Check out what they will look like here.
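
To illustrate items 6 and 7, here is a rough sketch of how a tag could be pinned to a commit instead of `:latest`, with an optional `-dev` suffix. The tag layout (`py<version>-<short sha>`) and the `build_tag` helper are assumptions made for this example; the actual tag format used by the workflow is not shown in this thread.

```python
def build_tag(python_version: str, commit_sha: str, dev: bool = False) -> str:
    """Compose an image tag pinned to a commit (hypothetical layout)."""
    if not commit_sha:
        raise ValueError("commit_sha is required to avoid an unpinned tag")
    # Use the conventional 7-character short hash.
    tag = f"py{python_version}-{commit_sha[:7]}"
    return f"{tag}-dev" if dev else tag


print(build_tag("3.9", "eefba22"))
print(build_tag("3.9", "eefba22", dev=True))
```

Pinning to a commit means a given tag always refers to the same build, which makes it easier to test a specific image.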

Probably the biggest change is the ability to change dependencies easily, for example going from:

python-version: ['3.7', '3.8', '3.9', '3.10']

to

python-version: ['3.7']

This was already helpful: I acquired the GIL and fixed #275, and used the git versioning to test each build with sfapi_client at NERSC; see here for the script I used. I will detail this in a separate PR after this is approved/merged.

Note: the json with the miniconda versions/SHAs can be improved and/or made into its own action to avoid the mess in the workflows file. This is more of a draft... let me know what you all think of the approach.
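
For readers unfamiliar with build matrices, the effect of narrowing `python-version` can be sketched as follows. GitHub Actions expands a matrix into the Cartesian product of its axes; the `mpi` axis here is an assumption added for illustration and may not match the real workflow.

```python
from itertools import product


def expand_matrix(matrix: dict) -> list:
    """Return one dict per job, like a CI matrix expansion."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# 'mpi' is a hypothetical second axis used only for this example.
full = expand_matrix({"python-version": ["3.7", "3.8", "3.9", "3.10"], "mpi": [False, True]})
narrow = expand_matrix({"python-version": ["3.7"], "mpi": [False, True]})

print(len(full), len(narrow))
```

Editing a single list is enough to grow or shrink the set of images that get built.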

@swelborn swelborn closed this Jul 6, 2023