Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dockerHSSM #439

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

dockerHSSM #439

wants to merge 8 commits into from

Conversation

panwanke
Copy link

Add a dockerfile for dockerizing the HSSM and a docker-image.yml for allowing github action.
I tested the all notebooks in tutorials folder, and fix some issues.

Copy link
Collaborator

@digicosmos86 digicosmos86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR! We generally discourage our users from using docker, especially since they can easily install hssm from conda-forge now, but there could be some special use cases for docker. I've suggested some changes. Since we can't see the diffs of the notebooks, can you let us know what are the changes and fixes to the notebooks?

Thanks again for contributing!

.github/workflows/docker-image.yml Outdated Show resolved Hide resolved
.github/workflows/docker-image.yml Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated

USER $NB_UID

RUN conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be an issue for most of the users. Any reason why we cannot use conda-forge?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and that's because in my tests in China, using this proxy to prevents installation failures. Our previous experience building dockerHDDM tells me that this being applied to github should work as well. Of course, one can also use the original forge and should get the same results.

After trying it out, I realised that using conda would take hours, whereas using pip was significantly faster, so I ended up using pip.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our experience, we encountered an issue with the pip-installed pytensor not being able to find the blas library, resulting it falling back to the numpy C-API and much slower performance overall. That's why we worked to put hssm on PyPI. Is there any issue like this with this docker image?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we installed pytensor correctly. But using the following code, I found that pytensor should recognize the blas library.

import pytensor
print(pytensor.config.blas__ldflags)
# -L/opt/conda/lib -llapack -lblas -lcblas -lm -Wl,-rpath,/opt/conda/lib

Please let me know if there are other better testing methods to ensure that pytensor is installed correctly

panwanke added 5 commits May 22, 2024 09:56
Dynamically fetch versions; and perform github action either at release time or manually.
add workflow_dispatch
update: install toml
@panwanke
Copy link
Author

Thank you for your reply. I am reporting the issues and changes in the notebooks below. These issues have been discussed with Alex and any help needed is appreciated.

obs_angle = ssms.basic_simulators.simulator(...) --> simulatorobs_angle = ssms.basic_simulators.simulator.simulator(...)inlikelihoods.ipynb`

Replace all config.update("jax_enable_x64", False) and pytensor.config.floatX = "float32" with hssm.set_floatX("float32")

  • Glad to see you guys updated your compatibility with jax config a few hours ago. And I still suggest if it is possible to use hssm.set_floatX consistently in all notebooks?

Remove missing api in main_tutorial.ipynb:

from hssm.distribution_utils import (
    make_distribution,  # A general function for making Distribution classes
    make_distribution_from_onnx,  # Makes Distribution classes from onnx files
    make_distribution_from_blackbox,  # Makes Distribution classes from callables
)

to

from hssm.distribution_utils import make_distribution

There is a compatibility issus with latests pymc(5.15) that makes .graph() not work. So, in the docker env, I fix pymc version to 5.14.
image

The major modifications are the file path references.

  • Firstly, cavanagh_theta_test.csv and cavanagh_idata.nc were referenced in plotting.ipynb, and they were in the tests/fixtures folder. So I mapped that folder to the docker's src directory as shown below. This allows you to reference this data via src/cavanagh_theta_test.csv.
    image
  • Secondly, given that model files such as *.onnx exist under fixtures and may not be downloadable on some network conditions, to facilitate the use of these models in docker, I added loglik="src/*onnx" to all places where onnx is needed, including pymc.ipynb, likelihoods.ipynb and main_tutorials.ipynb. such as
model_model_comp_1 = hssm.HSSM(
    data=dataset_model_comp,
    model="angle",
    loglik="src/angle.onnx",
    a=1.0,
)

@digicosmos86
Copy link
Collaborator

@panwanke Thank you again for your help with this! We really appreciate the effort in fixing the inconsistencies in our notebooks! In principle I agree with all the changes but the last one. We want to keep the notebooks directly executable in Google Colab, so switching to a local reference to the onnx files might break that. Maybe you can download all networks from the huggingface repo to the docker image, and then add references (commented out) to the onnx files locally? People can then uncomment the code when there is no Internet

@panwanke
Copy link
Author

panwanke commented May 24, 2024

@panwanke Thank you again for your help with this! We really appreciate the effort in fixing the inconsistencies in our notebooks! In principle I agree with all the changes but the last one. We want to keep the notebooks directly executable in Google Colab, so switching to a local reference to the onnx files might break that. Maybe you can download all networks from the huggingface repo to the docker image, and then add references (commented out) to the onnx files locally? People can then uncomment the code when there is no Internet

Thanks for the suggestion. I'm sorry I didn't think about running in colab before. I've now commented out where I need to use onnx and added an explanation. For example, like this,

ddm_model_approx_diff = hssm.HSSM(
    data, model="ddm", 
    loglik_kind="approx_differentiable",
    loglik="ddm.onnx",          # will be downloaded from huggingface
    # loglik="src/ddm.onnx", # If running in dockerHSSM, the local file is available
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants