Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Bug: Manual update to DockerHub fails #760

Closed
miquelduranfrigola opened this issue Aug 1, 2023 · 16 comments
Closed

🐛 Bug: Manual update to DockerHub fails #760

miquelduranfrigola opened this issue Aug 1, 2023 · 16 comments
Assignees
Labels
bug Something isn't working

Comments

@miquelduranfrigola
Copy link
Member

miquelduranfrigola commented Aug 1, 2023

Describe the bug.

The script https://github.com/ersilia-os/ersilia/blob/master/.github/scripts/build_model_container_and_update_to_dockerhub.py does not work anymore:

> [3/8] COPY download-miniconda.sh /root/download-miniconda.sh:
------
Dockerfile:6
--------------------
   4 |     ARG BUILD_MODEL
   5 |     ENV PATH=$PATH:/usr/bin/conda/bin
   6 | >>> COPY download-miniconda.sh /root/download-miniconda.sh
   7 |     RUN chmod + /root/download-miniconda.sh
   8 |     RUN bash /root/download-miniconda.sh && \

More generally, this script does not allow for multi-platform builds, which means that it does not provide the same functionalities as the corresponding GitHub Action: https://github.com/ersilia-os/eos-template/blob/main/.github/workflows/upload-model-to-dockerhub.yml

Moreover, the GitHub Action updates metadata with DockerHub info, modifies AirTable and Updates README file. None of this is done at the moment with the current script.

Describe the steps to reproduce the behavior

python .github/scripts/build_model_container_and_update_to_dockerhub.py eos3b5e DOCKERHUB_USERNAME DOCKERHUB_PASSWORD

Expected behavior.

No response

Screenshots.

No response

Operating environment

Ubuntu 20.04 LTS

Additional context

This issue emerged while working on @simrantan's model eos2rd8. This model is too large to run on a GitHub Actions workflow.

@emmakodes
Copy link
Contributor

Hello, @miquelduranfrigola @GemmaTuron for the following error:

> [3/8] COPY download-miniconda.sh /root/download-miniconda.sh:
------
Dockerfile:6
--------------------
   4 |     ARG BUILD_MODEL
   5 |     ENV PATH=$PATH:/usr/bin/conda/bin
   6 | >>> COPY download-miniconda.sh /root/download-miniconda.sh
   7 |     RUN chmod + /root/download-miniconda.sh
   8 |     RUN bash /root/download-miniconda.sh && \

The COPY instruction is trying to copy the file named download-miniconda.sh from the build context into the /root directory within the image.

Solution to solve the above error

Adding download-miniconda.sh to base_files = ["Dockerfile", "docker-entrypoint.sh", "nginx.conf", "download-miniconda.sh"] ensures that download-miniconda.sh is also in the build context


managers.py file in ersilia

def _build_ersilia_base(self):
    self.logger.debug("Creating docker image of ersilia base")
    path = tempfile.mkdtemp(prefix="ersilia-")
    base_folder = os.path.join(path, "base")
    os.mkdir(base_folder)
    base_files = ["Dockerfile", "docker-entrypoint.sh", "nginx.conf", "download-miniconda.sh"]
    for f in base_files:
        cmd = "cd {0}; wget {2}/base/{1}".format(
            base_folder, f, self._model_deploy_dockerfiles_url
        )
        run_command(cmd)
    cmd = "cd {0}; docker build -t {1}/base:{2} .".format(
        base_folder, DOCKERHUB_ORG, DOCKERHUB_LATEST_TAG
    )
    run_command(cmd)

@emmakodes
Copy link
Contributor

To allow ersilia to be able to build for linux/amd64 and linux/arm64 I have the following changes in the following files:

managers.py file in ersilia

def build_with_ersilia(self, model_id, docker_user, docker_pwd):
    self.logger.debug("Creating docker image with ersilia incorporated")
    if self.image_exists("base"):
        pass
    else:
        self._build_ersilia_base()
    path = tempfile.mkdtemp(prefix="ersilia-model")
    model_folder = os.path.join(path, model_id)
    os.mkdir(model_folder)
    cmd = "cd {0}; wget {1}/model/Dockerfile".format(
        model_folder, self._model_deploy_dockerfiles_url
    )
    run_command(cmd)
    file_path = os.path.join(model_folder, "Dockerfile")
    with open(file_path, "r") as f:
        text = f.read()
    text = text.replace("eos_identifier", model_id)
    with open(file_path, "w") as f:
        f.write(text)
    
    # login to docker so as to be able to push images to online docker repo
    cmd = "docker login --password {0} --username {1}".format(
            docker_pwd, docker_user
        )
    run_command(cmd)

    try:
        # Attempt to build for linux/amd64 and linux/arm64
        print('building for both linux/amd64,linux/arm64')
        cmd = "cd {0}; docker buildx build --builder=container --platform linux/amd64,linux/arm64 -t {1}/{2}:{3} --push .".format(
            model_folder, DOCKERHUB_ORG, model_id, DOCKERHUB_LATEST_TAG
        )
        run_command(cmd)
        print('done building for linux/amd64,linux/arm64')
        sys.exit()  # This will terminate the program immediately only if build for linux/amd64,linux/arm64 complete successfully
        
    except subprocess.CalledProcessError as e:
        # Build failed for both platforms, now try building only for linux/amd64
        self.logger.warning("Build failed for linux/amd64 and linux/arm64, trying linux/amd64 only")
        cmd = "cd {0}; docker build -t {1}/{2}:{3} .".format(
            model_folder, DOCKERHUB_ORG, model_id, DOCKERHUB_LATEST_TAG
        )
        run_command(cmd)

Notice, I have a try-except clause that tries to build for linux/amd64 and linux/arm64 but if that fails it builds for only linux/amd64. Also, you will notice a --push command in the docker build command of the try clause. The reason for this is that when you build an image that works on multi-platform, docker does not persist the built images locally. You have to either push the image immediately to dockerhub. See this for more details. To push to dockerhub, you will need to login to dockerhub, hence the docker login command above the try clause. Also, you notice this --builder=container command also in the try clause of the docker build command. You need a builder to use buildx. To create a builder, you can follow the guide in this article . sys.exit() will stop the program once it completes the build and push of an image to dockerhub that supports multi-platform(linux/amd64 and linux/arm64)


def build(self, model_id, docker_user, docker_pwd, use_cache=True):
    if self.with_bentoml:
        self.build_with_bentoml(model_id=model_id, use_cache=use_cache)
    else:
        self.build_with_ersilia(model_id=model_id, docker_user=docker_user, docker_pwd=docker_pwd)

Notice, docker_user and docker_pwdare now passed to build since we are going to push an image immediately once we build an image that works on a multi-platform.




terminal.py file in ersilia

In the run_command below, I am using the return_code variable to know if the execution of a command was successful or not so that in the case of building images for linux/amd64 and linux/arm64 if that fails it raises a CalledProcessError exception and goes on to build for only linux/amd64

def run_command(cmd, quiet=None):
    if quiet is None:
        quiet = is_quiet()
    if type(cmd) == str:
        if quiet:
            with open(os.devnull, "w") as fp:
                return_code = subprocess.Popen(
                    cmd, stdout=fp, stderr=fp, shell=True, env=os.environ
                ).wait()
        else:
            return_code = subprocess.Popen(cmd, shell=True, env=os.environ).wait()
    else:
        if quiet:
            try:
                subprocess.check_call(
                    cmd,
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    env=os.environ,
                )
                return_code = 0
            except subprocess.CalledProcessError as e:
                return_code = e.returncode
        else:
            try:
                subprocess.check_call(cmd, env=os.environ)
                return_code = 0
            except subprocess.CalledProcessError as e:
                return_code = e.returncode

    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)




dockerhub.py file in ersilia

I am also passing self.docker_user, self.docker_pwd parameter to build method since we will push an image to dockerhub once we build for multi-platform linux/amd64 and linux/arm64

def build_image(self):
    dm = DockerManager()
    dm.build(self.model_id, self.docker_user, self.docker_pwd)





In summary, you can check the code snippets or methods I made changes from my forked version. 1 2 3 4 5 6 7


Maybe you make same code changes as mine in your own fork and try to build and push an image of a model using the following command: python .github/scripts/build_model_container_and_update_to_dockerhub.py eos3b5e DOCKERHUB_USERNAME DOCKERHUB_PASSWORD to see how it works and if any modifications needs to be made.


For me, with the code changes I was able to build and push the image of eos8c0o model eos8c0o docker image , eos8a5g model eos8a5g docker image to my dockerhub account

@GemmaTuron
Copy link
Member

Hi @emmakodes !

Thanks, I will test and let you know

@GemmaTuron
Copy link
Member

Hi @emmakodes

I am testing your fork, but could you open a PR to main so that I can more easily see the changes made in the code?

@emmakodes
Copy link
Contributor

Okay sure @GemmaTuron

@GemmaTuron
Copy link
Member

Hi @emmakodes

I have used to code in your fork, and while I can build and push the eos3b5e, it only appears as the ARM64 image? https://hub.docker.com/repository/docker/ersiliaos/eos3b5e/tags?page=1&ordering=last_updated

could this be because I am in a MacOs ?

@emmakodes
Copy link
Contributor

emmakodes commented Aug 16, 2023

Hello @GemmaTuron did you create a builder so as to use buildx to build for multi-platform?
multi-platform link


The --builder flag is included in the docker buildx command to build for multi-platform.

cmd = "cd {0}; docker buildx build --builder=container --platform linux/amd64,linux/arm64 -t {1}/{2}:{3} --push .".format(
            model_folder, DOCKERHUB_ORG, model_id, DOCKERHUB_LATEST_TAG
        )

To create a builder:

  • Verify that the Buildx client is installed on your system, and that you’re able to run it:
$ docker buildx version

github.com/docker/buildx v0.10.3 79e156beb11f697f06ac67fa1fb958e4762c0fab
  • Next, create a builder that uses the docker-container. Run the following docker buildx create command:
    docker buildx create --driver=docker-container --name=container

This creates a new builder with the name container. You can list available builders with docker buildx ls.

$ docker buildx ls
NAME/NODE           DRIVER/ENDPOINT               STATUS
container           docker-container
  container_0       unix:///var/run/docker.sock   inactive
default *           docker
  default           default                       running
desktop-linux       docker
  desktop-linux     desktop-linux                 running

The status for the new container builder is inactive. That’s fine - it’s because you haven’t started using it yet.

  • Then you can run the following command again: python .github/scripts/build_model_container_and_update_to_dockerhub.py eos3b5e DOCKERHUB_USERNAME DOCKERHUB_PASSWORD

@GemmaTuron
Copy link
Member

Indeed, I got this warning:
ERROR: no builder "container" found
I guess that's it, let me follow the steps you are indicating

@emmakodes
Copy link
Contributor

Okay @GemmaTuron I have also opened the PR

@emmakodes
Copy link
Contributor

Somehow I feel it may be better to use a linux/amd system instead of the Mac since from what we see from most models, they build successfully for linux/amd . I may be wrong though

@GemmaTuron
Copy link
Member

my silly question is, if I build from a Mac, can I still build for AMD as well? (I am trying on my Linux with the container set up now)

@emmakodes
Copy link
Contributor

You can still build for Linux/amd when you do the multi-platform build using buildx but when it fails it returns to using the build command to build based on your system architecture which will also fail since your system is a Mac (linux/arm).

@emmakodes
Copy link
Contributor

I was able to build and push eos2rd8 model to my dockerhub https://hub.docker.com/r/emmakodes/eos2rd8/tags
eos2rd8_image_build_log.txt

I also made prediction when I pull and run the image via my docker desktop
ersilia -v api run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"

{
    "input": {
        "key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
        "input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
        "text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
    },
    "output": {
        "outcome": [
            "The molecule is a member of the class of thiadiazoles that is 1,3,4-thiadiazol-2-amine which is substituted by a (5-nitro-1,3-thiazol-2-yl)sulfanediyl group at position 5. It is a c-Jun N-terminal kinase inhibitor (IC50 = 0.7uM) and exhibits antibacterial properties. It has a role as a c-Jun N-terminal kinase inhibitor and an antibacterial agent. It is a member of thiadiazoles, a member of 1,3-thiazoles, a primary amino compound, a C-nitro compound and an organic sulfide."
        ]
    }
}

@emmakodes
Copy link
Contributor

emmakodes commented Aug 16, 2023

For eos2re5 model, it is still running on my PC but I am seeing this error log:

 => => # 18:56:19 | ERROR    | ❌ Checksum discrepancy in file model/checkpoints/caco2/caco2_Model.pkl_658.npy: expected 88fe6e055898dad9654d02c9d
 => => # 5357c2b164051da2be5df825065a964c626900c, actual 21fefec75bf4c284399ab55a704e9398ad32747236e871823bb022b8848f7f19
 => => # 18:55:45 | ERROR    | ❌ Checksum discrepancy in file model/checkpoints/caco2/caco2_Model.pkl_641.npy: expected ac077127dfcb00629ca8013e7
 => => # c1f6f21f587f5349b459dc25e921d1ed801baa1, actual 2ea7bc0a5d6d085a72150dd78fd97148680dc8db98c282d962bd29dd36cd942f

Not sure, It seems some checkpoints of the model are not okay or it may be an issue from my end. It also may not be an issue with limit in size. Maybe when you try to double-check. It seems we may need to check the checkpoints that are causing the discrepancy. They may either be corrupt or modified in some way since the expected checksums were calculated

@GemmaTuron
Copy link
Member

Hi @emmakodes

I've been able to update the model eos3b5e in both architectures! I am also finding issues with eos2re5. I get the same warning, and if you leave it running long enough, it eventually fails at fetching the model. @miquelduranfrigola Could you look at this?

I'll merge the changes meanwhile!

@GemmaTuron
Copy link
Member

GemmaTuron commented Aug 24, 2023

UPDATE:

Following @emmakodes instructions the upload to docker from local works. There is an unresolved issue, independent from docker, with model eos2re5, which will be archived until we can work on it.
Just as a note, if you prefer to build your model from the repository in your local (not in github) you do need to modify the Dockerfile in ersilia/dockerfiles/model-deploy/model from
RUN ersilia -v fetch $MODEL --from_github
to
RUN ersilia -v fetch $MODEL --repo_path <path to your repo>

I'll close this issue, thanks for the help Emma!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants