Allow running examples on Apple Silicon M1 and fix image build errors for arm64 #1898

Merged
@@ -7,6 +7,13 @@ ADD ./pkg/ ${TARGET_DIR}/pkg/
ADD ./${METRICS_COLLECTOR_DIR}/ ${TARGET_DIR}/${METRICS_COLLECTOR_DIR}/
WORKDIR ${TARGET_DIR}/${METRICS_COLLECTOR_DIR}

+RUN if [ "$(uname -m)" = "aarch64" ]; then \
+apt-get -y update && \
+apt-get -y install gfortran libpcre3 libpcre3-dev && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists/*; \
+fi

RUN pip install --no-cache-dir -r requirements.txt

RUN chgrp -R 0 ${TARGET_DIR} \
2 changes: 1 addition & 1 deletion cmd/suggestion/chocolate/v1beta1/Dockerfile
@@ -16,7 +16,7 @@ ENV SUGGESTION_DIR cmd/suggestion/chocolate/v1beta1
RUN apt-get -y update && \
apt-get -y install git && \
if [ "$(uname -m)" = "ppc64le" ] || [ "$(uname -m)" = "aarch64" ]; then \
-apt-get -y install gfortran libopenblas-dev liblapack-dev; \
+apt-get -y install gfortran libopenblas-dev liblapack-dev g++; \
fi && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
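
The Dockerfiles in this PR test for `aarch64`, while `deploy.sh` tests for `arm64`. Both refer to the same CPU family: `uname -m` prints `aarch64` on Linux (including inside Linux containers) and `arm64` on macOS with Apple Silicon. The following is a minimal sketch of a helper that normalizes these values; `normalize_arch` is a hypothetical name and is not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): map `uname -m` output to the
# architecture names used in Docker --platform values.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    ppc64le)       echo "ppc64le" ;;
    *)             echo "unknown" ;;
  esac
}

# Detect the current machine; anything not listed above maps to "unknown".
ARCH="$(normalize_arch "$(uname -m)")"
echo "Detected architecture: ${ARCH}"
```

With a helper like this, a single comparison against `arm64` would cover both the macOS host and the Linux build container.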
10 changes: 8 additions & 2 deletions examples/v1beta1/kind-cluster/deploy.sh
@@ -36,8 +36,8 @@ if [ -z "$(command -v kubectl)" ]; then
exit 1
fi

-# Step 1. Create Kind cluster with Kubernetes v1.22.9
-kind create cluster --image kindest/node:v1.22.9
+# Step 1. Create Kind cluster with Kubernetes v1.23.6

Member:
Any specific reason for changing the default?

Member Author:
I upgraded the KinD Kubernetes version because we have upgraded the Kubernetes dependencies to v0.23.
Also, according to this doc, K8s v1.22 will reach EoL before the release that follows the next Katib major version.

+kind create cluster --image kindest/node:v1.23.6
echo -e "\nKind cluster has been created\n"

# Step 2. Set context for kubectl
@@ -53,6 +53,12 @@ kubectl get nodes
echo -e "\nDeploying Katib components\n"
kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

+# If the local machine's CPU architecture is arm64, rewrite mysql image.
+if [ "$(uname -m)" = "arm64" ]; then
+kubectl patch deployments -n kubeflow katib-mysql --type json -p \
+'[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "arm64v8/mysql:8.0.29-oracle"}]'

Member:
Is there a better way to do this replacement with kustomize itself?

Member Author:
This KinD cluster example requires only three tools, so I added this code to avoid increasing the number of required tools.

+fi

# Wait until all Katib pods are running.
kubectl wait --for=condition=ready --timeout=${TIMEOUT} -l "katib.kubeflow.org/component in (controller,db-manager,mysql,ui)" -n kubeflow pod
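
The `kubectl patch` call above hard-codes the replacement image inside the JSON Patch string. As a rough sketch only, the patch could be generated from a variable so the image is easier to change. `image_patch`, `patch_mysql_image`, and `MYSQL_ARM64_IMAGE` are hypothetical names and are not part of the PR; the sketch keeps the `kubectl`-only approach and does not address the kustomize question raised in the review.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON Patch that replaces the image of the
# first container in a Deployment's pod template.
image_patch() {
  printf '[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "%s"}]\n' "$1"
}

# MYSQL_ARM64_IMAGE is an assumed variable name; the default is the image
# used in this PR.
MYSQL_ARM64_IMAGE="${MYSQL_ARM64_IMAGE:-arm64v8/mysql:8.0.29-oracle}"

# Hypothetical wrapper around the same kubectl command as deploy.sh.
# It is only defined here, not called, so the sketch has no side effects.
patch_mysql_image() {
  kubectl patch deployments -n kubeflow katib-mysql --type json \
    -p "$(image_patch "${MYSQL_ARM64_IMAGE}")"
}

# Show the patch that would be applied.
image_patch "${MYSQL_ARM64_IMAGE}"
```

In `deploy.sh`, `patch_mysql_image` would be called inside the existing `uname -m` check.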

@@ -7,8 +7,14 @@ WORKDIR ${TARGET_DIR}

ENV PYTHONPATH ${TARGET_DIR}

+RUN if [ "$(uname -m)" = "aarch64" ]; then \
+apt-get -y update && \
+apt-get -y install gfortran libpcre3 libpcre3-dev && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists/*; \
+fi

RUN pip install --no-cache-dir -r requirements.txt
-RUN pip install --no-cache-dir tensorflow==2.9.1
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}

@@ -7,7 +7,7 @@ WORKDIR ${TARGET_DIR}

ENV PYTHONPATH ${TARGET_DIR}

-RUN pip install --no-cache-dir -r requirements.txt
+RUN pip install --no-cache-dir scipy==1.8.1
RUN chgrp -R 0 ${TARGET_DIR} \
&& chmod -R g+rwX ${TARGET_DIR}

@@ -1 +1,3 @@
scipy>=1.7.2
+tensorflow==2.9.1; platform_machine=="x86_64"
+tensorflow-aarch64==2.9.1; platform_machine=="aarch64"
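
The two `platform_machine` environment markers make pip pick a package per CPU architecture. pip evaluates `platform_machine` as Python's `platform.machine()`. The sketch below mirrors that selection in shell so the effect is easy to check; `tf_requirement_for` is a hypothetical helper, not part of the PR.

```shell
#!/bin/sh
# Hypothetical helper mirroring the two environment markers in requirements.txt.
tf_requirement_for() {
  case "$1" in
    x86_64)  echo "tensorflow==2.9.1" ;;
    aarch64) echo "tensorflow-aarch64==2.9.1" ;;
    *)       echo "" ;;  # no marker matches, so pip would install neither line
  esac
}

# Print the value pip would see on this machine, if python3 is available.
if command -v python3 >/dev/null 2>&1; then
  machine="$(python3 -c 'import platform; print(platform.machine())')"
  echo "platform_machine=${machine} -> $(tf_requirement_for "${machine}")"
fi
```

Note that a native Python on an Apple Silicon macOS host reports `arm64`, which matches neither marker. The markers apply inside the Linux container, where the value is `aarch64`.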
@@ -3,7 +3,14 @@ FROM python:3.9-slim
ADD examples/v1beta1/trial-images/tf-mnist-with-summaries /opt/tf-mnist-with-summaries
WORKDIR /opt/tf-mnist-with-summaries

-RUN pip install --no-cache-dir tensorflow==2.9.1
+RUN if [ "$(uname -m)" = "aarch64" ]; then \
+apt-get -y update && \
+apt-get -y install gfortran libpcre3 libpcre3-dev && \
+apt-get clean && \
+rm -rf /var/lib/apt/lists/*; \
+fi

+RUN pip install --no-cache-dir -r requirements.txt
RUN chgrp -R 0 /opt/tf-mnist-with-summaries \
&& chmod -R g+rwX /opt/tf-mnist-with-summaries

@@ -0,0 +1,2 @@
+tensorflow==2.9.1; platform_machine=="x86_64"
+tensorflow-aarch64==2.9.1; platform_machine=="aarch64"
19 changes: 10 additions & 9 deletions scripts/v1beta1/build.sh
@@ -112,32 +112,33 @@ echo -e "\nBuilding median stopping rule...\n"
docker build --platform "linux/$ARCH" -t "${REGISTRY}/earlystopping-medianstop:${TAG}" -f ${CMD_PREFIX}/earlystopping/medianstop/${VERSION}/Dockerfile .

# Training container images
+echo -e "\nBuilding training container images..."

if [ ! "$ARCH" = "amd64" ]; then
-echo -e "\nTraining container images are supported only amd64."
+echo -e "\nSome training container images are supported only amd64."
else

-echo -e "\nBuilding training container images..."

echo -e "\nBuilding mxnet mnist training container example...\n"
docker build --platform linux/amd64 -t "${REGISTRY}/mxnet-mnist:${TAG}" -f examples/${VERSION}/trial-images/mxnet-mnist/Dockerfile .

-echo -e "\nBuilding Tensorflow with summaries mnist training container example...\n"
-docker build --platform linux/amd64 -t "${REGISTRY}/tf-mnist-with-summaries:${TAG}" -f examples/${VERSION}/trial-images/tf-mnist-with-summaries/Dockerfile .

echo -e "\nBuilding PyTorch mnist training container example...\n"
docker build --platform linux/amd64 -t "${REGISTRY}/pytorch-mnist:${TAG}" -f examples/${VERSION}/trial-images/pytorch-mnist/Dockerfile .

echo -e "\nBuilding Keras CIFAR-10 CNN training container example for ENAS with GPU support...\n"
docker build --platform linux/amd64 -t "${REGISTRY}/enas-cnn-cifar10-gpu:${TAG}" -f examples/${VERSION}/trial-images/enas-cnn-cifar10/Dockerfile.gpu .

-echo -e "\nBuilding Keras CIFAR-10 CNN training container example for ENAS with CPU support...\n"
-docker build --platform linux/amd64 -t "${REGISTRY}/enas-cnn-cifar10-cpu:${TAG}" -f examples/${VERSION}/trial-images/enas-cnn-cifar10/Dockerfile.cpu .

echo -e "\nBuilding PyTorch CIFAR-10 CNN training container example for DARTS with CPU support...\n"
docker build --platform linux/amd64 -t "${REGISTRY}/darts-cnn-cifar10-cpu:${TAG}" -f examples/${VERSION}/trial-images/darts-cnn-cifar10/Dockerfile.cpu .

echo -e "\nBuilding PyTorch CIFAR-10 CNN training container example for DARTS with GPU support...\n"
docker build --platform linux/amd64 -t "${REGISTRY}/darts-cnn-cifar10-gpu:${TAG}" -f examples/${VERSION}/trial-images/darts-cnn-cifar10/Dockerfile.gpu .

fi

+echo -e "\nBuilding Tensorflow with summaries mnist training container example...\n"

Member:
Are arm64 images built only for tf-mnist-with-summaries and enas-cnn-cifar10?

Member Author:
Yes. All amd64 trial images except tf-mnist-with-summaries and enas-cnn-cifar10 work on the M1 Mac, since we can emulate x86_64 in Docker Desktop with Rosetta 2.

Would you like me to make all images support arm64 in this PR?

Member:
I think we can fix that later, since it works now. We need to clean up the scripts for building images for different architectures (rather than having arch checks per image, etc.).

Member Author:
That makes sense, @johnugeorge.
It might be better to introduce a multi-arch build after the next Katib release.

ref: https://docs.docker.com/desktop/multi-arch/

I will create an issue to track this feature.

+docker build --platform "linux/$ARCH" -t "${REGISTRY}/tf-mnist-with-summaries:${TAG}" -f examples/${VERSION}/trial-images/tf-mnist-with-summaries/Dockerfile .

+echo -e "\nBuilding Keras CIFAR-10 CNN training container example for ENAS with CPU support...\n"
+docker build --platform "linux/$ARCH" -t "${REGISTRY}/enas-cnn-cifar10-cpu:${TAG}" -f examples/${VERSION}/trial-images/enas-cnn-cifar10/Dockerfile.cpu .

echo -e "\nAll Katib images with ${TAG} tag have been built successfully!\n"
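
The review thread suggests replacing per-image arch checks with a cleaner build script. One possible shape, sketched under assumptions, is a single table of images that support non-amd64 builds, consulted by one function. `MULTI_ARCH_IMAGES`, `supports_arch`, and `plan_build` are hypothetical names. The table lists only the two images this PR builds with `linux/$ARCH`, and the sketch prints a plan instead of calling `docker build`.

```shell
#!/bin/sh
# Hypothetical table: trial images that this PR builds for "linux/$ARCH".
MULTI_ARCH_IMAGES="tf-mnist-with-summaries enas-cnn-cifar10-cpu"

# Return 0 if the image can be built for the given arch: every image
# supports amd64, and only the images in the table support other arches.
supports_arch() {
  _image="$1"
  _arch="$2"
  if [ "${_arch}" = "amd64" ]; then
    return 0
  fi
  for _name in ${MULTI_ARCH_IMAGES}; do
    if [ "${_name}" = "${_image}" ]; then
      return 0
    fi
  done
  return 1
}

# Dry run: report what would be built or skipped, mirroring build.sh,
# which skips amd64-only images when ARCH is not amd64.
plan_build() {
  if supports_arch "$1" "$2"; then
    echo "build $1 for linux/$2"
  else
    echo "skip $1 (amd64 only)"
  fi
}

for image in mxnet-mnist tf-mnist-with-summaries pytorch-mnist enas-cnn-cifar10-cpu; do
  plan_build "${image}" "${ARCH:-arm64}"
done
```

Adding arm64 support for another image would then mean extending the table rather than moving its `docker build` line out of the `if` block.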