
[SDK] Make service account configurable for build_image_from_working_dir #3419

Merged · 13 commits · Apr 15, 2020
Original file line number Diff line number Diff line change
@@ -698,3 +698,8 @@ spec:
- containerPort: 8888
- containerPort: 8887
serviceAccountName: ml-pipeline
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kubeflow-pipelines-container-builder
4 changes: 4 additions & 0 deletions manifests/kustomize/base/pipeline/container-builder-sa.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: kubeflow-pipelines-container-builder
2 changes: 1 addition & 1 deletion manifests/kustomize/gcp-workload-identity-setup.sh
@@ -22,7 +22,7 @@ USER_GSA=${USER_GSA:-$CLUSTER_NAME-kfp-user}

# Kubernetes Service Account (KSA)
SYSTEM_KSA=(ml-pipeline-ui ml-pipeline-visualizationserver)
USER_KSA=(pipeline-runner default) # default service account is used for container building, TODO: give it a specific name
USER_KSA=(pipeline-runner kubeflow-pipelines-container-builder)

cat <<EOF

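For context, the script above wires each Kubernetes service account (KSA) to a Google service account (GSA) via GKE Workload Identity. A sketch of the equivalent manual steps for the new `kubeflow-pipelines-container-builder` KSA — `PROJECT_ID` and the GSA name are placeholders based on the script's defaults, not values from this PR:

```shell
PROJECT_ID=my-gcp-project                  # assumption: your GCP project
NAMESPACE=kubeflow                         # namespace where KFP is installed
KSA=kubeflow-pipelines-container-builder
USER_GSA=my-cluster-kfp-user               # script default: $CLUSTER_NAME-kfp-user

# Allow the KSA to impersonate the Google service account.
gcloud iam service-accounts add-iam-policy-binding \
  "${USER_GSA}@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA}]"

# Annotate the KSA so pods running as it authenticate as the GSA.
kubectl annotate serviceaccount "${KSA}" -n "${NAMESPACE}" \
  iam.gke.io/gcp-service-account="${USER_GSA}@${PROJECT_ID}.iam.gserviceaccount.com"
```

With this binding in place, the container-builder pod can read/write the GCS staging path and push to gcr.io using the GSA's permissions.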
1 change: 1 addition & 0 deletions sdk/python/kfp/containers/__init__.py
Original file line number Diff line number Diff line change
@@ -12,3 +12,4 @@
# See the License for the specific language governing permissions and

from ._build_image_api import *
from ._container_builder import *
6 changes: 6 additions & 0 deletions sdk/python/kfp/containers/_build_image_api.py
Original file line number Diff line number Diff line change
@@ -80,6 +80,12 @@ def build_image_from_working_dir(image_name: str = None, working_dir: str = None
timeout: Optional. The image building timeout in seconds.
base_image: Optional. The container image to use as the base for the new image. If not set, the Google Deep Learning Tensorflow CPU image will be used.
builder: Optional. An instance of ContainerBuilder or compatible class that will be used to build the image.
The default builder uses the "kubeflow-pipelines-container-builder" service account in the "kubeflow" namespace. It works with Kubeflow Pipelines clusters installed using Google Cloud Marketplace or Standalone with version > 0.4.0.
Contributor:
> in "kubeflow" namespace

Should this be "in the namespace where Kubeflow Pipelines is installed"?

Contributor (Author):
That's right! Thanks for pointing it out.

Contributor (Author):
Fixed.

Depending on how you installed Kubeflow Pipelines, you need to configure your ContainerBuilder instance's namespace and service_account:
For clusters installed with Kubeflow >= 0.7, use ContainerBuilder(namespace='<your-user-namespace>', service_account='default-editor', ...). You can omit the namespace if you use the KFP SDK from an in-cluster notebook; the notebook's namespace is used by default.
For clusters installed with Kubeflow < 0.7, use ContainerBuilder(service_account='default', ...).
For clusters installed using Google Cloud Marketplace or Standalone with version <= 0.4.0, use ContainerBuilder(namespace='<your-kfp-namespace>', service_account='default').
You may refer to https://www.kubeflow.org/docs/pipelines/installation/overview/ for more details about the different installation options.

Returns:
The full name of the container image including the hash digest. E.g. gcr.io/my-org/my-image@sha256:86c1...793c.
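The per-deployment options described in the docstring can be summarized in code. A minimal sketch — the helper name and the deployment labels are hypothetical, not part of the SDK:

```python
def builder_defaults(deployment: str, user_namespace: str = None):
    """Return the (namespace, service_account) pair to pass to ContainerBuilder."""
    if deployment == 'kubeflow>=0.7':
        # Pass None to use the in-cluster notebook's own namespace by default.
        return (user_namespace, 'default-editor')
    if deployment == 'kubeflow<0.7':
        return (None, 'default')
    if deployment in ('marketplace<=0.4.0', 'standalone<=0.4.0'):
        return (user_namespace, 'default')
    # Marketplace or Standalone with version > 0.4.0: the new default in this PR.
    return ('kubeflow', 'kubeflow-pipelines-container-builder')
```

The returned pair maps directly onto the `namespace` and `service_account` constructor arguments shown in the `_container_builder.py` diff below.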
22 changes: 16 additions & 6 deletions sdk/python/kfp/containers/_container_builder.py
@@ -12,6 +12,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.

__all__ = [
'ContainerBuilder',
]

import logging
import tarfile
import tempfile
@@ -22,7 +26,6 @@
GCS_STAGING_BLOB_DEFAULT_PREFIX = 'kfp_container_build_staging'
GCR_DEFAULT_IMAGE_SUFFIX = 'kfp_container'


def _get_project_id():
import requests
URL = "http://metadata.google.internal/computeMetadata/v1/project/project-id"
@@ -51,20 +54,27 @@ class ContainerBuilder(object):
"""
ContainerBuilder helps build a container image
"""
def __init__(self, gcs_staging=None, default_image_name=None, namespace=None):
def __init__(self, gcs_staging=None, default_image_name=None, namespace=None,
service_account='kubeflow-pipelines-container-builder'):
"""
Args:
gcs_staging (str): GCS bucket/blob that can store temporary build files,
default is gs://PROJECT_ID/kfp_container_build_staging.
default is gs://PROJECT_ID/kfp_container_build_staging. You must
specify this when not running in a cluster.
default_image_name (str): Target container image name that will be used by the build method if the target_image argument is not specified.
namespace (str): kubernetes namespace where the pod is launched,
namespace (str): Kubernetes namespace where the container builder pod is launched,
default is the same namespace as the notebook service account in cluster
or 'kubeflow' if not in cluster
or 'kubeflow' if not in cluster. If using the full Kubeflow
deployment and not in cluster, you should specify your own user namespace.
service_account (str): Kubernetes service account the pod uses for container building.
The default is "kubeflow-pipelines-container-builder". It works with Kubeflow Pipelines clusters installed using Google Cloud Marketplace or Standalone with version > 0.4.0.
The service account should have permission to read and write the staging GCS path and to push built images to gcr.io.
"""
self._gcs_staging = gcs_staging
self._gcs_staging_checked = False
self._default_image_name = default_image_name
self._namespace = namespace
self._service_account = service_account

def _get_namespace(self):
if self._namespace is None:
@@ -134,7 +144,7 @@ def _generate_kaniko_spec(self, context, docker_filename, target_image):
],
'image': 'gcr.io/kaniko-project/executor@sha256:78d44ec4e9cb5545d7f85c1924695c89503ded86a59f92c7ae658afa3cff5400',
}],
'serviceAccountName': 'default'}
'serviceAccountName': self._service_account}
}
return content
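Stripped of the class machinery, the spec assembly can be sketched as a standalone function — a hypothetical mirror of `_generate_kaniko_spec`, consistent with the golden YAML files in the tests below:

```python
def build_kaniko_spec(namespace, service_account, context, docker_filename, target_image):
    """Assemble the kaniko builder pod spec as a plain dict."""
    return {
        'apiVersion': 'v1',
        'kind': 'Pod',
        'metadata': {
            'generateName': 'kaniko-',
            'namespace': namespace,
            # Keep Istio's sidecar out of the short-lived build pod.
            'annotations': {'sidecar.istio.io/inject': 'false'},
        },
        'spec': {
            'restartPolicy': 'Never',
            'serviceAccountName': service_account,  # the field this PR makes configurable
            'containers': [{
                'name': 'kaniko',
                'image': 'gcr.io/kaniko-project/executor@sha256:78d44ec4e9cb5545d7f85c1924695c89503ded86a59f92c7ae658afa3cff5400',
                'args': ['--cache=true',
                         '--dockerfile=' + docker_filename,
                         '--context=' + context,
                         '--destination=' + target_image,
                         '--digest-file=/dev/termination-log'],
            }],
        },
    }
```

The only behavioral change in this hunk is that `serviceAccountName` now comes from `self._service_account` instead of the hardcoded `'default'`.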

28 changes: 25 additions & 3 deletions sdk/python/tests/compiler/container_builder_test.py
@@ -58,10 +58,32 @@ def test_generate_kaniko_yaml(self, mock_gcshelper):
test_data_dir = os.path.join(os.path.dirname(__file__), 'testdata')

# check
builder = ContainerBuilder(gcs_staging=GCS_BASE, default_image_name=DEFAULT_IMAGE_NAME, namespace='default')
builder = ContainerBuilder(gcs_staging=GCS_BASE,
default_image_name=DEFAULT_IMAGE_NAME,
namespace='default')
generated_yaml = builder._generate_kaniko_spec(docker_filename='dockerfile',
context='gs://mlpipeline/kaniko_build.tar.gz', target_image='gcr.io/mlpipeline/kaniko_image:latest')
context='gs://mlpipeline/kaniko_build.tar.gz',
target_image='gcr.io/mlpipeline/kaniko_image:latest')
with open(os.path.join(test_data_dir, 'kaniko.basic.yaml'), 'r') as f:
golden = yaml.safe_load(f)

self.assertEqual(golden, generated_yaml)
self.assertEqual(golden, generated_yaml)

def test_generate_kaniko_yaml_kubeflow(self, mock_gcshelper):
""" Test generating the kaniko job yaml for Kubeflow deployment """

# prepare
test_data_dir = os.path.join(os.path.dirname(__file__), 'testdata')

# check
builder = ContainerBuilder(gcs_staging=GCS_BASE,
default_image_name=DEFAULT_IMAGE_NAME,
namespace='user',
service_account='default-editor',)
generated_yaml = builder._generate_kaniko_spec(docker_filename='dockerfile',
context='gs://mlpipeline/kaniko_build.tar.gz',
target_image='gcr.io/mlpipeline/kaniko_image:latest',)
with open(os.path.join(test_data_dir, 'kaniko.kubeflow.yaml'), 'r') as f:
golden = yaml.safe_load(f)

self.assertEqual(golden, generated_yaml)
2 changes: 1 addition & 1 deletion sdk/python/tests/compiler/testdata/kaniko.basic.yaml
@@ -22,7 +22,7 @@ metadata:
sidecar.istio.io/inject: 'false'
spec:
restartPolicy: Never
serviceAccountName: default
serviceAccountName: kubeflow-pipelines-container-builder
containers:
- name: kaniko
image: gcr.io/kaniko-project/executor@sha256:78d44ec4e9cb5545d7f85c1924695c89503ded86a59f92c7ae658afa3cff5400
34 changes: 34 additions & 0 deletions sdk/python/tests/compiler/testdata/kaniko.kubeflow.yaml
@@ -0,0 +1,34 @@
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


apiVersion: v1
kind: Pod
metadata:
generateName: kaniko-
namespace: user
annotations:
sidecar.istio.io/inject: 'false'
spec:
restartPolicy: Never
serviceAccountName: default-editor
containers:
- name: kaniko
image: gcr.io/kaniko-project/executor@sha256:78d44ec4e9cb5545d7f85c1924695c89503ded86a59f92c7ae658afa3cff5400
args: ["--cache=true",
"--dockerfile=dockerfile",
"--context=gs://mlpipeline/kaniko_build.tar.gz",
"--destination=gcr.io/mlpipeline/kaniko_image:latest",
"--digest-file=/dev/termination-log",
]