Enhancement Request: Add AWS S3 Support for TensorBoard in KFP #4364
/assign @Jeffwan @PatrickXYS

Hello @PatrickXYS, does that include TensorBoard support? I didn't see anything in that PR related to TensorBoard at first glance. In our last discussion @Bobgy informed me that no such support exists.

If you check https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/env/aws/viewer-pod-template.json, this is for The point here is you need to configure those AWS parameters correctly, such as

@PatrickXYS thanks, that looks promising. We'll try that out.

@PatrickXYS are there instructions on how to use IAM roles with TensorBoard? We don't have long-lived AWS secrets, so we'll need an IAM-based solution.

Bump ^ we also use IAM roles instead of long-lived AWS secrets. @PatrickXYS do you know if this is supported? And if it's undocumented but supported, happy to help supply documentation provided we can get it working.

We don't support IAM roles yet. Also, I'm not sure if that's feasible for now; I'll add it to our roadmap.
@PatrickXYS we were finally able to get TensorBoard working with IAM. I'll document it below so that other people can leverage the setup, although I'm happy to add the docs somewhere else if there's a better location.

Similar to https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/env/aws/README.md we have a configmap with the content necessary for the tensorboard launcher:

We're using

- op: replace
  path: /spec/template/spec/containers/0/env
  value:
    ...
    - name: VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH
      value: /etc/config/viewer-tensorboard-template.json
    ...

Then we've modified

viewer-tensorboard-template.json: |-
{
"metadata": {
"annotations": {
"iam.amazonaws.com/role": "ai-rancher/rancher_ai_training_shared"
}
},
"spec": {
"serviceAccountName": "kubeflow-pipelines-viewer"
}
}

After that, the tensorboard viewer pod gets started with the above IAM role and can access our S3 buckets.
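
For anyone who wants to generate that template rather than hand-edit it, here is a minimal sketch in Python. The helper name `viewer_pod_template` is hypothetical, and it assumes the cluster runs something that honors the `iam.amazonaws.com/role` annotation (for example kube2iam or kiam; the thread does not say which). The role and service account values are the ones quoted above.

```python
import json


def viewer_pod_template(iam_role: str, service_account: str) -> dict:
    """Build a viewer pod template that carries an IAM role annotation.

    Hypothetical helper: it only mirrors the JSON structure posted above.
    """
    return {
        "metadata": {
            "annotations": {
                # Assumption: an IAM proxy on the cluster reads this annotation.
                "iam.amazonaws.com/role": iam_role,
            }
        },
        "spec": {"serviceAccountName": service_account},
    }


template = viewer_pod_template(
    "ai-rancher/rancher_ai_training_shared",
    "kubeflow-pipelines-viewer",
)

# Render the JSON body that would go under the configmap key.
rendered = json.dumps(template, indent=4)
print(rendered)
```

The printed JSON is what would be pasted under the `viewer-tensorboard-template.json` key of the configmap.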
I should also note that in our kustomize configs we specify a commonAnnotation:

commonAnnotations:
  iam.amazonaws.com/role: ai-rancher/rancher_ai_training_shared

But we still needed the above

All of this is to say that we've been able to create TensorBoard artifacts as follows with S3 paths from within our pipelines, and the viewer pod is able to use IAM roles to download the S3 log dir. Example

{
"outputs": [
{
"type": "tensorboard",
"source": "s3://invitae-ai-training-shared/kubeflow/experiments/ccccfe49-a23e-4a5b-9684-a3e7c5e26095/runs/17400f2b-11dc-4729-b7dc-f2336a81aadd/logs"
}
]
}

Also, the KFP docs (here) say that "The pipeline component must write a JSON file specifying metadata for the output viewer(s) that you want to use for visualizing the results. The file name must be

op = dsl.ContainerOp(
    name="Write tensorboard metadata",
    image=docker_image,
    command=["sh", "-c"],
    arguments=[ ... some command that produces metadata.json ... ],
    file_outputs={"mlpipeline-ui-metadata": "metadata.json"},
)

It would be great to update the KFP docs for the above metadata.json issue, since debugging it cost me a few hours and I felt a bit misled by the existing documentation. cc @nlarusstone since you were asking in the #kubeflow-pipelines slack channel.
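
To make the `metadata.json` step concrete, here is a rough sketch of a script that a component could run to produce the file consumed by `file_outputs` in the `ContainerOp` above. This is an illustration only: the function name `write_tensorboard_metadata` and the bucket `example-bucket` are made up, and the JSON shape simply copies the example posted earlier in this comment.

```python
import json
import tempfile
from pathlib import Path


def write_tensorboard_metadata(logdir_uri: str, out_path: str) -> dict:
    """Write UI metadata that points the TensorBoard viewer at logdir_uri."""
    metadata = {
        "outputs": [
            {"type": "tensorboard", "source": logdir_uri},
        ]
    }
    Path(out_path).write_text(json.dumps(metadata, indent=2))
    return metadata


# Demo: write into a temporary directory so the sketch leaves nothing behind.
# Inside a real component the path would be the one named in file_outputs.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"
    written = write_tensorboard_metadata(
        "s3://example-bucket/path/to/logs",  # hypothetical S3 logdir
        str(path),
    )
    reloaded = json.loads(path.read_text())
```

In a component, the same function would be called with the relative path `metadata.json` so that it matches the `file_outputs` mapping.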
I should note, none of this worked until we bumped to KFP standalone version:
Overview
My team is interested in integrating TensorBoard into Kubeflow Pipelines (using v1.0.0, standalone installation), but we find ourselves unable to do so because we depend on AWS instead of GCP. Someone in the Kubeflow Slack recommended that I open this Enhancement Request for adding S3 support for using TensorBoard in Kubeflow Pipelines.

Proposal
A great improvement to the KFP UI would be to see the Start TensorBoard button in the output page of a pipeline run as described in the KFP docs (https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/#tensorboard) even if the TensorBoard log directory has been uploaded to an AWS S3 bucket.

End user requirements to leverage this feature:
- Write the /mlpipeline-ui-metadata.json file successfully in the container,
- Produce a logdir that is valid (i.e. tensorboard --inspect --logdir /app/logs/fit/… succeeds)
- Upload the logdir to AWS S3 successfully

Here’s the content of an example metadata file for an S3 logdir:
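
As a hedged sketch (not output from our pipeline), a metadata file for an S3 logdir could look like the `example` dictionary below, with a small check that every TensorBoard output points at an `s3://bucket/...` URI. The bucket name and the helper `check_tensorboard_metadata` are hypothetical; the schema is the `outputs` / `type` / `source` layout from the KFP output viewer docs linked above.

```python
from urllib.parse import urlparse


def check_tensorboard_metadata(metadata: dict) -> list:
    """Return a list of problems found in the metadata; empty means it looks fine."""
    outputs = metadata.get("outputs")
    if not isinstance(outputs, list) or not outputs:
        return ["'outputs' must be a non-empty list"]

    problems = []
    for i, output in enumerate(outputs):
        if output.get("type") != "tensorboard":
            continue  # other viewer types are out of scope for this sketch
        parsed = urlparse(output.get("source", ""))
        if parsed.scheme != "s3" or not parsed.netloc:
            problems.append(f"outputs[{i}]: source is not an s3://bucket/... URI")
    return problems


# Hypothetical example metadata for a logdir uploaded to S3.
example = {
    "outputs": [
        {"type": "tensorboard", "source": "s3://example-bucket/path/to/logs"},
    ]
}

# A local path, for contrast: this is what the check is meant to catch.
local_only = {"outputs": [{"type": "tensorboard", "source": "/app/logs/fit"}]}

ok_problems = check_tensorboard_metadata(example)
bad_problems = check_tensorboard_metadata(local_only)
```

The check only inspects the URI shape; it does not confirm that the bucket exists or that the viewer pod has permission to read it.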
@Bobgy pointed me to #4208 as a potential future workaround using a mount path (note: not yet merged as of creation of this GitHub issue), but it would be great to have AWS S3 support for this as well.
I'm happy to chip in where I can with design discussions/implementation here, especially with regard to AWS integration in general, since my team is exclusively (and mostly successfully) using AWS instead of GCP for our KFP cluster.
Original Slack channel thread:
https://kubeflow.slack.com/archives/CE10KS9M4/p1595512605179800