Could not parse metadata file from S3 #3388

Closed
Jeffwan opened this issue Mar 29, 2020 · 1 comment

Comments

@Jeffwan
Member

Jeffwan commented Mar 29, 2020

What steps did you take:

I configured Argo, the KFP UI, and the KFP API server to use S3 as the artifact store.

What happened:

Could not parse metadata file at: artifacts/iris-classification-pipeline-4ktlj/iris-classification-pipeline-4ktlj-3034405897/mlpipeline-ui-metadata.tgz. Error: SyntaxError: Unexpected token � in JSON at position 0

It's a little weird that this works fine when the artifact is stored in Minio but not when it is stored in S3. As the screenshot shows, the output contains some unreadable characters. I verified that the bucket and key path are correct, and I can fetch the file from the command line.

[Screenshot: Screen Shot 2020-03-29 at 3 06 33 PM]

I am not sure whether the way we persist the file matters. I downloaded the files from both Minio and S3. In both cases, the tgz and the extracted JSON report the same file types.

$ file ~/Downloads/mlpipeline-ui-metadata.tgz
~/Downloads/mlpipeline-ui-metadata.tgz: gzip compressed data, last modified: Sun Mar 29 06:24:22 2020, from Unix

$ file ~/Downloads/mlpipeline-ui-metadata.json
~/Downloads/mlpipeline-ui-metadata.json: ASCII text, with very long lines, with no line terminators
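
For reference, the "Unexpected token at position 0" error is what a JSON parser reports when it is handed the raw bytes of a gzip archive. The sketch below is not the KFP UI code; it only illustrates that behaviour and shows one way to unpack the archive before parsing. The function name load_ui_metadata is hypothetical, and it assumes the .tgz holds a single JSON file.

```python
import io
import json
import tarfile

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def load_ui_metadata(raw: bytes) -> dict:
    """Parse UI metadata from raw artifact bytes, unpacking a .tgz first if needed.

    Hypothetical helper for illustration; assumes the archive holds one JSON file.
    """
    if raw[:2] != GZIP_MAGIC:
        # Not gzip-compressed: treat the bytes as plain JSON text.
        return json.loads(raw.decode("utf-8"))

    # Gzip-compressed tar archive: read the first regular file and parse it.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                extracted = archive.extractfile(member)
                return json.loads(extracted.read().decode("utf-8"))
    raise ValueError("archive contains no regular file")
```

Running this against the downloaded mlpipeline-ui-metadata.tgz (read in binary mode) should return the same dictionary as parsing the extracted JSON directly. If it does, the stored artifact is intact and the problem is in how it is read.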

The only difference I know of is that in S3 the file metadata has Content-Type: application/x-gtar-compressed. I also downloaded the Minio file and re-uploaded it to S3, where it gets Content-Type: application/gzip, but this still doesn't work.
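
The Content-Type of a stored object can also be checked programmatically. The sketch below is only an illustration: s3_content_type is a hypothetical helper that accepts any client exposing a boto3-style head_object call, and the bucket and key in the usage comment are placeholders.

```python
def s3_content_type(client, bucket: str, key: str):
    """Return the Content-Type S3 reports for an object, or None if absent.

    `client` is assumed to behave like a boto3 S3 client (head_object method).
    """
    response = client.head_object(Bucket=bucket, Key=key)
    return response.get("ContentType")


# Usage with boto3 (assumed to be installed and configured with credentials):
#   import boto3
#   print(s3_content_type(boto3.client("s3"), "<bucket>", "<key>"))
```

Comparing this value for the Argo-written object and the re-uploaded one reproduces the check described above without going through the console.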

Python code I used to persist the artifact:

import json
from tensorflow.python.lib.io import file_io

# metadata is the dict shown below
with file_io.FileIO('/tmp/mlpipeline-ui-metadata.json', 'w') as f:
    json.dump(metadata, f)

{
  "outputs": [
    {
      "storage": "inline",
      "source": "# Inline Markdown\n[A link](https://www.kubeflow.org/)",
      "type": "markdown"
    },
    {
      "source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/README.md",
      "type": "markdown"
    },
    {
      "type": "confusion_matrix",
      "format": "csv",
      "schema": [
        {
          "name": "target",
          "type": "CATEGORY"
        },
        {
          "name": "predicted",
          "type": "CATEGORY"
        },
        {
          "name": "count",
          "type": "NUMBER"
        }
      ],
      "source": "s3://jiaixn-kubeflow-pipeline-data/iris-example/confusion_matrix.csv",
      "labels": [
        "0",
        "1"
      ]
    },
    {
      "type": "tensorboard",
      "source": "s3://jiaixn-kubeflow-pipeline-data/iris-example/tb-logs"
    }
  ]
}

What did you expect to happen:

Environment:

How did you deploy Kubeflow Pipelines (KFP)?
Standalone

KFP version: 0.2.5

/kind bug
/area frontend
/area backend

@Jeffwan
Member Author

Jeffwan commented Mar 30, 2020

#2992 fixes this issue. I will try the latest version.
