The file metric collector example docker image does not sync with the code #945

yeya24 · 2019-12-02T20:52:59Z

/kind bug

What steps did you take and what happened:
The trial image docker.io/liuhougangxa/pytorch-mnist:1.0 used in https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/file-metricscollector-example.yaml is out of date with respect to https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/file-metrics-collector/mnist.py.

The mnist.py in the Docker image:

def test(args, model, device, test_loader, epoch):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
            pred = output.max(1, keepdim=True)[1] # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    logging.info('\n{{metricName: accuracy, metricValue: {:.4f}}};{{metricName: loss, metricValue: {:.4f}}}\n'.format(float(correct) / len(test_loader.dataset), test_loss))

Here the logging format string is {{metricName: accuracy, metricValue: {:.4f}}}, which str.format renders as {metricName: accuracy, metricValue: <value>}, so the file metrics collector cannot parse it correctly.
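To illustrate the mismatch, here is a minimal sketch. It assumes the file metrics collector looks for `name=value` pairs. The regex `NAME_VALUE` and the helpers `render_old`, `render_name_value`, and `parse_metrics` are hypothetical stand-ins written for this example; they are not taken from the Katib source or from the current mnist.py.

```python
import re

# Assumption: the collector matches `name=value` pairs. This regex is an
# illustrative stand-in, not copied from the Katib source.
NAME_VALUE = re.compile(r"([\w-]+)\s*=\s*(-?\d+(?:\.\d+)?)")


def render_old(accuracy, loss):
    # Format string quoted from the mnist.py inside the outdated image.
    # `{{` and `}}` are escapes, so the output contains single braces.
    return (
        '\n{{metricName: accuracy, metricValue: {:.4f}}};'
        '{{metricName: loss, metricValue: {:.4f}}}\n'
    ).format(accuracy, loss)


def render_name_value(accuracy, loss):
    # Hypothetical `name=value` layout used only for comparison.
    return "\naccuracy={:.4f}\nloss={:.4f}\n".format(accuracy, loss)


def parse_metrics(text):
    # Collect every `name=value` pair found in the text.
    return {name: float(value) for name, value in NAME_VALUE.findall(text)}


old_line = render_old(0.9876, 0.0432)
new_line = render_name_value(0.9876, 0.0432)

print(repr(old_line))
print(parse_metrics(old_line))   # no `=` in the old output, so nothing matches
print(parse_metrics(new_line))
```

Under that assumption, the old format yields no metrics at all, while the `name=value` layout yields both accuracy and loss.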

@hougangliu



andreyvelich · 2019-12-03T17:11:54Z

I ran into the same problem.
After building a Docker image with the latest mnist.py code, file-metricscollector-example works.

This may be why we can't pass the CI tests: the e2e test uses the YAML from the examples folder:
https://github.com/kubeflow/katib/blob/master/test/scripts/v1alpha3/run-file-metricscollector.sh#L59

hougangliu · 2019-12-04T05:54:22Z

Sorry for blocking you. I updated the image in #947.

johnugeorge · 2019-12-04T09:00:28Z

@hougangliu Can you move it to a common repo instead of your private registry? For now, did you retag your image with the latest changes?

johnugeorge · 2019-12-05T05:46:30Z

Closing this issue, as #949 keeps the images in the kubeflowkatib repo.

johnugeorge · 2019-12-05T05:46:37Z

/close

k8s-ci-robot · 2019-12-05T05:46:39Z

@johnugeorge: Closing this issue.

k8s-ci-robot added the kind/bug label Dec 2, 2019

yeya24 mentioned this issue Dec 4, 2019: rename counter metrics #942 (Merged)

k8s-ci-robot closed this as completed Dec 5, 2019
