Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: clean up UTs for file metrics collector #2285

Merged
merged 5 commits into from
Apr 3, 2024

Conversation

Electronic-Waste
Copy link
Member

What this PR does / why we need it:

Current error comparison in file-metricscollector_test.go is based on a boolean rather than the error content To make test cases more detailed and accurate, we should expose and compare errors.

And also @tenzen-y suggested that t.TempDir should be used for generating test directory. So I replace static directories with t.TempDir.

Related discussion: #1756 (comment) #2274 (comment)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1756

Checklist:

  • Docs included if any changes are user facing

@Electronic-Waste
Copy link
Member Author

Electronic-Waste commented Mar 14, 2024

PTAL👀 /assign @andreyvelich @tenzen-y . BTW my flag is standing :)

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give my first review

@Electronic-Waste
Copy link
Member Author

Electronic-Waste commented Mar 14, 2024

I reordered the judging logics in CollectObservatoinLog, for we should check the fileFormat first in case of opening an invalid file. (Or the test case "Invalid file format" will report "can't find file or directory error` error since the order of test file generation is not sure.)

Also, some typical error's descriptions are added to the returned error in file-metricscollector.

PTAL👀 if you have some spare time @tenzen-y @andreyvelich .

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, lgtm

@Electronic-Waste
Copy link
Member Author

@tenzen-y It seems that the CI failed due to other reasons which are not concerned with my code.

@tenzen-y
Copy link
Member

tenzen-y commented Apr 1, 2024

@tenzen-y It seems that the CI failed due to other reasons which are not concerned with my code.

I'll try to check it.

@tenzen-y
Copy link
Member

tenzen-y commented Apr 1, 2024

@tenzen-y It seems that the CI failed due to other reasons which are not concerned with my code.

I'll try to check it.

Crated: #2297

@tenzen-y
Copy link
Member

tenzen-y commented Apr 2, 2024

@Electronic-Waste Could you rebase this PR? Because I fixed CI issues.

@Electronic-Waste
Copy link
Member Author

@tenzen-y Sure! I'll rebase it soon.

Signed-off-by: Electronic-Waste <2690692950@qq.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>
Signed-off-by: Electronic-Waste <2690692950@qq.com>
@Electronic-Waste
Copy link
Member Author

Electronic-Waste commented Apr 3, 2024

Some unexpected errors occurred in Go Test / Unit Test (1.27.1) (pull_request)

{"level":"info","ts":"2024-04-03T01:41:34Z","logger":"trial-controller","msg":"Trial status changed to Metrics Unavailable","Trial":{"name":"test-trial","namespace":"default"}}
--- FAIL: TestReconcileBatchJob (82.16s)
    trial_controller_test.go:274: 
        Timed out after 80.000s.
        Expected
            <bool>: false
        to be true
FAIL

And also E2E Test with pytorch-mnist / e2e (v1.25.12, file-metrics-collector,pytorchjob-mnist):

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
katib-controller    ClusterIP   10.103.178.5     <none>        443/TCP,8080/TCP,18080/TCP   2s
katib-db-manager    ClusterIP   10.107.4.68      <none>        6789/TCP                     2s
katib-mysql         ClusterIP   10.103.235.10    <none>        3306/TCP                     2s
training-operator   ClusterIP   10.104.220.108   <none>        8080/TCP                     3s
Error from server (InternalError): error when creating "../../testdata/valid-experiment.yaml": Internal error occurred: failed calling webhook "defaulter.experiment.katib.kubeflow.org": failed to call webhook: Post "[https://katib-controller.kubeflow.svc:443/mutate-experiment?timeout=10s](https://katib-controller.kubeflow.svc/mutate-experiment?timeout=10s)": dial tcp 10.103.178.5:443: connect: connection refused

I encountered them many times in CI. Should we look into these errors and fix them?

@tenzen-y
Copy link
Member

tenzen-y commented Apr 3, 2024

I encountered them many times in CI. Should we look into these errors and fix them?

Those errors are known flakness issues. I'll try to restart jobs.

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Apr 3, 2024
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Electronic-Waste, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 7df05c2 into kubeflow:master Apr 3, 2024
60 of 61 checks passed
@Electronic-Waste
Copy link
Member Author

@tenzen-y Thank you for your review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add unit tests for File Metrics Collector
3 participants