Bump Python version to 3.9 #1731

Merged

tenzen-y
Member

@tenzen-y tenzen-y commented Nov 10, 2021

What this PR does / why we need it:

Upgrade the Python version and the versions of the dependent libraries. Along with that, fix some training code that stopped working after the upgrade, and fix some Dockerfiles for arm64 and ppc64le that were not buildable.

Also, this PR allows the TFEventMetricsCollector to collect metrics generated by TF 2.x and drops support for TF 1.x and earlier.

Furthermore, add a task to kubeflow-katib-presubmit to verify that tfevent-metrics-collector can collect metrics output by training code using TF 2.0.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1730

Checklist:

  • Docs included if any changes are user facing

@coveralls

coveralls commented Nov 10, 2021

Coverage Status

Coverage decreased (-0.1%) to 74.138% when pulling d428ddc on tenzen-y:issue-1730-bump-python-to-3.7 into 7be1f0a on kubeflow:master.

@tenzen-y tenzen-y changed the title [WIP] Bump Python to 3.9 [WIP] Bump Python version to 3.9 Nov 11, 2021
@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch from 8ff0d76 to 56e5651 Compare November 12, 2021 13:39
@google-oss-prow google-oss-prow bot added size/XL and removed size/L labels Nov 12, 2021
@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch 4 times, most recently from d5d6ed8 to 504d3c4 Compare November 12, 2021 14:16
@tenzen-y
Member Author

I added a new trial image as discussed in this comment but could not push it to ECR in CI.

Screenshot 2021-11-12 23 31 18

Do I need to create a PR like this one in the kubeflow/testing repository? @kubeflow/wg-automl-leads

@andreyvelich
Member

I added a new trial image as discussed in this comment but could not push it to ECR in CI.

Screenshot 2021-11-12 23 31 18

Do I need to create a PR like this one in the kubeflow/testing repository? @kubeflow/wg-automl-leads

Yes, could you please create a PR with the new image:

"katib-trial-tf-mnist-with-summaries": "katib/v1beta1/trial-tf-mnist-with-summaries"

@tenzen-y
Member Author

Thank you for letting me know. @andreyvelich

@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch 4 times, most recently from 6fa3915 to 03bc04f Compare November 13, 2021 09:30
@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch 5 times, most recently from 9692bfc to 3561aac Compare November 14, 2021 05:37
@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch from 30cc0bf to a08222a Compare December 10, 2021 02:18
@tenzen-y
Member Author

I have rebased this PR and resolved conflicts.

@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch from a08222a to 6d74b60 Compare December 10, 2021 02:27
@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch from 6d74b60 to a052149 Compare December 10, 2021 02:37
@tenzen-y
Member Author

/retest

Member

@gaocegege gaocegege left a comment

LGTM

@andreyvelich
Member

@tenzen-y Could you please also update the update-images.sh script with your new Trial image name?

@tenzen-y
Member Author

@tenzen-y Could you please also update the update-images.sh script with your new Trial image name?

Sure.

@tenzen-y tenzen-y force-pushed the issue-1730-bump-python-to-3.7 branch from b63b40e to 8553312 Compare December 10, 2021 08:41
@tenzen-y
Member Author

I have modified update-images.sh.
Can you double-check the changes? @andreyvelich

@andreyvelich
Member

Thank you for driving this @tenzen-y!
/lgtm
/retest

@google-oss-prow google-oss-prow bot added the lgtm label Dec 10, 2021
@google-oss-prow google-oss-prow bot removed the lgtm label Dec 10, 2021
@tenzen-y
Member Author

tenzen-y commented Dec 10, 2021

The E2E test sometimes fails in this PR due to a timeout.
My guess is that when the suggestion service proposes a large batch size for this example, the training Pod is OOM-killed and run-e2e-experiment fails. So I reduced the batch size in the TFJob example.

2021/12/10 11:07:56 Waiting for Experiment tfjob-mnist-with-summaries to finish
2021/12/10 11:07:56 Experiment is running: 2 Trials, 0 Pending Trials, 1 Running Trials, 1 Succeeded Trials, 0 Failed Trials
2021/12/10 11:07:56 Current optimal Trial: {tfjob-mnist-with-summaries-mzdtdxhr [{learning_rate 0.014925334116019179} {batch_size 118}] {[{accuracy 0.9707 0.9707 0.9707}]}}
2021/12/10 11:07:56 Experiment conditions: [{Created True ExperimentCreated Experiment is created 2021-12-10 10:18:12 +0000 UTC 2021-12-10 10:18:12 +0000 UTC} {Running True ExperimentRunning Experiment is running 2021-12-10 10:18:26 +0000 UTC 2021-12-10 10:18:26 +0000 UTC}]
2021/12/10 11:08:16 Deleting Experiment tfjob-mnist-with-summaries
2021/12/10 11:08:16 Wait Experiment finish failed: Experiment run timed out
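
For readers skimming the log above, here is a minimal Python sketch that pulls the per-state Trial counts out of the "Experiment is running" status line. The helper name `parse_trial_counts` and the regular expression are illustrative assumptions, not part of Katib or its E2E tooling. It only shows that one Trial was still running when the timeout hit, which is consistent with (but does not prove) the OOM guess above.

```python
import re

# Status line copied verbatim from the E2E log above.
STATUS_LINE = (
    "2021/12/10 11:07:56 Experiment is running: 2 Trials, 0 Pending Trials, "
    "1 Running Trials, 1 Succeeded Trials, 0 Failed Trials"
)

# Matches "<count> [<state>] Trials". The entry without a state is the total.
# This pattern is an assumption based on the single log line shown here.
_COUNT_RE = re.compile(r"(\d+) (?:(Pending|Running|Succeeded|Failed) )?Trials")


def parse_trial_counts(line: str) -> dict:
    """Hypothetical helper: map each Trial state in the line to its count."""
    counts = {}
    for number, state in _COUNT_RE.findall(line):
        # An empty state string means the bare "N Trials" total entry.
        counts[state or "Total"] = int(number)
    return counts


counts = parse_trial_counts(STATUS_LINE)
print(counts)
```

With the line above, `counts["Running"]` is 1 and `counts["Failed"]` is 0, so the Experiment was neither finished nor marked failed when the wait gave up.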

@tenzen-y
Member Author

tenzen-y commented Dec 10, 2021

@andreyvelich
Member

Should be good now, thanks @tenzen-y!
/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Dec 10, 2021
@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 2a0b12e into kubeflow:master Dec 10, 2021
@tenzen-y tenzen-y deleted the issue-1730-bump-python-to-3.7 branch December 10, 2021 12:36