[CI] update oneDNN to v2.6 #11140

Merged
merged 6 commits into from
May 18, 2022

yangulei
Contributor

@yangulei yangulei commented Apr 27, 2022

I submitted a PR to enable bfloat16 in DNNL BYOC (#11111), but it failed to build because the oneDNN version in CI is pinned to v2.2, which is relatively old. So I updated the CI script to fetch and build the latest oneDNN release.
I have tested the script on my own PC and it works fine, but I don't know how to test it in the CI environment, so please let me know if I did anything inappropriate. Thanks!

cc @Mousius @areusch @driazati

@github-actions github-actions bot requested review from areusch and driazati April 27, 2022 08:39
Member

@driazati driazati left a comment

CI tests your changes automatically, so no worries there. I would prefer that we upgrade the oneDNN version manually (and keep the version pinned) instead of grabbing the latest release: imagine there is a breaking update upstream, and the Docker image build would just stop working one day.

@yangulei
Contributor Author

Thanks for your comments and suggestions. I have modified the script to pin oneDNN to the latest release, v2.6; it's easy to change the pinned version or to switch back to building the latest release.
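As a rough illustration of the pinning approach, here is a minimal Python sketch; the actual CI change is a shell script that is not reproduced in this thread. The constant `DNNL_VERSION` and the function `dnnl_release_url` are hypothetical names, and the URL assumes the upstream repository is `oneapi-src/oneDNN` with release tags of the form `v2.6`, using GitHub's standard tag-archive URL pattern.

```python
# Hypothetical sketch: keep the oneDNN version pinned in one place so that an
# upgrade is an explicit, reviewed change instead of "whatever is latest".
DNNL_VERSION = "2.6"  # assumption: pinned release tag without the leading "v"


def dnnl_release_url(version: str = DNNL_VERSION) -> str:
    """Return the GitHub source-archive URL for a tagged oneDNN release.

    Assumes the upstream repo is oneapi-src/oneDNN and tags look like "v2.6".
    Rejects non-numeric values such as "latest" so the build never floats.
    """
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a pinned version like '2.6', got {version!r}")
    return f"https://github.com/oneapi-src/oneDNN/archive/refs/tags/v{version}.tar.gz"


if __name__ == "__main__":
    print(dnnl_release_url())
```

With this shape, bumping oneDNN is a one-line change to `DNNL_VERSION` that goes through code review, and an accidental request for a floating version such as `"latest"` fails loudly instead of silently changing the Docker image.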

@yangulei
Contributor Author

yangulei commented May 3, 2022

The CI failed with a ConnectTimeoutError during the installation of tensorflow-aarch64 v2.6.2:

[2022-04-28T01:43:42.770Z] WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0xffff827a3050>, 'Connection to snapshots.linaro.org timed out. (connect timeout=15)')': /ldcg/python-cache/tensorflow-aarch64/
[2022-04-28T01:43:55.126Z] ERROR: Could not find a version that satisfies the requirement tensorflow-aarch64==2.6.2 (from versions: 1.2, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.9.0rc0, 2.9.0rc1)
[2022-04-28T01:43:55.126Z] ERROR: No matching distribution found for tensorflow-aarch64==2.6.2
[2022-04-28T01:44:00.406Z] WARNING: You are using pip version 19.3.1; however, version 22.0.4 is available.
[2022-04-28T01:44:00.406Z] You should consider upgrading via the 'pip install --upgrade pip' command.
[2022-04-28T01:44:00.406Z] The command '/bin/sh -c bash /install/ubuntu_install_tensorflow_aarch64.sh' returned a non-zero code: 1
[2022-04-28T01:44:00.406Z] ERROR: docker build failed.
script returned exit code 1

Should I do something or just ignore this?

@areusch
Contributor

areusch commented May 11, 2022

@yangulei I think this means the CI's quota for PyPI was exceeded. Perhaps retry now that it's been a while?

@driazati
Member

Yeah, that index was down for a while last week (#11156 (comment)), but it should be working now. If this happens again, we might want to look into rehosting some of those assets.

@yangulei yangulei changed the title [CI] Enable CI to get and build the latest oneDNN release [CI] update oneDNN to v2.6 May 12, 2022
@yangulei
Contributor Author

@areusch @driazati
PyPI works fine now, thanks!
It looks like we should install the package to /usr/lib instead of /usr/local on the latest main branch. I have modified my script (following the gtest installation) and it now passes all CI checks.

@yangulei
Contributor Author

Hi @areusch @driazati,
PR #11111 depends on this PR and has been waiting for 3 weeks, so I'd like this PR to be merged if it's ready.
Could you please take a look at the script and approve it if no further modifications are required?
Thanks a lot.

Contributor

@areusch areusch left a comment

thanks @yangulei and sorry for the delay here!

@areusch areusch merged commit fb0938a into apache:main May 18, 2022
@masahi
Member

masahi commented May 18, 2022

@areusch We still need to go through the actual update process for the CPU image, right? I've got another request for a GPU image update in #11346; I can do the CPU update as well.

@yangulei yangulei deleted the dev_CI branch May 19, 2022 02:58
@yangulei
Contributor Author

Thank you @areusch. I will rebase #11111 once the Docker image is ready.
