[CI Image] Update PyTorch to v1.10 in GPU image #9713
Comments
Hi @masahi, FYI, I'd be careful with S3. I was trying to use the image from S2 in S3 as mentioned here: #9659. The ci_cpu upgrade is currently blocked on this and we don't have access to any of the nodes. cc: @leandron
@manupa-arm Thanks, yes I'm aware of the recent CI outage. I'll probably work on this next week.
I can deal with this request in #9762.
@leandron This is for updating the GPU image, so I think it should be independent. |
@manupa-arm @leandron @areusch I'm not familiar with the new CI image update protocol. How does one push an image to tlcpackstaging? I tried to push one but got an error. Previously I would just create an image like ci_gpu:v0.76 and push it to the tlcpack dockerhub org.
@masahi you shouldn't need to push images there. The ci-docker-build nightly job automatically builds images from a nightly checkout of main for all images and pushes them there. The last component of the tag is the commit hash used when building the image.
Interesting! I guess I don't even need to build a new image locally anymore... So should I just send a PR updating the image version in the Jenkinsfile? I have more CI questions I want to ask. I've recently joined the Discord to learn about CI issues. Can we continue there?
Yes, that should do it. In any case, S3 should catch any issues if the updated image breaks the current tests. As a next step, we want to explore making every PR rebuild the images using the tlcpackstaging images as a cache, as sketched below. That way we don't sacrifice correctness in the verification; PRs that involve docker changes will be a bit slow due to cache invalidation, but once another tlcpackstaging image is pushed, things should go back to the normal behavior of using the cache. Something to discuss in the next meetup: @leandron @areusch
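A minimal sketch of what that cache reuse could look like, assuming the classic `docker build --cache-from` mechanism and TVM's `docker/Dockerfile.ci_gpu`; the actual Jenkins wiring is not shown in this thread:

```sh
# Minimal sketch, assuming a previously pushed tlcpackstaging image is
# usable as a layer cache. "latest" is a placeholder tag; the nightly
# job pushes date-and-commit tags (e.g. 20220105-225914-79cfb797e).
docker pull tlcpackstaging/ci_gpu:latest
docker build \
    --cache-from tlcpackstaging/ci_gpu:latest \
    -t ci_gpu:pr-test \
    -f docker/Dockerfile.ci_gpu docker/
```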
@jiangjiajun I'm testing a new image with PT 1.10 and getting an error from paddle: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/182/pipeline/ Any idea what's going on? I can run it fine locally.
It looks strange, I'll check this today.
@masahi I'm not sure why this error occurred. I have found there was a core-dump problem while running Paddle under certain conditions. To solve the problem, we released a new version, 2.1.3, which adds a new function to be called after import. This function is called in the paddle frontend and the tvmc frontend; we have tested it in TVM and it works with no problem. I checked all the code in TVM where we import paddle, and found that the test code didn't call it.
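The name of the new function was lost above. Assuming it is `paddle.disable_signal_handler()`, the signal-handler workaround that shipped around PaddlePaddle 2.1.3 (an assumption, not confirmed by the thread), a quick check would look like:

```sh
# Assumption: the elided function is paddle.disable_signal_handler(),
# which tells Paddle not to capture signals that other frameworks or
# the test runner rely on. Call it right after import.
python3 -c "import paddle; paddle.disable_signal_handler(); print(paddle.__version__)"
```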
@jiangjiajun Unfortunately it didn't help: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/184/pipeline Were you able to reproduce the issue locally?
Do you know how to reproduce the problem, or which script I should run?
It's not clear which script caused the failure. The error message seems to indicate that the issue happens during import; I initially thought the error was coming from paddle itself.
Okay, I will test in my environment.
@jiangjiajun I was able to reproduce the error under the new gpu container. Does paddle use PyTorch internally, or link against libtorch?
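For reference, the reproduction mentioned above could look like the following; the image tag shown is the later nightly from the checklist at the end of this thread, used purely as an example:

```sh
# Hypothetical reproduction: run the bare import inside the candidate
# GPU image and watch for the crash.
docker run --rm tlcpackstaging/ci_gpu:20220105-225914-79cfb797e \
    python3 -c "import paddle"
```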
@jiangjiajun ok, it turned out the error has nothing to do with paddle: after applying the mitigation for the PyTorch + LLVM symbol conflict issue #9362 (comment), the error no longer occurs.
Paddle does not depend on PyTorch or LibTorch. Have you solved the problem now? There's still an error at this link: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/185/pipeline/267/
S0. The PyTorch version on CI is 1.7, more than a year old.
S1. Tag of the nightly build: 20220105-225914-79cfb797e. Docker hub: https://hub.docker.com/layers/tlcpackstaging/ci_gpu/20220105-225914-79cfb797e
S2. The nightly is built on TVM commit 79cfb79.
S3. Test the nightly image on ci-docker-staging: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/ci-docker-staging/193/pipeline/
S4. Retag the nightly tag to a versioned tag (see the sketch after this list).
S5. Check that the new tag is really there: https://hub.docker.com/u/tlcpack
S6. Submit a PR updating the IMAGE_NAME version in the Jenkinsfile ([CI Image] Update PyTorch to v1.10 in GPU image #9713).
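A sketch of the S4 retagging step; the `v0.80` version and the `tlcpack/ci-gpu` repository name are assumptions for illustration, not values confirmed in this thread:

```sh
# Hypothetical sketch of S4: promote the verified nightly tag to a
# versioned tag under the tlcpack org. v0.80 is a placeholder version.
NIGHTLY=tlcpackstaging/ci_gpu:20220105-225914-79cfb797e
docker pull "$NIGHTLY"
docker tag  "$NIGHTLY" tlcpack/ci-gpu:v0.80
docker push tlcpack/ci-gpu:v0.80
```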