rocm docker github action build failed #2408
Comments
Adding a revert PR to help isolate the problem.
Right - just created a branch without #2403 to check against the latest successful ROCm image build and compare.
The same action workflow, minus the image-push step, runs normally at https://github.com/KagurazakaNyaa/tabby/actions/runs/9496954836. That fork uses GitHub's default runner instead of the self-hosted runner.
Also, the ROCm version is somewhat outdated at 5.7.1. It is still compatible with older cards, but 6.1.2 is out and brings massive improvements for newer cards; I don't know how much this affects model performance. I tried compiling the 0.12.0 tag and I get this error with my registry, and also when building locally with this command.
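For context, a hypothetical local build invocation of the kind described above - the Dockerfile name and tag format here are guesses for illustration, not the actual command from the report:

```shell
# hypothetical: build the ROCm image locally from the 0.12.0 tag
git checkout v0.12.0
docker build -t tabby-rocm:0.12.0 -f rocm.Dockerfile .
```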
The ROCm Docker image is still not building; the latest published version is 0.11.
Hi - we turned off the ROCm build since our GitHub Actions runner is not able to complete it. As an alternative, I recommend using the Vulkan backend instead for AMD GPU deployments.
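For anyone landing here, a minimal sketch of what that alternative can look like - the `--device` flag and the model name are assumptions based on recent tabby releases, so check `tabby serve --help` for your version:

```shell
# run the standalone tabby binary with the Vulkan backend instead of the ROCm image
tabby serve --model StarCoder-1B --device vulkan
```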
@wsxiaoys well, llama.cpp's ROCm Docker builds are also failing, but the Metal ones are OK.
@wsxiaoys so I figured out one part, but I'm kind of hitting a wall - maybe some config is missing?! In build.rs you need to change this and add this for compatibility with ROCm and to future-proof for 6.1.2, but now I get this error and can't figure out why llama-server can't access libomp.so. I managed to build llama.cpp's llama-server with the ROCm 5.7.1 and 6.1.2 Docker images and it runs great. Everything was tested today with tabby's master branch. Any pointers on why this happens?
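In case it helps with debugging, the quickest check I know for a missing shared library is `ldd` on the failing binary - the path below is an assumption based on tabby's image layout (it matches the PATH output further down), so adjust to your install:

```shell
# list unresolved shared-library dependencies of the bundled llama-server
ldd /opt/tabby/bin/llama-server | grep 'not found'
```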
I tried to get the missing library sorted out with the following change:

```dockerfile
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        git \
        curl \
        openssh-client \
        ca-certificates \
        libssl3 \
        rocblas \
        hipblas \
        libgomp1 \
        # add the package that provides libomp.so
        libomp-dev \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    # create the symlink
    ln -s /usr/lib/x86_64-linux-gnu/libomp.so.5 /usr/lib/x86_64-linux-gnu/libomp.so
```

I want to stress that I have no experience with any of this. It's just tinkering around to get tabby with AMD working on my machine, so I don't know if this is the right way to fix it.
@JayPi4c That is strange - libomp exists under /opt/rocm.
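A quick way to confirm that from inside the container (a sketch; the exact location varies by ROCm version):

```shell
# locate libomp within the ROCm installation
find /opt/rocm* -name 'libomp.so*' 2>/dev/null
```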
Thanks! I did not know about that. Currently:

```shell
root@d17bd20c90d1:/# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/tabby/bin
```

So there is no reference to /opt/rocm in there.
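Worth noting: PATH only affects where executables are found; shared libraries such as libomp.so are resolved by the dynamic linker, so it is LD_LIBRARY_PATH (or an rpath baked into the binary) that matters here. A minimal workaround sketch - the library directory is an assumption, verify it with the find command above:

```shell
# point the dynamic linker at ROCm's bundled LLVM OpenMP runtime (path assumed)
export LD_LIBRARY_PATH=/opt/rocm/llvm/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
# confirm llama-server now resolves libomp.so
ldd /opt/tabby/bin/llama-server | grep -i omp
```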
@JayPi4c well, there is a reference to it in llama.cpp's Makefile.

I can understand Makefiles; with CMake I'm going to admit my ignorance. I don't know if these flags are even passed to the compiler with CMake. In the Makefile they are handled via LDFLAGS when llama-server is linked. Well, time to learn CMake and figure out what is going on.
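For reference, a hedged sketch of how the same link directories could be forced through a CMake configure step - the linker flags mirror the Makefile's `-L$(ROCM_PATH)/lib -Wl,-rpath` settings, while `LLAMA_HIPBLAS` reflects llama.cpp's option naming at the time and may since have been renamed:

```shell
# replicate the Makefile's ROCm LDFLAGS in a CMake build of llama.cpp
cmake -B build \
    -DLLAMA_HIPBLAS=ON \
    -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang \
    -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
    -DCMAKE_EXE_LINKER_FLAGS="-L/opt/rocm/lib -Wl,-rpath,/opt/rocm/lib"
cmake --build build -j
```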
I think I might have found the issue; going to try to compile llama.cpp with CMake to test. I think that in llama.cpp's cmake/llama-config.cmake.in you have the GGML_HIPBLAS variable with a find_package, but it does not add the ROCm path for the add_library. I will refer to this issue in llama.cpp that I opened.
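To check that hypothesis, a minimal reproduction sketch - the option and target names assume llama.cpp's layout at the time:

```shell
# build the server with HIP enabled but no extra linker flags,
# then look for unresolved ROCm/OpenMP libraries in the result
cmake -B build -DLLAMA_HIPBLAS=ON
cmake --build build --target llama-server -j
ldd build/bin/llama-server | grep 'not found'
```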
Describe the bug
The "Create and publish docker image" action run failed:
https://github.com/TabbyML/tabby/actions/runs/9506018585
release-docker (rocm)
The hosted runner: GitHub Actions 15 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
Additional context
In PR #2043, I attempted to update the version of the action. In my fork it builds normally; however, after merging, it is still unable to build the ROCm Docker images. I recommend checking whether the self-hosted Actions runner has been configured incorrectly.