ROCm Port #1087
Merged
+335
−59
105 commits
0fd8363
use hipblas based on cublas
SlyEcho 54a63c1
Update Makefile for the Cuda kernels
SlyEcho 0e005f7
Build file changes
SlyEcho d3e1984
add rpath
SlyEcho 3677235
More build file changes
SlyEcho db7a012
Merge 'origin/master' into hipblas
SlyEcho 3a004b2
add rpath
SlyEcho 608aa33
change default GPU arch to match CMake
SlyEcho d571d16
Merge 'origin/master' into hipblas
SlyEcho ef51e9e
Merge branch 'ggerganov:master' into hipblas
SlyEcho ecc0565
only .cu file needs to be complied as device
SlyEcho a1caa48
add more cuda defines
SlyEcho 3b4a531
Merge 'origin/master' into hipblas
SlyEcho 2ab9d11
Merge 'origin/master' into hipblas
SlyEcho d194586
Merge 'origin/master' into hipblas
SlyEcho d8ea75e
Merge 'origin/master' into hipblas
SlyEcho c73def1
Merge 'origin/master' into hipblas
SlyEcho fcbc262
Merge 'origin/master' into hipblas
SlyEcho b67cc50
Merge 'origin/master' into hipblas
SlyEcho d83cfba
Merge 'origin/master' into hipblas
SlyEcho 04c0d48
Move all HIP stuff to ggml-cuda.cu
SlyEcho 1107194
Merge 'origin/master' into hipblas
SlyEcho 289073a
Merge 'origin/master' into hipblas
SlyEcho baeb482
Revert to default copy
SlyEcho 0aefa6a
Merge 'origin/master' into hipblas
SlyEcho a3296d5
Merge 'origin/master' into hipblas
SlyEcho 070cbcc
occupanct function
SlyEcho 127f68e
Merge 'origin/master' into hipblas
SlyEcho 605560d
Merge 'origin/master' into hipblas
SlyEcho 0fe6384
fix makefile
SlyEcho 2956630
Merge 'origin/master' into hipblas
SlyEcho 8bab456
Merge 'origin/master' into hipblas
SlyEcho a0b2d5f
Merge 'origin/master' into hipblas
SlyEcho c66115b
Merge 'origin/master' into hipblas
SlyEcho b19fefe
Forwardcompat
SlyEcho 600ace3
update warp size
SlyEcho f80ce7a
Merge branch 'origin/master' into hipblas
SlyEcho 174bf6a
Merge 'origin/master' into hipblas
SlyEcho a593a4f
Add missing parameters
SlyEcho 30d921a
and makefile
SlyEcho 4c8b3fb
add configurable vars
SlyEcho a4648c1
Merge 'origin/master' into hipblas
SlyEcho 9fdaa1d
Add more defs
SlyEcho 33091a9
Merge 'origin/master' into hipblas
SlyEcho 5d6eb72
warp size fixes
SlyEcho 1ba4ce4
Revert "warp size fixes"
SlyEcho fa5b3d7
fix makefile.
SlyEcho 4362e80
Merge 'origin/master' into hipblas
SlyEcho 85f902d
Merge 'origin/master' into hipblas
SlyEcho a836529
Merge 'origin/master' into hipblas
SlyEcho 61df8e9
add cudaMemset
SlyEcho 6f7c156
Merge 'origin/master' into hipblas
SlyEcho 67e229b
Merge 'origin/master' into hipblas
SlyEcho 5dd2fbe
Merge 'origin/master' into hipblas
SlyEcho df7346c
Merge 'origin/master' into hipblas
SlyEcho 35a6031
Merge 'origin/master' into hipblas
SlyEcho c1e5c83
Merge 'origin/master' into hipblas
SlyEcho c8ae945
Merge 'origin/master' into hipblas
SlyEcho bb16eff
headers fix; add kquants_iter for hipblas and add gfx803 (#1)
YellowRoseCx 04419f1
Merge 'origin/master' into hipblas
SlyEcho 15db19a
Merge 'origin/master' into hipblas
SlyEcho c3e3733
ROCm fixes
SlyEcho 7735c5a
Merge 'origin/master' into hipblas
SlyEcho 80e4e54
Merge 'origin/master' into hipblas
SlyEcho e610466
Expand arch list and make it overrideable
SlyEcho 8c2c497
Merge 'origin/master' into hipblas
SlyEcho afcb8fe
Add new config option
SlyEcho cd36b18
Merge 'origin/master' into hipblas
SlyEcho 2ec4466
Update build flags.
SlyEcho 3db70b5
Merge 'origin/master' into hipblas
SlyEcho 1f6294d
Fix multi GPU on multiple amd architectures with rocblas_initialize()…
YellowRoseCx 8e8054a
Add rocblas to build files
SlyEcho cde52d6
Merge 'origin/master' into hipblas
SlyEcho d2ade63
Merge 'origin/master' into hipblas
SlyEcho f8e3fc6
rocblas init stuff
SlyEcho 4336231
add hipBLAS to README
SlyEcho c1664a0
Merge 'origin/master' into hipblas
SlyEcho c1cb70d
new build arg LLAMA_CUDA_MMQ_Y
SlyEcho d91456a
fix half2 decomposition
ardfork ab62128
Merge 'origin/master' into hipblas
SlyEcho 4024f91
Add intrinsics polyfills for AMD
SlyEcho 610ba4c
Merge 'origin/master' into hipblas
SlyEcho 8f8ab6c
hipLDFLAG Path change Unix to multisystem in Makefile
YellowRoseCx 29a59b5
Fix merge
SlyEcho f41920e
AMD assembly optimized __dp4a
Engininja2 42e055d
ws fix
SlyEcho e6b6ae5
Undo mess
SlyEcho c299c4a
New __dp4a assembly
Engininja2 b815e97
Merge 'origin/master' into hipblas
SlyEcho 4e58a05
Allow overriding CC_TURING
SlyEcho 6415610
gfx1100 support
SlyEcho 70e2f7c
Merge 'origin/master' into hipblas
SlyEcho 68e79cc
Merge 'origin/master' into hipblas
SlyEcho 3de6a9a
reenable LLAMA_CUDA_FORCE_DMMV
SlyEcho bbbc0ce
makefile rewrite
SlyEcho c88c2a9
probably lld is not required
SlyEcho 423db74
Merge 'origin/master' into hipblas
SlyEcho 391dd9a
Merge 'origin/master' into hipblas
SlyEcho 5d3e7b2
use "ROCm" instead of "CUDA"
SlyEcho 7b84217
Merge 'origin/master' into hipblas
SlyEcho 058f905
ignore all build dirs
SlyEcho a60231f
Add Dockerfiles
SlyEcho 81ecaa4
fix llama-bench
SlyEcho 238335f
fix -nommq help for non CUDA/HIP
SlyEcho 9035cfc
Merge 'origin/master' into hipblas
SlyEcho
@@ -0,0 +1,44 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG ROCM_VERSION=5.6

# Target the CUDA build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

FROM ${BASE_ROCM_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
ARG ROCM_DOCKER_ARCH=\
    gfx803 \
    gfx900 \
    gfx906 \
    gfx908 \
    gfx90a \
    gfx1010 \
    gfx1030 \
    gfx1100 \
    gfx1101 \
    gfx1102

COPY requirements.txt requirements.txt

RUN pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV LLAMA_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]
@@ -0,0 +1,44 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG ROCM_VERSION=5.6

# Target the CUDA build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

FROM ${BASE_ROCM_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
ARG ROCM_DOCKER_ARCH=\
    gfx803 \
    gfx900 \
    gfx906 \
    gfx908 \
    gfx90a \
    gfx1010 \
    gfx1030 \
    gfx1100 \
    gfx1101 \
    gfx1102

COPY requirements.txt requirements.txt

RUN pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV LLAMA_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

RUN make

ENTRYPOINT [ "/app/main" ]
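Both Dockerfiles default to a fat build across every architecture in ROCM_DOCKER_ARCH, but because that list is declared with ARG it can presumably be narrowed at build time with docker's --build-arg flag. The sketch below only prints such a command rather than running it; the helper name rocm_build_cmd and the Dockerfile path are hypothetical placeholders, since the file names are not shown in this view.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a docker build command that
# overrides the ROCM_DOCKER_ARCH build argument for a single GPU target.
rocm_build_cmd() {
    dockerfile="$1"   # path to one of the ROCm Dockerfiles (assumed, not from the PR)
    arch="$2"         # a single target from the list above, e.g. gfx1030
    printf 'docker build --build-arg ROCM_DOCKER_ARCH=%s -f %s .\n' "$arch" "$dockerfile"
}

# Example: show the command for a gfx1030-only image.
rocm_build_cmd "path/to/rocm.Dockerfile" "gfx1030"
```

Building for one architecture instead of ten should shorten the build considerably, at the cost of an image that only runs on that GPU family.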
ROCm path shouldn't be hardcoded to /opt/rocm. It's common to use the env var ROCM_PATH (ROCM_HOME is also sometimes used). /opt/rocm should only be a fallback.
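The lookup order suggested in this comment can be sketched in shell as follows. This is only an illustration of the idea, not what the PR implements: the function name resolve_rocm_root and the variable ROCM_ROOT are hypothetical, and the llvm/bin/clang layout is taken from the Dockerfiles in this diff.

```shell
#!/bin/sh
# Hypothetical helper: pick the ROCm install root from the environment,
# preferring ROCM_PATH, then ROCM_HOME, and only then falling back to /opt/rocm.
resolve_rocm_root() {
    if [ -n "${ROCM_PATH:-}" ]; then
        printf '%s\n' "$ROCM_PATH"
    elif [ -n "${ROCM_HOME:-}" ]; then
        printf '%s\n' "$ROCM_HOME"
    else
        printf '%s\n' "/opt/rocm"
    fi
}

# Derive the compiler paths from the resolved root instead of hardcoding them.
ROCM_ROOT="$(resolve_rocm_root)"
echo "CC=${ROCM_ROOT}/llvm/bin/clang"
echo "CXX=${ROCM_ROOT}/llvm/bin/clang++"
```

With neither variable set, this resolves to the same /opt/rocm paths the build currently uses, so existing setups would be unaffected.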
I took this from AMD's docs, but they have updated it now: "Using CMake". Probably because it is not going to work on Windows.
I hadn't taken a look at AMD's docs. But they at least internally use ROCM_PATH on all the projects that I have seen. As the CMake config would probably need to change anyway for Windows, and I don't think a lot of people will be impacted by not using their configured ROCm path, I think it's fine to leave it that way for now. But whenever the CMake config is changed to add support for Windows, it would be nice to also add support for one of ROCM_PATH/HIP_PATH/ROCM_HOME on Linux.
Seems like the latest docs say to always manually use a CMake prefix for configuring. Guess that makes sense because on Windows, people could install it anywhere.
On Windows, you'd instead have HIP_PATH set IIRC. But someone would need to check the HIP Windows SDK installation to be sure.