forked from vllm-project/vllm
Synchronise with IBM/vllm:main #20
Merged
+28 −30
Conversation
`format.sh` now has mypy checks after pulling in upstream changes. This PR makes the mypy-suggested modifications to our code.

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
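For context, the kind of modification mypy typically suggests is tightening annotations so they match runtime behavior. A minimal hypothetical sketch (not code from this PR) of the common implicit-`Optional` fix:

```python
from typing import Optional

# Before: mypy rejects this under no_implicit_optional, because the
# annotation says `str` but the default value is `None`:
#   def get_token(name: str = None) -> str: ...

# After: the annotation is widened to match the actual default.
def get_token(name: Optional[str] = None) -> str:
    # Fall back to a fixed value when no name is given.
    return name if name is not None else "default"
```

`get_token` and its fallback value are illustrative only; the PR's actual changes are whatever mypy flagged in the IBM fork's code.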
…ubi (opendatahub-io#23)

Changes:
- vLLM v0.4.2 was published today; update our build to use pre-built libs from their wheel
- bump other dependencies in the image build (base UBI image, miniforge, flash attention, grpcio-tools, accelerate)
- little cleanup to remove `PYTORCH_` args that are no longer used

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Not supported by OpenShift CI (buildah); see containers/buildah#4325.
z103cb force-pushed the ibm_main_synch05092024 branch from 6d951fd to c810eb8 on May 9, 2024 at 11:05
z103cb changed the title from "Synchronise with IBM/vllm:main o" to "Synchronise with IBM/vllm:main" on May 9, 2024
z103cb commented on May 9, 2024
z103cb force-pushed the ibm_main_synch05092024 branch from c810eb8 to d173ce7 on May 9, 2024 at 12:21
dtrifiro approved these changes on May 9, 2024
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dtrifiro, z103cb. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Xaenalt pushed a commit that referenced this pull request on Sep 18, 2024
* Fix setup.py for HPU
* Fix vllm._C import ops -> vllm.hpu import ops
* more of the same thing
* re-add hpex rmsnorm and rope; but rope is crashing
* remove unnecessary comments
* add vllm/hpu files
* add hpu autodetection
* Add HabanaAttention stub
* revert accidental changes
* revert non-habana backend attention changes
* add habana attention/worker/executor, sampling fails now
* Restore unnecessarily changed files
* enable HabanaMemoryProfiler
* Make sampler pass
* restore habana fused rope
* prefill is now working!!!
* fix prefill padding; decode is now working!!!!!
* revert accidental changes
* remove unused stuff in habana_paged_attn.py
* remove diagnostic stuff from llm_engine.py
* use HabanaExecutorAsync in async_llm_engine.py
* add habana copyright headers to habana_*.py files
* fix prefill attention conformance
* minor naming fixes
* remove naive attention from habana_attn (it never worked anyway)
* re-enable profile run
* Add fake HPUGraph support
* add more metrics
* indentation fix
* ~~recipe cache metrics don't work lalalala~~
* i'm done with metrics for now
* fix corner case in which hl-smi is not available but synapse is
* FIXME: temporary setup.py workaround
* WIP: add tensor parallelism stubs
* habana worker cleanup
* tensor parallelism is now working
* remove unused files
* remove unused func
* add hpugraphrunner
* improve hpu layernorm
* Port pipelined PA
* Port context length bucketing
* remove cudagraphrunner from hpu runner
* restore HPUGraphRunner back from FakeHPUGraphRunner
* handle rotary embeddings properly on gaudi3
* oopsie! captured_block_counts was incorrect!
* captured_block_counts.append doesn't do anything
* Restore habana_main KV cache memory layout
* fix memory profiler
* overhaul hpugraph capture
* memory profiling overhaul
* format memory properly in model warmup
* add graph compilation profiler for graph capture phase
* roll back log lvl on graph capture message
* Remove unnecessary view on residual connection in RMSNorm (#25)

---------

Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>
prarit pushed a commit to prarit/vllm that referenced this pull request on Oct 18, 2024
…refactor Dockerfile improvements: multistage
Bringing in the latest changes from IBM/main