Synchronise with IBM/vllm:main #20

Merged
z103cb merged 4 commits into opendatahub-io:ibm_main from ibm_main_synch05092024 on May 9, 2024


@z103cb z103cb commented May 9, 2024

Bringing in the latest changes from IBM/main

tjohnson31415 and others added 3 commits May 9, 2024 14:01
`format.sh` now has mypy checks after pulling in upstream changes. This
PR makes the mypy suggested modifications to our code.

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
…ubi (opendatahub-io#23)

Changes:
- vLLM v0.4.2 was published today, update our build to use pre-built
libs from their wheel
- bump other dependencies in the image build (base UBI image, miniforge,
flash attention, grpcio-tools, accelerate)
- little cleanup to remove `PYTORCH_` args that are no longer used

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
@z103cb z103cb force-pushed the ibm_main_synch05092024 branch from 6d951fd to c810eb8 Compare May 9, 2024 11:05
@z103cb z103cb changed the title Synchronise with IBM/vllm:main o Synchronise with IBM/vllm:main May 9, 2024
@z103cb z103cb force-pushed the ibm_main_synch05092024 branch from c810eb8 to d173ce7 Compare May 9, 2024 12:21
@z103cb z103cb enabled auto-merge (rebase) May 9, 2024 13:26
@z103cb z103cb requested a review from dtrifiro May 9, 2024 13:27

openshift-ci bot commented May 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtrifiro, z103cb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@z103cb z103cb merged commit 9543d0b into opendatahub-io:ibm_main May 9, 2024
2 of 3 checks passed
@z103cb z103cb deleted the ibm_main_synch05092024 branch May 9, 2024 13:50
Xaenalt pushed a commit that referenced this pull request Sep 18, 2024
* Fix setup.py for HPU

* Fix vllm._C import ops -> vllm.hpu import ops

* more of the same thing

* re-add hpex rmsnorm and rope; but rope is crashing

* remove unnecessary comments

* add vllm/hpu files

* add hpu autodetection

* Add HabanaAttention stub

* revert accidental changes

* revert non-habana backend attention changes

* add habana attention/worker/executor, sampling fails now

* Restore unnecessarily changed files

* enable HabanaMemoryProfiler

* Make sampler pass

* restore habana fused rope

* prefill is now working!!!

* fix prefill padding; decode is now working!!!!!

* revert accidental changes

* remove unused stuff in habana_paged_attn.py

* remove diagnostic stuff from llm_engine.py

* use HabanaExecutorAsync in async_llm_engine.py

* add habana copyright headers to habana_*.py files

* fix prefill attention conformance

* minor naming fixes

* remove naive attention from habana_attn (it never worked anyway)

* re-enable profile run

* Add fake HPUGraph support

* add more metrics

* indentation fix

* ~~recipe cache metrics don't work lalalala~~

* i'm done with metrics for now

* fix corner case in which hl-smi is not available but synapse is

* FIXME: temporary setup.py workaround

* WIP: add tensor parallelism stubs

* habana worker cleanup

* tensor parallelism is now working

* remove unused files

* remove unused func

* add hpugraphrunner

* improve hpu layernorm

* Port pipelined PA

* Port context length bucketing

* remove cudagraphrunner from hpu runner

* restore HPUGraphRunner back from FakeHPUGraphRunner

* handle rotary embeddings properly on gaudi3

* oopsie! captured_block_counts was incorrect!

* captured_block_counts.append doesn't do anything

* Restore habana_main KV cache memory layout

* fix memory profiler

* overhaul hpugraph capture

* memory profiling overhaul

* format memory properly in model warmup

* add graph compilation profiler for graph capture phase

* roll back log lvl on graph capture message

* Remove unnecessary view on residual connection in RMSNorm (#25)

---------

Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>
prarit pushed a commit to prarit/vllm that referenced this pull request Oct 18, 2024
…refactor

Dockerfile improvements: multistage

4 participants