forked from vllm-project/vllm
Synchronise with IBM/vllm:main #20
Merged
+28 −30
Conversation
`format.sh` now has mypy checks after pulling in upstream changes. This PR makes the mypy-suggested modifications to our code.

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
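For context, the kind of modification mypy typically suggests is tightening annotations so they match runtime behavior. A minimal hypothetical sketch (not code from this PR) of the common implicit-`Optional` fix:

```python
from typing import Optional

# Before: mypy rejects this under no_implicit_optional, because the
# annotation says `str` but the default value is `None`:
#   def get_token(name: str = None) -> str: ...

# After: the annotation is widened to match the actual default.
def get_token(name: Optional[str] = None) -> str:
    # Fall back to a fixed value when no name is given.
    return name if name is not None else "default"
```

`get_token` and its fallback value are illustrative only; the PR's actual changes are whatever mypy flagged in the IBM fork's code.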
…ubi (opendatahub-io#23)

Changes:
- vLLM v0.4.2 was published today; update our build to use pre-built libs from their wheel
- bump other dependencies in the image build (base UBI image, miniforge, flash attention, grpcio-tools, accelerate)
- little cleanup to remove `PYTORCH_` args that are no longer used

---------

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Not supported by OpenShift CI (buildah); see containers/buildah#4325.
z103cb force-pushed the ibm_main_synch05092024 branch from 6d951fd to c810eb8 on May 9, 2024 at 11:05
z103cb changed the title from "Synchronise with IBM/vllm:main o" to "Synchronise with IBM/vllm:main" on May 9, 2024
z103cb commented on May 9, 2024
z103cb force-pushed the ibm_main_synch05092024 branch from c810eb8 to d173ce7 on May 9, 2024 at 12:21
dtrifiro approved these changes on May 9, 2024
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dtrifiro, z103cb. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Xaenalt pushed a commit that referenced this pull request on Sep 18, 2024
* Fix setup.py for HPU
* Fix vllm._C import ops -> vllm.hpu import ops
* more of the same thing
* re-add hpex rmsnorm and rope; but rope is crashing
* remove unnecessary comments
* add vllm/hpu files
* add hpu autodetection
* Add HabanaAttention stub
* revert accidental changes
* revert non-habana backend attention changes
* add habana attention/worker/executor, sampling fails now
* Restore unnecessarily changed files
* enable HabanaMemoryProfiler
* Make sampler pass
* restore habana fused rope
* prefill is now working!!!
* fix prefill padding; decode is now working!!!!!
* revert accidental changes
* remove unused stuff in habana_paged_attn.py
* remove diagnostic stuff from llm_engine.py
* use HabanaExecutorAsync in async_llm_engine.py
* add habana copyright headers to habana_*.py files
* fix prefill attention conformance
* minor naming fixes
* remove naive attention from habana_attn (it never worked anyway)
* re-enable profile run
* Add fake HPUGraph support
* add more metrics
* indentation fix
* ~~recipe cache metrics don't work lalalala~~
* i'm done with metrics for now
* fix corner case in which hl-smi is not available but synapse is
* FIXME: temporary setup.py workaround
* WIP: add tensor parallelism stubs
* habana worker cleanup
* tensor parallelism is now working
* remove unused files
* remove unused func
* add hpugraphrunner
* improve hpu layernorm
* Port pipelined PA
* Port context length bucketing
* remove cudagraphrunner from hpu runner
* restore HPUGraphRunner back from FakeHPUGraphRunner
* handle rotary embeddings properly on gaudi3
* oopsie! captured_block_counts was incorrect!
* captured_block_counts.append doesn't do anything
* Restore habana_main KV cache memory layout
* fix memory profiler
* overhaul hpugraph capture
* memory profiling overhaul
* format memory properly in model warmup
* add graph compilation profiler for graph capture phase
* roll back log lvl on graph capture message
* Remove unnecessary view on residual connection in RMSNorm (#25)

---------

Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>
prarit pushed a commit to prarit/vllm that referenced this pull request on Oct 18, 2024
…refactor Dockerfile improvements: multistage
Bringing in the latest changes from IBM/main