Sync release with main for RHOAI 2.12 #110

Merged on Jul 26, 2024 (393 commits)

Commits
ae96ef8
[VLM] Calculate maximum number of multi-modal tokens by model (#6121)
DarkLight1337 Jul 4, 2024
a41357e
[VLM] Improve consistency between feature size calculation and dummy …
ywang96 Jul 5, 2024
ea4b570
[VLM] Cleanup validation and update docs (#6149)
DarkLight1337 Jul 5, 2024
0097bb1
[Bugfix] Use templated datasource in grafana.json to allow automatic …
frittentheke Jul 5, 2024
f1e15da
[Frontend] Continuous usage stats in OpenAI completion API (#5742)
jvlunteren Jul 5, 2024
e58294d
[Bugfix] Add verbose error if scipy is missing for blocksparse attent…
JGSweets Jul 5, 2024
abad574
bump version to v0.5.1 (#6157)
simon-mo Jul 5, 2024
79d406e
[Docs] Fix readthedocs for tag build (#6158)
simon-mo Jul 5, 2024
2de490d
Update wheel builds to strip debug (#6161)
simon-mo Jul 5, 2024
f025062
Fix release wheel build env var (#6162)
simon-mo Jul 5, 2024
bc96d5c
Move release wheel env var to Dockerfile instead (#6163)
simon-mo Jul 6, 2024
175c43e
[Doc] Reorganize Supported Models by Type (#6167)
ywang96 Jul 6, 2024
9389380
[Doc] Move guide for multimodal model and other improvements (#6168)
DarkLight1337 Jul 6, 2024
6206dcb
[Model] Add PaliGemma (#5189)
ywang96 Jul 7, 2024
333306a
add benchmark for fix length input and output (#5857)
haichuan1221 Jul 7, 2024
abfe705
[ Misc ] Support Fp8 via `llm-compressor` (#6110)
robertgshaw2-redhat Jul 7, 2024
3b08fe2
[misc][frontend] log all available endpoints (#6195)
youkaichao Jul 7, 2024
16620f4
do not exclude `object` field in CompletionStreamResponse (#6196)
kczimm Jul 8, 2024
717f4bc
Feature/add benchmark testing (#5947)
haichuan1221 Jul 8, 2024
f7a8fa3
[Kernel] reloading fused_moe config on the last chunk (#6210)
avshalomman Jul 8, 2024
543aa48
[Kernel] Correctly invoke prefill & decode kernels for cross-attentio…
afeldman-nm Jul 8, 2024
185ad31
[Bugfix] use diskcache in outlines _get_guide #5436 (#6203)
ericperfect Jul 8, 2024
ddc369f
[Bugfix] Mamba cache Cuda Graph padding (#6214)
tomeras91 Jul 8, 2024
4f0e0ea
Add FlashInfer to default Dockerfile (#6172)
simon-mo Jul 8, 2024
a3c9435
[hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_dev…
youkaichao Jul 9, 2024
70c232f
[core][distributed] fix ray worker rank assignment (#6235)
youkaichao Jul 9, 2024
5d5b4c5
[Bugfix][TPU] Add missing None to model input (#6245)
WoosukKwon Jul 9, 2024
08c5bde
[Bugfix][TPU] Fix outlines installation in TPU Dockerfile (#6256)
WoosukKwon Jul 9, 2024
a0550cb
Add support for multi-node on CI (#5955)
khluu Jul 9, 2024
4d6ada9
[CORE] Adding support for insertion of soft-tuned prompts (#4645)
SwapnilDreams100 Jul 9, 2024
673dd4c
[Docs] Docs update for Pipeline Parallel (#6222)
andoorve Jul 9, 2024
d3a2451
[Bugfix]fix and needs_scalar_to_array logic check (#6238)
qibaoyuan Jul 9, 2024
2416b26
[Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978)
abhigoyal1997 Jul 10, 2024
da78cae
[core][distributed] zmq fallback for broadcasting large objects (#6183)
youkaichao Jul 10, 2024
5ed3505
[Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279)
WoosukKwon Jul 10, 2024
8a924d2
[Doc] Guide for adding multi-modal plugins (#6205)
DarkLight1337 Jul 10, 2024
e72ae80
[Bugfix] Support 2D input shape in MoE layer (#6287)
WoosukKwon Jul 10, 2024
c38eba3
[Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case.…
tdoublep Jul 10, 2024
b422d49
[CI/Build] Enable mypy typing for remaining folders (#6268)
Jul 10, 2024
44cc766
[Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296)
park12sj Jul 10, 2024
ae151d7
[Speculative Decoding] Enabling bonus token in speculative decoding f…
sroy745 Jul 10, 2024
997df46
[Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313)
WoosukKwon Jul 10, 2024
99ded1e
[Doc] Remove comments incorrectly copied from another project (#6286)
daquexian Jul 11, 2024
439c845
[Doc] Update description of vLLM support for CPUs (#6003)
DamonFool Jul 11, 2024
fc17110
[BugFix]: set outlines pkg version (#6262)
xiangyang-95 Jul 11, 2024
c4774eb
[Bugfix] Fix snapshot download in serving benchmark (#6318)
ywang96 Jul 11, 2024
3963a53
[Misc] refactor(config): clean up unused code (#6320)
aniaan Jul 11, 2024
546b101
[BugFix]: fix engine timeout due to request abort (#6255)
pushan01 Jul 11, 2024
8a1415c
[Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_mo…
tdoublep Jul 11, 2024
55f692b
[BugFix] get_and_reset only when scheduler outputs are not empty (#6266)
mzusman Jul 11, 2024
b675069
[ Misc ] Refactor Marlin Python Utilities (#6082)
robertgshaw2-redhat Jul 11, 2024
52b7fcb
Benchmark: add H100 suite (#6047)
simon-mo Jul 11, 2024
1df43de
[bug fix] Fix llava next feature size calculation. (#6339)
xwjiang2010 Jul 11, 2024
2d23b42
[doc] update pipeline parallel in readme (#6347)
youkaichao Jul 11, 2024
a4feba9
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeplo…
KuntaiDu Jul 11, 2024
7ed6a4f
[ BugFix ] Prompt Logprobs Detokenization (#6223)
robertgshaw2-redhat Jul 11, 2024
d6ab528
[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)
LiuXiaoxuanPKU Jul 12, 2024
2b0fb53
[distributed][misc] be consistent with pytorch for libcudart.so (#6346)
youkaichao Jul 12, 2024
adf32e0
[Bugfix] Fix usage stats logging exception warning with OpenVINO (#6349)
helena-intel Jul 12, 2024
d59eb98
[Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343)
mgoin Jul 12, 2024
d26a8b3
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350)
adityagoel14 Jul 12, 2024
b6c16cf
[ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352)
hongxiayang Jul 12, 2024
6047187
[ Misc ] Remove separate bias add (#6353)
robertgshaw2-redhat Jul 12, 2024
f7160d9
[Misc][Bugfix] Update transformers for tokenizer issue (#6364)
ywang96 Jul 12, 2024
aea19f0
[ Misc ] Support Models With Bias in `compressed-tensors` integration…
robertgshaw2-redhat Jul 12, 2024
024ad87
[Bugfix] Fix dtype mismatch in PaliGemma (#6367)
DarkLight1337 Jul 12, 2024
f9d25c2
[Build/CI] Checking/Waiting for the GPU's clean state (#6379)
Alexei-V-Ivanov-AMD Jul 12, 2024
b039cbb
[Misc] add fixture to guided processor tests (#6341)
kevinbu233 Jul 12, 2024
b75bce1
[ci] Add grouped tests & mark tests to run by default for fastcheck p…
khluu Jul 12, 2024
4dbebd0
[ci] Add GHA workflows to enable full CI run (#6381)
khluu Jul 12, 2024
aa48e50
[MISC] Upgrade dependency to PyTorch 2.3.1 (#5327)
comaniac Jul 12, 2024
d719ba2
Build some nightly wheels by default (#6380)
simon-mo Jul 12, 2024
bb1a784
Fix release-pipeline.yaml (#6388)
simon-mo Jul 12, 2024
07b35af
Fix interpolation in release pipeline (#6389)
simon-mo Jul 12, 2024
21b2dce
Fix release pipeline's -e flag (#6390)
simon-mo Jul 12, 2024
75f64d8
[Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382)
comaniac Jul 12, 2024
111fc6e
[Misc] Add generated git commit hash as `vllm.__commit__` (#6386)
mgoin Jul 12, 2024
6bc9710
Fix release pipeline's dir permission (#6391)
simon-mo Jul 12, 2024
f8f9ff5
[Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397)
WoosukKwon Jul 12, 2024
16ff6bd
[ci] Fix wording for GH bot (#6398)
khluu Jul 12, 2024
a27f87d
[Doc] Fix Typo in Doc (#6392)
esaliya Jul 13, 2024
e1684a7
[Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373)
tdoublep Jul 13, 2024
d80aef3
[Docs] Clean up latest news (#6401)
WoosukKwon Jul 13, 2024
41708e5
[ci] try to add multi-node tests (#6280)
youkaichao Jul 13, 2024
9da4aad
Updating LM Format Enforcer version to v10.3 (#6411)
noamgat Jul 13, 2024
babf52d
[ Misc ] More Cleanup of Marlin (#6359)
robertgshaw2-redhat Jul 13, 2024
eeceada
[Misc] Add deprecation warning for beam search (#6402)
WoosukKwon Jul 13, 2024
fb6af8b
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
robertgshaw2-redhat Jul 14, 2024
540c036
[Model] Initialize Fuyu-8B support (#3924)
Isotr0py Jul 14, 2024
6ef3bf9
Remove unnecessary trailing period in spec_decode.rst (#6405)
terrytangyuan Jul 14, 2024
9dad5cc
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
tlrmchlsmth Jul 14, 2024
ccd3c04
[ci][build] fix commit id (#6420)
youkaichao Jul 14, 2024
73030b7
[ Misc ] Enable Quantizing All Layers of DeekSeekv2 (#6423)
robertgshaw2-redhat Jul 14, 2024
dbfe254
[Feature] vLLM CLI (#5090)
EthanqX Jul 14, 2024
61e85db
[Doc] xpu backend requires running setvars.sh (#6393)
rscohn2 Jul 15, 2024
a754dc2
[CI/Build] Cross python wheel (#6394)
robertgshaw2-redhat Jul 15, 2024
ccb20db
[Bugfix] Benchmark serving script used global parameter 'args' in fun…
lxline Jul 15, 2024
32c9d7f
Report usage for beam search (#6404)
simon-mo Jul 15, 2024
9bfece8
Add FUNDING.yml (#6435)
simon-mo Jul 15, 2024
b47008b
[BugFix] BatchResponseData body should be optional (#6345)
zifeitong Jul 15, 2024
44874a0
[Doc] add env docs for flashinfer backend (#6437)
DefTruth Jul 15, 2024
69672f1
[core][distributed] simplify code to support pipeline parallel (#6406)
youkaichao Jul 15, 2024
de19916
[Bugfix] Convert image to RGB by default (#6430)
DarkLight1337 Jul 15, 2024
22e79ee
[doc][misc] doc update (#6439)
youkaichao Jul 15, 2024
6ae1597
[VLM] Minor space optimization for `ClipVisionModel` (#6436)
ywang96 Jul 15, 2024
94b82e8
[doc][distributed] add suggestion for distributed inference (#6418)
youkaichao Jul 15, 2024
c8fd97f
[Kernel] Use CUTLASS kernels for the FP8 layers with Bias (#6270)
tlrmchlsmth Jul 15, 2024
a63a4c6
[Misc] Use 0.0.9 version for flashinfer (#6447)
Pernekhan Jul 15, 2024
eaec4b9
[Bugfix] Add custom Triton cache manager to resolve MoE MP issue (#6…
tdoublep Jul 15, 2024
4ef95b0
[Bugfix] use float32 precision in samplers/test_logprobs.py for compa…
tdoublep Jul 15, 2024
64fdc08
bump version to v0.5.2 (#6433)
simon-mo Jul 15, 2024
4cf256a
[misc][distributed] fix pp missing layer condition (#6446)
youkaichao Jul 15, 2024
3dee97b
[Docs] Add Google Cloud to sponsor list (#6450)
WoosukKwon Jul 15, 2024
ec9933f
[Misc] Add CustomOp Interface to UnquantizedFusedMoEMethod (#6289)
WoosukKwon Jul 15, 2024
4552e37
[CI/Build][TPU] Add TPU CI test (#6277)
WoosukKwon Jul 15, 2024
d6f3b3d
Pin sphinx-argparse version (#6453)
khluu Jul 16, 2024
9ad32da
[BugFix][Model] Jamba - Handle aborted requests, Add tests and fix cl…
mzusman Jul 16, 2024
d92b3c5
[Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#…
g-eoj Jul 16, 2024
37d7766
[Docs] Announce 5th meetup (#6458)
WoosukKwon Jul 16, 2024
d970115
[CI/Build] vLLM cache directory for images (#6444)
DarkLight1337 Jul 16, 2024
7a3d2a5
[Frontend] Support for chat completions input in the tokenize endpoin…
sasha0552 Jul 16, 2024
7508a3d
[Misc] Fix typos in spec. decode metrics logging. (#6470)
tdoublep Jul 16, 2024
2bb0489
[Core] Use numpy to speed up padded token processing (#6442)
peng1999 Jul 16, 2024
38ef948
[CI/Build] Remove "boardwalk" image asset (#6460)
DarkLight1337 Jul 16, 2024
9f4ccec
[doc][misc] remind to cancel debugging environment variables (#6481)
youkaichao Jul 16, 2024
c467dff
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457)
WoosukKwon Jul 16, 2024
94162be
[Doc] Fix the lora adapter path in server startup script (#6230)
Jeffwan Jul 16, 2024
160e1d8
[Misc] Log spec decode metrics (#6454)
comaniac Jul 16, 2024
978aed5
[Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and …
mgoin Jul 16, 2024
09c2eb8
[ci][distributed] add pipeline parallel correctness test (#6410)
youkaichao Jul 16, 2024
7f62077
[misc][distributed] improve tests (#6488)
youkaichao Jul 17, 2024
ce37be7
[misc][distributed] add seed to dummy weights (#6491)
youkaichao Jul 17, 2024
1d094fd
[Distributed][PP] only create embedding & lm head when necessary (#6455)
wushidonguc Jul 17, 2024
1038388
[ROCm] Cleanup Dockerfile and remove outdated patch (#6482)
hongxiayang Jul 17, 2024
a19e8d3
[Misc][Speculative decoding] Typos and typing fixes (#6467)
ShangmingCai Jul 17, 2024
5bf35a9
[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431)
DarkLight1337 Jul 17, 2024
5fa6e98
[Bugfix] Fix for multinode crash on 4 PP (#6495)
andoorve Jul 17, 2024
e09ce75
[TPU] Remove multi-modal args in TPU backend (#6504)
WoosukKwon Jul 17, 2024
a9a2e74
[Misc] Use `torch.Tensor` for type annotation (#6505)
WoosukKwon Jul 17, 2024
2fa4623
[Core] Refactor _prepare_model_input_tensors - take 2 (#6164)
comaniac Jul 17, 2024
a38524f
[DOC] - Add docker image to Cerebrium Integration (#6510)
milo157 Jul 17, 2024
5f0b993
[Bugfix] Fix Ray Metrics API usage (#6354)
Yard1 Jul 17, 2024
e76466d
[Core] draft_model_runner: Implement prepare_inputs on GPU for advanc…
alexm-redhat Jul 17, 2024
b5241e4
[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511)
varun-sundar-rabindranath Jul 18, 2024
b5af8c2
[Model] Pipeline parallel support for Mixtral (#6516)
comaniac Jul 18, 2024
18fecc3
[ Kernel ] Fp8 Channelwise Weight Support (#6487)
robertgshaw2-redhat Jul 18, 2024
1c27d25
[core][model] yet another cpu offload implementation (#6496)
youkaichao Jul 18, 2024
d25877d
[BugFix] Avoid secondary error in ShmRingBuffer destructor (#6530)
njhill Jul 18, 2024
61e5927
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
ruisearch42 Jul 18, 2024
8a74c68
[Misc] Minor patch for draft model runner (#6523)
comaniac Jul 18, 2024
e2fbaee
[BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227)
njhill Jul 18, 2024
c8a7d51
[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Ge…
noamgat Jul 18, 2024
4634c87
[TPU] Refactor TPU worker & model runner (#6506)
WoosukKwon Jul 18, 2024
58ca663
[ Misc ] Improve Min Capability Checking in `compressed-tensors` (#6522)
robertgshaw2-redhat Jul 18, 2024
ecdb462
[ci] Reword Github bot comment (#6534)
khluu Jul 18, 2024
15c6a07
[Model] Support Mistral-Nemo (#6548)
mgoin Jul 18, 2024
2d4733b
Fix PR comment bot (#6554)
khluu Jul 18, 2024
f53b8f0
[ci][test] add correctness test for cpu offloading (#6549)
youkaichao Jul 18, 2024
4ffffcc
[Kernel] Implement fallback for FP8 channelwise using torch._scaled_m…
tlrmchlsmth Jul 18, 2024
1689219
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517)
tlrmchlsmth Jul 19, 2024
c5df56f
Add support for a rope extension method (#6553)
simon-mo Jul 19, 2024
b5672a1
[Core] Multiprocessing Pipeline Parallel support (#6130)
njhill Jul 19, 2024
d4201e0
[Bugfix] Make spec. decode respect per-request seed. (#6034)
tdoublep Jul 19, 2024
dbe5588
[ Misc ] non-uniform quantization via `compressed-tensors` for `Llama…
robertgshaw2-redhat Jul 19, 2024
6366efc
[Bugfix][Frontend] Fix missing `/metrics` endpoint (#6463)
DarkLight1337 Jul 19, 2024
a921e86
[BUGFIX] Raise an error for no draft token case when draft_tp>1 (#6369)
wooyeonlee0 Jul 19, 2024
a5314e8
[Model] RowParallelLinear: pass bias to quant_method.apply (#6327)
tdoublep Jul 19, 2024
51f8aa9
[Bugfix][Frontend] remove duplicate init logger (#6581)
dtrifiro Jul 19, 2024
9ed82e7
[Misc] Small perf improvements (#6520)
Yard1 Jul 19, 2024
30efe41
[Docs] Update docs for wheel location (#6580)
simon-mo Jul 19, 2024
f0bbfaf
[Bugfix] [SpecDecode] AsyncMetricsCollector: update time since last c…
tdoublep Jul 19, 2024
07eb6f1
[bugfix][distributed] fix multi-node bug for shared memory (#6597)
youkaichao Jul 19, 2024
4cc24f0
[ Kernel ] Enable Dynamic Per Token `fp8` (#6547)
robertgshaw2-redhat Jul 19, 2024
45ceb85
[Docs] Update PP docs (#6598)
andoorve Jul 19, 2024
e81522e
[build] add ib in image for out-of-the-box infiniband support (#6599)
youkaichao Jul 20, 2024
2e26564
[ Kernel ] FP8 Dynamic Per Token Quant - Add scale_ub (#6593)
varun-sundar-rabindranath Jul 20, 2024
7bd8200
[Core] Allow specifying custom Executor (#6557)
Yard1 Jul 20, 2024
3f8d42c
Pipeline Parallel: Guard for KeyErrors at request abort (#6587)
tjohnson31415 Jul 20, 2024
9042d68
[Misc] Consolidate and optimize logic for building padded tensors (#6…
DarkLight1337 Jul 20, 2024
683e3cb
[ Misc ] `fbgemm` checkpoints (#6559)
robertgshaw2-redhat Jul 20, 2024
06d6c5f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update…
mawong-amd Jul 20, 2024
9364f74
[ Kernel ] Enable `fp8-marlin` for `fbgemm-fp8` models (#6606)
robertgshaw2-redhat Jul 20, 2024
f952bbc
[Misc] Fix input_scale typing in w8a8_utils.py (#6579)
mgoin Jul 20, 2024
082ecd8
[ Bugfix ] Fix AutoFP8 fp8 marlin (#6609)
robertgshaw2-redhat Jul 20, 2024
d7f4178
[Frontend] Move chat utils (#6602)
DarkLight1337 Jul 21, 2024
14f91fe
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding…
sroy745 Jul 21, 2024
b6df37f
[Misc] Remove abused noqa (#6619)
WoosukKwon Jul 21, 2024
25e778a
[Model] Refactor and decouple phi3v image embedding (#6621)
Isotr0py Jul 21, 2024
396d92d
[Kernel][Core] Add AWQ support to the Marlin kernel (#6612)
alexm-redhat Jul 21, 2024
c9eef37
[Model] Initial Support for Chameleon (#5770)
ywang96 Jul 22, 2024
42de2ce
[Misc] Add a wrapper for torch.inference_mode (#6618)
WoosukKwon Jul 22, 2024
89c1c6a
[Bugfix] Fix `vocab_size` field access in `llava_next.py` (#6624)
jaywonchung Jul 22, 2024
739b61a
[Frontend] Refactor prompt processing (#4028)
DarkLight1337 Jul 22, 2024
fea59c7
[Bugfix][Kernel] Use int64_t for indices in fp8 quant kernels (#6649)
tlrmchlsmth Jul 22, 2024
69d5ae3
[ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656)
khluu Jul 22, 2024
42c7f66
[Core] Support dynamically loading Lora adapter from HuggingFace (#6234)
Jeffwan Jul 22, 2024
5a96ee5
[ci][build] add back vim in docker (#6661)
youkaichao Jul 22, 2024
bdf5fd1
[Misc] Remove deprecation warning for beam search (#6659)
WoosukKwon Jul 23, 2024
e0c1575
[Core] Modulize prepare input and attention metadata builder (#6596)
comaniac Jul 23, 2024
c5e8330
[Bugfix] Fix null `modules_to_not_convert` in FBGEMM Fp8 quantizatio…
cli99 Jul 23, 2024
729171a
[Misc] Enable chunked prefill by default for long context models (#6666)
WoosukKwon Jul 23, 2024
7c2749a
[misc] add start loading models for users information (#6670)
youkaichao Jul 23, 2024
e519ae0
add tqdm when loading checkpoint shards (#6569)
zhaotyer Jul 23, 2024
9e0b558
[Misc] Support FP8 kv cache scales from compressed-tensors (#6528)
mgoin Jul 23, 2024
c051bfe
[doc][distributed] doc for setting up multi-node environment (#6529)
youkaichao Jul 23, 2024
97234be
[Misc] Manage HTTP connections in one place (#6600)
DarkLight1337 Jul 23, 2024
c520124
[misc] only tqdm for first rank (#6672)
youkaichao Jul 23, 2024
22fa2e3
[VLM][Model] Support image input for Chameleon (#6633)
ywang96 Jul 23, 2024
3eda4ec
support ignore patterns in model loader (#6673)
simon-mo Jul 23, 2024
bb2fc08
Bump version to v0.5.3 (#6674)
simon-mo Jul 23, 2024
cb1362a
[Docs] Announce llama3.1 support (#6688)
WoosukKwon Jul 23, 2024
71950af
[doc][distributed] fix doc argument order (#6691)
youkaichao Jul 23, 2024
461089a
[Bugfix] Fix a log error in chunked prefill (#6694)
WoosukKwon Jul 23, 2024
a112a84
[BugFix] Fix RoPE error in Llama 3.1 (#6693)
WoosukKwon Jul 23, 2024
38c4b7e
Bump version to 0.5.3.post1 (#6696)
simon-mo Jul 23, 2024
1668192
chore: add fork OWNERS
z103cb Apr 30, 2024
cc99216
add ubi Dockerfile
dtrifiro May 21, 2024
d15b373
Dockerfile.ubi: remove references to grpc/protos
dtrifiro May 21, 2024
bc7dccc
Dockerfile.ubi: use vllm-tgis-adapter
dtrifiro May 28, 2024
adc357d
gha: add sync workflow
dtrifiro Jun 3, 2024
e9a9553
Dockerfile.ubi: use distributed-executor-backend=mp as default
dtrifiro Jun 10, 2024
43c7876
Dockerfile.ubi: remove vllm-nccl workaround
dtrifiro Jun 13, 2024
2648a1f
Dockerfile.ubi: add missing requirements-*.txt bind mounts
dtrifiro Jun 18, 2024
510aa47
add triton CustomCacheManger
tdoublep May 29, 2024
910a985
gha: sync-with-upstream workflow create PRs as draft
dtrifiro Jun 19, 2024
3a99c2d
add smoke/unit tests scripts
dtrifiro Jun 19, 2024
f29efce
extras: exit unit tests on err
dtrifiro Jun 20, 2024
6efc7b0
Dockerfile.ubi: misc improvements
dtrifiro May 28, 2024
3bb9e9f
update OWNERS
dtrifiro Jun 21, 2024
88a0456
Dockerfile.ubi: use tensorizer (#64)
prashantgupta24 Jun 25, 2024
e15634d
Dockerfile.ubi: pin vllm-tgis-adapter to 0.1.2
dtrifiro Jun 26, 2024
b2fd1af
gha: fix fetch step in upstream sync workflow
dtrifiro Jul 2, 2024
fd4204b
gha: always update sync workflow PR body/title
dtrifiro Jul 2, 2024
8551e8f
Dockerfile.ubi: bump vllm-tgis-adapter to 0.1.3
dtrifiro Jul 3, 2024
5fe6a00
Dockerfile.ubi: get rid of --distributed-executor-backend=mp
dtrifiro Jul 10, 2024
f9ae74b
Dockerfile.ubi: add flashinfer
dtrifiro Jul 9, 2024
280bc9f
pin adapter to 2.0.0
prashantgupta24 Jul 12, 2024
b92b6d6
deps: bump flashinfer to 0.0.9
dtrifiro Jul 15, 2024
afd1436
Update OWNERS with IBM folks
heyselbi Jun 27, 2024
1a74d61
Dockerfile.ubi: bind mount .git dir to allow inclusion of git commit …
dtrifiro Jul 17, 2024
d05d51f
gha: remove reminder_comment
dtrifiro Jul 17, 2024
97cd508
Dockerfile: bump vllm-tgis-adapter to 0.2.1
dtrifiro Jul 18, 2024
08a7f70
fix: update setup.py to differentiate between fork and upstream
nathan-weinberg Jul 18, 2024
242ea7e
Dockerfile.ubi: properly mount .git dir
dtrifiro Jul 19, 2024
76aa5cf
Revert "[CI/Build] fix: update setup.py to differentiate between fork…
dtrifiro Jul 19, 2024
61207a7
Dockerfile.ubi: bump vllm-tgis-adapter to 0.2.2
dtrifiro Jul 19, 2024
3c182aa
gha: remove unused upstream workflows
dtrifiro Jul 23, 2024
d379e0a
deps: bump vllm-tgis-adapter to 0.2.3
dtrifiro Jul 24, 2024
7a21f52
Dockerfile.ubi: get rid of custom cache manager
dtrifiro Jul 24, 2024
b145c20
Merge branch 'release' into sync-release-with-main
dtrifiro Jul 26, 2024
Update OWNERS with IBM folks
heyselbi authored and dtrifiro committed Jul 23, 2024
commit afd1436ba87718fec1246716e99fc05191eec682
16 changes: 13 additions & 3 deletions OWNERS
@@ -1,17 +1,27 @@
  approvers:
  - dtrifiro
  - fialhocoelho
  - heyselbi
- - rpancham
+ - joerunde
+ - maxdebayser
+ - njhill
+ - prashantgupta24
+ - RH-steve-grubb
+ - rpancham
  - terrytangyuan
  - vaibhavjainwiz
+ - Xaenalt
  - z103cb
- - Xaenalt
  reviewers:
  - dtrifiro
  - fialhocoelho
  - heyselbi
- - rpancham
+ - joerunde
+ - maxdebayser
+ - njhill
+ - prashantgupta24
+ - RH-steve-grubb
+ - rpancham
  - terrytangyuan
  - vaibhavjainwiz
  - Xaenalt