Pull requests: huggingface/text-generation-inference

Fix the tool_choice format for named choice by adapting OpenAIs scheme
#2634, opened Oct 10, 2024 by linusbierhoff

propagate caught TERM/INT/HUP signals from entrypoint to tgi child process
#2633, opened Oct 10, 2024 by oOraph

Make moe-kernels and marlin-kernels mandatory in CUDA installs
#2632, opened Oct 10, 2024 by danieldk (5 tasks)

feat: add basic test for the warmup step and memory allocation of the…
#2629, opened Oct 9, 2024 by drbh

[DOCS] Add Google Cloud TGI integration via dedicated DLCs
#2612, opened Oct 5, 2024 by alvarobartt (1 of 5 tasks)

feat: propagate max_concurrent_requests to queue state entries instead of hardcoded value in v2 and v3 backends
#2578, opened Sep 26, 2024 by Venkat2811

feat: enable pytorch xpu support for non-attention models
#2561, opened Sep 24, 2024 by dvrogozh

CI for add gptq and awq int4 support in intel platform
#2494, opened Sep 5, 2024 by ErikKaum

fix: skip cuda graphs that will oom and improve free memory logging
#2450, opened Aug 22, 2024 by drbh

add gptq and awq int4 support in intel platform
#2444, opened Aug 22, 2024 by sywangyi (5 tasks)

[TENSORRT-LLM] - Implement new looper thread based backend
#2357, opened Aug 2, 2024 by mfuntowicz (Draft)