Merge back #45

Closed — wants to merge 801 commits.

Commits (801)
0e97af4
Updating tokenizers. (#1517)
Narsil Feb 1, 2024
3ab578b
[docs] Fix link to Install CLI (#1526)
pcuenca Feb 2, 2024
0da00be
feat: add ie update to message docs (#1523)
drbh Feb 2, 2024
1734540
feat: use existing add_generation_prompt variable from config in temp…
drbh Feb 7, 2024
bd405e0
Impl simple mamba model (#1480)
drbh Feb 8, 2024
39af000
Update to peft 0.8.2 (#1537)
Stillerman Feb 8, 2024
09b7c26
feat(server): add frequency penalty (#1541)
OlivierDehaene Feb 8, 2024
c5ef81b
chore: bump ci rust version (#1543)
drbh Feb 9, 2024
a4e5801
ROCm AWQ support (#1514)
IlyasMoutawwakil Feb 9, 2024
5321463
feat(router): add max_batch_size (#1542)
OlivierDehaene Feb 9, 2024
0d794af
feat: experimental support for cuda graphs (#1428)
OlivierDehaene Feb 12, 2024
246ad39
feat: add deserialize_with that handles strings or objects with conte…
drbh Feb 13, 2024
6f68bb1
Fixing glibc version in the runtime. (#1556)
Narsil Feb 13, 2024
7671a41
Upgrade intermediary layer for nvidia too. (#1557)
Narsil Feb 13, 2024
d6b0fb9
Improving mamba runtime by using updates (#1552)
Narsil Feb 14, 2024
4c2848b
Small cleanup. (#1560)
Narsil Feb 14, 2024
cef0553
Outlines guided generation (#1539)
drbh Feb 15, 2024
c55abac
Added `name` field to OpenAI compatible API Messages (#1563)
amihalik Feb 15, 2024
142cdab
Bugfix: eos and bos tokens positions are inconsistent (#1567)
amihalik Feb 16, 2024
9946165
chore: add pre-commit (#1569)
OlivierDehaene Feb 16, 2024
0f2daad
feat: add chat template struct to avoid tuple ordering errors (#1570)
OlivierDehaene Feb 16, 2024
4139054
v1.4.1 (#1568)
OlivierDehaene Feb 16, 2024
d19c768
Fix mistral with length > window_size for long prefills (rotary doesn…
Narsil Feb 19, 2024
df23062
improve endpoint support (#1577)
drbh Feb 20, 2024
c9f4c1a
fix: refactor syntax to correctly include structs (#1580)
drbh Feb 20, 2024
fa8a8e0
fix(router): fix openapi and add jsonschema validation (#1578)
OlivierDehaene Feb 21, 2024
c86f58d
feat: add support for Gemma (#1583)
OlivierDehaene Feb 21, 2024
9c1cb81
v1.4.2 (#1585)
OlivierDehaene Feb 21, 2024
010508c
fix: fix openapi schema (#1586)
OlivierDehaene Feb 21, 2024
ac5a1c6
fix: avoid default message (#1579)
drbh Feb 22, 2024
bf700e7
Revamp medusa implementation so that every model can benefit. (#1588)
Narsil Feb 26, 2024
9b6db5f
Support tools (#1587)
drbh Feb 28, 2024
910d0a9
Fixing x-compute-time. (#1606)
Narsil Feb 28, 2024
97e2236
Fixing guidance docs. (#1607)
Narsil Feb 28, 2024
b40e833
feat: starcoder2 (#1605)
OlivierDehaene Feb 28, 2024
26cdea5
feat: Qwen2 (#1608)
OlivierDehaene Feb 28, 2024
e6bb3ff
v1.4.3 (#1609)
OlivierDehaene Feb 28, 2024
343aa7a
fix: Handle concurrent grammar requests (#1610)
drbh Feb 29, 2024
5a3903b
Fix idefics default. (#1614)
Narsil Feb 29, 2024
9ed4d2c
Fix async client timeout (#1617)
hugoabonizio Feb 29, 2024
3dd7da2
feat: accept legacy request format and response (#1527)
drbh Feb 29, 2024
7e08751
fix: add missing stop parameter for chat request (#1619)
drbh Mar 1, 2024
7dbaf9e
fix: correctly index into mask when applying grammar (#1618)
drbh Mar 1, 2024
d3711a6
Use a better model for the quick tour (#1639)
lewtun Mar 12, 2024
8a5bcba
Upgrade nix version from 0.27.1 to 0.28.0 (#1638)
yuanwu2017 Mar 12, 2024
0d9917f
Update peft + transformers + accelerate + bnb + safetensors (#1646)
abhishekkrthakur Mar 15, 2024
23fba67
Fix index in ChatCompletionChunk (#1648)
Wauplin Mar 16, 2024
0d72af5
Fixing minor typo in documentation: supported hardware section (#1632)
SachinVarghese Mar 18, 2024
dfbd9a3
feat: bump minijina and add test for core templates (#1626)
drbh Mar 20, 2024
6f15ac6
feat: support force downcast after FastRMSNorm multiply for Gemma (#1…
drbh Mar 21, 2024
4f09c80
fix: prefer spaces url over temp url (#1662)
drbh Mar 21, 2024
de6cb15
fix: improve tool type, bump pydantic and outlines (#1650)
drbh Mar 21, 2024
ed29d6e
Remove unecessary cuda graph. (#1664)
Narsil Mar 21, 2024
deb440b
Repair idefics integration tests. (#1663)
Narsil Mar 21, 2024
08e9181
feat: update client to 0.7 (#1667)
OlivierDehaene Mar 22, 2024
66914f7
fix: LlamaTokenizerFast to AutoTokenizer at flash_mistral.py (#1637)
SeongBeomLEE Mar 22, 2024
f171bdc
Inline images for multimodal models. (#1666)
Narsil Mar 22, 2024
1e9bcd9
feat: cohere (#1660)
OlivierDehaene Mar 22, 2024
6c4496a
v1.4.4 (#1668)
OlivierDehaene Mar 22, 2024
818aee3
fix: adjust logprob response logic (#1682)
drbh Mar 28, 2024
762dbf3
fix: handle batches with and without grammars (#1676)
drbh Mar 28, 2024
f04255c
feat: Add dbrx support (#1685)
OlivierDehaene Mar 29, 2024
4ee0a0c
v1.4.5 (#1686)
OlivierDehaene Mar 29, 2024
99874ea
Add cuda graphs sizes and make it default. (#1703)
Narsil Apr 4, 2024
c7e570e
Pickle conversion now requires `--trust-remote-code`. (#1704)
Narsil Apr 5, 2024
5062fda
Push users to streaming in the readme. (#1698)
Narsil Apr 5, 2024
f9958ee
Fixing cohere tokenizer. (#1697)
Narsil Apr 5, 2024
8dca3b0
Force weights_only (before fully breaking pickle files anyway). (#1710)
Narsil Apr 5, 2024
53c2c3d
Regenerate ld.so.cache (#1708)
oOraph Apr 8, 2024
ff42d33
Revert license to Apache 2.0 (#1714)
OlivierDehaene Apr 8, 2024
106d8ee
Automatic quantization config. (#1719)
Narsil Apr 9, 2024
4634b00
Adding Llava-Next (Llava 1.6) with full support. (#1709)
Narsil Apr 9, 2024
ad9d628
fix: fix CohereForAI/c4ai-command-r-plus (#1707)
OlivierDehaene Apr 10, 2024
30620a9
hotfix: mixtral
OlivierDehaene Apr 10, 2024
10d9083
Update libraries (#1713)
abhishekkrthakur Apr 11, 2024
b83aab9
Easier defaults for models stemmed from configs.
Narsil Apr 11, 2024
842f665
Revert "Easier defaults for models stemmed from configs."
Narsil Apr 11, 2024
c2fd35d
Dev/mask ldconfig output v2 (#1716)
oOraph Apr 11, 2024
408dbc4
Fp8 Support (#1726)
Narsil Apr 12, 2024
6c2c44b
Upgrade EETQ (Fixes the cuda graphs). (#1729)
Narsil Apr 12, 2024
c2c9872
fix(router): fix a possible deadlock in next_batch (#1731)
OlivierDehaene Apr 12, 2024
9d8f21c
chore(cargo-toml): apply lto fat and codegen-units of one (#1651)
somehowchris Apr 12, 2024
1b2670c
Improve the defaults for the launcher (#1727)
Narsil Apr 12, 2024
eefea5e
feat: medusa v2 (#1734)
OlivierDehaene Apr 12, 2024
275caa0
Fix typo in guidance.md (#1735)
eltociear Apr 12, 2024
c38a7d7
v2.0.0 (#1736)
OlivierDehaene Apr 12, 2024
88702d8
Fixing CI. (#1748)
Narsil Apr 15, 2024
7276d43
feat: improve tools to include name and add tests (#1693)
drbh Apr 16, 2024
00f3653
Update response type for `/v1/chat/completions` and `/v1/completions`…
Wauplin Apr 16, 2024
e4d31a4
fix: bump clients test base url to llama (#1751)
drbh Apr 16, 2024
06c3d4b
feat: accept list as prompt and use first string (#1702)
drbh Apr 17, 2024
f9ee2c4
Upgrading all versions. (#1759)
Narsil Apr 18, 2024
2d0a717
v2.0.1
OlivierDehaene Apr 18, 2024
26b3916
Make `--cuda-graphs` work as expected (bis) (#1768)
fxmarty Apr 22, 2024
ed72e92
fix typos in docs and add small clarifications (#1790)
MoritzLaurer Apr 22, 2024
455cada
Add attribute descriptions for `GenerateParameters` (#1798)
Wauplin Apr 23, 2024
9be1db3
feat: allow null eos and bos tokens in config (#1791)
drbh Apr 23, 2024
986b404
Phi3 support (#1797)
Narsil Apr 23, 2024
bfddfa5
Idefics2. (#1756)
Narsil Apr 23, 2024
23d82b8
fix: avoid frequency and repetition penalty on padding tokens (#1765)
drbh Apr 23, 2024
4c698fa
Adding support for `HF_HUB_OFFLINE` support in the router. (#1789)
Narsil Apr 23, 2024
0acac5c
feat: improve temperature logic in chat (#1749)
drbh Apr 25, 2024
fccf5ed
Updating the benchmarks so everyone uses openai compat layer. (#1800)
Narsil Apr 25, 2024
eb08b9f
Update guidance docs to reflect grammar support in API (#1775)
dr3s Apr 25, 2024
ee47973
Use the generation config. (#1808)
Narsil Apr 25, 2024
bbc547a
2nd round of benchmark modifications (tiny adjustements to avoid over…
Narsil Apr 26, 2024
f9cf345
Adding new env variables for TPU backends. (#1755)
Narsil Apr 26, 2024
45ecf9d
add intel xpu support for TGI (#1475)
sywangyi Apr 26, 2024
a8fd423
Blunder (#1815)
Narsil Apr 26, 2024
8b8e8f6
Fixing qwen2. (#1818)
Narsil Apr 26, 2024
e9f03f8
Dummy CI run. (#1817)
Narsil Apr 26, 2024
007d5e5
Changing the waiting_served_ratio default (stack more aggressively by…
Narsil Apr 28, 2024
eade737
Better graceful shutdown. (#1827)
Narsil Apr 29, 2024
f75c1a5
Prepare release.
Narsil Apr 30, 2024
51ee60d
Add the missing `tool_prompt` parameter to Python client (#1825)
maziyarpanahi Apr 30, 2024
04d4765
Small CI cleanup. (#1801)
Narsil Apr 30, 2024
743ecbc
Add reference to TPU support (#1760)
brandonroyal Apr 30, 2024
8332fc4
fix: use get_speculate to the number of layers (#1737)
OlivierDehaene Apr 30, 2024
f661508
feat: add how it works section (#1773)
drbh Apr 30, 2024
9192de5
Fixing frequency penalty (#1811)
martinigoyanes Apr 30, 2024
b2c9827
feat: add vlm docs and simple examples (#1812)
drbh Apr 30, 2024
c99ecd7
Handle images in chat api (#1828)
drbh Apr 30, 2024
b4ef038
chore: update torch (#1730)
OlivierDehaene Apr 30, 2024
dccab72
(chore): torch 2.3.0 (#1833)
Narsil Apr 30, 2024
6073ece
fix: split docs and start conceptual page (#1836)
drbh May 1, 2024
27b3a2c
Fix: "Fixing" double BOS for mistral too. (#1843)
Narsil May 1, 2024
0038e60
Adding scripts to prepare load data. (#1841)
Narsil May 1, 2024
de079d6
Remove misleading warning (not that important nowadays anyway). (#1848)
Narsil May 2, 2024
65539b7
feat: prefer huggingface_hub in docs and show image api (#1844)
drbh May 2, 2024
a257371
Updating Phi3 (long context). (#1849)
Narsil May 2, 2024
bb2b295
Add router name to /info endpoint (#1854)
Wauplin May 3, 2024
ac7076b
Upgrading to rust 1.78. (#1851)
Narsil May 6, 2024
59b3ffe
update xpu docker image and use public ipex whel (#1860)
sywangyi May 6, 2024
fd89d9d
Refactor layers. (#1866)
Narsil May 13, 2024
d348d2b
Granite support? (#1882)
Narsil May 13, 2024
3136f27
Add: Support for the Falcon2 11B architecture (#1886)
Nilabhra May 14, 2024
e3d7656
MLPSpeculator. (#1865)
Narsil May 14, 2024
33bc721
Fixing truncation. (#1890)
Narsil May 14, 2024
92f1338
Correct 'using guidance' link (#1892)
brandon-lockaby May 14, 2024
b5bc6e5
Add GPT-2 with flash attention (#1889)
danieldk May 15, 2024
a70b087
Removing accepted ids in the regular info logs, downgrade to debug. (…
Narsil May 15, 2024
a69ef52
feat: add deprecation warning to clients (#1855)
drbh May 15, 2024
6c715f8
[Bug Fix] Update torch import reference in bnb quantization (#1902)
DhruvSrikanth May 15, 2024
40213c9
Pali gemma modeling (#1895)
drbh May 16, 2024
d8402ea
OpenAI function calling compatible support (#1888)
phangiabao98 May 16, 2024
f5d4341
Fixing types. (#1906)
Narsil May 16, 2024
b3dd390
Types. (#1909)
Narsil May 16, 2024
3b5d93e
Fixing signals. (#1910)
Narsil May 16, 2024
a60fa84
Removing some unused code. (#1915)
Narsil May 17, 2024
232e8d5
MI300 compatibility (#1764)
fxmarty May 17, 2024
c4cf8b4
Add TGI monitoring guide through Grafana and Prometheus (#1908)
fxmarty May 17, 2024
422bf1f
Update grafana template (#1918)
fxmarty May 17, 2024
b5f1c9d
Fix TunableOp bug (#1920)
fxmarty May 17, 2024
5dad0c0
Fix TGI issues with ROCm (#1921)
fxmarty May 17, 2024
f871f11
Fixing the download strategy for ibm-fms (#1917)
Narsil May 18, 2024
293b812
ROCm: make CK FA2 default instead of Triton (#1924)
fxmarty May 20, 2024
904ff36
docs: Fix grafana dashboard url (#1925)
edwardzjl May 21, 2024
fc0eaff
feat: include token in client test like server tests (#1932)
drbh May 22, 2024
2f243a1
Creating doc automatically for supported models. (#1929)
Narsil May 22, 2024
efb73fc
fix: use path inside of speculator config (#1935)
drbh May 22, 2024
a103e3e
feat: add train medusa head tutorial (#1934)
drbh May 23, 2024
f41d644
reenable xpu for tgi (#1939)
sywangyi May 23, 2024
f4a073a
Fixing some legacy behavior (big swapout of serverless on legacy stuf…
Narsil May 23, 2024
629047c
Add completion route to client and add stop parameter where it's miss…
thomas-schillaci May 23, 2024
9546534
Improving the logging system. (#1938)
Narsil May 23, 2024
cff472b
Fixing codellama loads by using purely `AutoTokenizer`. (#1947)
Narsil May 24, 2024
d32e33b
Fix seeded output. (#1949)
Narsil May 24, 2024
9231098
Fix (flash) Gemma prefix and enable tests
danieldk May 24, 2024
a401c83
Fix GPTQ for models which do not have float16 at the default dtype (s…
danieldk May 27, 2024
0732b9d
Processor config chat template (#1954)
drbh May 27, 2024
b7ffa28
fix small typo and broken link (#1958)
MoritzLaurer May 27, 2024
e76b982
Upgrade to Axum 0.7 and Hyper 1.0 (Breaking change: disabled ngrok tu…
Narsil May 28, 2024
f20463e
Fix (non-container) pytest stdout buffering-related lock-up
danieldk May 28, 2024
612bc48
Fixing the text part from tokenizer endpoint. (#1967)
Narsil May 28, 2024
cbced7f
feat: adjust attn weight loading logic (#1975)
drbh May 29, 2024
36dd160
Add support for exl2 quantization
danieldk May 28, 2024
967ced2
Gemma GPTQ checks: skip logprob checks
danieldk May 30, 2024
659bd67
Update documentation version to 2.0.4 (#1980)
fxmarty May 31, 2024
06edde9
Purely refactors paged/attention into `layers/attention` and make har…
Narsil May 31, 2024
5ab4cef
Fixing exl2 scratch buffer. (#1990)
Narsil May 31, 2024
08b3eac
single char ` addition for docs (#1989)
nbroad1881 May 31, 2024
79402fb
Rest API to download lora adapter on router
tjluyao May 31, 2024
799a193
Fixing Phi3.
Narsil Jun 1, 2024
650c743
directly merge from tgi
rainj-me Jun 2, 2024
7243638
fix the lora-id parameter in the benchmark
rainj-me Jun 2, 2024
f125e73
Merge pull request #23 from mlsys-io/reorder-codebase
tjluyao Jun 2, 2024
40a70bc
Update README.md
tjluyao Jun 2, 2024
47f4685
add placeholder for flashinfer phi modeling (#24)
ag-flex-2024 Jun 2, 2024
e7fb9b9
integrate lora into mistral
PeterYaoNYU Jun 3, 2024
72d74cf
Update Makefile to include punica kernels
tjluyao Jun 3, 2024
e6af233
Integrate qwen2
NovTi Jun 3, 2024
80d4a60
Fix minor typos
NovTi Jun 4, 2024
48b5053
testing llama-3-70b-gptq
tjluyao Jun 4, 2024
5935cce
add lora functions to python client; test llama-3-70b AWQ
tjluyao Jun 5, 2024
482ef98
Add qwen2 1.8b and 72b base inference
NovTi Jun 5, 2024
7dda533
Support Flashinfer based Phi2 and Phi3 models (#26)
ag-flex-2024 Jun 8, 2024
3956e46
Refactor the Flashinfer models (#27)
ag-flex-2024 Jun 9, 2024
a814437
Introduce the flashinfer attention wrapper abstraction and use it for…
ag-flex-2024 Jun 10, 2024
d58a35e
Compliant for pre-commit configs
tjluyao Jun 10, 2024
4757af8
kv.run test workflow
tjluyao Jun 10, 2024
9ec483d
kv.run test workflows (#29)
tjluyao Jun 10, 2024
9dd3b75
Kv.run test workflows (#30)
tjluyao Jun 11, 2024
6c96fdd
Llama rewrite (#31)
ag-flex-2024 Jun 11, 2024
e0cd4a6
reformat the llama files (#32)
ag-flex-2024 Jun 11, 2024
b599cc6
Decouple flashinfer code paths from flash attention library dependenc…
ag-flex-2024 Jun 11, 2024
6010fad
critical output bug (#25)
tjluyao Jun 11, 2024
b7c8735
minor typo
tjluyao Jun 11, 2024
b821d68
bug fix in layers/__init__.py
tjluyao Jun 11, 2024
8ae802c
fix dtype bugs in flashinfer model def
tjluyao Jun 11, 2024
e61ea77
minor fixes and rename tests.xml
tjluyao Jun 13, 2024
c7613eb
test docker (#34)
tjluyao Jun 14, 2024
e263ba8
fix warm up issue
alfredgui2 Jun 14, 2024
a4802b7
docker build workflow; remove submodules (#35)
tjluyao Jun 14, 2024
e49f754
remove tgi build workflow
tjluyao Jun 14, 2024
e8f9ff4
docker workflow
tjluyao Jun 14, 2024
66d2723
docker workflow
tjluyao Jun 14, 2024
83fc271
build workflow update
tjluyao Jun 14, 2024
de58365
fix in workflow
tjluyao Jun 14, 2024
85f34cb
Merge pull request #36 from mlsys-io/fix_warm
alfredgui2 Jun 14, 2024
93edec5
dependency and rust toolchain fix
tjluyao Jun 14, 2024
fa2f2f2
Merge branch 'master' of github.com:mlsys-io/kv.run
tjluyao Jun 14, 2024
b8a4785
finalize docker build workflow
tjluyao Jun 14, 2024
868d3f2
minor router-server fix
tjluyao Jun 15, 2024
ad40a17
flash attn rotary
alfredgui2 Jun 16, 2024
6c4fa6e
minor fixes
tjluyao Jun 16, 2024
7a93d84
update to rust 1.79
tjluyao Jun 16, 2024
e0feabb
fix rotary bug
alfredgui2 Jun 17, 2024
da84f6b
fixes
alfredgui2 Jun 17, 2024
1e2bf10
fix the flashinfer adapter
alfredgui2 Jun 17, 2024
31ad6bd
merge master
alfredgui2 Jun 17, 2024
b45e896
fix phi2 and phi3 modeling
alfredgui2 Jun 17, 2024
7dfa57d
empty
alfredgui2 Jun 17, 2024
c51e36e
fix lint
alfredgui2 Jun 17, 2024
08fde0f
revert test file
alfredgui2 Jun 17, 2024
6aaab88
Merge pull request #38 from mlsys-io/flash_attn_rotary
alfredgui2 Jun 17, 2024
0ba0ac9
minor fix in output example
tjluyao Jun 17, 2024
4a40c64
Merge branch 'master' of github.com:mlsys-io/kv.run
tjluyao Jun 17, 2024
f0d3664
adjust the flashinfer llama model to accomodate baichuan
alfredgui2 Jun 20, 2024
9b3c098
Merge pull request #40 from mlsys-io/add_baichuan
alfredgui2 Jun 20, 2024
2311872
decouple flashinfer files from flash attention (#41)
alfredgui2 Jun 22, 2024
8d3dd48
Fix the server CLI issue with use_flashinfer flag (#42)
alfredgui2 Jun 24, 2024
fa213e2
update FlashinferAttentionWrapper to flashinfer 0.0.6
MichaelYuan2 Jun 25, 2024
9da076d
minor fix in makefile
tjluyao Jul 1, 2024
4edacd5
update submodules
tjluyao Jul 1, 2024
d099bbb
update submodules
tjluyao Jul 1, 2024
9fafffc
update mistral flashinfer
PeterYaoNYU Jul 1, 2024
b9838c5
Add ChatGLM and refactor Qwen2
NovTi Jul 1, 2024
466b0a6
Add the batch concatenation functionality for flashinfer server (#43)
alfredgui2 Jul 2, 2024
f355733
bug fixes
tjluyao Jul 6, 2024
6adf978
Fix the decoding logic in test_local_grpc.py (#44)
alfredgui2 Jul 6, 2024
Diff view
4 changes: 4 additions & 0 deletions .dockerignore
@@ -0,0 +1,4 @@
aml
target
server/transformers
server/flash-attention
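The four entries above keep heavy build artifacts and vendored sources out of the Docker build context. A minimal Python sketch of which paths they exclude — using `fnmatch` as a rough stand-in for Docker's matcher, which differs in edge cases:

```python
# Minimal sketch (not Docker's exact matcher): check candidate paths
# against the four .dockerignore entries added in this diff.
from fnmatch import fnmatch

IGNORE_PATTERNS = ["aml", "target", "server/transformers", "server/flash-attention"]

def is_ignored(path: str) -> bool:
    """Return True if `path` or one of its parent directories matches a pattern."""
    parts = path.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return any(fnmatch(prefix, pat) for prefix in prefixes for pat in IGNORE_PATTERNS)

for p in ["target/release/launcher", "server/flash-attention/setup.py", "router/src/main.rs"]:
    print(p, is_ignored(p))
```

Excluding `target` and the vendored `server/*` checkouts keeps image rebuilds from invalidating the cache whenever local build output changes.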
67 changes: 67 additions & 0 deletions .github/ISSUE_TEMPLATE/bug-report.yml
@@ -0,0 +1,67 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve text-generation-inference
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: |
        Please share your system info with us (`text-generation-launcher --env` if installed locally).
        The full command line used that causes issues:
        OS version:
        Rust version (if self-compiling, `cargo version`):
        Model being used (`curl 127.0.0.1:8080/info | jq`):
        If local model please explicit the kind of model and/or equivalents.
        Hardware used (GPUs, how many, on which cloud) (`nvidia-smi`):
        Deployment specificities (Kubernetes, EKS, AKS, any particular deployments):
        The current version being used:

      placeholder: text-generation-inference version, platform, python version, ...
    validations:
      required: true

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information
      description: 'The problem arises when using:'
      options:
        - label: "Docker"
        - label: "The CLI directly"

  - type: checkboxes
    id: information-tasks
    attributes:
      label: Tasks
      description: "The thing I am working on is:"
      options:
        - label: "An officially supported command"
        - label: "My own modifications"

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction
      description: |
        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
        If you have code snippets, error messages, stack traces please provide them here as well.
        Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
        Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.

      placeholder: |
        Steps to reproduce the behavior:

        1.
        2.
        3.

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior
      description: "A clear and concise description of what you would expect to happen."
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
blank_issues_enabled: true
version: 2.1
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/feature-request.yml
@@ -0,0 +1,31 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new text-generation-inference feature
labels: [ "feature" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request
      description: |
        A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.

  - type: textarea
    id: motivation
    validations:
      required: true
    attributes:
      label: Motivation
      description: |
        Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.

  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution
      description: |
        Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/huggingface/text-generation-inference/blob/main/CONTRIBUTING.md)
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/new-model-addition.yml
@@ -0,0 +1,31 @@
name: "\U0001F31F New model addition"
description: Submit a proposal/request to implement a new model
labels: [ "New model" ]

body:
  - type: textarea
    id: description-request
    validations:
      required: true
    attributes:
      label: Model description
      description: |
        Put any and all important information relative to the model

  - type: checkboxes
    id: information-tasks
    attributes:
      label: Open source status
      description: |
        Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `transformers`.
      options:
        - label: "The model implementation is available"
        - label: "The model weights are available"

  - type: textarea
    id: additional-info
    attributes:
      label: Provide useful links for the implementation
      description: |
        Please provide information regarding the implementation, the weights, and the authors.
        Please mention the authors by @gh-username if you're aware of their usernames.
40 changes: 40 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,40 @@
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet though.

Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.

Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.

Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @


@OlivierDehaene OR @Narsil

-->
20 changes: 20 additions & 0 deletions .github/workflows/autodocs.yml
@@ -0,0 +1,20 @@
name: Automatic Documentation for Launcher

on:
  pull_request:

jobs:
  update_docs:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Install Launcher
        id: install-launcher
        run: cargo install --path launcher/
      - name: Check launcher Docs are up-to-date
        run: |
          echo text-generation-launcher --help
          python update_doc.py --check
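The check step above relies on `update_doc.py` comparing the launcher's `--help` output with what is committed in the docs. A hedged sketch of that kind of comparison — the function name and normalization rules here are assumptions, not the real script:

```python
# Hypothetical sketch of a docs "--check" comparison like the one
# update_doc.py performs; function name and normalization are
# assumptions, not the actual implementation.
def docs_up_to_date(generated: str, committed: str) -> bool:
    """Compare generated help text with the committed docs section,
    ignoring trailing whitespace and surrounding blank lines."""
    def norm(s):
        return "\n".join(line.rstrip() for line in s.splitlines()).strip()
    return norm(generated) == norm(committed)

generated = "Usage: text-generation-launcher [OPTIONS]\n  --port <PORT>\n"
committed = "Usage: text-generation-launcher [OPTIONS]\n  --port <PORT>   \n"
print(docs_up_to_date(generated, committed))
```

Running such a check on every pull request fails CI whenever a launcher flag changes without a matching docs update.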
100 changes: 100 additions & 0 deletions .github/workflows/build-kvrun.yml
@@ -0,0 +1,100 @@
name: Build and push kv.run docker image

on:
  workflow_dispatch:

jobs:
  build-and-push-image:
    runs-on: [self-hosted, Linux, X64]

    concurrency:
      group: ${{ github.workflow }}-build-and-push-image-${{ github.head_ref || github.run_id }}
      cancel-in-progress: true

    permissions:
      contents: write
      packages: write
      # This is used to complete the identity challenge
      # with sigstore/fulcio when running outside of PRs.
      id-token: write
      security-events: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Initialize Docker Buildx
        uses: docker/setup-buildx-action@v2.0.0
        with:
          install: true
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4.4.1
      - name: Login to GitHub Container Registry
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4.3.0
        with:
          flavor: |
            latest=auto
          images: |
            ghcr.io/${{env.GITHUB_REPOSITORY}}
          tags: |
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=raw,value=latest,enable=${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) }}
            type=raw,value=sha-${{ env.GITHUB_SHA_SHORT }}
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v5
        with:
          context: .
          file: Dockerfile_kvrun
          push: true
          platforms: 'linux/amd64'
          build-args: |
            GIT_SHA=${{ env.GITHUB_SHA }}
            DOCKER_LABEL=sha-${{ env.GITHUB_SHA_SHORT }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          #cache-from: type=gha
          #cache-to: type=gha,mode=max

#  integration-tests:
#    runs-on: [self-hosted, Linux, X64]
#
#    concurrency:
#      group: ${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}
#      cancel-in-progress: true
#
#    needs:
#      - build-and-push-image # Wait for the docker image to be built
#
#    env:
#      DOCKER_VOLUME: /cache
#
#    steps:
#      - uses: actions/checkout@v2
#      - name: Inject slug/short variables
#        uses: rlespinasse/github-slug-action@v4.4.1
#      - name: Set up Python
#        uses: actions/setup-python@v4
#        with:
#          python-version: 3.10.14
#      - name: Prepare disks
#        run: |
#          sudo mkfs -t ext4 /dev/nvme1n1
#          sudo mkdir ${{ env.DOCKER_VOLUME }}
#          sudo mount /dev/nvme1n1 ${{ env.DOCKER_VOLUME }}
#      - name: Install
#        run: |
#          make install-integration-tests
#      - name: Run tests
#        run: |
#          export DOCKER_IMAGE=registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sha-${{ env.GITHUB_SHA_SHORT }}
#          export HUGGING_FACE_HUB_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }}
#          pytest -s -vv integration-tests
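The metadata-action step in this workflow derives image tags from the listed patterns. A rough Python illustration of the tags a push would receive under simplified semantics — the helper below is an assumption for clarity, not the action's actual implementation:

```python
# Rough illustration (simplified semantics, not docker/metadata-action
# itself) of the tag patterns declared in the workflow above.
def docker_tags(version, sha_short, on_default_branch):
    """Return the image tags produced for a given push."""
    tags = []
    if version:  # type=semver,pattern={{version}} and {{major}}.{{minor}}
        major, minor, _patch = version.split(".")
        tags += [version, f"{major}.{minor}"]
    if on_default_branch:  # type=raw,value=latest,enable=<default-branch check>
        tags.append("latest")
    tags.append(f"sha-{sha_short}")  # type=raw,value=sha-<short sha>
    return tags

print(docker_tags("2.0.1", "abc1234", True))
```

Since this workflow only runs on `workflow_dispatch`, the `sha-` tag is the one most runs actually produce; the semver patterns only apply when a version tag is present.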
20 changes: 20 additions & 0 deletions .github/workflows/build_documentation.yml
@@ -0,0 +1,20 @@
name: Build documentation

on:
  push:
    paths:
      - "docs/source/**"
    branches:
      - main
      - doc-builder*
      - v*-release

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
    with:
      commit_sha: ${{ github.sha }}
      package: text-generation-inference
      additional_args: --not_python_module
    secrets:
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
19 changes: 19 additions & 0 deletions .github/workflows/build_pr_documentation.yml
@@ -0,0 +1,19 @@
name: Build PR Documentation

on:
  pull_request:
    paths:
      - "docs/source/**"

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
    with:
      commit_sha: ${{ github.event.pull_request.head.sha }}
      pr_number: ${{ github.event.number }}
      package: text-generation-inference
      additional_args: --not_python_module
26 changes: 26 additions & 0 deletions .github/workflows/client-tests.yaml
@@ -0,0 +1,26 @@
name: Python Client Tests

on:
  pull_request:
    paths:
      - ".github/workflows/client-tests.yaml"
      - "clients/python/**"

jobs:
  run_tests:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v1
        with:
          python-version: 3.9
      - name: Install
        run: |
          cd clients/python && pip install .
      - name: Run tests
        run: |
          pip install pytest pytest-asyncio
          export HUGGING_FACE_HUB_TOKEN=${{ secrets.HUGGING_FACE_HUB_TOKEN }}
          make python-client-tests
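The workflow above installs the Python client and runs its pytest-asyncio suite with a hub token from secrets. A sketch of the kind of async test it runs, with a stub client standing in so the pattern is runnable without network access — the real suite uses `text_generation.AsyncClient` against a served model:

```python
# Shape of an async client test like those the workflow runs; the stub
# client below is illustrative only, not the real text_generation API.
import asyncio
import os

class FakeAsyncClient:
    """Stand-in for an async generation client (illustrative only)."""
    def __init__(self, base_url, headers=None):
        self.base_url = base_url
        self.headers = headers or {}

    async def generate(self, prompt, max_new_tokens=4):
        await asyncio.sleep(0)  # pretend to await an HTTP round-trip
        return {"generated_text": prompt + " ..."}

async def test_generate():
    # Mirrors the workflow: the token is read from the environment.
    token = os.environ.get("HUGGING_FACE_HUB_TOKEN", "")
    headers = {"Authorization": f"Bearer {token}"} if token else None
    client = FakeAsyncClient("http://localhost:8080", headers=headers)
    out = await client.generate("test", max_new_tokens=4)
    assert out["generated_text"].startswith("test")

asyncio.run(test_generate())
```

Under pytest-asyncio the `asyncio.run` call would be replaced by marking the coroutine with `@pytest.mark.asyncio` and letting the plugin drive the event loop.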