triton-inference-server / fastertransformer_backend Public

Fork 134
Star 411

Issues: triton-inference-server/fastertransformer_backend

68 Open 47 Closed

Issues list

tritonserver version

#173 opened Nov 2, 2023 by double-vin

Does fastertransformer support gpt-2 classification models, such as GPT2ForSequenceClassification?

#171 opened Oct 19, 2023 by cabbagetalk

No response is received during inference in decoupled mode. [bug]

#169 opened Sep 26, 2023 by amazingkmy

What is the use of preprocessing & postprocessing? Can I start fastertransformer only for bloom model? [bug]

#168 opened Sep 22, 2023 by flyingjohn

the docs are not updated with the source code.

#167 opened Sep 22, 2023 by trinhtuanvubk

Failed to run on H100 GPU with tensor para=8

#166 opened Sep 15, 2023 by sfc-gh-zhwang

How to deploy multiple models on a node with multiple GPUs [bug]

#165 opened Sep 14, 2023 by jjjjohnson

Memory usage is doubled when loading a fp16 model into bf16 [bug]

#164 opened Sep 6, 2023 by skyser2003

Throughput (requests per second / RPS) not increasing when scaling up from 1 GPU to 4 GPUs

#163 opened Aug 22, 2023 by chunyat

Can I stop execution? (w/ decoupled mode) [bug]

#162 opened Aug 21, 2023 by Yeom

Do I need to specify ARG SM=80 when building the image manually?

#161 opened Aug 15, 2023 by sfc-gh-zhwang

is_return_log_probs is required for decoupled model?

#160 opened Aug 9, 2023 by flexwang

Can I enable streaming on an ensemble model?

#155 opened Jul 18, 2023 by flexwang

huggingface_bert_convert.py can't convert some key [bug]

#152 opened Jul 3, 2023 by SeungjaeLim

Failing to build with triton 23.04 [bug]

#150 opened Jun 30, 2023 by bronzafa

Is deberta supported in the fastertransformer backend?

#148 opened Jun 28, 2023 by sfc-gh-zhwang

Poll failed for model directory 'ensemble': output 'OUTPUT_0' for ensemble 'ensemble' is not written

#144 opened Jun 13, 2023 by songkq

Why is it needed to set max_batch_size to 1 under interactive mode?

#143 opened Jun 12, 2023 by zhypku

Why processing requests of batch size=1 is much slower than batch size>1

#142 opened Jun 8, 2023 by mapcan

FasterTransformer Backend fails to build using latest version of Triton Server [bug]

#140 opened Jun 2, 2023 by mshuffett

How to terminate a grpc streaming request immediately during tritonserver inference with a FasterTransformer backend?

#139 opened Jun 1, 2023 by songkq

Triton support using fastertransformer backend for flan-ul2 and flan-ul2-alpaca-lora

#138 opened May 27, 2023 by ma-siddiqui

config file for flan-ul2-alpaca-lora - config.pbtxt

#137 opened May 27, 2023 by ma-siddiqui

flan-ul2 sample config.pbtxt

#136 opened May 27, 2023 by ma-siddiqui

Feature request: Conversion from GPTBigCodeForCausalLM / Starcoder

#132 opened May 19, 2023 by michaelfeil
