How to improve inference latency performance? #796

twocode · 2024-10-22T13:31:06Z

2024-10-22 03:26:36.033 | INFO     | app:generate_audio:73 - Refined text: ['but since [uv_break] 波 卡 [uv_break] like [uv_break] like 里 法, like pocari sweat, [uv_break] the drink. [uv_break], and [uv_break] 东 方 民 族, [uv_break] eastern cultures and peoples, are super different,']
2024-10-22 03:26:36.033 | INFO     | app:generate_audio:78 - Start voice inference.
text:  16%|█▌        | 62/384(max) [00:01, 56.04it/s]
code:  30%|██▉       | 606/2048(max) [00:10, 55.61it/s]
2024-10-22 03:26:48.069 | INFO     | app:generate_audio:91 - Inference completed.

This simple sentence took 12 seconds on an NVIDIA Tesla T4. Is it correct to assume that ChatTTS is not suitable for situations that require a low "Time To First Audio" (TTFA)?
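As a rough cross-check of where the 12 seconds go, the figures in the log above can be turned into per-stage estimates. This is only a sketch: it assumes the rates printed by the two progress bars are averages over each stage, and it takes the "text" bar to be the refine-text pass and the "code" bar to be the audio-code generation pass.

```python
from datetime import datetime

# Numbers copied from the progress bars in the log above.
text_tokens, text_rate = 62, 56.04    # "text" bar: tokens, it/s
code_tokens, code_rate = 606, 55.61   # "code" bar: tokens, it/s

# Estimated time per stage, assuming the printed rate is the stage average.
refine_s = text_tokens / text_rate
code_s = code_tokens / code_rate

# Wall-clock time between "Start voice inference." and "Inference completed."
fmt = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2024-10-22 03:26:36.033", fmt)
end = datetime.strptime("2024-10-22 03:26:48.069", fmt)
wall_s = (end - start).total_seconds()

print(f"refine text: {refine_s:.1f} s")
print(f"code generation: {code_s:.1f} s")
print(f"wall clock: {wall_s:.2f} s")
```

Under those assumptions, about 1.1 s goes to the text pass and about 10.9 s to code generation, which together account for nearly all of the 12.04 s between the two log timestamps. At roughly 55 tokens per second, the total time is dominated by how many code tokens must be generated before any audio is returned.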

medemi68 · 2024-10-24T08:27:51Z

I second this. I think some major improvement is needed to get the time to first audio down. I was able to do some optimization myself by reducing the chunk size and setting the stream speed. I was running it on a 4090, which has roughly 80 TFLOPS, and I was able to get inference a lot faster. But you would definitely need to use streaming to get a fast time to first byte, and you should disable the refine text step too.
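To compare settings like these, it helps to measure time to first audio separately from total time. The helper below is a minimal sketch that works on any iterator of audio chunks. The names `time_to_first_chunk` and `fake_stream` are hypothetical, and `fake_stream` is a stand-in for a real streaming call, not part of ChatTTS. With ChatTTS, the iterator would presumably come from something like `chat.infer(text, stream=True, skip_refine_text=True)`; treat those argument names as an assumption to check against the installed version.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def time_to_first_chunk(chunks: Iterable) -> Tuple[Optional[float], float, int]:
    """Consume a chunk stream and return (ttfa_s, total_s, chunk_count).

    ttfa_s is None if the stream yields nothing.
    """
    start = time.perf_counter()
    ttfa: Optional[float] = None
    count = 0
    for _ in chunks:
        count += 1
        if ttfa is None:
            # Time until the first chunk became available.
            ttfa = time.perf_counter() - start
    total = time.perf_counter() - start
    return ttfa, total, count


def fake_stream(n_chunks: int = 5, delay_s: float = 0.02) -> Iterator[List[float]]:
    """Hypothetical stand-in for a streaming TTS call: yields dummy audio chunks."""
    for _ in range(n_chunks):
        time.sleep(delay_s)   # pretend to generate one chunk
        yield [0.0] * 100     # dummy samples


ttfa, total, n = time_to_first_chunk(fake_stream())
print(f"chunks: {n}, TTFA: {ttfa:.3f} s, total: {total:.3f} s")
```

With streaming, TTFA depends on how long the first chunk takes rather than on the whole utterance, so a smaller chunk size should lower it even if total time stays about the same.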

fumiama added labels on Oct 30, 2024: documentation (Improvements or additions to documentation), help wanted (Extra attention is needed), algorithm (Algorithm improvements & issues), performance (Running speed & quality)
