
Reuse session for 60% speedup #5

Open
samwillis wants to merge 2 commits into main

Conversation

samwillis

Hey @k9p5,

vits-web is really awesome!

I've started trying to speed it up a little. Currently predict() sets up a whole new ort session and loads the model on every call. If instead you split the process in two, so that you can create a "vits-web session" first, you get a 60% speedup on the basic example. This is particularly useful for repeated calls: set up the model once, then call it repeatedly with text chunks to generate more audio (which is what I want to use it for).
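A minimal sketch of the split I have in mind, assuming the @diffusionstudio/vits-web import and voiceId from the README examples; createSession, predict on the session, and dispose are illustrative names, not the final API:

```ts
import * as tts from '@diffusionstudio/vits-web';

// Pay the expensive part once: fetch the voice model and create the
// onnxruntime-web InferenceSession up front.
const session = await tts.createSession({ voiceId: 'en_US-hfc_female-medium' });

// Each subsequent call only runs inference, so it is much faster.
const first = await session.predict({ text: 'Hello from the reused session.' });
const second = await session.predict({ text: 'This call skips the model setup.' });

// Free the model and runtime when the session is no longer needed.
await session.dispose();
```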

As you can see in the video from my machine, it cuts the call time from ~2.7s to ~1.1s, with ~1.7s spent during initialisation of the session. Repeat calls then take only ~1s.

I've not added/modified any tests yet as it makes more sense to run this past you first.

Screen.Recording.2024-07-10.at.19.47.50.mov

k9p5 (Contributor) commented Jul 10, 2024

Thank you very much, good job :) I really did not expect it to make that much of a difference. Am I correct to assume that if you terminate the worker after a result has been generated, the time difference will diminish?

k9p5 (Contributor) commented Jul 10, 2024

I just noticed that you're also holding the model in memory; was that on purpose?

samwillis (Author) commented Jul 11, 2024

Yes. For my use case I want to do TTS progressively, essentially one sentence at a time, and start playing it as soon as the first chunk is available.

Holding the model in memory for the duration of the multi step session is ideal.

I tried the transformers.js TTS and it's far too slow, even when using multiple workers. Your packaging of Sherpa/ONNX/Piper is perfect, and with this change it makes real-time TTS possible in the browser.

This will need some more polish, particularly around disposing of the session after use.

k9p5 (Contributor) commented Jul 11, 2024

Makes sense; it really depends on the use case whether to keep the memory footprint as small as possible (my goal) or to minimise runtime. Effectively, the part that is missing from the library is environment variables. I'm going to copy this strategy from onnx, so you could just set something like tts.env.keepSessionInMemory = true, and maybe this also requires a tts.releaseMemory().
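Something along these lines (all names hypothetical, nothing of this exists yet; the predict call shape follows the current README):

```ts
import * as tts from '@diffusionstudio/vits-web';

// Hypothetical flag: keep the ort session and voice model cached between calls.
tts.env.keepSessionInMemory = true;

// The first call pays the full setup cost; later calls reuse the cached session.
await tts.predict({ text: 'Warm-up call.', voiceId: 'en_US-hfc_female-medium' });
await tts.predict({ text: 'Fast follow-up call.', voiceId: 'en_US-hfc_female-medium' });

// Hypothetical teardown to get the memory footprint back down.
await tts.releaseMemory();
```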

samwillis (Author)

Absolutely, and I think vits-web can cover both use cases well. On the minimum-runtime side, it's also important to be able to start the runtime+model before you need it, not just keep it around for the next call.

In my refactor the original predict function still exists and is just a thin wrapper around the session object, starting it for the single call, then discarding it. The largest part of the memory footprint is the voice model, so I think having a wrapper session object (started with the voiceId) works quite well. It seems to me this API is the most flexible, for example allowing someone to start and use multiple voices at once.

tts.env.keepSessionInMemory = true implies that there wouldn't be a way to start the runtime+model pre-emptively, which is very important for time to first output in some applications. For what I'm experimenting with (real-time voice generation of a streaming response from an LLM), I would want to start the model at application start, then produce voice output as fast as possible when required.
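Concretely, with a session object as sketched above (names still hypothetical), the pattern I'm after looks like this:

```ts
import * as tts from '@diffusionstudio/vits-web';

// At application start: pay the runtime + model cost (~1.7s here) up front.
const sessionPromise = tts.createSession({ voiceId: 'en_US-hfc_female-medium' });

// Later, as each sentence arrives from the LLM stream, only inference runs.
async function speak(sentence: string) {
  const session = await sessionPromise;
  const wav = await session.predict({ text: sentence });
  // hand `wav` to an <audio> element / AudioContext for playback
  return wav;
}
```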

Let me know if you are happy with my approach, and if so I can tidy it up and add tests.

avarayr commented Sep 20, 2024

Hey guys, sorry to revive this thread. Am I right to assume that the recent commit bdf7f36 addresses this issue?

mikebaldry commented Nov 11, 2024

> Hey guys, sorry to revive this thread. Am I right to assume that the recent commit bdf7f36 addresses this issue?

I think it addresses part of the issue, but it will still create a new session on every predict, and I'm not sure loading the imports is the slowest part. The real cost is creating the InferenceSession from the model (and maybe fetching the model blob, though that's probably cached anyway).
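For context, a rough sketch of where the time goes with plain onnxruntime-web (placeholder model bytes and feeds; not vits-web's internals verbatim):

```ts
import * as ort from 'onnxruntime-web';

declare const modelBytes: Uint8Array;            // the cached .onnx voice model
declare const feeds: Record<string, ort.Tensor>; // phoneme ids, lengths, scales

// Creating the InferenceSession parses and initialises the whole model;
// this is the slow step, and today it happens on every predict().
const session = await ort.InferenceSession.create(modelBytes);

// Running an existing session is comparatively cheap, so reusing it
// across predictions is where the speedup comes from.
const results = await session.run(feeds);
```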

This is very promising for me - I would like to create a session, run many predictions against the session, then close the session when it makes sense for me.
