
[MM-55475] Performance tests #47

Merged
merged 1 commit into from
Nov 21, 2023
Conversation

streamer45
Contributor

Summary

Attaching the results of preliminary performance tests. I selected the same instance type we use in production for recordings (c6i.2xlarge) and ran the tests on all the models we include (tiny, base, small) with a base sample of 10 minutes.

For the default thread configuration (NumCPU / 2), I also ran tests on a full hour of meeting audio.

The call samples were extracted from real developer meetings, so the tests should be as close as possible to a real use case, with the caveat that they used a single track. In general, though, I wouldn't expect multiple tracks to cause significant overhead, since it's unlikely for speech from different tracks to overlap for long periods.

What's likely causing some overhead is the number of speech segments we get out of these tracks (due to the speech detection process). We can probably tune this further to try to minimize the number of contiguous samples. Right now we split after 2 seconds of silence.
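A toy sketch of the splitting rule described above (`splitOnSilence`, the per-frame speech flags, and the frame counts are hypothetical illustration, not the actual detection code): start a new segment whenever a run of silence reaches the threshold, e.g. 200 frames of 10 ms for the 2-second cutoff.

```go
package main

import "fmt"

// splitOnSilence returns [start, end) frame ranges of speech segments,
// closing a segment once maxSilenceFrames consecutive silent frames occur.
func splitOnSilence(speech []bool, maxSilenceFrames int) [][2]int {
	var segments [][2]int
	start, silence := -1, 0
	for i, s := range speech {
		if s {
			if start == -1 {
				start = i // first speech frame of a new segment
			}
			silence = 0
			continue
		}
		if start == -1 {
			continue // leading silence, nothing to close
		}
		silence++
		if silence >= maxSilenceFrames {
			// Close the segment at the last speech frame.
			segments = append(segments, [2]int{start, i - silence + 1})
			start, silence = -1, 0
		}
	}
	if start != -1 {
		segments = append(segments, [2]int{start, len(speech)})
	}
	return segments
}

func main() {
	// Toy example with a 3-frame threshold instead of 200.
	speech := []bool{true, true, false, false, false, true, true}
	fmt.Println(splitOnSilence(speech, 3)) // two segments: [0,2) and [5,7)
}
```

Lowering the threshold would produce more, shorter segments; raising it merges nearby speech at the cost of transcribing more silence.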

Overall the results show almost linear performance gains with the number of threads of execution.

Please let me know if you have any questions or concerns.

Ticket Link

https://mattermost.atlassian.net/browse/MM-55475

@streamer45 streamer45 added the 2: Dev Review Requires review by a core committer label Nov 18, 2023
@streamer45 streamer45 added this to the v0.5.0 milestone Nov 18, 2023
@streamer45 streamer45 self-assigned this Nov 18, 2023
Member

@cpoile cpoile left a comment


Nice!
One q: Is there an accuracy measure for these? If I were a customer trying to decide which model to use, would I just have to try each out and see?

@streamer45
Contributor Author

> Nice! One q: Is there an accuracy measure for these? If I were a customer trying to decide which model to use, would I just have to try each out and see?

That's a good question. We can plan some accuracy tests, but I'd expect that to be an effort on its own, as we need to find some good samples (not just audiobooks or well-known speeches).

At this point I'd probably refer them to the results for the original models (from the paper itself), since there doesn't seem to be anything "official" from whisper.cpp. But of course whisper.cpp isn't as accurate as the original implementation; there's a very good technical breakdown of why that's the case at ggerganov/whisper.cpp#1163 if you're interested.

Overall, I think a customer would most likely start with the default and move up or down as needed. If you think that's confusing, there's still time to stick with a single model, but it felt nice to offer some degree of performance/accuracy customization.

@streamer45 streamer45 added 3: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer labels Nov 20, 2023
@streamer45 streamer45 merged commit e8e15bd into MM-53432 Nov 21, 2023
3 checks passed
@streamer45 streamer45 deleted the MM-55475 branch November 21, 2023 20:45