should we automatically update the model files that whisper uses? if so, at what frequency and with what mechanism? #23
Comments
possible storytime fodder
Perhaps we could add a unit test that compares the list of models in https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L32 with the ones in https://github.com/sul-dlss/speech-to-text/blob/main/whisper_models/urls.txt. Then when we update whisper and there is a new model, the test will start to fail? We will need to remember that fixing the test requires rebuilding the Docker container...
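A minimal sketch of what that test could look like, assuming whisper still exposes its name-to-URL mapping as the private dict `whisper._MODELS` and that `whisper_models/urls.txt` holds one download URL per line (both assumptions worth double-checking):

```python
# test_model_urls.py -- a sketch; assumes whisper exposes its model-name-to-URL
# mapping as the private dict whisper._MODELS, and that whisper_models/urls.txt
# lists one download URL per line.
from pathlib import Path

import whisper


def test_whisper_models_match_urls_txt():
    # URLs whisper itself knows about (values of the name -> URL dict)
    whisper_urls = set(whisper._MODELS.values())

    # URLs we bake into the Docker image at build time
    urls_txt = Path("whisper_models/urls.txt")
    our_urls = {
        line.strip() for line in urls_txt.read_text().splitlines() if line.strip()
    }

    # If whisper adds or changes a model, this fails and reminds us to update
    # urls.txt and rebuild the container.
    assert whisper_urls == our_urls
```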
discussed regression testing for model upgrades today as a tangent while troubleshooting some
it's also possible that we'll discover that an unexpected deviation is an improvement, in which case we'd probably want to update the test expectations going forward?
I like the sound of this. I think we could assemble some/all of the Pilot test data so it could be easily run by an integration test. In addition to doing some basic checks, we could add some tools to this repository that use the jiwer library to compare the results against an expected baseline? I suspect that there will be some human-level evaluation that is needed.
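A rough sketch of what the jiwer comparison could look like; the baseline/candidate directory layout and the 2% threshold here are illustrative assumptions, not anything we've decided:

```python
# compare_wer.py -- a sketch of a baseline comparison using jiwer; the file
# layout (baseline/ and candidate/ directories of matching .txt transcripts)
# and the 2% threshold are illustrative assumptions.
from pathlib import Path

import jiwer

# flag any transcript whose word error rate against the accepted baseline exceeds 2%
THRESHOLD = 0.02


def compare(baseline_dir: str, candidate_dir: str) -> list[str]:
    flagged = []
    for baseline_file in Path(baseline_dir).glob("*.txt"):
        candidate_file = Path(candidate_dir) / baseline_file.name
        reference = baseline_file.read_text()
        hypothesis = candidate_file.read_text()
        # word error rate of the new run, measured against the accepted baseline
        error = jiwer.wer(reference, hypothesis)
        if error > THRESHOLD:
            flagged.append(f"{baseline_file.name}: WER {error:.3f}")
    return flagged


if __name__ == "__main__":
    for line in compare("baseline", "candidate"):
        print(line)
```

Anything it flags would still go to a person for review, per the comments above about human evaluation.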
came here to note some of what we discussed after standup today about regression testing. seems like all of what we talked about was already captured by the above comments, but it's maybe worth repeating that it's unlikely that evaluation of regression testing results will be totally automatable in the same way that runs of CI or infra integration tests get a mechanical pass/fail. as ed says, it's likely that some amount of human evaluation will be needed to interpret test results. we should also probably break out a separate regression testing ticket, as we've realized there are updates we need to regression test besides model updates: CUDA version, underlying GPU hardware, pytorch version, default settings changes, and many other things can each on their own introduce significant changes to output (some of which will improve output, and some of which will make it worse). see also today's post-standup discussion around segfaults, #68, and https://stackoverflow.com/questions/78196316/pytorch-segementation-fault-core-dumped-when-moving-pytorch-tensor-to-gpu
I noticed that the large-v3 image gets pulled in about 20-30 seconds in AWS Batch. We could ensure that the image is written to disk so that it is available for another docker run, which would mean subsequent jobs processed by the same ec2 instance would not need to pull down the model file? This way the model file would always be up to date?
this list has the URLs for retrieving models as setup for building the container: https://github.com/sul-dlss/speech-to-text/blob/main/whisper_models/urls.txt
see also https://github.com/sul-dlss/speech-to-text?tab=readme-ov-file#build
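For context, a minimal sketch of what that build setup conceptually does, assuming `urls.txt` has one download URL per line; the actual build uses the repo's own tooling (see the README link above):

```python
# fetch_models.py -- a sketch of the model-download step, assuming
# whisper_models/urls.txt contains one download URL per line; the real
# build may use different tooling.
from pathlib import Path
from urllib.request import urlretrieve

models_dir = Path("whisper_models")
for url in (models_dir / "urls.txt").read_text().splitlines():
    url = url.strip()
    if not url:
        continue
    # save each model file next to urls.txt, skipping ones already downloaded
    target = models_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        print(f"downloading {url} -> {target}")
        urlretrieve(url, target)
```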