Hey there, thank you for the project.

I have spotted a memory leak in the latest release (1.0.3). When transcribing sequentially, memory behaves as expected. When transcription is called in parallel, however, memory usage keeps increasing until OOM, even when garbage collection is triggered manually.
I have built a minimal reproducible example. Notice also how "mem after gc" increases when you raise PARALLEL to 5 or 6. The test audio can be fetched with:

wget "https://cdn.pixabay.com/download/audio/2024/07/23/audio_9f165cf892.mp3?filename=medieval-gamer-voice-donx27t-forget-to-subscribe-226581.mp3" -O test.mp3
```python
import gc
from threading import Thread
from faster_whisper import WhisperModel
import psutil

PARALLEL = 4
THREADS = []
MODEL = WhisperModel('large-v2', device='auto', compute_type='int8', cpu_threads=4)

def get_rss():
    '''Get current memory usage (RSS) in MB'''
    return int(psutil.Process().memory_info().rss / 1048576)

def transcribe():
    print(f'mem before {get_rss()}')
    segments, _info = MODEL.transcribe('test.mp3')
    _ = list(segments)  # consume the generator so transcription actually runs
    print(f'mem after {get_rss()}')

def sequential():
    print('sequential:')
    for _ in range(PARALLEL):
        transcribe()

def parallel():
    print('\nparallel:')
    for _ in range(PARALLEL):
        THREADS.append(Thread(target=transcribe))
        THREADS[-1].start()

def main():
    sequential()
    gc.collect()
    print(f'\nmem after gc {get_rss()}')

    parallel()
    for t in THREADS:
        t.join()
    gc.collect()
    print(f'\nmem after gc {get_rss()}')

if __name__ == '__main__':
    main()
```
Output:
```
sequential:
mem before 1761
mem after 2617
mem before 2617
mem after 2617
mem before 2617
mem after 2617
mem before 2617
mem after 2617

mem after gc 2617   # everything fine up until here

parallel:
mem before 2617
mem before 2617
mem before 2617
mem before 2617
mem after 2691
mem after 2691
mem after 2691
mem after 2691

mem after gc 2691   # leak
```
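As a side note for anyone triaging this: since the growth survives gc.collect(), it is worth checking whether the extra memory is even visible to the Python allocator. A minimal sketch using the stdlib tracemalloc (the bytearray is a stand-in workload so the snippet is self-contained; in practice you would run the parallel transcribe() calls in its place):

```python
import tracemalloc

tracemalloc.start()

# Stand-in workload: replace this with the parallel transcribe() calls.
data = bytearray(10_000_000)

# Report allocations made through the Python allocator since start().
current, peak = tracemalloc.get_traced_memory()
print(f'python-level current: {current / 1048576:.1f} MB, peak: {peak / 1048576:.1f} MB')
tracemalloc.stop()
```

If tracemalloc reports far less growth than RSS shows, the leaked memory lives in native allocations (e.g. inside the inference backend), which gc.collect() cannot reclaim.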
Edit:
There's another catch. When you run the same script multiple times, you will notice that the final "mem after gc" value sometimes differs considerably between runs.
for run in {1..10}; do python test.py | tail -n 1; done
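The same repeated-run check can be scripted in Python for easier collection of the results. A minimal sketch, assuming the script above is saved as test.py (final_rss_lines is a hypothetical helper name, not part of the report):

```python
import subprocess
import sys

def final_rss_lines(script='test.py', runs=10):
    '''Run the script repeatedly and return the last stdout line of each run,
    i.e. the final "mem after gc" report.'''
    lines = []
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, script],
            capture_output=True, text=True, check=True,
        )
        lines.append(result.stdout.strip().splitlines()[-1])
    return lines

# Usage:
#   for line in final_rss_lines():
#       print(line)
```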