
silero-models and silero-vad combined lead to ImportError #28

Closed
samueldomdey opened this issue Feb 4, 2021 · 10 comments

Labels
bug Something isn't working · enhancement New feature or request

@samueldomdey

When silero-models and silero-vad are used together in the same function call, only whichever model is loaded first works; the second load fails with an ImportError:

ImportError: cannot import name 'get_speech_ts'

I assume I'm missing something trivial here, but I couldn't figure out how to solve it. Any ideas?

@samueldomdey samueldomdey added the help wanted Extra attention is needed label Feb 4, 2021
@snakers4
Owner

snakers4 commented Feb 4, 2021

Can you please post a code snippet?

@samueldomdey
Author

It seems to be related to the local cache and torch.hub.load() when loading both models in the same Python process.

Quoting PyTorch documentation:
"A known limitation that worth mentioning here is user CANNOT load two different branches of the same repo in the same python process. It’s just like installing two packages with the same name in Python, which is not good. Cache might join the party and give you surprises if you actually try that. Of course it’s totally fine to load them in separate processes."

https://pytorch.org/docs/stable/hub.html

Current implementation:

-> call in loop

outputs = speech_to_text_extraction._speech_to_text(path + country + i, country_v1)
timestamps = speech_to_timestamps_extraction._speech_to_timestamps(path + country + i)

-> function 1

import torch

def _speech_to_timestamps(filepath):
    # load model and utils from GitHub if not previously cached
    # (force_reload=True would ensure the newest version)
    model2, utils2 = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                    model='silero_vad')

    # unpack util functions
    (get_speech_ts,
     _, read_audio,
     _, _, _) = utils2

    # read .wav file
    wav = read_audio(filepath)

    # extract speech timestamps
    speech_timestamps = get_speech_ts(wav, model2, num_steps=4)

    # print and return speech timestamps
    print(speech_timestamps)
    return speech_timestamps

-> function 2

from glob import glob
import torch

def _speech_to_text(filepath, language):
    # silero STT model, run on CPU
    device = torch.device('cpu')

    # load model from the silero-models repo
    model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                           model='silero_stt',
                                           language=language)

    # unpack util functions
    (read_batch, split_into_batches,
     read_audio, prepare_model_input) = utils

    # resolve the test file and batch it for the STT model
    files = glob(filepath)
    batches = split_into_batches(files, batch_size=100)
    model_input = prepare_model_input(read_batch(batches[0]),
                                      device=device)

    # generate the prediction and return it as a string
    output = model(model_input)
    print(decoder(output[0]))
    return decoder(output[0])

Currently trying to move the second call into a new subprocess as a workaround.
If you know a better solution, I'd greatly appreciate it though!
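A minimal sketch of that subprocess workaround, under the assumption that the child process owns the silero-vad load entirely. The torch.hub.load and get_speech_ts calls are replaced by a hard-coded placeholder here, since the point being illustrated is only the process isolation:

```python
import json
import subprocess
import sys

# Code run in a fresh interpreter. In the real workaround this is where
# torch.hub.load('snakers4/silero-vad', ...) and get_speech_ts would run;
# a hard-coded placeholder stands in for the timestamps here.
CHILD_CODE = """
import json
timestamps = [{"start": 0, "end": 16000}]  # placeholder for get_speech_ts(...)
print(json.dumps(timestamps))
"""

def speech_to_timestamps_in_subprocess():
    # A fresh interpreter means a fresh torch.hub namespace, so the VAD
    # repo's modules can never collide with the STT repo's modules.
    out = subprocess.run([sys.executable, "-c", CHILD_CODE],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

print(speech_to_timestamps_in_subprocess())  # [{'start': 0, 'end': 16000}]
```

The JSON round-trip over stdout is one simple way to get the timestamps back to the parent; a multiprocessing.Queue would work just as well.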

@snakers4
Owner

snakers4 commented Feb 4, 2021

Hm, interesting. I suppose the repos are different but a module name is the same, which is why it has a problem (I believe this can be checked by following the torch hub cache paths)?

I will verify a bit later, but some plain solutions off the top of my head:

  • Use force_reload=True, though it is suboptimal; you may have to load both models in one function;
  • Use a subprocess;
  • Architecture-wise it makes more sense to separate VAD and STT into different workers / entities / consumers;

@samueldomdey
Author

Alright, thank you

@snakers4
Owner

snakers4 commented Feb 5, 2021

It definitely pulls the repos into different folders:

files_dir = torch.hub.get_dir()
!ls $files_dir

>> snakers4_silero-models_master  snakers4_silero-vad_master

@snakers4
Owner

snakers4 commented Feb 5, 2021

So, I believe I am narrowing in on the culprit.
I have tried various torch hub models.
Omitting the dependencies and other code bulk, this combination can be made to work:

stt_model, decoder, stt_utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                               model='silero_stt',
                                               language='en',
                                               device=device)

model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')

but this one cannot:

stt_model, decoder, stt_utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                               model='silero_stt',
                                               language='en',
                                               device=device)

vad_model, utils_vad = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                      model='silero_vad')

Using force_reload=True does not really help, and the cache is correctly stored in two different folders.

@snakers4
Owner

snakers4 commented Feb 5, 2021

Found the culprit: it looks like all torch.hub models are treated as the same package / namespace, so you cannot have duplicate module names there.
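The collision can be reproduced without torch at all. The sketch below builds two throwaway "repos" that both ship a top-level utils.py, mirroring the pre-fix silero-models / silero-vad layout; whichever utils is imported first wins, because Python caches modules by name in sys.modules:

```python
import importlib
import os
import sys
import tempfile

# Two "repos", each shipping a top-level module called utils.py,
# mirroring silero-models and silero-vad before the fix.
repo_a = tempfile.mkdtemp()
repo_b = tempfile.mkdtemp()
with open(os.path.join(repo_a, "utils.py"), "w") as f:
    f.write("def which(): return 'repo_a'\n")
with open(os.path.join(repo_b, "utils.py"), "w") as f:
    f.write("def which(): return 'repo_b'\n")

sys.path.insert(0, repo_a)
import utils
print(utils.which())  # repo_a

# Adding repo_b to the path does not help: Python already cached a module
# named 'utils' in sys.modules, so repo_b's copy is silently shadowed.
sys.path.insert(0, repo_b)
utils_again = importlib.import_module("utils")
print(utils_again.which())  # still repo_a
```

This is exactly why renaming one repo's module (utils.py -> utils2.py, as below) resolves the clash.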

This minimal example solves the problem

import torch

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
files_dir = torch.hub.get_dir()

!ls $files_dir

hubconf = """dependencies = ["torch"]
import torch
from utils2 import init_jit_model

def silero_vad(**kwargs):
    hub_dir = torch.hub.get_dir()
    model = init_jit_model(model_path=f"{hub_dir}/snakers4_silero-vad_master/files/model.jit")
    return model
"""

!cp $files_dir/snakers4_silero-vad_master/utils.py $files_dir/snakers4_silero-vad_master/utils2.py
!echo '$hubconf' > $files_dir/snakers4_silero-vad_master/hubconf.py
!cat $files_dir/snakers4_silero-vad_master/hubconf.py


vad_model = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                           model='silero_vad')

stt_model, decoder, stt_utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                               model='silero_stt',
                                               language='en',
                                               device=device)

@snakers4 snakers4 added bug Something isn't working enhancement New feature or request and removed help wanted Extra attention is needed labels Feb 5, 2021
@snakers4
Owner

snakers4 commented Feb 5, 2021

@samueldomdey

After a small fix in silero-vad this works just fine (note that I intentionally re-downloaded the cached files here):

import torch
device = torch.device('cpu')

stt_model, decoder, stt_utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                               model='silero_stt',
                                               language='en',
                                               device=device,
                                               force_reload=True)

vad_model, utils_vad = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                      model='silero_vad',
                                      force_reload=True)

Architecture-wise it still makes more sense to separate VAD and STT into different workers / entities / consumers.

Also please note that VAD is fast enough to operate on 1 thread, while STT is not.
So for test purposes it is of course fine to keep them together, but in an ideal world they are better kept in separate workers.
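A bare-bones sketch of that worker split, with plain tags standing in for the real model calls. Each worker process would torch.hub.load its own model once at startup, so the two repos never share a namespace; the 'fork' start method is assumed (i.e. a Unix host):

```python
import multiprocessing as mp

def vad_worker(jobs, results):
    # In the real setup this process would torch.hub.load the VAD model
    # once here; a tag stands in for the model output below.
    for path in iter(jobs.get, None):  # None is the shutdown sentinel
        results.put(("vad", path))

def stt_worker(jobs, results):
    # Likewise, this process would own the STT model exclusively.
    for path in iter(jobs.get, None):
        results.put(("stt", path))

ctx = mp.get_context("fork")  # fork avoids re-importing this script in children
jobs_vad, jobs_stt, results = ctx.Queue(), ctx.Queue(), ctx.Queue()
workers = [ctx.Process(target=vad_worker, args=(jobs_vad, results)),
           ctx.Process(target=stt_worker, args=(jobs_stt, results))]
for w in workers:
    w.start()

# Feed the same file to both workers and collect one result from each.
jobs_vad.put("test.wav")
jobs_stt.put("test.wav")
collected = sorted(results.get() for _ in range(2))

# Shut the workers down cleanly.
for q in (jobs_vad, jobs_stt):
    q.put(None)
for w in workers:
    w.join()

print(collected)  # [('stt', 'test.wav'), ('vad', 'test.wav')]
```

Because STT is the slow side, it is also the natural place to scale out to several worker processes while a single VAD worker keeps up.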

@snakers4
Owner

snakers4 commented Feb 5, 2021

Please confirm the fix and close

@samueldomdey
Author

You are correct, it does work now.
Many thanks for looking into it and fixing it!
