For my university FYP on text simplification, I need to generate LASER embeddings for a large number of sentences (15.7 million). However, when I try to generate them using the `SentenceEncoder` in `embed.py`, the program stays fully utilized for around 12 hours and then exits without any error (I assume because of the high CPU and GPU utilization). I'm using the `SentenceEncoder` in the following way.

I initialize the `SentenceEncoder` with the parameters below, using the pretrained encoder (`models/bilstm.93langs.2018-12-26.pt`), and then generate the embeddings as follows.

I ran this setup on a GCP Compute Engine instance with 16 cores, 102 GB of memory, and one NVIDIA Tesla T4 GPU. CPU utilization reaches 100% while GPU utilization sits around 90%. It stays like that for around 12 hours and then exits without any error (nothing in `nohup.out`).

Any idea what could be going wrong? I've been stuck at this point for several weeks and would really appreciate it if someone could help.
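For reference, the usage described above presumably looks something like the sketch below. This is a hypothetical wrapper: the constructor parameters (`max_tokens`, `cpu`) and the `encode_sentences` method are assumptions based on LASER's `source/embed.py` and may differ between versions.

```python
def embed_with_laser(sentences, model_path="models/bilstm.93langs.2018-12-26.pt"):
    """Hypothetical wrapper around LASER's SentenceEncoder.

    Requires LASER's source/ directory on sys.path; the parameter
    names below are assumptions and may differ across LASER versions.
    """
    from embed import SentenceEncoder  # LASER's source/embed.py

    encoder = SentenceEncoder(
        model_path,
        max_tokens=12000,  # cap batches by token count rather than sentence count
        cpu=False,         # keep the encoder on the GPU
    )
    # LASER produces one 1024-dimensional float32 vector per sentence
    return encoder.encode_sentences(sentences)
```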
I have a similar issue. Not sure if this will help, but have you tried `ThreadPoolExecutor` or `multiprocessing` to parallelize the work? Also, if you are not married to LASER, there are now many embedding models based on the transformer architecture (unlike LASER's BiLSTM), so the computation is much faster from the get-go.
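Either way, with 15.7 million sentences it also helps to process the corpus in checkpointed chunks, so that a silent crash (e.g. the Linux OOM killer, which terminates the process without leaving a traceback in `nohup.out`) doesn't lose 12 hours of work. A minimal sketch, with a placeholder `encode` function standing in for the actual LASER call:

```python
import os
import numpy as np

def encode(batch):
    # Placeholder: stands in for SentenceEncoder.encode_sentences.
    # Returns one 1024-dimensional vector per sentence, as LASER does.
    return np.zeros((len(batch), 1024), dtype=np.float32)

def embed_in_chunks(sentences, out_dir, chunk_size=100_000):
    """Embed sentences chunk by chunk, saving each chunk to disk and
    skipping chunks already completed on a previous run, so the job
    can be restarted after a crash without redoing finished work."""
    os.makedirs(out_dir, exist_ok=True)
    for i in range(0, len(sentences), chunk_size):
        path = os.path.join(out_dir, f"chunk_{i:09d}.npy")
        if os.path.exists(path):  # resume: skip completed chunks
            continue
        np.save(path, encode(sentences[i:i + chunk_size]))

# Afterwards, np.load the chunk_*.npy files in sorted order
# (or np.concatenate them) to get the full embedding matrix.
```

The zero-padded chunk index in the filename keeps the lexicographic sort order equal to the numeric order, so reassembly is just loading the files in sorted order.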
cc @hoschwenk