
[Help Wanted] Generate LASER embeddings for a large number of sentences (15.7 million) #192

Open
NomadXD opened this issue Aug 4, 2021 · 1 comment


NomadXD commented Aug 4, 2021

For my university FYP (final-year project) on text simplification, I need to generate LASER embeddings for a large number of sentences (15.7 million). However, when I try to generate the embeddings using the SentenceEncoder in embed.py, the machine stays fully utilized for around 12 hours and then the program exits without any error (I assume because of the high CPU and GPU utilization). I'm using the SentenceEncoder in the following way.

I initialize the SentenceEncoder with the following params, using the pretrained encoder (models/bilstm.93langs.2018-12-26.pt):

```python
SentenceEncoder(encoder_path, max_tokens=3000, cpu=False, verbose=True)
```

And then I generate the LASER embeddings as follows:

```python
embeddings = encoder.encode_sentences(read_lines(bpe_filepath))
```
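Put together, the whole run is roughly the following (paths are placeholders for my actual files, and `read_lines` here just stands in for the helper I use to load the BPE-encoded lines):

```python
import numpy as np
from embed import SentenceEncoder  # LASER's source/embed.py

encoder_path = "models/bilstm.93langs.2018-12-26.pt"  # pretrained LASER encoder
bpe_filepath = "data/sentences.bpe"                   # ~15.7M BPE-encoded sentences

def read_lines(path):
    # Loads the entire BPE file into memory as a list of sentences.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

encoder = SentenceEncoder(encoder_path, max_tokens=3000, cpu=False, verbose=True)

# Encodes all sentences in a single call and keeps the full result in memory.
embeddings = encoder.encode_sentences(read_lines(bpe_filepath))
```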

I ran this setup with the above params on a GCP Compute Engine instance with 16 cores, 102 GB of memory, and one NVIDIA Tesla T4 GPU. CPU utilization reaches 100% while GPU utilization sits around 90%. It stays like that for around 12 hours and then the process exits without any error (nothing in nohup.out).

Any idea about what could be going wrong? I've been stuck at this point for several weeks and would really appreciate it if someone could help.

cc @hoschwenk

@prasunshrestha

I also have a similar issue. Not sure if this would help, but have you tried ThreadPoolExecutor or multiprocessing to parallelize? If you are not married to LASER, there are many embedding models now based on transformer architecture (unlike LASER's BiLSTM), so the computation is much faster from the get-go.
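For what it's worth, even before bringing in multiprocessing, a plain chunked loop that saves each chunk's embeddings to disk should keep peak memory bounded and make the run resumable. A rough, untested sketch (chunk size and paths are placeholders; SentenceEncoder and the model path are as in your post):

```python
import numpy as np
from embed import SentenceEncoder  # LASER's source/embed.py

CHUNK_SIZE = 100_000  # sentences per chunk -- tune to available memory

encoder = SentenceEncoder("models/bilstm.93langs.2018-12-26.pt",
                          max_tokens=3000, cpu=False, verbose=True)

def chunks(path, size):
    # Yield successive lists of `size` sentences without loading the whole file.
    buf = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            buf.append(line.rstrip("\n"))
            if len(buf) == size:
                yield buf
                buf = []
    if buf:
        yield buf

for i, batch in enumerate(chunks("data/sentences.bpe", CHUNK_SIZE)):
    emb = encoder.encode_sentences(batch)      # one chunk at a time on the GPU
    np.save(f"embeddings_{i:05d}.npy", emb)    # partial results survive a crash
```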
