
Does DPR document store "update embeddings" utilize multiple GPUs? #1318

Closed
shihabrashid-ucr opened this issue Aug 4, 2021 · 3 comments
Labels: topic:modeling, type:feature (New feature or request)

Comments

shihabrashid-ucr commented Aug 4, 2021

I am trying to create DPR embeddings for the whole Wikipedia dataset (11 million documents).
First I ran the code on a single 16 GB GPU with 61 GB of RAM. From tqdm I could see that the whole document_store.update_embeddings() call with a batch size of 32 would take about 30 hours in total. However, after roughly 20 hours the process gets "Killed" every time, I am guessing because it runs out of RAM.
So I ran the code again, this time on a machine with 8 Tesla K80 GPUs (12 GB x 8 = 96 GB of GPU memory) and 488 GB of RAM. But now tqdm estimates 157 hours for the same batch size of 32! I ran it multiple times to make sure. So I am confused: does the update_embeddings code utilize multiple GPUs?
I followed issue #601 and understand that batch mode was introduced.
Is there any alternative way to create embeddings for 11M docs utilizing multiple GPUs?
Here is the code I am using:
```python
dpr_document_store = FAISSDocumentStore(faiss_index_factory_str="Flat",
                                        similarity="dot_product",
                                        sql_url="sqlite:///all_docs.db",
                                        index='wiki_docs')
dpr_document_store.write_documents(wiki_dict)
retriever = DensePassageRetriever(document_store=dpr_document_store,
                                  query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
                                  passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
                                  max_seq_len_query=365,
                                  max_seq_len_passage=350,
                                  batch_size=32,
                                  use_gpu=True,
                                  embed_title=False,
                                  use_fast_tokenizers=True)
dpr_document_store.update_embeddings(retriever)
dpr_document_store.save('/home/ubuntu/FAISS_saves/wiki_all_docs')
```

tholor (Member) commented Aug 4, 2021

Hey @shihabrashid-ucr,

DPR's update_embeddings() does not currently support multiple GPUs. However, it makes total sense to enable them (at least via DataParallel) - I'll add it to our next sprint unless you want to provide a PR yourself here.
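For reference, the general idea would be to wrap the passage encoder in torch.nn.DataParallel so each batch gets split across the visible GPUs. Here is a minimal, generic sketch using plain transformers/PyTorch (illustrative only, not Haystack's actual update_embeddings() code):

```python
import torch
from transformers import DPRContextEncoder, DPRContextEncoderTokenizerFast

# Illustrative sketch of the DataParallel pattern, not Haystack internals.
tokenizer = DPRContextEncoderTokenizerFast.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base").to("cuda").eval()

if torch.cuda.device_count() > 1:
    # Replicate the encoder on every visible GPU; each forward pass splits
    # the batch across the replicas and gathers the resulting embeddings.
    encoder = torch.nn.DataParallel(encoder)

passages = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
inputs = tokenizer(passages, padding=True, truncation=True, max_length=350, return_tensors="pt").to("cuda")
with torch.no_grad():
    embeddings = encoder(**inputs).pooler_output  # shape: (batch_size, 768)
```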

Regarding your problems with the single GPU: GPU memory shouldn't be the issue here. Can you share the error message you get there? As a temporary, hacky workaround you could also save the FAISS index every ~1 million documents. Then you at least wouldn't need to start from scratch when an error happens that late. It could be something along these lines:

...
for batch in wiki_doc_batches:
    # write and embed one chunk at a time, then persist the FAISS index
    dpr_document_store.write_documents(batch)
    dpr_document_store.update_embeddings(retriever, update_existing_embeddings=False)
    dpr_document_store.save("/home/ubuntu/FAISS_saves/wiki_all_docs")
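(wiki_doc_batches above is not an existing variable from your script, just a hypothetical chunking of wiki_dict; assuming wiki_dict is a list of document dicts, something like this would produce it:)

```python
# Hypothetical helper: split the full document list into chunks of ~1 million docs
chunk_size = 1_000_000
wiki_doc_batches = [wiki_dict[i:i + chunk_size] for i in range(0, len(wiki_dict), chunk_size)]
```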

tholor self-assigned this on Aug 4, 2021
shihabrashid-ucr (Author) commented:

The only error I am getting is "Killed" after around 25 hours.
It would be great if you could add DataParallel to update_embeddings().
Let me try out the temporary workaround and see if it works.
Thanks for all your help!

tholor (Member) commented Sep 10, 2021

Implemented in #1414

FYI @shihabrashid-ucr

tholor closed this as completed on Sep 10, 2021