I am trying to create DPR embeddings for the whole Wikipedia dataset (11 million documents).
First I ran the code on a single 16 GB GPU with 61 GB of RAM. From tqdm I could see that the whole `document_store.update_embeddings()` run with a batch size of 32 would take a total of about 30 hours. However, after roughly 20 hours the process gets "Killed" every time, I am guessing due to insufficient RAM.
So I ran the code again, this time on a machine with 8 Tesla K80 GPUs (8 x 12 GB = 96 GB of GPU memory) and 488 GB of RAM. But now the tqdm estimate with a batch size of 32 shows 157 hours! I ran it multiple times to make sure. So I am confused: does the update_embeddings() code utilize multiple GPUs?
I followed issue #601 and understood that batch mode has been introduced.
Is there an alternative way to create embeddings for 11M docs that utilizes multiple GPUs?
Here is the code I am using:
DPR's update_embeddings() currently does not support multiple GPUs. However, it makes total sense to enable multiple GPUs (at least via DataParallel) - I'll add it to our next sprint unless you want to provide a PR yourself here.
Regarding your problems with the single GPU: the GPU memory shouldn't be a problem here. Can you share the error message you get there? As a temporary, hacky workaround you could also try to save the FAISS index every ~1 million documents. Then you at least wouldn't need to start from scratch if an error happens so late. It could be something along these lines:
```python
...
for batch in wiki_doc_batches:
    # write the current batch, then embed only the documents that don't have embeddings yet
    dpr_document_store.write_documents(batch)
    dpr_document_store.update_embeddings(retriever, update_existing_embeddings=False)
    # checkpoint the FAISS index so a crash doesn't mean starting over from scratch
    dpr_document_store.save("/home/ubuntu/FAISS_saves/wiki_all_docs")
```
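If you want to experiment with multi-GPU encoding in the meantime, here is a rough sketch of what DataParallel over the DPR context encoder could look like using the Hugging Face model directly. This bypasses Haystack's retriever entirely, the model name and batch handling are only illustrative, and it is not necessarily how update_embeddings() will implement it:

```python
import torch
from transformers import DPRContextEncoder, DPRContextEncoderTokenizerFast

model_name = "facebook/dpr-ctx_encoder-single-nq-base"
tokenizer = DPRContextEncoderTokenizerFast.from_pretrained(model_name)
encoder = DPRContextEncoder.from_pretrained(model_name).to("cuda")
encoder = torch.nn.DataParallel(encoder)  # splits each batch across all visible GPUs
encoder.eval()

passages = ["some wiki passage", "another wiki passage"]  # one batch of document texts
with torch.no_grad():
    inputs = tokenizer(
        passages, padding=True, truncation=True, max_length=256, return_tensors="pt"
    ).to("cuda")
    # each GPU encodes a slice of the batch; results are gathered back on GPU 0
    embeddings = encoder(**inputs).pooler_output  # shape: (batch_size, 768)
```

You would then still need to write these embeddings back into the FAISS document store yourself (e.g. via the `embedding` field of the documents you pass to write_documents()), so the checkpointing loop above applies either way.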
The error I am getting is just "Killed", after around 25 hours.
It would be great if you could add DataParallel support to update_embeddings().
Let me try out the temporary workaround and see if it works.
Thanks for all your help!