I'm currently using the `enwiki-latest-pages-articles.xml.bz2` Wikimedia dump. I've tried several methods of updating the embeddings for the entire dump, but still no luck. After removing all redirects I'm left with around 6,311,807 documents, and after splitting them with a sliding window (`split_by="word"`, `split_length=512`, `split_overlap=258`) the total comes to 11,507,338 documents.

The problem is that every time I run `document_store.update_embeddings` with the `DensePassageRetriever`, it gets stuck at 512000/11507338 and then the process is killed with an error.
Might relate to #1318.
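For reference, here is a minimal sketch of the preprocessing step described above, not my exact script: it assumes Haystack's `PreProcessor` with 1.x import paths, and `docs` is a placeholder for the ~6.3M non-redirect articles already loaded as Haystack documents.

```python
# Sketch only: import path is for Haystack 1.x (in 0.x the class lives under
# haystack.preprocessor); `docs` is a hypothetical list of article documents.
from haystack.nodes import PreProcessor

preprocessor = PreProcessor(
    split_by="word",
    split_length=512,
    split_overlap=258,
    # Depending on the Haystack version, sentence-boundary splitting may need
    # to be disabled when split_overlap > 0.
    split_respect_sentence_boundary=False,
)

split_docs = preprocessor.process(docs)
print(f"{len(split_docs)} passages after sliding-window splitting")
```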
As suggested in other issues, I've tried lowering the `batch_size`, but still no luck.

Are there any alternative ways to run `update_embeddings` on data of this size?
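For context, this is roughly the embedding step in question, as a hedged sketch rather than my actual code: the post doesn't say which document store is used, so `FAISSDocumentStore` is an assumption, the model names are Haystack's DPR defaults, and the batch sizes are illustrative values for limiting memory use.

```python
# Minimal sketch, assuming a FAISSDocumentStore and Haystack 1.x imports.
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import DensePassageRetriever

document_store = FAISSDocumentStore(faiss_index_factory_str="Flat")  # assumed store

retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    batch_size=16,   # smaller encoder batch to limit GPU memory
    use_gpu=True,
)

# Embedding in smaller chunks and skipping documents that already have an
# embedding lets a killed run be restarted without recomputing everything.
document_store.update_embeddings(
    retriever,
    batch_size=10_000,                 # documents fetched and embedded per chunk
    update_existing_embeddings=False,  # on restart, only embed docs still missing vectors
)
```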