Skip to content
This repository was archived by the owner on Nov 3, 2023. It is now read-only.

[RAG] Handle nan scores. #3818

Merged
merged 1 commit into from
Jul 20, 2021
Merged

[RAG] Handle nan scores. #3818

merged 1 commit into from
Jul 20, 2021

Conversation

klshuster
Copy link
Contributor

Patch description
Handles a bug caught in #3806; if the index building goes wrong, there's a possibility that the context/document scores are nan's; this seems to occur if there's an issue with the clustering of the vectors (see linked FAISS issue in #3806)

Testing steps
Reproduced the error with steps in #3806; verified that, with this fix, the error is no longer an issue.

'\n[ Document scores are NaN; please look into the built index. ]\n'
'[ If using a compressed index, try building an exact index: ]\n'
'[ $ python index_dense_embeddings --indexer-type exact... ]'
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the error message also mention something about replacing some dummy numbers for score?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think it's necessary

'[ If using a compressed index, try building an exact index: ]\n'
'[ $ python index_dense_embeddings --indexer-type exact... ]'
)
scores.fill_(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why 1 is the error value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i picked an arbitrary number so that the scores play nicely with the rest of the system

the scores are literally a tensor of nans:

ipdb> scores
tensor([[nan, nan, nan, nan, nan]], device='cuda:0', dtype=torch.float16)

so needed to pick something that worked

@klshuster
Copy link
Contributor Author

I'm pretty sure the failing tests are unrelated to this PR; they are currently being hammered out elsewhere, so will merge as is

@klshuster klshuster merged commit 2fc0731 into master Jul 20, 2021
@klshuster klshuster deleted the rag_fix_nan branch July 20, 2021 19:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants