Unexpected output from the search #3705

Jae0Kang · 2024-07-24T08:01:56Z

Summary

Platform

OS: Linux

Faiss version:

Installed from:

Faiss compilation options:

Running on:

CPU
GPU

Interface:

C++
Python

Reproduction instructions

I'm conducting an experiment using FAISS to compare the performance of pre-filtering and post-filtering with the HNSW index. However, I'm encountering unexpected results during post-filtering.

Experiment Details:

Input Size: 105,100
Dimension: 2048
Filtering Condition: 22,069 items satisfy this condition.
Expected Output: 200 specific indexes should be returned after searching.
Search Process: I start with k=200 and double it if I don't retrieve 200 results, continuing until k reaches half the input size (52,550).

Issue:
The accuracy, defined as the size of the intersection of the expected and actual results divided by the number of expected results, is only 0.1. This low accuracy persists across different tests. The search does not consistently return the desired indexes, even though they are connected in the graph structure. For instance, if index 4 is an expected answer and is connected to index 10, the search sometimes fails to return index 4, regardless of how much I increase k.
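
For readers trying to reproduce this, the search process and accuracy metric described above can be sketched roughly as follows. This is only an illustration, not the code used in the experiment: `post_filter_search`, `recall`, and the `search_fn` callable are hypothetical names, and `search_fn(k)` is assumed to return candidate ids ordered nearest first, with -1 marking an empty slot (FAISS pads missing results with -1).

```python
def recall(expected, actual):
    """Fraction of the expected ids that appear in the actual results."""
    expected = set(expected)
    return len(expected & set(actual)) / len(expected)


def post_filter_search(search_fn, allowed, want, k_max):
    """Post-filtering: search with k, drop ids that fail the filter,
    and double k until `want` ids survive or k reaches k_max.

    search_fn(k) is an assumed callable returning ids, nearest first.
    Returns the surviving ids (at most `want`) and the final k.
    """
    k = want
    while True:
        # Keep only real results (-1 means "no result") that pass the filter.
        ids = [i for i in search_fn(k) if i != -1 and i in allowed]
        if len(ids) >= want or k >= k_max:
            return ids[:want], k
        k = min(2 * k, k_max)


# Stand-in for an exact search: ids 0..99 are already sorted by distance.
ranked = list(range(100))
allowed = {i for i in ranked if i % 5 == 0}  # the filtering condition
ids, final_k = post_filter_search(lambda k: ranked[:k], allowed, want=4, k_max=50)
```

With a real index, `search_fn` could wrap `index.search(xq, k)`, which returns a distance array and a label array, and the setup in this post would correspond to `want=200` and `k_max=52550`. If recall stays low even with an exact stand-in like the one above, the filter or the expected set is the more likely culprit than the index.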

Question:
Why might the search function be failing to return the expected results, even when the nodes are connected in the graph? Any advice on how to improve the accuracy of my experiment would be appreciated.

rafayaar · 2024-09-19T03:33:35Z

@Jae0Kang

In HNSW, nodes are connected to their neighbors based on proximity. Even though index 4 may be connected to index 10, the graph traversal can miss certain connections because the search is local: it only expands the most promising candidates. This happens especially in large graphs with high dimensionality (2048 dimensions is considered high), where shortcuts between clusters or dense regions can be missed. Increasing k helps, but it does not guarantee that all expected results will be found.
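
To make the point about local search concrete, here is a toy, single-layer version of the kind of best-first search HNSW performs. It is a simplified sketch, not FAISS's implementation: the graph, the distances, and the `beam_search` helper are all made up for illustration, and `ef` plays the role of the candidate list size.

```python
import heapq


def beam_search(graph, dist, entry, ef):
    """Best-first graph search that keeps at most `ef` results.

    graph: node -> list of neighbor nodes
    dist:  node -> distance to the query (precomputed for this toy)
    """
    visited = {entry}
    candidates = [(dist[entry], entry)]  # min-heap: closest unexpanded node first
    results = [(-dist[entry], entry)]    # max-heap via negation: worst result on top
    while candidates:
        d_c, c = heapq.heappop(candidates)
        if d_c > -results[0][0]:
            break  # closest candidate is worse than every kept result
        for n in graph[c]:
            if n in visited:
                continue
            visited.add(n)
            # Keep n only if there is room or it beats the current worst result.
            if len(results) < ef or dist[n] < -results[0][0]:
                heapq.heappush(candidates, (dist[n], n))
                heapq.heappush(results, (-dist[n], n))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the worst result
    return [n for _, n in sorted((-d, n) for d, n in results)]


# T is the true nearest neighbor and is connected to C, but B is a dead end
# that looks better than C from the entry point A.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "T"], "T": ["C"]}
dist = {"A": 10.0, "B": 5.0, "C": 8.0, "T": 1.0}

narrow = beam_search(graph, dist, "A", ef=1)  # C is pruned, so T is never reached
wide = beam_search(graph, dist, "A", ef=2)    # C is kept and expanded, so T is found
```

In this toy graph T is reachable, yet the narrow search never expands C and therefore never sees T. A wider candidate list keeps C alive long enough to follow the edge to T.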

I suggest increasing efSearch. A higher efSearch value lets the search explore more neighbors, potentially improving recall.
I would also suggest rechecking how you filter the search results.
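
One way to try the efSearch suggestion is to sweep a few values and watch the recall. The helper below is a minimal sketch under some assumptions: `sweep_ef_search` is a hypothetical name, the index is assumed to expose `hnsw.efSearch` and `search(xq, k)` the way the FAISS Python HNSW indexes such as `faiss.IndexHNSWFlat` do, and a single query is assumed.

```python
def sweep_ef_search(index, xq, k, expected_ids, ef_values):
    """Return {efSearch: recall} for one query.

    Assumes `index.hnsw.efSearch` is writable and `index.search(xq, k)`
    returns (distances, labels), with -1 labels marking empty slots.
    """
    expected = set(expected_ids)
    recalls = {}
    for ef in ef_values:
        index.hnsw.efSearch = ef  # widen (or narrow) the candidate list
        _, labels = index.search(xq, k)
        found = {int(i) for i in labels[0] if i != -1}
        recalls[ef] = len(found & expected) / len(expected)
    return recalls
```

A call such as `sweep_ef_search(index, xq, 200, expected_ids, [16, 64, 256, 1024])` (values chosen arbitrarily here) shows whether recall improves as efSearch grows. If it stays flat, the filtering step or the expected set is worth rechecking first. Recent FAISS releases also appear to accept per-query settings, including an ID selector for pre-filtering, through `faiss.SearchParametersHNSW`; check that your installed version provides it before relying on it.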

I hope this helps!
