-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: faster IVF & PQ #328
refactor: faster IVF & PQ #328
Conversation
crates/service/src/utils/cells.rs
Outdated
@@ -23,4 +23,8 @@ impl<T: ?Sized> SyncUnsafeCell<T> { | |||
pub fn get_mut(&mut self) -> &mut T { | |||
self.value.get_mut() | |||
} | |||
|
|||
pub fn get_ref(&self) -> &T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not expose this function.
if dis * dis < weight[j] { | ||
weight[j] = dis * dis; | ||
unsafe { | ||
(&mut *lowerbound.get())[(j, i)] = dis; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's absolutely unsound. Please use a structure that is like Square<Atomic<F32>>
.
if o.compare_exchange(next, i, Release, Relaxed).is_ok() { | ||
break; | ||
unsafe { | ||
(&mut *idx.get())[i as usize] = result.1 as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsound. Use idx: Vec<AtomicUsize>
instead.
The multi-threaded parts are guaranteed to have no data races in logic. Do I still have to remove all these unsafe blocks? |
013bf2d
to
7fef203
Compare
Performance BenchmarkDataset: gist-960-euclidean-l2, n = 1,000,000, d = 960
* Indices with 'opt' are optimized indices in this PR, num_threads in index build stage is set to 96 |
PTAL @usamoi |
Thanks. Can you fix the lint error in CI? Also what's the PQ ratio in your benchmark? |
Why does num_threads=96 seem to have little acceleration? Is it due to the kmeans computation not paralleled? How did you configure the num_threads? |
The CI error is cause by the unused struct SyncUnsafeCell which is not written by me. I cannot decide whether to remove it. The PQ ratio is x4. |
It seems to me that a lot of time is spent on I/O so the speed up for computation cannot contribute much to overall performance. |
Doesn't make sense. 1M 960-dim vectors should be easy to fit in the memory. Also can you try benchmarking with higher PQ ratio for the results? |
You can just remove unused code. |
The index build time includes time to read the whole dataset from disk and time to save the whole index to disk, so it's not about whether vectors can fit in memory. I will try for higher PQ ratio. |
It still doesn't make sense. Normal disk can achieve >500MB/s throughput for sequential write or read. 1M 960dim float is less than 4GB in space, which only accounts for less than 10s to read or write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can you add some comments in the Kmeans and ivf part? Give a simple explanation on each part. And feel free to turn this PR in to ready mode, so we can start review
Update Benchmark
|
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
fix PQ training for IVF residuals Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
fix PQ training for IVF residuals Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
PTAL @usamoi |
let width = self.dims.div_ceil(self.ratio); | ||
let s = i as usize * width as usize; | ||
let e = (i + 1) as usize * width as usize; | ||
&self.codes[s..e] | ||
} | ||
|
||
pub fn set_codes(&mut self, codes: MmapArray<u8>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not expose it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, PTAL
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
…ecto.rs into refactor/faster-ivf-pq
Please fix the CI check |
The IVFPQ uses table lookup for distance computation in the search stage now. Currently only supports L2 distance. Will add support for IP distance next. Cosine distance will remain using the original distance computation method. The following table is the performance benchmark.
|
During IVFPQ search with by_residual, we compute
At search time, the tables for term 2 and term 3 are added up. This is faster when the length of the lists is > ksub * M. ref: faiss |
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
…ecto.rs into refactor/faster-ivf-pq
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Is it ready for merging? |
It is ready for merging now |
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Signed-off-by: whateveraname <12011319@mail.sustech.edu.cn>
Work done
TODO