
perf: improve PQ computing distances #3150

Merged: 7 commits merged on Nov 22, 2024


@BubbleCal (Contributor) commented Nov 21, 2024

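This is done by making the size of the distance-table slice known to the compiler at compile time, so that it can optimize the table lookups (for example, by removing bounds checks).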

5242880,L2,PQ=96,DIM=1536
                        time:   [148.44 ms 149.47 ms 150.50 ms]
                        change: [-53.716% -53.486% -53.252%] (p = 0.00 < 0.10)
                        Performance has improved.

5242880,Cosine,PQ=96,DIM=1536
                        time:   [191.84 ms 192.21 ms 192.75 ms]
                        change: [-46.738% -46.621% -46.461%] (p = 0.00 < 0.10)
                        Performance has improved.
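A minimal sketch of the idea, assuming 8-bit PQ (256 centroids per sub-vector) and the transposed code layout described in the diff further down (`code[i * num_vectors + j]` is sub-vector `i` of vector `j`). The function names `l2_dynamic` and `l2_fixed` are hypothetical, not lance's. The only difference between them is that `l2_fixed` turns each sub-table into a `&[f32; 256]`, so the length is part of the type; since a `u8` index is always below 256, the compiler can typically drop the per-lookup bounds check.

```rust
use std::convert::TryInto;

/// 8-bit PQ: 256 centroids per sub-vector, so one `u8` code indexes a sub-table.
const NUM_CENTROIDS: usize = 256;

/// Baseline: the sub-table length is only known at runtime, so every
/// `sub_table[c as usize]` keeps its bounds check.
fn l2_dynamic(table: &[f32], code: &[u8], num_sub_vectors: usize, num_centroids: usize) -> Vec<f32> {
    let num_vectors = code.len() / num_sub_vectors;
    let mut distances = vec![0.0_f32; num_vectors];
    for i in 0..num_sub_vectors {
        let sub_table = &table[i * num_centroids..(i + 1) * num_centroids];
        let sub_codes = &code[i * num_vectors..(i + 1) * num_vectors];
        for (d, &c) in distances.iter_mut().zip(sub_codes) {
            *d += sub_table[c as usize];
        }
    }
    distances
}

/// Same computation, but each sub-table is a `&[f32; NUM_CENTROIDS]`.
fn l2_fixed(table: &[f32], code: &[u8], num_sub_vectors: usize) -> Vec<f32> {
    let num_vectors = code.len() / num_sub_vectors;
    let mut distances = vec![0.0_f32; num_vectors];
    if num_vectors == 0 {
        return distances; // chunks_exact(0) would panic
    }
    let table_chunks = table.chunks_exact(NUM_CENTROIDS);
    let code_chunks = code.chunks_exact(num_vectors);
    for (sub_table, sub_codes) in table_chunks.zip(code_chunks) {
        // The length is now part of the type (panics if a chunk is not 256 long).
        let sub_table: &[f32; NUM_CENTROIDS] = sub_table.try_into().unwrap();
        for (d, &c) in distances.iter_mut().zip(sub_codes) {
            // `c as usize` is always < 256 == NUM_CENTROIDS, so this cannot go out of range.
            *d += sub_table[c as usize];
        }
    }
    distances
}

fn main() {
    // 2 sub-vectors, 3 vectors; table[k] = k so the sums are easy to check by hand.
    let table: Vec<f32> = (0..2 * NUM_CENTROIDS).map(|k| k as f32).collect();
    // Transposed: codes of sub-vector 0 for all vectors, then sub-vector 1.
    let code: Vec<u8> = vec![0, 1, 255, 2, 3, 4];
    let dynamic = l2_dynamic(&table, &code, 2, NUM_CENTROIDS);
    let fixed = l2_fixed(&table, &code, 2);
    assert_eq!(dynamic, fixed);
    println!("{:?}", fixed); // [258.0, 260.0, 515.0]
}
```

Both functions add the terms in the same order, so their outputs are bit-identical; whether the bounds check actually disappears depends on the compiler version and optimization level.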

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@codecov-commenter commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.94%. Comparing base (1d3b204) to head (ca32b66).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3150      +/-   ##
==========================================
- Coverage   77.95%   77.94%   -0.01%     
==========================================
  Files         242      242              
  Lines       81904    81910       +6     
  Branches    81904    81910       +6     
==========================================
- Hits        63848    63846       -2     
- Misses      14890    14892       +2     
- Partials     3166     3172       +6     
Flag      | Coverage Δ
unittests | 77.94% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@BubbleCal marked this pull request as ready for review on November 21, 2024 at 04:05
FixedSizeListArray::try_new_from_values(codebook, DIM as i32).unwrap(),
DistanceType::Cosine,
);
c.bench_function(
Contributor:

Can we just loop over the distance types instead, e.g. for dt in [DistanceType::L2, DistanceType::Cosine, DistanceType::Dot]?

Contributor Author:

done
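The reviewer's suggestion, sketched without lance or Criterion so it runs on its own. `DistanceType` here is a hypothetical stand-in enum, and `bench_names` only builds the benchmark IDs (in the same `{TOTAL},{type},PQ={PQ},DIM={DIM}` shape as the names in the PR description) at the point where the real bench would register one `bench_function` per type.

```rust
use std::fmt;

/// Hypothetical stand-in for lance's `DistanceType`, so this sketch compiles without lance.
#[derive(Clone, Copy, Debug)]
enum DistanceType {
    L2,
    Cosine,
    Dot,
}

impl fmt::Display for DistanceType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            DistanceType::L2 => "L2",
            DistanceType::Cosine => "Cosine",
            DistanceType::Dot => "Dot",
        };
        write!(f, "{}", name)
    }
}

const PQ: usize = 96;
const DIM: usize = 1536;
const TOTAL: usize = 5 * 1024 * 1024;

/// Builds one benchmark name per distance type in a single loop,
/// instead of one copy-pasted block per type.
fn bench_names() -> Vec<String> {
    let mut names = Vec::new();
    for dt in [DistanceType::L2, DistanceType::Cosine, DistanceType::Dot] {
        // In the real bench, `c.bench_function(name.as_str(), |b| { ... })` would go here,
        // with `dt` passed to the quantizer instead of a hard-coded distance type.
        names.push(format!("{},{},PQ={},DIM={}", TOTAL, dt, PQ, DIM));
    }
    names
}

fn main() {
    for name in bench_names() {
        println!("{}", name);
    }
}
```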


const PQ: usize = 96;
const DIM: usize = 1536;
const TOTAL: usize = 5 * 1024 * 1024;
Contributor:

Let's use more realistic numbers, e.g. 8K to 16K, so that we can also measure the other pieces (table construction and code transpose) more realistically.

Contributor Author:

Makes sense, done.
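Rough arithmetic behind the reviewer's point, assuming "8K to 16K" refers to the number of rows (`TOTAL`) and counting only per-dimension distance terms and table lookups (memory traffic and SIMD are ignored, so treat the ratios as orders of magnitude). Building the 8-bit distance table costs 256 × DIM terms per query regardless of row count, while the scan costs rows × PQ lookups; the function names are illustrative only.

```rust
const PQ: usize = 96; // number of sub-vectors
const DIM: usize = 1536; // vector dimension
const NUM_CENTROIDS: usize = 256; // 8-bit PQ

/// Per-dimension distance terms needed to build the query's distance table:
/// PQ sub-vectors x NUM_CENTROIDS centroids x (DIM / PQ) dimensions = NUM_CENTROIDS * DIM.
fn table_construction_terms() -> usize {
    PQ * NUM_CENTROIDS * (DIM / PQ)
}

/// Table lookups needed to score `rows` vectors: one per sub-vector per row.
fn scan_lookups(rows: usize) -> usize {
    rows * PQ
}

fn main() {
    for rows in [5 * 1024 * 1024, 16 * 1024, 8 * 1024] {
        let table = table_construction_terms();
        let scan = scan_lookups(rows);
        println!(
            "rows={:>8} table={} scan={} table/scan={:.4}",
            rows, table, scan, table as f64 / scan as f64
        );
    }
}
```

At 5,242,880 rows the table costs 1/1280 of the scan, so the benchmark barely sees it; at 16K rows the ratio is 0.25 and at 8K it is 0.5, which is why smaller, more realistic sizes make table construction (and the code transpose, which also touches rows × PQ bytes) show up in the measurement.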

);
c.bench_function(
format!("{},L2,4bitPQ={},DIM={}", TOTAL, PQ, DIM).as_str(),
|b| {
Contributor:

Very curious how this number compares to the 8-bit one.

Contributor Author:

It's slightly slower than 8-bit.

@BubbleCal requested a review from eddyxu on November 21, 2024 at 05:17
@@ -80,9 +80,11 @@ pub(super) fn compute_l2_distance(
// so code[i * num_vectors + j] is the code of i-th sub-vector of the j-th vector.
let num_vectors = code.len() / num_sub_vectors;
let mut distances = vec![0.0_f32; num_vectors];
let num_centroids = 2_usize.pow(num_bits);
// num_bits must be 8 on this path, so NUM_CENTROIDS can be a compile-time constant
const NUM_CENTROIDS: usize = 2_usize.pow(8);
@chebbyChefNEQ (Contributor), Nov 21, 2024:

nit: this would seem cleaner if we refactored compute_l2_distance into a trait ProductQuantizedL2Distance<const BITS: usize>. We could probably also avoid the branching in the PQ distance calculator this way. Just a ticket for now is fine.

Contributor Author:

Yeah, I was thinking of this approach as well, but the 4-bit PQ implementation differs from the 8-bit one, so there would still be two methods.
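A sketch of how the suggested trait could look while keeping two bodies, as the reply says. `ProductQuantizedL2Distance` and `PqL2` are hypothetical here, not lance's API, and the 4-bit packing (row-major, low nibble = sub-vector 2k, high nibble = sub-vector 2k + 1) is an assumption made only to have a concrete second implementation. The bit width is picked by the type parameter at compile time, so callers need no runtime branch on `num_bits`.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical trait in the spirit of the review suggestion; not lance's API.
/// `table[i * (1 << BITS) + c]` is the distance from query sub-vector `i` to centroid `c`.
trait ProductQuantizedL2Distance<const BITS: usize> {
    fn compute_l2_distance(table: &[f32], code: &[u8], num_sub_vectors: usize) -> Vec<f32>;
}

struct PqL2;

/// 8-bit: transposed layout, `code[i * num_vectors + j]` is sub-vector `i` of vector `j`.
impl ProductQuantizedL2Distance<8> for PqL2 {
    fn compute_l2_distance(table: &[f32], code: &[u8], num_sub_vectors: usize) -> Vec<f32> {
        let num_vectors = code.len() / num_sub_vectors;
        let mut distances = vec![0.0_f32; num_vectors];
        if num_vectors == 0 {
            return distances;
        }
        for (sub_table, sub_codes) in table.chunks_exact(256).zip(code.chunks_exact(num_vectors)) {
            let sub_table: &[f32; 256] = sub_table.try_into().unwrap();
            for (d, &c) in distances.iter_mut().zip(sub_codes) {
                *d += sub_table[c as usize];
            }
        }
        distances
    }
}

/// 4-bit: assumed row-major layout, one byte per pair of sub-vectors,
/// low nibble = sub-vector 2k, high nibble = sub-vector 2k + 1.
impl ProductQuantizedL2Distance<4> for PqL2 {
    fn compute_l2_distance(table: &[f32], code: &[u8], num_sub_vectors: usize) -> Vec<f32> {
        assert!(num_sub_vectors >= 2 && num_sub_vectors % 2 == 0);
        // 16 centroids per sub-vector; each nibble (0..=15) indexes a `&[f32; 16]`.
        let tables: Vec<&[f32; 16]> = table
            .chunks_exact(16)
            .map(|t| <&[f32; 16]>::try_from(t).unwrap())
            .collect();
        code.chunks_exact(num_sub_vectors / 2)
            .map(|bytes| {
                bytes
                    .iter()
                    .enumerate()
                    .map(|(k, &b)| tables[2 * k][(b & 0x0F) as usize] + tables[2 * k + 1][(b >> 4) as usize])
                    .sum::<f32>()
            })
            .collect()
    }
}

fn main() {
    let table8: Vec<f32> = (0..512).map(|k| k as f32).collect();
    let d8 = <PqL2 as ProductQuantizedL2Distance<8>>::compute_l2_distance(&table8, &[0, 1, 255, 2, 3, 4], 2);
    let table4: Vec<f32> = (0..32).map(|k| k as f32).collect();
    let d4 = <PqL2 as ProductQuantizedL2Distance<4>>::compute_l2_distance(&table4, &[0x21, 0xF0], 2);
    println!("{:?} {:?}", d8, d4);
}
```

A single generic `impl<const BITS: usize>` could not size the array as `[f32; 1 << BITS]` on stable Rust (that needs the unstable `generic_const_exprs` feature), which is one more reason each bit width ends up with its own body.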

@chebbyChefNEQ (Contributor):

Another quick question: would SQ benefit from the same optimization? Let's try it.

@BubbleCal merged commit d79e870 into lancedb:main on Nov 22, 2024
26 checks passed
@BubbleCal (Contributor Author):

> another qq: would SQ benefit from the same optimization? Let's try it?

No, distance computation for SQ doesn't have the same problem.
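One possible reading of this answer, assuming a simple SQ scheme in which each dimension is stored as a `u8` and decoded as `min + code * scale` (an assumption for illustration, not necessarily lance's exact SQ math; `sq_l2_distances` is a hypothetical name). In 8-bit PQ the code is an index into a per-sub-vector table, which is where the slice length matters; here the code is only used as a number, so there is no table slice whose size the compiler would need to know.

```rust
/// Hypothetical scalar-quantization layout: one `u8` code per dimension,
/// decoded as `min + code * scale` (an assumption, not lance's exact scheme).
fn sq_l2_distances(query: &[f32], codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    assert!(!query.is_empty(), "query must have at least one dimension");
    codes
        .chunks_exact(query.len())
        .map(|vector_codes| {
            query
                .iter()
                .zip(vector_codes)
                .map(|(&q, &c)| {
                    // The code is decoded arithmetically instead of indexing a table,
                    // so there is no lookup whose bounds check could be removed.
                    let x = min + c as f32 * scale;
                    (q - x) * (q - x)
                })
                .sum::<f32>()
        })
        .collect()
}

fn main() {
    let d = sq_l2_distances(&[1.0, 2.0], &[1, 2, 3, 4], 0.0, 1.0);
    println!("{:?}", d);
}
```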

4 participants