-
Notifications
You must be signed in to change notification settings - Fork 15.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the cosine_similarity_top_k function performance #8151
Optimize the cosine_similarity_top_k function performance #8151
Conversation
The mypy isn't able to correctly detect the return type, the types should still be compatible. How should I fix the mypy warming? |
Manually verified that the types work
@misrasaurabh1 is attempting to deploy a commit to the LangChain Team on Vercel. A member of the Team first needs to authorize it. |
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
scores = score_array.flatten()[top_idxs].tolist() | ||
return ret_idxs, scores | ||
score_array[score_array < score_threshold] = 0 | ||
top_k = min(top_k or len(score_array), np.count_nonzero(score_array)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will np.count_nonzero be strictly less than len(score_array)? in which case latter isn't needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that top_k can also be None
, so the or len(score_array)
is such that the other np.count_nonzero(score_array)
always wins min
. A None
in min
leads to an exception.
When top_k
is not None
, the len
is just ignored.
thanks @misrasaurabh1! |
Optimizing important numerical code and making it run faster. Performance went up by 1.48x (148%). Runtime went down from 138715us to 56020us Optimization explanation: The `cosine_similarity_top_k` function is where we made the most significant optimizations. Instead of sorting the entire score_array which needs considering all elements, `np.argpartition` is utilized to find the top_k largest scores indices, this operation has a time complexity of O(n), higher performance than sorting. Remember, `np.argpartition` doesn't guarantee the order of the values. So we need to use argsort() to get the indices that would sort our top-k values after partitioning, which is much more efficient because it only sorts the top-K elements, not the entire array. Then to get the row and column indices of sorted top_k scores in the original score array, we use `np.unravel_index`. This operation is more efficient and cleaner than a list comprehension. The code has been tested for correctness by running the following snippet on both the original function and the optimized function and averaged over 5 times. ``` def test_cosine_similarity_top_k_large_matrices(): X = np.random.rand(1000, 1000) Y = np.random.rand(1000, 1000) top_k = 100 score_threshold = 0.5 gc.disable() counter = time.perf_counter_ns() return_value = cosine_similarity_top_k(X, Y, top_k, score_threshold) duration = time.perf_counter_ns() - counter gc.enable() ``` @hwaking @hwchase17 @jerwelborn Unit tests pass, I also generated more regression tests which all passed.
Optimizing important numerical code and making it run faster. Performance went up by 1.48x (148%). Runtime went down from 138715us to 56020us Optimization explanation: The `cosine_similarity_top_k` function is where we made the most significant optimizations. Instead of sorting the entire score_array which needs considering all elements, `np.argpartition` is utilized to find the top_k largest scores indices, this operation has a time complexity of O(n), higher performance than sorting. Remember, `np.argpartition` doesn't guarantee the order of the values. So we need to use argsort() to get the indices that would sort our top-k values after partitioning, which is much more efficient because it only sorts the top-K elements, not the entire array. Then to get the row and column indices of sorted top_k scores in the original score array, we use `np.unravel_index`. This operation is more efficient and cleaner than a list comprehension. The code has been tested for correctness by running the following snippet on both the original function and the optimized function and averaged over 5 times. ``` def test_cosine_similarity_top_k_large_matrices(): X = np.random.rand(1000, 1000) Y = np.random.rand(1000, 1000) top_k = 100 score_threshold = 0.5 gc.disable() counter = time.perf_counter_ns() return_value = cosine_similarity_top_k(X, Y, top_k, score_threshold) duration = time.perf_counter_ns() - counter gc.enable() ``` @hwaking @hwchase17 @jerwelborn Unit tests pass, I also generated more regression tests which all passed.
Optimizing important numerical code and making it run faster.
Performance went up by 1.48x (148%). Runtime went down from 138715us to 56020us
Optimization explanation:
The
cosine_similarity_top_k
function is where we made the most significant optimizations.Instead of sorting the entire score_array which needs considering all elements,
np.argpartition
is utilized to find the top_k largest scores indices, this operation has a time complexity of O(n), higher performance than sorting. Remember,np.argpartition
doesn't guarantee the order of the values. So we need to use argsort() to get the indices that would sort our top-k values after partitioning, which is much more efficient because it only sorts the top-K elements, not the entire array. Then to get the row and column indices of sorted top_k scores in the original score array, we usenp.unravel_index
. This operation is more efficient and cleaner than a list comprehension.The code has been tested for correctness by running the following snippet on both the original function and the optimized function and averaged over 5 times.
@hwaking @hwchase17 @jerwelborn
Unit tests pass, I also generated more regression tests which all passed.