Add batched implementation of mauve #1
Conversation
Hi @wade3han, thanks for the fantastic work! The implementation looks correct to me from a quick glance. Would it be possible for you to make a couple of changes before we merge it in?
Thank you so much for your contribution -- we really appreciate it!
I addressed your comments! It was an honor to contribute to this wonderful work 😄
Hi @wade3han, it would be nice to verify that the features from your batched implementation are close to the features from the original implementation. If the features are close, the actual MAUVE value will of course be close as well. Here is a concrete example of how to do that. Could you cover a case like this in your test?

```python
import numpy as np

p_features_original = ...  # Original code without batching, shape = (n, d)
p_features_batched = ...   # Your new code with batching, shape = (n, d)

norm_of_difference = np.linalg.norm(p_features_original - p_features_batched, axis=1)  # shape = (n,)

# ensure that the new features are close to the old features
assert norm_of_difference.max() < 1e-5 * np.linalg.norm(p_features_original, axis=1).max()

# repeat the same for q_features
```

Thanks!
I misunderstood what you asked for at first! I tried to check whether the features are close. However, I found a slight difference between the features, and I am now trying to find the reason. So far, this is what I have figured out: the GPT output differs when the batch size changes. I also reported this phenomenon on HuggingFace (huggingface/transformers#14743). There was a similar issue before, but it has not been resolved yet (huggingface/transformers#2401). Any ideas? Thanks 🙏
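For reference, here is a minimal sketch of how the effect can be observed with GPT-2 via `transformers`. The model name, texts, and the choice of the last token's hidden state as the feature are illustrative assumptions, not necessarily MAUVE's exact featurization:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModel.from_pretrained("gpt2").eval()

texts = [
    "MAUVE measures the gap between model text and human text.",
    "Batched featurization should match the unbatched features.",
]

with torch.no_grad():
    # Unbatched: one example at a time, no padding involved
    feats_single = []
    for t in texts:
        enc = tokenizer(t, return_tensors="pt")
        hidden = model(**enc).last_hidden_state   # (1, seq_len, d)
        feats_single.append(hidden[0, -1])        # last token's hidden state
    feats_single = torch.stack(feats_single)

    # Batched: examples padded to a common length in one forward pass
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    hidden = model(**enc).last_hidden_state       # (batch, seq_len, d)
    last_idx = enc["attention_mask"].sum(dim=1) - 1
    feats_batched = hidden[torch.arange(len(texts)), last_idx]

print((feats_single - feats_batched).abs().max())  # typically small but not exactly zero
```

With right padding and an attention mask the two paths are equivalent in exact arithmetic, so any remaining gap comes from how the tensor operations are batched, i.e. floating point rounding.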
Hi @wade3han, thanks for the great detective work! This might just be a floating point error, which can be quite large for
Based on these two, we can decide whether the difference you see is purely due to numerical errors or if there is some other underlying bug. I am happy to accept the pull request if the problem is purely due to numerical errors. Thanks again for your fantastic work!
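One possible way to formalize "purely a numerical error" is to compare the observed difference against the rounding error one would expect at the features' precision. This is only a sketch, assuming the features are float32; the slack factor is a hypothetical threshold, not something from the discussion above:

```python
import numpy as np

def consistent_with_float32_error(a: np.ndarray, b: np.ndarray, slack: float = 1e3) -> bool:
    """Heuristic check: True if the largest elementwise difference between a and b
    is no more than a small multiple of float32 machine epsilon, relative to the
    scale of the values."""
    eps = np.finfo(np.float32).eps  # ~1.19e-07
    scale = max(np.abs(a).max(), np.abs(b).max(), 1.0)
    return float(np.abs(a - b).max()) <= slack * eps * scale

# e.g. consistent_with_float32_error(p_features_original, p_features_batched)
```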
Sorry for the late response!
Hi @wade3han, Thank you for the wonderful detective work! Two minor (and last) changes before we merge the changes in:
Thanks again -- we really appreciate your work!
I addressed your feedback :) Thanks!
Merged! Thank you for the contribution!
Here I have written a simplified version of the batched MAUVE implementation; there are still a few places where the code could be cleaned up.
If you think this implementation is accurate, it would be nice to replace the original MAUVE implementation with the batched version.
I tested it with the sample data and it gives a MAUVE score of 0.991679398536574.
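For context, a minimal usage sketch of how the batched path could be exercised end to end. The `batch_size` argument reflects what this PR adds, and the texts are illustrative placeholders; exact parameter names should be checked against the merged signature of `compute_mauve`:

```python
import mauve

# Illustrative text sets; in practice these would be model generations and human references.
p_text = ["the quick brown fox jumps over the lazy dog"] * 100   # machine text
q_text = ["a slow black cat sleeps under the warm afternoon sun"] * 100  # human text

# batch_size > 1 exercises the batched featurization introduced in this PR
# (parameter name assumed from the PR description; verify against the merged code).
out = mauve.compute_mauve(
    p_text=p_text,
    q_text=q_text,
    max_text_length=256,
    batch_size=8,
    verbose=False,
)
print(out.mauve)  # MAUVE score in [0, 1]
```

On the repository's sample data, the description above reports a score of about 0.9917 with the batched implementation.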