MS MARCO Document Retrieval Replication #137
Weird that this is showing such large differences. Were you able to isolate which PR could have caused this? I'll look into it in a bit, although it might take a while with some paper deadlines coming up. @richard3983 or @MXueguang, perhaps one of you can check this out, since you added this functionality.
The results seem inconsistent. Here is my Colab link: https://colab.research.google.com/drive/1g_cmhSnPUZmatu1eQezMLgRBZzLV71LR?usp=sharing
I think this might be related to #132? It seems the weights aren't properly loaded here either; I'm not sure whether that's expected. From the Colab link above:
EDIT: The last replication was done on commit e815051, which used …
Reverting would probably work, but we should not revert, because the T5 implementation has since been "fixed" in hgf (Hugging Face Transformers); earlier there were more inconsistencies. The weird bit is that our MS MARCO Passage results are consistent. Is it the case that the tokenization differences have somehow not been fixed for Document Ranking, @MXueguang?
I am replicating on different commits to determine whether the issue comes from the 4.0.0 upgrade, or otherwise to locate the commit where the issue starts. The T5 model has been fixed, and based on huggingface/transformers#8933 and huggingface/transformers#8518, that warning should not be a big issue. If the issue is not the 4.0.0 upgrade, then, since the Passage results never changed, the inconsistencies may come from the document tokenization steps in our implementation. I think we first need to locate the exact commit that causes this issue. I reverted to the commit of the 4.0.0 upgrade and it has the issue; now I am running on the commit right before the upgrade to see whether that one is good.
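Searching commit by commit for where the regression starts is a binary search over history; `git bisect` automates the same idea (`git bisect start`, `git bisect bad`, `git bisect good <commit>`, `git bisect run <script>`). A minimal Python sketch, assuming a hypothetical `is_good(commit)` callback that re-runs the document re-ranking replication at that commit and checks the score against the documented value, and assuming the commit list is ordered oldest first, with the first commit good and the last one bad:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first commit whose replication fails.

    Assumes commits are ordered oldest first, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):  # hypothetical: one full replication run per probe
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical history: c0..c2 replicate correctly, c3 onward do not.
history = ["c0", "c1", "c2", "c3", "c4", "c5"]
print(first_bad_commit(history, lambda c: c in {"c0", "c1", "c2"}))
```

Since each probe costs a full replication run, this needs roughly log2(n) runs instead of n.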
The inconsistency seems to be caused by the transformers upgrade. With transformers v4:
With transformers v2:
So it seems to be caused upstream? @ronakice
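To quantify how far two runs (for example, transformers v2 vs. v4) diverge, one option is to compare per-query reciprocal ranks and list the queries that moved. A simplified sketch, assuming runs and qrels are already loaded into plain dicts (run: query ID to ranked doc IDs; qrels: query ID to relevant doc IDs) and using MRR@100, the MS MARCO document ranking metric; this is not the official evaluation script, and the example data is hypothetical:

```python
from typing import Dict, List, Set, Tuple

Run = Dict[str, List[str]]    # query id -> ranked doc ids (best first)
Qrels = Dict[str, Set[str]]   # query id -> relevant doc ids


def reciprocal_ranks(run: Run, qrels: Qrels, k: int = 100) -> Dict[str, float]:
    """Reciprocal rank of the first relevant doc in the top k (0.0 if none)."""
    rr = {}
    for qid, relevant in qrels.items():
        rr[qid] = 0.0
        for rank, docid in enumerate(run.get(qid, [])[:k], start=1):
            if docid in relevant:
                rr[qid] = 1.0 / rank
                break
    return rr


def compare_runs(run_a: Run, run_b: Run, qrels: Qrels, k: int = 100) -> Tuple[float, float, List[str]]:
    """Return MRR@k for both runs and the sorted query ids whose RR changed."""
    rr_a = reciprocal_ranks(run_a, qrels, k)
    rr_b = reciprocal_ranks(run_b, qrels, k)
    mrr_a = sum(rr_a.values()) / len(rr_a)
    mrr_b = sum(rr_b.values()) / len(rr_b)
    changed = sorted(q for q in qrels if rr_a[q] != rr_b[q])
    return mrr_a, mrr_b, changed


# Hypothetical toy data, not real MS MARCO results.
qrels = {"q1": {"d1"}, "q2": {"d5"}}
run_v2 = {"q1": ["d1", "d2"], "q2": ["d4", "d5"]}
run_v4 = {"q1": ["d2", "d1"], "q2": ["d4", "d5"]}
print(compare_runs(run_v2, run_v4, qrels))  # (0.75, 0.5, ['q1'])
```

Inspecting the changed queries (for example, their document lengths) can help tell a tokenization difference apart from a model-loading difference.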
My replication results are identical to @KaiSun314's.
@MXueguang yes, let me try to reproduce the results one more time, though.
Thanks @Dahlia-Chehata! @KaiSun314, it would be great if you could now make a separate PR saying you've matched the new results :)
Yep, the PR has been created.
Re-Ranking with monoT5
Some values are not exactly the same as documented.
No issues found in replication.
First Half
Second Half