
MS MARCO Document Retrieval Replication #137

Merged (2 commits) on Jan 27, 2021

Conversation

Dahlia-Chehata
Contributor

Re-Ranking with monoT5

Some values are not exactly the same as documented, but no other issues were found in the replication.

First Half

precision@1 0.24
recall@3    0.44
recall@50   0.84
recall@1000 0.88
mrr         0.39115
mrr@10      0.38205

Second Half

precision@1 0.24
recall@3    0.36
recall@50   0.76
recall@1000 0.88
mrr         0.32717
mrr@10      0.30778
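For reference, a minimal sketch of how an mrr@10 figure like the ones above is computed (hypothetical doc ids; the actual values come from pygaggle's evaluate_document_ranker over the dev queries):

```python
def mrr_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant doc within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Average over queries: each entry is (ranked doc ids, relevant doc ids).
runs = [(['D2', 'D7', 'D1'], {'D1'}),
        (['D5', 'D3'], {'D3'})]
print(sum(mrr_at_k(r, rel) for r, rel in runs) / len(runs))  # ~0.4167
```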

@ronakice
Member

ronakice commented Jan 5, 2021

Weird that this is showing such large differences. Were you able to isolate which PR could have caused this? I'll look into it in a bit, although it might take a while with some paper deadlines around. @richard3983 or @MXueguang, perhaps one of you can check this out since you added this functionality.

@MXueguang
Member

The results seem inconsistent. Here is my Colab link:

https://colab.research.google.com/drive/1g_cmhSnPUZmatu1eQezMLgRBZzLV71LR?usp=sharing

First Half

2021-01-07 20:08:00 [INFO] evaluate_document_ranker: precision@1 0.2
2021-01-07 20:08:00 [INFO] evaluate_document_ranker: recall@3    0.56
2021-01-07 20:08:00 [INFO] evaluate_document_ranker: recall@50   0.84
2021-01-07 20:08:00 [INFO] evaluate_document_ranker: recall@1000 0.88
2021-01-07 20:08:00 [INFO] evaluate_document_ranker: mrr         0.38882
2021-01-07 20:08:00 [INFO] evaluate_document_ranker: mrr@10      0.38271
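For context, the step this log evaluates reranks candidates with pygaggle's MonoT5 wrapper. A minimal sketch of that API with a made-up query and documents (the real run reranks first-stage candidates from the MS MARCO doc index):

```python
from pygaggle.rerank.base import Query, Text
from pygaggle.rerank.transformer import MonoT5

# Defaults to the castorini/monot5-base-msmarco checkpoint.
reranker = MonoT5()

query = Query('who proposed the geocentric theory')
texts = [  # doc ids and contents are made up for illustration
    Text('Ptolemy proposed the geocentric model of the universe.',
         {'docid': 'D1'}, 0),
    Text('Copernicus argued for a heliocentric model.',
         {'docid': 'D2'}, 0),
]

for result in reranker.rerank(query, texts):
    print(result.metadata['docid'], result.score)
```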

@yuxuan-ji
Contributor

yuxuan-ji commented Jan 8, 2021

I think this might be related to #132? It seems like the weights aren't properly loaded here either. Not sure if that's expected.

From the colab link above:

[screenshot from the Colab: weight-initialization warning when loading the model]

EDIT:
This happens even when I manually import the model (not using MonoT5.get_model) on transformers-4.0.0
[screenshot: the same warning when importing the model manually on transformers-4.0.0]

EDIT:
From looking at an old replication I did, it seems like this didn't happen before:
[screenshot: log from the earlier replication, without the warning]

EDIT:
Seems like it's not an issue?
huggingface/transformers#8933

EDIT:
From what I understand from huggingface/transformers#8518, it seems like it could cause very small differences in outputs.

The last replication was done on commit e815051, which used transformers==2.10.0, so I'd check whether reverting transformers fixes this.
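A minimal sketch of reproducing that warning by loading the checkpoint directly, bypassing MonoT5.get_model (per huggingface/transformers#8933 the message should be benign for T5's tied weights):

```python
from transformers import T5ForConditionalGeneration

# Under transformers 4.x this emits the weight-initialization warning
# shown in the screenshots above; per huggingface/transformers#8933
# it is expected for T5 checkpoints.
model = T5ForConditionalGeneration.from_pretrained(
    'castorini/monot5-base-msmarco')
```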

@ronakice
Member

ronakice commented Jan 8, 2021

Reverting probably will, but we should not revert, because the T5 implementation has since been "fixed" in Hugging Face (earlier there were more inconsistencies). The weird bit here is that our MS MARCO Passage results are consistent; is it the case that the tokenization differences have somehow not been fixed for Document Ranking, @MXueguang?
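One way to check would be to compare the token ids the two transformers versions produce for the same monoT5 input. A sketch of the 4.x side (the query and segment strings are placeholders; the template is monoT5's standard one):

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('t5-base')

# monoT5 scores each (query, text) pair through this template; for the
# document task the text is a document segment rather than a passage.
query = 'who proposed the geocentric theory'     # placeholder
segment = 'Ptolemy proposed the geocentric ...'  # placeholder
ids = tokenizer.encode(f'Query: {query} Document: {segment} Relevant:',
                       truncation=True, max_length=512)
print(ids[:10])  # compare against the ids produced under transformers 2.x
```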

@MXueguang
Member

I am replicating on different commits; I need to confirm whether the issue comes from the 4.0.0 upgrade, or else locate the commit where this issue starts.

The T5 model is fixed, and based on huggingface/transformers#8933 and huggingface/transformers#8518, that warning should not be a big issue. If the issue is not the 4.0.0 upgrade, then since the Passage results never changed, the inconsistencies may come from the tokenization steps for documents in our implementation.

I think we need to locate the exact commit that causes this issue first. I reverted back to the commit of the 4.0.0 upgrade and it has the issue; now I am running on the commit right before the upgrade to see if that one is good.

@MXueguang
Member

MXueguang commented Jan 10, 2021

The inconsistency seems to be caused by the transformers upgrade.
I compared the first 10 passages' next_token_logits from the decoder (for the first MS MARCO doc):

with transformers v4:

tensor([[-29.3127, -18.0408, -19.5026,  ..., -49.6962, -49.7317, -49.7931],
        [-29.6830, -18.3973, -19.9881,  ..., -49.9085, -49.9592, -50.0153],
        [-30.4825, -18.3005, -20.1569,  ..., -51.0060, -51.0359, -51.1103],
        ...,
        [-64.3875, -35.6707, -40.9387,  ..., -84.8546, -84.7865, -85.1493],
        [-65.1458, -36.1620, -41.0997,  ..., -85.3089, -85.2801, -85.6151],
        [-54.9698, -31.8845, -34.6949,  ..., -77.4517, -77.4881, -77.6714]],
       device='cuda:0')

with transformers v2:

tensor([[-29.3127, -18.0408, -19.5026,  ..., -49.6963, -49.7317, -49.7931],
        [-29.6830, -18.3973, -19.9881,  ..., -49.9085, -49.9592, -50.0154],
        [-30.4825, -18.3005, -20.1569,  ..., -51.0060, -51.0359, -51.1103],
        ...,
        [-64.3875, -35.6707, -40.9387,  ..., -84.8546, -84.7865, -85.1493],
        [-65.1458, -36.1620, -41.0997,  ..., -85.3089, -85.2801, -85.6150],
        [-54.9698, -31.8845, -34.6948,  ..., -77.4517, -77.4880, -77.6713]],
       device='cuda:0')

Seems to be caused by upstream? @ronakice
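For anyone reproducing the comparison, a sketch of pulling those next_token_logits out for a single pair (placeholder input text; the v2 vs v4 API difference is noted inline):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained('castorini/monot5-base-msmarco')
model = T5ForConditionalGeneration.from_pretrained(
    'castorini/monot5-base-msmarco').eval()

text = 'Query: ... Document: ... Relevant:'  # placeholder (query, segment) pair
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)

# One decoder step from the start token; the logits of that step are the
# next_token_logits compared above.
decoder_input_ids = torch.full(
    (1, 1), model.config.decoder_start_token_id, dtype=torch.long)
with torch.no_grad():
    outputs = model(**inputs, decoder_input_ids=decoder_input_ids)

next_token_logits = outputs.logits[:, -1, :]  # under v2: outputs[0][:, -1, :]
```

Note that the diffs in the tensors above sit in the last shown decimal place, which is consistent with a small upstream numerical change rather than a tokenization mismatch.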

@KaiSun314
Collaborator

I tried replicating PyGaggle: Baselines on MS MARCO Document Retrieval but got different results from those documented. Here is what I got:

First half:
[screenshot: Screenshot from 2021-01-13 13-06-36]

Second half:
[screenshot: Screenshot from 2021-01-14 12-02-39]

@MXueguang
Member

My replication result is identical to @KaiSun314's.
@Dahlia-Chehata, did you replicate from the current HEAD of master, and install pygaggle in a new venv?

@Dahlia-Chehata
Contributor Author

@MXueguang yes, let me try to reproduce the results one more time though.

@ronakice
Member

Thanks @Dahlia-Chehata! @KaiSun314, if you can make a separate PR now saying you've matched the new results, that would be great :)

@ronakice ronakice merged commit 46bb71d into castorini:master Jan 27, 2021
@KaiSun314
Collaborator

Yep, the PR has been created.
