
Dev data results. #12

Open
PrayushiFaldu opened this issue Aug 20, 2023 · 5 comments

Comments

@PrayushiFaldu

I trained the model on GrailQA and tested it on the dev data. Here is the (EM) result:
Overall : 75.9
IID : 81.8
Comp : 68.82
Zshot : 76.29

Though the overall number is the same as mentioned in the paper, the IID, Comp, and Zshot numbers seem to show a different trend than on the test data. Can you please confirm whether the above numbers are correct?

Also, can you please elaborate on how "grail_combined_tiara.json" was created?

@entslscheia
Collaborator

entslscheia commented Aug 20, 2023

grail_combined_tiara.json is just the entity linking results from the TIARA paper. They provided entity linking results for the dev set and test set separately, and here I combined these two files into one for convenience.
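Combining the two TIARA files can be sketched in a few lines. This is an illustrative reconstruction, not the author's actual script: the file names and the assumption that each file is a dict keyed by question id are hypothetical.

```python
import json

def combine_entity_linking(dev_path, test_path, out_path):
    """Merge TIARA's dev and test entity-linking results into one file.

    Assumes each input file is a JSON object mapping question id ->
    linked entities, and that ids are disjoint across splits.
    """
    with open(dev_path) as f:
        dev = json.load(f)
    with open(test_path) as f:
        test = json.load(f)
    combined = {**dev, **test}  # disjoint ids, so no overwrites
    with open(out_path, "w") as f:
        json.dump(combined, f)
    return combined
```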

Regarding the trend on different generalization splits, one thing I have observed is that it can be sensitive to the number of training epochs, i.e., different numbers of training epochs can lead to similar overall performance while the trend across splits differs. Actually, I think this is quite an intriguing problem to look into; you can easily boost performance by properly ensembling models trained for different numbers of epochs, and finding a principled way to do this may provide valuable insights to the community.
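A naive version of the ensembling idea mentioned above is a majority vote over per-checkpoint predictions. This is only a sketch of the concept, not code from the repository; the data layout (one dict of question id to predicted logical form per checkpoint) is assumed for illustration.

```python
from collections import Counter

def ensemble_predictions(per_epoch_preds):
    """Majority-vote ensemble over predictions from checkpoints
    saved at different training epochs.

    per_epoch_preds: list of dicts, each mapping question id ->
    predicted logical form (one dict per checkpoint).
    """
    ensembled = {}
    all_qids = set().union(*(p.keys() for p in per_epoch_preds))
    for qid in all_qids:
        votes = Counter(p[qid] for p in per_epoch_preds if qid in p)
        ensembled[qid] = votes.most_common(1)[0][0]
    return ensembled
```

A principled scheme would weight checkpoints, e.g. by per-split dev EM, rather than voting uniformly.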

The past weeks have been quite hectic. I am also investigating some other possibilities here like changes I made during code cleaning (see my response in #10).

@PrayushiFaldu
Author

@entslscheia Thank you for the response.
I can also see literals in grail_combined_tiara.json. Does TIARA output literals as well?

@AiRyunn

AiRyunn commented Aug 22, 2023

I was also trying to reproduce the results and would appreciate more information about your training setup. Could you share some details about the specific model and parameters, such as the number of GPUs and total epochs? Also, how was evaluation performed on the dev set? (Was it the direct output of the train module?) Thank you!

I recently trained a BERT-base model with 1 epoch on grailqa, but the metrics.json file shows a higher EM than what's reported in the paper:

  "validation_EM": 0.840828402366864,
  "validation_EM_iid": 0.8548009367681498,
  "validation_EM_comp": 0.8213333333333334,
  "validation_EM_zero": 0.8423423423423423,

This looks strange to me, and I'm having trouble identifying the problem.

@PrayushiFaldu
Author

@AiRyunn Yes, this happened to me as well. The metrics reported by the train module are very different from the actual numbers. I don't know the exact reason for this, but you can use the predict module to get the actual numbers (the ones I reported above).
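For reference, overall and per-split EM from a predictions file can be computed with a few lines. This is a generic sketch of the metric, not the repo's predict module; the dict layout and split labels are assumptions.

```python
def exact_match(predictions, gold, levels):
    """Compute overall and per-generalization-level exact match.

    predictions, gold: dicts mapping question id -> logical form.
    levels: dict mapping question id -> one of
    'iid', 'compositional', 'zero-shot' (GrailQA's three levels).
    """
    hits = {"overall": [], "iid": [], "compositional": [], "zero-shot": []}
    for qid, g in gold.items():
        correct = predictions.get(qid) == g
        hits["overall"].append(correct)
        hits[levels[qid]].append(correct)
    # EM per bucket; empty buckets report 0.0
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in hits.items()}
```

Comparing these numbers against metrics.json can help isolate whether the gap comes from the entity-linking assumption mentioned below by the maintainer.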

@entslscheia
Copy link
Collaborator

@AiRyunn @PrayushiFaldu This is because we assume perfect entity linking during training.
