Dev data results. #12
Regarding the trend on the different generalization splits, one thing I have observed is that it can be sensitive to the number of training epochs, i.e., different numbers of training epochs can lead to similar overall performance while the trend across splits differs. Actually, I think this is quite an intriguing problem to look into; you can easily boost performance by properly ensembling models trained for different numbers of epochs, and finding a principled way to do this may provide valuable insights to the community. The past weeks have been quite hectic. I am also investigating some other possibilities here, such as changes I made during code cleaning (see my response in #10).
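For concreteness, here is a minimal sketch of the kind of checkpoint ensembling I have in mind: a simple majority vote over the logical forms predicted by models trained for different numbers of epochs. The file names and the prediction layout (a list of `{"qid": ..., "logical_form": ...}` records) are just assumptions for illustration, not this repo's actual output format.

```python
import json
from collections import Counter

def load_predictions(path):
    """Map qid -> predicted logical form from one checkpoint's prediction file."""
    with open(path) as f:
        return {str(r["qid"]): r["logical_form"] for r in json.load(f)}

# Hypothetical prediction dumps from models trained for 1, 2, and 3 epochs.
checkpoint_preds = [load_predictions(p) for p in
                    ("preds_epoch1.json", "preds_epoch2.json", "preds_epoch3.json")]

ensembled = {}
for qid in checkpoint_preds[0]:
    votes = Counter(preds.get(qid) for preds in checkpoint_preds)
    # Majority vote over checkpoints; ties resolve to the earliest checkpoint's answer.
    ensembled[qid] = votes.most_common(1)[0][0]

with open("preds_ensembled.json", "w") as f:
    json.dump([{"qid": q, "logical_form": lf} for q, lf in ensembled.items()], f, indent=2)
```

A principled version would presumably weight checkpoints by dev performance or operate on model scores rather than hard votes, but even this naive vote illustrates the idea.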
@entslscheia Thank you for the response.
I was also trying to reproduce the results and would appreciate more information about your training setup. Could you share some details about the specific model and parameters, such as the number of GPUs and the total number of epochs? Also, how was evaluation performed on the dev set? (Was it the direct output with the …?) I recently trained a BERT-base model for 1 epoch on GrailQA, but the metrics it reported look strange to me, and I'm having trouble identifying the problem.
@AiRyunn Yes, this happened to me as well. The metrics reported by the train module are very different from the actual numbers. I don't know the exact reason for this, but you can use the predict module to get the actual numbers (the ones I reported above).
@AiRyunn @PrayushiFaldu This is because we assume perfect entity linking during training.
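Concretely, the metrics printed during training are computed under gold entity links, so the numbers to trust are the ones recomputed from the predict module's output against the dev file. Below is a rough sketch of that sanity check; the file name `grailqa_v1.0_dev.json`, the prediction layout (`{"qid": ..., "logical_form": ...}` records), and plain string match on the gold `s_expression` field are my assumptions and may differ from the official evaluator.

```python
import json

# Gold dev file from the GrailQA release (assumed name) and an assumed
# prediction dump with {"qid": ..., "logical_form": ...} records.
with open("grailqa_v1.0_dev.json") as f:
    gold = {str(ex["qid"]): ex["s_expression"] for ex in json.load(f)}
with open("dev_predictions.json") as f:
    pred = {str(r["qid"]): r["logical_form"] for r in json.load(f)}

# Naive EM: exact string match between predicted and gold s-expressions.
hits = sum(pred.get(qid) == lf for qid, lf in gold.items())
print(f"EM: {100.0 * hits / len(gold):.2f}")
```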
I trained the model on GrailQA and evaluated on the dev data. Here are the EM results:
Overall : 75.9
IID : 81.8
Comp : 68.82
Zshot : 76.29
Though the overall number matches the one reported in the paper, the IID, Comp, and Zshot numbers seem to follow a different trend than on the test data. Can you please confirm whether the above numbers are correct?
Also, can you please elaborate on how "grail_combined_tiara.json" was created?
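For reference, this is roughly how I break EM down by generalization level, using the `level` field in the dev file (values like `i.i.d.`, `compositional`, `zero-shot`, as I understand the released JSON); the prediction-file layout is again an assumption, and exact string match stands in for the official EM.

```python
import json
from collections import defaultdict

# Same assumed files as above: the released dev set plus a prediction dump
# with {"qid": ..., "logical_form": ...} records.
with open("grailqa_v1.0_dev.json") as f:
    dev = json.load(f)
with open("dev_predictions.json") as f:
    pred = {str(r["qid"]): r["logical_form"] for r in json.load(f)}

hits, totals = defaultdict(int), defaultdict(int)
for ex in dev:
    level = ex["level"]            # e.g. "i.i.d.", "compositional", "zero-shot"
    for key in (level, "overall"):
        totals[key] += 1
        if pred.get(str(ex["qid"])) == ex["s_expression"]:
            hits[key] += 1

for split, total in totals.items():
    print(f"{split}: {100.0 * hits[split] / total:.2f}")
```

If this bookkeeping matches what you do on your side, then the per-split discrepancy is not a scoring artifact on mine.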