Dev data results. #12
Regarding the trend on the different generalization splits, one thing I have observed is that it can be sensitive to the number of training epochs, i.e., different numbers of training epochs can lead to similar overall performance while the trend across splits differs. Actually, I think this is quite an intriguing problem to look into; you can easily boost performance by properly ensembling models trained for different numbers of epochs, and finding a principled way to do this may provide valuable insights to the community. The past weeks have been quite hectic. I am also investigating some other possibilities here, such as changes I made during code cleaning (see my response in #10).
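For concreteness, here is a minimal sketch of the kind of checkpoint ensembling I have in mind: a simple majority vote over the logical forms predicted by models trained for different numbers of epochs. The file names and the prediction layout (a list of `{"qid": ..., "logical_form": ...}` records) are just assumptions for illustration, not this repo's actual output format.

```python
import json
from collections import Counter

def load_predictions(path):
    """Map qid -> predicted logical form from one checkpoint's prediction file."""
    with open(path) as f:
        return {str(r["qid"]): r["logical_form"] for r in json.load(f)}

# Hypothetical prediction dumps from models trained for 1, 2, and 3 epochs.
checkpoint_preds = [load_predictions(p) for p in
                    ("preds_epoch1.json", "preds_epoch2.json", "preds_epoch3.json")]

ensembled = {}
for qid in checkpoint_preds[0]:
    votes = Counter(preds.get(qid) for preds in checkpoint_preds)
    # Majority vote over checkpoints; ties resolve to the earliest checkpoint's answer.
    ensembled[qid] = votes.most_common(1)[0][0]

with open("preds_ensembled.json", "w") as f:
    json.dump([{"qid": q, "logical_form": lf} for q, lf in ensembled.items()], f, indent=2)
```

A principled version would presumably weight checkpoints by dev performance or operate on model scores rather than hard votes, but even this naive vote illustrates the idea.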
@entslscheia Thank you for the response.
I was also trying to reproduce the results and would appreciate more information about your training setup. Could you share some details about the specific model and parameters, such as the number of GPUs and the total number of epochs? Also, how was evaluation performed on the dev set? (Was it the direct output with the …?) I recently trained a BERT-base model for 1 epoch on GrailQA, but the metrics it reported look strange to me, and I'm having trouble identifying the problem.
@AiRyunn Yes, this happened to me as well. The metrics reported by the train module are very different from the actual numbers. I don't know the exact reason for this, but you can use the predict module to get the actual numbers (the ones I reported above).
@AiRyunn @PrayushiFaldu This is because we assume perfect entity linking during training.
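Concretely, the metrics printed during training are computed under gold entity links, so the numbers to trust are the ones recomputed from the predict module's output against the dev file. Below is a rough sketch of that sanity check; the file name `grailqa_v1.0_dev.json`, the prediction layout (`{"qid": ..., "logical_form": ...}` records), and plain string match on the gold `s_expression` field are my assumptions and may differ from the official evaluator.

```python
import json

# Gold dev file from the GrailQA release (assumed name) and an assumed
# prediction dump with {"qid": ..., "logical_form": ...} records.
with open("grailqa_v1.0_dev.json") as f:
    gold = {str(ex["qid"]): ex["s_expression"] for ex in json.load(f)}
with open("dev_predictions.json") as f:
    pred = {str(r["qid"]): r["logical_form"] for r in json.load(f)}

# Naive EM: exact string match between predicted and gold s-expressions.
hits = sum(pred.get(qid) == lf for qid, lf in gold.items())
print(f"EM: {100.0 * hits / len(gold):.2f}")
```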
I trained the model on GrailQA and evaluated on the dev data. Here are the EM results:
Overall : 75.9
IID : 81.8
Comp : 68.82
Zshot : 76.29
Though the overall number matches the one reported in the paper, the IID, Comp, and Zshot numbers seem to follow a different trend than on the test data. Can you please confirm whether the above numbers are correct?
Also, can you please elaborate on how "grail_combined_tiara.json" was created?
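For reference, this is roughly how I break EM down by generalization level, using the `level` field in the dev file (values like `i.i.d.`, `compositional`, `zero-shot`, as I understand the released JSON); the prediction-file layout is again an assumption, and exact string match stands in for the official EM.

```python
import json
from collections import defaultdict

# Same assumed files as above: the released dev set plus a prediction dump
# with {"qid": ..., "logical_form": ...} records.
with open("grailqa_v1.0_dev.json") as f:
    dev = json.load(f)
with open("dev_predictions.json") as f:
    pred = {str(r["qid"]): r["logical_form"] for r in json.load(f)}

hits, totals = defaultdict(int), defaultdict(int)
for ex in dev:
    level = ex["level"]            # e.g. "i.i.d.", "compositional", "zero-shot"
    for key in (level, "overall"):
        totals[key] += 1
        if pred.get(str(ex["qid"])) == ex["s_expression"]:
            hits[key] += 1

for split, total in totals.items():
    print(f"{split}: {100.0 * hits[split] / total:.2f}")
```

If this bookkeeping matches what you do on your side, then the per-split discrepancy is not a scoring artifact on mine.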