Hello!
I appreciate and am inspired by your great work on extractive summarization.
I ran the script from your GitHub repository and got the following ROUGE scores:
For the Transformer model:
In the paper: ROUGE-F (1/2/L): 43.25/20.24/39.63
My best run: ROUGE-F (1/2/L): 43.04/20.19/39.48
The values above are the best scores I obtained following the instructions, whereas the paper reports an average of 3 runs, so my averaged numbers would be even lower. This is the same situation as in issue #100. For reproduction, I used a single Nvidia 1080 GPU.
But if I take the upper bound of each ROUGE score's reported range for the three checkpoints and then average those scores:
ROUGE-F1: (43.278 + 43.058 + 43.187) / 3 = 43.174
ROUGE-F2: (20.441 + 20.414 + 20.297) / 3 = 20.384
ROUGE-FL: (39.709 + 39.662 + 39.541) / 3 = 39.637
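For concreteness, here is a minimal Python sketch of that averaging (I grouped the upper-bound values per checkpoint for readability; the grouping does not affect the averages):

```python
# Minimal sketch of the averaging above. Each tuple holds the
# upper-bound (R-1, R-2, R-L) F-scores from one of the three
# test checkpoints.
checkpoint_scores = [
    (43.278, 20.441, 39.709),
    (43.058, 20.414, 39.662),
    (43.187, 20.297, 39.541),
]

# Average each metric across the three checkpoints.
r1, r2, rl = (sum(vals) / len(checkpoint_scores)
              for vals in zip(*checkpoint_scores))
print(f"ROUGE-F(1/2/L): {r1:.3f}/{r2:.3f}/{rl:.3f}")
# -> ROUGE-F(1/2/L): 43.174/20.384/39.637
```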
This averaged score seems more acceptable than the earlier result of 43.04/20.19/39.48, which came from a single model checkpoint without averaging. It is also an average of 3 and is close to the score in the paper.
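For reference, by "range" I mean the 95% confidence interval that the ROUGE-1.5.5 script prints for each Average_F score. Here is a rough sketch of how the upper bounds can be read off a log (the excerpt and its numbers below are illustrative, not my actual output):

```python
import re

# Illustrative excerpt of ROUGE-1.5.5 output for one checkpoint;
# the numbers here are made up for the example.
rouge_log = """
1 ROUGE-1 Average_F: 0.43040 (95%-conf.int. 0.42801 - 0.43278)
1 ROUGE-2 Average_F: 0.20190 (95%-conf.int. 0.19942 - 0.20441)
1 ROUGE-L Average_F: 0.39480 (95%-conf.int. 0.39251 - 0.39709)
"""

# Grab the metric name, the mean F-score, and the interval bounds.
pattern = re.compile(
    r"ROUGE-(\S+) Average_F: ([\d.]+) "
    r"\(95%-conf\.int\. ([\d.]+) - ([\d.]+)\)"
)
for metric, mean, lower, upper in pattern.findall(rouge_log):
    print(f"ROUGE-{metric}: mean={float(mean):.5f}, "
          f"upper bound={float(upper):.5f}")
```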
So I have several questions:
Q1: Can the training settings in the README reproduce the score from the paper? If not, could you share the settings needed to reproduce a better score?
Q2: Could you share the three model checkpoints you selected for the testing phase?
Q3: Should the score be calculated as I described above, i.e., using the upper bound of each ROUGE score's range?
I hope to hear back from you!
Thanks