Hi @qiweizhen,

I have a question about your evaluation.

From your paper: "In the inference process, we randomly sample 10 Gaussian noises for iteration denoising, and use the highest score as the final generated result." I also checked your file https://github.com/microsoft/ProphetNet/blob/master/GENIE/integration/eval_split.py.

For each source sentence, you generate 10 hypotheses. You then compute the ROUGE score between each hypothesis and the target sentence, and take the hypothesis with the best score as the final generation. You do this for every source sentence and combine all of the best-scoring hypotheses into the final generation file.

My question is: is this a fair or standard way to generate? At inference time the target sentences are blind; we cannot use them as a hint for generation.
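To make the setup concrete, here is a minimal sketch of that selection step, assuming the `rouge-score` package and ROUGE-L F1 as the score; the function and variable names are hypothetical and this is not the actual code from eval_split.py:

```python
# Minimal sketch of the oracle selection described above (hypothetical names;
# not the actual code from eval_split.py).
from rouge_score import rouge_scorer

def oracle_best_hypothesis(hypotheses, target):
    """Return the sampled hypothesis with the highest ROUGE-L F1 vs. the target.

    Note that this consults the reference ("target") to choose among samples,
    which is exactly the point being questioned here.
    """
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    best_hyp, best_f1 = None, -1.0
    for hyp in hypotheses:
        f1 = scorer.score(target, hyp)["rougeL"].fmeasure
        if f1 > best_f1:
            best_hyp, best_f1 = hyp, f1
    return best_hyp

# One oracle-selected hypothesis per source sentence is then written out to
# form the final generation file, e.g.:
# generations = [oracle_best_hypothesis(hyps, tgt)
#                for hyps, tgt in zip(all_hypotheses, all_targets)]
```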
The results in the main table are indeed not entirely fair to compare, which we also mention in Section 4.5. Strictly speaking, there is currently no fully fair and rigorous way to compare AR and diffusion models. However, these experiments can reflect the potential of diffusion models to reach generation quality comparable to AR models, and they also reflect general trends.

In fact, we recognize this problem and propose a fairer evaluation method in the paper: using an LLM to evaluate 10 samples generated by the AR model and 10 samples generated by GENIE. As the results in Table 4 and Table 5 show, the overall quality of the diffusion model is slightly lower than that of the AR model, but the diffusion model can generate more diverse samples, which is also very important in practical applications of text generation.
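As an illustrative aside (this is not the LLM-based evaluation from the paper), a simple reference-free proxy for the diversity claim is distinct-n computed over the 10 samples per source; a minimal sketch, with hypothetical names:

```python
# Hedged sketch: distinct-n as a simple, reference-free diversity proxy over a
# set of samples. This is NOT the paper's LLM-based evaluation, only an
# illustrative automatic measure.
def distinct_n(samples, n=2):
    """Fraction of unique n-grams across all samples (higher = more diverse)."""
    ngrams = []
    for s in samples:
        tokens = s.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

# e.g., compare the 10 AR samples against the 10 GENIE samples for one source:
# distinct_n(ar_samples, n=2) vs. distinct_n(genie_samples, n=2)
```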