Repeated Question in Arabic MMLU #1596

Yazeed7 · 2024-03-18T10:18:07Z

I was checking the English-to-Arabic translation of MMLU (ammlu) and noticed that a question about detecting IP address spoofing was repeated 1,568 times in the Arabic dataset. Each occurrence replaces the original question but keeps the original ground-truth answer.

Interestingly, the question is phrased slightly differently each time. Below is a sample with frequency counts (all five variants translate roughly to "How is IP address spoofing detected?"):

count, question
  309, كيف يتم اكتشاف تزوير عنوان IP؟
  217, كيف يتم اكتشاف تزييف عنوان IP؟
   95, كيف يتم الكشف عن تزوير عنوان IP؟
   85, كيف يتم الكشف عن تزييف عنوان IP؟
   66, كيف يتم كشف تزوير عنوان IP؟

This does not happen in the original English dataset, where even the most frequent questions appear only a handful of times:

count, question
    8, Which of the following is true?
    6, Which of the following statements is false?
    6, Which of the following statements is correct?
    4, Which of the following statements is true?
    4, Which of the following statements is correct?
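Frequency tables like the two above can be produced with a simple exact-string count. The sketch below is only an illustration: it assumes the questions of one split are already loaded as a list of strings, and the helper names `top_questions` and `print_table` are hypothetical, not taken from either dataset or from LM Evaluation Harness.

```python
from collections import Counter


def top_questions(questions, n=5):
    """Return the n most frequent question strings as (question, count) pairs."""
    # Exact string match: any difference in wording or whitespace counts separately.
    return Counter(questions).most_common(n)


def print_table(rows):
    # Mirror the "count, question" layout used in this issue.
    print("count, question")
    for question, count in rows:
        print(f"{count:>5}, {question}")


if __name__ == "__main__":
    # Hypothetical toy input; real input would be the "question" column of a split.
    sample = [
        "كيف يتم اكتشاف تزوير عنوان IP؟",
        "كيف يتم اكتشاف تزوير عنوان IP؟",
        "كيف يتم اكتشاف تزييف عنوان IP؟",
        "Which of the following is true?",
    ]
    print_table(top_questions(sample))
```

Because the count is on exact strings, near-duplicates such as the five Arabic phrasings above are counted separately; grouping them needs a looser match, such as a substring check.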

My guess is that this is an artifact of the GPT translation, where the model may have been prompted with the IP spoofing question as an example.

I found that 1,568 of the 14,042 questions contain a substring related to IP from the list ['IP', 'بروتوكول الإنترنت', 'آي بي'] ("IP", "Internet Protocol", and "IP" transliterated into Arabic script).
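The substring check can be sketched as follows, assuming the questions are available as a list of Python strings. `IP_MARKERS` and `count_ip_questions` are hypothetical names; the three markers are the ones listed above. This is a plain, case-sensitive substring match, so "IP" would also match inside unrelated Latin tokens such as "ZIP", which could slightly inflate the count. On the figures reported here, 1,568 / 14,042 is about 11% of the questions.

```python
# The three markers from this issue: "IP", "Internet Protocol" in Arabic,
# and "IP" transliterated into Arabic script.
IP_MARKERS = ["IP", "بروتوكول الإنترنت", "آي بي"]


def count_ip_questions(questions, markers=IP_MARKERS):
    """Count questions containing at least one marker substring (case-sensitive)."""
    return sum(1 for q in questions if any(m in q for m in markers))


def ip_share(questions, markers=IP_MARKERS):
    """Fraction of questions that contain a marker; 0.0 for an empty list."""
    if not questions:
        return 0.0
    return count_ip_questions(questions, markers) / len(questions)
```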

I found two datasets on Hugging Face:

Hennara/ammlu: the dataset used in LM Evaluation Harness. It claims to have been translated by GPT-4.
FreedomIntelligence/MMLU_Arabic: claims to have been translated by GPT-3.5-Turbo.

However, both datasets seem identical.
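One way to check that observation is to compare the two datasets as multisets of rows. This sketch assumes each row is a dict with `question`, `choices`, and `answer` fields (the field names are an assumption, not confirmed for either dataset) and that both datasets have already been loaded into lists; `row_key` and `compare_datasets` are hypothetical helpers.

```python
from collections import Counter


def row_key(row):
    # Hashable fingerprint of one MMLU-style row (field names are assumed).
    return (row["question"], tuple(row["choices"]), row["answer"])


def compare_datasets(rows_a, rows_b):
    """Return (identical, n_only_in_a, n_only_in_b), treating rows as multisets."""
    a = Counter(row_key(r) for r in rows_a)
    b = Counter(row_key(r) for r in rows_b)
    # Counter subtraction keeps only positive counts, i.e. the surplus rows.
    only_a = a - b
    only_b = b - a
    identical = not only_a and not only_b
    return identical, sum(only_a.values()), sum(only_b.values())
```

Comparing multisets rather than ordered lists means two copies that differ only in row order still count as identical.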


haileyschoelkopf · 2024-03-18T11:57:05Z

@khalil-Hennara do you have any insights here?

It may be worth removing the task in light of this.

Yazeed7 mentioned this issue Mar 18, 2024

Repeated Question in Arabic MMLU FreedomIntelligence/AceGPT#14 (Open)

haileyschoelkopf mentioned this issue Jun 11, 2024

Remove AMMLU Due to Translation #1948 (Merged)

haileyschoelkopf closed this as completed Jun 11, 2024

This was referenced Jun 18, 2024

Added ArabicMMLU #1986 (Closed)

Added ArabicMMLU #1987 (Merged)
