problems with converting the German test set #2

Open
nvanva opened this issue Aug 9, 2024 · 1 comment
Labels: question (Further information is requested)

Comments

@nvanva
Member

nvanva commented Aug 9, 2024

Hi! We are having problems with the script that converts the German test set. We could not reproduce our results, and after a lot of debugging we found that the output of the official conversion script surprise.py depends on the version of pandas (even the number of examples is different!). To verify, run this:

for version in 1.4.4 2.2.0; do pip install pandas==$version; python surprise.py --dwug_path dwug_de_sense/; mv axolotl.test.surprise.gold.tsv pandas-$version-axolotl.test.surprise.gold.tsv; done
wc *tsv

This may indicate that the script relies on some bug-prone pandas calls. Of course, we can stick to the version from requirements.txt, but it would be better to avoid such constructions or to check that they do not lead to other undesirable effects.
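To narrow down which examples actually differ between the two outputs, something like the following might help. It is only a sketch: it assumes the output files are plain tab-separated text with one example per line, and the helper names `read_rows` and `diff_rows` are made up for illustration. It uses only the standard library so that the comparison itself does not depend on the pandas version.

```python
from collections import Counter


def read_rows(path):
    """Read a tab-separated file into a list of row tuples (header included).

    Assumption: one record per line, no quoted fields spanning lines.
    """
    with open(path, encoding="utf-8") as handle:
        return [tuple(line.rstrip("\n").split("\t")) for line in handle]


def diff_rows(path_a, path_b):
    """Return (rows only in A, rows only in B), counting duplicate rows."""
    rows_a = Counter(read_rows(path_a))
    rows_b = Counter(read_rows(path_b))
    # Counter subtraction keeps only positive counts, so a row that occurs
    # twice in A and once in B is reported once as "only in A".
    only_a = list((rows_a - rows_b).elements())
    only_b = list((rows_b - rows_a).elements())
    return only_a, only_b
```

Calling `diff_rows("pandas-1.4.4-axolotl.test.surprise.gold.tsv", "pandas-2.2.0-axolotl.test.surprise.gold.tsv")` after the loop above would then list the rows that appear in one output but not in the other, which should point to the call in surprise.py that behaves differently.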

@akutuzov
Member

akutuzov commented Aug 9, 2024

I have not looked deeply into the issue, but pandas is known for data loading behavior that changes between versions (especially between major versions, like 1 and 2).
That's why we specify the version we use in requirements.txt.
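If relying on the pinned version is the chosen approach, the script could fail early when a different version is installed. This is a rough sketch, not what surprise.py currently does: it assumes requirements.txt pins the package with `==`, and the function names `pinned_version` and `check_pin` are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def pinned_version(requirements_text, package):
    """Return the version pinned with '==' for `package`, or None if absent."""
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*==\s*([^\s;#]+)",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(requirements_text)
    return match.group(1) if match else None


def check_pin(requirements_text, package, installed=None):
    """Raise RuntimeError if the installed version differs from the pin.

    `installed` can be passed explicitly; otherwise it is looked up
    from the current environment.
    """
    expected = pinned_version(requirements_text, package)
    if expected is None:
        return  # no '==' pin found, nothing to check
    if installed is None:
        try:
            installed = version(package)
        except PackageNotFoundError as exc:
            raise RuntimeError(f"{package} is not installed") from exc
    if installed != expected:
        raise RuntimeError(
            f"{package} {installed} is installed, "
            f"but requirements.txt pins {expected}"
        )
```

For example, reading requirements.txt into a string and calling `check_pin(text, "pandas")` at the top of the script would stop a run under the wrong pandas version before any output file is written.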

akutuzov assigned akutuzov and MariaFjodorowa and unassigned akutuzov Aug 15, 2024
akutuzov added the question (Further information is requested) label Aug 15, 2024