Fix PABEE & PL CI failure #6433
Codecov Report
@@ Coverage Diff @@
## master #6433 +/- ##
==========================================
- Coverage 79.89% 77.37% -2.52%
==========================================
Files 153 153
Lines 27902 27902
==========================================
- Hits 22291 21588 -703
- Misses 5611 6314 +703
LGTM!
Ah, the PL test is still failing! |
@LysandreJik Don't worry. It seems I mistyped a parameter name. |
Oops! @sshleifer could you have a look at the PL example? I've tried tweaking the parameters but it doesn't seem to work. |
@stas00 Can you take a look? @sshleifer is on vacation. Many thanks! |
Yes, of course, I will be able to investigate in a few hours. |
(we are talking about I'm able to reproduce the problem of low
The original (pre-PR) version gives acc/f1=1.0 on my machine. If you have a look at #6034, I tried various hparams to no avail: it was working fine on my machine, but CI kept failing. It was very counterproductive trying to experiment without being able to reproduce the failure locally, so after some time I gave up. So the test is not ideal, but at least it checks that the example runs. @sshleifer said he was able to match the CI's low accuracy on his hardware (before this PR). |
Thank you for explaining what is happening, @JetRunner. I have no permission to push, so try to use this:
I get acc/f1 of 1.0 with this config, the key was more
So you uncovered that these tests are very unreliable: they don't clean up after themselves, so re-runs give invalid results. Once a single run has succeeded, all subsequent re-runs of the test will currently succeed too. At the very least pl_glue needs to support
That explains why I couldn't get CI to work: my local setup probably wasn't working all along, it just succeeded once and then kept reporting the old success. So I was getting false positives.
Should transformers warn the user when a pre-existing dir filled with outdated data is found, or plainly refuse to run? |
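To make the "warn or refuse to run" question concrete, here is a minimal sketch of the "refuse" option. The helper name `check_output_dir` and its `overwrite` parameter are hypothetical, made up for illustration; this is not an existing transformers or pl_glue API, just one way such a guard could look.

```python
import os
import shutil
import tempfile


def check_output_dir(output_dir, overwrite=False):
    """Hypothetical guard: refuse to reuse a non-empty output dir.

    If `output_dir` already holds files (e.g. results of an earlier,
    successful run), raise instead of silently reusing them. With
    `overwrite=True` the stale contents are deleted first.
    """
    if os.path.isdir(output_dir) and os.listdir(output_dir):
        if not overwrite:
            raise ValueError(
                f"Output directory {output_dir!r} exists and is not empty; "
                "pass overwrite=True to discard its contents."
            )
        shutil.rmtree(output_dir)  # drop stale checkpoints and results
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    # In a test, a fresh temporary directory avoids the problem entirely:
    # it is created empty and removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        out = check_output_dir(os.path.join(tmp_dir, "run"))
        print("writing results to", out)
```

For the tests themselves, running each one in its own temporary directory (as in the `__main__` part) is probably the simpler fix, since a re-run can then never pick up an old success.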
@stas00 this perm also outputs |
PABEE's bug is fixed in #6453. The reproducible low accuracy still exists for PL. |
#6421