Fix Dataparallel Metric Calculation #992
Conversation
Hello @pruksmhc! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
You can repair most issues by installing black and running:
Comment last updated at 2020-02-19 18:50:10 UTC
@pyeres Mind taking the lead on this?
…taparallel_metric_calculation (Conflicts: jiant/trainer.py, jiant/utils/utils.py)
…com/nyu-mll/jiant into fix_dataparallel_metric_calculation
Hey @pruksmhc, fyi, running ReCoRD like this:
It looks like there's an exception related to DataParallel:
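For background on the failure mode being discussed, here is a minimal, self-contained sketch (not jiant code; ToyModel and its counter are made-up names) of why state updated inside forward() misbehaves under nn.DataParallel: the wrapped module is replicated onto each GPU for every forward pass, so per-replica updates are either discarded or, for shared mutable objects such as scorers, applied concurrently from several threads.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Toy module that mutates its own state inside forward(), the pattern
    this PR removes from jiant's models."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.n_examples_seen = 0  # stands in for a task scorer

    def forward(self, x):
        # Under nn.DataParallel this runs on a per-GPU replica, not on the
        # original module, so the update below is unreliable.
        self.n_examples_seen += x.size(0)
        return self.linear(x)

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(ToyModel().cuda())
    model(torch.randn(8, 4).cuda())
    # Still 0: each replica incremented its own copy, which was then discarded.
    print(model.module.n_examples_seen)
```

If the attribute were a shared mutable scorer rather than an int, the replicas would instead update the same object from multiple threads at once, which matches the inflated metrics (values above 1.0) described in the PR summary.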
Hm, let me debug this, thanks for catching!
Alright, I tested it on ReCoRD and STS-B as well as RTE now. @pyeres, ready for another look.
If I understand these changes correctly, in Instead of adding special keys to the
Separate from the implementation details above, because multi-GPU support touches many code paths (and addresses a bug), with your final PR can you please show a performance comparison of multi-GPU vs. single-GPU on some task(s)? For PR #990, @HaokunLiu ran a performance comparison across a set of common tasks; running these should make for a good comparison (what Haokun already reported in #990 vs. including your changes with a single GPU vs. including your changes with multi-GPU). If you have trouble getting cluster time, just pass me a script and I can run some of these experiments.
Makes sense!
Hm, for some reason I'm getting an accuracy of 0.56 using 1 GPU on jiant master, but 0.527 with 1 GPU on this branch. Debugging.
…com/nyu-mll/jiant into fix_dataparallel_metric_calculation
Ready! @pyeres
Please see the open comment in the conversation thread for a small change request, and please let me know when you're ready for a final look and approval.
Heads up: for CCG it looks like there's an issue with the calculation of training and validation accuracy (accuracy is stuck at zero while training and validation loss is decreasing). If it would be helpful, I'm happy to debug this further; please let me know.
These changes have been successfully benchmarked on tasks from GLUE, SuperGLUE and the benchmarks provided with PR #990 in both single-GPU and multi-GPU mode:
The benchmark tests for this branch look good. Thanks for making this work!
Thanks for running the extensive benchmark tests!
* moving update_metrics to outside scope of dataparallel
* fixing micro_avg calculation
* undo debugging
* Fixing tests, moving update_metrics out of other tasks
* remove extraneous change
* fix multiple choice dataparallel forward
* adding update_metrics abstraction
* delete update_metrics_ notation
* spelling check
* black formatting
* fixing tests
* Adding batch_size constraints to multi-GPU setting
* adding documentation
* adding batch size test
* black correct version
* Fixing batch size assertion
* generalize batch size assertion for more than 2 GPU setting
* reducing label loops in code
* fixing span forward
* Fixing span prediction forward for multi-GPU
* fix commonsenseQA forward
* adding function documentation
* resolving nits, fixing seq_gen forward
* remove nit
* fixing batch_size assert and SpanPrediction task
* Remove debugging
* Fix batch size mismatch multi-GPU test
* Fix order of assert checking for batch size mismatch
* reverse batch size checking
* fix SpanPrediction update_metrics for single-GPU
* fix update_diagnostic_metric and ccg acc

Co-authored-by: Check your git settings! <chris@chris-laptop>
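Several of the commits above add a batch-size constraint for the multi-GPU setting. As a rough illustration only (the function name and error message below are made up, not jiant's actual check), the constraint amounts to requiring that nn.DataParallel can give every visible device at least one example:

```python
import torch

def check_multi_gpu_batch_size(batch_size: int) -> None:
    """Illustrative guard: nn.DataParallel splits each batch across GPUs,
    so the batch must be at least as large as the number of devices."""
    n_gpus = torch.cuda.device_count()
    if n_gpus > 1 and batch_size < n_gpus:
        raise ValueError(
            f"batch_size={batch_size} is smaller than the number of GPUs "
            f"({n_gpus}); increase batch_size or use fewer devices."
        )
```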
Addressing #947
The problem is a race condition under data parallelism involving scorers and variable updates, which can push metrics above 1.0. To fix this, this PR moves the task.update_metrics() calls into trainer.py instead of the various forward functions in jiant/models.py.
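As a rough sketch of that restructuring (illustrative only; the real call sites and signatures in jiant/trainer.py and jiant/models.py differ), the metric update moves out of the model's forward pass and into the trainer's batch loop, so it runs exactly once per batch on the main process rather than once per DataParallel replica:

```python
# Before (simplified): metrics were updated inside forward(), which
# nn.DataParallel runs on every GPU replica.
#
#     def forward(self, batch, task):
#         out = self._compute(batch, task)
#         task.update_metrics(out, batch)   # racy under DataParallel
#         return out

# After (simplified): forward() only returns outputs; the trainer updates
# metrics once, after the per-GPU outputs have been gathered.
def training_step(model, batch, task):
    out = model(batch, task)           # no scorer mutation inside forward()
    task.update_metrics(out, batch)    # single update, on the main process
    return out["loss"]
```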