Fixing bug with restoring checkpoint with two gpus + cleaning CUDA parsing related code #928

pruksmhc · 2019-10-08T14:51:27Z

There was a stray map_location that doesn't work with 2 GPUs. I also did some minor cleanup of the CUDA parsing code.
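For readers skimming the thread, here is a minimal sketch of the kind of CUDA-setting parsing the cleanup touches. It is not the code in jiant/utils/options.py. The function name `parse_cuda_arg`, the `"auto"` value, and the `n_available` parameter are assumptions made for illustration. It follows the behaviours named in the commit messages: return a plain int when only one GPU is selected, default to -1 when no CUDA devices are detected, and raise an error (rather than log a warning) when a requested device is not found.

```python
from typing import List, Union


def parse_cuda_arg(cuda: Union[int, str], n_available: int) -> Union[int, List[int]]:
    """Hypothetical helper, not jiant's actual implementation.

    cuda: "auto" (assumed value), a single id such as 0 or "0",
          a comma-separated subset such as "0,1", or -1 for CPU.
    n_available: number of visible CUDA devices; in real code this could
          come from torch.cuda.device_count().
    """
    text = str(cuda).strip()
    if text == "auto":
        if n_available == 0:
            # Default to -1 (CPU) when no CUDA devices are detected.
            return -1
        devices = list(range(n_available))
    else:
        devices = [int(part) for part in text.split(",")]
        if devices == [-1]:
            # CPU was requested explicitly.
            return -1
        missing = [d for d in devices if d < 0 or d >= n_available]
        if missing:
            # Raise instead of logging a warning when a device is not found.
            raise ValueError(f"CUDA device(s) {missing} not found")
    # Return an int for a single GPU, a list for several.
    return devices[0] if len(devices) == 1 else devices
```

Passing the device count in as an argument keeps the sketch testable on a machine without GPUs; for example, `parse_cuda_arg("0,1", n_available=2)` selects both devices and `parse_cuda_arg("auto", n_available=0)` falls back to CPU.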

pep8speaks · 2019-10-08T14:56:59Z

Hello @pruksmhc! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file jiant/__main__.py:

Line 1:24: W291 trailing whitespace

In the file jiant/utils/utils.py:

Line 39:66: W291 trailing whitespace
Line 40:49: W291 trailing whitespace
Line 41:54: W291 trailing whitespace
Line 42:69: W291 trailing whitespace
Line 45:15: W291 trailing whitespace
Line 47:58: W291 trailing whitespace
Line 48:25: W291 trailing whitespace
Line 59:46: W291 trailing whitespace
Line 63:38: W291 trailing whitespace
Line 68:45: W291 trailing whitespace
Line 406:71: W291 trailing whitespace
Line 407:72: W291 trailing whitespace
Line 408:17: W291 trailing whitespace
Line 409:68: W291 trailing whitespace

You can repair most issues by installing black and running: black -l 100 ./*. If you contribute often, have a look at the 'Contributing' section of the README for instructions on doing this automatically.

Comment last updated at 2019-10-16 11:31:10 UTC

jiant/utils/utils.py

jiant/utils/options.py

W4ngatang

lgtm but left two comments

* add commonsenseqa task
* add hellaswag task
* dabug
* from #928
* add special tokens to CommensenseQA input
* format
* revert irrelevant change
* Typo fix
* delete
* rename stuff
* Update qa.py
* black

* Readme update for bert npi paper (#915) * Update README.md * minor fix * Typo fix * typo fix * Fixing index problem & minor pytorch_transformers_interface cleanup (#916) * update boundry func with offsets * update tasks that use indexes * remove outdated temporary fix * Prepare for 1.2.1 release. * QA-SRL (#716) * Initial QASRL * Updated pred writing for QASRL * Add validation shuffle to QASRL * Remove tqdm, modify class check in preds * qasrl rebase cleanup * Update QA-SRL to new repo changes * Removing src * QASRL Cleanup * updating to new model format * csv to tsv * QASRL update * Implementing Data Parallel (#873) * implemented data parallel * black style * Resolve last of merge marks * deleting irrelevant logs * adding new way to get attribute * updating to master * torch.Tensor -> torch.tensor for n_exs * black style * black style * Merge master * adapting other tasks to multiple GPU" * adding helper function for model attributes * adding get_model_attribute to main.py * deleting unecessary n_inbput for span_module * black style * revert comment change * fixing batch size keys * opt_params -> optimizer_params * Remove extraneous cahnges * changed n_exs to one-liner * adapting args.cuda to multi-GPU setting * adding use_cuda variable * Fixing parsing for case of args.cuda=subset * fixing tests * fixing nits, cleaning up parse_cuda function * additional nit * deleted extra space * Revert nit * refactoring into get_batch_size * removing use_cuda * adding options.py * removing use_cuda in tests, deleting extra changes * change cuda default * change parse_cuda_list_args import * jiant.options -> jiant.utils.options * change demo.conf cuda setting * fix bug -> make parse_cuda return int if only one gpu * fix bug * fixed tests * revert test_retokenize change * cleaning up code * adding addiitonal jiant.options * Separating cuda_device = int case with multiple cuda_device case * deleting remains of uses_cuda * remove time logging * remove use_cuda from evaluate * 
val_interval -> validation_interval * adding cuda comment to tutorial * fixed typo * replace correct_sent_indexing with non inplace version (#921) * replace correct_sent_indexing with non inplace version * Update modules.py * Update modules.py * Abductive NLI (aNLI) (#922) * anli * anli fix * Adding aNLI link, additional test/dev warning * SocialIQA (#924) * black style * adding SocialQA * black style * black style * fixed socialQA task * black style * Update citation * Nit * senteval * socialIQA naming * reverse unnecessary add * Fixing bug with restoring checkpoint with two gpus + cleaning CUDA parsing related code (#928) * black style * remove * cleaning up code around cuda-parsing * adding defaulting to -1 if there is no cuda devices detected * fixing nits, throw error instead of log warning for cuda not found * Updating CoLA inference script (#931) * Adding Senteval Tasks (#926) * black style * adding initial senteval, senteval preprocessing script * black * adding senteval to registry * fixing bigram-shift * adding label_namespace arg, fixing the ksenteval tasks * revert extra changes * black style * change name -> senteval-probing * fixing senteval-probing tasks * renamed senteval -> sentevalprobing * delete extra imports * black style * renaming files and cleaning up preprocessing code * nit * black * deleting pdb * Senteval -> SE shorthand * fixing code style * Speed up retokenization (#935) * black style * pre-loading tokenizer before retokenization function * Scitail (#943) * scitail * Scitail * Scitail * update Scitail, removed config * update Scitail, removed config * Add corrected data stastistics (#941) Thanks to #936, we've discovered errors in our data statistics reporting in the edge probing paper. This table contains the corrected values. As there is more space here, the full (unrounded) values are reported instead. This was generated by a script that read the stats.tsv file and the diff vs. 
the paper should match my comment on the issue yesterday. * CommonsenseQA+hellaswag (#942) * add commonsenseqa task * add hellaswag task * dabug * from #928 * add special tokens to CommensenseQA input * format * revert irrelevant change * Typo fix * delete * rename stuff * Update qa.py * black * fix name (#945) * CCG update (#948) * generalize ccg to other transformer models * debug * I don't know who broke this at what time, but let's just fix it here now * Fixing senteval-probing preprocessing (#951) * Copying configs from superglue * adding senteval probing config commands * adding meta-script for transfer and probing exps * Adding meta bash script fixed * give_permissions script * small fix transfer_analysis.sh (#946) model_*.th might indicate several models; fixed to model_*.best.th * lr_patience fix * target_task training -> pretrain training * adding edgeprobing configs and command * adding edge probing conf * fix load_target_train bug * add hyperparameter sweeping * val_interval change * adding sweep function * Task specific val_intervals * add reload_vocab to hyperparameter sweep * adding batch_size specification * fixing senteval-word-content * fixing senteval preprocess script * revert extra delete * remove extra files * black format * black formatting trainer.py * remove load_data() * removing extra changes * Adding tokenizer alignment function (#953) * Copying configs from superglue * adding senteval probing config commands * adding meta-script for transfer and probing exps * Adding meta bash script fixed * give_permissions script * small fix transfer_analysis.sh (#946) model_*.th might indicate several models; fixed to model_*.best.th * lr_patience fix * target_task training -> pretrain training * adding edgeprobing configs and command * adding edge probing conf * fix load_target_train bug * add hyperparameter sweeping * val_interval change * adding sweep function * Task specific val_intervals * add reload_vocab to hyperparameter sweep * adding 
batch_size specification * fixing senteval-word-content * fixing senteval preprocess script * revert extra delete * remove extra files * black format * black formatting trainer.py * remove load_data() * removing extra changes * adding alignment mapping function * fix comment nits * comment nit * adding example of token_alignment * Function words probing (#949) * add nli prob task template * Create acceptablity_probing.py * specify nli probing tasks * port acceptablity probing tasks * add directory name * debug * debug * format * black * revert unintended change * CosmosQA (#952) * misc run scripts * cosmosqa * cosmosqa * cosmosqa * cosmosqa run * cleaned up repo * cleaned up repo * reformatted * qqp fix (#956) * QAMR + QA-SRL Update (#932) * qamr * tokenization * temp qamr * qamr * QASRL * Undo slicing * quick hack to bypass bad qasrl examples * f1 em fix * tokenization fixes * average * New tokenization aligner * update example counts * Cleanup * Typography * Set _unk_id in Roberta module (#959) Currently the `_unk_id` for Roberta is not set correctly, which triggers the assertion error on line 118. * Fixing load_target_train_checkpoint with mixing setting (#960) * adding loading for mix * black style * update pytorch and numpy version requirements (#965) * CCG update (#955) * generalize ccg to other transformer models * debug * I don't know who broke this at what time, but let's just fix it here now * ccg lazy iterator * debug * clean up * debug * debug ccg, minor cleanup * add adversarial_nli tasks (#966) * Update README.md * Citation fix

…rsing related code (#928)
* black style
* remove
* cleaning up code around cuda-parsing
* adding defaulting to -1 if there is no cuda devices detected
* fixing nits, throw error instead of log warning for cuda not found


pruksmhc added 7 commits July 8, 2019 23:15

black style

74a4d03

Merge branch 'master' of https://github.com/nyu-mll/jiant

055972c

Merge branch 'master' of https://github.com/nyu-mll/jiant

cd03448

Merge branch 'master' of https://github.com/nyu-mll/jiant

123fb1c

Merge branch 'master' of https://github.com/nyu-mll/jiant

721bff6

Merge branch 'master' of https://github.com/nyu-mll/jiant

a9dee4d

remove

3547c9c

pruksmhc requested review from iftenney, sleepinyourhat and W4ngatang as code owners October 8, 2019 14:51

cleaning up code around cuda-parsing

132a3dc

pruksmhc changed the title ~~Fixing bug with restoring checkpoint with two gpus~~ Fixing bug with restoring checkpoint with two gpus + cleaning CUDA parsing related code Oct 8, 2019

HaokunLiu added a commit that referenced this pull request Oct 8, 2019

from #928

7f4c3c0

Yada Pruksachatkun and others added 2 commits October 14, 2019 18:52

Merge branch 'master' into fix_load_with_two_gpus

77cdd12

adding defaulting to -1 if there is no cuda devices detected

5e8596b

W4ngatang reviewed Oct 15, 2019

jiant/utils/utils.py (outdated, resolved)

W4ngatang reviewed Oct 15, 2019

jiant/utils/options.py (outdated, resolved)

W4ngatang approved these changes Oct 15, 2019


fixing nits, throw error instead of log warning for cuda not found

cb3585c

pruksmhc merged commit f0ef3f7 into master Oct 16, 2019

jeswan mentioned this pull request Sep 17, 2020

[CLOSED] Fixing bug with restoring checkpoint with two gpus + cleaning CUDA parsing related code nyu-mll/jiant-v1-legacy#928

Closed

jeswan added the jiant-v1-legacy Relevant to versions <= v1.3.2 label Sep 17, 2020

jeswan deleted the fix_load_with_two_gpus branch September 22, 2020 03:46
