CI test failures after BB2 merge #3823

mojtaba-komeili · 2021-07-20T15:42:06Z

Patch description
Resolves the failing CircleCI unit tests that were introduced by merging BB2 and its dependent projects, Personal Knowledge and Wizard of Internet.

klshuster · 2021-07-20T15:53:43Z

are we prepared to require torch 1.8 or higher?

mojtaba-komeili · 2021-07-20T15:57:53Z

are we prepared to require torch 1.8 or higher?

Probably not. I was just checking with another version because of Roller's note about version 1.7.

klshuster · 2021-07-20T17:04:02Z

requirements.txt

@@ -50,4 +49,4 @@ Unidecode==1.1.1
 urllib3>=1.26.5
 websocket-client==0.56.0
 websocket-server==0.4
-jsonlines==1.2.0
+jsonlines==1.2.0


can we put this back in alphabetical order?

Pseudo-alphabetical order lol. there are some intentional non-monotonic places

ok yeah i think we might've broken some stuff with the order wrong... sorry @mojtaba-komeili could you just revert to what it was before?
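
For anyone who wants to check the ordering locally before pushing, here is a minimal sketch of a case-insensitive order check. The helper names (`package_name`, `out_of_order`) are hypothetical and not part of ParlAI, and the parsing assumes simple `name<specifier>` lines only. Since the list is pseudo-alphabetical with some intentional exceptions, anything it flags is a hint to look at, not necessarily an error.

```python
import re
from typing import List


def package_name(requirement: str) -> str:
    """Extract the lowercase package name from a simple requirements.txt line."""
    # Cut at the first version specifier, extras bracket, marker, or space.
    return re.split(r"[=<>!~\[; ]", requirement.strip(), maxsplit=1)[0].lower()


def out_of_order(lines: List[str]) -> List[str]:
    """Return package names that sort before the entry directly above them."""
    names = [
        package_name(line)
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return [cur for prev, cur in zip(names, names[1:]) if cur < prev]


# The tail of requirements.txt as it appears in the diff above.
tail = [
    "Unidecode==1.1.1",
    "urllib3>=1.26.5",
    "websocket-client==0.56.0",
    "websocket-server==0.4",
    "jsonlines==1.2.0",
]
print(out_of_order(tail))  # ['jsonlines']
```

The check only compares neighbours, so a single misplaced entry is reported once instead of flagging everything after it.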

stephenroller · 2021-07-20T19:28:44Z

tests/nightly/gpu/test_bb2.py



 @testing_utils.skipUnlessGPU
 @unittest.skipIf(LOCAL, "Skipping Test because its slow and mem intensive")
+@unittest.skipUnless(TRANSFORMER_INSTALLED, "Needs transformer, not installed.")


this is a bit funny because our GPU tests should always have transformers?

yeah im positive this should not be necessary

stephenroller · 2021-07-20T19:30:14Z

The policy is usually to officially support the past two versions of PyTorch, so requiring 1.8+ is viable now, but afaik nothing we have is incompatible with 1.7. If we're going to do that, I'd suggest a separate PR to carve out 1.8/1.9 tests, and then rebase this one.
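
If a version floor does get carved out later, a numeric comparison avoids the usual string-comparison trap. This is a rough sketch only: `major_minor`, `meets_minimum`, and `MIN_TORCH` are hypothetical names, and the `(1, 8)` floor is an assumption for illustration, not something this PR decides.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' from a string such as '1.8.1+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int]) -> bool:
    """Compare numerically; a plain string comparison would rank '1.10' below '1.8'."""
    return major_minor(version) >= minimum


# Hypothetical floor, used here only to exercise the helper.
MIN_TORCH = (1, 8)

for candidate in ("1.7.1", "1.8.0+cu111", "1.10.0"):
    print(candidate, meets_minimum(candidate, MIN_TORCH))
```

In a test module the installed version would be passed in, for example `meets_minimum(str(torch.__version__), MIN_TORCH)`, and the result could feed a `unittest.skipUnless` decorator.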

klshuster · 2021-07-20T19:34:17Z

tests/nightly/gpu/test_bb2.py

+if TRANSFORMER_INSTALLED:
+    SEARCH_QUERY_MODEL = ZOO_MEMORY_DECODER
+    PERSONA_SUMMARY_MODEL = ZOO_QUERY_GENERATOR
+    ZOO_BB2 = 'zoo:blenderbot2/blenderbot2_400M/model'
+    ZOO_BB2_3B = 'zoo:blenderbot2/blenderbot2_3B/model'
+    SEARCH_SERVER = '<SERVER_API>'
+    common_opt = {
+        'model': 'projects.blenderbot2.agents.blenderbot2:BlenderBot2RagAgent',
+        # rag args
+        'init_opt': 'arch/bart_large',
+        'generation_model': 'bart',
+        'retriever_debug_index': 'compressed',
+        'label_truncate': 128,
+        'text_truncate': 512,
+        'batchsize': 4,
+        'fp16': True,
+        'model_parallel': True,
+        # train args
+        'task': 'convai2,wizard_of_wikipedia',
+        'num_examples': 8,
+    }
+
+    def _test_bb2_rag(retrieval_method: KnowledgeAccessMethod, **kwargs):
+        opt = copy.deepcopy(common_opt)
+        opt['knowledge_access_method'] = retrieval_method.value
+        opt.update(dict(kwargs))
+        print(' '.join([f'--{k} {v}' for k, v in opt.items()]))
+        testing_utils.eval_model(opt, skip_test=True)
+        torch.cuda.empty_cache()
+
+    def _test_bb2_fid(retrieval_method: KnowledgeAccessMethod, **kwargs):
+        opt = copy.deepcopy(common_opt)
+        opt['model'] = 'projects.blenderbot2.agents.blenderbot2:BlenderBot2FidAgent'
+        opt['knowledge_access_method'] = retrieval_method.value
+        opt.update(dict(kwargs))
+        testing_utils.eval_model(opt, skip_test=True)
+        torch.cuda.empty_cache()


why is this protected?

Because if we can't import projects.blenderbot2 we don't have constants such as ZOO_MEMORY_DECODER so we need to skip everything here.

Actually, does CircleCI even run the tests in nightly/gpu when it is not running the GPU jobs (which have transformers installed)? I tried debugging by running pytest locally without transformers installed, and the tests failed. But now I'm thinking maybe CI doesn't work like that.

this marker runs the nightly gpu tests: https://github.com/facebookresearch/ParlAI/blob/master/.circleci/config.yml#L388

you can see that the deps under torchgpu1.7 are installed, which includes transformers: https://github.com/facebookresearch/ParlAI/blob/master/.circleci/config.yml#L95
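
For reference, the guard pattern discussed in this thread can be sketched in isolation as below. This is an illustration under stated assumptions, not the code in the PR: `module_available` and `TestNeedsTransformers` are hypothetical names, the flag is computed with `importlib.util.find_spec` (the PR may detect the package differently), and the model path is just the string from the diff.

```python
import importlib.util
import unittest


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Hypothetical stand-in for the TRANSFORMER_INSTALLED flag in the diff.
TRANSFORMER_INSTALLED = module_available("transformers")

if TRANSFORMER_INSTALLED:
    # Anything that depends on the optional package lives behind the guard,
    # so collecting this module never fails when the package is missing.
    ZOO_BB2 = "zoo:blenderbot2/blenderbot2_400M/model"


@unittest.skipUnless(TRANSFORMER_INSTALLED, "Needs transformers, not installed.")
class TestNeedsTransformers(unittest.TestCase):
    def test_constant_defined(self):
        # Only runs when the guard above has defined ZOO_BB2.
        self.assertTrue(ZOO_BB2.startswith("zoo:"))
```

The trade-off raised above still applies: if the CI job that runs these tests always installs the dependency, the guard only helps local runs, and it can hide a broken install by silently skipping.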

klshuster · 2021-07-20T19:34:27Z

tests/nightly/gpu/test_bb2.py



 @testing_utils.skipUnlessGPU
 @unittest.skipIf(LOCAL, "Skipping Test because its slow and mem intensive")
+@unittest.skipUnless(TRANSFORMER_INSTALLED, "Needs transformer, not installed.")


yeah im positive this should not be necessary

testing with torch 1.8

84887d2

facebook-github-bot added the CLA Signed label Jul 20, 2021

mojtaba-komeili added 2 commits July 20, 2021 09:04

reverting the requirements

2ff1876

moved retriever tests to nightly with GPU

d3f6967

mojtaba-komeili requested review from klshuster and stephenroller July 20, 2021 16:07

mojtaba-komeili added 2 commits July 20, 2021 09:10

removed the end line

0925b8a

skipping transformer imports

adb59d1

klshuster reviewed Jul 20, 2021

mojtaba-komeili added 2 commits July 20, 2021 11:59

skipping if not transformers

9fb1ad8

sorted the requirements list

10db85e

stephenroller reviewed Jul 20, 2021

klshuster reviewed Jul 20, 2021

mojtaba-komeili and others added 13 commits July 20, 2021 12:35

removed the extra skips in tests + regen the glue tests

326cf15

fairscale 0.3.7

d0173df

reordering requirements

7fab3a1

nit fix

826bb18

reverted transformer skips

c977858

skip transformer not installed

0fe1781

long cpu test: reducing bs

8a35a25

turn on fp16

8be9b83

mv searchquery retrievers test and bump down gpu mem usage

3d9a81c

fix bsz

f29b110

less memory for tests?

f45487e

sgd

204e776

update reqs

58a0ecc

klshuster approved these changes Jul 22, 2021

klshuster merged commit dd16d3f into master Jul 22, 2021

klshuster deleted the bb2-testfails branch July 22, 2021 18:24

klshuster mentioned this pull request Jul 22, 2021

Bump tqdm #3838

Closed
