
Eliminate init_cuda_buffer and dummy_batch #3412

Closed
stephenroller opened this issue Jan 26, 2021 · 2 comments
@stephenroller
Contributor

stephenroller commented Jan 26, 2021

[Let's put this off until we implement sharded_ddp and friends; it's not yet clear how they will interact with this. #3415]

In early implementations of distributed training, a worker that OOMed had to run a "dummy pass" afterwards so that its gradient sync stayed in step with the other workers. You can see this code here:

            oom_sync = False
        except RuntimeError as e:
            # catch out of memory exceptions during fwd/bck (skip batch)
            if 'out of memory' in str(e):
                oom_sync = True
                logging.error(
                    'Ran out of memory, skipping batch. '
                    'if this happens frequently, decrease batchsize or '
                    'truncate the inputs to the model.'
                )
                self.global_metrics.add('skipped_batches', SumMetric(1))
            else:
                raise e
        if oom_sync:
            # moved outside of the try-except because the raised exception in scope
            # actually prevents the data from being freed, which can sometimes cause
            # us to OOM during our OOM handling.
            # https://github.com/pytorch/pytorch/issues/18853#issuecomment-583779161
            # gradients are synced on backward, now this model is going to be
            # out of sync! catch up with the other workers
            self._init_cuda_buffer(8, 8, True)

    def _init_cuda_buffer(self, batchsize, maxlen, force=False):
        """
        Pre-initialize CUDA buffer by doing fake forward pass.

        This is also used in distributed mode to force a worker to sync with others.
        """
        if self.use_cuda and (force or not hasattr(self, 'buffer_initialized')):
            try:
                self._control_local_metrics(disabled=True)
                loss = 0 * self.compute_loss(self._dummy_batch(batchsize, maxlen))
                self._control_local_metrics(enabled=True)
                self._temporarily_disable_local_metrics = False
                self.backward(loss)
                self.buffer_initialized = True
            except RuntimeError as e:
                if 'out of memory' in str(e):
                    m = (
                        'CUDA OOM: Lower batch size (-bs) from {} or lower '
                        ' max sequence length (-tr) from {}'
                        ''.format(batchsize, maxlen)
                    )
                    raise RuntimeError(m)
                else:
                    raise e

This is slightly annoying as a user: any time you add a custom field to your batches, you also have to override _dummy_batch yourself so the fake pass covers it (see the hypothetical sketch below).
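
To make that concrete, here is a purely hypothetical sketch (not ParlAI code; the base class, field names, and shapes are invented for illustration). An agent whose loss reads an extra field has to keep _dummy_batch in lockstep with it, or the fake forward pass in _init_cuda_buffer crashes on the missing field:

    import torch
    from types import SimpleNamespace


    class BaseAgent:
        def _dummy_batch(self, batchsize, maxlen):
            # stock dummy batch: fake text/label tensors only
            return SimpleNamespace(
                text_vec=torch.ones(batchsize, maxlen, dtype=torch.long),
                label_vec=torch.ones(batchsize, 2, dtype=torch.long),
            )


    class MyMultimodalAgent(BaseAgent):
        def compute_loss(self, batch):
            # the loss now also reads batch.image_features ...
            ...

        def _dummy_batch(self, batchsize, maxlen):
            # ... so every custom field must be faked here as well, or the
            # dummy pass used for OOM recovery fails
            batch = super()._dummy_batch(batchsize, maxlen)
            batch.image_features = torch.zeros(batchsize, 16)
            return batch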

DDP provides a manual sync mechanism (see DDP.join) that has the same effect as this method. We should switch to it, and then remove all instances of _init_cuda_buffer and _dummy_batch from all of our codebases.
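
For reference, a minimal sketch of DDP.join() in plain PyTorch (not ParlAI code; the tiny model, random data, and per-rank batch counts are placeholders). Each rank wraps its training loop in model.join(); a rank that stops stepping early, e.g. because it skipped or ran out of batches, has the missing gradient all-reduces shadowed so the other ranks do not hang:

    import os

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP


    def train(rank, world_size):
        os.environ.setdefault('MASTER_ADDR', 'localhost')
        os.environ.setdefault('MASTER_PORT', '29500')
        dist.init_process_group('gloo', rank=rank, world_size=world_size)

        model = DDP(nn.Linear(8, 8))
        opt = torch.optim.SGD(model.parameters(), lr=0.1)

        # give each rank a different number of batches, mimicking workers that
        # skip batches; without join() the shorter rank would leave the others
        # hanging in their gradient all-reduce
        with model.join():
            for _ in range(5 - rank):
                opt.zero_grad()
                loss = model(torch.randn(4, 8)).sum()
                loss.backward()  # DDP syncs gradients here
                opt.step()

        dist.destroy_process_group()


    if __name__ == '__main__':
        torch.multiprocessing.spawn(train, args=(2,), nprocs=2)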

@github-actions

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

@stephenroller
Contributor Author

Implemented in #3732
