Accelerate Error #2216

Closed
nickjtay opened this issue Dec 5, 2023 · 13 comments
Labels
solved: The bug or feature request has been solved, but the issue is still opened

Comments

@nickjtay

nickjtay commented Dec 5, 2023

My notebook was working until I duplicated it to test a small revision that required bitsandbytes. I installed bitsandbytes into the same virtual environment, which I wouldn't expect to cause any issues, but when I went back to the original notebook it no longer ran successfully, and I'm now getting the error below. I have since uninstalled bitsandbytes and restarted the kernel. I'm not sure what happened, and I can't find anyone else reporting this issue on Stack Overflow or elsewhere.

LoRA-multi-gpu-working.zip

Error:

---------------------------------------------------------------------------
ProcessRaisedException                    Traceback (most recent call last)
File ~/Projects/llmtest1/lib/python3.10/site-packages/accelerate/launchers.py:186, in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes)
    185 try:
--> 186     start_processes(launcher, args=args, nprocs=num_processes, start_method="fork")
    187 except ProcessRaisedException as e:

File ~/Projects/llmtest1/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:202, in start_processes(fn, args, nprocs, join, daemon, start_method)
    201 # Loop on join until it returns True or raises an exception.
--> 202 while not context.join():
    203     pass

File ~/Projects/llmtest1/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:163, in ProcessContext.join(self, timeout)
    162 msg += original_trace
--> 163 raise ProcessRaisedException(msg, error_index, failed_process.pid)

ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/utils/launch.py", line 562, in __call__
    self.launcher(*args)
  File "/tmp/ipykernel_5874/1694138925.py", line 8, in training_loop
    accelerator = Accelerator(mixed_precision=mixed_precision)
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/accelerator.py", line 371, in __init__
    self.state = AcceleratorState(
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/state.py", line 758, in __init__
    PartialState(cpu, **kwargs)
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/state.py", line 218, in __init__
    if not check_cuda_p2p_ib_support():
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/utils/environment.py", line 71, in check_cuda_p2p_ib_support
    device_name = torch.cuda.get_device_name()
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/cuda/__init__.py", line 419, in get_device_name
    return get_device_properties(device).name
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/cuda/__init__.py", line 449, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method


The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[3], line 2
      1 args = ("fp16", 42, 64)
----> 2 notebook_launcher(training_loop, args, num_processes=2)

File ~/Projects/llmtest1/lib/python3.10/site-packages/accelerate/launchers.py:189, in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes)
    187 except ProcessRaisedException as e:
    188     if "Cannot re-initialize CUDA in forked subprocess" in e.args[0]:
--> 189         raise RuntimeError(
    190             "CUDA has been initialized before the `notebook_launcher` could create a forked subprocess. "
    191             "This likely stems from an outside import causing issues once the `notebook_launcher()` is called. "
    192             "Please review your imports and test them when running the `notebook_launcher()` to identify "
    193             "which one is problematic and causing CUDA to be initialized."
    194         ) from e
    195     else:
    196         raise RuntimeError(f"An issue was found when launching the training: {e}") from e

RuntimeError: CUDA has been initialized before the `notebook_launcher` could create a forked subprocess. This likely stems from an outside import causing issues once the `notebook_launcher()` is called. Please review your imports and test them when running the `notebook_launcher()` to identify which one is problematic and causing CUDA to be initialized.
@muellerzr
Collaborator

Again, please state info about your env as I asked in the other issue.

bitsandbytes and other similar libraries will initialize CUDA on import. You need to hide the import inside your training function so it only gets imported after the notebook launcher has forked its subprocesses. Later versions of accelerate will warn if this happens.
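
A minimal sketch of that pattern, using a toy training function (the names and the bitsandbytes usage here are placeholders, not taken from the notebook in this issue):

from accelerate import notebook_launcher  # safe at the top level: importing it does not initialize CUDA

def training_loop():
    # Deferred imports: these run inside each forked worker process,
    # so any CUDA initialization they trigger happens after the fork.
    import bitsandbytes as bnb  # would initialize CUDA if imported at the notebook's top level
    from accelerate import Accelerator

    accelerator = Accelerator()
    accelerator.print(f"running on {accelerator.device}, bitsandbytes {bnb.__version__}")

notebook_launcher(training_loop, num_processes=2)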

@nickjtay
Author

nickjtay commented Dec 5, 2023

Thank you, that makes sense, but I had already removed bitsandbytes and rebooted. I'm also not sure why it says there is no default config, because I walked through the config wizard and had accelerate working successfully.

  • Accelerate version: 0.25.0
  • Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Numpy version: 1.24.2
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • System RAM: 62.69 GB
  • GPU type: NVIDIA GeForce RTX 3060
  • Accelerate default config:
    Not found

@nickjtay
Author

nickjtay commented Dec 6, 2023

I reconfigured accelerate, but I'm still getting the same error.

  • Accelerate version: 0.25.0
  • Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Numpy version: 1.24.2
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • System RAM: 62.69 GB
  • GPU type: NVIDIA GeForce RTX 3060
  • Accelerate default config:
    • compute_environment: LOCAL_MACHINE
    • distributed_type: MULTI_GPU
    • mixed_precision: fp8
    • use_cpu: False
    • debug: True
    • num_processes: 2
    • machine_rank: 0
    • num_machines: 1
    • gpu_ids: 0,1
    • rdzv_backend: static
    • same_network: True
    • main_training_function: main
    • downcast_bf16: no
    • tpu_use_cluster: False
    • tpu_use_sudo: False
    • tpu_env: []

@muellerzr
Collaborator

Can you try installing accelerate from main? pip install git+https://github.com/huggingface/accelerate

@nickjtay
Author

nickjtay commented Dec 6, 2023

Still experiencing the error. I rebooted as well. I'm also not running any other notebooks or processes which would use CUDA. I don't see anything in the code that would initialize CUDA before the notebook_launcher() function runs, either.

  • Accelerate version: 0.25.0.dev0
  • Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Numpy version: 1.24.2
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • System RAM: 62.69 GB
  • GPU type: NVIDIA GeForce RTX 3060
  • Accelerate default config:
    • compute_environment: LOCAL_MACHINE
    • distributed_type: MULTI_GPU
    • mixed_precision: fp8
    • use_cpu: False
    • debug: True
    • num_processes: 2
    • machine_rank: 0
    • num_machines: 1
    • gpu_ids: 0,1
    • rdzv_backend: static
    • same_network: True
    • main_training_function: main
    • downcast_bf16: no
    • tpu_use_cluster: False
    • tpu_use_sudo: False
    • tpu_env: []

@geronimi73
Contributor

RuntimeError: CUDA has been initialized before the notebook_launcher could create a forked subprocess. This likely stems from an outside import causing issues once the notebook_launcher() is called. Please review your imports and test them when running the notebook_launcher() to identify which one is problematic and causing CUDA to be initialized.

from accelerate import Accelerator initializes CUDA, you have to move it into training_loop.
If the error persists, move all the other imports into training_loop as well, except for from accelerate import notebook_launcher.

@muellerzr
Collaborator

muellerzr commented Dec 6, 2023

from accelerate import Accelerator initializes CUDA, you have to move it into training_loop.

That shouldn't be the case/shouldn't be happening 👀

I ran the notebook launcher just fine. Can you give me the output of pip freeze, @geronimi73?

All the imports in accelerate are very CUDA-careful for exactly this reason.

@geronimi73
Contributor

geronimi73 commented Dec 6, 2023

nevermind!

from accelerate import Accelerator initializes CUDA, you have to move it into training_loop.

This was definitely the case with accelerate-0.21.0; after a pip update to accelerate-0.25.0, it's gone.

Sorry for the distraction.

Edit: I checked @nickjtay's code in my notebook, and it seems that the peft import initializes CUDA:

import torch
display(torch.cuda.is_initialized())
from peft import (
    get_peft_config,
    get_peft_model,
    get_peft_model_state_dict,
    set_peft_model_state_dict,
    LoraConfig,
    PeftType,
    PrefixTuningConfig,
    PromptEncoderConfig,
)
display(torch.cuda.is_initialized())

Output:

False
True

freeze.txt

@muellerzr
Collaborator

Yes, IIRC I opened an issue on the peft side for this. There's nothing we can do here; they have to fix it on their end :) (So just import it inside your training function.)

@BenjaminBossan
Member

Yes, we should revisit this in PEFT!

@muellerzr added the solved label (The bug or feature request has been solved, but the issue is still opened) on Dec 6, 2023
@nickjtay
Author

nickjtay commented Dec 6, 2023

Following the advice above, I moved the accelerate imports into the training loop and rebooted my machine to clear the GPU memory, but I am still getting the error message. Should I be moving peft into the loop as well?

import argparse
import os

import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from peft import (
    get_peft_config,
    get_peft_model,
    get_peft_model_state_dict,
    set_peft_model_state_dict,
    LoraConfig,
    PeftType,
    PrefixTuningConfig,
    PromptEncoderConfig,
)

import evaluate
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, get_linear_schedule_with_warmup, set_seed
from tqdm import tqdm
from accelerate import notebook_launcher    

def training_loop(mixed_precision="fp16", seed:int=42, batch_size:int=32):
    from accelerate import Accelerator, DistributedType
    from accelerate.utils import set_seed
    
    set_seed(seed)
    model_name_or_path = "google/flan-t5-small"
    task = "mrpc"
    
    accelerator = Accelerator(mixed_precision=mixed_precision)
    
    if any(k in model_name_or_path for k in ("gpt", "opt", "bloom")):
        padding_side = "left"
    else:
        padding_side = "right"

    def collate_fn(examples):
        max_length = 128 if accelerator.distributed_type == DistributedType.TPU else None
        if accelerator.mixed_precision == "fp8":
            pad_to_multiple_of = 16
        elif accelerator.mixed_precision != "no":
            pad_to_multiple_of = 8
        else:
            pad_to_multiple_of = None

        return tokenizer.pad(
            examples,
            padding="longest",
            max_length=max_length,
            pad_to_multiple_of=pad_to_multiple_of,
            return_tensors="pt",
        )        
        
    def tokenize_function(examples):
        outputs = tokenizer(examples["sentence1"], examples["sentence2"], truncation=True, max_length=None)
        return outputs

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, padding_side=padding_side)
    
    if getattr(tokenizer, "pad_token_id") is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id

    datasets = load_dataset("glue", task)
    metric = evaluate.load("glue", task)

    tokenized_datasets = datasets.map(
        tokenize_function,
        batched=True,
        remove_columns=["idx", "sentence1", "sentence2"],
    )

    tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
    
    train_dataloader = DataLoader(
        tokenized_datasets["train"], 
        shuffle=True, 
        collate_fn=collate_fn, 
        batch_size=batch_size)
    eval_dataloader = DataLoader(
        tokenized_datasets["validation"], 
        shuffle=False, 
        collate_fn=collate_fn, 
        batch_size=batch_size)
    
    peft_type = PeftType.LORA
    num_epochs = 5
    
    peft_config = LoraConfig(
        task_type="SEQ_CLS", 
        inference_mode=False, 
        r=8, lora_alpha=16, 
        lora_dropout=0.1)
    lr = 3e-4
    
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name_or_path, 
        return_dict=True)
    model = model.to(accelerator.device)
    model = get_peft_model(model, peft_config)
    
    optimizer = AdamW(params=model.parameters(), lr=lr)
    
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer=optimizer,
        num_warmup_steps=0.06 * (len(train_dataloader) * num_epochs),
        num_training_steps=(len(train_dataloader) * num_epochs),
    )    
    
    model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, eval_dataloader, lr_scheduler
    )

    for epoch in range(num_epochs):
        model.train()
        for step, batch in enumerate(tqdm(train_dataloader)):
            batch.to(accelerator.device)
            outputs = model(**batch)
            loss = outputs.loss
            accelerator.backward(loss)
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()

        model.eval()
        for step, batch in enumerate(tqdm(eval_dataloader)):
            batch.to(accelerator.device)
            with torch.no_grad():
                outputs = model(**batch)
            predictions = outputs.logits.argmax(dim=-1)
            predictions, references = predictions, batch["labels"]
            metric.add_batch(
                predictions=predictions,
                references=references,
            )

Error Message:

---------------------------------------------------------------------------
ProcessRaisedException                    Traceback (most recent call last)
File ~/Projects/llmtest1/lib/python3.10/site-packages/accelerate/launchers.py:186, in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes)
    185 try:
--> 186     start_processes(launcher, args=args, nprocs=num_processes, start_method="fork")
    187 except ProcessRaisedException as e:

File ~/Projects/llmtest1/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:202, in start_processes(fn, args, nprocs, join, daemon, start_method)
    201 # Loop on join until it returns True or raises an exception.
--> 202 while not context.join():
    203     pass

File ~/Projects/llmtest1/lib/python3.10/site-packages/torch/multiprocessing/spawn.py:163, in ProcessContext.join(self, timeout)
    162 msg += original_trace
--> 163 raise ProcessRaisedException(msg, error_index, failed_process.pid)

ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/utils/launch.py", line 562, in __call__
    self.launcher(*args)
  File "/tmp/ipykernel_4901/2016609644.py", line 11, in training_loop
    accelerator = Accelerator(mixed_precision=mixed_precision)
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/accelerator.py", line 371, in __init__
    self.state = AcceleratorState(
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/state.py", line 758, in __init__
    PartialState(cpu, **kwargs)
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/state.py", line 218, in __init__
    if not check_cuda_p2p_ib_support():
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/accelerate/utils/environment.py", line 71, in check_cuda_p2p_ib_support
    device_name = torch.cuda.get_device_name()
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/cuda/__init__.py", line 419, in get_device_name
    return get_device_properties(device).name
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/cuda/__init__.py", line 449, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/nickjtay/Projects/llmtest1/lib/python3.10/site-packages/torch/cuda/__init__.py", line 284, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method


The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[3], line 2
      1 args = ("fp16", 42, 64)
----> 2 notebook_launcher(training_loop, args, num_processes=2)

File ~/Projects/llmtest1/lib/python3.10/site-packages/accelerate/launchers.py:189, in notebook_launcher(function, args, num_processes, mixed_precision, use_port, master_addr, node_rank, num_nodes)
    187 except ProcessRaisedException as e:
    188     if "Cannot re-initialize CUDA in forked subprocess" in e.args[0]:
--> 189         raise RuntimeError(
    190             "CUDA has been initialized before the `notebook_launcher` could create a forked subprocess. "
    191             "This likely stems from an outside import causing issues once the `notebook_launcher()` is called. "
    192             "Please review your imports and test them when running the `notebook_launcher()` to identify "
    193             "which one is problematic and causing CUDA to be initialized."
    194         ) from e
    195     else:
    196         raise RuntimeError(f"An issue was found when launching the training: {e}") from e

RuntimeError: CUDA has been initialized before the `notebook_launcher` could create a forked subprocess. This likely stems from an outside import causing issues once the `notebook_launcher()` is called. Please review your imports and test them when running the `notebook_launcher()` to identify which one is problematic and causing CUDA to be initialized.
  • Accelerate version: 0.25.0.dev0
  • Platform: Linux-6.2.0-37-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Numpy version: 1.24.2
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • System RAM: 62.69 GB
  • GPU type: NVIDIA GeForce RTX 3060
  • Accelerate default config:
    • compute_environment: LOCAL_MACHINE
    • distributed_type: MULTI_GPU
    • mixed_precision: fp8
    • use_cpu: False
    • debug: True
    • num_processes: 2
    • machine_rank: 0
    • num_machines: 1
    • gpu_ids: 0,1
    • rdzv_backend: static
    • same_network: True
    • main_training_function: main
    • downcast_bf16: no
    • tpu_use_cluster: False
    • tpu_use_sudo: False
    • tpu_env: []

@BenjaminBossan
Member

Should I be moving peft to the loop as well?

Yes, please test that as well and let us know if it solves the problem.

@nickjtay
Author

nickjtay commented Dec 6, 2023

Nevermind, I see: moving both modules into the loop solved it. Thank you!
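
For reference, a minimal sketch of the import layout that resolved this (the training body is the code posted above, elided here):

from accelerate import notebook_launcher  # the only accelerate import kept at the notebook's top level

def training_loop(mixed_precision="fp16", seed: int = 42, batch_size: int = 32):
    # Anything that can initialize CUDA is imported here, inside the forked worker.
    from accelerate import Accelerator, DistributedType
    from accelerate.utils import set_seed
    from peft import LoraConfig, PeftType, get_peft_model

    set_seed(seed)
    accelerator = Accelerator(mixed_precision=mixed_precision)
    # ... build the tokenizer, datasets, model, and optimizer and run the loop as in the code above ...

notebook_launcher(training_loop, ("fp16", 42, 64), num_processes=2)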
