
MULTI_GPU DATA_PARALLEL #1287

Closed
gunewar opened this issue Oct 13, 2022 · 13 comments · Fixed by #1509
Labels
bug Something isn't working

Comments

@gunewar

gunewar commented Oct 13, 2022

Describe the bug
I tried to use darts with multiple GPUs but keep getting "RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method" when running in a Jupyter notebook.
I also tried with a .py file; this time the model fits, but during the second prediction the system exits without any error or warning.
"pred_series = model_nbeats.historical_forecasts(

series,

past_covariates=train_features,


num_samples=1,

start=0.7,

forecast_horizon=6,

stride=10,

retrain=False,

overlap_end=False,

last_points_only=True, 

verbose=True,"

)
To Reproduce
model_nbeats = NBEATSModel(
    input_chunk_length=1440,
    output_chunk_length=6,
    generic_architecture=True,
    num_stacks=50,
    num_blocks=1,
    num_layers=4,
    layer_widths=512,
    n_epochs=1,
    nr_epochs_val_period=1,
    batch_size=1024,
    model_name="nbeats_run",
    force_reset=True,
    random_state=None,
    pl_trainer_kwargs={
        "accelerator": "gpu",
        "devices": [0, 1],  # use all available GPUs
        # "auto_select_gpus": True,
        "strategy": "ddp_notebook_find_unused_parameters_false",
    },
)

This is my model.

model_nbeats.fit(
    train, verbose=True, past_covariates=train_features, num_loader_workers=2
)

And this is how I fit it.
Expected behavior
I guess there is some problem with the random_method decorator in darts' utils/torch.py:

/miniconda3/envs/darts/lib/python3.9/site-packages/darts/utils/torch.py:112, in random_method.<locals>.decorator(self, *args, **kwargs)
    110 with fork_rng():
    111     manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
--> 112     return decorated(self, *args, **kwargs)
System (please complete the following information):

  • Python version: 3.9
  • darts version: 0.22.0

Additional context
Could you publish documentation for using GPUs, data_parallel, and distributed_data_parallel?

gunewar added the bug (Something isn't working) and triage (Issue waiting for triaging) labels on Oct 13, 2022
@dennisbader
Collaborator

Hi @gunewar, unfortunately I don't have the hardware to test this using multiple GPUs.

  • Let's start from your .py file: when you say it exits after the 2nd prediction without error or warning, do you mean it exits after the 2nd out of some_n historical forecasts?

  • Does the normal model.predict() work?

  • What exactly is the error message you get when running it as a Jupyter notebook?

@gunewar
Author

gunewar commented Oct 13, 2022

Hi Dennis, my .py file is below:

############################################################################################

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from darts import TimeSeries
from darts.models import NBEATSModel
from darts.dataprocessing.transformers import Scaler, MissingValuesFiller
from darts.metrics import mape, r2_score

import matplotlib
import time
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"


def display_forecast(pred_series, ts_transformed, forecast_type, start_date=None):
    plt.figure(figsize=(8, 5))
    if start_date:
        ts_transformed = ts_transformed.drop_before(start_date)
    ts_transformed.univariate_component(0).plot(label="actual")
    pred_series.plot(label=("historic " + forecast_type + " forecasts"))
    plt.title(
        "R2: {}".format(r2_score(ts_transformed.univariate_component(0), pred_series))
    )
    plt.legend()


df = pd.read_csv(r"/home/data.csv")
df.drop(df.index[range(680_000)], inplace=True)
df["date"] = pd.to_datetime(df["date"])
# df.set_index('date', inplace=True)
df = df.reset_index(drop=True)
df = df.dropna()
df.columns
df.shape

df_day_avg = df

filler = MissingValuesFiller()
scaler = Scaler()

series = scaler.fit_transform(
    filler.transform(
        TimeSeries.from_dataframe(
            df_day_avg, "date", ["value"], fill_missing_dates=True, freq="min"
        )
    )
).astype(np.float32)

series_feaures = scaler.fit_transform(
    filler.transform(
        TimeSeries.from_dataframe(
            df_day_avg, "date", fill_missing_dates=True, freq="min"
        )
    )
).astype(np.float32)

train, val = series.split_after(0.7)
# train_features, val_features = series_feaures.split_after(0.7)
train_features = series_feaures

import torch

print(torch.cuda.device_count())
print(torch.cuda.is_available())
print(torch.cuda.current_device())

model_nbeats = NBEATSModel(
    input_chunk_length=144,
    output_chunk_length=6,
    generic_architecture=True,
    num_stacks=10,
    num_blocks=1,
    num_layers=4,
    layer_widths=512,
    n_epochs=1,
    nr_epochs_val_period=1,
    batch_size=1024,
    model_name="nbeats_run",
    pl_trainer_kwargs={
        "accelerator": "gpu",
        "strategy": "ddp",
        "devices": -1,
        "auto_select_gpus": True,
    },
)

model_nbeats.fit(train, verbose=True, past_covariates=train_features)

####################################################################

Model fit works well; after that, when prediction starts, I debugged the code:
####################################################################################
pred_series = model_nbeats.historical_forecasts(
    series,
    past_covariates=train_features,
    num_samples=1,
    start=0.7,
    forecast_horizon=6,
    stride=10,
    retrain=False,
    overlap_end=False,
    last_points_only=True,
    verbose=True,
)
###########################################################################

In the second predict loop, the system ends the process without an error code.

################################################################################

When I try to test in a Jupyter notebook, after defining the model with the code below
##########################################################

model_nbeats = NBEATSModel(
    input_chunk_length=1440,
    output_chunk_length=6,
    generic_architecture=True,
    num_stacks=50,
    num_blocks=1,
    num_layers=4,
    layer_widths=512,
    n_epochs=1,
    nr_epochs_val_period=1,
    batch_size=1024,
    model_name="nbeats_run",
    force_reset=True,
    random_state=None,
    pl_trainer_kwargs={
        "accelerator": "gpu",
        "devices": [0, 1],  # use all available GPUs
        # "auto_select_gpus": True,
        # "strategy": "ddp_notebook_find_unused_parameters_false",
    },
)
##############################################################
Darts gives the following warning:
##################################################

/home/solymr/miniconda3/envs/darts/lib/python3.9/site-packages/torch/random.py:99: UserWarning: CUDA reports that you have 2 available devices, and you have used fork_rng without explicitly specifying which devices are being used. For safety, we initialize every CUDA device by default, which can be quite slow if you have a lot of GPUs. If you know that you are only making use of a few CUDA devices, set the environment variable CUDA_VISIBLE_DEVICES or the 'devices' keyword argument of fork_rng with the set of devices you are actually using. For example, if you are using CPU only, set CUDA_VISIBLE_DEVICES= or devices=[]; if you are using GPU 0 only, set CUDA_VISIBLE_DEVICES=0 or devices=[0]. To initialize all devices and suppress this warning, set the 'devices' keyword argument to range(torch.cuda.device_count()).
warnings.warn(

##############################################################################
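As a side note, the warning above spells out its own workaround: restrict fork_rng to the devices actually in use, or set CUDA_VISIBLE_DEVICES. A minimal sketch of what it is asking for, outside of darts (hypothetical):

import torch
from torch.random import fork_rng

# Forking the RNG state only for GPU 0 avoids initializing every
# visible CUDA device; setting CUDA_VISIBLE_DEVICES=0 before the
# process starts achieves the same effect.
with fork_rng(devices=[0]):
    torch.manual_seed(42)
    sample = torch.rand(3, device="cuda:0")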

When I ran the model fit with
###################################################################
model_nbeats.fit(
    train, verbose=True, past_covariates=train_features, num_loader_workers=2
)
###################################################################

I got this system error message:
#########################################################
2022-10-13 01:44:41 pytorch_lightning.utilities.rank_zero INFO: GPU available: True (cuda), used: True
2022-10-13 01:44:41 pytorch_lightning.utilities.rank_zero INFO: TPU available: False, using: 0 TPU cores
2022-10-13 01:44:41 pytorch_lightning.utilities.rank_zero INFO: IPU available: False, using: 0 IPUs
2022-10-13 01:44:41 pytorch_lightning.utilities.rank_zero INFO: HPU available: False, using: 0 HPUs
2022-10-13 01:44:41 pytorch_lightning.utilities.distributed INFO: Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
2022-10-13 01:44:41 pytorch_lightning.utilities.distributed INFO: Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
2022-10-13 01:44:41 pytorch_lightning.utilities.rank_zero INFO: ----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes


ProcessRaisedException                    Traceback (most recent call last)
/home/solymr/Desktop/Work_Space/DARTS.MODELS/Yuppii_Darts.ipynb Cell 26 in <cell line: 1>()
----> 1 model_nbeats.fit(train,verbose=True,past_covariates=train_features,
      2     num_loader_workers=2)

File ~/miniconda3/envs/darts/lib/python3.9/site-packages/darts/utils/torch.py:112, in random_method.<locals>.decorator(self, *args, **kwargs)
    110 with fork_rng():
    111     manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
--> 112     return decorated(self, *args, **kwargs)

File ~/miniconda3/envs/darts/lib/python3.9/site-packages/darts/models/forecasting/torch_forecasting_model.py:739, in TorchForecastingModel.fit(self, series, past_covariates, future_covariates, val_series, val_past_covariates, val_future_covariates, trainer, verbose, epochs, max_samples_per_ts, num_loader_workers)
    731 logger.info(f"Train dataset contains {len(train_dataset)} samples.")
    733 super().fit(
    734     series=seq2series(series),
    735     past_covariates=seq2series(past_covariates),
    736     future_covariates=seq2series(future_covariates),
    737 )
--> 739 return self.fit_from_dataset(
    740     train_dataset, val_dataset, trainer, verbose, epochs, num_loader_workers
    741 )

File ~/miniconda3/envs/darts/lib/python3.9/site-packages/darts/utils/torch.py:112, in random_method.<locals>.decorator(self, *args, **kwargs)
    110 with fork_rng():
    111     manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
...
    torch._C._cuda_setDevice(device)
  File "/home/esc/miniconda3/envs/darts/lib/python3.9/site-packages/torch/cuda/__init__.py", line 207, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
############################################################################################
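For context, this RuntimeError is what PyTorch raises whenever a process that has already initialized CUDA is forked and the child touches CUDA again. A minimal reproduction independent of darts (hypothetical; needs a CUDA machine):

import torch
import torch.multiprocessing as mp

def worker(rank):
    # The forked child tries to (re-)initialize CUDA.
    torch.zeros(1, device="cuda")

if __name__ == "__main__":
    torch.cuda.init()  # the parent initializes CUDA first
    # With start_method="fork" this raises "Cannot re-initialize CUDA
    # in forked subprocess"; start_method="spawn" avoids the error.
    mp.start_processes(worker, nprocs=2, start_method="fork")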

@solalatus
Contributor

solalatus commented Dec 5, 2022

I am also having problems with multi-GPU training.

I have tried with Jupyter and without it (a .py file); all DDP variants give this:

RuntimeError                              Traceback (most recent call last)
<ipython-input-19-92384a3569c2> in <module>
      1 print("starting training...")
----> 2 stuff = model_nhits.fit(train_datasets, val_series=valid_datasets, verbose=True)

~/.local/lib/python3.8/site-packages/darts/utils/torch.py in decorator(self, *args, **kwargs)
    110         with fork_rng():
    111             manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
--> 112             return decorated(self, *args, **kwargs)
    113 
    114     return decorator

~/.local/lib/python3.8/site-packages/darts/models/forecasting/torch_forecasting_model.py in fit(self, series, past_covariates, future_covariates, val_series, val_past_covariates, val_future_covariates, trainer, verbose, epochs, max_samples_per_ts, num_loader_workers)
    737         )
    738 
--> 739         return self.fit_from_dataset(
    740             train_dataset, val_dataset, trainer, verbose, epochs, num_loader_workers
    741         )

~/.local/lib/python3.8/site-packages/darts/utils/torch.py in decorator(self, *args, **kwargs)
    110         with fork_rng():
    111             manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
--> 112             return decorated(self, *args, **kwargs)
    113 
    114     return decorator

~/.local/lib/python3.8/site-packages/darts/models/forecasting/torch_forecasting_model.py in fit_from_dataset(self, train_dataset, val_dataset, trainer, verbose, epochs, num_loader_workers)
    892 
    893         # Train model
--> 894         self._train(train_loader, val_loader)
    895         return self
    896 

~/.local/lib/python3.8/site-packages/darts/models/forecasting/torch_forecasting_model.py in _train(self, train_loader, val_loader)
    914         self.load_ckpt_path = None
    915 
--> 916         self.trainer.fit(
    917             self.model,
    918             train_dataloaders=train_loader,

~/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    580             raise TypeError(f"Trainer.fit() requires a LightningModule, got: {model.__class__.__qualname__}")
    581         self.strategy._lightning_module = model
--> 582         call._call_and_handle_interrupt(
    583             self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    584         )

~/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     34     try:
     35         if trainer.strategy.launcher is not None:
---> 36             return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
     37         else:
     38             return trainer_fn(*args, **kwargs)

~/.local/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py in launch(self, function, trainer, *args, **kwargs)
     94         self._check_torchdistx_support()
     95         if self._start_method in ("fork", "forkserver"):
---> 96             _check_bad_cuda_fork()
     97 
     98         # The default cluster environment in Lightning chooses a random free port number

~/.local/lib/python3.8/site-packages/lightning_lite/strategies/launchers/multiprocessing.py in _check_bad_cuda_fork()
    192     if _IS_INTERACTIVE:
    193         message += " You will have to restart the Python kernel."
--> 194     raise RuntimeError(message)

RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call torch.cuda.* functions, have moved the model to the device, or allocated memory on the GPU any other way? Please remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
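The check that raises this lives in Lightning's multiprocessing launcher: any torch.cuda call that initializes CUDA in the parent process before fit() trips it for fork/forkserver-based strategies. A minimal illustration of what counts as "already initialized" (hypothetical):

import torch

# Harmless-looking diagnostics like this are enough to initialize
# CUDA in the parent process:
torch.cuda.current_device()

print(torch.cuda.is_initialized())  # True

# Any fork/forkserver-launched Lightning strategy started after this
# point raises the error above; spawn-based strategies are unaffected.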

@solalatus
Contributor

I also tried to use DP in a .py file; the message then ends as follows (apologies, the trace was cut by my ssh client):

modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/tensorml/lib/python3.9/site-packages/darts/models/forecasting/nhits.py", line 179, in forward
    x = self.layers(x)
  File "/home/user/anaconda3/envs/tensorml/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/tensorml/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/user/anaconda3/envs/tensorml/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/anaconda3/envs/tensorml/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument mat1 in method wrapper_addmm)
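A common cause of this class of error under DataParallel is a tensor created on a fixed device inside forward(): DP replicates the module onto every GPU and scatters the batch, so anything hard-coded to cuda:0 collides with the replica running on cuda:1. A minimal illustration (hypothetical; needs at least two GPUs, and not necessarily what happens inside darts' nhits.py):

import torch
import torch.nn as nn

class BadModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # Hard-coding a device inside forward() breaks the replicas
        # that DataParallel runs on the other GPUs.
        bias = torch.ones(4, device="cuda:0")
        return self.linear(x) + bias

if __name__ == "__main__":
    model = nn.DataParallel(BadModule().to("cuda"))
    x = torch.randn(8, 4, device="cuda")
    model(x)  # RuntimeError: Expected all tensors to be on the same device ...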


@solalatus
Contributor

I also tried installing DeepSpeed and using its strategies; that errors as well.

This was a fresh pip install of darts in a Conda environment, with Python 3.8.10 and Darts 0.22.0.

Any advice would be highly appreciated!

(By the way: the same code runs like a charm in the same environment on a single GPU.)

@hrzn
Contributor

hrzn commented Dec 6, 2022

Unfortunately we haven't yet had the chance to properly test Darts on a multi-GPU setup. There's a discussion going on here about this too: https://gitter.im/u8darts/darts?at=63847beabcdb0060b8408787

@gunewar
Author

gunewar commented Dec 9, 2022

Hi Julien, I am from Bilkent University and I can open a server with multiple GPUs (2 × 1080 Ti) for your work. Please contact me on Skype (live:f337e4b1889436b3) or at sakaryaemre@gmail.com if you are interested.

@solalatus
Contributor

Any news on this? Has anyone managed to train e.g. an N-HiTS model on multiple GPUs?

hrzn removed the triage (Issue waiting for triaging) label on Jan 5, 2023
@hrzn
Contributor

hrzn commented Jan 5, 2023

I added this to our backlog so one of us can take a look when we have some time (thanks for the kind proposal @gunewar!). In the meantime, any PR/fix proposal is welcome.

@hrzn
Contributor

hrzn commented Jan 6, 2023

See also: #1385

@solalatus
Contributor

solalatus commented Jan 17, 2023

I managed to get the multi-GPU setup working for me.

Steps needed:

  • I did a fork and had to change the logging in pl_forecasting_module.py a tiny bit (see here: master...solalatus:darts:master).
  • Running from Jupyter is not working yet, so put the code in a .py file and run it as a script.
  • You have to add the following pattern to your script:

import torch

if __name__ == '__main__':
    torch.multiprocessing.freeze_support()

(Mind you, the __main__ guard is necessary! See this even though it is NOT a Windows environment.)

Everything else is left at the defaults, so no special trainer args; only:

pl_trainer_kwargs = {"accelerator": "gpu", "devices": -1, "auto_select_gpus": True}

This results in using the ddp_spawn method, which is the default. I did not test other methods yet.
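For completeness, a minimal skeleton of the whole working script could look roughly like this (the sine-wave data and hyperparameters are made up for illustration; the darts calls are the standard 0.22 API):

import numpy as np
import pandas as pd
import torch

from darts import TimeSeries
from darts.models import NBEATSModel


def build_series():
    # Hypothetical toy data: two weeks of hourly sine values.
    idx = pd.date_range("2022-01-01", periods=24 * 14, freq="H")
    values = np.sin(np.arange(len(idx)) * 2 * np.pi / 24).astype(np.float32)
    return TimeSeries.from_times_and_values(idx, values)


if __name__ == "__main__":
    # Everything lives under the __main__ guard so the ddp_spawn worker
    # processes can re-import this module without side effects.
    torch.multiprocessing.freeze_support()

    series = build_series()
    train, val = series.split_after(0.8)

    model = NBEATSModel(
        input_chunk_length=48,
        output_chunk_length=12,
        n_epochs=1,
        batch_size=256,
        pl_trainer_kwargs={
            "accelerator": "gpu",
            "devices": -1,             # all visible GPUs
            "auto_select_gpus": True,  # default strategy: ddp_spawn
        },
    )
    model.fit(train, val_series=val, verbose=True)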

If you think it helps, I'm happy to open a PR for this "wonderful" two-liner from my fork, @hrzn.

@solalatus
Contributor

solalatus commented Jan 19, 2023

Update: This proved to be more stable.
74477d9

@hrzn
Contributor

hrzn commented Jan 25, 2023

Nice, thanks for sharing your solution and opening a PR, @solalatus!
@gunewar, are you by any chance able to confirm whether this fix also resolves your issue?
