CUDA out of memory #90

Open
TanvirHafiz opened this issue Nov 3, 2024 · 2 comments

Comments

@TanvirHafiz

I have 24 GB of RAM, and I tried to generate a 512 by 512 image, and I still get this:

Traceback (most recent call last):
  File "F:\OmniGen\venv\lib\site-packages\gradio\queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "F:\OmniGen\venv\lib\site-packages\gradio\route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "F:\OmniGen\venv\lib\site-packages\gradio\blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "F:\OmniGen\venv\lib\site-packages\gradio\blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "F:\OmniGen\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "F:\OmniGen\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "F:\OmniGen\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "F:\OmniGen\venv\lib\site-packages\gradio\utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "F:\OmniGen\app.py", line 22, in generate_image
    output = pipe(
  File "F:\OmniGen\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "F:\OmniGen\OmniGen\pipeline.py", line 278, in __call__
    samples = scheduler(latents, func, model_kwargs, use_kv_cache=use_kv_cache, offload_kv_cache=offload_kv_cache)
  File "F:\OmniGen\OmniGen\scheduler.py", line 162, in __call__
    pred, cache = func(z, timesteps, past_key_values=cache, **model_kwargs)
  File "F:\OmniGen\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "F:\OmniGen\OmniGen\model.py", line 387, in forward_with_separate_cfg
    temp_out, temp_pask_key_values = self.forward(x[i], timestep[i], input_ids[i], input_img_latents[i], input_image_sizes[i], attention_mask[i], position_ids[i], past_key_values=past_key_values[i], return_past_key_values=True, offload_model=offload_model)
  File "F:\OmniGen\OmniGen\model.py", line 338, in forward
    output = self.llm(inputs_embeds=input_emb, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, offload_model=offload_model)
  File "F:\OmniGen\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\OmniGen\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\OmniGen\OmniGen\transformer.py", line 156, in forward
    self.get_offlaod_layer(layer_idx, device=inputs_embeds.device)
  File "F:\OmniGen\OmniGen\transformer.py", line 52, in get_offlaod_layer
    self.evict_previous_layer(layer_idx)
  File "F:\OmniGen\OmniGen\transformer.py", line 43, in evict_previous_layer
    param.data = param.data.to("cpu", non_blocking=True)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
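
For reference, here is a minimal sketch of a lower-memory invocation. It assumes the OmniGenPipeline API matches what the traceback shows (use_kv_cache, offload_kv_cache, offload_model appear in the call chain) and the published "Shitao/OmniGen-v1" weights; exact parameter names and defaults may differ in your checkout:

```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Enabling the offload options trades generation speed for lower peak VRAM.
images = pipe(
    prompt="a photo of a cat",   # example prompt
    height=512,
    width=512,
    guidance_scale=2.5,
    use_kv_cache=True,
    offload_kv_cache=True,       # keep the KV cache on CPU between steps
    offload_model=True,          # keep model weights on CPU, load per layer
    seed=0,
)
images[0].save("output.png")
```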

@FurkanGozukara

I don't know how you installed it, because the repo doesn't have proper installation instructions, but this is very likely an installation error.

I made 1-click Python 3.10 installers and it works with as little as 5.5 GB: #86

@TanvirHafiz
Author

TanvirHafiz commented Nov 3, 2024 via email

Rypo added a commit to Rypo/OmniGen that referenced this issue Nov 28, 2024
Removes the non_blocking argument from all device-to-CPU transfers. In certain environments (e.g. WSL), large transfers will throw a CUDA memory error regardless of the VRAM available.

Adjusts stream synchronization for modest performance gains with cpu_offload.

fixes VectorSpaceLab#90, fixes VectorSpaceLab#117
Rypo added a commit to Rypo/OmniGen that referenced this issue Dec 2, 2024
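
Below is a minimal sketch of the kind of change these commits describe, based on the evict_previous_layer frame in the traceback. The function signature, the layers argument, and the placement of the synchronize call are assumptions for illustration; the actual OmniGen code may differ.

```python
import torch
import torch.nn as nn

def evict_previous_layer(layers: nn.ModuleList, layer_idx: int) -> None:
    """Move the previous transformer layer's parameters back to CPU during cpu_offload."""
    if layer_idx > 0:
        for param in layers[layer_idx - 1].parameters():
            # Dropping non_blocking=True avoids the spurious
            # "CUDA error: out of memory" that some environments (e.g. WSL)
            # raise for large asynchronous device-to-CPU copies.
            param.data = param.data.to("cpu")  # was: .to("cpu", non_blocking=True)
    # A single stream synchronize per eviction illustrates the
    # stream-synchronization adjustment the commit message mentions.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
```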