CUDA out of memory #90
Comments
I don't know how you installed it, because the repo doesn't have proper installation instructions, but it is very likely an installation error. I made 1-click Python 3.10 installers and it works with as little as 5.5 GB: #86
Cool, I will try yours.
Rypo added a commit to Rypo/OmniGen that referenced this issue on Nov 28, 2024 (and again on Dec 2, 2024):
Removes the non_blocking argument from all device-to-CPU transfers. In certain environments (e.g. WSL), large transfers will throw a CUDA memory error regardless of available VRAM. Adjusts stream synchronization for modest performance gains with cpu_offload. Fixes VectorSpaceLab#90, fixes VectorSpaceLab#117.
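For context, here is a minimal sketch of the kind of change the commit describes; it is not the actual OmniGen code, and the helper name is hypothetical. It simply replaces the asynchronous device-to-CPU copy with a blocking one plus an explicit synchronize.

import torch

def evict_layer_to_cpu(layer: torch.nn.Module) -> None:
    # Move each parameter of an offloaded layer back to host memory with a
    # plain blocking copy. The fix drops non_blocking=True because, in some
    # environments (e.g. WSL), the async transfer can raise
    # "CUDA error: out of memory" even when VRAM is free.
    for param in layer.parameters():
        param.data = param.data.to("cpu")  # previously: .to("cpu", non_blocking=True)
    # Make sure any in-flight GPU work has completed before the layer's
    # device memory is considered reusable.
    torch.cuda.synchronize()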
I have 24 GB of RAM, and I tried to generate a 512 by 512 image, and I still get this!
Traceback (most recent call last):
File "F:\OmniGen\venv\lib\site-packages\gradio\queueing.py", line 624, in process_events
response = await route_utils.call_process_api(
File "F:\OmniGen\venv\lib\site-packages\gradio\route_utils.py", line 323, in call_process_api
output = await app.get_blocks().process_api(
File "F:\OmniGen\venv\lib\site-packages\gradio\blocks.py", line 2018, in process_api
result = await self.call_function(
File "F:\OmniGen\venv\lib\site-packages\gradio\blocks.py", line 1567, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "F:\OmniGen\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "F:\OmniGen\venv\lib\site-packages\anyio_backends_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "F:\OmniGen\venv\lib\site-packages\anyio_backends_asyncio.py", line 943, in run
result = context.run(func, *args)
File "F:\OmniGen\venv\lib\site-packages\gradio\utils.py", line 846, in wrapper
response = f(*args, **kwargs)
File "F:\OmniGen\app.py", line 22, in generate_image
output = pipe(
File "F:\OmniGen\venv\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "F:\OmniGen\OmniGen\pipeline.py", line 278, in call
samples = scheduler(latents, func, model_kwargs, use_kv_cache=use_kv_cache, offload_kv_cache=offload_kv_cache)
File "F:\OmniGen\OmniGen\scheduler.py", line 162, in call
pred, cache = func(z, timesteps, past_key_values=cache, **model_kwargs)
File "F:\OmniGen\venv\lib\site-packages\torch\utils_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "F:\OmniGen\OmniGen\model.py", line 387, in forward_with_separate_cfg
temp_out, temp_pask_key_values = self.forward(x[i], timestep[i], input_ids[i], input_img_latents[i], input_image_sizes[i], attention_mask[i], position_ids[i], past_key_values=past_key_values[i], return_past_key_values=True, offload_model=offload_model)
File "F:\OmniGen\OmniGen\model.py", line 338, in forward
output = self.llm(inputs_embeds=input_emb, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, offload_model=offload_model)
File "F:\OmniGen\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "F:\OmniGen\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "F:\OmniGen\OmniGen\transformer.py", line 156, in forward
self.get_offlaod_layer(layer_idx, device=inputs_embeds.device)
File "F:\OmniGen\OmniGen\transformer.py", line 52, in get_offlaod_layer
self.evict_previous_layer(layer_idx)
File "F:\OmniGen\OmniGen\transformer.py", line 43, in evict_previous_layer
param.data = param.data.to("cpu", non_blocking=True)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
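Following the advice in the error message itself, a minimal debugging sketch (assuming the flag is set before any CUDA work starts, e.g. at the top of app.py) would be:

import os

# Force synchronous CUDA kernel launches so the reported stack trace points
# at the operation that actually failed; must run before the first CUDA call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"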