
Model is successfully compiled, but OOM when loading #991

Open
jiangwei221 opened this issue Feb 9, 2024 · 0 comments


Hi AIT team:

I'm working on compiling a generative video model with AIT.

I can successfully compile the model, as you can see here:

2024-02-09 07:20:35,614 INFO <aitemplate.compiler.transform.memory_planning> max_blob=19546740864 constant_offset=7630531776
2024-02-09 07:20:35,939 INFO <aitemplate.backend.codegen> generated 1027 function srcs
2024-02-09 07:20:40,944 INFO <aitemplate.backend.codegen> generated 8 library srcs
2024-02-09 07:20:40,949 INFO <aitemplate.backend.builder> Using 64 CPU for building
2024-02-09 09:40:18,778 INFO <aitemplate.compiler.compiler> compiled the final .so file elapsed time: 2:19:37.829273

However, when I try to load the model on the same GPU, it reports an OOM error:

  Device:
     ASCII string identifying device: NVIDIA GeForce RTX 3090
     Major compute capability: 8
     Minor compute capability: 6
     UUID: GPU-aca8dfe8-0c10-ed38-e488-8117bfbc3566
     Unique identifier for a group of devices on the same multi-GPU board: 0
     PCI bus ID of the device: 46
     PCI device ID of the device: 0
     PCI domain ID of the device: 0
  Memory limits:
     Constant memory available on device in bytes: 65536
     Global memory available on device in bytes: 25438126080
     Size of L2 cache in bytes: 6291456
     Shared memory available per block in bytes: 49152
     Shared memory available per multiprocessor in bytes: 102400
[14:54:05] model_container.cu:87: Init AITemplate Runtime with 1 concurrency
[14:54:05] model_interface.cu:91: Error: DeviceMalloc(&result, n_bytes) API call failed: out of memory at model_interface.cu, line49
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 238, in __init__
    self.DLL.AITemplateModelContainerCreate(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/model.py", line 196, in _wrapped_func
    raise RuntimeError(f"Error in function: {method.__name__}")
RuntimeError: Error in function: AITemplateModelContainerCreate

The file size of test.so is 7.4 GB, and I have 24 GB of memory on my 3090.
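If I'm reading the memory-planning log correctly, the load-time allocation alone already exceeds what the card reports. Here is a back-of-the-envelope check (assuming the runtime allocates the activation blob plus the constant buffer up front when the container is created):

max_blob = 19_546_740_864        # from the memory_planning log line above
constant_offset = 7_630_531_776  # constants/weights, same log line
global_mem = 25_438_126_080      # "Global memory available" in the device dump

required = max_blob + constant_offset
print(f"required:  {required / 2**30:.1f} GiB")   # ~25.3 GiB
print(f"available: {global_mem / 2**30:.1f} GiB")  # ~23.7 GiB

So even before any inputs/outputs are allocated, roughly 25.3 GiB would be requested against 23.7 GiB of global memory, which matches the DeviceMalloc failure at container creation.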
I think it is related to the dynamic shapes: when I compile with a low-resolution height/width range, the model loads fine, but when I compile with a high-resolution range, it gives me OOM.
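For reference, this is roughly how I declare the dynamic dims (a simplified sketch; the names, ranges, and channel count here are placeholders, not my actual model code):

from aitemplate.frontend import IntVar, Tensor

height = IntVar(values=[64, 1024], name="height")  # placeholder range
width = IntVar(values=[64, 1024], name="width")
x = Tensor(shape=[1, height, width, 320], dtype="float16", name="x", is_input=True)

My guess is that memory planning has to size every intermediate buffer for the upper bound of these ranges, so widening them inflates max_blob even if I never actually run at the maximum resolution.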
Do you have any suggestions on this issue? Would removing reshape/permute ops help? Or can you provide some insight into why the dynamic dimension range affects memory consumption?
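For completeness, this is how I check how much memory is actually free right before loading the .so (using torch only because it's handy):

import torch

free, total = torch.cuda.mem_get_info()  # returns (free, total) in bytes
print(f"free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")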

Thanks and happy lunar new year!
