
macOS 15.0 (24A335) M1 buffer is not large enough and resource_tracker: There appear to be %d #107

Open
guoreex opened this issue Sep 17, 2024 · 29 comments

Comments

@guoreex

guoreex commented Sep 17, 2024

I'm not sure if this question is appropriate to ask here, as I'm not a professional programmer. If anyone is willing to offer help and guidance, I would be very grateful.

Two weeks ago I started using the GGUF model, and it worked normally. Today I upgraded my MacBook Pro M1 to the latest macOS 15.0 (24A335), and running a GGUF workflow in ComfyUI now produces this error:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 63700992 bytes
'/Users/***/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

My system information:
Python version: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ]
pytorch version: 2.6.0.dev20240916
ComfyUI Revision: 2701 [7183fd16] | Released on '2024-09-17'

I don't know if this is related to updating the system.
Thanks.

@city96
Owner

city96 commented Sep 17, 2024

Could you test with the FP16/FP8 model and the default nodes, without this custom node pack? If it still happens with those, it might be more appropriate for the ComfyUI repo, since the error makes it sound like it's not a problem with this node pack. I could be wrong, though.

The warning also makes it sound like setting the environment variable export TOKENIZERS_PARALLELISM=false could possibly fix it? Might be worth testing.
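For anyone launching ComfyUI through their own Python script rather than the shell, a minimal sketch of the same workaround (the variable name comes straight from the warning above; the key detail is that it must be set before `tokenizers`/`transformers` is first imported):

```python
import os

# Hypothetical launcher-side workaround: disable tokenizers parallelism
# before any library backed by huggingface/tokenizers is imported, since
# the setting is read at import/first-use time.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
```

This is equivalent to running `export TOKENIZERS_PARALLELISM=false` in the shell before starting ComfyUI.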

@guoreex
Author

guoreex commented Sep 17, 2024

Thank you for your reply.

My computer only has 16GB of RAM, which is not enough to run the FP8 model.

After setting export TOKENIZERS_PARALLELISM=false, the error still occurs:

...
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 323.94775390625 True
Requested to load FluxClipModel_
Loading 1 new model
Requested to load Flux
Loading 1 new model
loaded completely 0.0 6456.9610595703125 True
  0%|                                                              | 0/4 [00:00<?, ?it/s]/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 63700992 bytes
'
/Users/***/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

The error occurs after starting the generation calculation.

@city96
Owner

city96 commented Sep 17, 2024

Well, at least there's a progress bar now lol, the buffer error is still there though...

I don't have any Apple device to test on, but it looks like there's a similar issue on the PyTorch tracker with a linked PR; not sure if the cause is the same, though. Might be worth keeping an eye on and testing on the latest nightly once it gets merged: pytorch/pytorch#136132

@tombearx

Still have the issue using today's nightly build. Anyone else?

@thenabytes

thenabytes commented Sep 23, 2024

M2 Macbook Air, 16GB RAM
Sequoia 15.0
Python version: 3.12.6 (main, Sep 6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
pytorch version: 2.6.0.dev20240923
ComfyUI Revision: 2724 [3a0eeee3] | Released on '2024-09-23'

Requested to load Flux
Loading 1 new model
loaded completely 0.0 7867.7110595703125 True
  0%|                                                     | 0/20 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 77856768 bytes

@jonny7737

jonny7737 commented Sep 25, 2024

M2 Max Mac Studio, 64GB RAM
Sequoia 15.0
Python 3.11.9

Only when running GGUF models (fp16 fp8 work fine)

/AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 77856768 bytes

Slight correction: flux1-dev-Q8_0.GGUF WORKS!!
Correcting the correction: Q8 does not work (the working test was before Sequoia).

@tombearx

> M2 Max Mac Studio, 64GB RAM, Sequoia 15.0, Python 3.11.9
>
> Only when running GGUF models (fp16 fp8 work fine)
>
> /AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 77856768 bytes
>
> Slight correction: flux1-dev-Q8_0.GGUF WORKS!!

Does Q8 work? What PyTorch version are you using?

@jonny7737

> M2 Max Mac Studio, 64GB RAM, Sequoia 15.0, Python 3.11.9
> Only when running GGUF models (fp16 fp8 work fine)
> /AppleInternal/Library/BuildRoots/5a8a3fcc-55cb-11ef-848e-8a553ba56670/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:891: failed assertion `[MPSNDArray, initWithBufferImpl:offset:descriptor:isForNDArrayAlias:isUserBuffer:] Error: buffer is not large enough. Must be 77856768 bytes
> Slight correction: flux1-dev-Q8_0.GGUF WORKS!!
>
> Does Q8 work? What PyTorch version are you using?

I just retested Q8 and it does not work :( The working test was before Sequoia. Sorry for the false hope.

@jonny7737

This is the only GGUF that I have found to work since the Sequoia update:

https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-F16.gguf

@tombearx

Guys, I've tested torch==2.4.1 and GGUF Q8 works with it.

@jonny7737

jonny7737 commented Sep 30, 2024

What is the Mac config for your test? (Which M-series chip, how much RAM?)

I can't install pytorch==2.4.1 because it requires Python < 3.9.

@tombearx

Strange, I use python 3.11.

M1 Max, 32gb

@jonny7737

I use 3.11 as well, but the install of torch 2.4.1 failed due to the Python version.
Very strange. I'll try again.
Thanks.

@bauerwer

bauerwer commented Sep 30, 2024

Same issue here: flux GGUFs bail out with a memory allocation error in MPS (Error: buffer is not large enough. Must be 77856768 bytes). They worked on macOS 14.x but no longer on macOS 15.x; same issue with torch 2.4.1 and 2.6.0.dev20240924 (last week's nightly).
For reference, since I can run the heavier flux models (M3 Max, 128GB RAM): the direct flux models work fine. I'd still love to run GGUFs, for the lower RAM use and the speed.

@jonny7737

FINALLY!!!
After six tries, the pytorch 2.4.1 install completed successfully.
A simple test with a Q5 GGUF model did not abort ComfyUI.
But the image generated at an absolutely appalling 45 seconds per iteration.

It works, but it is not usable.

@cchance27

cchance27 commented Oct 11, 2024

There's something going on with the 2.6 nightly builds: every one of them breaks the GGUF code. A 32GB machine that works fine with Q8 on 2.4.1 fails every time with this semaphore error when moved to a nightly.

I can't say whether it's specifically Sequoia + the 2.6 nightlies, but I can confirm Sequoia + 2.4.1 + GGUF works fine, while Sequoia + 2.6 + GGUF bails every time.

This is super annoying, because the 2.6 nightlies finally added support for autocast on MPS.

@craii

craii commented Oct 12, 2024

> There's something going on with the 2.6 nightly builds: every one of them breaks the GGUF code. A 32GB machine that works fine with Q8 on 2.4.1 fails every time with this semaphore error when moved to a nightly.
>
> I can't say whether it's specifically Sequoia + the 2.6 nightlies, but I can confirm Sequoia + 2.4.1 + GGUF works fine, while Sequoia + 2.6 + GGUF bails every time.

Thank you bro! With pytorch 2.4.1, it works again!

@craii

craii commented Oct 12, 2024

> I can confirm Sequoia + 2.4.1 + GGUF works fine, while Sequoia + 2.6 + GGUF bails every time.

@city96
Hello! I think this could be added to the README as a temporary fix guide.

@city96
Owner

city96 commented Oct 15, 2024

@craii Added it under the installation section w/ a link to this issue thread.

@cchance27

> appalling 45 seconds per iteration.

Just so you know, I haven't tested them all, but with Q8_0 on M3 and torch 2.4.1 I get ~16-17s/it. Q5 and Q8_4 (I've been playing with custom quants) are at 40-50s/it, which is insane; not sure why it's so bad. But yeah, Q8_0 loads and runs fastest so far.

@Vargol

Vargol commented Oct 15, 2024

Q8 is faster because it can run fully on the GPU; the other quant types use a shift function that has to fall back to running on the CPU.

For example, if Comfy is not hiding it in the terminal, you should see something like this:

The operator 'aten::__rshift__.Tensor' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)

when using the other models. This was taken from a Q6_K run in InvokeAI.
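As a rough, CPU-only illustration of the kind of shift-and-mask unpacking that sub-8-bit quant formats need (this is a generic 4-bit unpack sketch, not the actual GGUF Q4/Q6 kernel): the `>>` below is what becomes `aten::__rshift__.Tensor` when applied to tensors, and that is the op falling back from MPS to the CPU.

```python
def unpack_nibbles(packed: bytes) -> list[int]:
    """Split each byte into its low and high 4-bit values.
    Illustrative only: sub-8-bit GGUF formats pack several weights
    per byte, so dequantization needs shifts and masks like these.
    Q8_0 stores one byte per weight and needs no shift, which is
    consistent with it avoiding the MPS -> CPU fallback above."""
    out = []
    for b in packed:
        out.append(b & 0x0F)   # low nibble
        out.append(b >> 4)     # high nibble: the right shift in question
    return out

# Example: 0xAB unpacks to low nibble 0x0B and high nibble 0x0A.
values = unpack_nibbles(bytes([0xAB, 0x01]))
```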

@jack813

jack813 commented Oct 21, 2024

After PyTorch nightly 2.6.0.dev20241020, the problem has been fixed. I can run the GGUF-quantized Flux.1 Dev Q4_0 on my MacBook M1 Pro with 16GB of memory.
(screenshots attached)
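Putting the reports in this thread together (stable 2.4.1 works, 2.6.0 dev nightlies before dev20241020 hit the MPS buffer assertion with GGUF models, dev20241020 and later appear fixed), here is a hypothetical startup check one could use to warn about affected builds. The version boundaries come from user reports in this thread, not from any official changelog:

```python
def torch_mps_gguf_affected(version: str) -> bool:
    """Heuristic from this issue thread: PyTorch 2.6.0 dev builds
    dated before 2024-10-20 hit the MPS 'buffer is not large enough'
    assertion with GGUF models; stable 2.4.1 and dev20241020+
    reportedly work. Boundaries are user-reported, not official."""
    if ".dev" not in version:
        return False  # stable releases such as 2.4.1 reportedly work
    base, _, datestamp = version.partition(".dev")
    # Same-length digit strings compare correctly as strings.
    return base == "2.6.0" and datestamp < "20241020"

# Example use in a launcher (torch import left to the caller):
# if torch_mps_gguf_affected(torch.__version__):
#     print("Warning: this nightly reportedly breaks GGUF on MPS")
```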

@jonny7737

> After PyTorch nightly 2.6.0.dev20241020, the problem has been fixed. I can run the GGUF-quantized Flux.1 Dev Q4_0 on my MacBook M1 Pro with 16GB of memory.

M2 Max 64GB: after installing the 20241020 nightly, GGUF seems to work again. Thanks for the heads up.

@jeanjerome

I also managed to get a GGUF working with pytorch 2.6.0.dev20241020 (py3.10, pytorch-nightly) on Sequoia 15.0.1.

@ReZeroS

ReZeroS commented Oct 25, 2024

conda install pytorch-nightly::pytorch torchvision torchaudio -c pytorch-nightly

@jeanjerome

Or simply conda install pytorch torchvision torchaudio -c pytorch-nightly (https://developer.apple.com/metal/pytorch/)

@craii

craii commented Oct 25, 2024

M3 24GB works properly on the Q4 schnell model after installing the 20241020 nightly. But it seems to consume much more memory for the same generation parameters (it now takes 2529GB where only 1720GB was reported before).

@cchance27

Just use 2.4.1, not the nightly, and report the regression to the PyTorch team; they've already fixed some of the other regressions.
