Description
Hello, I'm not sure whether this is FasterTransformer's issue or the backend's issue, but I'm reporting it here anyway.
As the title says, my model was originally trained with fp16 on Hugging Face, and I converted it to the FasterTransformer weight format.
This is the size of the result folder after running the conversion:

$ du -h -d 1
25G    ./1-gpu
25G    .
As the output shows, the converted FasterTransformer weight folder is 25 GB, the same size as the original Hugging Face model.
The problem occurs when I load it with tritonserver and fastertransformer_backend.
When I load it with fp16, it loads fine:

I0906 06:12:34.269131 83 libfastertransformer.cc:438] Before Loading Weights:
after allocation : free: 78.56 GB, total: 79.15 GB, used: 0.60 GB
I0906 06:12:56.704958 83 libfastertransformer.cc:448] After Loading Weights:
after allocation : free: 54.54 GB, total: 79.15 GB, used: 24.61 GB
But when I load it with bf16, it suddenly takes up twice the memory:

I0906 06:10:11.016121 83 libfastertransformer.cc:438] Before Loading Weights:
after allocation : free: 78.56 GB, total: 79.15 GB, used: 0.60 GB
I0906 06:11:07.674020 83 libfastertransformer.cc:448] After Loading Weights:
after allocation : free: 30.52 GB, total: 79.15 GB, used: 48.63 GB
Since 48.63 GB is almost exactly double the 24.61 GB of the fp16 run, I suspect the weights are being loaded as fp32. Does that mean a model saved as fp16 can't be loaded as bf16, or is it that the GPT-NeoX model in particular doesn't support the bf16 format?
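As a quick sanity check on the fp32 theory, here is a back-of-the-envelope calculation in Python using the numbers from the logs above (the parameter count is my own estimate, not something the server reports):

# Back-of-the-envelope check of the "loaded as fp32" theory.
fp16_bytes = 24.61e9        # memory used after loading with fp16 (from the log)
params = fp16_bytes / 2     # fp16 is 2 bytes/param -> ~12.3B parameters
fp32_bytes = params * 4     # the same weights at 4 bytes/param
print(f"~{params / 1e9:.1f}B params -> {fp32_bytes / 1e9:.2f} GB as fp32")
# ~12.3B params -> 49.22 GB as fp32, close to the 48.63 GB observed with bf16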
Reproduced Steps
In config.pbtxt:
For fp16:
For bf16:
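A minimal sketch of the relevant part of config.pbtxt, assuming the standard fastertransformer_backend model config layout (the surrounding fields are omitted, and only the data_type parameter differs between the two runs):

parameters {
  key: "data_type"
  value: {
    string_value: "fp16"  # set to "bf16" for the failing case
  }
}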