Request to Add Option to Disable mmap in transformers | Loading models takes too much time due to mmap in a storage-over-network case #33366

Closed
2 of 4 tasks
mrrfr opened this issue Sep 7, 2024 · 2 comments
Labels
bug · Core: Modeling (Internals of the library; Models.)

Comments


mrrfr commented Sep 7, 2024

System Info

  • Ubuntu 22.04
  • Torch 2.4.0
  • CUDA 12.4
  • Transformers 4.44.2
  • Python 3.11
  • Diffusers 0.30.2

Who can help?

Mainly @ArthurZucker, but it's a more general issue since it involves the base class of transformers pretrained models.

Here's an explanation of the issue:

I am currently using the transformers library to load CLIPTextModel in a Kubernetes environment where I mount an S3 bucket via the S3 CSI driver as a persistent volume to access models. While accessing large files (around 30 GB), I am experiencing severe performance issues, and after investigating, I believe the root cause is related to the forced usage of mmap when loading model weights.

It seems that the current implementation in this section of the code forces the use of mmap without providing an option to disable it. This behavior is highly problematic in storage-over-network use cases, as each mmap call introduces significant latency and performance bottlenecks due to the overhead of network access.

I think the feature was introduced here => #28331

It would be extremely useful if there were a flag or option to disable mmap usage when loading models, allowing users to load the files directly into memory instead. This would enable users like me to avoid the network-bound performance issues.

I've already tried to work around this by playing with environment variables to disable mmap, but I still lose a lot of performance.
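
As a rough sketch of the kind of workaround I mean (the mount path is a placeholder from my setup, and I haven't verified that this fully bypasses the library's own loading path), one could materialize the checkpoint in RAM and hand it to from_pretrained:

```python
import torch
from transformers import CLIPTextModel

# Hypothetical mount path exposed by the S3 CSI PersistentVolume.
model_dir = "/mnt/s3/clip-text-model"

# Read the .bin checkpoint fully into RAM instead of memory-mapping it over
# the network mount (mmap is a keyword argument of torch.load on PyTorch >= 2.1).
state_dict = torch.load(
    f"{model_dir}/pytorch_model.bin",
    map_location="cpu",
    weights_only=True,
    mmap=False,
)

# Hand the already-materialized weights to from_pretrained so it does not
# need to torch.load the network-mounted file itself.
model = CLIPTextModel.from_pretrained(model_dir, state_dict=state_dict)
```

A built-in flag on from_pretrained would obviously be nicer than juggling the state dict manually like this.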

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

It's quite hard to reproduce, as you need an AWS account and the CSI driver. But I believe this issue can be reproduced with any storage-over-network setup.

Anyway, here is the documentation for the driver I used; if needed, it can be deployed quite quickly on a k8s cluster by following it: https://github.com/awslabs/mountpoint-s3-csi-driver?tab=readme-ov-file

Here you can find a deployment manifest: https://github.com/awslabs/mountpoint-s3-csi-driver/blob/main/examples/kubernetes/static_provisioning/static_provisioning.yaml

To reproduce, you just have to put the models in the S3 bucket and try to load them through CLIPTextModel.from_pretrained, as in the sketch below.
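
A minimal loading sketch, assuming the bucket is mounted at /mnt/s3 (the path will differ depending on your PersistentVolume):

```python
from transformers import CLIPTextModel

# /mnt/s3 is a placeholder for the mount point of the PersistentVolume
# created by the static_provisioning example; the model directory should
# contain config.json plus the pytorch_model.bin checkpoint.
model = CLIPTextModel.from_pretrained("/mnt/s3/clip-text-model")
```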

Expected behavior

Loading should be fast.

mrrfr added the bug label on Sep 7, 2024
LysandreJik added the Core: Modeling (Internals of the library; Models.) label on Sep 9, 2024
@LysandreJik (Member) commented

Hey @mrrfr, this is the case only for files that are saved in the .bin format, which are unsafe. Would it be possible for you to use .safetensors files, which are safer and don't use mmap to load?
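
For example, a one-off conversion along these lines (paths are placeholders; safetensors serialization is the default in recent transformers releases) would re-save the checkpoint as safetensors before uploading it to the bucket:

```python
from transformers import CLIPTextModel

# One-off conversion sketch: load the existing .bin checkpoint from a local
# copy, then re-save it so the S3 bucket holds model.safetensors instead of
# pytorch_model.bin. Paths below are placeholders.
model = CLIPTextModel.from_pretrained("path/to/local-clip-text-model")
model.save_pretrained("path/to/converted-clip-text-model", safe_serialization=True)
```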


github-actions bot commented Oct 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
