langchain_huggingface: Fix multiple GPU usage bug in from_model_id function #23628
Conversation
- Avoid the ambiguity of the device_map variable by raising an error when people try to pass device_map as a key-value pair in model_kwargs
- Remove the unused device_map variable in the hf_pipeline function call
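For illustration, one plausible reading of the first bullet is a check along these lines (a sketch only; the function name and placement are assumptions, and the actual change is in this PR's diff):

```python
from typing import Any, Dict, Optional


# Sketch of the described validation (assumed names): refuse an ambiguous
# device_map that arrives both as an explicit argument and inside model_kwargs.
def check_device_map(device_map: Optional[str], model_kwargs: Dict[str, Any]) -> None:
    if "device_map" in model_kwargs and device_map is not None:
        raise ValueError(
            "`device_map` was given both as an argument and inside `model_kwargs`; "
            "please specify it only once."
        )
```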
@Jofthomas could you take a look and mark it ready for review by the langchain team if it looks good on the huggingface side?
@efriis I wanted to kindly follow up on the pull request I submitted over a month ago. I understand that everyone is busy, but I would greatly appreciate it if you could take a moment to check it and let me know if there's anything further I need to address.
Hey, sorry for the wait, I will check it.
Looking at the … If this is incorrect, we should probably fix that in …
Thanks for the reply. Actually, my PR addresses the issue you mentioned. The current implementation has a problem where the … The … I tried my best to clarify the reasoning, but the overlap in variable names between these two functions might make it a bit confusing.
Doesn't this seem like a bug in …? As a holdover, can't you just pass …?
@efriis I don't believe this is a bug in … Additionally, I believe my implementation won't negatively impact existing users who pass …
Yes, but doesn't this documentation make …? https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline
(not recreating the model object - just the routing of the parameter)
@efriis, I’m not entirely clear on the point you made. Both … As a result, regardless of the value or data type of … I understand that … In summary, the existing implementation is here:

langchain/libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py, lines 99 to 130 at 038c287

langchain/libs/partners/huggingface/langchain_huggingface/llms/huggingface_pipeline.py, lines 216 to 225 at 038c287
I think you might be confusing the … In order to help me understand your case, could you write 2 code snippets?
@efriis First of all, I didn't mix up the … Here are the two snippets:
```python
from transformers import pipeline

pipe = pipeline(model="meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto")
# Checking GPU memory usage with nvidia-smi shows the model is properly distributed across multiple GPUs.
```
```python
from langchain_huggingface import HuggingFacePipeline

pipe = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    device_map="auto",
)
# The following message is printed in the terminal, showing that no device is used:
# Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
```

Lastly, for your question, the answer is no for the current implementation, and that's exactly why I am making this PR to fix this counter-intuitive bug. Also, I realized that this bug persists across the two HuggingFacePipeline implementations, one from langchain_huggingface and the other from langchain_community.
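For reference, the langchain_community counterpart mentioned above can be exercised the same way; the snippet below assumes its from_model_id accepts the same arguments, and reuses the model id from the example above:

```python
# Same reproduction against the langchain_community implementation mentioned above
# (assuming its from_model_id takes the same arguments as the partner package).
from langchain_community.llms import HuggingFacePipeline as CommunityHuggingFacePipeline

pipe = CommunityHuggingFacePipeline.from_model_id(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    device_map="auto",
)
```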
Thank you for clarifying! I misunderstood the original point - sorry about that! If we pass a model object instead of a model string to …, … I'll reopen and plan on merging after the 0.3 changes go in. The main thing that would be great to get into the 0.3 releases would be to raise an exception instead of a warning when both are set.
```python
if model_kwargs is None:
    model_kwargs = {}
model_kwargs["device_map"] = device_map
device = None
```
`device = None`
Same reason as the comment above: this line was used as a temporary fix. If the default value of device is changed, this line is no longer needed.
```python
if device_map is not None:
    if device is not None:
        logger.warning(
```
Suggested change:

```diff
-        logger.warning(
+        raise ValueError(
```
If we simply escalate it from a warning to an error, it would always be triggered when we set device_map, because the default value of device is -1 (CPU) in the current implementation. I checked the transformers documentation and realized that the default value of device for pipeline changed from -1 to None. I tested both values and they have the same behavior of loading the model onto the CPU; the only difference is the log message printed in the terminal:
```python
pipe = HuggingFacePipeline.from_model_id(model_id="microsoft/phi-2", task="text-generation", device=None)
# message: Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.

pipe = HuggingFacePipeline.from_model_id(model_id="microsoft/phi-2", task="text-generation", device=-1)
# message: Device has 1 GPUs available. Provide device={deviceId} to `from_model_id` to use available GPUs for execution. deviceId is -1 (default) for CPU and can be a positive integer associated with CUDA device id.
```
Although the log message for device=-1 looks better, given that -1 is a legacy value that could cause conflicts, updating the default to None aligns the behavior with the latest practice and prevents unexpected errors in the future. So I think we should also change the default value of device in line 77 while changing this from a warning to an error.
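To make the proposal concrete, the signature change being argued for would look roughly like this (a sketch under the assumption that only the default value moves; the real from_model_id has more parameters):

```python
from typing import Any, Optional


# Sketch: move the default for `device` from -1 (force CPU) to None so that
# setting device_map alone no longer trips the "both are set" check.
def from_model_id(
    model_id: str,
    task: str,
    device: Optional[int] = None,       # previously -1 in the current implementation
    device_map: Optional[str] = None,
    **kwargs: Any,
) -> "HuggingFacePipeline":
    if device is not None and device_map is not None:
        raise ValueError("Specify either `device` or `device_map`, not both.")
    ...
```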
```diff
@@ -218,7 +230,6 @@ def from_model_id(
     model=model,
     tokenizer=tokenizer,
     device=device,
```
Suggested change:

```diff
     device=device,
+    device_map=device_map,
```
Would it make sense to keep this passthrough in case transformers uses it in the future?
Similar reason as above. There is also a similar parameter validation in pipeline that checks that device and device_map are not both set. So it would always be triggered, as the current default value for device is -1 and device_map is also not None (either 'auto' or a dictionary). So we can't simply keep this passthrough without considering the device parameter value.
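Roughly, the check being referred to behaves like this (a paraphrase of the behavior described in the comment, not the exact transformers source):

```python
# Paraphrase of the pipeline-side validation described above: a call that sets
# both `device` and `device_map` is rejected, so a non-None default for `device`
# combined with any device_map would always trigger it.
def validate_device_arguments(device, device_map) -> None:
    if device is not None and device_map is not None:
        raise ValueError(
            "Both `device` and `device_map` were provided; use only one of them."
        )
```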
@efriis sorry, I am not very familiar with git and accidentally committed all your suggestions. In short, your suggestions might cause a bug without changing the default value of another parameter.
@efriis I’ve reviewed your suggestions and made the necessary changes accordingly. Please have a look and let me know if anything else needs further adjustment.
@efriis
hey! sorry for the delay here - having some trouble getting a hold of the huggingface team regarding their open PRs, and I'm not as familiar with huggingface's multiple-GPU setup, which also doesn't get tested in our CI, so I can't confirm whether this works or not. @Jofthomas if you could please take a look. Otherwise I'm going to ship it as-is early next week.
Thanks for the update.
Accelerate/transformers maintainer here! LGTM! As you said, we need to pass the device_map in the _model_kwargs if we are going to load the model before passing it to the pipeline.
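In other words (a sketch of the flow the maintainer describes, using standard transformers APIs; the model id is only an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The model is loaded first, so device_map has to reach from_pretrained here
# (this is what routing it through model_kwargs achieves in from_model_id).
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The already-placed model object is then handed to pipeline(); no device
# argument is passed, so the accelerate placement is left untouched.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```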
When using the from_model_id function to load a Hugging Face model for text generation across multiple GPUs, the model defaults to loading on the CPU despite multiple GPUs being available, even when using the expected format, as sketched below.
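Presumably (based on the reproduction earlier in the thread) the expected call looks like this:

```python
from langchain_huggingface import HuggingFacePipeline

# Expected usage: pass device_map directly to from_model_id.
llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    device_map="auto",
)
```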
Currently, to enable multiple GPUs, we have to pass the setting in a different format instead, as sketched below.
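Presumably, given the maintainer comment above about routing device_map through model_kwargs, the current workaround is something like:

```python
from langchain_huggingface import HuggingFacePipeline

# Current workaround: route device_map through model_kwargs so that
# from_pretrained (rather than the pipeline call) receives it.
llm = HuggingFacePipeline.from_model_id(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    model_kwargs={"device_map": "auto"},
)
```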
This issue arises due to improper handling of the device and device_map parameters.
This error occurs because, in from_model_id, the default values for device and device_map are -1 and None, respectively. The call gets past the statement (`device_map is not None and device < 0`) with device kept as -1, so the pipeline function later raises an error when trying to move a GPU-loaded model back to the CPU.

langchain/libs/community/langchain_community/llms/huggingface_pipeline.py, lines 204 to 213 at 19eb82e
If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.