
feat: standardize devices parameter and device initialization #3062

Merged
merged 3 commits into from Aug 31, 2022

Conversation

vblagoje
Member

@vblagoje vblagoje commented Aug 18, 2022

Related Issues

Proposed Changes:

  • Solved by ensuring all components expose a devices constructor parameter and that device selection is uniform across components, i.e. handled by the initialize_device_settings helper function (see the sketch below)
  • These changes require the still-unreleased transformers v4.21.2; once it is released, we should pin that version in our setup.cfg
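As an illustration of the standardized pattern described above, here is a minimal sketch; the component name and import path are assumptions for illustration, and the helper signature follows the diffs shown later in this PR:

from typing import List, Optional, Union

import torch

from haystack.modeling.utils import initialize_device_settings  # import path assumed


class SomeComponent:
    def __init__(
        self,
        use_gpu: bool = True,
        devices: Optional[List[Union[str, torch.device]]] = None,
    ):
        # Mixed str / torch.device entries are accepted; resolution is uniform across components.
        self.devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=False)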

How did you test it?

Not tested yet - in preparation

Todo

Update to transformers v4.21.2
Run test suite on Apple Silicon using the devices parameter in components where appropriate

@vblagoje vblagoje requested review from a team as code owners August 18, 2022 14:50
@vblagoje vblagoje requested review from bogdankostic and removed request for a team August 18, 2022 14:50
@@ -64,6 +65,8 @@ def __init__(
you have at least `embedding_dim`*`scoring_batch_size`*4 bytes available in GPU memory.
    Since the data is originally stored in CPU memory there is little risk of overrunning memory
when running on CPU.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

I would consider changing the doc string to something like List of torch devices (e.g. cuda, cpu, mps) to limit inference to certain devices ... since a torch.device does not have to be a GPU. What do you think?

Contributor

Oh sorry, I just read the docstring for initialize_device_settings and see that devices is only used if use_cuda is True so I think we can leave this unchanged. Perhaps add a sentence explaining that this list is only used if use_gpu is set to True

Member Author

+1, excellent idea

Member Author

I tried this one, but it doesn't seem like it is going to work out of the box. For Tutorial 3, during the pipeline run I get the following exception:

  File "/Users/vblagoje/workspace/haystack/haystack/pipelines/base.py", line 512, in run
    node_output, stream_id = self._run_node(node_id, node_input)
  File "/Users/vblagoje/workspace/haystack/haystack/pipelines/base.py", line 439, in _run_node
    return self.graph.nodes[node_id]["component"]._dispatch_run(**node_input)
  File "/Users/vblagoje/workspace/haystack/haystack/nodes/base.py", line 201, in _dispatch_run
    return self._dispatch_run_general(self.run, **kwargs)
  File "/Users/vblagoje/workspace/haystack/haystack/nodes/base.py", line 245, in _dispatch_run_general
    output, stream = run_method(**run_inputs, **run_params)
  File "/Users/vblagoje/workspace/haystack/haystack/nodes/reader/base.py", line 84, in run
    results = predict(query=query, documents=documents, top_k=top_k)
  File "/Users/vblagoje/workspace/haystack/haystack/nodes/reader/base.py", line 145, in wrapper
    ret = fn(*args, **kwargs)
  File "/Users/vblagoje/workspace/haystack/haystack/nodes/reader/farm.py", line 867, in predict
    predictions = self.inferencer.inference_from_objects(
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/infer.py", line 596, in inference_from_objects
    return self.inference_from_dicts(
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/infer.py", line 578, in inference_from_dicts
    return Inferencer.inference_from_dicts(
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/infer.py", line 349, in inference_from_dicts
    return list(predictions)
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/infer.py", line 424, in _inference_with_multiprocessing
    predictions = self._get_predictions_and_aggregate(dataset, tensor_names, baskets)
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/infer.py", line 512, in _get_predictions_and_aggregate
    logits = self.model.forward(
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/model/adaptive_model.py", line 484, in forward
    output_tuple = self.language_model.forward(
  File "/Users/vblagoje/workspace/haystack/haystack/modeling/model/language_model.py", line 364, in forward
    return self.model(**params, return_dict=return_dict)
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 841, in forward
    embedding_output = self.embeddings(
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/transformers/models/roberta/modeling_roberta.py", line 128, in forward
    inputs_embeds = self.word_embeddings(input_ids)
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/Users/vblagoje/miniconda3/envs/haystack/lib/python3.8/site-packages/torch/nn/functional.py", line 2206, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!

And yes, I have initialised FARMReader with:

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", devices=[torch.device("mps")])

Do you @sjrl have M1/M2 silicon to try it out as well? Use the vblagoje:t_fix_devices branch, of course.

Contributor

Hey @vblagoje what version of PyTorch do you have installed? It looks like this issue could be related to the issue described here: https://discuss.pytorch.org/t/torch-embedding-fails-with-runtimeerror-placeholder-storage-has-not-been-allocated-on-mps-device/152124/6

Contributor

> I traced this issue to the QA inferencer class constructor not passing the devices parameter.

Nice, great catch!

> I'll add the devices parameter in infer.py but we'll have to clean it up soon after we integrate this PR.

What cleanup do you mean exactly here?

> But even with this fix my kernel dies due to some MPS torch unsupported ops.

Does this mean we need to wait on PyTorch to support more ops with MPS before we can use a FARMReader on MPS?

Member Author

The cleanup is related to multiprocessing. I'll create an issue about it today. Yes, unfortunately, I was not able to run FARMReader. Here is the error:

NotImplementedError: The operator 'aten::cumsum.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Using CPU as fallback blew up the kernel. Seems risky even if it worked. So it seems like we'll have to wait a bit more.
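For reference, a minimal sketch of the CPU fallback the error message suggests; the environment variable must be set before torch is imported, and as noted above it still crashed the kernel here, so treat it as experimental:

import os

# Enable the CPU fallback for MPS ops that are not implemented yet (per the PyTorch error above).
# Must be set before importing torch; in this test it still blew up the kernel.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch
from haystack.nodes import FARMReader

reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    devices=[torch.device("mps")],
)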

Member Author

@vblagoje vblagoje Aug 22, 2022

We can like/vote for the comments mentioning aten::cumsum.out

Contributor

> Using CPU as fallback blew up the kernel. Seems risky even if it worked. So it seems like we'll have to wait a bit more.

It seems like the fallback only works if you build pytorch from source: pytorch/pytorch#77764 (comment)

Contributor

There is also a separate issue for aten::cumsum.out in PyTorch here: pytorch/pytorch#79112

@@ -68,8 +68,12 @@ def initialize_device_settings(
devices_to_use = [torch.device("cpu")]
n_gpu = 0
elif devices:
devices_to_use = devices
n_gpu = sum(1 for device in devices if "cpu" not in device.type)
if isinstance(devices[0], str):
Contributor

We could consider replacing this if statement with any(isinstance(x, str) for x in devices) which will check if any of the devices provided are of type string. For example, if the user happens to provide [torch.device('cuda:0'), 'cuda:1'] then we would still construct a list of only torch.device types.

This will work because torch.device(torch.device('cuda')) still returns torch.device('cuda') so there is no danger there.
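A small sketch of the normalization described here (the helper name is hypothetical; the conversion relies on torch.device being safe to re-wrap, as noted above):

import torch

def normalize_devices(devices):
    # If any entry is a string (e.g. "cuda:1"), convert everything to torch.device objects.
    # torch.device(torch.device("cuda")) returns the same device, so re-wrapping is safe.
    if any(isinstance(d, str) for d in devices):
        return [torch.device(d) for d in devices]
    return list(devices)

print(normalize_devices([torch.device("cuda:0"), "cuda:1"]))
# [device(type='cuda', index=0), device(type='cuda', index=1)]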

Member Author

Great point, thank you @sjrl

@@ -64,6 +65,8 @@ def __init__(
you have at least `embedding_dim`*`scoring_batch_size`*4 bytes available in GPU memory.
    Since the data is originally stored in CPU memory there is little risk of overrunning memory
when running on CPU.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -64,6 +65,8 @@ def __init__(
you have at least `embedding_dim`*`scoring_batch_size`*4 bytes available in GPU memory.
    Since the data is originally stored in CPU memory there is little risk of overrunning memory
when running on CPU.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Unused if `use_gpu` is False.
Contributor

Suggested change
Unused if `use_gpu` is False.
This parameter is not used if you set `use_gpu` to False.

@@ -70,11 +71,13 @@ def __init__(
:func:`~farm.infer.Inferencer.close_multiprocessing_pool` after you are
done using this class. The garbage collector will not do this for you!
:param disable_tqdm: Whether to disable tqdm logging (can get very verbose in multiprocessing)
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Can we change it to:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -97,6 +98,9 @@ def __init__(
:param embed_title: Embedded the title of passage while generating embedding
:param prefix: The prefix used by the generator's tokenizer.
:param use_gpu: Whether to use GPU. Falls back on CPU if no GPU is available.
:param progress_bar: Whether to show a progress bar, defaults to True.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Can we change it to:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -341,6 +346,8 @@ def __init__(
:param min_length: Minimum length of generated text
:param num_beams: Number of beams for beam search. 1 means no beam search.
:param use_gpu: Whether to use GPU or the CPU. Falls back on CPU if no GPU is available.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -66,15 +66,14 @@ def __init__(
only predicts a single label. For multi-label predictions, no scaling is applied. Set this
to False if you do not want any scaling of the raw predictions.
:param progress_bar: Whether to show a progress bar while processing the documents.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -104,6 +105,8 @@ def __init__(
:param max_seq_len: Max sequence length of one input table for the model. If the number of tokens of
query + table exceed max_seq_len, the table will be truncated by removing rows until the
input size fits the model.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

Member Author

Yes @agnieszka-m, but see #3062 (comment) for the wording. We shouldn't say GPUs, as that implies GPUs are the only devices that can be used - it can be any device available on the machine and addressable as a torch device (cuda, cpu, mps, xla, etc.)

Member Author

Why not simply keep what it is now: List of torch devices (e.g. cuda, cpu, mps) to limit inference to certain devices and not use all available ones (e.g. [torch.device('cuda:0')]). Unused if use_gpu is False. How would you reword it, @agnieszka-m?

Member Author

How about List of torch devices (e.g. cuda, cpu, mps) to limit inference to specific devices. A list containing torch device objects and/or strings is supported (e.g. [torch.device('cuda:0'), "mps", "cuda:1"]). Specifying use_gpu=False cancels the devices parameter and falls back to a single cpu device. @agnieszka-m @sjrl ?

Contributor

This looks good! I would only suggest a few edits

List of torch devices (e.g. cuda, cpu, mps) to limit inference to specific devices. 
A list containing torch device objects and/or strings is supported (For example [torch.device('cuda:0'), "mps", "cuda:1"]). 
When specifying `use_gpu=False` the devices parameter is not used and a single cpu device is used for inference.

Member Author

This is perfect, I'll go with @sjrl's version, @agnieszka-m.

@@ -66,10 +68,12 @@ def __init__(
:param max_seq_len: max sequence length of one input text for the model
:param doc_stride: length of striding window for splitting long texts (used if len(text) > max_seq_len)
:param batch_size: Number of documents to process at a time.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -88,10 +90,12 @@ def __init__(
Important: The summary will depend on the order of the supplied documents!
:param batch_size: Number of documents to process at a time.
:param progress_bar: Whether to show a progress bar.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@@ -63,10 +65,12 @@ def __init__(
:param clean_up_tokenization_spaces: Whether or not to clean up the tokenization spaces. (default True)
:param use_gpu: Whether to use GPU or the CPU. Falls back on CPU if no GPU is available.
:param progress_bar: Whether to show a progress bar.
:param devices: List of GPU devices to limit inference to certain GPUs and not use all available ones (e.g. [torch.device('cuda:0')]).
Contributor

Does it mean the following:
List of GPU devices you want to limit inference to so as not to use all available GPUs. Example: [torch.device(cuda:0)].

@vblagoje
Member Author

@sjrl, this one is ready for review now. cc @bogdankostic

@@ -341,6 +348,10 @@ def __init__(
:param min_length: Minimum length of generated text
:param num_beams: Number of beams for beam search. 1 means no beam search.
:param use_gpu: Whether to use GPU or the CPU. Falls back on CPU if no GPU is available.
:param devices: List of torch devices (e.g. cuda, cpu, mps) to limit inference to specific devices.
Contributor

Could you add the missing docstring for the progress_bar?

self.model = _Text2SpeechModel.from_pretrained(
-    model_name_or_path, device=devices[0].type, **(transformers_params or {})
+    model_name_or_path, device=resolved_devices[0].type, **(transformers_params or {})
Contributor

From what I understand here, it looks like only one device can be used to load the _Text2SpeechModel. If that is true, should we enforce that devices can only have a length of 1 using an assert statement? Otherwise users might be confused as to why only one device is used if they provide multiple.

Contributor

Thinking more on this I wonder if it would be better to change the parameter from devices to device and make it type Union[str, torch.device] (so removing the List aspect altogether). Then we could just make sure to pass device as a list to initialize_device_settings(). For example,

resolved_devices, _ = initialize_device_settings(devices=[device], use_cuda=use_gpu, multi_gpu=False)

What do you think?

Member Author

I thought about this as well. But is it possible that in the future there'll be support for multiple devices per component? The best trade-off seems to be to allow a list but to log a warning that if multiple devices are passed, only the first one is selected. We can do this in all cases where only one device is currently supported. Thoughts?

Contributor

Yeah, that sounds good to me. I like the idea of just logging a warning, but having the code still run.

- devices, _ = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
- device = 0 if devices[0].type == "cuda" else -1
+ resolved_devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=False)
+ device = 0 if resolved_devices[0].type == "cuda" else -1
Contributor

Does this mean that only cuda or cpu is supported for this document classifier? If so we may want to update the docstring to say that mps isn't supported for this class.

Member Author

Wow, good catch. No reason not to support mps here. I'll remove this.

Contributor

Just as a heads up, this line looks to be present in quite a few places throughout Haystack.

Member Author

Well, actually, right now pipelines do not support devices other than cuda or cpu. HF unfortunately uses: torch.device("cpu" if device < 0 else f"cuda:{device}")
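For context, a rough sketch of the integer device convention HF pipelines expected at the time, which is why only cpu and cuda can be targeted here; the model name and task below are illustrative only, not taken from this PR:

import torch
from transformers import pipeline

# Resolve to a single torch.device first (e.g. via initialize_device_settings), then map it to
# the integer convention transformers pipelines used before mps support: -1 = CPU, >= 0 = CUDA index.
resolved_device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")
hf_device = 0 if resolved_device.type == "cuda" else -1

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=hf_device,
)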

Contributor

@sjrl sjrl Aug 23, 2022

Oh, that's frustrating. They have a comment related to this issue here: huggingface/transformers#17342 (comment). Do you know whether this line from HF torch.device("cpu" if device < 0 else f"cuda:{device}") has been updated yet? If not, we could think about opening an issue.

Contributor

@sjrl sjrl Aug 23, 2022

Ahh, it appears this has been fixed on the current main branch of HF: https://github.com/huggingface/transformers/blob/d90a36d192e2981a41122c30a765c63158dd0557/src/transformers/pipelines/base.py#L763-L773
I think we will need to wait until they release a new version of the Transformers library before we can add support for mps here.

Member Author

Update: good news @sjrl, HF will support MPS for pipelines in the next minor release. See huggingface/transformers#18494 for more details.

):
super().__init__()

- self.devices, _ = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
+ self.devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=False)
Contributor

@sjrl sjrl Aug 23, 2022

Also, consider adding an assert here for devices to be length 1 since we only support a single device for the model to be loaded onto.

Additionally below (on line 58) we have the line device=0 if self.devices[0].type == "cuda" else -1 which makes me concerned that mps would not be supported for loading this model.

Member Author

@vblagoje vblagoje Aug 23, 2022

Yes, not supported on mps right now. Will make notes for the components where only cpu and cuda are supported.

@@ -81,6 +86,10 @@ def __init__(
:type batch_size: int (optional)
:param progress_bar: Whether to show a progress bar, defaults to True.
:type progress_bar: bool (optional)
:param devices: List of torch devices (e.g. cuda, cpu, mps) to limit CrossEncoder inference to specific devices.
Contributor

Please add a docstring for use_gpu

@@ -98,9 +107,9 @@ def __init__(
)
else:
raise ValueError("Provide either a QuestionGenerator or a non-empty list of questions/document pairs.")

self.devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=False)
Contributor

Consider adding an assert for devices to be of length 1, since we do not support multi-GPU loading.

Member Author

+1, though not an assert but an actual runtime check, as asserts can be removed during code optimization (see the sketch below).
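A sketch of what such a runtime check could look like; the helper and component names are hypothetical, and unlike an assert the check is not stripped when Python runs with optimizations:

import logging
import torch

logger = logging.getLogger(__name__)


def first_device_only(resolved_devices, component_name):
    # Warn instead of asserting: asserts can be removed with `python -O`, a plain check cannot.
    if len(resolved_devices) > 1:
        logger.warning(
            "Multiple devices are not supported in %s inference, using the first device %s.",
            component_name,
            resolved_devices[0],
        )
    return resolved_devices[0]


device = first_device_only([torch.device("cpu"), torch.device("cpu")], "SomeComponent")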

- devices, _ = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
- device = 0 if devices[0].type == "cuda" else -1
+ resolved_devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=False)
+ device = 0 if resolved_devices[0].type == "cuda" else -1
Contributor

Same comment as above about whether mps would be supported here.

"""
super().__init__()
- self.devices, _ = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
+ self.devices, _ = initialize_device_settings(devices=devices, use_cuda=use_gpu, multi_gpu=False)
Contributor

Same comment as above about adding an assert for devices being length 1.

:param devices: List of torch devices (e.g. cuda, cpu, mps) to limit inference to specific devices.
A list containing torch device objects and/or strings is supported (For example
[torch.device('cuda:0'), "mps", "cuda:1"]). When specifying `use_gpu=False` the devices
parameter is not used and a single cpu device is used for inference.
Contributor

Duplicate docstring for devices. It appears again a few lines down. Please delete one.

Contributor

@bogdankostic bogdankostic left a comment

Good job, @vblagoje! And great review, @sjrl! I agree with all of your comments.

How should we proceed about the fact that "mps" is only supported as of the next release in transformers pipelines?

@vblagoje
Member Author

> Good job, @vblagoje! And great review, @sjrl! I agree with all of your comments.
>
> How should we proceed about the fact that "mps" is only supported as of the next release in transformers pipelines?

Thanks Bogdan. It will be supported in 4.22.2, which should be released imminently. After we bump to that HF release, we'll integrate this change. Hopefully we can release it in 1.8 - we'll see. Anyway, there is no rush to integrate this one until the HF upgrade and a thorough test on mps devices as well.

@vblagoje
Member Author

@sjrl @bogdankostic please have a quick look at the last few commits and the changes made.

if len(resolved_devices) > 1:
    logger.warning(
        f"Multiple devices are not supported in {self.__class__.__name__} inference, "
        f"using the first device {resolved_devices}."
Contributor

Change to resolved_devices[0] to grab the first device.

@sjrl
Contributor

sjrl commented Aug 23, 2022

Looks good! One last comment. Also, I would consider adding a task to the description to remind us to pin the HF library to the new version in our setup.cfg because these changes will not be backwards compatible with older versions of the HF Transformers library.

@bogdankostic
Contributor

Also looks good to me! Looking forward to using this soon 🚀

@vblagoje
Member Author

HF released transformers v4.21.2 which includes pipeline fixes we were waiting for. Can we pin this transformers release for our 1.8 release @julian-risch @masci?

@julian-risch
Member

julian-risch commented Aug 25, 2022

> HF released transformers v4.21.2 which includes pipeline fixes we were waiting for. Can we pin this transformers release for our 1.8 release @julian-risch @masci?

Yes, sounds good. What you could do as preparation now is create a PR that pins the version to 4.21.2 and check whether any tests fail, for example due to deprecated/discontinued features or breaking changes. Further, you can already have a look at the changelog and check for any changes that affect Haystack in the jump from the currently pinned version 4.20.1 to 4.21.2. If there are any relevant changes, please add them to the description of that PR.
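As a rough sketch of the preparation PR described above, the pin in setup.cfg would look roughly like this (section layout assumed, not copied from the repo):

[options]
install_requires =
    transformers==4.21.2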

@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@CLAassistant

CLAassistant commented Aug 29, 2022

CLA assistant check
All committers have signed the CLA.

@vblagoje
Member Author

Although the torch mps backend still does not have some crucial operators implemented, and we therefore cannot yet use Haystack on Apple Silicon, I think we should integrate this PR now. We are still early on our path to the 1.9 release, so we can catch potential issues sooner. I am mostly concerned about subtle bugs that might not be visible to our test suite. Once torch implements the operators we need, we can try out Haystack on Apple Silicon and hopefully it will work. That way we'll have support in the 1.9 release and not later. LMK what you guys think @masci @julian-risch @ZanSara @brandenchan @bogdankostic

Contributor

@sjrl sjrl left a comment

This looks really good! Thanks for hunting down all the places in Haystack that use GPUs!

@vblagoje
Member Author

@agnieszka-m, I think we addressed the issues you raised. If so, would you please approve as well? We can't merge without your approval.

@vblagoje vblagoje requested a review from agnieszka-m August 31, 2022 12:42
@vblagoje vblagoje merged commit 356537c into deepset-ai:main Aug 31, 2022
@masci masci changed the title Standardize devices parameter and device initialization feat: standardize devices parameter and device initialization Sep 21, 2022
@vblagoje vblagoje deleted the t_fix_devices branch February 28, 2023 12:08
Development

Successfully merging this pull request may close these issues.

Support M1 GPU in FARMReader
6 participants