Getting embedding of a sequence #89
Please try this:
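A sketch of the kind of loading call being suggested here, inferred from the follow-up comment below; the attn_implementation value shown is an assumption, not a confirmed fix or the exact snippet that was posted:

import transformers

# Sketch: load DNABERT-2 while requesting a non-flash attention implementation.
# NOTE: attn_implementation="eager" is an assumed value, not a confirmed fix.
model = transformers.AutoModel.from_pretrained(
    "zhihan1996/DNABERT-2-117M",
    trust_remote_code=True,
    attn_implementation="eager",
)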
It appears to me that AutoModel.from_pretrained() does not accept attn_implementation as a parameter, although the config does, so I got an error with your code. I tried changing it a bit by loading the model as follows: config = transformers.AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True), and it seems to load fine. However, I still get two problems in the forward pass, the first being this warning: huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
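For reference, a minimal sketch of that config-based loading path (the TOKENIZERS_PARALLELISM line is an added assumption that only silences the fork warning quoted above):

import os
import transformers

# Optional: silence the huggingface/tokenizers fork warning quoted above.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

model_name_or_path = "zhihan1996/DNABERT-2-117M"

# Load the config first, then hand it to from_pretrained(), rather than
# passing attn_implementation directly to AutoModel.from_pretrained().
config = transformers.AutoConfig.from_pretrained(model_name_or_path, trust_remote_code=True)
model = transformers.AutoModel.from_pretrained(
    model_name_or_path,
    config=config,
    trust_remote_code=True,
)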
Hello, I am trying to get the output embeddings of my dataset from DNABERT2 and then use them with another model, using the following code:
import os
import transformers
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = transformers.AutoModel.from_pretrained(
"zhihan1996/DNABERT-2-117M",
trust_remote_code=True
)
model.config.use_cache = False
model.config.pretraining_tp = 1
model.eval()
tokenizer = transformers.AutoTokenizer.from_pretrained(
"zhihan1996/DNABERT-2-117M",
model_max_length=42,
padding_side="right",
use_fast=True,
trust_remote_code=True,
truncation=True,
padding='max_length',
max_length=40
)
seqs = ["ATCTAGCTAGACGTTACGCTACGCATGTACGTACGCTCAGTAGCATGCTAGCT","CGTAGGTCGTCTAGCTGATCAGTACGCATGCATAGCTAGCTGCATCGTAGCATCGATGATCGATCGATGATGC"]
model.to(device)
inputs = tokenizer(seqs, padding='max_length', truncation=True, max_length=40, return_tensors='pt')
inputs = {key: value.to(device) for key, value in inputs.items()}
with torch.no_grad():
    outputs = model(**inputs)
When I run this code, I get the following error:
AssertionError Traceback (most recent call last)
Cell In[11], line 9
7 print(inputs)
8 with torch.no_grad():
----> 9 outputs = model(**inputs)
10 print(outputs)
File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/bert_layers.py:609, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs)
606 first_col_mask[:, 0] = True
607 subset_mask = masked_tokens_mask | first_col_mask
--> 609 encoder_outputs = self.encoder(
610 embedding_output,
611 attention_mask,
612 output_all_encoded_layers=output_all_encoded_layers,
613 subset_mask=subset_mask)
615 if masked_tokens_mask is None:
616 sequence_output = encoder_outputs[-1]
File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/bert_layers.py:447, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask)
445 if subset_mask is None:
446 for layer_module in self.layer:
--> 447 hidden_states = layer_module(hidden_states,
448 cu_seqlens,
449 seqlen,
450 None,
451 indices,
452 attn_mask=attention_mask,
453 bias=alibi_attn_mask)
454 if output_all_encoded_layers:
455 all_encoder_layers.append(hidden_states)
File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/bert_layers.py:328, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias)
306 def forward(
307 self,
308 hidden_states: torch.Tensor,
(...)
314 bias: Optional[torch.Tensor] = None,
315 ) -> torch.Tensor:
316 """Forward pass for a BERT layer, including both attention and MLP.
317
318 Args:
(...)
326 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
327 """
--> 328 attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
329 subset_idx, indices, attn_mask, bias)
330 layer_output = self.mlp(attention_output)
331 return layer_output
File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/bert_layers.py:241, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias)
219 def forward(
220 self,
221 input_tensor: torch.Tensor,
(...)
227 bias: Optional[torch.Tensor] = None,
228 ) -> torch.Tensor:
229 """Forward pass for scaled self-attention without padding.
230
231 Arguments:
(...)
239 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch)
240 """
--> 241 self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
242 attn_mask, bias)
243 if subset_idx is not None:
244 return self.output(index_first_axis(self_output, subset_idx),
245 index_first_axis(input_tensor, subset_idx))
File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/bert_layers.py:182, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias)
180 bias_dtype = bias.dtype
181 bias = bias.to(torch.float16)
--> 182 attention = flash_attn_qkvpacked_func(qkv, bias)
183 attention = attention.to(orig_dtype)
184 bias = bias.to(bias_dtype)
File ~/.local/lib/python3.10/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale)
1019 if qkv.stride(-1) != 1:
1020 qkv = qkv.contiguous()
-> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward(
1022 qkv[:, :, 0],
1023 qkv[:, :, 1],
1024 qkv[:, :, 2],
1025 bias=bias,
1026 causal=causal,
1027 softmax_scale=softmax_scale)
1028 ctx.save_for_backward(qkv, o, lse, bias)
1029 ctx.causal = causal
File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/d064dece8a8b41d9fb8729fbe3435278786931f1/flash_attn_triton.py:781, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale)
778 assert q.dtype == k.dtype == v.dtype, 'All tensors must have the same type'
779 assert q.dtype in [torch.float16,
780 torch.bfloat16], 'Only support fp16 and bf16'
--> 781 assert q.is_cuda and k.is_cuda and v.is_cuda
782 softmax_scale = softmax_scale or 1.0 / math.sqrt(d)
784 has_bias = bias is not None
AssertionError:
I guess this error is due to the structure of the network, which may require the data to be fed differently.
Could you tell me how I can simply get the output embeddings from DNABERT2, please?
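For reference, a minimal sketch of one common way to turn the model output into one embedding per sequence, assuming the model, tokenizer, seqs and device from the code above are already set up; the mean-pooling step is an assumption, not something prescribed by DNABERT2:

# Sketch: one embedding per sequence via mean pooling over the token dimension.
# Assumes `model`, `tokenizer`, `seqs` and `device` from the code above.
inputs = tokenizer(seqs, padding='max_length', truncation=True, max_length=40, return_tensors='pt')
inputs = {key: value.to(device) for key, value in inputs.items()}
with torch.no_grad():
    hidden_states = model(**inputs)[0]                # (batch, seq_len, hidden_size)

# Mask out padding tokens so they do not dilute the average.
mask = inputs['attention_mask'].unsqueeze(-1)         # (batch, seq_len, 1)
embeddings = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)   # (batch, hidden_size)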