Commit

correctly truncates NER context_mask
Aethor committed Dec 26, 2024
1 parent 8fdda13 commit 2898194
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion renard/ner_utils.py
@@ -104,9 +104,18 @@ def __getitem__(self, index: int) -> BatchEncoding:
             truncation=True,
             max_length=512,  # TODO
             is_split_into_words=True,
+            return_length=True,
         )

-        batch["context_mask"] = self._context_mask[index]
+        length = batch["length"][0]
+        del batch["length"]
+        if self.tokenizer.truncation_side == "right":
+            batch["context_mask"] = self._context_mask[index][:length]
+        else:
+            assert self.tokenizer.truncation_side == "left"
+            batch["context_mask"] = self._context_mask[index][
+                len(batch["input_ids"]) - length :
+            ]

         return batch
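
The idea behind this change, trimming a per-token mask on the same side the tokenizer truncated, can be sketched in isolation as follows. This is a minimal illustration and not the code from renard: the helper name `truncate_mask` is hypothetical, and it assumes the mask holds one entry per token of the untruncated sequence and that `length` is the number of tokens kept after truncation. Unlike the commit's left-side branch, this sketch computes the left-side start offset from the mask's own length.

```python
from typing import List, Sequence


def truncate_mask(
    mask: Sequence[int], length: int, truncation_side: str = "right"
) -> List[int]:
    """Trim `mask` to `length` entries, dropping entries on `truncation_side`.

    Hypothetical helper: assumes `mask` has one entry per token of the
    untruncated sequence and `length` is the token count after truncation.
    """
    if length >= len(mask):
        # Nothing was truncated, so the mask is returned unchanged.
        return list(mask)
    if truncation_side == "right":
        # Right truncation drops tokens at the end: keep the first `length`.
        return list(mask[:length])
    if truncation_side == "left":
        # Left truncation drops tokens at the start: keep the last `length`.
        # The offset is written explicitly because mask[-0:] would return
        # the whole mask when length is 0.
        return list(mask[len(mask) - length:])
    raise ValueError(f"unknown truncation side: {truncation_side!r}")


if __name__ == "__main__":
    mask = [0, 1, 1, 1, 0]
    print(truncate_mask(mask, 3, "right"))  # [0, 1, 1]
    print(truncate_mask(mask, 3, "left"))   # [1, 1, 0]
```

Either way, the trimmed mask ends up with the same number of entries as the truncated token sequence, which is the property the commit title describes.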
