
overlapping sentences with long texts exceeding max_token_per_batch #37

Open
davidberenstein1957 opened this issue Jun 19, 2023 · 6 comments


@davidberenstein1957

Hi,

I work a lot with coreference on longer texts, and I think it would be a nice addition to overlap sentences so that the model is more robust w.r.t. longer texts. I would also like to work on this.

Regards,
David

@shon-otmazgin
Owner

Hello @davidberenstein1957,

Do you mean overlapping sentences so that there is more attention between segments? If so, recent work (I think the paper using BERT for coreference) showed it is not necessary, and it also comes with more computation time.

@davidberenstein1957
Author

No, I mean overlap for the case where your entire text might not fit into (GPU) memory.

@shon-otmazgin
Owner

Can you share more details? If you set max_tokens_in_batch to the longest doc in the dataset, does it still go OOM?

@davidberenstein1957
Author

davidberenstein1957 commented Jun 19, 2023

Similarly, when a text exceeds the max_tokens limit of the transformer used, it could still be interesting to take the last x sentences of one chunk and prepend them to the next chunk, so that some context carries over between batches, and then merge the resulting clusters afterwards if they contain the same spans.
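
Roughly, the chunking step could look like this (a minimal sketch with made-up names and a crude word-count tokenizer, not the fastcoref API):

```python
# Hypothetical sketch: split a document (list of sentences) into chunks that
# respect a token budget, carrying the last `overlap` sentences of each chunk
# over into the next one. For illustration only; not part of fastcoref.

def chunk_with_overlap(token_counts, max_tokens, overlap=2):
    """Return lists of sentence indices; consecutive chunks share `overlap` sentences."""
    chunks, current, current_tokens = [], [], 0
    for i, n_tokens in enumerate(token_counts):
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append(current)
            # start the next chunk with the last `overlap` sentences of this one
            current = current[-overlap:]
            current_tokens = sum(token_counts[j] for j in current)
        current.append(i)
        current_tokens += n_tokens
    if current:
        chunks.append(current)
    return chunks


sentences = ["Sent one.", "Sent two.", "Sent three.", "Sent four.", "Sent five."]
token_counts = [len(s.split()) for s in sentences]  # crude stand-in for a real tokenizer
for chunk in chunk_with_overlap(token_counts, max_tokens=5, overlap=1):
    print([sentences[j] for j in chunk])
```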

@shon-otmazgin
Owner

If I understand correctly, you want to overlap between batches? I don't understand the benefit of it.

@davidberenstein1957
Author

Let's say you have a text of length 3x and the maximum number of tokens in a single pass is 2x. Then it might make sense to pass the text as segments 1:2 and 2:3, and afterwards re-align/merge the coref clusters based on the overlapping sentences in segment 2.
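
The merge step could then be something like this (again just a hypothetical sketch; the cluster format and span offsets are made up for illustration):

```python
# Hypothetical sketch: clusters from two overlapping runs are unified whenever
# they share a mention span (offsets in the full text). Not part of fastcoref.

def merge_clusters(clusters_a, clusters_b):
    """Merge two lists of clusters (collections of (start, end) spans) that share spans."""
    merged = [set(c) for c in clusters_a]
    for cluster in clusters_b:
        cluster = set(cluster)
        hits = [m for m in merged if m & cluster]  # existing clusters sharing a span
        for m in hits:
            cluster |= m
            merged.remove(m)
        merged.append(cluster)
    return merged


# Segments 1:2 and 2:3 both cover the middle segment, so spans there overlap.
clusters_seg12 = [[(0, 5), (40, 45)], [(10, 14)]]
clusters_seg23 = [[(40, 45), (80, 85)]]  # (40, 45) links the two runs
print([sorted(c) for c in merge_clusters(clusters_seg12, clusters_seg23)])
```

Any cluster from the second run that shares at least one span with a cluster from the first run is unified with it; everything else is kept as a separate cluster.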
