Bert getting wrong spans (GoldP task) #10

rahular · 2020-12-12T23:03:42Z

Similar to #9, I get wrong spans when I follow the instructions and run the original BERT code on the GoldP task. I solved it by normalizing the data while reading it and replacing the non-breaking space (rampant especially in Russian) with a regular space: unicodedata.normalize('NFKC', <text>).replace(u"\xa0", u" "). It would probably save others some time if this were mentioned in the README.
I would have sent a pull request, but the code changes have to be made in the BERT repository and not here. A potential solution would be to release pre-normalized UTF-8 encoded data.
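
For anyone hitting the same problem, here is a minimal sketch of the normalization described above. The helper name `normalize_text` is hypothetical, and where to call it in the BERT data-reading code is left open, since that depends on the reader you use.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Apply NFKC normalization, then map any non-breaking space to a regular space.

    Hypothetical helper: the name and where it is called are assumptions,
    not part of the BERT repository.
    """
    normalized = unicodedata.normalize("NFKC", text)
    # NFKC already folds U+00A0 into a regular space; the explicit replace
    # mirrors the expression from the comment above and is harmless.
    return normalized.replace("\xa0", " ")


if __name__ == "__main__":
    sample = "Москва\xa0и\xa0Санкт-Петербург"
    print(repr(normalize_text(sample)))
```

Note that NFKC can change the length of a string (for example, a single ligature character expands to two letters), so the same normalization should be applied to both the context and the answer text, and any character offsets should be checked against the normalized text.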


PluviophileYU · 2021-04-29T10:47:47Z

Hi, do you mean that the label for the GoldP task is sometimes incorrect?
