Similar to #9, I get wrong spans if I follow the instructions and run the original BERT code on the GoldP task. I solved it by normalizing the data while reading and replacing the non-breaking space (rampant especially in Russian) with a regular space: `u.normalize('NFKC', <text>).replace(u"\xa0", u" ")`. It would probably save others some time if this were mentioned in the README.
I would have sent a pull request, but the code changes have to be made in the BERT repository and not here. A potential solution would be to release pre-normalized UTF-8 encoded data.
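
For anyone hitting the same problem, here is a minimal sketch of the workaround described above. It assumes Python 3 and that `u` in the snippet refers to the standard `unicodedata` module. `normalize_text` is a hypothetical helper name, not a function from the BERT repository, and where exactly to call it in the data-reading code is left open.

```python
import unicodedata


def normalize_text(text):
    """Apply NFKC normalization, then map any remaining U+00A0 to a plain space.

    Hypothetical helper for illustration; not part of the BERT code.
    """
    normalized = unicodedata.normalize("NFKC", text)
    # Explicit replacement kept from the original workaround as a safety net.
    return normalized.replace("\xa0", " ")


if __name__ == "__main__":
    raw = "Москва\xa0река"
    print(repr(normalize_text(raw)))
```

Note that NFKC can change the length of a string (for example, a ligature expands into two letters), so any character offsets computed on the raw text may need to be recomputed on the normalized text. Applying the same function to both the passage and the answer text keeps the two consistent.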