Remove the <UNK> token at the end of each sentence #22

jackyuanjie1990 · 2021-01-10T21:50:43Z

Hello,

Thank you for your implementation. However, I found an issue: every sentence ends with the <UNK> token (as shown below).
****source == a guy on a bike next to a <UNK>
****target == a bicyclist passing a red commuter bus at a stop on a city <UNK>
****predict == a man riding a bike on a city <UNK>

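I dug into the code and found that the error occurs in the function tokenize_and_map in data_handler.py.
line.split(' ') splits only on single spaces and does not strip the trailing '\n', so the last token of every sentence still contains '\n'. That token is not in the vocabulary, so it is mapped to the <UNK> token.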
For example:
['a', 'very', 'clean', 'and', 'well', 'decorated', 'empty', 'bathroom\n']

To fix this bug, we just need to change line.split(' ') to line.split().
def tokenize_and_map(self, line):
    return [self.vocab.get(token, self.UNK_TOKEN) for token in line.split()]
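
A minimal sketch of the difference, for anyone who wants to reproduce it outside the repository. The vocabulary, the UNK id, and the two helper function names here are made up for illustration and are not the actual values or names in data_handler.py:

```python
# Hypothetical vocabulary and UNK id, for illustration only;
# these are not the values used in data_handler.py.
vocab = {"a": 1, "very": 2, "clean": 3, "bathroom": 4}
UNK_TOKEN = 0


def tokenize_with_space(line):
    # Splits on single spaces only, so a trailing newline stays
    # attached to the last token.
    return [vocab.get(token, UNK_TOKEN) for token in line.split(' ')]


def tokenize_with_whitespace(line):
    # split() with no argument splits on any whitespace run and
    # drops leading/trailing whitespace, including '\n'.
    return [vocab.get(token, UNK_TOKEN) for token in line.split()]


line = "a very clean bathroom\n"

print(line.split(' '))                 # ['a', 'very', 'clean', 'bathroom\n']
print(line.split())                    # ['a', 'very', 'clean', 'bathroom']
print(tokenize_with_space(line))       # [1, 2, 3, 0]
print(tokenize_with_whitespace(line))  # [1, 2, 3, 4]
```

With split(' ') the last id is the UNK id because 'bathroom\n' is not a vocabulary key. With split() the last word is looked up correctly.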

Thanks,

Jack
