Thank you for your implementation. However, I found one issue: the last token of every sentence gets mapped to the UNK token (examples below).
****source == a guy on a bike next to a
****target == a bicyclist passing a red commuter bus at a stop on a city
****predict == a man riding a bike on a city
I dug into the code and found that the error happens in the function tokenize_and_map in data_handler.py: line.split(' ') can't remove the '\n', so the last token of every sentence contains '\n'.
For example:
['a', 'very', 'clean', 'and', 'well', 'decorated', 'empty', 'bathroom\n']
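The difference is easy to confirm in a Python interpreter: split(' ') only splits on literal space characters, while split() with no arguments splits on any run of whitespace and discards it.

```python
line = "a very clean and well decorated empty bathroom\n"

# split(' ') splits only on literal spaces, so the trailing
# newline stays attached to the last token:
print(line.split(' ')[-1])  # 'bathroom\n'

# split() with no arguments splits on any whitespace and strips
# leading/trailing whitespace, so the newline is dropped:
print(line.split()[-1])     # 'bathroom'
```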
To fix this bug, we just need to change line.split(' ') to line.split().
def tokenize_and_map(self, line):
    return [self.vocab.get(token, self.UNK_TOKEN) for token in line.split()]
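For reference, here is the fixed method in a minimal, runnable sketch. The DataHandler skeleton, the toy two-word vocab, and the UNK index 0 are hypothetical placeholders for illustration, not the repo's actual setup:

```python
class DataHandler:
    UNK_TOKEN = 0  # hypothetical UNK index; the real value lives in data_handler.py

    def __init__(self, vocab):
        self.vocab = vocab

    def tokenize_and_map(self, line):
        # split() with no arguments handles '\n' and any other whitespace
        return [self.vocab.get(token, self.UNK_TOKEN) for token in line.split()]

# hypothetical toy vocab, just to show the effect of the fix
handler = DataHandler({'a': 1, 'bathroom': 2})
print(handler.tokenize_and_map("a bathroom\n"))  # [1, 2] -- no trailing UNK
```

With split(' '), the same call would have produced [1, 0], because 'bathroom\n' is not in the vocab.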
Thanks,
Jack