Fix tokenising when using more than just a-zA-Z #37

robotdana · 2018-11-30T02:00:12Z

Previously: `Händler` would be tokenized as `ndler` or `ändler` depending on the Python version, rather than the expected `händler`.

Solution: use the `regex` module rather than `re`.
This gives us the ability to use Unicode character classes such as `[[:upper:]]` and `[[:lower:]]`.

Fixes #35
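
To illustrate the failure mode described above, here is a minimal sketch using only the standard library. It is not the code from this PR: the `tokenize` helper and both pattern names are hypothetical, and the Unicode-aware pattern `[^\W\d_]+` is a stdlib stand-in (Python 3 only) for the `[[:upper:]]` / `[[:lower:]]` classes that the third-party `regex` module provides.

```python
import re

# ASCII-only character class: 'ä' is not matched, so the word gets split.
ASCII_WORD = re.compile(r"[a-zA-Z]+")

# Stdlib stand-in for a Unicode-aware letter class (assumption, Python 3 only):
# \w is Unicode-aware for str patterns, so [^\W\d_] matches any letter
# while excluding digits and the underscore.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def tokenize(pattern, text):
    """Return the lowercased tokens that `pattern` finds in `text` (hypothetical helper)."""
    return [match.group(0).lower() for match in pattern.finditer(text)]


word = "H\u00e4ndler"  # 'Händler' with a precomposed 'ä'

ascii_tokens = tokenize(ASCII_WORD, word)
unicode_tokens = tokenize(UNICODE_WORD, word)

print(ascii_tokens)    # ['h', 'ndler']
print(unicode_tokens)  # ['händler']
```

The ASCII-only pattern drops the `ä` and splits the word around it, while the Unicode-aware pattern keeps the word whole.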

I'm usually a Ruby developer, not a Python developer, so I don't know how to get the regex library working on 2.7, or how to compare the test strings in a Unicode-aware way (they're different on my Mac vs. on Travis: if one passes, the other fails).
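
One possible cause of strings comparing differently across machines is Unicode normalization: the same visible text can be stored with a precomposed `ä` or as `a` plus a combining diaeresis. Whether that is what happens here is an assumption, not something confirmed in this thread. Under that assumption, a comparison can be sketched with the standard library's `unicodedata.normalize`; the `same_text` helper is hypothetical.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "h\u00e4ndler"     # 'ä' as a single code point
decomposed = "ha\u0308ndler"  # 'a' followed by a combining diaeresis

naive_equal = composed == decomposed
normalized_equal = same_text(composed, decomposed)

print(naive_equal)       # False
print(normalized_equal)  # True
```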

But it mostly works


myint · 2018-12-23T18:24:50Z

Thanks! I haven't tried the regex module before. I'll take a look when I have more time.

robotdana · 2019-09-22T09:15:24Z

If you're interested, I took the really long way round to fixing this by creating my own spell checker: https://github.com/robotdana/spellr

robotdana force-pushed the diacritics branch 3 times, most recently from 57da098 to c8bd64d Compare November 30, 2018 02:52

robotdana force-pushed the diacritics branch from c8bd64d to 08b4eff Compare November 30, 2018 03:04
