List of diacritrics #18

Open
matteocontrini opened this issue Nov 22, 2015 · 4 comments

Comments

@matteocontrini

Do you have a list of the diacritics that get converted?

In Italy, the rules for the fiscal code ("codice fiscale") have recently changed so that all diacritics are converted to ASCII characters. This table has been provided for the conversion.

How can I know whether those characters are actually supported by your module, given that the source code only contains a list of Unicode code points?

Thanks

@andrewrk
Owner

Do you need programmatic access to the list of diacritics or just want to evaluate the module?

@matteocontrini
Author

I'll try creating a test.
I wanted to know whether the module correctly handles all those cases, which is hard to tell since there's no list of supported diacritics.
But that's fine, I'll try parsing that PDF. I'll let you know.

@matteocontrini
Author

Ok, first of all, congratulations, because the module was able to convert almost every character in that table.
But there are some that differ:

Ä gets converted to A, document says AE
ä gets converted to A, document says AE
Å gets converted to A, document says AA
å gets converted to A, document says AA
Ð gets converted to DH, document says D
Ĳ gets converted to Ĳ, document says IJ <--
ĳ gets converted to Ĳ, document says IJ <-- these 2 are not converted
Ö gets converted to O, document says OE
ö gets converted to O, document says OE
Ø gets converted to O, document says OE
ø gets converted to O, document says OE
Ü gets converted to U, document says UE
ü gets converted to U, document says UE

Note that I uppercased the results because that's what the table gives me. The code.

I don't know which variant in the test results is the right one. I can tell you that the PDF table linked above is almost the same as the one from here, which refers to some ISO standards.
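
For anyone who wants to repeat this kind of check, here is a minimal sketch of the comparison in Python. It is not the code linked above and it does not call this module: `strip_marks` is a hypothetical stand-in converter built on the standard `unicodedata` library, and `find_mismatches` is an illustrative helper name. The expected values are only the rows quoted in this comment.

```python
import unicodedata

# Expected values copied from the comparison in this thread (uppercase).
EXPECTED = {
    "Ä": "AE", "ä": "AE",
    "Å": "AA", "å": "AA",
    "Ð": "D",
    "Ö": "OE", "ö": "OE",
    "Ø": "OE", "ø": "OE",
    "Ü": "UE", "ü": "UE",
}


def strip_marks(text):
    """Hypothetical stand-in converter: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_mismatches(convert, expected):
    """Return (char, actual, wanted) for every row where convert() disagrees."""
    rows = []
    for char, wanted in expected.items():
        # The table is uppercase, so uppercase the result before comparing.
        actual = convert(char).upper()
        if actual != wanted:
            rows.append((char, actual, wanted))
    return rows


if __name__ == "__main__":
    for char, actual, wanted in find_mismatches(strip_marks, EXPECTED):
        print(f"{char} gets converted to {actual}, document says {wanted}")
```

To test a real converter, pass it in place of `strip_marks`; any callable that takes a string and returns a string should work with this helper.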

@homersimpsons

I think your diacritics 'translation' is based on how the characters sound in Italian, while this implementation deals with their visual appearance.
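
The difference between the two approaches can be sketched as follows. This is an illustration in Python under stated assumptions, not this module's implementation: `strip_marks` approximates the "visual" approach with the standard `unicodedata` library, and `transliterate` (a hypothetical name) layers an override table on top of it. The `OVERRIDES` rows are the ones quoted earlier in the thread.

```python
import unicodedata

# Overrides taken from the table discussed in this thread (uppercase keys).
OVERRIDES = {
    "Ä": "AE",
    "Å": "AA",
    "Ð": "D",
    "Ö": "OE",
    "Ø": "OE",
    "Ü": "UE",
}


def strip_marks(text):
    """'Visual' approach: keep the base letter, drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate(text, overrides=OVERRIDES):
    """Table-driven approach: apply overrides first, strip marks otherwise."""
    out = []
    # NFC first, so a base letter plus combining mark matches a table key.
    for ch in unicodedata.normalize("NFC", text):
        replacement = overrides.get(ch.upper())
        out.append(replacement if replacement is not None else strip_marks(ch))
    # The table in the thread is uppercase, so the result is uppercased too.
    return "".join(out).upper()


if __name__ == "__main__":
    print(strip_marks("Müller"))    # Muller
    print(transliterate("Müller"))  # MUELLER
```

Keeping the overrides in a separate table means the base stripping stays generic, while a caller who needs a specific standard can supply their own mapping.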
