French language support #2
Thank you! The JSON files shouldn't be created in the bin folder, though. Also, you have committed a whole lot of .class files; I can clean them up myself if you wish ;-)
When the PR files are cleaned up I will review it more in depth
Copied files from Mycroft + Some work on tokenizer
Hi! The part that I think will need the most work is the FrenchFormatter, as I didn't really modify it except to replace some (not all) strings, as I didn't want it to diverge too much from the English version. Thanks :)
Any update on this? Because the translations are good.
Sorry, I will soon take care of this, I've been really busy lately.
Thank you! And sorry for getting to this so late.
Does French have both long-scale and short-scale ways of pronouncing big numbers? English has both, but for example Italian does not. So if French is more similar to Italian maybe you may want to copy some structure from ItalianFormatter.
The code at the moment does not compile because the code uses subThousand and appendSplitGroups, but those functions have not been copied over from EnglishFormatter. Was this done by accident or do you wish to implement them in a different way?
Also, I noticed you have already translated tokenizer.json, the file containing word binding to make parsing easier. I think it's better to first implement formatting and after that works focus on parsing, though. So there is no need for you to work on that atm.
If anything I suggested is too Java-code-y to do, feel free to tell me :-)
"thousand_separator"
],
"values": [
" ",
A space can't be a word, since spaces are word separators. If you want to say that thousands can be separated by spaces (i.e. nothing), you need to do so in Java code.
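One way to read "do so in Java code" is to join space-separated digit groups before the text ever reaches the word tokenizer. The sketch below is only an illustration of that idea: the class `SpaceSeparatorSketch` and the method `parseSpaced` are hypothetical names, not part of this project, and the set of accepted separators (regular space, U+00A0, U+202F) is an assumption.

```java
// Hypothetical sketch, not the project's tokenizer.
class SpaceSeparatorSketch {
    // Removes a single space-like character only when it sits between a digit
    // and a group of exactly three digits, then parses the remaining digits.
    static long parseSpaced(String text) {
        String digits = text.replaceAll(
                "(?<=\\d)[ \\u00A0\\u202F](?=\\d{3}(?!\\d))", "");
        // Throws NumberFormatException if any separator was not a valid thousands gap.
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        System.out.println(parseSpaced("1 000 000"));
        System.out.println(parseSpaced("913 080 600 064"));
    }
}
```

Because the separator is handled in code, the JSON word list does not need an entry for the space at all.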
/* Please note that there are two ways of saying years and centuries before 2000. For example:
1. mille (thousand) neuf (nine) cent (hundred) quatre-vingt (80) quatre (4)
2. dix-neuf (nineteen) cent (hundred) quatre-vingt (80) quatre (4). (Slightly old-fashioned but common for years before 1900)
*/
It's perfectly ok for date time formatters to just return one possibility :-)
if (z < 1000) {
    groupName = subThousand(z, i == 0 && ordi);
} else {
    groupName = subThousand(z / 1000, false) + " thousand";
I think these are untranslated: "thousand", "thousandth", "th", "zero", "point" and maybe others
numbers/src/test/java/org/dicio/numbers/lang/fr/FrenchFormatter.java
import org.dicio.numbers.util.MixedFraction;

public class FrenchFormatter extends NumberFormatter {

I think you should also translate the following arrays from EnglishFormatter into French, otherwise the code below won't work: NUMBER_NAMES, NUMBER_NAMES_SHORT_SCALE, NUMBER_NAMES_LONG_SCALE, ORDINAL_NAMES, ORDINAL_NAMES_SHORT_SCALE, ORDINAL_NAMES_LONG_SCALE.
Yes, will work on that. Just one question though: what is the difference between NUMBER_NAMES in the Formatter and their copy in tokenizer.json? Should we try and pull their value from there?
The tokenizer contains the code that allows parsing numbers. And each number may have different words. But when we do formatting, it works the other way around, and so there should be one unique mapping from number to word. The two configurations can definitely be merged in some way, but for the moment I decided to keep it simple.
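The asymmetry described above can be sketched with two small maps: a many-to-one map for parsing and a one-to-one map for formatting. Everything here is illustrative; the class `NumberWordsSketch`, its fields, and the handful of entries are assumptions and do not reflect the actual contents of tokenizer.json or NUMBER_NAMES.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and data are not taken from the project.
class NumberWordsSketch {
    // Parsing direction: several spellings may resolve to the same value.
    static final Map<String, Long> WORD_TO_NUMBER = new HashMap<>();
    // Formatting direction: each value has exactly one canonical word.
    static final Map<Long, String> NUMBER_TO_WORD = new HashMap<>();

    static {
        WORD_TO_NUMBER.put("huit", 8L);
        WORD_TO_NUMBER.put("quatre-vingts", 80L);
        WORD_TO_NUMBER.put("quatre-vingt", 80L); // form used before another numeral
        WORD_TO_NUMBER.put("huitante", 80L);     // regional (Swiss) form

        NUMBER_TO_WORD.put(8L, "huit");
        NUMBER_TO_WORD.put(80L, "quatre-vingts");
    }

    // Returns null when the word is unknown.
    static Long parse(String word) {
        return WORD_TO_NUMBER.get(word);
    }

    // Returns null when the number has no single-word name in this sketch.
    static String format(long number) {
        return NUMBER_TO_WORD.get(number);
    }

    public static void main(String[] args) {
        System.out.println(parse("huitante"));
        System.out.println(format(80));
    }
}
```

Keeping the two maps separate means the formatter never has to choose between synonyms, at the cost of the duplication discussed in this thread.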
numbers/src/test/java/org/dicio/numbers/lang/fr/FrenchFormatter.java
public String niceTime(LocalTime time, boolean speech, boolean use24Hour, boolean showAmPm) {
    // TODO Auto-generated method stub
    return null;
}
Will this be taken care of separately?
So French has two ways in speaking but only one in writing
I was not clear enough, what I meant is this: https://en.wikipedia.org/wiki/Long_and_short_scales
…r.java Co-authored-by: Stypox <stypox@pm.me>
So in this case we have the same system as in Italian
Hi!
@MXC48 If you want to talk/split work you can contact me on Discord :)
Ok, thank you! Since French works like Italian and does not have both long scale and short scale numbers, you will not need many parts related to the
Great! Just to be sure: should we duplicate the numbers' names both in tokenizer.json and hardcoded in FrenchFormatter?
Yeah, just answered now. For now numbers should be duplicated. Thanks to you! :-)
I've fixed some of the problems and added a few fixmes here and there. I still have to add NUMBER_NAMES and ORDINAL_NAMES, maybe if you want to take care of it @MXC48?
assertEquals("quatre-vingt-dix", pf.pronounceNumber(89.9).places(0).get());
assertEquals("moins un", pf.pronounceNumber(-0.5).places(0).get());
assertEquals("zéro", pf.pronounceNumber(-0.4).places(0).get());
assertEquals("six virgue trois", pf.pronounceNumber(6.28).places(1).get());
just the omission of the l in "virgule"
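The expected values in these tests (89.9 becomes 90, -0.5 becomes -1, -0.4 becomes 0, 6.28 becomes 6.3) are consistent with rounding ties away from zero. The sketch below reproduces only that rounding step with `BigDecimal` and `RoundingMode.HALF_UP`; it is an assumption about the behaviour the tests expect, not a copy of how the library rounds, and `RoundingSketch` and `roundToPlaces` are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch of the rounding the tests appear to expect.
class RoundingSketch {
    // Rounds to the given number of decimal places; ties go away from zero.
    static BigDecimal roundToPlaces(double value, int places) {
        // BigDecimal.valueOf uses the double's canonical string form,
        // so 6.28 is treated as exactly 6.28.
        return BigDecimal.valueOf(value).setScale(places, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(roundToPlaces(89.9, 0)); // 90
        System.out.println(roundToPlaces(-0.5, 0)); // -1
        System.out.println(roundToPlaces(-0.4, 0)); // 0
        System.out.println(roundToPlaces(6.28, 1)); // 6.3
    }
}
```

Note that `Math.round(-0.5)` returns 0 in Java, so it would not yield the "moins un" the second test expects.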
assertEquals("deux-cent-un-millionième", pf.pronounceNumber(201000000000.0).ordinal(T).shortScale(F).get());
//TODO: Check this for french correctness as well as short/long scale issues (billion/billard)
assertEquals("neuf-cent-treize-milliard-quatre-vingt-million-six-cent-mille-soixante-mille-cent-soixante-quatrième", pf.pronounceNumber(913080600064.0).ordinal(T).shortScale(T).get());
assertEquals("neuf-cent-treize-mille-quatre-vingt-million-six-cent-mille-soixante-mille-soixante-quatrième", pf.pronounceNumber(913080600064.0).ordinal(T).shortScale(F).get());
I'm not sure you can repeat several times in a row "mille"
I think it should be "neuf-cent-treize-milliard-quatre-vingt-million-six-cent-mille-soixante-quatrième"? "mille million" isn't very common is it? (Though saying 913080600064th itself isn't very common -_-')
assertEquals("neuf-cent-treize-mille-quatre-vingt-million-six-cent-mille-soixante-mille-soixante-quatrième", pf.pronounceNumber(913080600064.0).ordinal(T).shortScale(F).get());
assertEquals("trilliard-deux-millionième", pf.pronounceNumber(1000002000000.0).ordinal(T).shortScale(T).get());
assertEquals("millard-deux-millionième", pf.pronounceNumber(1000002000000.0).ordinal(T).shortScale(F).get());
assertEquals("quatre-triliard-un-millionième", pf.pronounceNumber(4000001000000.0).ordinal(T).shortScale(T).get());
there is the omission of an l in trilliard
That's nowhere near a trilliard? That's just "quatre billion un millionième"?
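The reviewers' corrections follow from splitting the number into three-digit groups and giving each group one scale name, so "mille" never repeats. The sketch below shows only that grouping step, using the French long-scale names (mille, million, milliard, billion, billiard, trillion). It leaves the digits of each group as numerals and ignores plural agreement, hyphenation and ordinals; `FrenchScaleSketch` and `describeGroups` are hypothetical names, not project code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of group naming only; sub-thousand words are not spelled out.
class FrenchScaleSketch {
    // Scale names for 10^0, 10^3, 10^6, 10^9, 10^12, 10^15, 10^18.
    static final String[] GROUP_NAMES =
            {"", "mille", "million", "milliard", "billion", "billiard", "trillion"};

    static String describeGroups(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers are out of scope here");
        }
        if (n == 0) {
            return "0";
        }
        List<String> parts = new ArrayList<>();
        int index = 0;
        while (n > 0) {
            long group = n % 1000;
            if (group != 0) { // empty groups are skipped, so no scale name repeats
                String name = GROUP_NAMES[index];
                parts.add(0, name.isEmpty() ? Long.toString(group) : group + " " + name);
            }
            n /= 1000;
            index++;
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(describeGroups(913080600064L));  // 913 milliard 80 million 600 mille 64
        System.out.println(describeGroups(4000001000000L)); // 4 billion 1 million
    }
}
```

With this grouping, 913080600064 has a single "mille" group and 4000001000000 falls under "billion", in line with the two review comments.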
assertEquals("infini", pf.pronounceNumber(Double.POSITIVE_INFINITY).get());
assertEquals("moins l'infini", pf.pronounceNumber(Double.NEGATIVE_INFINITY).scientific(F).get());
assertEquals("moins l''infini", pf.pronounceNumber(Double.NEGATIVE_INFINITY).scientific(T).get());
assertEquals("Non défini", pf.pronounceNumber(Double.NaN).get());
"non défini" or the literal translation "pas un nombre"?
"Non défini", as in 10 divided by 0.
What should be done to complete the addition in the main code?
also interested in testing: gentle "up"
49bc02d
to
b6a68df
Compare
bda9e31
to
d54de33
Compare
Hi!
I'm trying my hand at adding support for the French language.
I think that I have added everything indicated in the README.
Could you tell me what the next steps would be?
I am familiar with Java and I'm willing to help, but I'd appreciate it if you could point me in the right direction (for example, would I need to create a FrenchFormater and hardcode values into it?).
Thanks :)