-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I have implemented parse_roman() function #64
base: master
Are you sure you want to change the base?
I have implemented parse_roman() function #64
Conversation
I have implemented roman_numeral() function with the use of regex to build the number.
Codecov Report
@@ Coverage Diff @@
## master #64 +/- ##
==========================================
- Coverage 98.78% 98.12% -0.66%
==========================================
Files 86 86
Lines 328 374 +46
Branches 60 78 +18
==========================================
+ Hits 324 367 +43
Misses 1 1
- Partials 3 6 +3
|
Now I think we need to make I’m thinking that we may also want to include a new parameter to those functions, For cases where users want to exclude a system, rather than include it, it may make sense to expose a public variable that contains the list of all supported numeral systems, e.g. Once these changes are done, I think we no longer need I’m also thinking that it may be possible to have slightly different implementations of |
Let’s say we introduce a parse_number('V', numeral_systems=['roman']) If we also create a all_numeral_systems_but_roman = [system for system in NUMERAL_SYSTEMS if system != 'roman']
parse_number('V', numeral_systems=all_numeral_systems_but_roman) If in the future we add support for additional number systems, that code would still work as intended. Whereas if users had to hardcode the list of systems, new systems added later would also be excluded. |
Added the _valid_input_by_numeral_system(), it will decide the numeral system based on the input string if not given by the user.
Hi @Gallaecio hope you and your family are safe. I added the |
added NUMERAL_SYSTEM list where you decide which numeral system should be used to parse.
Added more test cases to test the numeral system
Added Support for numeral systems
I have added all_numeral_systems_but_roman = [system for system in NUMERAL_SYSTEMS if system != 'roman']
all_numeral_systems_but_decimal = [system for system in NUMERAL_SYSTEMS if system != 'decimal']
>>>parse('Built in MMLXXVII.')
'Built in 2077.'
>>>parse( 'Built in MMLXXVII.', ['decimal'])
'Built in MMLXXVII.'
>>>parse('I was given two IV injections.', all_numeral_systems_but_roman)
'I was given 2 IV injections.'
>>>parse('I was given two IV injections.', all_numeral_systems_but_decimal)
1 was given two 4 injections.'
>>>parse('I was given two IV injections.')
'1 was given 2 4 injections.'
>>>parse('I have three apples.', all_numeral_systems_but_roman)
'I have 3 apples.' I have added all these examples as test cases. I wanted to ask that since |
It could allow users to fine-tune for performance if they know beforehand the numeral system of the input number, by making number-parser only use the desired number parser. Also, if they want parsing to simply fail for something like |
Thanks for looking into these, ill make the changes |
Made all changes except .lower()
Parse roman(numeral support)
@Gallaecio hope you and your family are safe. |
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Parse roman(regex approach)
Minor changes to parser.py and added more test cases for numeral support
Hi @Gallaecio hope you and your family are safe. Any thoughts on this? |
SENTENCE_SEPARATORS = [".", ","] | ||
SUPPORTED_LANGUAGES = ['en', 'es', 'hi', 'ru'] | ||
RE_BUG_LANGUAGES = ['hi'] | ||
NUMERAL_SYSTEMS = ('decimal', 'roman') | ||
ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s make this a private constant, so that we can freely rename it or move it in the future if we wish without breaking the API:
ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$" | |
_ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$" |
@@ -241,7 +244,11 @@ def parse_ordinal(input_string, language=None): | |||
return parse_number(output_string, language) | |||
|
|||
|
|||
def parse_number(input_string, language=None): | |||
def _is_roman(search_string): | |||
return re.search(ROMAN_REGEX_EXPRESSION, search_string, re.IGNORECASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(?i)
is already in the pattern. Also, this function should return a boolean value.
return re.search(ROMAN_REGEX_EXPRESSION, search_string, re.IGNORECASE) | |
return bool(re.search(ROMAN_REGEX_EXPRESSION, search_string)) |
if _is_roman(input_string): | ||
numeral_systems = ['roman'] | ||
|
||
elif language in SUPPORTED_LANGUAGES and not _is_roman(input_string): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not _is_roman(input_string)
will always be true, given the preceding if
statement.
elif language in SUPPORTED_LANGUAGES and not _is_roman(input_string): | |
elif language in SUPPORTED_LANGUAGES: |
Also, I am probably forgetting something, but why are we forcing decimal
for any supported language? I could understand doing it for en
given I
is an English word, but I don’t think we need to avoid Roman numerals in Spanish, for example.
elif language in SUPPORTED_LANGUAGES and not _is_roman(input_string): | ||
numeral_systems = ['decimal'] | ||
|
||
for numeral_system in numeral_systems: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail if numeral_systems
is None
.
return None | ||
number_built = _build_number(normalized_tokens, lang_data) | ||
if len(number_built) == 1: | ||
return int(number_built[0]) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those return None
seem like an issue now within a for
loop. It looks like they will prevent the for
loop to reach the next iteration (for the next numeral system).
Maybe you could move this code into a _parse_decimal
function, and only return if the result is not None
, else let the for
loop go to the next numeral system.
I have implemented the parse_roman() function.