I have implemented parse_roman() function #64

AmPhIbIaN26 · 2021-04-19T09:57:56Z

I have implemented the parse_roman() function.

I have implemented roman_numeral() function with the use of regex to build the number.

codecov · 2021-04-19T10:27:32Z

Codecov Report

Merging #64 (086f5ec) into master (c834854) will decrease coverage by 0.65%.
The diff coverage is 95.16%.

@@            Coverage Diff             @@
##           master      #64      +/-   ##
==========================================
- Coverage   98.78%   98.12%   -0.66%     
==========================================
  Files          86       86              
  Lines         328      374      +46     
  Branches       60       78      +18     
==========================================
+ Hits          324      367      +43     
  Misses          1        1              
- Partials        3        6       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| number_parser/parser.py | `97.57% <95.08%> (-0.78%)` | ⬇️ |
| number_parser/__init__.py | `100.00% <100.00%> (ø)` | |

Gallaecio · 2021-04-19T12:38:52Z

Now I think we need to make parse and parse_number use this function as well.

I’m thinking that we may also want to add a new parameter to those functions, numeral_systems, that limits which numeral systems are considered while parsing. By default all numeral systems would be used, but you could pass numeral_systems=['decimal'] to limit parsing to decimal numbers, or numeral_systems=['roman'] to limit it to Roman numerals.

For cases where users want to exclude a system, rather than include it, it may make sense to expose a public variable that contains the list of all supported numeral systems, e.g. number_parser.NUMERAL_SYSTEMS, so that users can create an exclusion-based subset (e.g. [system for system in NUMERAL_SYSTEMS if system != 'roman']).

Once these changes are done, I think we no longer need parse_roman itself, and it could be renamed as _parse_roman to discourage people from using it.

I’m also thinking that it may be possible to have slightly different implementations of parse_roman for each of the user-facing number-parser functions, for performance tuning. But I haven’t looked into it in detail. What matters most is that things work as intended; we can worry about performance later.

AmPhIbIaN26 · 2021-04-25T23:54:53Z

I still don't understand how the user would use this: [system for system in NUMERAL_SYSTEMS if system != 'roman']

Let’s say we introduce a numeral_systems parameter to parse_number so that users may use:

parse_number('V', numeral_systems=['roman'])

If we also create a NUMERAL_SYSTEMS with all supported numeral systems, users that want to exclude a numeral system can do:

all_numeral_systems_but_roman = [system for system in NUMERAL_SYSTEMS if system != 'roman']
parse_number('V', numeral_systems=all_numeral_systems_but_roman)

If in the future we add support for additional number systems, that code would still work as intended. Whereas if users had to hardcode the list of systems, new systems added later would also be excluded.

Added the _valid_input_by_numeral_system(), it will decide the numeral system based on the input string if not given by the user.

AmPhIbIaN26 · 2021-04-28T12:48:28Z

Hi @Gallaecio hope you and your family are safe.

I added _valid_input_by_numeral_system() (in this fork), but I still have to improve it so that adding another numeral system is easier. I will work on that now, and also on improving the performance of _parse_roman().

added NUMERAL_SYSTEM list where you decide which numeral system should be used to parse.

Added more test cases to test the numeral system

Added Support for numeral systems

AmPhIbIaN26 · 2021-05-03T20:59:42Z

I have added numeral_systems as a parameter to parse(). The current system works like this:

all_numeral_systems_but_roman = [system for system in NUMERAL_SYSTEMS if system != 'roman']
all_numeral_systems_but_decimal = [system for system in NUMERAL_SYSTEMS if system != 'decimal']

>>> parse('Built in MMLXXVII.')
'Built in 2077.'

>>> parse('Built in MMLXXVII.', ['decimal'])
'Built in MMLXXVII.'

>>> parse('I was given two IV injections.', all_numeral_systems_but_roman)
'I was given 2 IV injections.'

>>> parse('I was given two IV injections.', all_numeral_systems_but_decimal)
'1 was given two 4 injections.'

>>> parse('I was given two IV injections.')
'1 was given 2 4 injections.'

>>> parse('I have three apples.', all_numeral_systems_but_roman)
'I have 3 apples.'

I have added all these examples as test cases. I wanted to ask: since parse_number() only takes a single input, how will having numeral_systems as a parameter help?

number_parser/parser.py

tests/data/permutations/all_roman_numbers.csv

Gallaecio · 2021-05-04T06:43:11Z

I have added all these examples as test cases. I wanted to ask: since parse_number() only takes a single input, how will having numeral_systems as a parameter help?

It could allow users to fine-tune for performance if they know the numeral system of the input number beforehand, by making number-parser use only the desired number parser. It would also help users who want parsing to simply fail for something like I, instead of returning 1.

Update README.rst

AmPhIbIaN26 · 2021-05-04T13:04:14Z

Thanks for looking into these, I'll make the changes.

Made all changes except .lower()

Parse roman(numeral support)

AmPhIbIaN26 · 2021-05-06T16:02:06Z

@Gallaecio hope you and your family are safe.
I have made all the changes you suggested and also added support for incorrect Roman numerals.

number_parser/parser.py

Co-authored-by: Adrián Chaves <adrian@chaves.io>

Parse roman(regex approach)

Minor changes to parser.py and added more test cases for numeral support

AmPhIbIaN26 · 2021-05-29T11:15:44Z

Hi @Gallaecio hope you and your family are safe.

Any thoughts on this?

Gallaecio · 2021-07-14T08:48:08Z

number_parser/parser.py

 SENTENCE_SEPARATORS = [".", ","]
 SUPPORTED_LANGUAGES = ['en', 'es', 'hi', 'ru']
 RE_BUG_LANGUAGES = ['hi']
+NUMERAL_SYSTEMS = ('decimal', 'roman')
+ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$"


Let’s make this a private constant, so that we can freely rename it or move it in the future if we wish without breaking the API:

Suggested change

-ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$"
+_ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$"
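For context, the pattern captures four groups (thousands, hundreds, tens, units), so a number can be built by summing the value of each group. The following is only a rough sketch of that idea, not the code in this pull request: `roman_to_int`, `_group_value` and `_SYMBOL_VALUES` are hypothetical names introduced for illustration.

```python
import re

# Pattern copied from the diff above, under the private name suggested in the review.
_ROMAN_REGEX_EXPRESSION = "(?i)^(m{0,3})(cm|cd|d?c{0,4})(xc|xl|l?x{0,4})(ix|iv|v?i{0,4})$"

# Hypothetical lookup table, not taken from the pull request.
_SYMBOL_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def _group_value(group):
    """Return the value of one captured group, e.g. 'CM' -> 900, 'dccc' -> 800."""
    values = [_SYMBOL_VALUES[symbol] for symbol in group.lower()]
    total = 0
    for index, value in enumerate(values):
        # A smaller symbol written before a larger one is subtracted (IV, XC, CM).
        if index + 1 < len(values) and value < values[index + 1]:
            total -= value
        else:
            total += value
    return total


def roman_to_int(text):
    """Hypothetical helper: return the integer for a Roman numeral, or None."""
    match = re.search(_ROMAN_REGEX_EXPRESSION, text)
    # Every group is optional, so the empty string matches; reject it explicitly.
    if not text or match is None:
        return None
    return sum(_group_value(group) for group in match.groups())
```

With this sketch, `roman_to_int("MMLXXVII")` gives 2077, and input that is not a Roman numeral gives `None`.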

Gallaecio · 2021-07-14T08:48:48Z

number_parser/parser.py

@@ -241,7 +244,11 @@ def parse_ordinal(input_string, language=None):
    return parse_number(output_string, language)


-def parse_number(input_string, language=None):
+def _is_roman(search_string):
+    return re.search(ROMAN_REGEX_EXPRESSION, search_string, re.IGNORECASE)


(?i) is already in the pattern. Also, this function should return a boolean value.

Suggested change

-    return re.search(ROMAN_REGEX_EXPRESSION, search_string, re.IGNORECASE)
+    return bool(re.search(ROMAN_REGEX_EXPRESSION, search_string))

Gallaecio · 2021-07-14T08:50:59Z

number_parser/parser.py

+        if _is_roman(input_string):
+            numeral_systems = ['roman']
+
+        elif language in SUPPORTED_LANGUAGES and not _is_roman(input_string):


not _is_roman(input_string) will always be true, given the preceding if statement.

Suggested change

-        elif language in SUPPORTED_LANGUAGES and not _is_roman(input_string):
+        elif language in SUPPORTED_LANGUAGES:

Also, I am probably forgetting something, but why are we forcing decimal for any supported language? I could understand doing it for en given I is an English word, but I don’t think we need to avoid Roman numerals in Spanish, for example.

Gallaecio · 2021-07-14T08:52:30Z

number_parser/parser.py

+        elif language in SUPPORTED_LANGUAGES and not _is_roman(input_string):
+            numeral_systems = ['decimal']
+
+    for numeral_system in numeral_systems:


This will fail if numeral_systems is None.

Gallaecio · 2021-07-14T09:00:58Z

number_parser/parser.py

+                return None
+            number_built = _build_number(normalized_tokens, lang_data)
+            if len(number_built) == 1:
+                return int(number_built[0])
+            return None


Those return None statements look like an issue now that they sit inside a for loop. It looks like they will prevent the loop from reaching the next iteration (for the next numeral system).

Maybe you could move this code into a _parse_decimal function, and only return if the result is not None, else let the for loop go to the next numeral system.
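A rough sketch of that loop shape, with loud caveats: the two parsers below are tiny hypothetical stand-ins (the real decimal parsing in number-parser works on language data and tokens), and `parse_with_fallback` is an illustrative name, not a function in the library.

```python
import re


def _parse_decimal(input_string):
    """Stand-in parser: return an int for a plain ASCII digit string, else None."""
    if re.fullmatch(r"[0-9]+", input_string):
        return int(input_string)
    return None


# Stand-in table covering only a few numerals, for illustration.
_KNOWN_ROMAN = {"i": 1, "iv": 4, "v": 5, "x": 10}


def _parse_roman(input_string):
    """Stand-in parser: return an int for a known Roman numeral, else None."""
    return _KNOWN_ROMAN.get(input_string.lower())


_PARSERS = {"decimal": _parse_decimal, "roman": _parse_roman}


def parse_with_fallback(input_string, numeral_systems):
    """Try each numeral system in order and return the first successful result."""
    for numeral_system in numeral_systems:
        result = _PARSERS[numeral_system](input_string)
        # Compare against None so that a legitimate 0 is still returned.
        if result is not None:
            return result
    # Only give up once every requested numeral system has been tried.
    return None
```

Here `parse_with_fallback("IV", ["decimal", "roman"])` falls through the decimal parser and returns 4, while `parse_with_fallback("IV", ["decimal"])` returns `None`.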

AmPhIbIaN26 added 3 commits April 18, 2021 02:41

Implemented parse_roman() function

61f57d7

Moved test_numeral_roman.py

6664b6f

Implemented roman_numeral() function

376db40

I have implemented roman_numeral() function with the use of regex to build the number.

AmPhIbIaN26 mentioned this pull request Apr 19, 2021

Add support for other numeral systems. #18

Open

AmPhIbIaN26 changed the title ~~I have implemented parse_roman() funciton~~ I have implemented parse_roman() function Apr 19, 2021

Adding numeral_system as a parameter to functions

ce4e5c6

AmPhIbIaN26 closed this Apr 25, 2021

AmPhIbIaN26 reopened this Apr 25, 2021

added _valid_input_by_numeral_system()

6d420a7

Added the _valid_input_by_numeral_system(), it will decide the numeral system based on the input string if not given by the user.

AmPhIbIaN26 added 5 commits May 3, 2021 12:59

Added NUMERAL_SYSTEMS

bf7d6e3

added NUMERAL_SYSTEM list where you decide which numeral system should be used to parse.

fixed Unicode issue for Hindi, Spanish and Russian

5104b76

Update test_numeral_roman.py

94d9441

Added more test cases to test the numeral system

Update test_number_parsing.py

475ad04

Merge pull request #1 from AmPhIbIaN26/parse_roman(numeral-support)

f847cf6

Added Support for numeral systems

Gallaecio reviewed May 4, 2021

AmPhIbIaN26 added 2 commits May 4, 2021 18:14

Update README.rst

45bfdc5

Merge pull request #2 from AmPhIbIaN26/parse_roman(numeral-support)

c13102f

Update README.rst

AmPhIbIaN26 added 5 commits May 4, 2021 19:49

Delete test.py

973664b

Made all the changes and added support for incorrect roman numbers

0e89680

Made all changes except .lower()

Removed .lower() from roman regex expressions

9704eff

Update test_numeral_systems.py

554d553

Merge pull request #3 from AmPhIbIaN26/parse_roman(numeral-support)

77ae66f

Parse roman(numeral support)

Gallaecio reviewed May 7, 2021

AmPhIbIaN26 and others added 5 commits May 8, 2021 17:11

Update number_parser/parser.py

fb9bff7

Co-authored-by: Adrián Chaves <adrian@chaves.io>

Update number_parser/parser.py

480cf7c

Co-authored-by: Adrián Chaves <adrian@chaves.io>

Merge pull request #4 from AmPhIbIaN26/parse_roman(regex-approach)

faecf42

Parse roman(regex approach)

Added more test cases for better code coverage

cbc4661

Merge pull request #5 from AmPhIbIaN26/parse_roman(numeral-support)

086f5ec

Minor changes to parser.py and added more test cases for numeral support

Gallaecio reviewed Jul 14, 2021
