Build ASR Support for Regex, Email. Enhance Number, Date Entity #475
import string
from six.moves import range

from chatbot_ner.config import ner_logger
F401 'chatbot_ner.config.ner_logger' imported but unused
lib/nlp/text_normalization.py
Outdated
# Constants
_re_flags = re.UNICODE | re.V1
PUNCTUATION_CHARACTERS = list(string.punctuation + '। ')
CAPTURE_RANGE_RE = "{(?P<minimum>\d+),(?P<maximum>\d+)}"
W605 invalid escape sequence '\d'
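One way to clear W605 here is to make the pattern a raw string, so the backslashes reach the regex engine unchanged. The sketch below uses the standard-library `re` module rather than the `regex` module the file appears to use (it references `re.V1`), and the sample input string is made up for illustration.

```python
import re

# Raw string literal: `\d` is passed through to the regex engine as-is,
# so Python no longer warns about an invalid escape sequence.
CAPTURE_RANGE_RE = r"{(?P<minimum>\d+),(?P<maximum>\d+)}"

# Hypothetical input: pull the repetition bounds out of a pattern string.
match = re.search(CAPTURE_RANGE_RE, r"\w\d{2,4}")
bounds = (int(match.group("minimum")), int(match.group("maximum")))
print(bounds)
```

The same `r"..."` prefix applies to the patterns quoted in the docstring examples further down.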
lib/nlp/text_normalization.py
Outdated
""" | ||

if not insert_edits:
    count = lambda l1, l2: sum([1 for x in l1 if x in l2])
E731 do not assign a lambda expression, use a def
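A possible fix for E731 is to replace the assigned lambda with a small named function. The name `count_shared` is only a suggestion and is not from the PR. `PUNCTUATION_CHARACTERS` is repeated from the constants hunk so the snippet runs on its own.

```python
import string

PUNCTUATION_CHARACTERS = list(string.punctuation + '। ')


def count_shared(items, allowed):
    """Count the elements of `items` that also occur in `allowed`."""
    return sum(1 for x in items if x in allowed)


# The comma, the space and the full stop are all in PUNCTUATION_CHARACTERS.
print(count_shared("a, b.", PUNCTUATION_CHARACTERS))
```

A `def` also gives the helper a real name in tracebacks and room for a docstring, which is the motivation behind the lint rule.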
lib/nlp/text_normalization.py
Outdated

Example procedure:
input_text = "बी nine nine three zero"
regex = "\w\d{4}"
W605 invalid escape sequence '\w'
W605 invalid escape sequence '\d'
number_unit = number_value_dict[NUMBER_DETECTION_RETURN_DICT_UNIT]
if self.min_digit <= self._num_digits(number_value) <= self.max_digit:
    if self.unit_type and (number_unit is None or self.language_number_detector.units_map[
        number_unit].type != self.unit_type) and not self.detect_without_unit:
E125 continuation line with same indent as next logical line
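One option for E125 is to move the condition into a helper so that no continuation line is needed. This is a sketch only: `unit_type_mismatch` and the `Unit` tuple are hypothetical stand-ins, and only the boolean logic is taken from the hunk above.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries stored in `units_map`.
Unit = namedtuple("Unit", ["value", "type"])


def unit_type_mismatch(number_unit, units_map, unit_type, detect_without_unit):
    """Return True when a unit type is required but missing or different."""
    if not unit_type or detect_without_unit:
        return False
    if number_unit is None:
        return True
    return units_map[number_unit].type != unit_type


units_map = {"rupees": Unit("rupees", "currency")}
print(unit_type_mismatch(None, units_map, "currency", False))
print(unit_type_mismatch("rupees", units_map, "currency", False))
```

The early returns mirror the original `and`/`or` chain: the check only applies when `unit_type` is set and `detect_without_unit` is false.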

# add re.escape to handle decimal cases in detected original
detected_original = re.escape(detected_original)
unit_matches = re.search(r'\W+((' + self.unit_choices + r')[.,\s]*' + detected_original + r')\W+|\W+(' +
W504 line break after binary operator
# add re.escape to handle decimal cases in detected original
detected_original = re.escape(detected_original)
unit_matches = re.search(r'\W+((' + self.unit_choices + r')[.,\s]*' + detected_original + r')\W+|\W+(' +
                         detected_original + r'\s*(' +
W504 line break after binary operator
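W504 goes away if each line break is placed before the `+` instead of after it. The sketch below rebuilds only the first alternative of the pattern, since the rest of the expression is cut off in the hunk. The values of `unit_choices` and `detected_original` are invented for illustration.

```python
import re

unit_choices = "rs|rupees"               # hypothetical value of self.unit_choices
detected_original = re.escape("2.5")     # escaped so the dot matches literally

# Each continuation line starts with the operator, which satisfies W504.
pattern = (
    r'\W+((' + unit_choices + r')[.,\s]*'
    + detected_original
    + r')\W+'
)

unit_matches = re.search(pattern, " rs. 2.5 ")
print(unit_matches.group(1))
```

Note that pycodestyle's W503 asks for the opposite layout, so the project's flake8 configuration has to enable only one of the two checks.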
end_span = -1
spanned_text = self.processed_text

regex_numeric_patterns = re.compile(r'(([\d,]+\.?[\d]*)\s?(' + self.scale_map_choices + r'))[\s\-\:]' +
W504 line break after binary operator
Lint fixes to be pushed with next suggested changes.
input_text (str): modified text

Example:
fit_text_to_format(input_text='1 2 3 45', regex_pattern='\d{5}')
W605 invalid escape sequence '\d'
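To illustrate what the docstring example is getting at, here is a deliberately simplified sketch. It is not the PR's `fit_text_to_format`, which relies on fuzzy matching. The hypothetical helper `squeeze_to_pattern` only strips whitespace and checks whether the result fully matches the pattern, using the standard-library `re` module.

```python
import re


def squeeze_to_pattern(input_text, regex_pattern):
    """Drop whitespace and return the result if it fully matches the pattern.

    Hypothetical helper for illustration; falls back to the original text
    when the squeezed candidate does not match.
    """
    candidate = re.sub(r"\s+", "", input_text)
    if re.fullmatch(regex_pattern, candidate):
        return candidate
    return input_text


# Raw string for the pattern, which also avoids the W605 warning.
print(squeeze_to_pattern('1 2 3 45', r'\d{5}'))
```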

if not insert_edits:
    # A rough heuristic to allow (#_of_punctuations + 2) extra characters during fuzzy matching
    count = lambda l1, l2: sum([1 for x in l1 if x in l2])  # pylint: disable=E731
E731 do not assign a lambda expression, use a def

Example procedure:
input_text = "बी nine nine three zero"
regex = r"\w\d{4}"
W605 invalid escape sequence '\w'
W605 invalid escape sequence '\d'
LGTM. Please fix the lint errors before merging.
retest this please
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No coverage information.
UNIT TESTS HAVE PASSED... Good To Merge
JIRA Ticket Number
JIRA TICKET: ML-2962
Description of change
Add ASR Utils Library for text normalization
Build support for longest fuzzy match
Change API Calls on both Haptik API and Chatbot_NER
Update dictionaries
Updates for Number Entity
- Punctuation filtering (Numeric Entity)
- Scale resolution logic (Numeric Entity)
- Number sorting fix (Numeric Entity)
- Add Double/Triple in Scaling (Numeric Entity)
Checklist (OPTIONAL):