Build ASR Support for Regex, Email. Enhance Number, Date Entity #475

Merged
tanaya-b merged 16 commits into develop on Apr 20, 2022

Conversation

tanaya-b (Contributor) commented Mar 25, 2022

JIRA Ticket Number

JIRA TICKET: ML-2962

Description of change

  • Add ASR Utils Library for text normalization

  • Build support for longest fuzzy match

  • Change API Calls on both Haptik API and Chatbot_NER

  • Update dictionaries

  • Updates for Number Entity
    - Punctuation filtering (Numeric Entity)
    - Scale resolution logic (Numeric Entity)
    - Number sorting fix (Numeric Entity)
    - Add Double/Triple in Scaling (Numeric Entity)

Checklist (OPTIONAL):

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

import string
from six.moves import range

from chatbot_ner.config import ner_logger

F401 'chatbot_ner.config.ner_logger' imported but unused

# Constants
_re_flags = re.UNICODE | re.V1
PUNCTUATION_CHARACTERS = list(string.punctuation + '। ')
CAPTURE_RANGE_RE = "{(?P<minimum>\d+),(?P<maximum>\d+)}"

W605 invalid escape sequence '\d'
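
One way to clear this warning is a raw string literal, which passes the backslashes to the regex engine unchanged. The sketch below assumes the standard-library `re` module (the `re.V1` flag in the PR suggests the third-party `regex` package is aliased as `re` there), and `extract_ranges` is a hypothetical helper that is not part of the PR.

```python
import re

# Raw string: '\d' is no longer parsed as a (invalid) string escape, so W605 goes away.
CAPTURE_RANGE_RE = r"{(?P<minimum>\d+),(?P<maximum>\d+)}"


def extract_ranges(pattern):
    """Hypothetical helper: list every {min,max} quantifier found in a regex pattern."""
    return [
        (int(m.group("minimum")), int(m.group("maximum")))
        for m in re.finditer(CAPTURE_RANGE_RE, pattern)
    ]


# Quantifiers without a comma, such as {4}, are not captured by this pattern.
ranges = extract_ranges(r"\w{1,2}\d{3,5}")
```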

"""

if not insert_edits:
count = lambda l1, l2: sum([1 for x in l1 if x in l2])

E731 do not assign a lambda expression, use a def


Example procedure:
input_text = "बी nine nine three zero"
regex = "\w\d{4}"

W605 invalid escape sequence '\w'
W605 invalid escape sequence '\d'
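
To illustrate the kind of normalization the example procedure implies, here is a minimal sketch that maps spoken English digit words to digits so that ASR output can later be tested against a digit pattern. The `DIGIT_WORDS` table and `spoken_digits_to_text` are assumptions made for illustration; the PR's actual dictionaries and ASR utilities are not shown in this conversation.

```python
# Hypothetical lookup table; the PR ships its own dictionaries.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spoken_digits_to_text(input_text):
    """Replace each spoken digit word with its digit; leave other tokens untouched."""
    tokens = input_text.split()
    return " ".join(DIGIT_WORDS.get(token.lower(), token) for token in tokens)


normalized = spoken_digits_to_text("बी nine nine three zero")
```

The non-English token is passed through unchanged here; mapping it to a Latin letter would need a transliteration step that this sketch does not attempt.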

number_unit = number_value_dict[NUMBER_DETECTION_RETURN_DICT_UNIT]
if self.min_digit <= self._num_digits(number_value) <= self.max_digit:
if self.unit_type and (number_unit is None or self.language_number_detector.units_map[
number_unit].type != self.unit_type) and not self.detect_without_unit:

E125 continuation line with same indent as next logical line
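
One possible way to avoid the continuation-line problem is to move the condition into a small predicate with early returns. The sketch below assumes `units_map` values expose a `.type` attribute, as the flagged line suggests; `Unit` and `unit_filter_rejects` are hypothetical names, and what the PR does when the condition holds is not visible here.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries stored in units_map.
Unit = namedtuple("Unit", ["value", "type"])


def unit_filter_rejects(unit_type, number_unit, units_map, detect_without_unit):
    """Return True when a unit type is required and the detected unit does not satisfy it."""
    if not unit_type or detect_without_unit:
        # No unit constraint, or numbers without a matching unit are acceptable.
        return False
    if number_unit is None:
        # A unit type is required but no unit was detected.
        return True
    return units_map[number_unit].type != unit_type
```

The predicate has the same truth value as the original compound condition, so the call site can shrink to a single short `if` line.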


# add re.escape to handle decimal cases in detected original
detected_original = re.escape(detected_original)
unit_matches = re.search(r'\W+((' + self.unit_choices + r')[.,\s]*' + detected_original + r')\W+|\W+(' +

W504 line break after binary operator
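
The sketch below shows two things on a deliberately simplified pattern: why `re.escape` matters for decimal values, and how a long concatenation can be wrapped with the break before the operator. It assumes the project's flake8 configuration accepts breaks before binary operators (W503 not enabled); `find_number_with_unit` and its pattern are hypothetical and much simpler than the expression in the PR.

```python
import re


def find_number_with_unit(text, detected_original, unit_choices):
    """Search for the detected number followed by one of the '|'-separated units."""
    # Without re.escape, the '.' in '2.5' would match any character (e.g. '2x5').
    escaped = re.escape(detected_original)
    pattern = (
        r'\b(' + escaped + r')\s*'
        + r'(' + unit_choices + r')\b'  # break placed before '+', not after it
    )
    return re.search(pattern, text)


match = find_number_with_unit("buy 2.5 kg rice", "2.5", "kg|g")
```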

# add re.escape to handle decimal cases in detected original
detected_original = re.escape(detected_original)
unit_matches = re.search(r'\W+((' + self.unit_choices + r')[.,\s]*' + detected_original + r')\W+|\W+(' +
detected_original + r'\s*(' +

W504 line break after binary operator

end_span = -1
spanned_text = self.processed_text

regex_numeric_patterns = re.compile(r'(([\d,]+\.?[\d]*)\s?(' + self.scale_map_choices + r'))[\s\-\:]' +

W504 line break after binary operator

@tanaya-b tanaya-b added the new-feature Added new functionality label Mar 25, 2022
@tanaya-b
Contributor Author

Lint fixes to be pushed with next suggested changes.

input_text (str): modified text

Example:
fit_text_to_format(input_text='1 2 3 45', regex_pattern='\d{5}')

W605 invalid escape sequence '\d'
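
As a rough illustration of what the docstring example could be doing, the sketch below collapses whitespace and keeps the collapsed form only if it fully matches the pattern. This is an assumption about the behaviour, not the PR's implementation (which also involves fuzzy matching); the function name carries a `_sketch` suffix to make that clear, and the pattern is written as a raw string to avoid W605.

```python
import re


def fit_text_to_format_sketch(input_text, regex_pattern):
    """Hypothetical: return the whitespace-collapsed text if it fits the pattern, else the input."""
    collapsed = re.sub(r"\s+", "", input_text)
    if re.fullmatch(regex_pattern, collapsed):
        return collapsed
    return input_text


fitted = fit_text_to_format_sketch(input_text='1 2 3 45', regex_pattern=r'\d{5}')
```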


if not insert_edits:
# A rough heuristic to allow (#_of_punctuations + 2) extra characters during fuzzy matching
count = lambda l1, l2: sum([1 for x in l1 if x in l2]) # pylint: disable=E731

E731 do not assign a lambda expression, use a def
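
E731 is a pycodestyle code reported through flake8, so a `# pylint: disable=E731` comment does not silence it; flake8 reads `# noqa: E731` instead. A cleaner option is to replace the lambda with a named function, as in the sketch below. `count_common` and `extra_chars_allowed` are hypothetical names, and the "+ 2" allowance is taken from the code comment above, not from the PR's full implementation.

```python
import string

# Same constant as in the PR; note that it also contains the space character.
PUNCTUATION_CHARACTERS = list(string.punctuation + '। ')


def count_common(items, allowed):
    """Count the elements of `items` that also occur in `allowed`."""
    return sum(1 for x in items if x in allowed)


def extra_chars_allowed(text):
    """Hypothetical reading of the heuristic: punctuation count plus two."""
    return count_common(text, PUNCTUATION_CHARACTERS) + 2
```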


Example procedure:
input_text = "बी nine nine three zero"
regex = r"\w\d{4}"

W605 invalid escape sequence '\w'
W605 invalid escape sequence '\d'

@naseem-shaik naseem-shaik self-requested a review April 14, 2022 14:06
Contributor

@naseem-shaik naseem-shaik left a comment

LGTM. Please fix the lint errors before merging.

@tanaya-b
Contributor Author

retest this please

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 14 (rating A)
  • Coverage: no information
  • Duplication: 0.0%

@haptik-deployment

UNIT TESTS HAVE PASSED... Good To Merge

@tanaya-b tanaya-b merged commit a835daa into develop Apr 20, 2022
Labels
missing-tests new-feature Added new functionality

4 participants