Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle multiple dashes in phone number detection in english #392

Conversation

debadityamandal
Copy link
Contributor

This commit will replace em-dash and en-dash with hyphen which will help us to find the locale easily in phone number. Since all dashes look like same, we can't separate them by any expression because except hyphen another dashes are not available in keyboard but their unicode values are different. To do this I replaced re module by regex module.

@haptik-unit-test
Copy link

Can one of the admins verify this patch?

@haptik-deployment
Copy link

👍 No lint errors found.

@chiragjn
Copy link
Contributor

chiragjn commented Mar 4, 2021

Hello, thank you for your contribution. Sorry for the late reply 😅
Can you please give us an example case of locale where such kind of substitution is needed?
Since locale is supposed to be passed from the outside, we would normally expect it to be correct

@debadityamandal
Copy link
Contributor Author

I was using this module to extract phone number from few documents. That time I got locale "EN–US" from those documents and I got the following response-
([{'country_calling_code': '91', 'value': 'XXXXXXXXXX'}], ['XXXXXXXXXX'])
This 'country_calling_code' is representing India but I gave US. Then I started to debug the code and I saw it was not detecting country code because of the wrong dash as the code is detecting only hyphen. We can't type Em-Dash/En-Dash through keyboard normally. So, I converted the Em-Dash which is mentioned in locale to hyphen and now I am getting following response-
([{'country_calling_code': '1', 'value': 'XXXXXXXXXX'}], ['XXXXXXXXXX']).

@chiragjn
Copy link
Contributor

chiragjn commented Mar 8, 2021

Although this seems like something that should be handled outside of this module, I am still willing to accept the change. Can you please revert all re changes? We would like to use both re and regex (only for the dash replacing use cases) for now to keep the diff as small as possible

@haptik-deployment
Copy link

👍 No lint errors found.

@debadityamandal
Copy link
Contributor Author

I revert all re changes. Please let me know if you have any doubt.

@haptik-deployment
Copy link

👍 No lint errors found.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 8, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@ranvijayj ranvijayj merged commit 7408639 into hellohaptik:develop Mar 8, 2021
@debadityamandal debadityamandal deleted the handle_multiple_dashes_phone_number_english branch March 8, 2021 09:54
@debadityamandal
Copy link
Contributor Author

Thank you for accepting my code. I am closing pull request

@debadityamandal debadityamandal restored the handle_multiple_dashes_phone_number_english branch March 8, 2021 10:07
@debadityamandal debadityamandal deleted the handle_multiple_dashes_phone_number_english branch March 8, 2021 13:25
@chiragjn chiragjn mentioned this pull request Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants