feat: UPS tracking numbers #228
c287929 to 2516179
pywhat/Data/regex.json
Outdated
"Regex": "^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$",
"plural_name": false,
"Description": null,
"Rarity": 1,
I would say that rarity should be lowered.
Any suggestions? 🙂
Something around 0.3
Why 0.3? I'd say higher, like 0.5 or 0.6, because:
- The string has to start with `1Z`
- It needs 6 chars from `0-9A-Z`
- It then has exactly 2 digits
- It ends with 8 digits
Also, can we make it:
- ^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$
+ ^(1Z[0-9A-Z]{6}[0-9]{10})$
?
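To check that the proposed simplification accepts exactly the same strings, here is a minimal sketch that runs both patterns over a few samples. The sample strings are made up for illustration and are not real tracking numbers; `agree` is a hypothetical helper, not part of pywhat.

```python
import re

# The two patterns under discussion, copied from the review thread.
SPLIT = re.compile(r"^(1Z[0-9A-Z]{6}[0-9]{2}[0-9]{8})$")
MERGED = re.compile(r"^(1Z[0-9A-Z]{6}[0-9]{10})$")

# Made-up samples chosen to exercise both accept and reject paths.
SAMPLES = [
    "1Z999AA10123456784",  # 1Z + 6 alphanumerics + 10 digits
    "1Z999AA1012345678",   # one digit short
    "1Z999AA1012345678A",  # letter inside the digit block
    "2Z999AA10123456784",  # wrong prefix
    "1z999AA10123456784",  # lowercase prefix
]


def agree(samples):
    """Return True if both patterns accept or reject every sample alike."""
    return all(bool(SPLIT.match(s)) == bool(MERGED.match(s)) for s in samples)


if __name__ == "__main__":
    print("patterns agree on all samples:", agree(SAMPLES))
```

Since `[0-9]{2}[0-9]{8}` and `[0-9]{10}` describe the same run of ten digits, the only thing lost by merging is the option to give the two-digit part its own group.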
I think 0.4 or 0.5. And yes, regex should be changed.
The idea behind the 2+8 split is that the first 2 digits in this group represent a service indicator code, which could perhaps be captured and handled in the future.
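As a rough illustration of what "captured" could look like, the sketch below keeps the 2+8 split but names the groups. The group names `head`, `service`, and `tail` and the helper `split_tracking` are assumptions made up for this example; pywhat's current pattern uses a single unnamed group.

```python
import re

# Hypothetical variant of the PR's pattern with named groups.
# Only "service" corresponds to something stated in the thread
# (the two-digit service indicator code); the other names are placeholders.
UPS = re.compile(
    r"^1Z(?P<head>[0-9A-Z]{6})(?P<service>[0-9]{2})(?P<tail>[0-9]{8})$"
)


def split_tracking(candidate):
    """Return the named parts of a matching string, or None if it does not match."""
    m = UPS.match(candidate)
    return m.groupdict() if m else None


if __name__ == "__main__":
    # Made-up sample, not a real tracking number.
    print(split_tracking("1Z999AA10123456784"))
```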
Aside: I wonder if the "rarity" could be estimated more reliably through some entropy-based measure 🤔
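One possible reading of an "entropy-based measure", sketched under loud assumptions: treat candidate inputs as uniformly random strings over the 36 characters `0-9A-Z` and compute how many bits of constraint the pattern imposes, i.e. the negative log2 of the chance that a random string of the same length matches. The function `match_bits`, the segment encoding, and the uniform-alphabet model are all hypothetical. How such a bit count would map onto pywhat's 0 to 1 rarity scale is not defined in this thread.

```python
import math


def match_bits(segments, alphabet_size=36):
    """Bits of constraint imposed by a fixed-length pattern.

    segments: list of (allowed_choices_per_char, length) pairs.
    Assumes inputs are uniform random strings over `alphabet_size` symbols,
    which is a modelling assumption, not something pywhat does.
    """
    total_len = sum(length for _, length in segments)
    log_matching = sum(length * math.log2(choices) for choices, length in segments)
    log_all = total_len * math.log2(alphabet_size)
    return log_all - log_matching


# "1Z" (one choice per char), [0-9A-Z]{6}, [0-9]{10}
UPS_SEGMENTS = [(1, 2), (36, 6), (10, 10)]

if __name__ == "__main__":
    print(f"constraint: {match_bits(UPS_SEGMENTS):.2f} bits")
```

A higher bit count means a random string is less likely to match by accident, which is the intuition behind the bullet-point argument for a higher rarity.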
> service indicator code
We have precedent for this, called sub-categories. See the Mastercard / Phone Numbers regex. I am not sure it'll work on data in the middle of the regex; we may need to change the code for that :)
> Aside: I wonder if the "rarity" could be estimated more reliably through some entropy-based measure 🤔
Probably! Currently I am estimating it based on what I see when people post this:
And also whether we have any false positives.
@P403n1x87 You can use subcategories with regex method for that.
14b9fcc to 45d37fe
45d37fe to e3880c0
Codecov Report
@@           Coverage Diff           @@
##             main     #228   +/-   ##
=======================================
  Coverage   92.60%   92.60%
=======================================
  Files          15       15
  Lines        1217     1217
=======================================
  Hits         1127     1127
  Misses         90       90
Co-authored-by: piatrashkakanstantinass <74979584+piatrashkakanstantinass@users.noreply.github.com>
⚠ Pull Requests not made with this template will be automatically closed 🔥
Prerequisites
Why do we need this pull request?
What GitHub issues does this fix?
N. A.
Copy / paste of output