Cleanup linkify regex patterns (efficiency) #90
base: main
* Avoid groups being capture-groups (memory-usage)
* more compact file-extension matching (especially similar extensions)
* replace `.+` with more specific URL-only set
* match more possible variations (robustness & future-proofing)
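A minimal sketch of the first three points in JavaScript. The patterns and the `example.org` URL are illustrative assumptions, not the project's actual patterns.

```javascript
// Hypothetical patterns for illustration only; not the project's real ones.

// Capturing groups: every (...) stores its match in the result array.
const capturing = /(https?):\/\/(.+)\.(jpg|jpeg|png)/i;

// Non-capturing groups (?:...) group without storing anything,
// jpe?g folds two similar extensions into one alternative, and
// [^\s"<>]+ replaces the over-broad .+ (which also matches spaces and quotes).
const tightened = /(?:https?):\/\/[^\s"<>]+\.(?:jpe?g|png)/i;

const text = 'see https://example.org/a.jpeg and "more text" b.png';

const broad = text.match(capturing);
const narrow = text.match(tightened);

console.log(broad[0]);      // the greedy .+ runs across spaces to the last ".png"
console.log(broad.length);  // 4: full match plus three captured groups
console.log(narrow[0]);     // only the URL itself
console.log(narrow.length); // 1: full match, nothing captured
```

The trade-off is that a non-capturing group cannot be referenced later (for example as `$1` in a replacement), so any group whose content is reused has to stay capturing.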
Having read & thought a little more, I now suspect that my making all groups non-capturing will sabotage the next block of code for applying special treatment to some URLs. I'll undo this and see if I can figure out (on a phone) how to add those commits to this PR (else I'll close this and open anew). Edit: done.
Avoid over-broad (`.+`) matching.
I'm working on refining the patterns further. The aim is to make them more specific (matching only valid URLs, so avoiding false positives) and more efficient (each pattern is run against each queried OSM note). For that, it would be helpful to use backreferences.
* closer matching to valid-only URLs
* increased efficiency
[Trailing Dots in Domain Name](http://www.dns-sd.org/TrailingDotsInDomainNames.html)
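As the linked page describes, a fully qualified domain name may end in a dot. A rough sketch of a host pattern that tolerates this, assuming ASCII-only labels; `hostPattern` is a hypothetical name, not something from this PR.

```javascript
// Hypothetical, simplified host pattern: ASCII labels separated by dots,
// with an optional trailing dot marking a fully qualified name.
const hostPattern = /^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\.?$/i;

console.log(hostPattern.test("www.dns-sd.org"));   // true
console.log(hostPattern.test("www.dns-sd.org."));  // true (trailing dot accepted)
console.log(hostPattern.test("www..dns-sd.org"));  // false (empty label)
```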
Match well-formed URLs
Match more (valid) possibilities.
To match percent-encoding URL-escaping.
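One way percent-encoding could be matched, as a sketch only: accept either an allowed literal character or a `%` followed by exactly two hex digits. The character set and the name `pathChars` are assumptions for illustration, not the pattern from this PR.

```javascript
// Hypothetical path pattern: unreserved characters, a few common
// sub-delimiters, or a well-formed percent-escape (% plus two hex digits).
const pathChars = /^(?:[A-Za-z0-9._~!$&'()*+,;=:@\/-]|%[0-9A-Fa-f]{2})+$/;

console.log(pathChars.test("/photos/my%20image.jpg")); // true
console.log(pathChars.test("/photos/my%2image.jpg"));  // false: "%2i" is not a valid escape
console.log(pathChars.test("/photos/my image.jpg"));   // false: raw space
```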
Apologies for the multiple related commits; I'm doing all this on a phone via GH's Web UI.
Thank you for your work so far. I have not looked into the changes in detail yet, as I wanted to answer your questions first:
Correct, in some cases, the URLs are transformed to a different URL to provide a smaller image/thumbnail or to map a webpage to the correct image (needed for some Imgur links). As you already noticed, this is done via capture groups.
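A minimal sketch of how capture groups make such a transformation possible. The host `images.example.org` and the `/thumb/` path layout are invented for illustration; they describe neither Imgur's scheme nor this project's actual rules.

```javascript
// Hypothetical rule: the host and the /thumb/ path layout are invented
// for illustration and do not describe any real image service.
const imageUrl = /https?:\/\/images\.example\.org\/([A-Za-z0-9]+)\.(jpe?g|png)/g;

function toThumbnail(text) {
  // $1 is the captured image id, $2 the captured extension.
  return text.replace(imageUrl, "https://images.example.org/thumb/$1.$2");
}

console.log(toThumbnail("photo: http://images.example.org/abc123.jpeg"));
// photo: https://images.example.org/thumb/abc123.jpeg
```

If the two groups were made non-capturing, `$1` and `$2` would have nothing to refer to, which is why the groups used for rewriting need to stay capturing.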
If you mean the language the user interface is implemented in and where the image linking takes place (
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions and especially https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges are useful if you are interested in further documentation. I also found another page which offers a quick overview of the possibilities: https://javascript.info/regexp-backreferences
Welcome; contributing what I'm able to 🙂. Opportunity to learn.
Don't worry about review, yet. I'll change the status of the PR (from
👍 an elegant approach. The necessary groups are now capturing, again. Embarrassment averted.
Ah, it happens client-side; OK. Either way, thanks for the confirmation of which flavour of regex to use. I don't have the means to test changes (otherwise it would also mean that I could ensure that all the changes would be in a single commit), so I want to be extra-careful that what I submit is at least valid syntax.
Thankyou muchly. Mozilla does a decent job of documentation 🙂. Besides backreferences for deduplicating code (some parts of e.g. wiki URLs repeat), I want to use named backreferences (or, rather, backreferences to named groups) for easier readability (I'm thinking of when future changes are necessary). Almost like comments in code, or well-named variables & functions. I'll try to do that with fewer commits, though 😋.
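For reference, a small sketch of named groups and `\k<name>` backreferences in JavaScript. One caveat: a backreference matches the same text the group captured, not the same sub-pattern, so it only deduplicates a pattern when the repeated part of the URL is literally identical. To reuse a sub-pattern, composing the expression from string fragments is one option. All names and the `wiki.example.org` host are hypothetical.

```javascript
// Named group + backreference: \k<quote> must match the SAME character
// that the group captured, not merely the same pattern.
const quoted = /(?<quote>["'])(?<url>https?:\/\/[^\s"']+)\k<quote>/;

const m = quoted.exec(`href='https://example.org/a.png'`);
console.log(m.groups.url);                               // https://example.org/a.png
console.log(quoted.test(`"https://example.org/a.png'`)); // false: quotes differ

// To reuse a sub-pattern (rather than matched text), one option is to
// build the expression from named string fragments.
const scheme = "https?:\\/\\/";
const wikiHost = "(?<lang>[a-z]{2,3})\\.wiki\\.example\\.org"; // hypothetical host
const wikiPage = new RegExp(`${scheme}${wikiHost}\\/wiki\\/(?<page>[^\\s]+)`);

const w = wikiPage.exec("https://de.wiki.example.org/wiki/Hauptseite");
console.log(w.groups.lang, w.groups.page); // de Hauptseite
```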
de790a5 to 2887ad8
Inspired by #65:
* replace `.+` with more specific URL-only set