Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup linkify regex patterns (efficiency) #90

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Lee-Carre
Copy link

Inspired by #65:

  • Avoid groups being capture-groups (memory-usage)
  • more compact file-extension matching (especially similar extensions)
  • replace .+ with more specific URL-only set
  • match more possible variations (robustness & future-proofing)

* Avoid groups being capture-groups (memory-usage)
* more compact file-extension matching (especially similar extensions)
* replace `.+` with more specific URL-only set
* match more possible variations (robustness & future-proofing)
@Lee-Carre
Copy link
Author

Lee-Carre commented Feb 24, 2022

Having read & thought a little more, I now suspect that my making all groups non-capturing will sabotage the next block of code for applying special treatment to some URLs.

I'll undo this and see if I can figure out (on a phone) how to add those commits to this PR (else I'll close this and open anew).

Edit: done.

@Lee-Carre Lee-Carre marked this pull request as draft February 24, 2022 16:38
@Lee-Carre
Copy link
Author

@ENT8R

I'm working on refining the patterns further.

To make the patterns more specific (matching only valid URLs (avoiding false-positives), and doing so efficiently (since each pattern is run against each queried (OSM-)note)), it would be helpful to use backreferences.

  • Which regex parser / interpreter does your backend use? I would guess JavaScript, given that's the language of linkify.js.
  • If you already know; does it implement backreferences, and if so via what syntax? If you don't happen to recall, that's fine; I'll gladly read the documentation for the answer to the first question 🙂.

@Lee-Carre
Copy link
Author

Apologies for the multiple related commits; I'm doing all this on a phone via GH's Web UI.

@ENT8R
Copy link
Owner

ENT8R commented Feb 27, 2022

Thank you for your work so far, I did not look into the changes in detail yet, as I wanted to answer your questions before:

Having read & thought a little more, I now suspect that my making all groups non-capturing will sabotage the next block of code for applying special treatment to some URLs.

Correct, in some cases, the URLs are transformed to a different URL to provide a smaller image/thumbnail or to map a webpage to the correct image (needed for some Imgur links). As you already noticed, this is done via capture groups.

Which regex parser / interpreter does your backend use? I would guess JavaScript, given that's the language of linkify.js.

If you mean the language the user interface is implemented in and where the image linking takes place (linkify.js), JavaScript is the language being used.

If you already know; does it implement backreferences, and if so via what syntax?

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions and especially https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges are useful if you are interested in further documentation. I also found another page which offers a quick overview of the possibilities: https://javascript.info/regexp-backreferences

@Lee-Carre
Copy link
Author

Thank you for your work so far

Welcome; contributing what I'm able to 🙂. Opportunity to learn.

I did not look into the changes in detail yet

Don't worry about review, yet. I'll change the status of the PR (from draft to open; I presume that'll notify you(?), since you're likely ‘watching’ ‘all activity’) when I feel it's ready for scrutiny & consideration for merging.

Correct, in some cases, the URLs are transformed to a different URL to provide a smaller image/thumbnail or to map a webpage to the correct image (needed for some Imgur links). As you already noticed, this is done via capture groups.

👍 an elegant approach.

The necessary groups are now capturing, again. Embarrassment averted.

If you mean the language the user interface is implemented in and where the image linking takes place (linkify.js), JavaScript is the language being used.

Ah, it happens client-side; OK. Either way, thanks for the confirmation of which flavour of regex to use. I don't have the means to test changes (otherwise it would also mean that I could ensure that all the changes would be in a single commit), so I want to be extra-careful that what I submit is at least valid syntax.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions and especially https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges are useful if you are interested in further documentation. I also found another page which offers a quick overview of the possibilities: https://javascript.info/regexp-backreferences

Thankyou muchly. Mozilla does a decent job of documentation 🙂.

Besides backreferences for deduplicating code (some parts of e.g. wiki URLs repeat), I want to use named backreferences (or, rather, backreferences to named groups) for easier readability (I'm thinking of when future changes are necessary). Almost like comments in code, or well-named variables & functions.

I'll try to do that with fewer commits, though 😋.

@ENT8R ENT8R force-pushed the main branch 2 times, most recently from de790a5 to 2887ad8 Compare August 10, 2022 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants