Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Camofy html image sources #467

Open
wants to merge 6 commits into
base: staging
Choose a base branch
from
Open

Conversation

DrumsnChocolate
Copy link
Contributor

@DrumsnChocolate DrumsnChocolate commented Nov 21, 2024

fix #466
todo:

  • test new stuff

Copy link

codecov bot commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.92%. Comparing base (fbf1070) to head (ad7ba11).
Report is 1 commits behind head on staging.

Additional details and impacted files
@@           Coverage Diff            @@
##           staging     #467   +/-   ##
========================================
  Coverage    99.92%   99.92%           
========================================
  Files          207      207           
  Lines         2733     2745   +12     
========================================
+ Hits          2731     2743   +12     
  Misses           2        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

# note that we don't allow mismatched quotes like 'url" or shenanigans like that
# This regex contains two particularly useful features:
# capturing groups, and lazy matching.
%r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't ask me why I had to put %r{} around it, I just followed rubocop's instructions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any documentaion online that i can use as a guide line for the review?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've never seen regexes before, they are a pain in the ass to review. They are kind of like a language in their own right, and the main way to understand regexes is to play around building them and testing them out on a site that shows you what the regex matches.
I've taken the regex here and prepared an example for you: https://regex101.com/r/RLd8UL/1
If you'd like to have a quick online meeting about how to understand this, let me know. I have time tomorrow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen regex before, and it is almost unreadable. thank you for the resource

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best way to test the regex is probably to come up with valid HTML img tags that are not matched by the regex. The edge cases I could think of have been captured in this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am cheating a little bit by using chatgpt, it has some suggestions and explaination, i will check it and comment it

lodewiges
lodewiges previously approved these changes Nov 21, 2024
# note that we don't allow mismatched quotes like 'url" or shenanigans like that
# This regex contains two particularly useful features:
# capturing groups, and lazy matching.
%r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}
Copy link
Contributor

@lodewiges lodewiges Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}
%r{<img([^>])*\s+src=(["']?)([^'">]+)\1(?=[/>])}

this one also checks for space in between the elements and has better checking for the imageurl

Copy link
Contributor Author

@DrumsnChocolate DrumsnChocolate Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you seem to have removed the first capturing group. chatgpt has that tendency because it doesn't see you using its result anywhere in your regex. However, we definitely use this capturing group to produce the new text. What I do like, is the use of \s; that's any whitespace right?

What exactly does the latter part improve? Because I don't think it really improves anything. Let me break my thoughts down:

  • original: (.+?)\2( |>|/>) where \2 matches to the same value as the captured opening quote (either ', " or nothing at all). (.+?) lazily matches any character. That means that it will begin with src=' or src=" or src= and then it will continue to match any character until the first time it encounters that same quotation mark again. Subsequently, after the quotation mark, it needs to match one of the three options in ( |>|/>) which are a space, the > or />.
  • your suggestion: ([^'">]+)\1(?=[/>])} where \1 matches the same value as the captured opening quote. Note that [^'">] is unnecessarily restrictive: I don't know exactly the specification of HTML, but I can image that something like src='lookatthisdoublequote"isntitamazing' is valid. Notice the " in the middle? Your regex suggestion would stop the src value at that middle quote, because it does not pass the [^'"] check. And the latter part, (?=[/>]) does not allow for any whitespace.

I do have some ideas based on your suggestion:

  • use \s instead of spaces when I'm indicating whitespace
  • I've reconsidered whether the last capturing group is necessary, but yes it is, because when we have src=somesource, we can only determine the end of the source value when we encounter a whitespace or / or >. But I can change it from ( |>|/>) to ( |>|/) which can be put simpler by writing [ >/]. However, if I want to match for \s instead of a space, I can't use the [] notation, I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied the new ideas I mentioned in the latest commit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was my bad "(?=[/>]) does not allow for any whitespace'' i asked chat to remove the whitespace.
I was unaware that this is a valid source "src='lookatthisdoublequote"isntitamazing'", but i can agree that it is to restictive

@lodewiges
Copy link
Contributor

Looks good, unable to test due to not having camo locally installed

@DrumsnChocolate
Copy link
Contributor Author

DrumsnChocolate commented Nov 21, 2024

Looks good, unable to test due to not having camo locally installed

I don't have camo locally installed either. You don't need to. I believe it'll just use camo.csvalpha.nl when you're in development. Tests do things slightly different, not sure how that's handled, but they should pass for you locally as well if you desire to run them.

@lodewiges
Copy link
Contributor

Looks good, unable to test due to not having camo locally installed

I don't have camo locally installed either. You don't need to. I believe it'll just use camo.csvalpha.nl when you're in development. Tests do things slightly different, not sure how that's handled, but they should pass for you locally as well if you desire to run them.

I will, i am having so trouble getting it to work

@lodewiges lodewiges dismissed their stale review November 21, 2024 23:43

unable to get it working locally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Camofy HTML content for static-pages
2 participants