Camofy html image sources #467

DrumsnChocolate · 2024-11-21T20:31:10Z

Fixes #466.

TODO: test the new changes.

codecov · 2024-11-21T20:40:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.92%. Comparing base (fbf1070) to head (ad7ba11).
Report is 1 commit behind head on staging.

Additional details and impacted files

@@           Coverage Diff            @@
##           staging     #467   +/-   ##
========================================
  Coverage    99.92%   99.92%           
========================================
  Files          207      207           
  Lines         2733     2745   +12     
========================================
+ Hits          2731     2743   +12     
  Misses           2        2


DrumsnChocolate · 2024-11-21T21:34:49Z

app/helpers/markdown_helper.rb

+    # note that we don't allow mismatched quotes like 'url" or shenanigans like that
+    # This regex contains two particularly useful features:
+    #  capturing groups, and lazy matching.
+    %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}


Don't ask me why I had to put %r{} around it; I just followed RuboCop's instructions.
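The %r{} is most likely RuboCop's Style/RegexpLiteral cop at work: with its default settings it asks for %r{} instead of /.../ whenever the pattern contains a forward slash, and this one does (the /> alternative). A minimal check, assuming nothing beyond plain Ruby, that both spellings describe the same regex:

```ruby
# Same pattern with slash delimiters: the "/" in "/>" has to be escaped as "\/".
WITH_SLASHES = /<img([^>]*) src=(["']?)(.+?)\2( |>|\/>)/
# The %r{} spelling from the PR: no escaping needed inside the braces.
WITH_PERCENT_R = %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}

tag = '<img alt="x" src="a.png" />'
WITH_SLASHES.match(tag).captures == WITH_PERCENT_R.match(tag).captures # => true
```

So the choice of delimiter is purely a readability/linting matter; the matching behaviour is identical.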

Is there any documentation online that I can use as a guideline for the review?

If you've never seen regexes before, they are a pain in the ass to review. They are kind of like a language in their own right, and the main way to understand them is to play around: build them and test them on a site that shows you what the regex matches.
I've taken the regex here and prepared an example for you: https://regex101.com/r/RLd8UL/1
If you'd like to have a quick online meeting about how to understand this, let me know. I have time tomorrow.

I have seen regex before, and it is almost unreadable. Thank you for the resource.

The best way to test the regex is probably to come up with valid HTML img tags that are not matched by the regex. The edge cases I could think of are captured in that regex101 example.
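The same hunt for valid tags that slip through can be scripted in Ruby instead of regex101. A minimal sketch using the regex from the first version of the PR; the candidate tags are illustrative examples, not the ones from the regex101 link:

```ruby
# Regex from the first version of this PR.
IMG_SRC = %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}

# Valid HTML img tags; the goal is to find the ones the regex does not match.
candidates = [
  '<img src="a.png">',
  "<img alt='logo' src='a.png' />",
  '<img src=a.png>',
  %(<img\nsrc="a.png">),   # newline instead of a space before src
  '<IMG SRC="a.png">',     # tag and attribute names are case-insensitive in HTML
  '<img src = "a.png">'    # whitespace around = is allowed
]

missed = candidates.reject { |tag| tag.match?(IMG_SRC) }
missed.each { |tag| puts tag.inspect }
```

This reports the last three candidates: the literal space before src and the case-sensitive `<img` are the parts of the pattern they trip over.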

I am cheating a little bit by using ChatGPT. It has some suggestions and explanations; I will check them and comment.


lodewiges · 2024-11-21T22:13:38Z

app/helpers/markdown_helper.rb

+    # note that we don't allow mismatched quotes like 'url" or shenanigans like that
+    # This regex contains two particularly useful features:
+    #  capturing groups, and lazy matching.
+    %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}


Suggested change

- %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}
+ %r{<img([^>])*\s+src=(["']?)([^'">]+)\1(?=[/>])}

This one also checks for whitespace between the attributes and is stricter about what it accepts as the image URL.

You seem to have removed the first capturing group. ChatGPT tends to do that because it doesn't see the group's result being used anywhere in the regex itself. However, we definitely use this capturing group to produce the new text. What I do like is the use of \s; that's any whitespace, right?
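To see why the first capturing group matters: when the source is replaced, the attributes captured before src have to be written back into the new tag, or they are lost. A minimal sketch, assuming a gsub-based rewrite; rewrite_img_sources and the proxy.example URL are hypothetical, not the PR's actual helper:

```ruby
# Regex from the first version of this PR.
IMG_SRC = %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}

# Hypothetical helper: replaces every matched src with whatever the block returns.
def rewrite_img_sources(html)
  html.gsub(IMG_SRC) do
    m = Regexp.last_match
    # m[1]: attributes before src, m[2]: opening quote ('' if unquoted),
    # m[3]: the URL, m[4]: the terminator (space, > or />)
    "<img#{m[1]} src=#{m[2]}#{yield(m[3])}#{m[2]}#{m[4]}"
  end
end

html = '<p><img alt="logo" class="x" src="a.png"></p>'
rewrite_img_sources(html) { |url| "https://proxy.example/#{url}" }
# => "<p><img alt=\"logo\" class=\"x\" src=\"https://proxy.example/a.png\"></p>"
```

Dropping m[1] from the replacement would silently strip alt, class and any other attribute written before src.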

What exactly does the latter part improve? I don't think it really improves anything. Let me break down my thoughts:

Original: (.+?)\2( |>|/>), where \2 matches the same value as the captured opening quote (either ', " or nothing at all) and (.+?) lazily matches one or more characters of any kind. That means the match begins with src=', src=" or src=, and then continues over any characters until the first point where that same quotation mark is followed by one of the three options in ( |>|/>): a space, > or />.
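The lazy (.+?) and the \2 backreference can be checked directly in Ruby. GREEDY below is a hypothetical variant without the ?, included only to show what lazy matching buys when two images sit on one line:

```ruby
# Lazy version from the PR, and a greedy variant (no ? after .+) for comparison.
LAZY = %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}
GREEDY = %r{<img([^>]*) src=(["']?)(.+)\2( |>|/>)}

# Group 2 records which quote opened the value ("" when unquoted), so \2 closes it.
quotes = ['<img src="a.png">', "<img src='a.png'>", '<img src=a.png>'].map { |t| t.match(LAZY)[2] }
# quotes == ["\"", "'", ""]

# Two images on one line: lazy stops at the first closing quote, greedy at the last.
two = '<img src="a.png"><img src="b.png">'
two.match(LAZY)[3]   # => "a.png"
two.match(GREEDY)[3] # => "a.png\"><img src=\"b.png"
two.scan(LAZY).map { |groups| groups[2] } # => ["a.png", "b.png"]
```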

Your suggestion: ([^'">]+)\1(?=[/>]), where \1 is meant to match the same value as the captured opening quote. Note that [^'">] is unnecessarily restrictive: I don't know the HTML specification exactly, but I can imagine that something like src='lookatthisdoublequote"isntitamazing' is valid. Notice the " in the middle? Your suggestion would stop the src value at that middle quote, because it does not pass the [^'">] check. And the latter part, (?=[/>]), does not allow for any whitespace before the closing / or >.
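A few Ruby checks of the points above, using both patterns exactly as written in the thread. One extra detail shows up: in the suggestion, ([^>])* is itself group 1 and only keeps its last repetition, so \1 refers to the character just before the whitespace preceding src, not to the opening quote (that is group 2). It only appears to work when that character happens to be a matching quote:

```ruby
ORIGINAL = %r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)}
SUGGESTION = %r{<img([^>])*\s+src=(["']?)([^'">]+)\1(?=[/>])}

# A double quote inside a single-quoted value is valid HTML.
mixed = %(<img alt="x" src='look"quote'>)
mixed.match(ORIGINAL)[3] # => "look\"quote"
mixed.match(SUGGESTION)  # => nil

# The lookahead rejects whitespace before "/>".
'<img alt="x" src="a.png" />'.match(SUGGESTION) # => nil

# Group 1 is ([^>]): here it holds the closing quote of alt="x",
# which is why \1 happens to match the closing quote of src.
m = '<img alt="x" src="a.png">'.match(SUGGESTION)
[m[1], m[2], m[3]] # => ["\"", "\"", "a.png"]
# With a different quote style on the preceding attribute, the match fails.
%(<img alt='x' src="a.png">).match(SUGGESTION) # => nil
```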

I do have some ideas based on your suggestion:

Use \s instead of spaces when I'm indicating whitespace.

I've reconsidered whether the last capturing group is necessary, and it is: when we have an unquoted src=somesource, we can only determine the end of the source value when we encounter whitespace, / or >. But I can change it from ( |>|/>) to ( |>|/), which can be written more simply as [ >/]. However, if I want to match \s instead of a space, I can't use the [] notation, I think.
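On the last point: Ruby regexes do allow \s inside a character class, so [\s>/] is valid. One caveat worth checking: unquoted HTML attribute values may contain /, so a bare / in the terminator class ends an unquoted URL at its first slash, whereas an alternation such as (\s|/?>) keeps the original /> requirement. Both patterns below are hypothetical variants, not necessarily what the latest commit contains:

```ruby
# Hypothetical variants with \s instead of literal spaces.
CLASS_END = %r{<img([^>]*)\s+src=(["']?)(.+?)\2([\s>/])}
ALT_END = %r{<img([^>]*)\s+src=(["']?)(.+?)\2(\s|/?>)}

# \s is allowed inside a character class, like any other member.
"\t".match?(/[\s>]/) # => true

# Both accept a newline before src.
newline = %(<img alt="x"\nsrc="a.png">)
newline.match(CLASS_END)[3] # => "a.png"
newline.match(ALT_END)[3]   # => "a.png"

# An unquoted value containing "/" is cut short by the character-class version.
unquoted = '<img src=https://example.com/a.png>'
unquoted.match(CLASS_END)[3] # => "https:"
unquoted.match(ALT_END)[3]   # => "https://example.com/a.png"
```

Quoted values are unaffected either way, since \2 forces the match to continue until the closing quote.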

I've applied the new ideas I mentioned in the latest commit

This part was my bad: "(?=[/>]) does not allow for any whitespace". I asked ChatGPT to remove the whitespace.
I was unaware that src='lookatthisdoublequote"isntitamazing' is a valid source, but I agree that it is too restrictive.

lodewiges · 2024-11-21T22:25:50Z

Looks good, unable to test due to not having camo locally installed

DrumsnChocolate · 2024-11-21T23:04:48Z

> Looks good, unable to test due to not having camo locally installed

I don't have Camo installed locally either, and you don't need to. I believe it will just use camo.csvalpha.nl when you're in development. Tests do things slightly differently; I'm not sure how that's handled, but they should pass for you locally as well if you want to run them.
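A likely reason no local install is needed, assuming the app follows the original Camo design (atmos/camo): the app only computes a signed URL pointing at the proxy host, and the proxy fetches the image when a browser requests that URL. A sketch of that URL format with a placeholder host and key; the helper name camo_url is hypothetical, and this is not necessarily how markdown_helper.rb builds its URLs:

```ruby
require 'openssl'

# Original Camo URL format: <camo host>/<HMAC-SHA1 hex digest of the URL>/<hex-encoded URL>
# host and key are placeholders, not this app's configuration.
def camo_url(image_url, host:, key:)
  digest = OpenSSL::HMAC.hexdigest('SHA1', key, image_url) # 40 hex characters
  encoded = image_url.unpack1('H*')                        # each byte as two hex digits
  "#{host}/#{digest}/#{encoded}"
end

url = camo_url('http://example.com/a.png', host: 'https://camo.example.org', key: 'placeholder-key')
puts url
```

Only the proxy needs the same key to verify the digest, which is why building these URLs works without running Camo yourself.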

lodewiges · 2024-11-21T23:43:06Z

> > Looks good, unable to test due to not having camo locally installed
>
> I don't have camo locally installed either. You don't need to. I believe it'll just use camo.csvalpha.nl when you're in development. Tests do things slightly different, not sure how that's handled, but they should pass for you locally as well if you desire to run them.

I will; I am having some trouble getting it to work.

Unable to get it working locally.

write a regex for html images. might not be perfect

4a72f7e

DrumsnChocolate added 3 commits November 21, 2024 21:45

linter was annoying

4f20a08

I'm not used to how ruby deals with returns

e9a38e7

running the linter is a pain

90aecb4

DrumsnChocolate commented Nov 21, 2024


app/helpers/markdown_helper.rb (resolved)

DrumsnChocolate requested a review from lodewiges November 21, 2024 21:38

lodewiges reviewed Nov 21, 2024


app/helpers/markdown_helper.rb (outdated, resolved)

rename variable

02b0f95

lodewiges previously approved these changes Nov 21, 2024


improve regex to be more acommodating to general whitespace

ad7ba11
