-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Camofy html image sources #467
base: staging
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## staging #467 +/- ##
========================================
Coverage 99.92% 99.92%
========================================
Files 207 207
Lines 2733 2745 +12
========================================
+ Hits 2731 2743 +12
Misses 2 2 ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
app/helpers/markdown_helper.rb
Outdated
# note that we don't allow mismatched quotes like 'url" or shenanigans like that | ||
# This regex contains two particularly useful features: | ||
# capturing groups, and lazy matching. | ||
%r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't ask me why I had to put %r{} around it, I just followed rubocop's instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any documentaion online that i can use as a guide line for the review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you've never seen regexes before, they are a pain in the ass to review. They are kind of like a language in their own right, and the main way to understand regexes is to play around building them and testing them out on a site that shows you what the regex matches.
I've taken the regex here and prepared an example for you: https://regex101.com/r/RLd8UL/1
If you'd like to have a quick online meeting about how to understand this, let me know. I have time tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen regex before, and it is almost unreadable. thank you for the resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best way to test the regex is probably to come up with valid HTML img tags that are not matched by the regex. The edge cases I could think of have been captured in this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am cheating a little bit by using chatgpt, it has some suggestions and explaination, i will check it and comment it
app/helpers/markdown_helper.rb
Outdated
# note that we don't allow mismatched quotes like 'url" or shenanigans like that | ||
# This regex contains two particularly useful features: | ||
# capturing groups, and lazy matching. | ||
%r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%r{<img([^>]*) src=(["']?)(.+?)\2( |>|/>)} | |
%r{<img([^>])*\s+src=(["']?)([^'">]+)\1(?=[/>])} |
this one also checks for space in between the elements and has better checking for the imageurl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you seem to have removed the first capturing group. chatgpt has that tendency because it doesn't see you using its result anywhere in your regex. However, we definitely use this capturing group to produce the new text. What I do like, is the use of \s
; that's any whitespace right?
What exactly does the latter part improve? Because I don't think it really improves anything. Let me break my thoughts down:
- original:
(.+?)\2( |>|/>)
where\2
matches to the same value as the captured opening quote (either ', " or nothing at all).(.+?)
lazily matches any character. That means that it will begin withsrc='
orsrc="
orsrc=
and then it will continue to match any character until the first time it encounters that same quotation mark again. Subsequently, after the quotation mark, it needs to match one of the three options in( |>|/>)
which are a space, the>
or/>
. - your suggestion:
([^'">]+)\1(?=[/>])}
where\1
matches the same value as the captured opening quote. Note that[^'">]
is unnecessarily restrictive: I don't know exactly the specification of HTML, but I can image that something likesrc='lookatthisdoublequote"isntitamazing'
is valid. Notice the"
in the middle? Your regex suggestion would stop the src value at that middle quote, because it does not pass the[^'"]
check. And the latter part,(?=[/>])
does not allow for any whitespace.
I do have some ideas based on your suggestion:
- use
\s
instead of spaces when I'm indicating whitespace - I've reconsidered whether the last capturing group is necessary, but yes it is, because when we have
src=somesource
, we can only determine the end of the source value when we encounter a whitespace or / or >. But I can change it from( |>|/>)
to( |>|/)
which can be put simpler by writing[ >/]
. However, if I want to match for\s
instead of a space, I can't use the[]
notation, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've applied the new ideas I mentioned in the latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part was my bad "(?=[/>]) does not allow for any whitespace'' i asked chat to remove the whitespace.
I was unaware that this is a valid source "src='lookatthisdoublequote"isntitamazing'", but i can agree that it is to restictive
Looks good, unable to test due to not having camo locally installed |
I don't have camo locally installed either. You don't need to. I believe it'll just use camo.csvalpha.nl when you're in development. Tests do things slightly different, not sure how that's handled, but they should pass for you locally as well if you desire to run them. |
I will, i am having so trouble getting it to work |
fix #466
todo: