-
-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Match protocol relative URLs (://
)
#17
Conversation
Hmm, so this PR adds support for If we want to support |
Oh, you're right. That's wrong. It was a bit late. |
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, `//example.com` will use the protocol of the current page, typically HTTP or HTTPS. These URLs commonly occur in the wild, specifically on pages that support both http and https protocols. Since the URLs without a protocol belong to the class of relative URLs, it is up to the user to decide what protocol to use when parsing these URLs. Example configuration for the url crate: https://docs.rs/url/2.2.1/url/#base-url
src/url.rs
Outdated
fn scan(&self, s: &str, trigger_index: usize) -> Option<Range<usize>> { | ||
let (proto, offset) = match self.trigger { | ||
'/' => ("//", 2), | ||
_ => ("://", 3), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you like, we can make the trigger an enum
type. Then we can use an exhaustive match here:
let (proto, offset) = match self.trigger {
Trigger::Slash => ("//", 2),
Trigger::Colon => ("://", 3),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I just went ahead and added it. Think that's cleaner now. Let me know if I should revert the change. ✌️
Should be fixed now. |
/// Users should not exhaustively match this enum, because more triggers | ||
/// may be added in the future. | ||
#[derive(Debug, PartialEq)] | ||
pub enum Trigger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this pub(crate)
? Want to make it clear at this point that it's not part of the API.
@@ -29,6 +36,12 @@ impl Scanner for UrlScanner { | |||
impl UrlScanner { | |||
// See "scheme" in RFC 3986 | |||
fn find_start(&self, s: &str) -> Option<usize> { | |||
// Match protocol relative URLs (`//example.org`) | |||
// See https://stackoverflow.com/a/15146073/270334 | |||
if s.is_empty() && self.trigger == Trigger::Slash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm does this mean in abc //example.org
we wouldn't match //example.org
? Can you add a test case for this?
Also needs some more negative test cases, e.g. something like foo//bar
should not match, but e.g. "//foo.bar
should.
@@ -188,13 +201,12 @@ impl Default for LinkFinder { | |||
|
|||
impl<'t> Links<'t> { | |||
fn new(text: &'t str, url: bool, email: bool, email_domain_must_have_dot: bool) -> Links<'t> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have an option (disabled by default) for the new functionality, something like protocol_relative_urls(true)
. Normally I wouldn't want to match those kinds of links in plain text.
You can see e.g. GitHub not matching it either: //github.com
Just checking back on this one. Still needs some more changes (and resolving conflicts). |
Oh yeah. Didn't find the time to look into it, heh. Will set myself a reminder for next week. |
Should we close this and try again at a later point in time? linkify 0.9.0 made some changes to the codebase, so it's probably easier to start from a clean slate. |
Yep! It should be much simpler to do that now as the necessary bits of logic are now separate from |
Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example,
//example.com
will use the protocol of the current page, typically HTTP or HTTPS.These URLs commonly occur in the wild, specifically on pages that support both HTTP and HTTPS protocols.
Since the URLs without a protocol belong to the class of relative URLs, it is up to the user to decide what protocol to use when parsing these URLs.
Example configuration for the url crate: https://docs.rs/url/2.2.1/url/#base-url
It would be great to support those in linkify, as I discovered quite a few cases while working on recursion in lychee.
(lycheeverse/lychee#165)