Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex bug fix #169

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Regex bug fix #169

wants to merge 3 commits into from

Conversation

Faris-07
Copy link

This is a bug fix PR. The regex patterns do not work when resolving browser URI requests, nor does it function when using the pattern tester.

The cause is due to no checking being performed on the type and all requests defaulting to PATTERN_TYPE_WILDCARD, and thus only checking the host of a given URI and not the full URI.

I have added code that will check whether the pattern type is either a wildcard or regex, and return the appropriate URI string.

I am currently using it in production on some nix machines on Firefox 97, and it is functioning as intended.

@ericjung
Copy link
Collaborator

Hi,

Thanks for the PR. The full URL is not available to add-ons from the Firefox WebExtensions API:

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/proxy/onRequest
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/proxy/RequestDetails

Only the host, protocol, port, etc are available -- no path info, no query params, etc. This is why the FoxyProxy patttern matcher works the way it does. Also, when you define a pattern in FoxyProxy, you explicitly choose a protocol separately from the domain -- the protocol is not used as part of the regexp pattern matching, as i recall, for that reason.

If Firefox has changed the behavior of this API, then I'd be glad to review your PR further. Please let me know.

Thanks again for your work!

@Faris-07
Copy link
Author

Hi Eric,

The proxy.RequestDetails object has implemented access to the full url (proxy.RequestDetails.url) since Firefox 60.

Reference: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/proxy/RequestDetails

@ericjung
Copy link
Collaborator

ericjung commented Feb 22, 2022

Hi Eric,

The proxy.RequestDetails object has implemented access to the full url (proxy.RequestDetails.url) since Firefox 60.

Reference: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/proxy/RequestDetails

proxy.RequestDetails.url has been there since the beginning of the proxy WebExtensions API. Have you actually inspected the value at runtime by logging it to the console, using a breakpoint, or a similar method? Unless something has changed, many of the properties of the URL object are not populated. This was done on purpose so that add-ons could not exfiltrate private info.

@Faris-07
Copy link
Author

Faris-07 commented Feb 22, 2022

Yes, see the step debug below; the functionality is there, and the regex is now applied to the entire uri, and works as expected.

stepdebug

@erosman
Copy link
Collaborator

erosman commented Feb 22, 2022

only checking the host of a given URI and not the full URI.

I am not sure if I understand the purpose of the request.
Is it saying that the proxy should apply to the entire URL?
What would be the purpose of it?

@Faris-07
Copy link
Author

Faris-07 commented Feb 22, 2022

only checking the host of a given URI and not the full URI.

I am not sure if I understand the purpose of the request. Is it saying that the proxy should apply to the entire URL? What would be the purpose of it?

This introduces an additional feature which allows you to regex the authority, or the path, or the query, or the entire url VS just the authority with the current release. In fact, when I installed the addon, I assumed regex meant against the entire URI, and spent over an hour trying to understand why the regular expressions were not working.

There are many instances whereby a resource is accessible via a particular path, and you only wish to enable the proxy for that path and not the entire domain. For example, take bbc.co.uk/iplayer - I could set a wildcard for bbc.co.uk but now all traffic for that domain anywhere on the net is routed through the proxy whereas I only want bbc\.co\.uk\/iplayer([\/]?.*) to be routed via proxy. It could be due to a bandwidth cap, or maybe it's a shared proxy and you want to limit the traffic due to security/reliability reasons, or maybe you need a different geolocation to access that particular resource.

Now what if I go to foo.com, and it is also getting data from bbc.co.uk, that will also be routed via the proxy, but there is no need for that in this particular scenario.

If I want to route an entire domain, then the wildcard type makes sense. Regex is for more granular control, and should not be limited to just the authority.

@erosman
Copy link
Collaborator

erosman commented Feb 22, 2022

The concept of proxy is a node between the connection from client (Firefox user) to the target server (target domain).

Once that connection is established, the internal pages are irreverent.

📌 In other words, the connection is always to the domain server and it is that server that gets the internal pages and sends them to the client.

Furthermore, most pages have elements from other paths within the domain e.g. images, scripts, etc. It would be meaningless to proxy a path that gets its elements from other paths.

Additionally, proxy related API such as webRequest.onAuthRequired etc are host (and port) based.

Firefox also caches the details and sends the subsequent requests to the same host via the same connection and using the same authentication.

@Faris-07
Copy link
Author

Faris-07 commented Feb 22, 2022

The concept of proxy is a node between the connection from client (Firefox user) to the target server (target domain).

Once that connection is established, the internal pages are irreverent.

📌 In other words, the connection is always to the domain server and it is that server that gets the internal pages and sends them to the client.

Not really sure what you're alluding to with this truism. From what I understand, Foxyproxy does re-routing per URI, not per instance. If you want instance re-routing, use Firefox's internal proxy feature.

Furthermore, most pages have elements from other paths within the domain e.g. images, scripts, etc. It would be meaningless to proxy a path that gets its elements from other paths.

Additionally, proxy related API such as webRequest.onAuthRequired etc are host (and port) based.

Your points are also applicable to wildcard domains too.
Say I whitelist *.foo.com, it may be pulling resources from static-*.cdn1.com and assets-*.cdn0.com so the points you mention are also applicable to your proposition.

This is another feature I am considering BTW, rather than redirecting on a per URI basis, proxy re-routing is done per tab.
if tab.id == 123 && tab.url == (REGEX | WILDCARD) then proxy(url)
This will effectively re-route every URI in a given tab rather than specific domains as is currently the case.

@ericjung
Copy link
Collaborator

FoxyProxy worked in this manner before the Firefox WebExtensions API (versions 4.6.5 and earlier). The behavior only changed because the WebExtensions Proxy API did not expose the complete URL. If it is now exposed, I think this is great news. The current behavior of switching proxy by domain is inherently supported, @erosman, so I don't see the harm in supporting switching other URL components.

However, I worry that existing users may be affected in ways they do not anticipate. @Faris-07, are there existing patterns that might yield different results when applied to a complete URL rather than just the domain?

Additionally, proxy related API such as webRequest.onAuthRequired etc are host (and port) based.

This is indeed concerning since the pattern matching would not be consistent between auth and proxy selection. I am not sure how to resolve this.

@Faris-07
Copy link
Author

However, I worry that existing users may be affected in ways they do not anticipate. @Faris-07, are there existing patterns that might yield different results when applied to a complete URL rather than just the domain?
I was thinking about this a few days ago.

It should work as expected as long as no boundaries are being set. For example, (.*)\.(foo|moo)\.com would still match www.foo.com/123 with the current release, however if the end user set (.*)\.(foo|moo)\.com$ then such a match would fail. I do wonder how many people are using the regex though. There is nothing to indicate the regex was being applied to the host only, I would suspect quite a few people might think the feature is broken, hence why I submitted the "bug fix".

Additionally, proxy related API such as webRequest.onAuthRequired etc are host (and port) based.

This is indeed concerning since the pattern matching would not be consistent between auth and proxy selection. I am not sure how to resolve this.

Can you give me a scenario where this is a problem ? and how it correlates to the fix I have submitted ? I am just wondering whether this is specific to the regex, a design issue, or a limitation of the API.

@ericjung
Copy link
Collaborator

By the way, here is the bug where this was changed.

when I installed the addon, I assumed regex meant against the entire URI, and spent over an hour trying to understand why the regular expressions were not working.

It is mentioned in the Pattern Help page within the add-on, but it is buried and not intuitive. Read the pink callout in this screenshot:

image

however if the end user set (.*).(foo|moo).com$ then such a match would fail

That is bad. So this is a breaking change and needs to alert users who upgrade. We need to increase the version to 8.x. There is logic in the addon to display the release notes page after an upgrade. It even distinguishes between upgrades vs new installs.

We need to use that same logic to alert users about this breaking change with figurative flashing red lights and dance music. Ideally, it works like this:

  1. At startup, check if the addon was updated from 7.x to 8.x or higher.
  2. If 1 is true, then check if any patterns are regular expressions. If they are not any regexps, then we display the usual release notes page. If any of them are regular expressions, then we display a new page with the flashing lights and dance music... urging users to review their pattens and giving examples of patterns that are broken after the upgrade, with a link to this git URL for support.

I do wonder how many people are using the regex though

No idea but certainly greater than zero!

@erosman
Copy link
Collaborator

erosman commented Feb 23, 2022

What is the purpose of Proxy?

The primary purpose of proxy is to hide client's real IP/location from target server. A secondary purpose is to bypass local restrictions from local ISP.
There are other uses which mostly relate to private/intranet proxies and to do with security.
Note: There is only the target server, not target page. it is the server that sends the internal pages.
Client ➜ Proxy ➜ Server (e.g. bbc.co.uk)

The remote IP is seen, and recorded if needed, by the server e.g. bbc.co.uk and all pages under the same domain server have access to it.

Servers track users via IP and often record data in form of cookies and stored data.
Above data are tied to the domain.

A cookie with IP (and/or login credentials) placed at https//bbc.co.uk/iplayer/abc.html is actually placed under bbc.co.uk and accessible by all bbc.co.uk pages.

Once a user connects to https//bbc.co.uk/news/xyz.html without a proxy, the user IP/login/etc will be recorded.
If the user connects to https//bbc.co.uk/iplayer/abc.html with a proxy, the other data are also accessible.

Users often attempt to login in with multiple usernames to sites or games. Therefore, servers often keep track of users IP/Location. Furthermore, many servers track IP/Location for security reason.

That is when a user from UK gets an email or notice that someone tried to access their server/site from Vietnam.

Many servers ban such behaviour. I am aware of a number of such sites/servers that actively ban users who attempts to fool the server with different IP/Location.
There are server that are contracted as single user (server rental packages) and they block shared use by tracking user's IP/Location.

To cut the long story short, proxies on path defeats the purpose of the proxy use as everything is tied to the main domain. The server at the main domain sees all.

That is the reason, many browser APIs are tied to the host as otherwise is meaningless. While the full URL is available in some API, that is for information.

@Faris-07
Copy link
Author

@erosman, no one is denying any of these issues, however, people should be given the freedom to do as they wish with the tool. My proxy service isn't as reliable, and has capped behaviour; I only need geolocation services so path regex works fine.

@Faris-07
Copy link
Author

Faris-07 commented Feb 23, 2022

No idea but certainly greater than zero!

Heh. We can dream, right ?

Agreed, users should be notified of breaking behaviour. I will whip something up in the coming days.

Since you are going for v8.x then maybe there are some other things we can consider. Here are some suggestions, please share your thoughts:

  1. Due to the precarious nature of how the tool works, we need to provide strong visual feedback to the user. There is no way of knowing what is and isn't being re-routed without pulling up web dev tools, or checking the logging feature. This is not good UI design. To solve this we can do the following:
    1.1 Add a percentage metric in the add-on drop down e.g. 60% (6/10 URLs) of traffic is being routed via proxy
    1.2 Add the URLs that are being routed and/or not being routed in the drop down
    1.3 Rather than having an svg being toggled in the icon, we add the user specified proxy colour to the tab button, or the address toolbar of any tab routing traffic via proxy.

  2. For each pattern we give the user the option to route the entire tab if the tab.url matches the pattern, or limit it to any matching URI.

@erosman
Copy link
Collaborator

erosman commented Oct 12, 2022

The matching process has changed in FoxyProxy v8.0 (once released).

Patterns can match the entire URL, if needed. Therefore, e.g.:

Wildcard

*://example.com/
wss://example.com/

Regular expression

https?://example\.com/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants