fix: decode special characters in proxy username and password #2696

Merged: @B4nan merged 1 commit into master from decode-proxy-passwords on Oct 4, 2024

@B4nan (Member) commented Oct 4, 2024

When using the `newProxyInfo` function, the `username` and `password` extracted from `proxyUrl` are now properly decoded.

https://apify.slack.com/archives/C0L33UM7Z/p1727966399183259
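To illustrate the change described above: the WHATWG `URL` class returns `username` and `password` in their percent-encoded form, so a value like `p@ss:word` comes back as `p%40ss%3Aword` unless it is decoded explicitly. The following is a minimal sketch under that assumption; `parseProxyCredentials` is a hypothetical helper and not Crawlee's actual `newProxyInfo` implementation.

```typescript
// Hypothetical helper (not Crawlee's real code): extracts proxy credentials
// from a proxy URL and decodes them, which is what this PR does for newProxyInfo.
function parseProxyCredentials(proxyUrl: string): { username: string; password: string } {
    const url = new URL(proxyUrl);
    // URL#username and URL#password keep the percent-encoded form from the URL,
    // so they must be decoded to recover the original characters.
    return {
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password),
    };
}

const proxyUrl = 'http://user:p%40ss%3Aword@proxy.example.com:8000';

console.log(new URL(proxyUrl).password); // p%40ss%3Aword (still encoded)
console.log(parseProxyCredentials(proxyUrl).password); // p@ss:word
```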

@B4nan added the adhoc label (Ad-hoc unplanned task added during the sprint.) on Oct 4, 2024
@github-actions bot added this to the 99th sprint - Tooling team milestone on Oct 4, 2024
@github-actions bot added the t-tooling (Issues with this label are in the ownership of the tooling team.) and tested (Temporary label used only programmatically for some analytics.) labels on Oct 4, 2024
@B4nan requested a review from @barjin on October 4, 2024 10:17

@barjin (Contributor) left a comment

The changes seem safe enough to me, so feel free to merge.

My question is: do we know where exactly the bug (from the Slack thread) originates? I cannot find a single use of `proxyInfo.username` or `.password` in the Crawlee codebase (we always just parse them from the URL with `decodeURIComponent`).

@barjin (Contributor) commented Oct 4, 2024

Btw, if somebody expects the username and password to be percent-encoded in their user code (e.g. reading them from the context in a router), this could cause issues for them: `decodeURIComponent` is not idempotent. For example, `%25zzz` decodes to `%zzz`, which is not a valid percent-encoded string, so another `decodeURIComponent` call will throw.
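A short sketch of the non-idempotence mentioned above, using only the built-in `decodeURIComponent`: decoding `%25zzz` once yields `%zzz`, and decoding that result again throws a `URIError` because `%zz` is not a valid escape sequence. This mimics user code that decodes credentials that have already been decoded.

```typescript
// First decode: '%25' becomes a literal '%', leaving '%zzz'.
const once = decodeURIComponent('%25zzz');

// Second decode: '%zz' is not a valid percent-escape, so this throws URIError.
let secondDecodeError: unknown = null;
try {
    decodeURIComponent(once);
} catch (err) {
    secondDecodeError = err;
}

console.log(once); // %zzz
console.log(secondDecodeError instanceof URIError); // true
```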

@B4nan (Member, Author) commented Oct 4, 2024

> My question is: do we know the actual mechanism of where the bug (from the Slack thread) is originating? I cannot find a single use of proxyInfo.username or .password in Crawlee codebase (we always just parse it from the URL with decodeURIComponent).

I guess it's some custom handling, but I'm not entirely sure either. My guess is they work with the return value of `newProxyInfo` directly. Also, I just noticed we have the very same problem in the SDK:

https://github.com/apify/apify-sdk-js/blob/master/packages/apify/src/proxy_configuration.ts#L267

> Btw. if somebody is expecting the username and password to be percent-encoded in their user code (e.g. reading it from context in a router), this could cause issues for them (decodeURIComponent is not idempotent - e.g. %25zzz decodes to %zzz, which is not a valid percent-encoded string, i.e. another decodeURIComponent call will throw).

Yes, if someone was too lazy to report a bug and instead implemented a workaround without telling us, their code will break. Not much we can do about that; to me, this is a clear bug.

@B4nan (Member, Author) commented Oct 4, 2024

> Btw. if somebody is expecting the username and password to be percent-encoded in their user code (e.g. reading it from context in a router), this could cause issues for them (decodeURIComponent is not idempotent - e.g. %25zzz decodes to %zzz, which is not a valid percent-encoded string, i.e. another decodeURIComponent call will throw).

Also note that the current behavior is inconsistent in the SDK: if you use the proxy configuration's `password` option, there is no encoding, but if you use a proxy URL containing the password, it will be encoded. This only confirms that we really should fix it.

I asked for more details about the actual problem, but I'd say we should do this regardless.
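One way to make the two paths consistent, sketched under assumptions: percent-encode the credentials when composing a proxy URL and decode them when reading it back, so a raw `password` option and a password embedded in a URL end up as the same value. `composeProxyUrl` and `readProxyPassword` are hypothetical helpers for illustration only, not the SDK's actual `composeDefaultUrl` or parsing code.

```typescript
// Hypothetical: builds a proxy URL from raw credentials. Encoding prevents
// characters such as '@', ':', '/' or '%' from breaking the URL structure.
function composeProxyUrl(hostname: string, port: number, username: string, password: string): string {
    return `http://${encodeURIComponent(username)}:${encodeURIComponent(password)}@${hostname}:${port}`;
}

// Hypothetical: reads the password back; URL#password is percent-encoded,
// so it is decoded to recover the raw value.
function readProxyPassword(proxyUrl: string): string {
    return decodeURIComponent(new URL(proxyUrl).password);
}

const rawPassword = 'p@ss:w/rd%';
const proxyUrl = composeProxyUrl('proxy.example.com', 8000, 'user', rawPassword);

console.log(proxyUrl); // http://user:p%40ss%3Aw%2Frd%25@proxy.example.com:8000
console.log(readProxyPassword(proxyUrl) === rawPassword); // true
```

With this round trip, code that receives the password never has to guess whether it is still encoded.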

@B4nan merged commit 0f0fcc5 into master on Oct 4, 2024 (11 checks passed)
@B4nan deleted the decode-proxy-passwords branch on October 4, 2024 12:23
B4nan added a commit to apify/apify-sdk-js that referenced this pull request Oct 4, 2024
B4nan added a commit to apify/apify-sdk-js that referenced this pull request Oct 7, 2024
* fix: decode special characters in proxy `username` and `password`

Related: apify/crawlee#2696

* encode username and hostname in `composeDefaultUrl`