Incorrect header encoding conversion #2011
Comments
Can you give me the code for this, rather than screenshots, so that I can review it? |
Is this it? |
OK, I've moved the re-encoding fix-up so that it applies only to response headers. That's in place to fix #706, where the header was encoded as ISO-8859-1 but held UTF-8 bytes instead. Browsers seem to do this fix-up too, so the solution seems necessary. We do need better tests for this: I wasn't able to get Jetty to emit the header incorrectly, so I can't directly add a test case.
For request headers, the value set by the user is now retained directly. When making the request, Java will encode the header as UTF-8. Servers will probably expect ISO-8859-1, so this may or may not work. Per spec, the content should either be limited to ISO-8859-1 or encoded with RFC 2047. We don't attempt to do that automatically (and some servers will be OK). A bit of a grey area here. Happy for other suggestions. |
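For readers following along, the kind of fix-up described above can be sketched as follows. This is only an illustration of the idea, not jsoup's actual code: the class name `HeaderFixupSketch` and the method `redecode` are hypothetical, and the sketch assumes the header value was read as ISO-8859-1 while the bytes on the wire were really UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not jsoup's implementation.
class HeaderFixupSketch {

    // Assumes the value was decoded as ISO-8859-1 but the wire bytes were UTF-8.
    // ISO-8859-1 maps every byte to exactly one char, so getBytes() recovers
    // the original bytes, which can then be decoded again as UTF-8.
    static String redecode(String headerValue) {
        byte[] raw = headerValue.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two characters from the sample request body (%E6%88%91%E7%9A%84).
        String original = "\u6211\u7684";

        // Simulate a client that read the UTF-8 bytes as ISO-8859-1.
        String misread = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(misread.equals(original));           // false
        System.out.println(redecode(misread).equals(original)); // true
    }
}
```

The round trip only works because ISO-8859-1 is a lossless byte-to-char mapping. If the value was first decoded with some other charset, the original bytes may already be gone.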
Can you give me a sample URL or code so that I can actually review the server's response properly? |
https://www.zhenshezw.com/ |
Hi, if you can't reproduce the issue could you add a configuration option to skip the fixHeaderEncoding? |
I get caught in bot detections when I try this. Can you provide sample code so that I can try and repro? I won't add a configuration option unless I can validate it. You could always fork the code yourself, of course. |
package org.example;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        // Copy these header values from the browser devtools.
        headers.put("User-Agent", "");
        headers.put("Cookie", "");
        try {
            // POST without following redirects, so the raw Location header can be inspected.
            Connection.Response response = Jsoup.connect("https://www.zhenshezw.com/gut.php")
                    .followRedirects(false)
                    .requestBody("search=%E6%88%91%E7%9A%84")
                    .headers(headers)
                    .method(Connection.Method.POST)
                    .execute();
            System.out.println(response.header("Location"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
} |
I found this issue to be platform-related: on Java it works fine, but on Android it has issues. |
Thanks, that's good sleuthing! Need to think of a good way to detect and handle this situation... |
This encoding conversion is wrong: you cannot restore the original binary content from a string without knowing its encoding.
Such conversion leads to loss of some characters.
Related references: https://stackoverflow.com/a/39308860
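A minimal sketch of why such a conversion can lose characters. The class name `LossyDecodeSketch` and the helper `roundTrips` are hypothetical and used only for illustration. The sketch decodes a byte that is valid ISO-8859-1 but malformed UTF-8, then tries to get the byte back:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch illustrating lossy decoding.
class LossyDecodeSketch {

    // Decodes the bytes with the given charset, re-encodes the resulting
    // string, and reports whether the original bytes came back unchanged.
    static boolean roundTrips(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return Arrays.equals(raw, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        // 0xE9 is "é" in ISO-8859-1, but on its own it is malformed UTF-8.
        byte[] raw = {(byte) 0xE9};

        // ISO-8859-1 maps each byte to one char, so nothing is lost.
        System.out.println(roundTrips(raw, StandardCharsets.ISO_8859_1)); // true

        // Decoding as UTF-8 replaces the malformed byte with U+FFFD,
        // so the original byte cannot be recovered from the string.
        System.out.println(roundTrips(raw, StandardCharsets.UTF_8)); // false
    }
}
```

Once the replacement character has been substituted, no later re-encoding step can tell which byte was originally there.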
By the way: when will the next version be released?