full-icu build: https.get with Chinese characters results in Bad Request #25634
Comments
@Christilut |
I'd imagine it's because the characters aren't being packaged properly. I'd use a TCP server, hit it with the get request and dump out the raw data to see the difference. |
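A minimal sketch of the raw-dump idea suggested above, assuming a reasonably recent Node.js. The names `describeChunk` and `createDumpServer` are hypothetical helpers made up for illustration, not part of any Node API. The server only logs the bytes it receives, so the request line can be compared across Node versions.

```javascript
var net = require('net');

// Hypothetical helper: show a received chunk as hex and as one-byte-per-char
// text, so non-ASCII bytes in the request line are easy to spot.
function describeChunk(chunk) {
  return {
    hex: chunk.toString('hex'),
    text: chunk.toString('binary')
  };
}

// Hypothetical helper: build (but do not start) a TCP server that passes
// every received chunk to `log` and then answers with an empty 200 response.
function createDumpServer(log) {
  return net.createServer(function(socket) {
    socket.on('data', function(chunk) {
      log(describeChunk(chunk));
      socket.end('HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n');
    });
  });
}
```

To try it, something like `createDumpServer(console.log).listen(8000)` could be started in one process and the failing `http.get` pointed at port 8000 from another.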
This is the sample I'm using to recreate it:
// Generated by CoffeeScript 1.9.3
(function() {
  var GET, cmd, https, languageCode, options;
  https = require('https');
  GET = function(options, callback) {
    return https.get(options, function(response) {
      var body;
      body = '';
      response.on('data', function(data) {
        return body += data;
      });
      return response.on('end', function() {
        return callback(body, response);
      });
    });
  };
  languageCode = 'zh';
  cmd = '笔记本电脑';
  options = {
    host: languageCode + '.wikipedia.org',
    path: '/w/api.php?action=opensearch&search=' + cmd
  };
  GET(options, function(body, response) {
    return console.log(response);
  });
}).call(this); |
Here is a simplified case:
var http = require('http');
var cmd = '笔记本电脑';
http.get('http://l.me:8000/' + cmd, function() { });
Output from v0.10:
Output from v0.12:
The header is all borked, and looks the exact same as if we did:
console.log(Buffer(cmd, 'binary').toString()); // �,5
So basically, the header is being parsed as
While technically the new encoding is correct, as unicode characters should be decoded into their
So this is a regression, but at the same time reverting doesn't seem to be the correct solution either. Instead, the missing functionality should be added. I'm not sure when this landed, but I'll be interested to see what the commit's reasoning for the change was. |
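To illustrate the comparison with `Buffer(cmd, 'binary')` made above, here is a small sketch. It assumes a modern Node.js where `Buffer.from` is available (the thread itself predates it). The `'binary'` encoding keeps only the low byte of each UTF-16 code unit, so the five Chinese characters collapse to five unrelated bytes, while UTF-8 needs three bytes per character.

```javascript
var cmd = '笔记本电脑';

// 'binary' (latin1) keeps one byte per UTF-16 code unit and drops the high byte.
var lossy = Buffer.from(cmd, 'binary');

// UTF-8 preserves every character; each of these takes three bytes.
var utf8 = Buffer.from(cmd, 'utf8');

console.log(lossy.length, lossy.toString('hex'));
console.log(utf8.length, utf8.toString('hex'));

// Decoding the UTF-8 bytes gives the original string back; the lossy bytes cannot.
console.log(utf8.toString('utf8') === cmd);
console.log(lossy.toString('binary') === cmd);
```

The first character, 笔 (U+7B14), shows the loss directly: only the byte `0x14` survives in the `'binary'` buffer.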
Any suggestions to work around this so I can keep using the latest version? |
@trevnorris thanks. On Mac with v0.12.2 it seems to "work". May be env dependent. So wouldn't `
… be a workaround? @Christilut So, it's not related to your |
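For readers looking for a workaround: one common approach, not necessarily the one proposed in the comment above, is to percent-encode the non-ASCII part of the path before handing it to `https.get`, so only ASCII bytes reach the request line. The sketch below assumes the search term has not been escaped yet; `buildOptions` is a hypothetical helper name.

```javascript
// Hypothetical helper: build request options with the search term
// percent-encoded as UTF-8, so the path contains only ASCII characters.
function buildOptions(languageCode, term) {
  return {
    host: languageCode + '.wikipedia.org',
    path: '/w/api.php?action=opensearch&search=' + encodeURIComponent(term)
  };
}

var options = buildOptions('zh', '笔记本电脑');
console.log(options.path);
```

The resulting `options` object can be passed to `https.get` exactly like the one in the original sample.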
Heh. Totally forgot that I'm going to push that this is done by default in io.js if multi-char strings are detected. |
@trevnorris beware the "May be env dependent." |
@srl295 Don't follow? |
@trevnorris I meant, I couldn't reproduce this on Mac, perhaps it is environment dependent, picking up the "default encoding" of the platform. |
I tried something similar in v0.11 but I had to back it out again: 38149bb ("http: escape unsafe characters in request path"). It was less discriminatory but the basic issue remains the same, I think: you don't know if (part of) the path has been escaped already. |
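The "already escaped" problem can be sketched in a few lines. This is only an illustration under the assumption that escaping is done with `encodeURI`; `escapeIfNeeded` is a hypothetical helper, not what the reverted commit or io.js actually does.

```javascript
var raw = '/wiki/笔';

// Escaping once gives a valid ASCII path.
var once = encodeURI(raw);

// Escaping blindly a second time also escapes the '%' signs,
// so the server would see a different resource.
var twice = encodeURI(once);

// Hypothetical helper: escape only when non-ASCII characters are present.
// It leaves already-escaped paths alone, but it cannot tell whether a
// literal '%' in an ASCII path was meant as data or as an escape.
function escapeIfNeeded(path) {
  return /[^\x00-\x7F]/.test(path) ? encodeURI(path) : path;
}

console.log(once);
console.log(twice);
console.log(escapeIfNeeded(once) === once);
```

Detecting non-ASCII characters avoids the double escaping shown here, but the ambiguity for pure-ASCII paths containing `%` remains, which is the risk described above.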
@bnoordhuis @trevnorris does this need to stay open in the 'archive' repo? |
I doubt anyone is going to fix it in v0.12, too much risk of regressions. I'll close it. |
I've filed a very similar issue with the current |
I built node 0.12.6 from source with the following options: --with-intl=full-icu --download=all
When doing a https.get with the following URL:
it results in a Bad Request and an empty body.
However, when I did the exact same thing with an older node version (0.10.25), it worked. I understand it doesn't work without the full-icu build, but with full-icu it should work, right?
Can someone shed some light on this?