Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

full-icu build: https.get with Chinese characters results in Bad Request #25634

Closed
Christilut opened this issue Jul 6, 2015 · 14 comments
Closed
Assignees
Labels

Comments

@Christilut
Copy link

I built node 0.12.6 from source with the following options: --with-intl=full-icu --download=all
When doing a https.get with the following URL:

zh.wikipedia.org/w/api.php?action=opensearch&search=笔记本电脑  

it results in a Bad Request and an empty body.
However, if I did the exact same thing with an older node version (0.10.25) it did work. I understand it doesn't work without the full-icu build but with full-icu it should work, right?
Can someone shed some light on this?

@srl295 srl295 added the i18n label Jul 6, 2015
@srl295 srl295 self-assigned this Jul 6, 2015
@srl295
Copy link
Member

srl295 commented Jul 6, 2015

@Christilut full-icu shouldn't affect https.get at all… can you submit the source that caused the issue?

@trevnorris
Copy link

I'd imagine it's because the characters aren't being packaged properly. I'd use a TCP server, hit it with the get request and dump out the raw data to see the difference.

@Christilut
Copy link
Author

This is the sample I'm using to recreate it

// Generated by CoffeeScript 1.9.3
(function() {
  var GET, cmd, https, languageCode, options;

  https = require('https');

  GET = function(options, callback) {
    return https.get(options, function(response) {
      var body;
      body = '';
      response.on('data', function(data) {
        return body += data;
      });
      return response.on('end', function() {
        return callback(body, response);
      });
    });
  };

  languageCode = 'zh';

  cmd = '笔记本电脑';

  options = {
    host: languageCode + '.wikipedia.org',
    path: '/w/api.php?action=opensearch&search=' + cmd
  };

  GET(options, function(body, response) {
    return console.log(response);
  });

}).call(this);

@trevnorris
Copy link

Here is a simplified case:

var http = require('http');
var cmd = '笔记本电脑';

http.get('http://l.me:8000/' + cmd, function() { });

Output from v0.10:

GET /笔记本电脑 HTTP/1.1
Host: l.me:8000
Connection: keep-alive

Output from v0.12:

GET /�,5 HTTP/1.1
Host: l.me:8000
Connection: close

The header is all borked, and looks the exact same as if we did:

console.log(Buffer(cmd, 'binary').toString());  // �,5

So basically, the header is being parsed as 'binary' encoding instead of utf8.

While technically the new encoding is correct, as unicode characters should be decoded into their % counterparts, it is not being done. What I mean is essentially doing the same as the browser:

> encodeURI('笔记本电脑')
"%E7%AC%94%E8%AE%B0%E6%9C%AC%E7%94%B5%E8%84%91"

So this is a regression, but at the same time it seems reverting also isn't the correct solution. Instead the missing functionality should be added.

I'm not sure when this landed, but I'll be interested to see what the commit's reasoning for the change.

@Christilut
Copy link
Author

Any suggestions to work around this so I can keep using the latest version?

@srl295
Copy link
Member

srl295 commented Jul 6, 2015

@trevnorris thanks. On mac v0.12.2 it seems to ""work"". May be env dependent.

So wouldn't `

path: '/w/api.php?action=opensearch&search=' + encodeURI(cmd)

… be a workaround?

@Christilut So, it's not related to your --with-intl options.

@trevnorris
Copy link

Heh. Totally forgot that encodeURI is part of V8, not the browser. Yeah, that's the correct work around for now.

I'm going to push that this is done by default in io.js if multi-char strings are detected.

@srl295
Copy link
Member

srl295 commented Jul 7, 2015

@trevnorris beware the "May be env dependent."

@trevnorris
Copy link

@srl295 Don't follow?

@srl295
Copy link
Member

srl295 commented Jul 7, 2015

@trevnorris I meant, I couldn't reproduce this on Mac, perhaps it is environment dependent, picking up the "default encoding" of the platform.

@bnoordhuis
Copy link
Member

I'm going to push that this is done by default in io.js if multi-char strings are detected.

I tried something similar in v0.11 but I had to back it out again:

38149bb http: escape unsafe characters in request path
7124387 http: don't escape request path, reject bad chars

It was less discriminatory but the basic issue remains the same, I think: you don't know if (part of) the path has been escaped already.

@srl295
Copy link
Member

srl295 commented Sep 8, 2015

@bnoordhuis @trevnorris does this need to stay open in the 'archive' repo?

@bnoordhuis
Copy link
Member

I doubt anyone is going to fix it in v0.12, too much risk of regressions. I'll close it.

@Flimm
Copy link

Flimm commented May 30, 2017

I've filed a very similar issue with the current master branch of Node nodejs/node#13296

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants