Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple HTTP proxying fails #92

Closed
t-beckmann opened this issue Nov 28, 2014 · 8 comments
Closed

Simple HTTP proxying fails #92

t-beckmann opened this issue Nov 28, 2014 · 8 comments
Milestone

Comments

@t-beckmann
Copy link
Contributor

The script below requires express.js and request.js simply installed using npm. When running it on node.js and browsing to http://localhost:8090/request you'll get the landing page of google, as expected. However, when running in Trireme, the response data is truncated half-way of data. Trouble is, the reading side of the client socket is closed although not all data was read...

var request = require('request');
var express = require('express');
var app = express();

app.get('/request', function(req, res) {
    request('http://www.google.de', function(error, response, body) {
       if(error) {
           console.log(error);
           res.send(response.statusCode, error);
           throw error;
       }
       console.log(body);
       res.send(response.statusCode, body);
    });
});
app.listen(8090);

console.log(process.versions);
@t-beckmann
Copy link
Contributor Author

I debugged this, HTTPParser seems to call onMessageComplete() too early.

@t-beckmann
Copy link
Contributor Author

The problem originates at request.js, line #1297. There a buffer object is converted to a string using buffer.toString(encoding) with encoding=undefined. The buffer contains German special characters (ü), the resulting string is truncated at the first such character.

@gbrail
Copy link
Contributor

gbrail commented Nov 29, 2014

Thanks for tracking this down!

The latest request module (in master) looks a little different. Have they fixed this in the meantime?

https://github.com/request/request/blob/master/request.js

Nonetheless, it looks to me like this is a case where Trireme is handling the invalid string encoding differently than regular Node. That means it's something that we should fix. Do you think you know enough to put together a test case? Java has many options for string encoding and decoding that effect how invalid characters are handled and it may well be the case that we need to choose another one.

@gbrail gbrail self-assigned this Dec 1, 2014
@gbrail gbrail added this to the 0.8.4 milestone Dec 2, 2014
@t-beckmann
Copy link
Contributor Author

Yes, I tried this code and it fails:

 if (self.encoding === null) {
            // response.body = buffer
            // can't move to this until https://github.com/rvagg/bl/issues/13
            response.body = buffer.slice()
          } else {
            response.body = buffer.toString(self.encoding)
          }

Here, self.encoding is undefined, so that the code path calls buffer.toString(undefined) which will apply UTF-8 decoding to the buffer contents. The buffer contents however use a German charset and can not be interpreted as UTF-8.

My guess is, node.js represents strings as UTF-8 internally, so that conversion from buffer to string back to buffer again is a plain memcpy. In my understanding, response.js is broken in that it does not respect the charset attribute of the content-type header sent by default. A workaround is to specify binary encoding in the request options explicitly.

@t-beckmann
Copy link
Contributor Author

The following code will print okay on node and 256 != 128 on trireme:

var buffer1 = new Buffer(0x100);
for(i = 0; i < 0x100; ++i) {
    buffer1[i] = i;
}
var length1 = buffer1.length;

var buffer2 = new Buffer(buffer1.toString(), 'binary');
var length2 = buffer2.length;

if(length1 != length2) {
    console.log('', length1, '!=', length2);
} else {
    console.log('okay');
}

My guess was wrong, but node and Trireme treat illegal byte sequences differently. As you pointed out, the replacement mechnism can be used here.

This will not fix request.js however, even on node the code mentioned in this issue does NOT provide the original data as sent by the remote server. It just replaces the special characters. So, I'm not sure if this is worth fixing as not specifying a correct encoding is an issue of request.js.

@gbrail
Copy link
Contributor

gbrail commented Dec 3, 2014

Yep, that makes sense.

The short-term fix is going to be to change the "Utils.bufferToString" class to tell the CharsetEncoder to "replace" unrecognized characters rather than to "report" them. That brings Trireme to work more like regular node in that it will replace those characters.

(Interestingly, in this case, Java will replace the unrecognized characters with "?" while node uses a different ASCII code.)

Longer-term I think that we want to do more because Node does a lot of specific things with Base64 encoding that we don't do quite the same way and I'd like to fix that.

Now, on your point about request, you're absolutely right -- it looks like google.de returns content encoded in iso-8859-1, which is not the same as UTF-8, and in fact it is not a character set that's built in to Node -- even if you passed that character set name to request, I still think it'd fail. Technically request should be using a module like "iconv" or "iconv-lite" to convert the characters, since those modules (which also work in Trireme) support a much larger universe of character sets.

@gbrail
Copy link
Contributor

gbrail commented Dec 4, 2014

This turned out to be complicated because the default UTF-8 converter in Java was treating the umlaut character as an impartial UTF-8 character rather than as an invalid character, and it stopped the decoding.

The Node.js "string_decoder" module handles this correctly but Trireme was using its own implementation. I changed that back, and also changed the string decoding and encoding code to replace invalid characters rather than stop, and now things work a lot more like regular Node.

I'd like to push out a release in the next day or two that contains the latest few fixes, as they're all pretty important bugs to get fixed out there. Can you try what is in master and see if it is any better? It worked on "http://www.google.de/" for me.

@gbrail gbrail closed this as completed Dec 4, 2014
@t-beckmann
Copy link
Contributor Author

Tested it, the output is just like node.js, sweet!

By the way, on Windows when running the tests via Maven I get this, which is probably because the test not being maintained on this platform:

Tests in error:
  testNpm(io.apigee.trireme.core.test.CertParserTest): Error: Mismatch in valid_
to: Mrz 14 19:06:17 2015 GMT !== Mar 14 19:06:17 2015 GMT (comparecerts.js#12)

So, I build using mvn -Dmaven.test.skip=true package, copied the npm-package output to my node_modules folder and ran it. It'll probably take a few days to get my server code base uptodate, but I don't expect to run into problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants