Simple HTTP proxying fails #92
The problem originates in request.js at line #1297, where a buffer object is converted to a string using buffer.toString(encoding) with encoding=undefined. The buffer contains German special characters (ü), and the resulting string is truncated at the first such character. |
Thanks for tracking this down! The latest request module (in master) looks a little different. Have they fixed this in the meantime? https://github.com/request/request/blob/master/request.js Nonetheless, it looks to me like this is a case where Trireme is handling the invalid string encoding differently than regular Node. That means it's something that we should fix. Do you think you know enough to put together a test case? Java has many options for string encoding and decoding that affect how invalid characters are handled, and it may well be the case that we need to choose another one. |
Yes, I tried this code and it fails:
Here, self.encoding is undefined, so the code path calls buffer.toString(undefined), which applies UTF-8 decoding to the buffer contents. The buffer contents, however, use a German charset and cannot be interpreted as UTF-8. My guess is that node.js represents strings as UTF-8 internally, so that conversion from buffer to string and back to buffer again is a plain memcpy. In my understanding, response.js is broken in that it does not respect the charset attribute of the content-type header sent by default. A workaround is to specify binary encoding in the request options explicitly. |
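The decoding step can be tried in isolation, without request.js. This is only a minimal sketch, assuming a recent Node.js where `Buffer.from` exists; the three sample bytes are made up for illustration and spell "für" in ISO-8859-1:

```javascript
// Hypothetical sample data: "für" encoded as ISO-8859-1 (0xFC is "ü").
const original = Buffer.from([0x66, 0xfc, 0x72]);

// toString(undefined) falls back to UTF-8. 0xFC is not valid UTF-8,
// so Node substitutes U+FFFD and the original byte value is lost.
const asUtf8 = original.toString(undefined);
const utf8RoundTrip = Buffer.from(asUtf8, 'utf8');

// 'binary' maps every byte to exactly one code unit, so converting
// back with the same encoding restores the original bytes.
const asBinary = original.toString('binary');
const binaryRoundTrip = Buffer.from(asBinary, 'binary');

console.log('utf8 round trip intact:', utf8RoundTrip.equals(original));
console.log('binary round trip intact:', binaryRoundTrip.equals(original));
```

Only the 'binary' round trip preserves the bytes, which is why specifying that encoding works as a workaround.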
The following code will print okay on node and 256 != 128 on trireme:
My guess was wrong, but node and Trireme do treat illegal byte sequences differently. As you pointed out, the replacement mechanism can be used here. This will not fix request.js, however: even on node, the code mentioned in this issue does NOT provide the original data as sent by the remote server. It just replaces the special characters. So I'm not sure this is worth fixing, as not specifying a correct encoding is an issue in request.js. |
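One way to observe the difference between a replacing decoder and one that stops at the first illegal byte is sketched below. It is an illustrative check written for this purpose, not the snippet posted in the thread, and it assumes a recent Node.js:

```javascript
// Build a buffer holding every byte value once: 0x00 through 0xFF.
const allBytes = Buffer.alloc(256);
for (let i = 0; i < allBytes.length; i++) {
  allBytes[i] = i;
}

// Decode as UTF-8. In this particular input, none of the bytes
// 0x80..0xFF is part of a valid UTF-8 sequence.
const decoded = allBytes.toString('utf8');

// Count how many U+FFFD replacement characters the decoder produced.
const replaced = decoded.split('\uFFFD').length - 1;

// A decoder that replaces each illegal byte keeps the length at 256;
// a decoder that stops at the first illegal byte returns 128 characters.
if (decoded.length === allBytes.length) {
  console.log('okay');
} else {
  console.log(allBytes.length + ' != ' + decoded.length);
}
console.log('replacement characters:', replaced);
```

The replacement count also shows why the data cannot be recovered afterwards: all illegal bytes collapse into the same character.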
Yep, that makes sense. The short-term fix is going to be to change the "Utils.bufferToString" method to tell the CharsetDecoder to "replace" unrecognized characters rather than to "report" them. That brings Trireme closer to regular node in that it will replace those characters. (Interestingly, in this case, Java will replace the unrecognized characters with "?" while node uses a different replacement character, U+FFFD.) Longer-term I think that we want to do more because Node does a lot of specific things with Base64 encoding that we don't do quite the same way and I'd like to fix that. Now, on your point about request, you're absolutely right -- it looks like google.de returns content encoded in iso-8859-1, which is not the same as UTF-8, and in fact it is not a character set that's built in to Node -- even if you passed that character set name to request, I still think it'd fail. Technically request should be using a module like "iconv" or "iconv-lite" to convert the characters, since those modules (which also work in Trireme) support a much larger universe of character sets. |
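The "replace" versus "report" distinction has a rough JavaScript analogue in the `fatal` option of the WHATWG `TextDecoder`. The sketch below is an analogy only, not Trireme's Java code; it assumes a Node.js build with ICU support (the default for official builds) and uses made-up sample bytes:

```javascript
// Hypothetical sample: "für" in ISO-8859-1, which is not valid UTF-8.
const bytes = new Uint8Array([0x66, 0xfc, 0x72]);

// Default mode behaves like "replace": the illegal byte becomes U+FFFD
// and decoding continues with the bytes that follow.
const replacing = new TextDecoder('utf-8');
const replacedText = replacing.decode(bytes);

// fatal: true behaves like "report": decoding aborts with a TypeError
// as soon as an illegal byte sequence is found.
const reporting = new TextDecoder('utf-8', { fatal: true });
let reported = false;
try {
  reporting.decode(bytes);
} catch (err) {
  reported = err instanceof TypeError;
}

console.log(JSON.stringify(replacedText), reported);
```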
This turned out to be complicated because the default UTF-8 converter in Java was treating the umlaut character as an incomplete UTF-8 character rather than as an invalid character, and it stopped the decoding. The Node.js "string_decoder" module handles this correctly, but Trireme was using its own implementation. I changed that back, and also changed the string decoding and encoding code to replace invalid characters rather than stop, and now things work a lot more like regular Node. I'd like to push out a release in the next day or two that contains the latest few fixes, as they're all pretty important bugs to get fixed out there. Can you try what is in master and see if it is any better? It worked on "http://www.google.de/" for me. |
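The difference between an incomplete character and an invalid one can be seen with Node's own "string_decoder" module. A minimal sketch, assuming a recent Node.js; the chunk boundaries and sample bytes are invented for illustration:

```javascript
const { StringDecoder } = require('string_decoder');

// Incomplete: "ü" in UTF-8 is the two bytes 0xC3 0xBC. When a chunk ends
// after 0xC3, the decoder buffers that byte and waits for the next chunk.
const streaming = new StringDecoder('utf8');
const first = streaming.write(Buffer.from([0x66, 0xc3]));
const second = streaming.write(Buffer.from([0xbc, 0x72]));

// Invalid: an ISO-8859-1 "ü" is the single byte 0xFC, which cannot start
// a UTF-8 sequence. Waiting for more bytes would never complete it, so
// the decoder replaces it and carries on with the rest of the chunk.
const oneShot = new StringDecoder('utf8');
const invalid = oneShot.write(Buffer.from([0x66, 0xfc, 0x72]));

console.log(JSON.stringify([first, second, invalid]));
```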
Tested it, the output is just like node.js, sweet! By the way, on Windows when running the tests via Maven I get this, probably because the tests are not maintained on this platform:
So I built using mvn -Dmaven.test.skip=true package, copied the npm-package output to my node_modules folder, and ran it. It'll probably take a few days to get my server code base up to date, but I don't expect to run into problems. |
The script below requires express.js and request.js, both installed using npm. When running it on node.js and browsing to http://localhost:8090/request you'll get the landing page of google, as expected. However, when running in Trireme, the response data is truncated partway through. Trouble is, the reading side of the client socket is closed although not all data was read...