Encoding conversion #5
From milo...@gmail.com on December 02, 2011 20:43:44
Reader/Writer can now perform transcoding with Transcoder.
Status: Fixed
From milo...@gmail.com on November 27, 2011 00:33:27
Currently, the input and output of Reader use the same encoding.
It is often necessary to read a stream in one encoding (e.g. UTF-8) and output strings in another (e.g. UTF-16), or, in the other direction, to stringify a DOM from one encoding (e.g. UTF-16) to an output stream in another (e.g. UTF-8).
The simplest solution is to convert the whole stream into a memory buffer in the other encoding. This requires more memory and more memory accesses.
Another solution is to convert the input stream into the other encoding before passing it to the parser. However, only the characters inside JSON strings actually need to be converted. Converting the other characters just wastes time.
The third solution is to let the parser distinguish between the input and output encodings. It uses an encoding converter to convert only the characters of JSON strings. However, since the converted output may be longer than the original input, in situ parsing cannot be used in this case.
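To make the third solution concrete, here is a minimal sketch of the per-string conversion step, assuming well-formed UTF-8 input and UTF-16 output. The names `DecodeUtf8`, `EncodeUtf16` and `TranscodeUtf8ToUtf16` are hypothetical and only for illustration; they are not the library's API, and the sketch does no validation of malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode one code point from well-formed UTF-8,
// starting at s[i] and advancing i past the sequence. No validation.
inline std::uint32_t DecodeUtf8(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;                        // 1-byte (ASCII) sequence
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    std::uint32_t cp = c & (0x3F >> extra);        // payload bits of the lead byte
    for (int k = 0; k < extra; ++k) {
        assert(i < s.size());
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Hypothetical helper: append one code point to a UTF-16 string,
// using a surrogate pair for code points above U+FFFF.
inline void EncodeUtf16(std::uint32_t cp, std::u16string& out) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Convert the content of one JSON string from UTF-8 to UTF-16.
// The result goes into a separate buffer rather than over the input.
inline std::u16string TranscodeUtf8ToUtf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();)
        EncodeUtf16(DecodeUtf8(in, i), out);
    return out;
}
```

Note that the ASCII string "A" takes 1 byte in UTF-8 but 2 bytes in UTF-16, so the converted string cannot in general be written back over the source buffer. That is the reason in situ parsing is ruled out here.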
The goal is to design a general mechanism for encoding conversion. It should support UTF-8, UTF-16LE, UTF-16BE, UTF-32LE and UTF-32BE. It could also support automatic encoding detection via the BOM, at the cost of some overhead from dynamic dispatch.
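BOM-based detection for the five encodings listed above could look roughly like the sketch below. The `Encoding` enum and `DetectBom` function are hypothetical names, not part of the library. The sketch only inspects the leading bytes and returns `Unknown` when no BOM is present.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical enum covering the encodings listed in this issue.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of a buffer and report the encoding implied by
// its byte order mark, or Encoding::Unknown if there is no BOM.
// The UTF-32 checks come first because the UTF-32LE BOM (FF FE 00 00)
// begins with the UTF-16LE BOM (FF FE).
inline Encoding DetectBom(const unsigned char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    if (n >= 4) {
        if (p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
            return Encoding::Utf32LE;
        if (p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
            return Encoding::Utf32BE;
    }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Encoding::Utf8;
    if (n >= 2) {
        if (p[0] == 0xFF && p[1] == 0xFE) return Encoding::Utf16LE;
        if (p[0] == 0xFE && p[1] == 0xFF) return Encoding::Utf16BE;
    }
    return Encoding::Unknown;
}
```

Since the result is only known at run time, a reader built on this would have to choose its decoder through a switch or a virtual call. That is the dynamic dispatch overhead mentioned above.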
Original issue: http://code.google.com/p/rapidjson/issues/detail?id=4