-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wchar support #191
Wchar support #191
Conversation
version and no message in linux
Sorry, I am not sure why the LOGFW test fails in Linux. I will have a linux box for testing available tomorrow or the day after and will look into it then. It lead me to a couple of real bugs already, though ... just none that fixed the test for good. |
… Build.cmake to have them already set and to move generation of the header to after calling Build.cmake) %S vs %s windows vs unix headache taken into account in test_io wq
OK, got my linux dev box up and running and found the reason for the failure: MSVC vs. gcc incompatibility in the wprintf format specifications. Thus not a library patch, just the unit test corrected. |
@AndreasSchoenle I haven't forgotten about this one. This one is on top of my current to- review list |
@AndreasSchoenle : how would you suggest that I test this end to end? I've delayed this PR since I don't have enough knowledge, except knowing it is tricky, to get language support correctly. What would you suggest be a comprehensive test setup to verify this? |
@KjellKod : It seems to me that the problem you describe is not really internal to g3log but just a question whether the program logging uses LOG/LOGW statements with the encodings the sinks in use assume to be used: Quite obviously any encoding could be stored both in _message or _wmessage. For _message this also applies to the master branch. If you would i.e. call LOG and pass some strings which are UTF-8 encoded and others with latin-1 encoding, the sink will not be able to distinguish them and the log file cannot be displayed correctly. It thus always was and still is a requirement that the program logging and the sink have to agree about the use of encodings unless you want to store both the string and an encoding identifier in the message - which I think is too much effort for too little benefit. In the end it is the sink that has to make sure that all text written to the log file will be in the same encoding and the current implementation of the LogMessage::message() function will cause all sinks using it to write a correct UTF-8 encoded file as long as all LOG() statements pass UTF-8 and all LOGW() statements pass UCS-2 in windows and UCS-4 in Linux (see http://en.cppreference.com/w/cpp/locale/codecvt_utf8). But this merely introduces a convenient convention which may be ignored when implementing custom sinks. E.g. In my own windows environment I assume that LOG is UTF-8 and LOGW UTF-16. For file sinks all is then converted to UTF-8 and for console or GUI output to UTF-16. If I were to make a LOG call with latin-1 encoded strings, both files and GUI output would not be correct. I think it is a strength of the implementation that it does not impose an encoding. In a way it is the same for the wchar_t part as it always was for the char part: The message is encoding agnostic. As far as tests go, I believe the existing ones are more or less sufficient. Anything beyond this would not really test g3log but rather the codecvt_utf8 implementation. |
@AndreasSchoenle I agree with you completely. Can you create a example sinks which could push readable foreign characters (say Chinese or Cyrillic) to stdout as well as a log file. Or with text in the API doc describe how it could be done? I.e. the number one block of this PR right now, for me, is usage. I'm suspecting most coders are also as locale ignorant as I am (due to lack of experience) |
Although interesting. Due to lack use cases I am hesitant to merge this. |
I haven't gotten around to looking at it again for a long while and I am quite busy with other stuff at the moment. We can also just shelve it - after all our fork is more or less the same as having a separate branch for now. However, here are a couple of quick thoughts, possibly repetitive w.r.t. what was already said: Strictly speaking, the library can do without it: As I have outlined above, not matter whether wchar_t is supported in the log message itself, the logging application and the sinks have to agree on a character encoding anyhow unless you want to add all the overhead of a message carrying information about its specific encoding (UTF-8/UTF-16/Ascii/8bit with specific codepage...). I do not see the need for this at the moment. So in any scenario, instead of having an additional wchar_t payload, one can just convert any string to an 8-bit format (UTF-8 makes the most sense) before passing it to the LogMessage. If I would start to use g3log right now, I would probably do this due to lack of resources to think about the wchar_t issue. However, one of the mission statements of g3log is speed. The wchar_t payload allows the thread that creates the log messages to defer string conversion to the sink side, i.e. the thread that passes the log message is slowed down considerably less. Thus the use case is: In a scenario where large parts of the log string already exist in a wide string encoding, i.e. because they come from a Qt or Windows GUI or because the application uses wide strings throughout, this patch makes g3log faster. Also, if you decide to stay out of the encoding business (i.e. not give the message additional state specifying the encoding but rather accept that sink and logging thread have to "know about each other" as it is the case at the moment) it "completes" the framework: The consumers of the lib has total freedom to create payloads in whichever string encoding they wish as long as the sink they implement recognizes it correctly and deals with it before pushing it into a file, network connection, widget, etc. It seems consistent to me. Anyhow - it is your decision. I am pretty sure that if you decide to give it a shot I would probably add a litte thing that either gives a good error message when trying to pass a wide string to the LOG macro or even create a little stream wrapper that selects wstream() or stream() based on whether the first object passed in is a std::wstring or wchar_t * or anything else. In our code base this would cover every instance of LOGW at the moment but it might be considered a litte obscure. |
I think the best approach for now is to create a (semi) permanent branch on G3log.
I can make that happen. And have these changes there. That means it can easier be used by the community - I can add some doc around it and it'll immediately start having more value than hanging around as a PR
Thanks for your reply
Cheers
Kjell
…Sent from my iPhone
On Oct 13, 2017, at 3:33 AM, Andreas Schönle ***@***.***> wrote:
I haven't gotten around to looking at it again for a long while and I am quite busy with other stuff at the moment. We can also just shelve it - after all our fork is more or less the same as having a separate branch for now.
However, here are a couple of quick thoughts, possibly repetitive w.r.t. what was already said:
Strictly speaking, the library can do without it: As I have outlined above, not matter whether wchar_t is supported in the log message itself, the logging application and the sinks have to agree on a character encoding anyhow unless you want to add all the overhead of a message carrying information about its specific encoding (UTF-8/UTF-16/Ascii/8bit with specific codepage...). I do not see the need for this at the moment. So in any scenario, instead of having an additional wchar_t payload, one can just convert any string to an 8-bit format (UTF-8 makes the most sense) before passing it to the LogMessage. If I would start to use g3log right now, I would probably do this due to lack of resources to think about the wchar_t issue.
However, one of the mission statements of g3log is speed. The wchar_t payload allows the thread that creates the log messages to defer string conversion to the sink side, i.e. the thread that passes the log message is slowed down considerably less.
Thus the use case is: In a scenario where large parts of the log string already exist in a wide string encoding, i.e. because they come from a Qt or Windows GUI or because the application uses wide strings throughout, this patch makes g3log faster.
Also, if you decide to stay out of the encoding business (i.e. not give the message additional state specifying the encoding but rather accept that sink and logging thread have to "know about each other" as it is the case at the moment) it "completes" the framework: The consumers of the lib has total freedom to create payloads in whichever string encoding they wish as long as the sink they implement recognizes it correctly and deals with it before pushing it into a file, network connection, widget, etc. It seems consistent to me.
Anyhow - it is your decision. I am pretty sure that if you decide to give it a shot I would probably add a litte thing that either gives a good error message when trying to pass a wide string to the LOG macro or even create a little stream wrapper that selects wstream() or stream() based on whether the first object passed in is a std::wstring or wchar_t * or anything else. In our code base this would cover every instance of LOGW at the moment but it might be considered a litte obscure.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Entirely your choice, of course and thanks. Having it as a branch and hinting users to the fact that it exists may well help to decide whether it is a feature that seems useful to more than one person. I will shall ponder how to change up our fork after it is done to keep maintenance overhead minimal. Best wishes, |
Let me think it through over the weekend.
…Sent from my iPhone
On Oct 13, 2017, at 5:54 AM, Andreas Schönle ***@***.***> wrote:
Entirely your choice, of course and thanks. Having it as a branch and hinting users to the fact that it exists may well help to decide whether it is a feature that seems useful to more than one person.
Let me know in case you want me to write a few lines basically outlining the motivation I gave above.
I will shall ponder how to change up our fork after it is done to keep maintenance overhead minimal.
Best wishes,
Andreas
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
1210 |
closed. added as a permanent branch: https://github.com/KjellKod/g3log/tree/wchar_support |
Thanks for reviewing my pull requests so quickly and your kind comments. Your recent merge makes the pull request for my current wchar_t implementation quite slim. I also took into account a few of your comments from the previous attempt. I am creating this PR to open the discussion.
A few notes: