Wchar support #191

AndreasSchoenle · 2017-05-10T07:48:59Z

Thanks for reviewing my pull requests so quickly and your kind comments. Your recent merge makes the pull request for my current wchar_t implementation quite slim. I also took into account a few of your comments from the previous attempt. I am creating this PR to open the discussion.

A few notes:

I propose to test for the codecvt header in the cmake file.
It may well make sense to add a flag to the generated header file as you proposed. (I fact, the public compile definition is in fact the same as long as you use export to generate a cmake config file along with your install). On the other hand, availability of codecvt, mainly determines how the message() call (which should be rarely used in production) will translate the wstring part of the message.
One proposal to reduce the "problem" would be to replace codecvt with a g3log internal function if it is not available. A first simple shot would be to take all ANSI characters as they are and replace everything else with a "?".

compiler flags

version and no message in linux

AndreasSchoenle · 2017-05-10T11:46:48Z

Sorry, I am not sure why the LOGFW test fails in Linux. I will have a linux box for testing available tomorrow or the day after and will look into it then. It lead me to a couple of real bugs already, though ... just none that fixed the test for good.

… Build.cmake to have them already set and to move generation of the header to after calling Build.cmake) %S vs %s windows vs unix headache taken into account in test_io wq

AndreasSchoenle · 2017-05-11T11:16:22Z

OK, got my linux dev box up and running and found the reason for the failure: MSVC vs. gcc incompatibility in the wprintf format specifications. Thus not a library patch, just the unit test corrected.

KjellKod · 2017-05-30T20:04:56Z

@AndreasSchoenle I haven't forgotten about this one. This one is on top of my current to- review list

Options.cmake)

KjellKod · 2017-06-27T20:38:43Z

@AndreasSchoenle : how would you suggest that I test this end to end?
I.e. say chinese characters going in one log entry (LOGW) and ascii characters going in another log entry (LOG). Wouldn't that screw up the viewing of the log since they have different encodings?

I've delayed this PR since I don't have enough knowledge, except knowing it is tricky, to get language support correctly.

What would you suggest be a comprehensive test setup to verify this?
(input from this should of course also be added later by me or you to the docs)

AndreasSchoenle · 2017-06-28T13:24:49Z

@KjellKod : It seems to me that the problem you describe is not really internal to g3log but just a question whether the program logging uses LOG/LOGW statements with the encodings the sinks in use assume to be used:

Quite obviously any encoding could be stored both in _message or _wmessage. For _message this also applies to the master branch. If you would i.e. call LOG and pass some strings which are UTF-8 encoded and others with latin-1 encoding, the sink will not be able to distinguish them and the log file cannot be displayed correctly. It thus always was and still is a requirement that the program logging and the sink have to agree about the use of encodings unless you want to store both the string and an encoding identifier in the message - which I think is too much effort for too little benefit.

In the end it is the sink that has to make sure that all text written to the log file will be in the same encoding and the current implementation of the LogMessage::message() function will cause all sinks using it to write a correct UTF-8 encoded file as long as all LOG() statements pass UTF-8 and all LOGW() statements pass UCS-2 in windows and UCS-4 in Linux (see http://en.cppreference.com/w/cpp/locale/codecvt_utf8).

But this merely introduces a convenient convention which may be ignored when implementing custom sinks. E.g. In my own windows environment I assume that LOG is UTF-8 and LOGW UTF-16. For file sinks all is then converted to UTF-8 and for console or GUI output to UTF-16. If I were to make a LOG call with latin-1 encoded strings, both files and GUI output would not be correct.

I think it is a strength of the implementation that it does not impose an encoding. In a way it is the same for the wchar_t part as it always was for the char part: The message is encoding agnostic.

As far as tests go, I believe the existing ones are more or less sufficient. Anything beyond this would not really test g3log but rather the codecvt_utf8 implementation.

KjellKod · 2017-06-28T15:46:25Z

@AndreasSchoenle I agree with you completely.
From a usage perspective I do think I need to somehow make it explicit how the user can either make a custom sink and how the file and or cout can be readable easily.

Can you create a example sinks which could push readable foreign characters (say Chinese or Cyrillic) to stdout as well as a log file. Or with text in the API doc describe how it could be done?

I.e. the number one block of this PR right now, for me, is usage. I'm suspecting most coders are also as locale ignorant as I am (due to lack of experience)

KjellKod · 2017-09-05T18:32:39Z

Although interesting. Due to lack use cases I am hesitant to merge this.
One option is to create a separate branch for g3log which incorporates this.
What is your take @AndreasSchoenle ?

AndreasSchoenle · 2017-10-13T09:33:19Z

I haven't gotten around to looking at it again for a long while and I am quite busy with other stuff at the moment. We can also just shelve it - after all our fork is more or less the same as having a separate branch for now.

However, here are a couple of quick thoughts, possibly repetitive w.r.t. what was already said:

Strictly speaking, the library can do without it: As I have outlined above, not matter whether wchar_t is supported in the log message itself, the logging application and the sinks have to agree on a character encoding anyhow unless you want to add all the overhead of a message carrying information about its specific encoding (UTF-8/UTF-16/Ascii/8bit with specific codepage...). I do not see the need for this at the moment. So in any scenario, instead of having an additional wchar_t payload, one can just convert any string to an 8-bit format (UTF-8 makes the most sense) before passing it to the LogMessage. If I would start to use g3log right now, I would probably do this due to lack of resources to think about the wchar_t issue.

However, one of the mission statements of g3log is speed. The wchar_t payload allows the thread that creates the log messages to defer string conversion to the sink side, i.e. the thread that passes the log message is slowed down considerably less.

Thus the use case is: In a scenario where large parts of the log string already exist in a wide string encoding, i.e. because they come from a Qt or Windows GUI or because the application uses wide strings throughout, this patch makes g3log faster.

Also, if you decide to stay out of the encoding business (i.e. not give the message additional state specifying the encoding but rather accept that sink and logging thread have to "know about each other" as it is the case at the moment) it "completes" the framework: The consumers of the lib has total freedom to create payloads in whichever string encoding they wish as long as the sink they implement recognizes it correctly and deals with it before pushing it into a file, network connection, widget, etc. It seems consistent to me.

Anyhow - it is your decision. I am pretty sure that if you decide to give it a shot I would probably add a litte thing that either gives a good error message when trying to pass a wide string to the LOG macro or even create a little stream wrapper that selects wstream() or stream() based on whether the first object passed in is a std::wstring or wchar_t * or anything else. In our code base this would cover every instance of LOGW at the moment but it might be considered a litte obscure.

KjellKod · 2017-10-13T11:19:50Z

I think the best approach for now is to create a (semi) permanent branch on G3log. I can make that happen. And have these changes there. That means it can easier be used by the community - I can add some doc around it and it'll immediately start having more value than hanging around as a PR Thanks for your reply Cheers Kjell

…

Sent from my iPhone

On Oct 13, 2017, at 3:33 AM, Andreas Schönle ***@***.***> wrote: I haven't gotten around to looking at it again for a long while and I am quite busy with other stuff at the moment. We can also just shelve it - after all our fork is more or less the same as having a separate branch for now. However, here are a couple of quick thoughts, possibly repetitive w.r.t. what was already said: Strictly speaking, the library can do without it: As I have outlined above, not matter whether wchar_t is supported in the log message itself, the logging application and the sinks have to agree on a character encoding anyhow unless you want to add all the overhead of a message carrying information about its specific encoding (UTF-8/UTF-16/Ascii/8bit with specific codepage...). I do not see the need for this at the moment. So in any scenario, instead of having an additional wchar_t payload, one can just convert any string to an 8-bit format (UTF-8 makes the most sense) before passing it to the LogMessage. If I would start to use g3log right now, I would probably do this due to lack of resources to think about the wchar_t issue. However, one of the mission statements of g3log is speed. The wchar_t payload allows the thread that creates the log messages to defer string conversion to the sink side, i.e. the thread that passes the log message is slowed down considerably less. Thus the use case is: In a scenario where large parts of the log string already exist in a wide string encoding, i.e. because they come from a Qt or Windows GUI or because the application uses wide strings throughout, this patch makes g3log faster. Also, if you decide to stay out of the encoding business (i.e. not give the message additional state specifying the encoding but rather accept that sink and logging thread have to "know about each other" as it is the case at the moment) it "completes" the framework: The consumers of the lib has total freedom to create payloads in whichever string encoding they wish as long as the sink they implement recognizes it correctly and deals with it before pushing it into a file, network connection, widget, etc. It seems consistent to me. Anyhow - it is your decision. I am pretty sure that if you decide to give it a shot I would probably add a litte thing that either gives a good error message when trying to pass a wide string to the LOG macro or even create a little stream wrapper that selects wstream() or stream() based on whether the first object passed in is a std::wstring or wchar_t * or anything else. In our code base this would cover every instance of LOGW at the moment but it might be considered a litte obscure. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

AndreasSchoenle · 2017-10-13T11:54:07Z

Entirely your choice, of course and thanks. Having it as a branch and hinting users to the fact that it exists may well help to decide whether it is a feature that seems useful to more than one person.
Let me know in case you want me to write a few lines basically outlining the motivation I gave above.

I will shall ponder how to change up our fork after it is done to keep maintenance overhead minimal.

Best wishes,
Andreas

KjellKod · 2017-10-13T15:58:21Z

Let me think it through over the weekend.

…

Sent from my iPhone

On Oct 13, 2017, at 5:54 AM, Andreas Schönle ***@***.***> wrote: Entirely your choice, of course and thanks. Having it as a branch and hinting users to the fact that it exists may well help to decide whether it is a feature that seems useful to more than one person. Let me know in case you want me to write a few lines basically outlining the motivation I gave above. I will shall ponder how to change up our fork after it is done to keep maintenance overhead minimal. Best wishes, Andreas — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

Kidixovi · 2017-10-23T22:54:17Z

1210

KjellKod · 2017-10-24T01:14:54Z

@Kidixovi ?

KjellKod · 2017-12-03T00:06:25Z

closed. added as a permanent branch: https://github.com/KjellKod/g3log/tree/wchar_support

AndreasSchoenle added 30 commits May 2, 2017 12:36

unit tests running with windows shared dll

e04b9c0

CMake automatically uses DLL runtime - remove cmake entries modifying

32ecb13

compiler flags

Bpc package build complete

f1fe63f

missing DBUG vs DEBUG issue in Linux

9564004

Merge branch 'cmake-changes2' into ai-master

b923021

generated header added

05606f6

Merge branch 'cmake-changes2' into ai-master

42fe73e

correction in CMake: Set HEADER_FILES correctly

88426f0

Merge branch 'cmake-changes2' into ai-master

26b0e9e

HEADER_FILES set correctly

4092b65

check for codecvt

4ca40f8

Merge branch 'cmake-changes2' into wchar-support

82541ae

re-added old wchar support code

281f08f

Merge branch 'wchar-support' into ai-master

df737e5

added static library option

61f8aa8

Merge branch 'cmake-changes2' into wchar-support

694b402

Merge branch 'master' into cmake-changes2

16395d9

Merge branch 'cmake-changes2' into wchar-support

b7b6c4f

switched to powershell and included WORKING_DIRECTORY

9055389

powershell use in windows and WORKING_DIRECTORY instead of cd

cdec35b

Merge branch 'git-version' into cmake-changes2

0364c0c

Merge branch 'master' into cmake-changes2

d93f33b

Fixed appveyor.yml to use Release configuration and run unit tests

dae9008

Merge branch 'cmake-changes2' into wchar-support

99065bb

trying to make appveyor work again ...

85047a3

make sure ERRORLEVEL is 0 when unit tests run successfully

5f71e98

Still trying to fix appveyor ...

cae261a

Defaulting to shared library in linux, too

a0f98a9

Removed runtime loading test when g3log is not a shared library in linux

42154b3

Run unit tests verbosly using ctest.exe

b305134

AndreasSchoenle added 8 commits May 10, 2017 10:26

Moved G3LOG_USE_CODECVT to Options.cmake and generated header file

71704f4

Fixing linux buffer overflow.

41c8c24

removed superfluous target_compile_definition for CODECVT

5fb3404

Fixed error in LogMessage::message()

1100b25

Fixed error when linux and shared library is on

74c4c4b

Fixed linux/windows shared lib handling: FATAL_ERROR for wrong cmake

210ef9a

version and no message in linux

Merge branch 'cmake-changes3' into wchar-support

681a7e6

Fixed unit tests for DEBUG to DBUG

1a9798d

Needed compiler option for header test. (NOTE: Better to move this to…

f479e4c

… Build.cmake to have them already set and to move generation of the header to after calling Build.cmake) %S vs %s windows vs unix headache taken into account in test_io wq

test_message now compiles in c++11 when cross-compiling for ARM

6293ddb

AndreasSchoenle added 3 commits June 4, 2017 13:31

Merge branch 'master' into wchar-support

db4280e

Merge branch 'master' into wchar-support

f8b2a45

Remove superfluous check for codecvt in Build.cmake (already checked in

054d7d1

Options.cmake)

KjellKod mentioned this pull request Jul 19, 2017

Log Macro expansion doesn't work properly #215

Closed

KjellKod closed this Dec 3, 2017

AndreasSchoenle deleted the wchar-support branch April 3, 2021 09:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wchar support #191

Wchar support #191

AndreasSchoenle commented May 10, 2017

AndreasSchoenle commented May 10, 2017

AndreasSchoenle commented May 11, 2017

KjellKod commented May 30, 2017

KjellKod commented Jun 27, 2017 •

edited

Loading

AndreasSchoenle commented Jun 28, 2017

KjellKod commented Jun 28, 2017 •

edited

Loading

KjellKod commented Sep 5, 2017 •

edited

Loading

AndreasSchoenle commented Oct 13, 2017

KjellKod commented Oct 13, 2017 via email

AndreasSchoenle commented Oct 13, 2017

KjellKod commented Oct 13, 2017 via email

Kidixovi commented Oct 23, 2017

KjellKod commented Oct 24, 2017

KjellKod commented Dec 3, 2017

Wchar support #191

Wchar support #191

Conversation

AndreasSchoenle commented May 10, 2017

AndreasSchoenle commented May 10, 2017

AndreasSchoenle commented May 11, 2017

KjellKod commented May 30, 2017

KjellKod commented Jun 27, 2017 • edited Loading

AndreasSchoenle commented Jun 28, 2017

KjellKod commented Jun 28, 2017 • edited Loading

KjellKod commented Sep 5, 2017 • edited Loading

AndreasSchoenle commented Oct 13, 2017

KjellKod commented Oct 13, 2017 via email

AndreasSchoenle commented Oct 13, 2017

KjellKod commented Oct 13, 2017 via email

Kidixovi commented Oct 23, 2017

KjellKod commented Oct 24, 2017

KjellKod commented Dec 3, 2017

KjellKod commented Jun 27, 2017 •

edited

Loading

KjellKod commented Jun 28, 2017 •

edited

Loading

KjellKod commented Sep 5, 2017 •

edited

Loading