Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wchar support #191

Closed

Conversation

AndreasSchoenle
Copy link
Contributor

Thanks for reviewing my pull requests so quickly and your kind comments. Your recent merge makes the pull request for my current wchar_t implementation quite slim. I also took into account a few of your comments from the previous attempt. I am creating this PR to open the discussion.

A few notes:

  • I propose to test for the codecvt header in the cmake file.
  • It may well make sense to add a flag to the generated header file as you proposed. (I fact, the public compile definition is in fact the same as long as you use export to generate a cmake config file along with your install). On the other hand, availability of codecvt, mainly determines how the message() call (which should be rarely used in production) will translate the wstring part of the message.
  • One proposal to reduce the "problem" would be to replace codecvt with a g3log internal function if it is not available. A first simple shot would be to take all ANSI characters as they are and replace everything else with a "?".

@AndreasSchoenle
Copy link
Contributor Author

Sorry, I am not sure why the LOGFW test fails in Linux. I will have a linux box for testing available tomorrow or the day after and will look into it then. It lead me to a couple of real bugs already, though ... just none that fixed the test for good.

… Build.cmake to have them already set and to move generation of the header to after calling Build.cmake)

%S vs %s windows vs unix headache taken into account in test_io
wq
@AndreasSchoenle
Copy link
Contributor Author

OK, got my linux dev box up and running and found the reason for the failure: MSVC vs. gcc incompatibility in the wprintf format specifications. Thus not a library patch, just the unit test corrected.

@KjellKod
Copy link
Owner

@AndreasSchoenle I haven't forgotten about this one. This one is on top of my current to- review list

@KjellKod
Copy link
Owner

KjellKod commented Jun 27, 2017

@AndreasSchoenle : how would you suggest that I test this end to end?
I.e. say chinese characters going in one log entry (LOGW) and ascii characters going in another log entry (LOG). Wouldn't that screw up the viewing of the log since they have different encodings?

I've delayed this PR since I don't have enough knowledge, except knowing it is tricky, to get language support correctly.

What would you suggest be a comprehensive test setup to verify this?
(input from this should of course also be added later by me or you to the docs)

@AndreasSchoenle
Copy link
Contributor Author

@KjellKod : It seems to me that the problem you describe is not really internal to g3log but just a question whether the program logging uses LOG/LOGW statements with the encodings the sinks in use assume to be used:

Quite obviously any encoding could be stored both in _message or _wmessage. For _message this also applies to the master branch. If you would i.e. call LOG and pass some strings which are UTF-8 encoded and others with latin-1 encoding, the sink will not be able to distinguish them and the log file cannot be displayed correctly. It thus always was and still is a requirement that the program logging and the sink have to agree about the use of encodings unless you want to store both the string and an encoding identifier in the message - which I think is too much effort for too little benefit.

In the end it is the sink that has to make sure that all text written to the log file will be in the same encoding and the current implementation of the LogMessage::message() function will cause all sinks using it to write a correct UTF-8 encoded file as long as all LOG() statements pass UTF-8 and all LOGW() statements pass UCS-2 in windows and UCS-4 in Linux (see http://en.cppreference.com/w/cpp/locale/codecvt_utf8).

But this merely introduces a convenient convention which may be ignored when implementing custom sinks. E.g. In my own windows environment I assume that LOG is UTF-8 and LOGW UTF-16. For file sinks all is then converted to UTF-8 and for console or GUI output to UTF-16. If I were to make a LOG call with latin-1 encoded strings, both files and GUI output would not be correct.

I think it is a strength of the implementation that it does not impose an encoding. In a way it is the same for the wchar_t part as it always was for the char part: The message is encoding agnostic.

As far as tests go, I believe the existing ones are more or less sufficient. Anything beyond this would not really test g3log but rather the codecvt_utf8 implementation.

@KjellKod
Copy link
Owner

KjellKod commented Jun 28, 2017

@AndreasSchoenle I agree with you completely.
From a usage perspective I do think I need to somehow make it explicit how the user can either make a custom sink and how the file and or cout can be readable easily.

Can you create a example sinks which could push readable foreign characters (say Chinese or Cyrillic) to stdout as well as a log file. Or with text in the API doc describe how it could be done?

I.e. the number one block of this PR right now, for me, is usage. I'm suspecting most coders are also as locale ignorant as I am (due to lack of experience)

@KjellKod
Copy link
Owner

KjellKod commented Sep 5, 2017

Although interesting. Due to lack use cases I am hesitant to merge this.
One option is to create a separate branch for g3log which incorporates this.
What is your take @AndreasSchoenle ?

@AndreasSchoenle
Copy link
Contributor Author

I haven't gotten around to looking at it again for a long while and I am quite busy with other stuff at the moment. We can also just shelve it - after all our fork is more or less the same as having a separate branch for now.

However, here are a couple of quick thoughts, possibly repetitive w.r.t. what was already said:

Strictly speaking, the library can do without it: As I have outlined above, not matter whether wchar_t is supported in the log message itself, the logging application and the sinks have to agree on a character encoding anyhow unless you want to add all the overhead of a message carrying information about its specific encoding (UTF-8/UTF-16/Ascii/8bit with specific codepage...). I do not see the need for this at the moment. So in any scenario, instead of having an additional wchar_t payload, one can just convert any string to an 8-bit format (UTF-8 makes the most sense) before passing it to the LogMessage. If I would start to use g3log right now, I would probably do this due to lack of resources to think about the wchar_t issue.

However, one of the mission statements of g3log is speed. The wchar_t payload allows the thread that creates the log messages to defer string conversion to the sink side, i.e. the thread that passes the log message is slowed down considerably less.

Thus the use case is: In a scenario where large parts of the log string already exist in a wide string encoding, i.e. because they come from a Qt or Windows GUI or because the application uses wide strings throughout, this patch makes g3log faster.

Also, if you decide to stay out of the encoding business (i.e. not give the message additional state specifying the encoding but rather accept that sink and logging thread have to "know about each other" as it is the case at the moment) it "completes" the framework: The consumers of the lib has total freedom to create payloads in whichever string encoding they wish as long as the sink they implement recognizes it correctly and deals with it before pushing it into a file, network connection, widget, etc. It seems consistent to me.

Anyhow - it is your decision. I am pretty sure that if you decide to give it a shot I would probably add a litte thing that either gives a good error message when trying to pass a wide string to the LOG macro or even create a little stream wrapper that selects wstream() or stream() based on whether the first object passed in is a std::wstring or wchar_t * or anything else. In our code base this would cover every instance of LOGW at the moment but it might be considered a litte obscure.

@KjellKod
Copy link
Owner

KjellKod commented Oct 13, 2017 via email

@AndreasSchoenle
Copy link
Contributor Author

Entirely your choice, of course and thanks. Having it as a branch and hinting users to the fact that it exists may well help to decide whether it is a feature that seems useful to more than one person.
Let me know in case you want me to write a few lines basically outlining the motivation I gave above.

I will shall ponder how to change up our fork after it is done to keep maintenance overhead minimal.

Best wishes,
Andreas

@KjellKod
Copy link
Owner

KjellKod commented Oct 13, 2017 via email

@Kidixovi
Copy link

1210

@KjellKod
Copy link
Owner

@Kidixovi ?

@KjellKod
Copy link
Owner

KjellKod commented Dec 3, 2017

closed. added as a permanent branch: https://github.com/KjellKod/g3log/tree/wchar_support

@KjellKod KjellKod closed this Dec 3, 2017
@AndreasSchoenle AndreasSchoenle deleted the wchar-support branch April 3, 2021 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants