-
Notifications
You must be signed in to change notification settings - Fork 429
setlocale( LC_ALL, "") changes the locale for the entire application #770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@dinghram yes, Unfortunately, the Is there another way to get/query the system default But this Either successfully using the tidySetLanguage API, called from their own app before the call to Or compiling a Maybe we need to improve the documentation in this regard? Suggestions appreciated... thanks... Does this help? |
@dinghram that setlocale call seems to be required. Take a look at Would it be possible to get the output from |
I believe the |
It is unacceptable for any library function to have changing the locale as side-effect. |
|
No it isn't! I think you are missing the point on how setlocale() works in the context of an application. setlocale() keeps state for this process. In my case a php thread is setlocale()'d to the locale set for a logged-in user on a website. The user gets correctly formatted dates and such because of this. Possibly you should not change the locale if it is set to a non-'C' value. Autodetecting the locale according to the environment variables should only be done if no locale is set. |
Also, you should use LC_MESSAGES for this, not LC_ALL. I really think you should not be doing any initialization. If library code finds an uninitialized locale situation that's just bad luck and the code should fallback to a decent default language. That is wat "C" is for. setlocale() documentation http://man7.org/linux/man-pages/man3/setlocale.3.html says:
Emphasis on program is mine. |
|
What about: /* Backup the previous locale */
char *previous_locale = setlocale(LC_ALL, NULL);
/* Inherit the locale from the environment and use it to set Tidy's language */
TY_(tidySetLanguage)( setlocale( LC_ALL, "" ) );
/* Restore the previous locale */
setlocale(LC_ALL, previous_locale); Does that make sense and solve the OP's problem?
|
I certainly agree with @langemeijer -
@Lin-Buo-Ren maybe the above libTidy only wants a way to auto-detect the users It certainly does not want the In unix, at least, the Look forward to further feedback on this, and related #783 ... looking positive... thanks... |
@Lin-Buo-Ren: The "check, then change back" is not great in a library either. For multi-threaded users this all falls apart, since there's no guarantee that you'll cache or restore the locale appropriately (T2 caches T1's modified state, and "restores" that modified state when T2 is done; if T1 has finished first, global locale has been reset). Although maybe Tidy isn't really meant for multi-threaded use; the only spots that I've found so far that break this are the setlocale (which is fine with SUPPORT_LOCALIZATIONS off, or by pre-initializing the language), and the tag_defs set-reset usage (although here due to concurrent table use; easy enough to thread-local the thing). |
@geoffmcl wrote
If the application sets a locale, which is should if it is deemed appropriate, that locale is returned. In my opinion 'C' should fallback to 'en_EN'.
As an alternative, (still not very elegant, because I think setlocale( LC_ALL, "" ) should not be in a library) I would suggest the following approach: (pseudo-code)
|
@ler762, @Lin-Buo-Ren, @russianfool, @langemeijer, all, thank you for the helpful comments... I too have tried lots of things... see While #783 is more about Since this is like
Given that a The following suggested pseudo-code seems to add to that, in that the user's app can further ensure
This does not address any threading issue, but if neccessary the I am not sure the new suggested compile flag, like Also this does not yet address any tidy doc changes... nor maybe any potential cross-platform issues... To repeat, just looking for a non-invasive way to setup the Further feedback very welcome to reach the best solution... thanks... |
This used to be in the console application. A change early this year moved it to the library. I would suggest moving it back to the console application. |
I agree with the others about library functions shouldn't change the locale. The calling program should be responsible for setting the locale - as well as calling tidySetLanguage. Take another look at #770 (comment)
Unless the calling program has per-thread environment variables and the calling program sets all the LC_xxx, LANG, whatevers correctly, setlocale( LC_ALL, "") is going to do the wrong thing occasionally. So this bit in src/tidylib.c goes:
as well as the
and add something like
to console/tidy.c before the call to tidyCreate() and update the tidylib docs + examples to show the calling program setting the locale & tidy language. |
No time tonight, but it seems we are mostly agreed... and now there is #780 as further proof... thanks... I will try to work on this with my ideas, discoveries... in the coming days... Feel PR #785 is little step in the right direction... but needs more... And then there is doc changes to match... Look forward to more patch/code suggestions... thanks... |
I second that. In my opinion, reverting 57f623e is the most reasonable way forward. Clients of libtidy could still call |
While it seems all fully agreed that a library should not do Yes, In our case the console app But it seems the following non-invasive library pseudo-code would keep auto-detection -
Note this is guarded by Still doing research on the Windows Anyway, just a suggestion for discussion... Look forward to further feedback... thanks... |
Well, to keep things simple. Anyhow, would checking the environment variables be really necessary? Both the POSIX and the MSDN docs say that passing |
@cmb69 thank you for your feedback... Agree to keep things simple ;=)) But as you point out, not all In all that I can read, in the UNIX environment, the environment variables In Windows, those environment variables do not normally exist, and
Or even some other native Windows API, like
Ok, if ALL fears of I personally have several apps using And are we to now force an additional burden on the likes of PHP, Perl, Edbrowse, Notepad++, etc, etc... to name a few... to modify their interface with the That's my thoughts on Look forward to further feedback... thanks... |
yes
Because the caller is the only one who knows how it should be set.
It's your app, presumably you still have the source code, add this line before calling tidyCreate() Recall
Consider what happens in a German locale with this snippet in the library:
Tidy doesn't have a German translation, so TY_(tidyGetLanguageSetByUser) will be no. Or do you get a different result after calling tidySetLanguage("de_DE")? You can't look at environment variables because they're set for the calling php program, not the logged-in user on the website. And you can't call You have to make the calling program responsible for setting the locale & selecting the language. |
To add to @ler762: setlocale()'s behaviour to read the environment variables only makes any sense for applications started by a user on the command line. It does not make sense for a daemon (service) running in the background. |
Is there currently any way to set the desired locale for Tidy in PHP? |
If the desired locale is one tidy has a translation for - yes,
would probably be enough to keep tidy from calling |
@ler762 I'm particularly interested whether it is possible to set tidy's locale from PHP (without relying on any |
src/tidylib.c has this bit:
As far as I can tell, I know zip about PHP, so you're going to have to do your own testing, but it sure looks like the calling program doing
before calling |
Thanks to the comments here, though not all!, others, refs and testing, am now convinced Accordingly propose the following patch to fix this bug -
Will get to it shortly, probably in conjuction with PR #785... maybe the weekend... hopefully... |
@dinghram have eventually removed Hope you, and/or others, get the chance to pull, build, and test from version |
I have tested PHP 7.3.1 with tidy 5.7.18, and can confirm that #780 is fixed, and I couldn't detect any regressions. Thanks! Sorry, for the off-topic discussion which I started above. |
You can't realistically expect two different runtimes, following two different standards, to have a common unambiguous behavior in every specific situation. There's no such thing as "default language" other than "C" in POSIX, unless you make it to be. I.e. using variables such as LANG, LC_MESSAGES et cetera. For Windows, the situation is much better, as that explicitly defined in the runtime and can be queried using appropriate API calls. |
@AnrDaemon if you're responding to my cygwin msg, take a look at http://www.html-tidy.org/developer/ |
It was a reference to the very attempt at using setlocale and expecting it to produce any usable results. |
@cmb69 thank you for the testing and report on 5.7.18... good news... @AnrDaemon agree with you, one can not Your MSDN link suggests, clearly, you have to understand It is my understanding, Tidy did, for a time, incorrectly used @ler762, maybe I do not understand correctly! Why do you want to And the API docs already document how to Have now also done lots more testing, and while there may be other issues, to be addressed separately, in their own issue, this seems closed... thanks for all the participation... |
I'd rather not have the library change anything; it still might change the language, so like to update the tidylib example program. The example program doesn't set the locale or tidy language. Now take a look at https://github.com/htacg/tidy-html5/blob/next/src/tidylib.c#L104 If the calling program doesn't set the language for tidy to use the library will
Now pretend the example program has this bit before calling tidyCreate
So now when the program does the |
@ler762 seem we cross posted... Agreed, If you have another issue please open it as a new issue... thanks... |
Is there a change to src/tidylib.c that hasn't been committed yet? It sure looks like the library code might still change the language: |
@ler762 that's the code... what is the issue? Do I have to REPEAT? If you have another issue please open it as a new issue... thanks... |
If no missing commit, there's still code in the library to change the language. If you want to leave it in, fine, but you just said
so I guess the issue is you said the library doesn't change anything now & it sure looks like it still might. |
As I understand it, libtidy still tries to set its internal language from environment variables, but it doesn't change the general locale anymore. While this still might be arguable, this ticket is about the call to |
The correct way would have been to use 0 (zero, the round one), as the documentation also says:
Good job ... |
In tidylib.c:
TY_(tidySetLanguage)( setlocale( LC_ALL, "") );
The purpose of this line, as far as I can tell, is not to change the application locale, but only to query it. However, that is not what happens. Rather, it changes the behavior of the entire application to the point where my customers in Europe could not enter floating point values as they were parsed incorrectly after this call that I did not expect to find in a 3rd party library.
From the documentation, if you want to query the locale without changing it:
The text was updated successfully, but these errors were encountered: