-
-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support of UTF-8 for Windows #728
Conversation
src/host/os_isdir.c
Outdated
@@ -42,6 +44,10 @@ int os_isdir(lua_State* L) | |||
{ | |||
lua_pushboolean(L, 0); | |||
} | |||
|
|||
#ifdef _WIN32 | |||
lua_pop(L, -2); /* pop wide string */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, should be lua_remove
, can't fix it right now
src/host/utf8handling.c
Outdated
lua_pushstring(L, unicode_str); | ||
free(unicode_str); | ||
|
||
return unicode_str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you just free'd that pointer.... should really not return that, as that will be a "use after free" access violation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, in my original design I wasn't returning anything. Will change that.
src/host/utf8handling.c
Outdated
#include "stdlib.h" | ||
|
||
#ifdef PLATFORM_WINDOWS | ||
const char* utf8_fromwide(lua_State* L, const wchar_t* wstr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a size_t utf8_fromwide(char* destination, size_t destSize, const wchar_t* wstr);
which would not need to allocate memory, would allow for the call to WideCharToMultiByte to simply be called once. And wouldn't need the LuaState... pushing the string to the state would be up to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I though using Lua GC to manage memory was a good idea, but I will make the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmh, actually a wrapper function is useless if all it does is calling WideCharToMultiByte
, since we can do that in places where we are calling this function (since we're already checking for the Windows part).
What if I make something like that:
const char* utf8_fromwide(const wchar_t* wstr)
{
int size_required = WideCharToMultiByte(CP_UTF8, 0, wstr, -1, NULL, 0, NULL, NULL);
char* unicode_str = (char*) malloc(size_required * sizeof(char));
WideCharToMultiByte(CP_UTF8, 0, wstr, -1, unicode_str, size_required, NULL, NULL);
return unicode_str;
}
be a better solution? Of course the caller has to free the buffer then (which I dislike).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that that is not preferable... In general I was just hoping that a malloc for this isn't something we need... mostly because these match functions are used ALOT, and so have quite a bit of performance impact... on our project we're already hitting 2GB of memory use.
if that removes the need for this method entirely, then I'm OK with just calling it in place to be entirely honest, and simply use stack for this...
Other then the use after free, I have no real objections... I think API wise relying less on lua and more on just C would be nice, but it's no breaking point for me.. we can always refactor that further if we wish for that.. |
I'll try to think of something that will not impact performances much then,
especially there.
I'll use the stack wherever I can and remove those two functions then, I
hope I can make it in a not-so-dirty way, despite the Win32 calls.
Le 3 avr. 2017 8:24 PM, "Tom van Dijck" <notifications@github.com> a écrit :
… ***@***.**** commented on this pull request.
------------------------------
In src/host/utf8handling.c
<#728 (comment)>:
> @@ -0,0 +1,33 @@
+/**
+ * \file utf8handking.c
+ * \brief Handles conversions between Unicode (UTF-8) and system native encoding (wide chars on Windows)
+ * \author Copyright (c) 2017 Jérôme Leclercq and the Premake project
+ */
+
+#include "premake.h"
+#include "stdlib.h"
+
+#ifdef PLATFORM_WINDOWS
+const char* utf8_fromwide(lua_State* L, const wchar_t* wstr)
I agree that that is not preferable... In general I was just hoping that a
malloc for this isn't something we need... mostly because these match
functions are used ALOT, and so have quite a bit of performance impact...
on our project we're already hitting 2GB of memory use.
if that removes the need for this method entirely, then I'm OK with just
calling it in place to be entirely honest, and simply use stack for this...
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#728 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AC3QXXZ5ZGwbTQqZrj32mBKwtbFqfRQAks5rsTldgaJpZM4Mw07h>
.
|
Some of this stuff was discussed as part of #629. I never got round to doing anything that was proposed but I think this might fix that issue as well. |
I remade this using only stack memory (and I'm wondering why I didn't do that in the first place..). I'm wondering, should I apply the same changes to the integrated Lua library to support unicode operations? (Talking about |
Thank you very much, that looks pretty clean to me... ready to merge as far as I'm concerned.
We could, but lets make that a separate pull request.. I'm not too fond of actually changing lua itself to be honest, since that makes moving between versions of Lua really cumbersome.. So I'd rather see extensions to the io library to support this... like an io.open_utf8(...) or something of that kind, I don't know how that would look really.. so I don't really have a very well thought out opinion on this, but it should be fairly straightforward to 'override' the standard io library with methods that support utf8 instead. I would imagine that really all you need to override is the io.open method? all the other methods work on stream handles (FILE*, etc), so those should not be affected. |
No problem with making another pull request. I don't think making a separate Unicode open function will work in this case because of the io metatable (for calls like |
Hi there,
We're using Premake at my job and we had some troubles when trying to copy files from/to locations containing unicode characters, the reason for this is that Premake uses the ASCII version of the Windows API.
I made a fix by adding two helper functions (
utf8_fromwide
andutf8_towide
) and using them with the Unicode Windows API everywhere I could (I just didn't do it for the os.getpass function, making the Windows console working with unicode is not an easy thing to do).Anyway, tell me if I need to fix some stuff, as I'm not sure I did everything in the "right" (premake-team) way 😄