Enable non-ASCII extension module names #64684
Comments
Currently, the names of .pyd modules are limited to 7-bit US-ASCII. I want to do "import X" to import X.pyd, where X contains Unicode characters. About the solution:
Notes: Or, if you have any questions, please let me know. Regards, |
I think that if you need a module with a non-ASCII name, you can wrap it in a Python module.
=== späm.py ===
from _spam import *
from _spam import __all__, __doc__
=== späm.py === |
I'd use a much simpler encoding. |
Thank you for your reply. The hack in msg209998 is interesting, but how would I name a submodule in a non-Latin language, especially while keeping it natively readable? X( The reason I don't use something like "name.encode('unicode-escape').replace(b'\\', b'_')" is the length limit on identifiers. In fact, Visual C++ accepts 2047 chars (bytes) and gcc has no logical limit. But PEP 7 says we should use C89, and even C99 only assumes that the first 63 bytes are significant. I don't know what C89 says, and my C99 reference is below, which means the real C99 text is possibly different: If we have to respect the C99 limit above, 63 chars are too short for a 'unicode-escape'-like scheme. 'PyInit_' takes 7, leaving 56. When each character is encoded as 5 chars, like '_3010', we can use only 11 Unicode code points. With 6 chars, only 9. a) If we can break C99, or if real C89/C99 has no 63-char rule, we can simply do as Amaury Forgeot d'Arc says. |
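The arithmetic in the message above can be checked with a short sketch. It assumes the 63-significant-character figure quoted there for C99 and applies the `unicode-escape` substitution mentioned in the message; the helper name `escaped_symbol` and the sample character U+3010 are illustrative only and not part of any posted patch.

```python
C99_SIGNIFICANT = 63   # significant characters assumed for C99 in the message above
PREFIX = "PyInit_"     # 7 characters


def escaped_symbol(name):
    """Hypothetical helper: build an ASCII-only init symbol via unicode-escape."""
    escaped = name.encode("unicode-escape").replace(b"\\", b"_")
    return PREFIX + escaped.decode("ascii")


budget = C99_SIGNIFICANT - len(PREFIX)   # characters left for the encoded name
per_5 = budget // 5                      # code points that fit at 5 chars each ('_3010')
per_6 = budget // 6                      # code points that fit at 6 chars each ('_u3010')

print(budget, per_5, per_6)                   # 56 11 9
print(escaped_symbol("\u3010"))               # PyInit__u3010
print(len(escaped_symbol("\u3010" * 10)))     # 67, already past the 63 limit
```

With this substitution a name of only ten such characters already exceeds 63 characters, which is the concern raised above.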
The PyInit_NAME symbol is not the only place where NAME is used. The NAME is also present in the PyModuleDef structure. It looks like Python expects UTF-8 here. Do you encode it to UTF-8 and use the "\xHH\xHH\xHH..." syntax to keep the C file ASCII-encoded? The NAME may also be mentioned in docstrings, C comments, type names, etc. I don't like the idea of a new encoding just for one very specific function in C. There are already too many encodings in the world :-( The C language supports non-ASCII identifiers, but I don't know how they are encoded in the symbol table. I would prefer to rely on the C compiler if you would like to play in the playground of non-ASCII identifiers. In Python/dynload_win.c, _PyImport_GetDynLoadWindows() uses GetProcAddressA(). Is it a theoretical feature request, or do you really have a Python module with a non-ASCII name? I'm not sure that it's really useful to support non-ASCII module names for C modules, even though I spent many months supporting non-ASCII module names for Python modules :-) |
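For the "\xHH\xHH\xHH..." question above, here is a minimal sketch of how a code generator might emit a UTF-8 module name as an ASCII-only C string literal. The function name `c_string_literal` is hypothetical, and the sketch assumes the name is a valid Python identifier (so it contains no quotes, backslashes or control characters).

```python
HEX_DIGITS = set(b"0123456789abcdefABCDEF")


def c_string_literal(name):
    """Hypothetical helper: render `name` (UTF-8) as an ASCII-only C string literal."""
    out = ['"']
    prev_escaped = False
    for byte in name.encode("utf-8"):
        if byte >= 0x80:
            # Non-ASCII byte: emit a hex escape.
            out.append("\\x%02x" % byte)
            prev_escaped = True
        else:
            if prev_escaped and byte in HEX_DIGITS:
                # A C hex escape consumes every following hex digit, so split
                # the literal; adjacent literals are concatenated by the compiler.
                out.append('" "')
            out.append(chr(byte))
            prev_escaped = False
    out.append('"')
    return "".join(out)


print(c_string_literal("späm"))   # "sp\xc3\xa4m"
print(c_string_literal("äb"))     # "\xc3\xa4" "b"
```

The literal is split after an escape whenever the next character is itself a hex digit, because otherwise the C compiler would read it as part of the escape.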
Thank you for your reply, STINNER.
The main purpose of this issue is "I want to use Cython like Python, without any trouble." You don't have to worry about the points mentioned above. I wrote a patch for Cython that converts them automatically, and I will fix it when needed. The sample C code I posted uses UTF-8 directly only OUTSIDE of the "quotation", but that can be changed if we really have to.
Of course I will accept any encoding and/or any solution that resolves this issue. I made a new encoding only to keep to the constraints as far as I can, and to avoid limiting names to something too short when non-ASCII characters are used. The patch doesn't include an encoding module of any kind. For this issue, decoding is not required inside Python.
That's why we should resolve this problem, shouldn't we? Also, the standards don't define anything about the symbol table.
The problem is that we CAN'T, as you say. Or, at least, if you really think that, any ASCII limitation on dynamic loading should be removed.
_PyImport_GetDynLoadWindows() seems to be called only from _PyImport_LoadDynamicModule() in Python/importdl.c, to resolve the PyInit_xxx entry. I have already handled this in the patch I posted earlier.
As I said, NO to the 1st, YES to the 2nd. I have many '<non-ASCII>.py' files which I want to convert to '<non-ASCII>.pyd' files using Cython.
Because you are both an English and a Python expert. Thanks a lot for your daily Python work! Thank you for reading this long description. |
It is left to your discretion. You can use idna, punycode, utf-7, szm62 or romaji. |
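Of the schemes listed above, idna, punycode and utf-7 are available as standard codecs in Python, so their output can be compared directly. The sketch below does that; the function name `encode_candidates` and the sample name are illustrative assumptions, and the check only tests whether the result consists of characters allowed in a C identifier.

```python
import re

C_IDENT_CHARS = re.compile(r"[A-Za-z0-9_]+")


def encode_candidates(name):
    """Encode `name` with three stdlib codecs and flag identifier-safe results."""
    results = {}
    for codec in ("idna", "punycode", "utf-7"):
        encoded = name.encode(codec).decode("ascii")
        results[codec] = (encoded, C_IDENT_CHARS.fullmatch(encoded) is not None)
    return results


for codec, (encoded, usable) in encode_candidates("späm").items():
    print(codec, encoded, usable)
```

For this sample name every result contains `-` or `+`, which cannot appear in a C identifier, so any of these codecs would need an extra substitution step before its output could be used in a symbol name.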
Updating the C extension loading API to take advantage of PEP-451 is on the |
Thanks for taking this issue into account for PEP-451. To be honest, I can't imagine why and/or how this issue (or my patches) would cause any problems, especially compatibility issues. If someone can point them out, I will try to resolve them. Note that I only extend the definition of "PyInit_xxxx". I don't touch the code for loading modules. |
As Victor noted, inventing our own encoding scheme just for this use case The other aspect is that changes to the extension module initialisation API |
Oh, the topic was already discussed some years ago. Start: |
Thank you Nick, I understand that the behavior for this issue should be written up in a PEP. By the way, can I continue the discussion here, or is there a more suitable place for the PEP? |
import-sig@python.org would be the appropriate list for this one. However, we can't do anything about it until Python 3.5 next year at the earliest, and I'm already planning to write a follow-up to http://www.python.org/dev/peps/pep-0451/ that adapts the extension module import mechanism to support those APIs (addressing a number of longstanding feature requests from the Cython developers). That said, this is an independent proposal, so if you were willing to write it up as a separate PEP, that would probably be a good idea. Our two choices to consider would be:
Option 2 is what I think we *should* do, but there will be some research involved in figuring out how good the current support for UCN C identifiers is in at least gcc, clang and Visual Studio 2013, as well as what the dynamic linker APIs support in terms of passing identifiers containing Unicode escapes to be looked up in the exported symbols. |
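To make the UCN idea concrete, here is a rough sketch of how a module name could be spelled as a C identifier using universal character names (`\uXXXX` / `\UXXXXXXXX`). The helper name `ucn_identifier` is hypothetical, and the sketch does not check which code points the C standard actually permits in identifiers. It shows only the source spelling; how each compiler stores such a name in the symbol table is exactly the open research question raised above.

```python
def ucn_identifier(name, prefix="PyInit_"):
    """Hypothetical helper: spell `name` as a C identifier using UCN escapes."""
    parts = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)                 # ASCII stays as-is
        elif cp <= 0xFFFF:
            parts.append("\\u%04X" % cp)     # 4-digit form
        else:
            parts.append("\\U%08X" % cp)     # 8-digit form beyond U+FFFF
    return prefix + "".join(parts)


print(ucn_identifier("späm"))   # PyInit_sp\u00E4m
```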
Python 3.4 uses Visual Studio 2010. I'm not sure that you can build an |
Oh, you're right - I temporarily forgot that the C runtime compatibility was compiler version specific on Windows. So such an approach *would* require updating the CPython compiler on Windows to at least VS2013 for 3.5. Still, we're likely to want to do that anyway - VS2010 will be as old in 2015 as VS2008 is now, and the latter is already causing hassles for building 2.7 extension modules. |
Neither Visual Studio 2012 nor 2013 can be installed on Windows Vista. Is that OK for you, even though Vista stays alive until April 2017? |
Thank you Victor for msg210125. I read the discussion on the mailing list from May 2011. Within those articles, I found the previous discussion on the tracker: Here is my memo, which might be helpful for reviewing the discussions.
-- About Windows CE --
-- About Windows Desktop and Servers --
I checked the last fact with my Japanese editions of Windows:
GetProcAddress (Windows CE)
GetProcAddress (Windows Desktop/Server)
PythonCE (seems to have stopped at Python 2.5 compatibility)
Symbols seem to be encoded as UTF-8 inside Windows executables
-- About C/C++ Standards --
-- About C/C++ tool kits --
|
Thank you Nick for msg210209. I would like to try writing a PEP, but the work looks somewhat difficult. It may take some time. BTW, the C/C++ standards leave the encoding of source code platform dependent. They don't define a "standard encoding of source code"... This means that to resolve this issue we have to choose: either give up readability, or allow a platform-dependent feature, namely using UTF-8 to write the C code. |
PEP-489 (Redesigning extension module loading) includes the proposal to fix this by using punycode. |
PEP-489 was accepted and implemented, so Python 3.5+ supports non-ASCII extension module names as described in https://www.python.org/dev/peps/pep-0489/#export-hook-name |
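The export hook naming rule described in that PEP section can be sketched in Python roughly as follows: ASCII names keep the `PyInit_` prefix, while non-ASCII names get a `PyInitU_` prefix followed by the punycode-encoded name with hyphens replaced by underscores. The sample names below are my own illustrations.

```python
def export_hook_name(name):
    """Compute the init symbol for an extension module name (per the PEP 489 scheme)."""
    try:
        suffix = b"_" + name.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII: punycode, with '-' mapped to '_' so the result is a valid C identifier.
        suffix = b"U_" + name.encode("punycode").replace(b"-", b"_")
    return b"PyInit" + suffix


print(export_hook_name("spam"))     # b'PyInit_spam'
print(export_hook_name("bücher"))   # b'PyInitU_bcher_kva'
```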