
Python native extension module support #81

Open
jammm opened this issue Feb 28, 2021 · 13 comments
Labels:
  contributions welcome: We'll commit to review and maintenance if the people who need it write the changes.
  jovian scope: This is great but would be a huge undertaking and we have limited resources

Comments

@jammm

jammm commented Feb 28, 2021

First of all, huge kudos for the amazing work on cosmopolitan.

While I'm still studying how it works, I hope you can pardon me for asking a noob question here. I was wondering if it's possible to compile cross-platform shared libraries (e.g., .so + .dll for Linux/Windows), so that a single shared library file could be loaded dynamically by other programs, e.g., via LoadLibrary on Windows or dlopen on Linux?

@jart added the "jovian scope" and "contributions welcome" labels on Mar 1, 2021
@jart
Owner

jart commented Mar 1, 2021

I would be willing to merge a change that makes it possible to build Python extensions using Cosmopolitan.

  1. The assembly code is mostly PIC-compatible. There are a few places where, out of laziness, I didn't use the ezlea macro; those would need to be updated.
  2. The APE bootloader would need a linker-overridable symbol that can change ET_EXEC in https://github.com/jart/cosmopolitan/blob/master/ape/ape.S
  3. A new entrypoint would need to be defined that can set the values of symbols such as hook_malloc appropriately.
  4. A small fragment of Python loading code would need to be generated to run during the setup.py phase, overwriting the first 64 bytes of the shared object binary so that it has an MZ header on Windows.
  5. A program would need to be written to replace the objcopy -S -O binary foo.com.dbg foo.com step with something that reads the external symbol definitions from the .com.dbg binary and then somehow inserts them into the .so.

If all that doesn't discourage you so far, then contributions are welcome.
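Step 4 can be pictured with a minimal C sketch. jart describes it as generated Python code; C is used here only to match the rest of this thread, `overwrite_header` and `has_mz_magic` are hypothetical names, and the replacement 64-byte header is assumed to have been produced elsewhere. For reference, 64 bytes is exactly the size of the classic DOS header, which starts with the `MZ` magic and ends with the 4-byte offset (at 0x3C) of the PE header.

```c
#include <assert.h>
#include <stdio.h>

#define HEADER_SIZE 64 /* bytes replaced in step 4; also the DOS header size */

/* Overwrite the first HEADER_SIZE bytes of the existing file at `path`
 * with `hdr`, leaving the rest of the file untouched.
 * Returns 0 on success, -1 on failure. */
int overwrite_header(const char *path, const unsigned char hdr[HEADER_SIZE]) {
  assert(path != NULL && hdr != NULL);
  FILE *f = fopen(path, "r+b"); /* update in place; does not truncate */
  if (!f) return -1;
  int rc = fwrite(hdr, 1, HEADER_SIZE, f) == HEADER_SIZE ? 0 : -1;
  if (fclose(f) != 0) rc = -1;
  return rc;
}

/* Return 1 if the file at `path` starts with the DOS "MZ" magic, else 0. */
int has_mz_magic(const char *path) {
  unsigned char magic[2];
  FILE *f = fopen(path, "rb");
  if (!f) return 0;
  size_t n = fread(magic, 1, sizeof magic, f);
  fclose(f);
  return n == 2 && magic[0] == 'M' && magic[1] == 'Z';
}
```

Opening with `"r+b"` rather than `"wb"` matters here: `"wb"` would truncate the shared object instead of patching its first bytes.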

@jart changed the title from "Universal shared libraries?" to "Python native extension module support" on Mar 1, 2021
@pkulchenko
Collaborator

@jart, why limit this to Python extensions? I'd like to be able to build Lua extensions as well and load them into redbean or the Lua executable. I'm fine with recompiling Lua modules using cosmopolitan if needed. Also, Lua support should be fairly simple, as it only relies on being able to load the library and find the luaopen_libname function in it.

Now that I think more about it, would this allow the same dynamic library to be loaded on all supported platforms? That would be very interesting...
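The Lua side is indeed small. A minimal C sketch, assuming an ordinary POSIX `dlopen`/`dlsym` (not an APE-aware loader, which is the open problem in this issue): it derives the entry point name the way the Lua reference manual describes for C loaders, `luaopen_` plus the module name with each dot replaced by an underscore, and looks that symbol up. The manual's hyphen rule is omitted, and `lua_entry_name` and `find_luaopen` are hypothetical names.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "luaopen_" + `modname` into `out`, replacing each '.' with '_'
 * (e.g. "socket.core" -> "luaopen_socket_core"). Returns 0 on success,
 * -1 if `out` is too small. */
int lua_entry_name(const char *modname, char *out, size_t outsz) {
  assert(modname != NULL && out != NULL);
  int n = snprintf(out, outsz, "luaopen_%s", modname);
  if (n < 0 || (size_t)n >= outsz) return -1;
  for (char *p = out; *p; ++p)
    if (*p == '.') *p = '_';
  return 0;
}

/* Open the library at `path` and return the address of its luaopen_*
 * entry point, or NULL if either cannot be found. On success the handle
 * stays open on purpose, since the returned function lives inside it. */
void *find_luaopen(const char *path, const char *modname) {
  char sym[256];
  if (lua_entry_name(modname, sym, sizeof sym) != 0) return NULL;
  void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (!handle) return NULL;
  void *fn = dlsym(handle, sym);
  if (!fn) dlclose(handle);
  return fn;
}
```

The caller would cast the result to `lua_CFunction` and call it with a `lua_State`. On older glibc versions, linking may additionally need `-ldl`.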

@jart
Owner

jart commented Jul 5, 2021

Because Python extensions have a .py file that gets run before the native code is loaded, we can put the monkey-patching code in there, which turns the APE blob into an SO/DLL/DYLIB. Without a script, it's not possible to polyglot Mach-O/PE/ELF.
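To make the polyglot problem concrete: the ELF, PE, and Mach-O loaders each expect their own magic number at offset 0, so a single unmodified file can satisfy only one of them. A minimal C sketch (`sniff_format` is a hypothetical name) that classifies a buffer by those magic numbers: `0x7f` followed by `ELF` for ELF, `MZ` for the DOS header that precedes a PE image, and the bytes `cf fa ed fe` for 64-bit Mach-O (MH_MAGIC_64, 0xfeedfacf, stored little-endian).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum BinFormat { FMT_UNKNOWN, FMT_ELF, FMT_PE, FMT_MACHO64 };

/* Classify a binary by the magic number at offset 0. Each OS loader insists
 * on its own magic there, which is why one unmodified file cannot be all
 * three formats at once. */
enum BinFormat sniff_format(const unsigned char *buf, size_t len) {
  static const unsigned char kElf[4] = {0x7f, 'E', 'L', 'F'};
  static const unsigned char kMacho64[4] = {0xcf, 0xfa, 0xed, 0xfe}; /* 0xfeedfacf, LE */
  assert(buf != NULL || len == 0);
  if (len >= 4 && memcmp(buf, kElf, 4) == 0) return FMT_ELF;
  if (len >= 4 && memcmp(buf, kMacho64, 4) == 0) return FMT_MACHO64;
  if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z') return FMT_PE; /* DOS stub of a PE */
  return FMT_UNKNOWN;
}
```

An APE file, whose first bytes are `MZqFpD`, classifies as PE here, which is why the patching script has to rewrite the start of the blob before an ELF or Mach-O loader will accept it.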

@pkulchenko
Collaborator

We can do the same thing with Lua modules: load a .lua file first, which tweaks the blob, and then load the blob.

@pkulchenko
Collaborator

@jart, is there a cosmopolitan-provided method to detect the system it's running on? Since the Lua code may need to patch the blob differently depending on the platform, what's the best way to detect it? Lua provides its own mechanism for this, but it's a compile-time one; since the code is compiled on one platform but runs on several, I don't think it's going to help here.

Repository owner deleted a comment from juanmaneo Jul 5, 2021
@jart
Owner

jart commented Jul 5, 2021

We should fix the Lua-provided one. The C functions are if (IsLinux()) { ... }, if (IsWindows()) { ... }, etc.

@pkulchenko
Collaborator

@jart, so something like the following in luaconf.h:

#if defined(_WIN32)
#define LUA_DIRSEP	"\\"
#else
#define LUA_DIRSEP	"/"
#endif

would need to be replaced with #define LUA_DIRSEP ((IsWindows()) ? "\\" : "/")? If so, I can submit a PR.

@jart
Owner

jart commented Jul 5, 2021

Just use forward slashes. The system calls automatically change / to \ when they convert paths from UTF-8 to UTF-16.
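A minimal C sketch of the idea jart describes, not Cosmopolitan's actual implementation: while decoding a UTF-8 path into the UTF-16 that Windows APIs expect, every `/` is emitted as `\`. `utf8_path_to_utf16` is a hypothetical name; for brevity it rejects malformed sequences instead of substituting U+FFFD, and it does not reject overlong encodings or encoded surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the UTF-8 path `s` into UTF-16 `out` (capacity `cap` units,
 * including the terminating 0), emitting '/' as '\\'. Returns the number
 * of units written, excluding the terminator, or -1 on malformed input or
 * insufficient capacity. */
long utf8_path_to_utf16(const char *s, uint16_t *out, size_t cap) {
  assert(s != NULL && out != NULL);
  if (cap == 0) return -1;
  const unsigned char *p = (const unsigned char *)s;
  size_t n = 0;
  while (*p) {
    uint32_t cp;
    int extra; /* continuation bytes that follow the lead byte */
    if (*p < 0x80) { cp = *p; extra = 0; }
    else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
    else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
    else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
    else return -1; /* stray continuation byte or invalid lead byte */
    ++p;
    for (int i = 0; i < extra; ++i, ++p) {
      if ((*p & 0xC0) != 0x80) return -1; /* truncated sequence */
      cp = (cp << 6) | (*p & 0x3F);
    }
    if (cp == '/') cp = '\\'; /* the separator rewrite */
    if (cp >= 0x10000) { /* needs a surrogate pair */
      if (cp > 0x10FFFF || n + 2 >= cap) return -1;
      cp -= 0x10000;
      out[n++] = (uint16_t)(0xD800 | (cp >> 10));
      out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    } else {
      if (n + 1 >= cap) return -1;
      out[n++] = (uint16_t)cp;
    }
  }
  out[n] = 0;
  return (long)n;
}
```

Because `/` is ASCII and never appears inside a multi-byte UTF-8 sequence (continuation bytes are always 0x80 or above), the rewrite can safely happen per decoded code point.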

@pkulchenko
Collaborator

It's not (just) for the file I/O conversion; Lua stores it in a config variable available from the interpreter (package.config), which is often used for detecting the host system (Windows or not).
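For reference, Lua builds `package.config` in `loadlib.c` from compile-time macros: the directory separator, path separator, substitution mark, executable-directory mark, and ignore mark, one per line, giving `"/\n;\n?\n!\n-\n"` on POSIX builds. A minimal C sketch of choosing the first line at runtime instead; `build_package_config` is a hypothetical name, and the `is_windows` parameter stands in for a runtime check such as Cosmopolitan's `IsWindows()` so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build Lua's five-line package.config string with the directory separator
 * chosen at runtime. `is_windows` stands in for a runtime host check.
 * Returns 0 on success, -1 if `out` is too small. */
int build_package_config(int is_windows, char *out, size_t outsz) {
  assert(out != NULL);
  const char *dirsep = is_windows ? "\\" : "/";
  /* Lines: LUA_DIRSEP, LUA_PATH_SEP, LUA_PATH_MARK, LUA_EXEC_DIR, LUA_IGMARK */
  int n = snprintf(out, outsz, "%s\n;\n?\n!\n-\n", dirsep);
  return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

In a patched Lua, a string built this way would be pushed in place of the compile-time literal, so scripts that inspect `package.config` would see the separator of the host they actually run on.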

@jart
Owner

jart commented Jul 5, 2021

Does Lua change its behavior based on that? I want the portability guff to be abstracted by the POSIX interfaces. How about we just add an API to redbean? What do you think of:

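#include "libc/dce.h"            /* IsLinux(), IsWindows(), etc. */
#include "third_party/lua/lua.h" /* lua_State, lua_pushstring() */

/* Pushes the name of the host OS, detected at runtime, onto the Lua stack. */
static int LuaGetHostOs(lua_State *L) {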
  const char *s;
  if (IsLinux()) {
    s = "linux";
  } else if (IsMetal()) {
    s = "metal";
  } else if (IsWindows()) {
    s = "windows";
  } else if (IsXnu()) {
    s = "xnu";
  } else if (IsOpenbsd()) {
    s = "openbsd";
  } else if (IsFreebsd()) {
    s = "freebsd";
  } else if (IsNetbsd()) {
    s = "netbsd";
  } else {
    s = "wut";
  }
  lua_pushstring(L, s);
  return 1;
}

@pkulchenko
Collaborator

> Does Lua change its behavior based on that?

It does not, as far as I know.

> I want the portability guff to be abstracted by the POSIX interfaces. How about we just add an API to redbean. What do you think of:

Yes, this should work. Would it be better to have GetHostOs with that logic and then wrap it into LuaGetHostOs in case other languages want to use this as well? Or maybe we can cross that bridge when we get there ;).
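A minimal C sketch of that split: a language-neutral core plus a thin Lua wrapper. To keep it compilable outside Cosmopolitan, the core takes a table of already-evaluated checks; `HostOsProbe`, `GetHostOsFrom`, and `GetHostOs` are hypothetical names, and the Cosmopolitan-specific wiring appears only in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An already-evaluated OS check and the name to report when it is true. */
struct HostOsProbe {
  int matches;
  const char *name;
};

/* Language-neutral core: return the name of the first probe that matches,
 * or "wut" (the fallback in jart's proposal) if none does. */
const char *GetHostOsFrom(const struct HostOsProbe *probes, size_t n) {
  assert(probes != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (probes[i].matches) return probes[i].name;
  return "wut";
}

/* With Cosmopolitan's IsLinux() and friends, the core could be fed like
 * this, and the Lua binding would just push its result:
 *
 *   const char *GetHostOs(void) {
 *     const struct HostOsProbe probes[] = {
 *       {IsLinux(), "linux"},     {IsMetal(), "metal"},
 *       {IsWindows(), "windows"}, {IsXnu(), "xnu"},
 *       {IsOpenbsd(), "openbsd"}, {IsFreebsd(), "freebsd"},
 *       {IsNetbsd(), "netbsd"},
 *     };
 *     return GetHostOsFrom(probes, sizeof probes / sizeof probes[0]);
 *   }
 *
 *   static int LuaGetHostOs(lua_State *L) {
 *     lua_pushstring(L, GetHostOs());
 *     return 1;
 *   }
 */
```

One design difference from jart's if/else chain: the table evaluates every check up front instead of short-circuiting, which is harmless as long as the checks are cheap.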

@pkulchenko
Collaborator

Also discussed in #137. As far as I understand, the approaches discussed earlier in this ticket and in #137 are slightly different: here @jart is proposing to tweak the binary (with Python or Lua code that runs before the library is loaded) so that the system loader can use it, whereas in #137 I'm proposing to implement an actual loader that would load APE-based dynamic libraries directly (possibly from memory as well as from a file). This would provide cross-platform support for dynamic libraries, but they would obviously need to be recompiled using the cosmopolitan libraries.
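A loader like the one proposed in #137 would start by validating the image it was handed, whether read from a file or already in memory. A minimal C sketch of that first step, assuming a plain 64-bit little-endian ELF image (APE's own multi-format header is not handled): check the magic, class, and data encoding, then read `e_type` at offset 16, which is `ET_EXEC` (2) for an executable linked at a fixed address and `ET_DYN` (3) for a shared object. This is the same field step 2 of jart's list refers to. `elf64_type` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

enum { kEhdr64Size = 64 };         /* sizeof(Elf64_Ehdr) */
enum { kEtExec = 2, kEtDyn = 3 };  /* e_type: fixed-address executable, shared object */

/* Return the e_type field of a 64-bit little-endian ELF image held in
 * memory, or -1 if `buf` does not start with such a header. */
int elf64_type(const unsigned char *buf, size_t len) {
  assert(buf != NULL || len == 0);
  if (len < kEhdr64Size) return -1;
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
    return -1;                               /* EI_MAG0..EI_MAG3 */
  if (buf[4] != 2 || buf[5] != 1) return -1; /* ELFCLASS64, ELFDATA2LSB */
  return buf[16] | (buf[17] << 8);           /* e_type follows 16-byte e_ident */
}
```

A real loader would go on to walk the program headers and map each PT_LOAD segment, which is well beyond this sketch.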

@ahgamut
Collaborator

ahgamut commented Aug 25, 2021

I have been trying to find a nice way to get numpy into the Python APE for a while now. Since @jart mentioned numpy in #141, here I outline my partial solution for Python extensions, in case someone wants to try.

The current build process in Cosmopolitan results in python.com and libpython.a. Here's how custom extensions can be added statically (setuptools or a smaller tool could be used to turn these steps into a Makefile):

  1. download and compile the required extension from source, with Cosmopolitan's build constraints
  2. add the compiled parts of the extension to libpython.a via ar
  3. add the Python parts of the extension to the APE ZIP store
  4. write minimal necessary glue code to ensure the relative imports work correctly
    (stuff like from .extension import compiled_func)
  5. re-link python.com with the new libpython.a and use the extension.

This works for simple CPython extensions: I added the greenlet and markupsafe modules to the Python APE following these steps. I don't see any reason why it shouldn't work for something like numpy.

The only issue is step 4. Since Python allows relative imports inside a package, splitting the C and Python parts of a library means I have to ensure that such imports are handled correctly. This usually means creating .pyi-like stub files and redirecting the imports to a correctly named C extension. It's easy to do by hand for something like greenlet; numpy would just require more files to be changed.

However, I think the entire process can be automated by customizing the default build process in Python's setuptools. I have been trying to write the setuptools customization on weekends for a while now, but I am nowhere close to figuring it out.

It would be pretty cool if this was solved: we could create a Python APE, do python.com -m pip install -r requirements.txt, wait while the new extensions got compiled, and then use the larger APE with all the necessary packages available.
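The relative-import redirection in step 4 starts from how Python resolves a relative import: `from .extension import compiled_func` inside package `greenlet` names the absolute module `greenlet.extension`. A minimal C sketch of that resolution rule, following the documented behavior of `importlib.util.resolve_name`: one leading dot means the current package, and each additional dot strips one trailing component. `resolve_relative` is a hypothetical name; glue code or a build tool could use a rule like this to work out the absolute module names that the redirected imports refer to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resolve a possibly relative module name (".extension", "..util") against
 * the importing package (e.g. "greenlet"), writing the absolute name to
 * `out`. Absolute names are copied unchanged. Returns 0 on success, or -1
 * if the import reaches beyond the top-level package or `out` is too small. */
int resolve_relative(const char *name, const char *package,
                     char *out, size_t outsz) {
  assert(name != NULL && package != NULL && out != NULL);
  size_t level = strspn(name, "."); /* number of leading dots */
  const char *rest = name + level;  /* e.g. "extension" */
  int n;
  if (level == 0) {
    n = snprintf(out, outsz, "%s", name);
  } else {
    if (package[0] == '\0') return -1; /* relative import outside a package */
    size_t base_len = strlen(package);
    /* Every dot after the first drops one trailing component of `package`. */
    for (size_t i = 1; i < level; ++i) {
      while (base_len > 0 && package[base_len - 1] != '.') --base_len;
      if (base_len == 0) return -1; /* beyond the top-level package */
      --base_len;                   /* drop the '.' separator itself */
    }
    if (*rest)
      n = snprintf(out, outsz, "%.*s.%s", (int)base_len, package, rest);
    else
      n = snprintf(out, outsz, "%.*s", (int)base_len, package);
  }
  return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

For example, `..` inside package `a.b.c` resolves to `a.b`, while `..x` inside the top-level package `a` is an error, matching what Python reports as an attempted relative import beyond the top-level package.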

Projects
None yet
Development

No branches or pull requests

4 participants