Python and C3 #91

Open
darleybarreto opened this issue Jun 15, 2020 · 9 comments

Comments

@darleybarreto
Contributor

First, I would like to thank everyone who has contributed to this work; it is really impressive. I wonder if there's a way of linking a Python function into a C3 program. I was trying to do this at the IR level, but it doesn't seem to work. For example:

def concat(s1: str, s2: str) -> str:
    return s1 + s2

import io;

public function void main(){
    var string res = concat("I","am");
    io.println(res);
}

I want to play around with compiling a subset of Python code so that it can interact with the system, which C3 makes possible.

@windelbouwman
Owner

Thanks for your interest in this project!

There are several options here. First off, you could compile Python to IR code and C3 to IR code, and then link the two IR modules using ir_link. Next, you could take the linked IR code and translate it to machine code.

Another option would be to compile the C3 code to machine code and dynamically link it into the running Python process. This allows you to specify Python callbacks as well.

One thing to note here: using str will not work. This is not implemented yet; only float and int work when calling Python functions. str will require some extra attention to convert the Python str to a pointer to chars, or to something similar such as a Pascal-style string.
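To illustrate why int and float are the easy cases for callbacks, here is a minimal sketch that uses the standard-library ctypes module instead of PPCI's own loader (that substitution is an assumption made purely for illustration). It wraps Python functions in C-callable function pointers; the names add_ints and scale are hypothetical.

```python
import ctypes

# C signature: int (*)(int, int)
IntBinOp = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
# C signature: double (*)(double, double)
FloatBinOp = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)


def add_ints(a, b):
    return a + b


def scale(x, factor):
    return x * factor


# Keep the wrapper objects alive for as long as native code may call them;
# if they are garbage collected, the function pointer becomes invalid.
add_ints_c = IntBinOp(add_ints)
scale_c = FloatBinOp(scale)

# The raw address that native code would receive as a function pointer.
add_ints_addr = ctypes.cast(add_ints_c, ctypes.c_void_p).value

# Calling through the wrapper goes through the C calling convention.
int_result = add_ints_c(2, 3)
float_result = scale_c(1.5, 4.0)
```

Scalars map directly onto machine registers, so no memory layout has to be agreed upon. A str argument has no such fixed-size representation, which is why it needs the extra conversion step mentioned above.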

I hope this helps a bit?

@darleybarreto
Contributor Author

I see, I didn't know about the ir_link function. How difficult (in terms of changes and new code) would it be to work with ctypes (or cffi) to use strings from and to python?

@windelbouwman
Owner

There are several issues with str:

  • Python to IR compilation: this is not implemented for the str type, and it might be the tricky part. The first thing to decide is how to represent strings in memory. Probably either the C way (a char* to a 0-terminated buffer) or the C3 way (a pointer to a struct with a length integer and a char buffer).
  • For linking Python functions with native code, a ctypes scheme must be implemented. This might not be that hard to do, but it also requires handling the specific string storage scheme. The most straightforward way would be to implement str as a char *. For this, it would be wise to first try to link and load C code instead of C3, since C is more widely used and answers can be found online.
  • Then there is Unicode: Python str is Unicode, whereas C3 and C use ASCII only. Maybe the solution here is to go for ASCII for now?

A good place to start might be this part of the code: https://github.com/windelbouwman/ppci/blob/master/ppci/utils/codepage.py#L108
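The two candidate memory layouts can be sketched in plain Python with the struct module. This is only an illustration: the 32-bit little-endian length field is an assumption, since the thread does not state the exact C3 struct layout, and all function names are hypothetical.

```python
import struct


def to_c_string(text: str) -> bytes:
    """C layout: ASCII bytes followed by a terminating NUL byte."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII text
    if b"\x00" in data:
        raise ValueError("an embedded NUL cannot be represented in a C string")
    return data + b"\x00"


def from_c_string(buf: bytes) -> str:
    """Read bytes up to (not including) the first NUL."""
    end = buf.index(b"\x00")
    return buf[:end].decode("ascii")


def to_length_prefixed(text: str) -> bytes:
    """Length-prefixed layout: a length integer followed by the characters.

    The 32-bit little-endian length is an assumed layout for this sketch.
    """
    data = text.encode("ascii")
    return struct.pack("<I", len(data)) + data


def from_length_prefixed(buf: bytes) -> str:
    """Read the length field, then exactly that many bytes."""
    (length,) = struct.unpack_from("<I", buf, 0)
    return buf[4:4 + length].decode("ascii")
```

The NUL-terminated form is what a ctypes c_char_p expects, which is one reason starting with char * is attractive. The length-prefixed form can hold NUL bytes and gives the length without scanning the buffer.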

@pfalcon
Contributor

pfalcon commented Jun 15, 2020

I'd suggest skipping C3 altogether and using C instead. As @windelbouwman suggests, sticking to int for starters may be a good idea (I'm glad to hear that float works too, but I wouldn't run to test it right away ;-) ).

I'm interested in doing more hacking on the Python subset compiler myself, and what I would do is implement a print_int(x: int) function, since without output, hacking on this stuff is indeed not rewarding. My next idea was to make the for loop compliant with Python semantics (assignment to the loop control variable doesn't affect looping).
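The loop semantics in question can be shown with a short sketch. The second function is one plausible way a C-style lowering could get it wrong; it is a hypothetical illustration, not a description of what PPCI actually generates.

```python
def loop_with_reassignment(n):
    """Python semantics: rebinding the loop variable does not affect iteration."""
    seen = []
    for i in range(n):
        seen.append(i)
        i = 100  # only rebinds the name; the iterator keeps its own state
    return seen, i


def naive_counter_lowering(n):
    """Hypothetical lowering where the loop variable is also the counter."""
    seen = []
    i = 0
    while i < n:
        seen.append(i)
        i = 100  # this now clobbers the counter
        i += 1
    return seen
```

With n = 3 the first function visits 0, 1 and 2, while the naive lowering stops after the first iteration. A compliant lowering keeps a hidden counter and copies it into the user-visible variable at the start of each iteration.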

Sadly, I'm too full of ideas and too short on time. So, @darleybarreto, please see if the ideas above resonate with you, and if so, feel free to beat me to it.

use strings from and to python?

Manipulating strings generally means GC. Surely, trivial operations on static strings (like passing them to print()) can be implemented ahead of that. I'd personally still start with printing ints.

@darleybarreto
Contributor Author

I was playing with @pfalcon's picompile for a couple of hours and I managed to get some basic str compilation down to LLVM using llvmlite's ir.Constant(ir.ArrayType). I imagine that the fast way would be making something working with ctypes/cffi (no GC here) for simple operations like passing to functions (e.g. to print), casting, and concatenating.

@pfalcon
Contributor

pfalcon commented Jun 16, 2020

Hey, cool! Except it's not mine, but a humble fork of https://github.com/sdiehl/numpile, to which I haven't yet even applied any interesting changes. The first would be getting rid of the intermediate AST and instead doing type inference and other processing on the real Python AST, because this intermediate AST makes toy processing, like that done by numpile, easier, but only complicates further extension.

But if you think about type inference, you immediately think about the hilarious case of Shedskin, which can't grok the following:

a = 1
a = "str"

Obviously, there's nothing wrong with the above, and the type of a isn't int | str either. It's just that the first a has type int, while the second a has type str. That leads us to pfalcon/python-ast-hacking-challenges#2. And well, as soon as one touches SSA, there are enough rabbit holes to follow; for example, in my mind, I'm doing register allocation on SSA (instead of converting Python source to it) :-D.
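A minimal sketch of the renaming idea, using the standard ast module: every assignment to a name gets its own version, and each version carries exactly one type. It only handles straight-line `name = constant` statements, and the function name ssa_rename is hypothetical; real SSA construction also needs phi nodes at control-flow joins.

```python
import ast


def ssa_rename(source: str):
    """Return (versioned_name, type_name) for each `name = constant` statement."""
    versions = {}
    result = []
    for stmt in ast.parse(source).body:
        supported = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Constant)
        )
        if not supported:
            raise NotImplementedError("sketch only supports `name = constant`")
        name = stmt.targets[0].id
        # Each assignment creates a fresh version of the variable.
        versions[name] = versions.get(name, 0) + 1
        type_name = type(stmt.value.value).__name__
        result.append((f"{name}_{versions[name]}", type_name))
    return result


ssa = ssa_rename('a = 1\na = "str"\n')
```

After renaming, the two assignments refer to distinct variables a_1 (int) and a_2 (str), so no union type is needed.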

for a couple of hours and I managed to get some basic str compilation down to LLVM using llvmlite's ir.Constant(ir.ArrayType)

Well, cool, you've got some experience with LLVM API. But you can't get around need for garbage collection when dealing with strings, it's not a value type. So, as soon as you get to:

a = "foo"
b = "bar"
c = a + b

you'll need to deal with it. But otherwise yes, a nice easy start.

I imagine that the fast way would be making something working with ctypes/cffi (no GC here) for simple operations like passing to functions (e.g. to print), casting, and concatenating.

"Fast way" in which sense? It won't lead to fast or unbloated code, and would be a chore to code up, given that it's largely a throw-away (YMMV) exercise (at least for this use case). Unless it's already coded up, and it seems that PPCI already supports it, even without ctypes/cffi being exposed (I guess ppci/utils/codepage.py#L108 quoted by @windelbouwman is the underlying impl): https://ppci.readthedocs.io/en/latest/howto/jitting.html#calling-python-functions-from-native-code ;-)

@darleybarreto
Contributor Author

I'm doing register allocation on SSA (instead of converting Python source to it) :-D.

Interesting, although I have to say I don't know much about SSA.

But you can't get around need for garbage collection when dealing with strings, it's not a value type.

I was thinking of loading and linking the shared libraries that provide malloc and realloc, and using them for loading strings and for concatenation.
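A rough sketch of that idea, using ctypes to call the C library's malloc and free directly. It assumes a POSIX system where the C library can be located; the helper name concat is hypothetical, and the caller has to free the buffer explicitly, which is exactly the bookkeeping a GC would otherwise do.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def concat(a: bytes, b: bytes) -> bytes:
    """Concatenate two byte strings into a freshly malloc'ed C buffer."""
    total = len(a) + len(b)
    buf = libc.malloc(total + 1)  # +1 for the terminating NUL
    if not buf:
        raise MemoryError("malloc failed")
    try:
        ctypes.memmove(buf, a, len(a))
        ctypes.memmove(buf + len(a), b, len(b))
        ctypes.memset(buf + total, 0, 1)
        # Copy the result back into a Python bytes object.
        return ctypes.string_at(buf, total)
    finally:
        # Without a GC, every allocation needs a matching free.
        libc.free(buf)


result = concat(b"foo", b"bar")
```

The result size is only known at run time, so the buffer cannot be a compile-time constant array. Compiled code that returned the raw pointer instead of copying would need a rule for who calls free and when.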

"Fast way" in which sense?

In the sense of getting something working.

@pfalcon
Contributor

pfalcon commented Jun 17, 2020

In the sense of getting something working.

Makes sense, please keep us posted of your progress!

@darleybarreto
Contributor Author

So, instead of dealing with ctypes, what if we use an independent Rust shared library built with #![no_std]? I made a simple working example where one can pass a valid UTF-8 string to Rust and receive it back:

from cffi import FFI

ffi = FFI()
# Declare the C signatures exported by the Rust library.
ffi.cdef("""
    char * load_str(char *);
    void free_str(char *);
""")
p_str = "I will go down to the rabbit hole!".encode("utf-8")
size = len(p_str)  # length in bytes, not in characters
lib = ffi.dlopen("path/to/rustlib.so")
pointer = lib.load_str(p_str)
p_str_2 = ffi.buffer(pointer, size)[:].decode("utf-8")  # a perfect string
lib.free_str(pointer)  # free the buffer allocated on the Rust side

In this example, load_str is implemented as:

use core::{ptr,str};
use ustr::Ustr;
use cstr_core::{CString, CStr, c_char};

#[no_mangle]
pub extern "C" fn load_str(d_ptr: *mut c_char) -> *mut c_char {
    // Based on https://bheisler.github.io/post/calling-rust-in-python/

    if d_ptr.is_null() {
        return ptr::null_mut();
    }
    let data = unsafe { CStr::from_ptr(d_ptr).to_bytes() };
    
    match str::from_utf8(data){
        Ok(data_str) => {
            let gc_string = Ustr::from(data_str);
            CString::new(gc_string.as_str()).unwrap().into_raw()
        },
        Err(_) => {
            // println! is not available in a no_std crate, so invalid
            // UTF-8 is reported to the caller as a null pointer.
            ptr::null_mut()
        },
    }
}

The library in question would do all the work as a simple wrapper around Rust's String capabilities. We could also use things like ustr, which enables caching and fun stuff such as concurrency safety.
