Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Add unsafe_ptr and unsafe_cstr_ptr to StringRef and migrate the stdlib to use them #3601

Open
1 task done
soraros opened this issue Oct 3, 2024 · 10 comments
Labels
enhancement New feature or request mojo-repo Tag all issues with this label

Comments

@soraros
Copy link
Contributor

soraros commented Oct 3, 2024

Review Mojo's priorities

What is your request?

As title. In the process, maybe also rename field data to _data.

What is your motivation for this change?

They are needed in e.g. Python interop.

Any other details?

N/A

@soraros soraros added enhancement New feature or request mojo-repo Tag all issues with this label labels Oct 3, 2024
@martinvuyk
Copy link
Contributor

AFAIK we are migrating away from StringRef in favor of StringSlice which has a lifetime, so the goal is to actually replace ._strref_dangerous() usage with .as_string_slice() by migrating the methods.

@soraros
Copy link
Contributor Author

soraros commented Oct 3, 2024

@martinvuyk Indeed, we should migrate most things to StringSlice.

However, I suspected StringRef won't go away in the short term for it's an interface type to C++/CPython. For instance, the type passed to get_function isn't exactly right since PyImport_AddModule takes char* (UnsafePointer[Int8]) but str_ref.data has type UnsafePointer[UInt8].

fn PyImport_AddModule(inout self, name: StringRef) -> PyObjectPtr:
var value = self.lib.get_function[
fn (UnsafePointer[UInt8]) -> UnsafePointer[Int8]
]("PyImport_AddModule")(name.data)
return PyObjectPtr {value: value}

@martinvuyk
Copy link
Contributor

Couldn't we just add a .unsafe_cstr_ptr() to StringSlice then ? It is literally just return self.unsafe_ptr().bitcast[Int8]()

I suspected StringRef won't go away in the short term

FWIW I'm quite aggressively trying to do just that in:

  • .strip already does not depend on StringRef though the implementation is still there since the PR where I added _isspace()
  • preparing for also moving String.split() (and deleting the unused just merged StringRef.split()) to StringSlice with PR [stdlib] Make StringSlice CollectionElement #3597

and the latest and most impactful

  • PR [stdlib] Move StringRef find() implementation to Span #3548 moves .find() to Span, replaces uses of StringRef for StringSlice for .count, .startswith, .endswith which are literally all of the functions left for StringRef except the helpers for slicing. The biggest dependence to StringRef is really the .find() function.

@soraros
Copy link
Contributor Author

soraros commented Oct 4, 2024

Couldn't we just add a .unsafe_cstr_ptr() to StringSlice then ?

Yeah, we should probably do that as well.

I suspected StringRef won't go away in the short term

This doesn't even have much to do with the available API. Moving away from StringRef also means we need to figure out the correct lifetime for each of the external_calls, which isn't as mechanical as in user-facing code.

In any case, I think migrating to StringSlice while making QoL improvements to StringRef are not mutually exclusive.

@martinvuyk
Copy link
Contributor

Moving away from StringRef also means we need to figure out the correct lifetime for each of the external_calls, which isn't as mechanical as in user-facing code.

If you mean interfacing with C then my answer is that you'd simply use UnsafePointer directly. Your above example would become: (or, pass a String and have null terminator guarantees since this debug_assert still leaves the door open for memory vulnerabilities.)

fn PyImport_AddModule(inout self, name: StringSlice) -> PyObjectPtr:
    var ptr = name.unsafe_cstr_ptr()
    debug_assert(ptr[name.byte_length()] == 0, "slice must be null terminated")
    var value = self.lib.get_function("PyImport_AddModule")(ptr) 
    return PyObjectPtr {value: value} 

and we still benefit from borrowed argument lifetime guarantees (what you taught me :D)

@soraros
Copy link
Contributor Author

soraros commented Oct 4, 2024

It all technically can™ be done. It's just the said function is an internal function that I don't care to fix all the call-site right now. One problem though is that fn f(s: StringSlice = "lit") doesn't work.

If you are interested, maybe you can look into having the methods on Python to take Strings instead of StringRef (I don't know if it's the right design though).

@martinvuyk
Copy link
Contributor

martinvuyk commented Oct 4, 2024

One problem though is that fn f(s: StringSlice = "lit") doesn't work.

StringSlice has a constructor for StringLiteral, I have a test here where I'm doing that. Or is it an issue specifically about it not working for function signatures?

If you are interested, maybe you can look into having the methods on Python to take Strings instead of StringRef (I don't know if it's the right design though).

I'm totally up for changing it to StringSlice/String, but I'm already overloading the team with my 8+ open 100+ line PRs 🤣

@martinvuyk
Copy link
Contributor

I've been working on some ffi stuff in PR #3632 and I've introduced a helper function that I think makes more sense than adding public APIs to types that are used only for low level things (I'd actually like to deprecate any use of .unsafe_cstr_ptr() in a follow up PR) :

trait _UnsafePtrU8:
    fn unsafe_ptr(self) -> UnsafePointer[UInt8]:
        ...


@always_inline
fn char_ptr[T: _UnsafePtrU8](item: T) -> UnsafePointer[c_char]:
    """Get the C.char pointer.

    Parameters:
        T: The type.

    Args:
        item: The item.

    Returns:
        The pointer.
    """
    return item.unsafe_ptr().bitcast[c_char]()


@always_inline
fn char_ptr[T: AnyType](ptr: UnsafePointer[T]) -> UnsafePointer[c_char]:
    """Get the C.char pointer.

    Parameters:
        T: The type.

    Args:
        ptr: The pointer.

    Returns:
        The pointer.
    """
    return ptr.bitcast[c_char]()

@soraros
Copy link
Contributor Author

soraros commented Oct 10, 2024

@martinvuyk I like this approach. Could you make the change into its own PR (and maybe rename to c_char_ptr since we can afford to have ugly names for these things).

@martinvuyk
Copy link
Contributor

Could you make the change into its own PR (and maybe rename to c_char_ptr since we can afford to have ugly names for these things).

Yeah I think you're right this will be better to separate to another PR, also on the name since this is in ffi I just copy pasted from my WIP c ffi package which I'd like to bring over to the stlib once we have the top level ffi package

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request mojo-repo Tag all issues with this label
Projects
None yet
Development

No branches or pull requests

2 participants