Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Mojo-Python slice interoperability #3549

Closed
wants to merge 1 commit into from

Conversation

jjvraw
Copy link
Contributor

@jjvraw jjvraw commented Sep 25, 2024

Leveraging Mojo's Slice for more natural interop between PythonObjects, enabling something like below:

a = PythonObject([1, 2, 3, 4])
a_reversed = a[::-1]
  • We leave C Python to handle slice parameters.
  • Negative cases such as b[1.3:10] or b["1":10] are handled by parser which would normally throw a TypeError in Python.

@jjvraw jjvraw requested a review from a team as a code owner September 25, 2024 11:48
Copy link
Contributor

@martinvuyk martinvuyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jjvraw this is awesome, thank you for tackling it!

assert_equal("[]", str(a[5:]))
assert_equal("[5, 4, 3, 2]", str(a[:-5:-1]))

var b = Python.evaluate("[i for i in range(1000)]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no idea we could already do this 🤯

_ = d[1:3]

var custom = Python.evaluate(
"type('CustomSliceable', (), {'__getitem__': lambda self, key: key})()"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done and seen some shady metaclass manipulation in Python but I've never seen this manual use of type 🤣

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shady indeed: SO post

Comment on lines 124 to 177

```mojo
a = PythonObject([1, 2, ,3, 4])
a_reversed = a[::-1]
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (changelog file is very long already):

Suggested change
```mojo
a = PythonObject([1, 2, ,3, 4])
a_reversed = a[::-1]
```
```mojo
print(PythonObject([1, 2, ,3, 4])[::-1]) # [4, 3, 2, 1]

@jjvraw
Copy link
Contributor Author

jjvraw commented Oct 1, 2024

Going to mark as draft, and wait for PyObject_GetItem of #3583 to be merged.

@jjvraw jjvraw marked this pull request as draft October 1, 2024 09:56
@JoeLoser
Copy link
Collaborator

JoeLoser commented Oct 2, 2024

Going to mark as draft, and wait for PyObject_GetItem of #3583 to be merged.

#3583 has been merged, so feel free to rebase whenever it's convenient for you.

@jjvraw
Copy link
Contributor Author

jjvraw commented Oct 2, 2024

Going to mark as draft, and wait for PyObject_GetItem of #3583 to be merged.

#3583 has been merged, so feel free to rebase whenever it's convenient for you.

Made some additional changes. Since the compiler doesn't implicitly handle Mojo-Python conversion for slice, and the syntax [0:1, 3:] is probably not recognised, this gives the user some play with creating slice PythonObjectss to comma seperate indices.

Can always revert these, however.

@jjvraw jjvraw marked this pull request as ready for review October 2, 2024 15:58
Comment on lines +543 to +584
with_2d = Python.evaluate(
"""type('With2D', (), {
'__init__': lambda self: setattr(self, 'data', [[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
'__getitem__': lambda self, key: (
[row[key[1]] for row in self.data[key[0]]] if isinstance(key, tuple) and all(isinstance(k, slice) for k in key)
else (self.data[key[0]][key[1]] if isinstance(key, tuple)
else self.data[key])
)
})()"""
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would integrating something like stdlib/test/python/my_module.py be preferred over this..? @JoeLoser

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think refactoring these sort of things to a separate test file (class with methods like you have here) may improve readability over raw strings. Feel free to defer to a follow-up though :)

modularbot pushed a commit that referenced this pull request Oct 4, 2024
…r handling (#48447)

[External] [stdlib] Integrate `PyObject_SetItem` for speedup and error
handling

Following the trend of:
- #3549
- #3583

Again, noticing ~1.4x speedup on basic benchmarks:
```mojo
for _ in range(iterations):
    var idx_a = random_si64(0, 999)
    var idx_b = random_si64(0, 9999)
    var idx_c = random_si64(0, 99999)

    a[idx_a] = idx_a
    b[idx_b] = idx_b
    c[idx_c] = idx_c
```

and improved error handling, preventing crashes.

Co-authored-by: Joshua James Venter <venter.joshua@gmail.com>
Closes #3595
MODULAR_ORIG_COMMIT_REV_ID: 107c0b478171f86869b7cb94e57f5ea59d83e994
self.Py_DecRef(py_step)

return py_slice

Copy link
Collaborator

@ConnorGray ConnorGray Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion This file (_cpython.mojo) is mostly intended to be low level wrappers around functions in the CPython CAPI. Since PySlice_FromSlice isn't a CPython function, I think the logic from this method might fit better manually inlined in the PythonObject initializer:

struct PythonObject(..):
    # ...

    fn __init__(inout self, slice: Slice):
        ...

Additionally, instead of calling low-level methods like PyLong_FromLong() or Py_None() directly, the code might be improved by using the existing higher level safe wrappers for those functions (provided by PythonObject as well). So the final code might look something like:

    fn __init__(inout self, slice: Slice) -> PyObjectPtr:
        # Convert Mojo Slice to Python slice parameters
        # Note: Deliberately avoid using `span.indices()` here and instead pass
        # the Slice parameters directly to Python. Python's C implementation
        # already handles such conditions, allowing Python to apply its own slice
        # handling.
        var py_start = PythonObject(None)
        var py_stop = PythonObject(None)
        var py_step = PythonObject(slice.step)

        if slice.start:
            py_start = PythonObject(slice.start.value())
        if slice.end:
            py_stop = PythonObject(slice.end.value())

        var py_slice = cpython.PySlice_New(
            py_start.unsafe_as_py_object_ptr(),
            py_stop.unsafe_as_py_object_ptr(),
            py_step.unsafe_as_py_object_ptr()
        )

        return PythonObject(py_slice)

This has the advantage of allowing us to omit the manual ref count management logic as well 🙂

Let me know what you think, or if there's a subtlety I missed about how you're using that means this might not be feasible for your use-case.


For context, very roughly speaking, there are currently three files in the stdlib pertaining to interop, roughly in order from low-level to high-level:

  • _cpython.mojo — low-level, "raw" bindings to the CPython API. In some future world, these might even become auto-generated bindings. They should have minimal logic.

  • python_object.mojo — defines PythonObject, which interfaces between Mojo types and valid Python object values.

  • python.mojo — high-level, safe wrappers around functionality from the CPython API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you for the time and feedback! :)

Let me know what you think, or if there's a subtlety I missed about how you're using that means this might not be feasible for your use-case.

I originally wanted to avoid PythonObject within __getitem__, as i noticed marginal gains. This was the intent of PySlice_FromSlice, to construct a PyObjectPtr. However, I overlooked the fact that the other From* functions are actually part of the C API.

I agree with your point about maintaining the minimalism of _cpython.mojo. What do you think of refactoring PySlice_FromSlice not inline of __init__ but rather as a helper within python_object.mojo/python.mojo? This way we can maintain PyObjectPtr within __getitem__.

Additionally, instead of calling low-level methods like PyLong_FromLong() or Py_None() directly, the code might be improved by using the existing higher level safe wrappers for those functions (provided by PythonObject as well). So the final code might look something like:

On this note, I noticed what seems to be lifetime issues. To apply this suggestion, I would have to do something like the following:

  ...
        
  cpython = _get_global_python_itf().cpython()
  var py_slice = cpython.PySlice_New(
    py_start.unsafe_as_py_object_ptr(),
    py_stop.unsafe_as_py_object_ptr(),
    py_step.unsafe_as_py_object_ptr()
  )
    
  _ = py_start^    
  _ = py_stop^    
  _ = py_step^    
  
  return py_slice

This might be worth exploring, following 400866c

@ConnorGray
Copy link
Collaborator

Hey @jjvraw thanks for tackling this, and all your related work in this area recently 🙂

I did a quick pass and made a couple of suggestions for some possible improvements. Let me know what you think, or if there's anything I might have missed.

Thanks again for the 🔥 contributions, this is great polish work that will make interop that much more pleasant and intuitive to use 🙂

@jjvraw
Copy link
Contributor Author

jjvraw commented Oct 5, 2024

Hey @jjvraw thanks for tackling this, and all your related work in this area recently 🙂

I did a quick pass and made a couple of suggestions for some possible improvements. Let me know what you think, or if there's anything I might have missed.

Thanks again for the 🔥 contributions, this is great polish work that will make interop that much more pleasant and intuitive to use 🙂

Hey Connor. No problem :)

Please see my comment, and changes of the last commit 39a9e23. Note that I don't have a strong opinion on these, and happy to apply changes.

martinvuyk pushed a commit to martinvuyk/mojo that referenced this pull request Oct 6, 2024
…r handling (#48447)

[External] [stdlib] Integrate `PyObject_SetItem` for speedup and error
handling

Following the trend of:
- modularml#3549
- modularml#3583

Again, noticing ~1.4x speedup on basic benchmarks:
```mojo
for _ in range(iterations):
    var idx_a = random_si64(0, 999)
    var idx_b = random_si64(0, 9999)
    var idx_c = random_si64(0, 99999)

    a[idx_a] = idx_a
    b[idx_b] = idx_b
    c[idx_c] = idx_c
```

and improved error handling, preventing crashes.

Co-authored-by: Joshua James Venter <venter.joshua@gmail.com>
Closes modularml#3595
MODULAR_ORIG_COMMIT_REV_ID: 107c0b478171f86869b7cb94e57f5ea59d83e994

Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@JoeLoser JoeLoser assigned ConnorGray and unassigned jackos Oct 13, 2024
@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Oct 14, 2024
Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
@jjvraw
Copy link
Contributor Author

jjvraw commented Oct 16, 2024

Not sure if this has been worked on internally or not, but a few conflicts came up with #3160 and #3639. Have pushed the changes required.

@JoeLoser
Copy link
Collaborator

!sync

@JoeLoser
Copy link
Collaborator

Not sure if this has been worked on internally or not, but a few conflicts came up with #3160 and #3639. Have pushed the changes required.

It was failing CI, but I'm taking a look today at it. Thanks for the ping - just re-synced.

@jjvraw
Copy link
Contributor Author

jjvraw commented Oct 16, 2024

It was failing CI, but I'm taking a look today at it. Thanks for the ping - just re-synced.

Hmm, well both of the mentioned PRs broke the prior implementation--not sure if these were imported first.

Comment on lines +543 to +584
with_2d = Python.evaluate(
"""type('With2D', (), {
'__init__': lambda self: setattr(self, 'data', [[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
'__getitem__': lambda self, key: (
[row[key[1]] for row in self.data[key[0]]] if isinstance(key, tuple) and all(isinstance(k, slice) for k in key)
else (self.data[key[0]][key[1]] if isinstance(key, tuple)
else self.data[key])
)
})()"""
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think refactoring these sort of things to a separate test file (class with methods like you have here) may improve readability over raw strings. Feel free to defer to a follow-up though :)

var d = Python.dict()
d["a"] = 1
d["b"] = 2
with assert_raises(contains="slice(1, 3, None)"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this fails internally in CI since it says unhashable type: 'slice'. I haven't looked closely as to why it fails internally but passes in OSS CI (perhaps different CPython versions as we're using 3.11 internally it looks like primarily). I disabled the test so I can merge this, but we should look into it as a follow-up.

@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Oct 16, 2024
@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Oct 16, 2024
@modularbot
Copy link
Collaborator

Landed in a41934b! Thank you for your contribution 🎉

modularbot pushed a commit that referenced this pull request Oct 16, 2024
[External] [stdlib] Mojo-Python slice interoperability

Leveraging Mojo's `Slice` for more natural interop between
`PythonObject`s, enabling something like below:

```mojo
a = PythonObject([1, 2, 3, 4])
a_reversed = a[::-1]
```

- We leave C Python to handle slice parameters.
- Negative cases such as `b[1.3:10]` or `b["1":10]` are handled by
parser which would normally throw a `TypeError` in Python.

Co-authored-by: Joshua James Venter <venter.joshua@gmail.com>
Closes #3549
MODULAR_ORIG_COMMIT_REV_ID: d88cc6a73af06bfb317d6d993611f013c706bce7
@modularbot modularbot closed this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants