Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(add): numpy ndarray--template-less init #255

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Khushiyant
Copy link

@Khushiyant Khushiyant commented Sep 10, 2024

Initially following example use to raise error:

Example

from cppyy.gbl.std import vector
import numpy as np

def test():
    arr = np.random.randint(0, 10, size=(2,3,4))
    v1 = vector(arr)
    print(v1)
test()

Error:

Traceback (most recent call last):
  File "/workspaces/codespaces-blank/test.py", line 12, in <module>
    test()
  File "/workspaces/codespaces-blank/test.py", line 9, in test
    v1 = vector(arr)
         ^^^^^^^^^^^
  File "/workspaces/codespaces-blank/.venv/lib/python3.12/site-packages/cppyy/_cpython_cppyy.py", line 117, in __call__
    if args0 and (type(args0) is tuple or type(args0) is list):
       ^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Hence, it was not possible to initialise the vector from numpy array without defining vector template and numpy array dtype

Current support

from cppyy.gbl.std import vector
import numpy as np

def test():
    arr = np.random.randint(0, 10, size=(2,3,4,2), dtype=np.int32)
    v1 = vector[vector[vector[vector[int]]]](arr)
    print(v1)
test()

It is not a feasible option to define the vector templates at time of writing code especially in with usage with various ML backends

CPyCppyy#28 was the cpp side implementation to provide this support but cppyy frondend implementation is better approach in comparison

tests: add template and dtpye less tests

chore: remove cppyy import

chore: remove numpy--import

recover: deleted valgrind--file
@@ -755,4 +755,4 @@
fun:_rl_init_terminal_io
obj:/lib/libreadline.so.4.3
fun:rl_initialize
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed?

):
t = args0.dtype.type.__name__
if t.startswith("int"):
t = "int"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is incorrect, float32 should map to a float type. This seems to be something that cppyy already does at a later stage in the PyBuffer interface for numpy arrays. Why do we type translate here?

and type(args0).__name__ == "ndarray"
and hasattr(args0, "dtype")
):
t = args0.dtype.type.__name__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this makes dtype optional if not passing the template. but required if you pass the template type. Which are two independent things.

Whether you specify dtype or not, the array always has a dtype that is either default or deduced based on the type of values. Current support always requires specifying a template type and hence the compatible type of numpy array, which is the expected support.

This works:

a = np.array([1, 2, 3], dtype= np.int32)
b = vector[int](a)

This doesn't:

a = np.array([1, 2, 3], dtype= np.int32)
b = vector(a)

Now the TypeError you attempt to solve is not a result of dtype being missing. I believe you use something like:

 a = np.array([1, 2, 3])
 b = vector['int'](a)

And the resulting type error as an example.

But this has nothing to do with dtype. The TypeError we were seeing (in our meeting with @vgvassilev) is because numpy defaults to np.int64 in the absence of dtype which requires long in the template argument. This is an expected error and can be fixed with:

b = vector['long'](a)

What I would suggest here is that if we want to drop the requirement for template args for std.vector (which is a question for @wlav) then we need a robust mapping from the dtype to equivalent C++ types, in order for something like:

 a = np.array([1, 2, 3])
 b = vector(a)

to work. In my opinion, this type requirement mixup should be dropped as the current support enforces the most typesafety, and can lead to failures when passing a vector of objects in which case you have to specify the class type as a template param for the overload selection to work.

The rest of the PR looks good in terms of removing the requirement to nest template types for nested std:: vectors initialized from multidimensional numpy arrays

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I would suggest here is that if we want to drop the requirement for template args for std.vector

Not requiring template args for vector came about b/c since C++17 it's legal not to do that in C++. That is, something like this "auto v = std::vector({1, 2, 3});" is legal C++ and the argument was that Python should be at least as simple. (It isn't really, b/c C++ literals are a richer type set than Python literals, but resolving basic cases was/is simple enough.)

@Khushiyant Khushiyant requested a review from aaronj0 October 22, 2024 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants