Load data that contain pytorch tensors #739
-
Hi, I'm using heterogeneous data structures to store my experiment results, and I'd prefer to use Awkward Array for them:

data = [
    dict(
        x='abc',
        y=torch.tensor([1.0, 2.0]),
    ),
    dict(
        x='efg',
        y=torch.tensor([3.0, 4.0]),
    ),
]
ak.from_iter(data)
# ValueError: cannot convert tensor(1.) (type Tensor) to an array element

Best regards,
Replies: 2 comments
-
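My solution so far:

from functools import partial
import awkward as ak
import torch

def deepmap(fun, data):
    # Recurse into containers, preserving their type; apply fun to the leaves.
    if isinstance(data, (list, tuple, set, frozenset)):
        return type(data)(deepmap(fun, x) for x in data)
    if isinstance(data, dict):
        return {key: deepmap(fun, x) for key, x in data.items()}
    return fun(data)

# Convert every torch.Tensor leaf to a NumPy array; leave other values unchanged.
to_numpy = partial(deepmap, lambda x: x.numpy() if isinstance(x, torch.Tensor) else x)
ak.from_iter(to_numpy(data))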
-
We intend to write code that recognizes all the major array-like types; currently we're working on JAX, not PyTorch yet. The Consortium for Python Data API Standards may help make that a one-step process.

For the moment, all of these array-like types can be converted to NumPy, albeit manually as in your solution, and NumPy arrays can be viewed directly as Awkward Arrays. There is some documentation on that. Passing a NumPy array directly to the ak.Array constructor or (equivalently, but more explicitly) the ak.from_numpy function avoids converting the data into Python objects and back, which can be slow and memory-intensive. Don't use ak.from_iter unless your data are already Python objects, which they are not in this case.

Other than that, your solution of deeply iterating over the Python dicts containing PyTorch tensors is a good one. That step has to be explicit, because only you know how you have structured the data at that level. It can be considered a normal part of Python bookkeeping.