ARROW-1695: [Serialization] Fix reference counting of numpy arrays created in custom serializer #1220
Conversation
Force-pushed from 9d87906 to 7e23bb5
+1. So to confirm I understand what is going on -- the custom serializer was producing a temporary NumPy array which was being decref'd in a ScopedRef or OwnedRef before it was able to get boxed properly in an arrow::Tensor. Right?
Yes, this is correct! It is decref'd explicitly here: https://github.com/pcmoritz/arrow/blob/7e23bb5e7cd666e311595f76d60dbd08bf71920e/cpp/src/arrow/python/python_to_arrow.cc#L674
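The failure mode can be sketched in plain Python (a hypothetical illustration of the reference-counting issue, not the Arrow C++ code itself): a temporary array stays alive only while something holds a reference to it, so dropping the last reference before the data is boxed leaves any raw pointer to it dangling.

```python
import weakref
import numpy as np

def custom_serializer(obj):
    # Hypothetical custom serializer that returns a temporary
    # NumPy array; nothing else references it.
    return np.zeros(4)

arr = custom_serializer(None)
ref = weakref.ref(arr)

# As long as a strong reference is held (what the fix arranges for
# on the C++ side), the buffer behind arr stays valid.
assert ref() is not None

# Dropping the last reference destroys the array immediately in
# CPython -- any raw data pointer taken earlier now dangles.
# This premature decref is the bug pattern the PR fixes.
del arr
assert ref() is None
```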
The test failure is in the Go bindings and unrelated to this PR (cc @kou, are you aware of this problem?). I'll merge this since I'd like to do some follow-up work that needs this fix.
```python
custom_serializer=serialize_dummy_class,
custom_deserializer=deserialize_dummy_class)

pa.serialize(DummyClass())
```
Should also deserialize the object and assert that it is the correct value.
Oops, too late. This is a regression test to catch the segfault, which used to occur already in the serialize call, so testing only that is fine. The actual code path of serializing a NumPy array via a custom serializer will be tested in the PR I'm about to create, which is about serializing PyTorch Tensors.
You want to handle PyTorch tensors in Arrow? As opposed to in Ray?
Yes, I'd like to handle them in Arrow, conditional on PyTorch being installed, much in the same way as we handle pandas. There is precedent for that: the GLib bindings do it for raw Lua Torch tensors, I think.
Ok, sounds good.
I created #1234 to fix it.
Awesome, thanks :)
This uses the NumPyBuffer built into Arrow's Tensor facility to protect the NumPy arrays holding the tensors to be serialized. See also the problem description in https://issues.apache.org/jira/browse/ARROW-1695.