Fix as_tensor not keeping the parent alive in Python #60

Merged Jul 20, 2018 · 4 commits
Changes from 1 commit
48 changes: 40 additions & 8 deletions dali/pipeline/data/tensor_list.h
@@ -40,8 +40,12 @@ typedef vector<Index> Dims;
template <typename Backend>
class DLL_PUBLIC TensorList : public Buffer<Backend> {
public:
-  DLL_PUBLIC TensorList() : layout_(DALI_NHWC) {}
-  DLL_PUBLIC ~TensorList() = default;
+  DLL_PUBLIC TensorList() : layout_(DALI_NHWC),
+                            tensor_view_(nullptr) {}
+
+  DLL_PUBLIC ~TensorList() {
+    delete tensor_view_;
+  }

/**
* @brief Resizes this TensorList to match the shape of the input.
@@ -104,6 +108,10 @@ class DLL_PUBLIC TensorList : public Buffer<Backend> {
}
DALI_ENFORCE(new_size >= 0, "Invalid negative buffer size.");

+    // Tensor view of this TensorList is no longer valid
+    delete tensor_view_;
+    tensor_view_ = nullptr;

// Resize the underlying allocation and save the new shape
ResizeHelper(new_size);
shape_ = new_shape;
@@ -116,15 +124,19 @@ class DLL_PUBLIC TensorList : public Buffer<Backend> {
*
* When this function is called, the calling object shares the
* underlying allocation of the input TensorList. Its size, type
-   * and shape are set to match the calling TensorList. While this
-   * list shares data with another list, 'shares_data()' will
+   * and shape are set to match the calling TensorList. While this
+   * list shares data with another list, 'shares_data()' will
* return 'true'.
*/
DLL_PUBLIC inline void ShareData(TensorList<Backend> *other) {
DALI_ENFORCE(other != nullptr, "Input TensorList is nullptr");
DALI_ENFORCE(IsValidType(other->type_), "To share data, "
"the input TensorList must have a valid data type");

+    // Tensor view of this TensorList is no longer valid
+    delete tensor_view_;
+    tensor_view_ = nullptr;
Contributor: I don't know if Python will handle this: what if we call AsTensor and return the pointer, then call ShareData? The pointer we just returned to Python is no longer valid, but Python can still reference it. Or am I wrong?

Member Author: This is a much longer discussion. To handle it, we would need to store information about all the objects sharing data with a particular Tensor or TensorList and notify them when the data is no longer valid. That could have performance implications.

Member Author: Oh, I see what you mean. I was referring to the other problem with sharing data. Instead of deleting the pointer, I could keep it and invoke ShareData again after the internal pointer to data_ changes.

Contributor: That is one solution to our problem. Regarding cross-object references, I see the following approaches:

  • an object that shares its own data keeps a list of references and, on change, notifies the other objects and invalidates their state
  • an object that shares another object's data checks its validity before use
  • some kind of RCU mechanism, so that all data is kept valid

The first and second require a lot of synchronization, which could hurt performance. Another option is to leave this as a design assumption, like iterator invalidation when the data being iterated over changes.


// Save the calling TensorLists meta-data
data_ = other->data_;
shape_ = other->shape_;
@@ -143,10 +155,10 @@ class DLL_PUBLIC TensorList : public Buffer<Backend> {
* @brief Wraps the raw allocation. The input pointer must not be nullptr.
* if the size of the allocation is zero, the TensorList is reset to
* a default state and is NOT marked as sharing data.
-   *
-   * After wrapping the allocation, the TensorLists size is set to 0,
-   * and its type is reset to NoType. Future calls to Resize or setting
-   * of the Tensor type will evaluate whether or not the current
+   *
+   * After wrapping the allocation, the TensorLists size is set to 0,
+   * and its type is reset to NoType. Future calls to Resize or setting
+   * of the Tensor type will evaluate whether or not the current
* allocation is large enough to be used and proceed appropriately.
*
* The TensorList object assumes no ownership of the input allocation,
@@ -157,6 +169,10 @@ class DLL_PUBLIC TensorList : public Buffer<Backend> {
DLL_PUBLIC inline void ShareData(void *ptr, size_t bytes) {
DALI_ENFORCE(ptr != nullptr, "Input pointer must not be nullptr.");

+    // Tensor view of this TensorList is no longer valid
+    delete tensor_view_;
+    tensor_view_ = nullptr;

// Save our new pointer and bytes. Reset our type, shape, and size
data_.reset(ptr, [](void *) {});
num_bytes_ = bytes;
@@ -266,6 +282,16 @@ class DLL_PUBLIC TensorList : public Buffer<Backend> {
return true;
}

+  Tensor<Backend> * AsTensor() {
+    if (tensor_view_ == nullptr) {
+      tensor_view_ = new Tensor<Backend>();
+      tensor_view_->ShareData(this);
+    }
+
+    return tensor_view_;
+  }

Contributor: Please add docs to this method.

Member Author: Sure.


// So we can access the members of other TensorLists
// with different template types
template <typename InBackend>
@@ -289,6 +315,12 @@ class DLL_PUBLIC TensorList : public Buffer<Backend> {
vector<Index> offsets_;
DALITensorLayout layout_;

+  // In order to not leak memory (and make it slightly faster)
+  // when sharing data with a Tensor, we will store a pointer to
+  // Tensor that shares the data with this TensorList (valid only
+  // if IsDenseTensor returns true)
+  Tensor<Backend> * tensor_view_;

USE_BUFFER_MEMBERS();
};

18 changes: 4 additions & 14 deletions dali/python/backend_impl.cc
@@ -295,18 +295,13 @@ void ExposeTensorList(py::module &m) { // NOLINT
Parameters
----------
)code")
-      .def("as_tensor",
-          [](TensorList<CPUBackend> &t) -> Tensor<CPUBackend>* {
-            Tensor<CPUBackend> * ret = new Tensor<CPUBackend>();
-            ret->ShareData(&t);
-            return ret;
-          },
+      .def("as_tensor", &TensorList<CPUBackend>::AsTensor,
R"code(
Returns a tensor that is a view of this `TensorList`.

This function can only be called if `is_dense_tensor` returns `True`.
)code",
-        py::return_value_policy::take_ownership);
+        py::return_value_policy::reference_internal);

py::class_<TensorList<GPUBackend>>(m, "TensorListGPU", py::buffer_protocol())
.def("__init__", [](TensorList<GPUBackend> &t) {
@@ -357,18 +352,13 @@ void ExposeTensorList(py::module &m) { // NOLINT
Parameters
----------
)code")
-      .def("as_tensor",
-          [](TensorList<GPUBackend> &t) -> Tensor<GPUBackend>* {
-            Tensor<GPUBackend> * ret = new Tensor<GPUBackend>();
-            ret->ShareData(&t);
-            return ret;
-          },
+      .def("as_tensor", &TensorList<GPUBackend>::AsTensor,
R"code(
Returns a tensor that is a view of this `TensorList`.

This function can only be called if `is_dense_tensor` returns `True`.
)code",
-        py::return_value_policy::take_ownership);
+        py::return_value_policy::reference_internal);
}

static vector<string> GetRegisteredCPUOps() {