Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

【Hackathon 7th No.25】为 Paddle 新增 is_coalesced -part #68334

Merged
merged 18 commits into from
Oct 28, 2024

Conversation

NKNaN
Copy link
Contributor

@NKNaN NKNaN commented Sep 20, 2024

PR Category

User Experience

PR Types

New features

Description

新增 paddle.Tensor.is_coalesced
rfc: PaddlePaddle/community#961

@NKNaN NKNaN changed the title Hackathon 7th No.25】为 Paddle 新增 is_coalesced -part 【Hackathon 7th No.25】为 Paddle 新增 is_coalesced -part Sep 20, 2024
@paddle-bot paddle-bot bot added the contributor External developers label Sep 20, 2024
Copy link
Contributor

@lxd-cumt lxd-cumt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NKNaN
Copy link
Contributor Author

NKNaN commented Sep 26, 2024

这个单测写的有点问题,我再改一下

Comment on lines 32 to 36
def is_coalesced_naive_static(indices):
indices = list(zip(*indices))
duplicated_len = len(indices)
remove_duplicated_len = len(set(indices))
return duplicated_len == remove_duplicated_len
Copy link
Contributor

@jeff41404 jeff41404 Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function seem to be unused?

Copy link
Contributor Author

@NKNaN NKNaN Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It has been removed.

@NKNaN
Copy link
Contributor Author

NKNaN commented Sep 27, 2024

由于 SparseCooTensor 类中的 coalesce_ 属性在实例化时默认都为 false,当直接实例化一个 coalesced sparsecootensor 后 coalesce_ 属性仍会是 false,因此 is_coalesced() 不能直接使用底层的 SparseCooTensor::coalesced() 方法返回 coalesce_ 属性。重新修改了一下实现方式。

@jeff41404
Copy link
Contributor

According to API specifications, each API needs to support both dynamic and static graphs, so is_coalesced needs to support static graph(PIR Value), can refer to pir.cc

@NKNaN
Copy link
Contributor Author

NKNaN commented Sep 27, 2024

According to API specifications, each API needs to support both dynamic and static graphs, so is_coalesced needs to support static graph(PIR Value), can refer to pir.cc

Could we design the returned value of this api to be a 0-D bool type Tensor?
It seems difficult to make a bool value returned directly in static mode espcially when using existing paddle API or calling any kernel functions to implement is_coalesced.

Copy link

paddle-ci-bot bot commented Oct 5, 2024

Sorry to inform you that eaacb9e's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@zhwesky2010
Copy link
Contributor

zhwesky2010 commented Oct 9, 2024

  1. 这种类似于属性API,返回bool值就可以,类似于paddle.Tensor.is_sparsepaddle.Tensor.is_sparse_coopaddle.Tensor.is_sparse_csr,你要在eager_method中和pir中都实现一遍。

paddle/fluid/pybind/eager_method.cc
infoflow 2024-10-09 11-40-14

paddle/fluid/pybind/pir.cc
infoflow 2024-10-09 11-53-25

  1. 关于实现方式,为何不在C++中实现,维护一个coalesced_的Tensor的bool属性并返回就行?Pytorch是这样实现的。

@NKNaN
Copy link
Contributor Author

NKNaN commented Oct 9, 2024

  1. 这种类似于属性API,返回bool值就可以,类似于paddle.Tensor.is_sparsepaddle.Tensor.is_sparse_coopaddle.Tensor.is_sparse_csr,你要在eager_method中和pir中都实现一遍。

好的

  1. 关于实现方式,为何不在C++中实现,维护一个coalesced_的Tensor的bool属性并返回就行?Pytorch是这样实现的。

目前 SaprseCooTensor 底层中有 coalesced_ 这个 bool 属性,默认值是 false,但是目前的问题是在创建 tensor 的时候不会对这个属性做修改,只有调用 coalesce() 方法后才会把它改成 true。如果在创建 tensor 的时候就开始判断 coalesced_ 需不需要修改,这样是不是会影响创建 tensor 的效率?

@luotao1 luotao1 added the API label Oct 11, 2024
@NKNaN
Copy link
Contributor Author

NKNaN commented Oct 15, 2024

另外在维护 coalesced_ 的时候,如何确保静态图中的 coalesced_ 也能被正确设置呢?
要判断是否 coalesced 必须要用 indices,因为 indices 是 Tensor,涉及不同的设备,所以也只能在创建sparsecootensor的kernel里维护这个属性。

如果静态图的 is_coalesced() 接口是这样

.def("is_coalesced",
     [](Value self) {
       auto sparse_coo_tensor_type = self.type().dyn_cast<SparseCooTensorType>();
       if (sparse_coo_tensor_type) {
        return sparse_coo_tensor_type.coalesced();
       }
       return false;
     })

再将 phi::sparse::SparseCooTensorKernel 修改如下,在判断完是否是coalesced之后调用 SetCoalesced 方法。但是静态图中用paddle.sparse.sparse_coo_tensor 创建后不论哪种情况,SparseCooTensorType 的 coalesced_ 属性始终都是默认值。

template <typename T, typename Context>
void CheckCoalesced(const DenseTensor& indices,
                    DDim dims,
                    int64_t sparse_dim,
                    int64_t nnz,
                    bool* coalesced) {
  std::vector<T> sparse_offsets(sparse_dim), x_indexs(nnz);
  const T* ind_data = indices.data<T>();
  T* offset_data = sparse_offsets.data();
  T* x_indexs_data = x_indexs.data();
  T offset = 1;
  for (T i = sparse_dim - 1; i >= 0; i--) {
    offset_data[i] = offset;
    offset *= dims[i];
    printf("spase_offsets[%d]: %d\n", i, offset_data[i]);
  }
  for (int64_t i = 0; i < nnz; ++i) {
    x_indexs_data[i] = 0;
    for (T j = 0; j < sparse_dim; j++) {
      printf("j = %d, ind_data[%d]: %d\n", j, j * nnz + i, ind_data[j * nnz + i]);
      x_indexs_data[i] += ind_data[j * nnz + i] * offset_data[j];
    }
  }
  std::map<T, std::vector<int64_t>> indices_to_index;
  for (uint64_t i = 0; i < x_indexs.size(); i++) {
    T index = x_indexs[i];
    if (indices_to_index.find(index) == indices_to_index.end()) {
      std::vector<int64_t> indexs;
      indexs.push_back(static_cast<int>(i));
      indices_to_index[index] = indexs;
    } else {
      *coalesced = false;
      return;
    }
  }
  *coalesced = true;
}

template <typename Context>
void CheckAndSetCoalesced(const Context& dev_ctx,
                          SparseCooTensor* out) {
  bool coalesced = false;
  DenseTensor indices = out->indices();
  auto dims = out->dims();
  int64_t sparse_dim = static_cast<int64_t>(indices.dims()[0]);
  int64_t nnz = out->nnz();
  PD_VISIT_BASE_INTEGRAL_TYPES(
      indices.dtype(), "CheckCoalesced", ([&] {
        CheckCoalesced<data_t, Context>(indices, dims, sparse_dim, nnz, &coalesced);
      }));
  out->SetCoalesced(coalesced);
}

template <typename T, typename Context>
void SparseCooTensorKernel(const Context& dev_ctx,
                           const DenseTensor& values,
                           const DenseTensor& indices,
                           const std::vector<int64_t>& shape,
                           SparseCooTensor* out) {
  *out = SparseCooTensor(indices, values, common::make_ddim(shape));
  CheckAndSetCoalesced<Context>(dev_ctx, out);
}

是不是应该在静态图的 is_coalesced() 接口里面获取到 IrSparseCooTensor 或者其他对应的数据类型,应该如何转换呢?

@zhwesky2010
Copy link
Contributor

这个是不是想的太复杂了,应该就是维护一个属性就可以:

infoflow 2024-10-16 12-38-50

@NKNaN
Copy link
Contributor Author

NKNaN commented Oct 16, 2024

这个是不是想的太复杂了,应该就是维护一个属性就可以:

infoflow 2024-10-16 12-38-50

好的,我明白了

zhwesky2010
zhwesky2010 previously approved these changes Oct 25, 2024
Copy link
Contributor

@zhwesky2010 zhwesky2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -3503,6 +3559,10 @@ PyMethodDef variable_methods[] = { // NOLINT
(PyCFunction)(void (*)())tensor_method_to_sparse_csr,
METH_VARARGS | METH_KEYWORDS,
tensor_to_sparse_csr__doc__},
{"is_coalesced",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要在 python/paddle/tensor/tensor.prototype.pyi stub 中补充新 Tensor API 类型

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,已补充

self.expected_result = [False, False]

def test_is_coalesced(self):
if in_pir_mode():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里有测不到的风险,in_pir_modeuse_pir_api and in_static_mode,如果此时不是静态图模式就会有测不到的风险

因此建议改为 use_pir_api 或者直接删掉该条件,现在默认就是 PIR 模式

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已删除 in_pir_mode

std::dynamic_pointer_cast<phi::SparseCooTensor>(self->tensor.impl());
return ToPyObject(sparse_coo_tensor->coalesced());
} else {
return ToPyObject(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

只是一个疑问,这里传入错误类型的情况下,感觉报错比返回 False 更好?这点在原来的设计中是有考虑到的么?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个是参考 pytorch 的设计

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> x = torch.as_tensor([[1., 2., 3.]])
>>> x.is_coalesced()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: is_coalesced expected sparse coordinate tensor layout but got Strided
>>> 

可是 PyTorch 貌似是报错?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,我再改一下吧

paddle/fluid/pybind/eager_method.cc Outdated Show resolved Hide resolved
paddle/fluid/pybind/pir.cc Outdated Show resolved Hide resolved
test/legacy_test/test_sparse_is_coalesced.py Outdated Show resolved Hide resolved
test/legacy_test/test_sparse_is_coalesced.py Outdated Show resolved Hide resolved
test/legacy_test/test_sparse_is_coalesced.py Outdated Show resolved Hide resolved
NKNaN and others added 2 commits October 27, 2024 20:28
Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>
Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>
Copy link
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMeow 🐾

Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 merged commit e8eb28b into PaddlePaddle:develop Oct 28, 2024
27 checks passed
@luotao1
Copy link
Contributor

luotao1 commented Oct 28, 2024

请提交对应的中文文档

fxfxfxfxfxfxfxfx pushed a commit to fxfxfxfxfxfxfxfx/Paddle that referenced this pull request Oct 29, 2024
* add is_coalesced

* update test

* fix test_tensor_attr_consistency

* delete pir mode test and fix docs code example

* add is_coalesced

* update test

* fix test_tensor_attr_consistency

* delete pir mode test and fix docs code example

* fix test

* update

* update

* delete unnecessary

* fix test

* update test

* update: raise error if not sparsecootensor

* Update paddle/fluid/pybind/eager_method.cc

Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>

* Apply suggestions from code review

Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>

---------

Co-authored-by: Nyakku Shigure <sigure.qaq@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants