
perf(cuBLAS): store device pointers in ggml_tensor #1194

Closed
jon-chuang opened this issue Apr 26, 2023 · 1 comment
jon-chuang commented Apr 26, 2023

We do not need to perform a device-to-host/host-to-device (DtoH/HtoD) copy of tensor data for every operation. This is similar to how PyTorch does it.

In self-attention, the KV cache could still live on the host, while the host launches kernels that operate on the device-resident data referenced by the cache.

The ggml_tensor would provide methods to sync its data to the device type required by each operator, hiding this complexity from callers.
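A minimal sketch of what this could look like. All names and fields below are hypothetical, not ggml's actual API: a ggml_tensor-like struct carries a host pointer, a device pointer, and dirty flags recording which copy is current, and a sync helper copies only when the host side is newer. A plain memcpy stands in for cudaMemcpyAsync so the sketch runs without a GPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension of ggml_tensor: alongside the host buffer,
   keep a device pointer plus flags recording which copy is current. */
typedef struct {
    void  *data;         /* host buffer (as in ggml today)       */
    void  *data_device;  /* device buffer                        */
    bool   device_dirty; /* device copy is newer than host copy  */
    bool   host_dirty;   /* host copy is newer than device copy  */
    size_t nbytes;
} tensor_ex;

/* Stand-in for cudaMemcpyAsync so the sketch runs without a GPU. */
static void fake_copy(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}

/* Lazily sync to the device: copy only if the host copy is newer;
   otherwise a kernel can be launched on data_device directly. */
static void sync_to_device(tensor_ex *t) {
    if (t->host_dirty) {
        fake_copy(t->data_device, t->data, t->nbytes);
        t->host_dirty = false;
    }
}
```

The same pattern mirrored (a `sync_to_host` checking `device_dirty`) would cover the DtoH direction.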

Unfortunately, lazy sync is not the smartest approach: knowing the full compute graph makes it much easier to identify sync points up front, and then copies can be overlapped with compute wherever a sync is required.

example:

if (graph.sync_required(&tensor)) {
  cudaMemcpyAsync(...); // e.g. device-to-host
}
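Extending the example above, a graph-aware sync_required could be derived by walking a linearized compute graph and flagging edges whose producer and consumer run on different backends; those copies can then be issued early on a separate stream and overlapped with compute. This is a hypothetical sketch, not ggml's implementation — the node layout and backend enum are assumptions.

```c
#include <stdbool.h>

enum backend { BACKEND_CPU, BACKEND_CUDA };

/* A node in a linearized compute graph: the backend that runs it and
   the indices of its (up to two) input nodes, -1 if absent. */
typedef struct {
    enum backend be;
    int src0, src1;
} node;

/* A copy (sync) is required on an edge exactly when the input tensor
   lives on a different backend than the operator consuming it. */
static bool sync_required(const node *graph, int consumer, int input) {
    if (input < 0) return false;
    return graph[consumer].be != graph[input].be;
}
```

Scanning the whole graph with this predicate yields every required HtoD/DtoH transfer before execution starts, which is what enables scheduling each copy ahead of the node that needs it.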
@jon-chuang jon-chuang changed the title perf(cuBLAS): store device pointers in ggml_tensor perf(cuBLAS): store device pointers in ggml_tensor; lazily copy Apr 26, 2023
@jon-chuang jon-chuang changed the title perf(cuBLAS): store device pointers in ggml_tensor; lazily copy perf(cuBLAS): store device pointers in ggml_tensor; lazily copy based on operator CUDA support Apr 26, 2023
@jon-chuang jon-chuang changed the title perf(cuBLAS): store device pointers in ggml_tensor; lazily copy based on operator CUDA support perf(cuBLAS): store device pointers in ggml_tensor Apr 26, 2023
@github-actions github-actions bot added the stale label Mar 25, 2024
github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
