Commit d3eced5
[RFC] Support DLPACK C Functions for Speed Exchange and Stream Handling
This PR adds support for three C functions to speed up DLPack exchange.
As of now, DLPack exchange relies on Python functions such as tensor.__dlpack__().
While they work well for common cases, the general overhead of such an exchange is
at the level of 0.2-0.3 us for a very well optimized version, and can go up to
0.4-1 us for a less optimized implementation.
For a function that takes three arguments, f(a, b, c), assuming we run a DLPack
exchange for each argument, the total conversion overhead usually gets to
around 1 us and sometimes to 3 us.
While such overhead can be acceptable in many settings, in GPU applications
the extra 1-3 us overhead can still be significant.
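
The per-call estimate above follows from multiplying the per-exchange cost by the number of arguments. A minimal sketch of that arithmetic, using only the ranges quoted in this message (the helper name total_overhead_us is hypothetical, and simple linear scaling with argument count is an assumption):

```python
# Per-exchange overhead ranges quoted in the commit message, in microseconds.
OPTIMIZED_US = (0.2, 0.3)        # very well optimized __dlpack__ path
LESS_OPTIMIZED_US = (0.4, 1.0)   # less optimized implementation


def total_overhead_us(per_exchange_us, num_args):
    """Scale a (low, high) per-exchange cost by the number of arguments.

    Assumes the cost adds up linearly, one exchange per argument.
    """
    low, high = per_exchange_us
    return (low * num_args, high * num_args)


# f(a, b, c): one DLPack exchange per argument.
fast = total_overhead_us(OPTIMIZED_US, 3)
slow = total_overhead_us(LESS_OPTIMIZED_US, 3)
print(f"optimized:      {fast[0]:.1f}-{fast[1]:.1f} us")
print(f"less optimized: {slow[0]:.1f}-{slow[1]:.1f} us")
```

Under these assumptions the three-argument call pays roughly 0.6-0.9 us in the optimized case and 1.2-3.0 us in the less optimized case, which matches the "around 1 us and sometimes 3 us" figure.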
This PR proposes three functions for fast exchange of DLPack tensors without
going through the Python interpreter.
- DLPackPyObjectExporter: exports a PyObject Tensor to a DLManagedTensorVersioned
- DLPackPyObjectImporter: imports a DLManagedTensorVersioned as a PyObject Tensor
- DLPackTensorAllocator: used to expose one package's tensor allocator to another package
  - This allows us, for example, to implement libraries that allocate intermediate tensors
    based on the caller's specified tensor allocator.
Our preliminary results show that these functions, when incorporated correctly
via native extensions such as C/C++, can bring the exchange cost down to the level of
30 ns - 80 ns, giving us about one order of magnitude speedup. That means functions
like f(a, b, c) can finish at the 0.2 us - 0.4 us level, which is close to the
overhead of a native C++ extension without any exchange.
1 parent 3ea601b commit d3eced5
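
The "about one order of magnitude" claim in the message can be sanity-checked from the quoted ranges alone. The sketch below is only that check: the helper name speedup_range is hypothetical, and the per-call exchange cost again assumes one exchange per argument with linear scaling.

```python
# Ranges quoted in the commit message, converted to nanoseconds.
PYTHON_PATH_NS = (200, 1000)  # 0.2 us (well optimized) up to 1 us (less optimized)
C_PATH_NS = (30, 80)          # preliminary cost with the proposed C functions


def speedup_range(old_ns, new_ns):
    """Return (most pessimistic, most optimistic) speedup of new over old."""
    pessimistic = old_ns[0] / new_ns[1]  # fastest old path vs slowest new path
    optimistic = old_ns[1] / new_ns[0]   # slowest old path vs fastest new path
    return (pessimistic, optimistic)


low, high = speedup_range(PYTHON_PATH_NS, C_PATH_NS)
print(f"speedup: {low:.1f}x to {high:.1f}x")

# Exchange cost alone for f(a, b, c), assuming one exchange per argument.
exchange_ns = (3 * C_PATH_NS[0], 3 * C_PATH_NS[1])
print(f"exchange cost for 3 args: {exchange_ns[0]}-{exchange_ns[1]} ns")
```

The resulting range brackets a 10x speedup, and the 90-240 ns exchange cost for three arguments fits within the 0.2 us - 0.4 us total reported for f(a, b, c).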
1 file changed, +82 -3 lines changed