[Quality] Fewer recompiles with tensordict #1015

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 52.5280μs | 25.4238μs | 39.3332 KOps/s | 50.6356 KOps/s | |
test_plain_set_stack_nested | 54.2810μs | 25.5983μs | 39.0652 KOps/s | 51.0738 KOps/s | |
test_plain_set_nested_inplace | 87.6840μs | 27.8584μs | 35.8958 KOps/s | 46.4838 KOps/s | |
test_plain_set_stack_nested_inplace | 67.7560μs | 27.8675μs | 35.8841 KOps/s | 46.7577 KOps/s | |
test_items | 22.8430μs | 4.2284μs | 236.4967 KOps/s | 237.9640 KOps/s | |
test_items_nested | 1.0146ms | 0.3888ms | 2.5718 KOps/s | 2.7244 KOps/s | |
test_items_nested_locked | 0.7068ms | 0.3880ms | 2.5775 KOps/s | 2.7401 KOps/s | |
test_items_nested_leaf | 0.1574ms | 80.9068μs | 12.3599 KOps/s | 14.6150 KOps/s | |
test_items_stack_nested | 0.6043ms | 0.3916ms | 2.5536 KOps/s | 2.6864 KOps/s | |
test_items_stack_nested_leaf | 0.2059ms | 81.9896μs | 12.1967 KOps/s | 14.2572 KOps/s | |
test_items_stack_nested_locked | 0.8242ms | 0.3897ms | 2.5663 KOps/s | 2.7190 KOps/s | |
test_keys | 17.1620μs | 3.5241μs | 283.7600 KOps/s | 283.2206 KOps/s | |
test_keys_nested | 0.2509ms | 0.1374ms | 7.2805 KOps/s | 10.0541 KOps/s | |
test_keys_nested_locked | 0.7868ms | 0.1427ms | 7.0057 KOps/s | 9.5525 KOps/s | |
test_keys_nested_leaf | 0.2207ms | 0.1197ms | 8.3540 KOps/s | 12.0936 KOps/s | |
test_keys_stack_nested | 0.2436ms | 0.1375ms | 7.2723 KOps/s | 10.0583 KOps/s | |
test_keys_stack_nested_leaf | 0.2105ms | 0.1192ms | 8.3889 KOps/s | 11.8403 KOps/s | |
test_keys_stack_nested_locked | 0.2999ms | 0.1416ms | 7.0615 KOps/s | 9.5520 KOps/s | |
test_values | 14.8156μs | 1.0374μs | 963.9754 KOps/s | 948.8590 KOps/s | |
test_values_nested | 0.1812ms | 93.3626μs | 10.7109 KOps/s | 13.1928 KOps/s | |
test_values_nested_locked | 0.1568ms | 91.9285μs | 10.8780 KOps/s | 13.1873 KOps/s | |
test_values_nested_leaf | 0.1452ms | 79.2201μs | 12.6231 KOps/s | 15.9650 KOps/s | |
test_values_stack_nested | 0.1936ms | 92.2655μs | 10.8383 KOps/s | 13.0281 KOps/s | |
test_values_stack_nested_leaf | 0.1482ms | 78.9214μs | 12.6708 KOps/s | 16.0242 KOps/s | |
test_values_stack_nested_locked | 0.1771ms | 93.2846μs | 10.7199 KOps/s | 12.7471 KOps/s | |
test_membership | 6.6439μs | 0.7568μs | 1.3213 MOps/s | 1.3491 MOps/s | |
test_membership_nested | 22.8120μs | 2.7387μs | 365.1401 KOps/s | 348.2162 KOps/s | |
test_membership_nested_leaf | 37.5100μs | 2.7510μs | 363.4992 KOps/s | 357.5259 KOps/s | |
test_membership_stacked_nested | 25.3270μs | 2.7637μs | 361.8325 KOps/s | 361.9061 KOps/s | |
test_membership_stacked_nested_leaf | 33.3830μs | 2.7464μs | 364.1086 KOps/s | 368.2879 KOps/s | |
test_membership_nested_last | 45.4150μs | 4.1904μs | 238.6422 KOps/s | 250.0458 KOps/s | |
test_membership_nested_leaf_last | 45.4340μs | 4.1551μs | 240.6652 KOps/s | 247.6444 KOps/s | |
test_membership_stacked_nested_last | 25.3170μs | 4.1902μs | 238.6539 KOps/s | 250.4574 KOps/s | |
test_membership_stacked_nested_leaf_last | 25.6780μs | 4.1752μs | 239.5083 KOps/s | 248.6750 KOps/s | |
test_nested_getleaf | 78.5870μs | 10.4906μs | 95.3235 KOps/s | 93.5569 KOps/s | |
test_nested_get | 44.5530μs | 10.2157μs | 97.8883 KOps/s | 97.7203 KOps/s | |
test_stacked_getleaf | 49.8430μs | 10.8574μs | 92.1032 KOps/s | 93.5413 KOps/s | |
test_stacked_get | 36.0570μs | 10.2756μs | 97.3179 KOps/s | 99.3885 KOps/s | |
test_nested_getitemleaf | 57.9690μs | 11.1672μs | 89.5478 KOps/s | 90.8757 KOps/s | |
test_nested_getitem | 49.2820μs | 10.4821μs | 95.4008 KOps/s | 97.5004 KOps/s | |
test_stacked_getitemleaf | 36.4280μs | 10.9320μs | 91.4749 KOps/s | 90.8016 KOps/s | |
test_stacked_getitem | 55.2030μs | 10.5930μs | 94.4024 KOps/s | 97.6922 KOps/s | |
test_lock_nested | 95.8604ms | 0.6257ms | 1.5981 KOps/s | 2.0043 KOps/s | |
test_lock_stack_nested | 0.5751ms | 0.4864ms | 2.0561 KOps/s | 2.1251 KOps/s | |
test_unlock_nested | 92.3215ms | 0.5208ms | 1.9202 KOps/s | 2.3966 KOps/s | |
test_unlock_stack_nested | 0.5040ms | 0.3991ms | 2.5056 KOps/s | 2.5641 KOps/s | |
test_flatten_speed | 0.2388ms | 0.1022ms | 9.7851 KOps/s | 11.5466 KOps/s | |
test_unflatten_speed | 0.7438ms | 0.5180ms | 1.9304 KOps/s | 2.1776 KOps/s | |
test_common_ops | 4.8549ms | 1.1665ms | 857.2712 Ops/s | 873.0206 Ops/s | |
test_creation | 32.8320μs | 2.0640μs | 484.4865 KOps/s | 459.2159 KOps/s | |
test_creation_empty | 57.3570μs | 19.0964μs | 52.3660 KOps/s | 57.6140 KOps/s | |
test_creation_nested_1 | 62.9380μs | 22.4894μs | 44.4654 KOps/s | 47.7511 KOps/s | |
test_creation_nested_2 | 89.4670μs | 27.0279μs | 36.9989 KOps/s | 39.0885 KOps/s | |
test_clone | 0.2677ms | 17.2934μs | 57.8257 KOps/s | 57.8280 KOps/s | |
test_getitem[int] | 1.3032ms | 17.2590μs | 57.9409 KOps/s | 58.6821 KOps/s | |
test_getitem[slice_int] | 0.1575ms | 31.2009μs | 32.0503 KOps/s | 31.7674 KOps/s | |
test_getitem[range] | 0.1930ms | 60.0571μs | 16.6508 KOps/s | 16.9484 KOps/s | |
test_getitem[tuple] | 0.2832ms | 27.3277μs | 36.5930 KOps/s | 38.4966 KOps/s | |
test_getitem[list] | 0.6674ms | 57.6022μs | 17.3604 KOps/s | 18.4638 KOps/s | |
test_setitem_dim[int] | 61.2740μs | 33.6754μs | 29.6953 KOps/s | 30.0420 KOps/s | |
test_setitem_dim[slice_int] | 0.1026ms | 62.7045μs | 15.9478 KOps/s | 16.0556 KOps/s | |
test_setitem_dim[range] | 0.1268ms | 84.6054μs | 11.8196 KOps/s | 11.6799 KOps/s | |
test_setitem_dim[tuple] | 0.1277ms | 52.0565μs | 19.2099 KOps/s | 20.0584 KOps/s | |
test_setitem | 0.3421ms | 31.5062μs | 31.7398 KOps/s | 34.1071 KOps/s | |
test_set | 0.3186ms | 30.7136μs | 32.5588 KOps/s | 34.7369 KOps/s | |
test_set_shared | 3.1929ms | 0.2199ms | 4.5468 KOps/s | 4.6999 KOps/s | |
test_update | 0.1912ms | 38.3671μs | 26.0640 KOps/s | 28.1376 KOps/s | |
test_update_nested | 0.4018ms | 50.2577μs | 19.8974 KOps/s | 21.7803 KOps/s | |
test_update__nested | 0.3299ms | 38.2712μs | 26.1293 KOps/s | 26.9184 KOps/s | |
test_set_nested | 78.5170μs | 32.8756μs | 30.4177 KOps/s | 31.6184 KOps/s | |
test_set_nested_new | 0.1018ms | 38.8190μs | 25.7606 KOps/s | 27.5345 KOps/s | |
test_select | 0.1153ms | 56.2961μs | 17.7632 KOps/s | 18.6830 KOps/s | |
test_select_nested | 0.1201ms | 59.8743μs | 16.7017 KOps/s | 16.7014 KOps/s | |
test_exclude_nested | 0.1493ms | 75.7426μs | 13.2026 KOps/s | 13.3628 KOps/s | |
test_empty[True] | 0.7206ms | 0.3593ms | 2.7834 KOps/s | 3.1811 KOps/s | |
test_empty[False] | 11.3765μs | 1.3322μs | 750.6229 KOps/s | 794.1036 KOps/s | |
test_unbind_speed | 0.4790ms | 0.3120ms | 3.2054 KOps/s | 3.2850 KOps/s | |
test_unbind_speed_stack0 | 0.7142ms | 0.3044ms | 3.2852 KOps/s | 3.2957 KOps/s | |
test_unbind_speed_stack1 | 96.1271ms | 0.8500ms | 1.1764 KOps/s | 1.3267 KOps/s | |
test_split | 3.2148ms | 2.0352ms | 491.3490 Ops/s | 451.7202 Ops/s | |
test_chunk | 0.1045s | 2.2313ms | 448.1766 Ops/s | 445.4621 Ops/s | |
test_creation[device0] | 0.2749ms | 0.1170ms | 8.5469 KOps/s | 8.3983 KOps/s | |
test_creation_from_tensor | 3.7066ms | 0.1207ms | 8.2820 KOps/s | 8.6276 KOps/s | |
test_add_one[memmap_tensor0] | 0.1195ms | 7.6861μs | 130.1050 KOps/s | 136.4302 KOps/s | |
test_contiguous[memmap_tensor0] | 21.1590μs | 1.8854μs | 530.3995 KOps/s | 524.9079 KOps/s | |
test_stack[memmap_tensor0] | 0.1138ms | 5.7498μs | 173.9196 KOps/s | 177.6803 KOps/s | |
test_memmaptd_index | 1.2492ms | 0.4184ms | 2.3902 KOps/s | 2.4647 KOps/s | |
test_memmaptd_index_astensor | 98.6600ms | 0.5763ms | 1.7352 KOps/s | 2.0359 KOps/s | |
test_memmaptd_index_op | 1.8819ms | 1.0908ms | 916.7726 Ops/s | 983.6657 Ops/s | |
test_serialize_model | 0.1303s | 0.1203s | 8.3130 Ops/s | 8.2267 Ops/s | |
test_serialize_model_pickle | 0.4746s | 0.3983s | 2.5105 Ops/s | 2.5050 Ops/s | |
test_serialize_weights | 0.1252s | 0.1177s | 8.4927 Ops/s | 7.7563 Ops/s | |
test_serialize_weights_returnearly | 0.1735s | 0.1613s | 6.2004 Ops/s | 6.3441 Ops/s | |
test_serialize_weights_pickle | 0.5297s | 0.4389s | 2.2783 Ops/s | 2.5403 Ops/s | |
test_serialize_weights_filesystem | 0.1458s | 0.1417s | 7.0583 Ops/s | 7.0527 Ops/s | |
test_serialize_model_filesystem | 0.1704s | 0.1514s | 6.6056 Ops/s | 6.2358 Ops/s | |
test_reshape_pytree | 91.5410μs | 40.2051μs | 24.8725 KOps/s | 25.7875 KOps/s | |
test_reshape_td | 0.1101ms | 46.3588μs | 21.5709 KOps/s | 21.3064 KOps/s | |
test_view_pytree | 87.5140μs | 40.0018μs | 24.9989 KOps/s | 25.9618 KOps/s | |
test_view_td | 0.1306ms | 53.3409μs | 18.7473 KOps/s | 19.2487 KOps/s | |
test_unbind_pytree | 91.7720μs | 37.9932μs | 26.3205 KOps/s | 27.5309 KOps/s | |
test_unbind_td | 0.3216ms | 45.7928μs | 21.8375 KOps/s | 22.1872 KOps/s | |
test_split_pytree | 82.1640μs | 38.7472μs | 25.8083 KOps/s | 26.6452 KOps/s | |
test_split_td | 0.2033ms | 58.6899μs | 17.0387 KOps/s | 17.1213 KOps/s | |
test_add_pytree | 96.7310μs | 46.8397μs | 21.3494 KOps/s | 22.4622 KOps/s | |
test_add_td | 0.1658ms | 89.2628μs | 11.2029 KOps/s | 12.2443 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1542ms | 59.7138μs | 16.7465 KOps/s | 17.3623 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.3962ms | 0.1997ms | 5.0081 KOps/s | 5.6986 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1846ms | 57.8018μs | 17.3005 KOps/s | 17.8440 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2968ms | 0.1438ms | 6.9530 KOps/s | 6.9121 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 65.5930μs | 24.0129μs | 41.6444 KOps/s | 47.1960 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1656ms | 74.6927μs | 13.3882 KOps/s | 15.1157 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1968ms | 76.7510μs | 13.0292 KOps/s | 13.4275 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1293ms | 69.9979μs | 14.2861 KOps/s | 14.8034 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3870ms | 0.1834ms | 5.4532 KOps/s | 5.8249 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.4920ms | 0.2427ms | 4.1199 KOps/s | 5.2552 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1246ms | 49.9236μs | 20.0306 KOps/s | 21.5314 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1960ms | 79.1837μs | 12.6289 KOps/s | 14.0782 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.4037ms | 0.1796ms | 5.5685 KOps/s | 5.6730 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.4822ms | 0.2931ms | 3.4113 KOps/s | 3.4246 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.5525ms | 0.2802ms | 3.5684 KOps/s | 4.8574 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.6155ms | 0.1835ms | 5.4491 KOps/s | 5.8192 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1731ms | 75.0865μs | 13.3180 KOps/s | 15.9968 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1180ms | 49.1152μs | 20.3603 KOps/s | 20.9002 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.5159ms | 0.2333ms | 4.2860 KOps/s | 4.2307 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.2816ms | 0.1732ms | 5.7720 KOps/s | 5.7713 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1972ms | 0.1102ms | 9.0714 KOps/s | 9.7118 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1552ms | 79.0567μs | 12.6491 KOps/s | 17.2780 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1599ms | 79.7467μs | 12.5397 KOps/s | 13.0808 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1281ms | 70.4465μs | 14.1952 KOps/s | 14.4446 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.2905ms | 0.1925ms | 5.1940 KOps/s | 5.1037 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.3380ms | 1.7530ms | 570.4348 Ops/s | 608.9841 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.3001ms | 0.1913ms | 5.2286 KOps/s | 5.2751 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.3755ms | 1.0973ms | 911.3689 Ops/s | 898.9086 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.8140ms | 0.4165ms | 2.4010 KOps/s | 2.3875 KOps/s | |
test_compile_assign_and_add_stack[eager] | 4.4548ms | 4.2184ms | 237.0559 Ops/s | 272.4028 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1075ms | 34.1583μs | 29.2755 KOps/s | 29.4513 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 1.1431ms | 49.0107μs | 20.4037 KOps/s | 20.7049 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 78.3270μs | 30.5901μs | 32.6903 KOps/s | 33.9611 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 89.3140μs | 30.1788μs | 33.1359 KOps/s | 34.5231 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 84.0980μs | 30.1528μs | 33.1644 KOps/s | 34.2421 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.1375ms | 30.8045μs | 32.4628 KOps/s | 35.1459 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1407ms | 74.7249μs | 13.3824 KOps/s | 13.3555 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.4238ms | 29.4752μs | 33.9268 KOps/s | 35.2844 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1375ms | 68.6670μs | 14.5630 KOps/s | 14.5170 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 65.0520μs | 24.9948μs | 40.0083 KOps/s | 42.4864 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1324ms | 68.2543μs | 14.6511 KOps/s | 14.6519 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 80.7210μs | 24.2395μs | 41.2550 KOps/s | 42.5806 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.2010ms | 74.3842μs | 13.4437 KOps/s | 13.4987 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 1.2181ms | 28.4105μs | 35.1982 KOps/s | 35.6141 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1284ms | 68.3769μs | 14.6248 KOps/s | 14.8663 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 89.5810μs | 24.2711μs | 41.2013 KOps/s | 42.5452 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.7682ms | 70.6882μs | 14.1466 KOps/s | 14.9674 KOps/s | |
test_compile_indexing[int-pytree-eager] | 73.4470μs | 24.3050μs | 41.1438 KOps/s | 43.1024 KOps/s | |
test_mod_add[eager] | 67.6860μs | 26.5746μs | 37.6299 KOps/s | 38.9380 KOps/s | |
test_mod_add[compile] | 0.1275ms | 38.6500μs | 25.8732 KOps/s | 25.9469 KOps/s | |
test_mod_add[compile-overhead] | 0.1103ms | 38.0750μs | 26.2640 KOps/s | 25.9125 KOps/s | |
test_mod_wrap[eager] | 0.3974ms | 0.2129ms | 4.6981 KOps/s | 4.8251 KOps/s | |
test_mod_wrap[compile] | 0.4515ms | 0.2354ms | 4.2487 KOps/s | 4.2652 KOps/s | |
test_mod_wrap[compile-overhead] | 1.2495ms | 0.2477ms | 4.0376 KOps/s | 4.3002 KOps/s | |
test_mod_wrap_and_backward[eager] | 12.3820ms | 10.6665ms | 93.7514 Ops/s | 91.3536 Ops/s | |
test_mod_wrap_and_backward[compile] | 13.9107ms | 10.7753ms | 92.8049 Ops/s | 85.5322 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 12.4006ms | 10.8340ms | 92.3024 Ops/s | 81.5114 Ops/s | |
test_seq_add[eager] | 0.2246ms | 94.5870μs | 10.5723 KOps/s | 11.0526 KOps/s | |
test_seq_add[compile] | 0.1444ms | 64.3748μs | 15.5340 KOps/s | 15.5378 KOps/s | |
test_seq_add[compile-overhead] | 0.1605ms | 64.4678μs | 15.5116 KOps/s | 15.9670 KOps/s | |
test_seq_wrap[eager] | 0.6115ms | 0.3970ms | 2.5186 KOps/s | 2.5936 KOps/s | |
test_seq_wrap[compile] | 1.2852ms | 0.2725ms | 3.6692 KOps/s | 3.6500 KOps/s | |
test_seq_wrap[compile-overhead] | 1.2171ms | 0.2712ms | 3.6874 KOps/s | 3.3649 KOps/s | |
test_func_call_runtime[False-eager] | 0.9074ms | 0.5292ms | 1.8898 KOps/s | 1.8455 KOps/s | |
test_func_call_runtime[False-compile] | 0.9351ms | 0.5074ms | 1.9710 KOps/s | 1.9630 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.8807ms | 0.5054ms | 1.9786 KOps/s | 1.9653 KOps/s | |
test_func_call_runtime[True-eager] | 1.2326ms | 0.7630ms | 1.3106 KOps/s | 1.3123 KOps/s | |
test_func_call_runtime[True-compile] | 0.7451ms | 0.5153ms | 1.9408 KOps/s | 1.9162 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.8720ms | 0.5157ms | 1.9392 KOps/s | 1.9115 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.7117ms | 0.5280ms | 1.8938 KOps/s | 1.8673 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.6362ms | 0.5055ms | 1.9781 KOps/s | 1.9569 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.6367ms | 0.5055ms | 1.9782 KOps/s | 1.9593 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.2548ms | 0.9095ms | 1.0995 KOps/s | 1.1198 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.0941ms | 0.7535ms | 1.3271 KOps/s | 1.3299 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.1965ms | 0.7638ms | 1.3092 KOps/s | 1.3201 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 3.4389ms | 1.9951ms | 501.2325 Ops/s | 527.3464 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 2.7899ms | 1.9808ms | 504.8469 Ops/s | 509.2792 Ops/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 6.6689ms | 2.0762ms | 481.6406 Ops/s | 513.7446 Ops/s | |
test_distributed | 0.3691ms | 0.1241ms | 8.0603 KOps/s | 7.8696 KOps/s | |
test_tdmodule | 34.2040μs | 19.3201μs | 51.7596 KOps/s | 54.1781 KOps/s | |
test_tdmodule_dispatch | 60.1820μs | 39.1860μs | 25.5193 KOps/s | 27.6035 KOps/s | |
test_tdseq | 43.1110μs | 21.5145μs | 46.4803 KOps/s | 46.5453 KOps/s | |
test_tdseq_dispatch | 81.6430μs | 42.8608μs | 23.3314 KOps/s | 23.5672 KOps/s | |
test_instantiation_functorch | 1.7652ms | 1.6223ms | 616.4221 Ops/s | 623.3867 Ops/s | |
test_instantiation_td | 1.8757ms | 1.2031ms | 831.2062 Ops/s | 835.3457 Ops/s | |
test_exec_functorch | 0.2944ms | 0.1894ms | 5.2785 KOps/s | 5.1546 KOps/s | |
test_exec_functional_call | 0.2969ms | 0.1790ms | 5.5868 KOps/s | 5.5183 KOps/s | |
test_exec_td | 0.3819ms | 0.2068ms | 4.8359 KOps/s | 5.7331 KOps/s | |
test_exec_td_decorator | 1.1741ms | 0.2398ms | 4.1704 KOps/s | 4.3629 KOps/s | |
test_vmap_mlp_speed[True-True] | 1.9191ms | 0.7025ms | 1.4235 KOps/s | 1.5104 KOps/s | |
test_vmap_mlp_speed[True-False] | 0.9944ms | 0.6884ms | 1.4526 KOps/s | 1.5303 KOps/s | |
test_vmap_mlp_speed[False-True] | 0.8796ms | 0.5424ms | 1.8437 KOps/s | 1.9976 KOps/s | |
test_vmap_mlp_speed[False-False] | 0.8445ms | 0.5395ms | 1.8537 KOps/s | 1.9855 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.9089ms | 0.6521ms | 1.5334 KOps/s | 1.5796 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.0291ms | 0.6507ms | 1.5369 KOps/s | 1.5732 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.8402ms | 0.5336ms | 1.8739 KOps/s | 1.9201 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7785ms | 0.5327ms | 1.8772 KOps/s | 1.9214 KOps/s | |
test_to_module_speed[True] | 1.5057ms | 1.4113ms | 708.5466 Ops/s | 759.1578 Ops/s | |
test_to_module_speed[False] | 1.6392ms | 1.3667ms | 731.6943 Ops/s | 783.2558 Ops/s | |
test_tc_init | 87.3140μs | 47.1565μs | 21.2060 KOps/s | 23.4589 KOps/s | |
test_tc_init_nested | 0.1781ms | 93.9291μs | 10.6463 KOps/s | 12.4837 KOps/s | |
test_tc_first_layer_tensor | 13.4650μs | 1.5876μs | 629.8704 KOps/s | 664.2696 KOps/s | |
test_tc_first_layer_nontensor | 40.3850μs | 4.7671μs | 209.7718 KOps/s | 213.5247 KOps/s | |
test_tc_second_layer_tensor | 21.0000μs | 2.8561μs | 350.1292 KOps/s | 347.2367 KOps/s | |
test_tc_second_layer_nontensor | 49.2620μs | 6.1475μs | 162.6671 KOps/s | 164.7116 KOps/s | |
test_unbind | 0.4703s | 15.0249ms | 66.5560 Ops/s | 137.4057 Ops/s | |
test_full_like | 8.2996ms | 7.5221ms | 132.9412 Ops/s | 85.1622 Ops/s | |
test_zeros_like | 3.4132ms | 2.8825ms | 346.9211 Ops/s | 128.4417 Ops/s | |
test_ones_like | 3.6936ms | 3.2817ms | 304.7246 Ops/s | 130.2160 Ops/s | |
test_clone | 5.8623ms | 5.2524ms | 190.3908 Ops/s | 105.9270 Ops/s | |
test_squeeze | 65.0320μs | 12.6692μs | 78.9313 KOps/s | 81.9600 KOps/s | |
test_unsqueeze | 0.3572ms | 98.5459μs | 10.1476 KOps/s | 10.6506 KOps/s | |
test_split | 0.3821ms | 0.1978ms | 5.0566 KOps/s | 5.2849 KOps/s | |
test_permute | 0.3670ms | 0.2203ms | 4.5403 KOps/s | 4.5500 KOps/s | |
test_stack | 32.2810ms | 25.2936ms | 39.5357 Ops/s | 40.9458 Ops/s | |
test_cat | 28.3752ms | 24.7969ms | 40.3277 Ops/s | 41.5343 Ops/s |
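
The Change column is empty in these tables. As a rough aid for reading them, the sketch below computes a relative change from the Ops and Ops on Repo HEAD cells. The formula `(ops - head) / head` and the helper names `parse_ops` and `relative_change` are assumptions made for illustration; they are not taken from the benchmark tooling that produced the tables.

```python
import re

# Multipliers for the throughput units that appear in the tables.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '39.3332 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops_cell: str, head_cell: str) -> float:
    """Relative change of the PR throughput versus repo HEAD.

    Assumed definition: (ops - head) / head, so a negative value means
    the PR run was slower than HEAD for that benchmark.
    """
    ops = parse_ops(ops_cell)
    head = parse_ops(head_cell)
    return (ops - head) / head


# Values copied from the test_plain_set_nested row of the table above.
change = relative_change("39.3332 KOps/s", "50.6356 KOps/s")
print(f"test_plain_set_nested: {change:+.1%}")
```

Normalizing units first matters because some rows mix scales between runs (for example `KOps/s` in one table and `MOps/s` in the other for `test_values`).
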

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 0.1295ms | 17.7268μs | 56.4117 KOps/s | 73.1101 KOps/s | |
test_plain_set_stack_nested | 49.3010μs | 18.2956μs | 54.6578 KOps/s | 72.4975 KOps/s | |
test_plain_set_nested_inplace | 58.4910μs | 19.0279μs | 52.5545 KOps/s | 67.6940 KOps/s | |
test_plain_set_stack_nested_inplace | 63.6220μs | 19.1253μs | 52.2868 KOps/s | 68.0528 KOps/s | |
test_items | 33.4410μs | 2.8844μs | 346.6866 KOps/s | 346.2110 KOps/s | |
test_items_nested | 0.3963ms | 0.3377ms | 2.9613 KOps/s | 3.0178 KOps/s | |
test_items_nested_locked | 0.4000ms | 0.3385ms | 2.9538 KOps/s | 3.0004 KOps/s | |
test_items_nested_leaf | 93.9020μs | 62.0087μs | 16.1268 KOps/s | 17.9075 KOps/s | |
test_items_stack_nested | 0.4678ms | 0.3415ms | 2.9279 KOps/s | 3.0183 KOps/s | |
test_items_stack_nested_leaf | 0.1078ms | 63.4636μs | 15.7571 KOps/s | 17.6214 KOps/s | |
test_items_stack_nested_locked | 0.4158ms | 0.3421ms | 2.9233 KOps/s | 3.0278 KOps/s | |
test_keys | 30.0200μs | 3.4248μs | 291.9882 KOps/s | 268.1407 KOps/s | |
test_keys_nested | 97.7920μs | 70.8110μs | 14.1221 KOps/s | 17.7196 KOps/s | |
test_keys_nested_locked | 2.5311ms | 76.3900μs | 13.0907 KOps/s | 16.0578 KOps/s | |
test_keys_nested_leaf | 89.8420μs | 61.7497μs | 16.1944 KOps/s | 21.1920 KOps/s | |
test_keys_stack_nested | 0.1043ms | 71.6869μs | 13.9495 KOps/s | 17.9184 KOps/s | |
test_keys_stack_nested_leaf | 99.6220μs | 63.0468μs | 15.8612 KOps/s | 20.8985 KOps/s | |
test_keys_stack_nested_locked | 0.1141ms | 77.2788μs | 12.9402 KOps/s | 16.4976 KOps/s | |
test_values | 5.2618μs | 0.8399μs | 1.1906 MOps/s | 1.1818 MOps/s | |
test_values_nested | 78.3820μs | 48.7115μs | 20.5290 KOps/s | 24.3705 KOps/s | |
test_values_nested_locked | 94.0320μs | 49.9304μs | 20.0279 KOps/s | 23.3115 KOps/s | |
test_values_nested_leaf | 74.7210μs | 42.4158μs | 23.5761 KOps/s | 28.1672 KOps/s | |
test_values_stack_nested | 83.5920μs | 49.4504μs | 20.2223 KOps/s | 23.9571 KOps/s | |
test_values_stack_nested_leaf | 70.6620μs | 43.0671μs | 23.2196 KOps/s | 28.1014 KOps/s | |
test_values_stack_nested_locked | 0.1049ms | 51.0987μs | 19.5700 KOps/s | 23.0356 KOps/s | |
test_membership | 1.5810μs | 0.5084μs | 1.9669 MOps/s | 1.9920 MOps/s | |
test_membership_nested | 17.6900μs | 1.8827μs | 531.1531 KOps/s | 513.3789 KOps/s | |
test_membership_nested_leaf | 15.5605μs | 1.8821μs | 531.3137 KOps/s | 515.2510 KOps/s | |
test_membership_stacked_nested | 27.7410μs | 1.9428μs | 514.7146 KOps/s | 489.0889 KOps/s | |
test_membership_stacked_nested_leaf | 26.3700μs | 1.9201μs | 520.8064 KOps/s | 493.4820 KOps/s | |
test_membership_nested_last | 33.1500μs | 2.9915μs | 334.2791 KOps/s | 349.7596 KOps/s | |
test_membership_nested_leaf_last | 36.9510μs | 2.9419μs | 339.9145 KOps/s | 351.1857 KOps/s | |
test_membership_stacked_nested_last | 42.7710μs | 2.9597μs | 337.8706 KOps/s | 126.4678 KOps/s | |
test_membership_stacked_nested_leaf_last | 28.9000μs | 2.9740μs | 336.2476 KOps/s | 127.0423 KOps/s | |
test_nested_getleaf | 29.7210μs | 6.1183μs | 163.4446 KOps/s | 163.2544 KOps/s | |
test_nested_get | 47.1010μs | 5.8565μs | 170.7494 KOps/s | 172.5090 KOps/s | |
test_stacked_getleaf | 39.8110μs | 6.0496μs | 165.3003 KOps/s | 163.4084 KOps/s | |
test_stacked_get | 37.0610μs | 5.7424μs | 174.1427 KOps/s | 172.7656 KOps/s | |
test_nested_getitemleaf | 46.2410μs | 6.1657μs | 162.1867 KOps/s | 161.9558 KOps/s | |
test_nested_getitem | 29.1110μs | 5.8380μs | 171.2927 KOps/s | 170.7546 KOps/s | |
test_stacked_getitemleaf | 50.8500μs | 6.0911μs | 164.1739 KOps/s | 161.7063 KOps/s | |
test_stacked_getitem | 41.1410μs | 5.7902μs | 172.7047 KOps/s | 171.7185 KOps/s | |
test_lock_nested | 6.9541ms | 0.4413ms | 2.2663 KOps/s | 2.3240 KOps/s | |
test_lock_stack_nested | 0.4517ms | 0.3912ms | 2.5565 KOps/s | 2.6693 KOps/s | |
test_unlock_nested | 0.7649ms | 0.3671ms | 2.7239 KOps/s | 2.7464 KOps/s | |
test_unlock_stack_nested | 0.3858ms | 0.3296ms | 3.0339 KOps/s | 3.1937 KOps/s | |
test_flatten_speed | 0.1514ms | 76.3147μs | 13.1036 KOps/s | 14.4747 KOps/s | |
test_unflatten_speed | 0.3729ms | 0.3223ms | 3.1029 KOps/s | 3.5131 KOps/s | |
test_common_ops | 1.6654ms | 1.3511ms | 740.1272 Ops/s | 807.7316 Ops/s | |
test_creation | 23.7600μs | 1.4831μs | 674.2649 KOps/s | 679.0048 KOps/s | |
test_creation_empty | 52.2310μs | 17.9995μs | 55.5571 KOps/s | 72.2246 KOps/s | |
test_creation_nested_1 | 53.4010μs | 19.8246μs | 50.4424 KOps/s | 64.8024 KOps/s | |
test_creation_nested_2 | 54.3110μs | 22.2775μs | 44.8884 KOps/s | 55.7237 KOps/s | |
test_clone | 81.9210μs | 28.5557μs | 35.0192 KOps/s | 33.1163 KOps/s | |
test_getitem[int] | 1.2844ms | 16.4376μs | 60.8362 KOps/s | 59.0853 KOps/s | |
test_getitem[slice_int] | 0.1207ms | 28.4941μs | 35.0950 KOps/s | 34.4121 KOps/s | |
test_getitem[range] | 0.2442ms | 0.1128ms | 8.8629 KOps/s | 8.8091 KOps/s | |
test_getitem[tuple] | 0.1296ms | 24.2210μs | 41.2865 KOps/s | 39.9714 KOps/s | |
test_getitem[list] | 0.2001ms | 0.1004ms | 9.9634 KOps/s | 9.8696 KOps/s | |
test_setitem_dim[int] | 69.8310μs | 45.9728μs | 21.7520 KOps/s | 21.2424 KOps/s | |
test_setitem_dim[slice_int] | 94.8720μs | 68.7276μs | 14.5502 KOps/s | 14.3918 KOps/s | |
test_setitem_dim[range] | 0.1802ms | 0.1356ms | 7.3746 KOps/s | 7.6192 KOps/s | |
test_setitem_dim[tuple] | 99.4220μs | 65.2238μs | 15.3318 KOps/s | 15.9912 KOps/s | |
test_setitem | 96.3520μs | 46.0683μs | 21.7069 KOps/s | 23.9023 KOps/s | |
test_set | 0.1266ms | 45.5377μs | 21.9598 KOps/s | 24.6599 KOps/s | |
test_set_shared | 0.3569ms | 54.9143μs | 18.2102 KOps/s | 19.2565 KOps/s | |
test_update | 94.6520μs | 53.8904μs | 18.5562 KOps/s | 20.4450 KOps/s | |
test_update_nested | 96.2120μs | 60.0308μs | 16.6581 KOps/s | 17.4711 KOps/s | |
test_update__nested | 0.1007ms | 60.6191μs | 16.4965 KOps/s | 16.0931 KOps/s | |
test_set_nested | 88.4420μs | 44.0309μs | 22.7113 KOps/s | 23.3141 KOps/s | |
test_set_nested_new | 88.1320μs | 47.9703μs | 20.8463 KOps/s | 21.2548 KOps/s | |
test_select | 0.1075ms | 60.7252μs | 16.4676 KOps/s | 16.5350 KOps/s | |
test_select_nested | 79.5120μs | 41.5630μs | 24.0599 KOps/s | 23.6543 KOps/s | |
test_exclude_nested | 95.0320μs | 58.1005μs | 17.2115 KOps/s | 16.7956 KOps/s | |
test_empty[True] | 0.3253ms | 0.2557ms | 3.9112 KOps/s | 4.0625 KOps/s | |
test_empty[False] | 3.8441μs | 0.7359μs | 1.3590 MOps/s | 1.3498 MOps/s | |
test_to | 55.3710μs | 26.4093μs | 37.8654 KOps/s | 39.1832 KOps/s | |
test_to_nonblocking | 60.4610μs | 24.2927μs | 41.1646 KOps/s | 40.5563 KOps/s | |
test_unbind_speed | 1.3502ms | 0.2793ms | 3.5800 KOps/s | 3.4322 KOps/s | |
test_unbind_speed_stack0 | 0.4206ms | 0.2808ms | 3.5610 KOps/s | 3.5925 KOps/s | |
test_unbind_speed_stack1 | 91.4909ms | 0.7159ms | 1.3968 KOps/s | 1.4244 KOps/s | |
test_split | 94.1870ms | 2.2209ms | 450.2599 Ops/s | 444.3765 Ops/s | |
test_chunk | 94.2362ms | 2.2102ms | 452.4537 Ops/s | 440.5474 Ops/s | |
test_creation[device0] | 0.3472ms | 0.1279ms | 7.8198 KOps/s | 7.7716 KOps/s | |
test_creation_from_tensor | 0.3711ms | 0.1349ms | 7.4156 KOps/s | 7.6562 KOps/s | |
test_add_one[memmap_tensor0] | 0.1672ms | 8.6489μs | 115.6216 KOps/s | 103.9359 KOps/s | |
test_contiguous[memmap_tensor0] | 24.5210μs | 2.2164μs | 451.1879 KOps/s | 446.2074 KOps/s | |
test_stack[memmap_tensor0] | 35.4300μs | 6.8832μs | 145.2813 KOps/s | 144.7319 KOps/s | |
test_memmaptd_index | 1.1076ms | 0.4482ms | 2.2311 KOps/s | 2.2441 KOps/s | |
test_memmaptd_index_astensor | 0.7627ms | 0.5157ms | 1.9389 KOps/s | 1.9654 KOps/s | |
test_memmaptd_index_op | 1.5007ms | 1.0867ms | 920.2104 Ops/s | 970.9678 Ops/s | |
test_serialize_model | 0.1325s | 0.1310s | 7.6331 Ops/s | 7.6698 Ops/s | |
test_serialize_model_pickle | 1.3477s | 1.2126s | 0.8246 Ops/s | 0.8237 Ops/s | |
test_serialize_weights | 0.2212s | 0.1437s | 6.9593 Ops/s | 7.7270 Ops/s | |
test_serialize_weights_returnearly | 0.2134s | 55.6685ms | 17.9635 Ops/s | 16.0743 Ops/s | |
test_serialize_weights_pickle | 1.3739s | 1.2179s | 0.8211 Ops/s | 0.8211 Ops/s | |
test_reshape_pytree | 67.4020μs | 36.2227μs | 27.6070 KOps/s | 27.0232 KOps/s | |
test_reshape_td | 72.3620μs | 41.4690μs | 24.1144 KOps/s | 23.7502 KOps/s | |
test_view_pytree | 69.4020μs | 35.3752μs | 28.2684 KOps/s | 27.8543 KOps/s | |
test_view_td | 85.6420μs | 45.3069μs | 22.0717 KOps/s | 21.0734 KOps/s | |
test_unbind_pytree | 64.4210μs | 34.4974μs | 28.9877 KOps/s | 28.5639 KOps/s | |
test_unbind_td | 0.5412ms | 42.3107μs | 23.6347 KOps/s | 22.9543 KOps/s | |
test_split_pytree | 89.8110μs | 45.6109μs | 21.9246 KOps/s | 21.4859 KOps/s | |
test_split_td | 95.6492ms | 66.6819μs | 14.9966 KOps/s | 17.1924 KOps/s | |
test_add_pytree | 0.1026ms | 55.7444μs | 17.9390 KOps/s | 17.7010 KOps/s | |
test_add_td | 0.1438ms | 98.7911μs | 10.1224 KOps/s | 11.4198 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.3133ms | 0.1617ms | 6.1861 KOps/s | 4.5973 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.2828ms | 0.1596ms | 6.2639 KOps/s | 6.6061 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.2114ms | 0.1444ms | 6.9238 KOps/s | 6.6250 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2420ms | 0.1818ms | 5.4999 KOps/s | 5.3214 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.1233ms | 22.0884μs | 45.2726 KOps/s | 46.2987 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 82.6110μs | 48.7647μs | 20.5066 KOps/s | 22.6726 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.2246ms | 65.1696μs | 15.3446 KOps/s | 15.4837 KOps/s | |
test_compile_copy_nested[pytree-eager] | 80.3920μs | 49.5206μs | 20.1936 KOps/s | 20.2051 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.3621ms | 0.3195ms | 3.1296 KOps/s | 3.1225 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3254ms | 0.2337ms | 4.2789 KOps/s | 4.8000 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1791ms | 0.1280ms | 7.8114 KOps/s | 7.6835 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1152ms | 63.5670μs | 15.7314 KOps/s | 16.8396 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.4139ms | 0.3184ms | 3.1406 KOps/s | 3.1364 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6583ms | 0.6110ms | 1.6368 KOps/s | 1.5773 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.3535ms | 0.2809ms | 3.5599 KOps/s | 4.0144 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.4729ms | 0.3215ms | 3.1105 KOps/s | 3.0947 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1310ms | 74.1377μs | 13.4884 KOps/s | 14.4687 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1835ms | 0.1294ms | 7.7290 KOps/s | 7.5376 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6366ms | 0.5267ms | 1.8987 KOps/s | 1.8780 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.3915ms | 0.3178ms | 3.1465 KOps/s | 3.1410 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1037ms | 19.1548μs | 52.2061 KOps/s | 55.8024 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 74.2110μs | 37.7709μs | 26.4754 KOps/s | 36.3842 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1159ms | 69.7753μs | 14.3317 KOps/s | 14.2740 KOps/s | |
test_compile_copy_flat[pytree-eager] | 88.6320μs | 51.3590μs | 19.4708 KOps/s | 19.2397 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.3584ms | 0.8303ms | 1.2043 KOps/s | 1.0861 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.3067ms | 3.2344ms | 309.1746 Ops/s | 294.3255 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.2944ms | 0.8142ms | 1.2282 KOps/s | 1.1108 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.3044ms | 3.1902ms | 313.4570 Ops/s | 298.2858 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1500ms | 0.1083ms | 9.2356 KOps/s | 8.7228 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.1905ms | 62.8071μs | 15.9218 KOps/s | 15.1936 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1832ms | 0.1027ms | 9.7371 KOps/s | 9.5832 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 85.2820μs | 44.8216μs | 22.3107 KOps/s | 22.5693 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1464ms | 0.1077ms | 9.2818 KOps/s | 9.4690 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 91.9420μs | 45.4462μs | 22.0041 KOps/s | 22.6847 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1856ms | 0.1370ms | 7.2993 KOps/s | 7.1428 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1513ms | 26.1657μs | 38.2179 KOps/s | 37.6465 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1777ms | 0.1310ms | 7.6341 KOps/s | 7.4976 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 50.0210μs | 20.9494μs | 47.7340 KOps/s | 45.9984 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1726ms | 0.1318ms | 7.5877 KOps/s | 7.4484 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 52.3710μs | 21.0618μs | 47.4793 KOps/s | 46.3085 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1794ms | 0.1375ms | 7.2730 KOps/s | 7.0845 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.4948ms | 25.6478μs | 38.9898 KOps/s | 37.7604 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2134ms | 0.1335ms | 7.4927 KOps/s | 7.4363 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 49.9710μs | 21.0072μs | 47.6028 KOps/s | 46.2095 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1675ms | 0.1313ms | 7.6179 KOps/s | 7.4466 KOps/s | |
test_compile_indexing[int-pytree-eager] | 49.3610μs | 20.8915μs | 47.8664 KOps/s | 46.2590 KOps/s | |
test_mod_add[eager] | 83.3310μs | 33.1822μs | 30.1367 KOps/s | 32.2527 KOps/s | |
test_mod_add[compile] | 0.1179ms | 70.5028μs | 14.1838 KOps/s | 13.6951 KOps/s | |
test_mod_add[compile-overhead] | 0.2651ms | 0.1350ms | 7.4101 KOps/s | 6.5678 KOps/s | |
test_mod_wrap[eager] | 0.8864ms | 0.7775ms | 1.2862 KOps/s | 1.2622 KOps/s | |
test_mod_wrap[compile] | 2.0354ms | 0.8310ms | 1.2033 KOps/s | 1.1883 KOps/s | |
test_mod_wrap[compile-overhead] | 4.9553ms | 3.0873ms | 323.9059 Ops/s | 322.5178 Ops/s | |
test_mod_wrap_and_backward[eager] | 4.5499ms | 4.0493ms | 246.9546 Ops/s | 239.9512 Ops/s | |
test_mod_wrap_and_backward[compile] | 4.3180ms | 4.0886ms | 244.5843 Ops/s | 241.1546 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3826ms | 0.9755ms | 1.0251 KOps/s | 978.4724 Ops/s | |
test_seq_add[eager] | 0.1380ms | 0.1007ms | 9.9342 KOps/s | 10.0315 KOps/s | |
test_seq_add[compile] | 0.4795ms | 82.3602μs | 12.1418 KOps/s | 12.0535 KOps/s | |
test_seq_add[compile-overhead] | 0.5560ms | 0.1142ms | 8.7549 KOps/s | 8.5697 KOps/s | |
test_seq_wrap[eager] | 1.3484ms | 0.9326ms | 1.0723 KOps/s | 1.0764 KOps/s | |
test_seq_wrap[compile] | 0.9711ms | 0.8546ms | 1.1701 KOps/s | 1.1605 KOps/s | |
test_seq_wrap[compile-overhead] | 0.6096ms | 0.2224ms | 4.4957 KOps/s | 4.4364 KOps/s | |
test_func_call_runtime[False-eager] | 2.7758ms | 2.3541ms | 424.7984 Ops/s | 413.1138 Ops/s | |
test_func_call_runtime[False-compile] | 2.8150ms | 2.3735ms | 421.3265 Ops/s | 413.5533 Ops/s | |
test_func_call_runtime[False-compile-overhead] | 0.7599ms | 0.3610ms | 2.7700 KOps/s | 2.7188 KOps/s | |
test_func_call_runtime[True-eager] | 2.9207ms | 2.5119ms | 398.1100 Ops/s | 389.5727 Ops/s | |
test_func_call_runtime[True-compile] | 3.1641ms | 2.3883ms | 418.7025 Ops/s | 411.2629 Ops/s | |
test_func_call_runtime[True-compile-overhead] | 0.4331ms | 0.3831ms | 2.6100 KOps/s | 2.5990 KOps/s | |
test_func_call_cm_runtime[False-eager] | 2.7606ms | 2.3350ms | 428.2624 Ops/s | 414.9328 Ops/s | |
test_func_call_cm_runtime[False-compile] | 2.7717ms | 2.3863ms | 419.0569 Ops/s | 413.4612 Ops/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.4118ms | 0.3643ms | 2.7449 KOps/s | 2.7451 KOps/s | |
test_func_call_cm_runtime[True-eager] | 3.0048ms | 2.6192ms | 381.8023 Ops/s | 373.0363 Ops/s | |
test_func_call_cm_runtime[True-compile] | 2.8320ms | 2.4250ms | 412.3788 Ops/s | 405.7524 Ops/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.5519ms | 0.4079ms | 2.4514 KOps/s | 2.4196 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 4.2439ms | 3.7594ms | 266.0017 Ops/s | 263.3169 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 2.6707ms | 2.4532ms | 407.6304 Ops/s | 404.8802 Ops/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5023ms | 0.4115ms | 2.4301 KOps/s | 2.3937 KOps/s | |
test_distributed | 3.3686ms | 0.1811ms | 5.5214 KOps/s | 8.3881 KOps/s | |
test_tdmodule | 0.2895ms | 16.7653μs | 59.6468 KOps/s | 69.5160 KOps/s | |
test_tdmodule_dispatch | 61.2010μs | 32.2316μs | 31.0254 KOps/s | 35.7074 KOps/s | |
test_tdseq | 26.5310μs | 17.4486μs | 57.3112 KOps/s | 65.3221 KOps/s | |
test_tdseq_dispatch | 58.4610μs | 35.0457μs | 28.5342 KOps/s | 32.5858 KOps/s | |
test_instantiation_functorch | 2.0232ms | 1.8657ms | 535.9858 Ops/s | 521.9969 Ops/s | |
test_instantiation_td | 1.8124ms | 1.2044ms | 830.3211 Ops/s | 814.7713 Ops/s | |
test_exec_functorch | 1.0384ms | 0.9787ms | 1.0218 KOps/s | 1.0085 KOps/s | |
test_exec_functional_call | 1.2243ms | 1.0018ms | 998.2288 Ops/s | 1.0032 KOps/s | |
test_exec_td | 1.1640ms | 1.0199ms | 980.4649 Ops/s | 976.1708 Ops/s | |
test_exec_td_decorator | 1.7097ms | 1.0447ms | 957.1850 Ops/s | 937.6960 Ops/s | |
test_vmap_mlp_speed[True-True] | 1.6668ms | 1.2800ms | 781.2582 Ops/s | 793.0807 Ops/s | |
test_vmap_mlp_speed[True-False] | 1.6573ms | 1.2764ms | 783.4237 Ops/s | 796.2151 Ops/s | |
test_vmap_mlp_speed[False-True] | 1.5374ms | 1.1658ms | 857.7574 Ops/s | 873.1531 Ops/s | |
test_vmap_mlp_speed[False-False] | 1.5713ms | 1.1678ms | 856.3278 Ops/s | 876.2019 Ops/s | |
test_vmap_mlp_speed_decorator[True-True] | 2.0079ms | 1.2480ms | 801.2851 Ops/s | 805.6878 Ops/s | |
test_vmap_mlp_speed_decorator[True-False] | 1.6461ms | 1.2514ms | 799.1183 Ops/s | 804.6264 Ops/s | |
test_vmap_mlp_speed_decorator[False-True] | 1.5467ms | 1.1647ms | 858.5989 Ops/s | 863.0946 Ops/s | |
test_vmap_mlp_speed_decorator[False-False] | 1.5597ms | 1.1651ms | 858.2823 Ops/s | 864.3536 Ops/s | |
test_vmap_transformer_speed[True-True] | 13.4664ms | 13.1140ms | 76.2545 Ops/s | 76.2519 Ops/s | |
test_vmap_transformer_speed[True-False] | 13.5502ms | 13.1188ms | 76.2264 Ops/s | 76.2210 Ops/s | |
test_vmap_transformer_speed[False-True] | 13.2576ms | 12.8992ms | 77.5244 Ops/s | 77.6295 Ops/s | |
test_vmap_transformer_speed[False-False] | 13.4001ms | 12.9372ms | 77.2964 Ops/s | 77.8225 Ops/s | |
test_vmap_transformer_speed_decorator[True-True] | 34.3837ms | 33.8453ms | 29.5462 Ops/s | 29.4542 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 34.0018ms | 33.7567ms | 29.6237 Ops/s | 29.4860 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 34.1620ms | 33.6649ms | 29.7046 Ops/s | 29.5775 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 33.9425ms | 33.6803ms | 29.6909 Ops/s | 29.5524 Ops/s | |
test_to_module_speed[True] | 1.5207ms | 1.0005ms | 999.4692 Ops/s | 1.0562 KOps/s | |
test_to_module_speed[False] | 1.3625ms | 0.9659ms | 1.0353 KOps/s | 1.0848 KOps/s | |
test_tc_init | 72.7820μs | 36.3091μs | 27.5413 KOps/s | 30.4177 KOps/s | |
test_tc_init_nested | 0.4619ms | 73.9677μs | 13.5194 KOps/s | 14.4449 KOps/s | |
test_tc_first_layer_tensor | 54.1596μs | 0.6697μs | 1.4932 MOps/s | 1.5036 MOps/s | |
test_tc_first_layer_nontensor | 28.9200μs | 2.2379μs | 446.8575 KOps/s | 455.2834 KOps/s | |
test_tc_second_layer_tensor | 95.7270μs | 1.3614μs | 734.5636 KOps/s | 737.3980 KOps/s | |
test_tc_second_layer_nontensor | 0.1362ms | 2.9186μs | 342.6275 KOps/s | 344.3621 KOps/s | |
test_unbind | 0.1975s | 12.1025ms | 82.6277 Ops/s | 92.0460 Ops/s | |
test_full_like | 0.9463ms | 0.5735ms | 1.7437 KOps/s | 1.7471 KOps/s | |
test_zeros_like | 0.2582ms | 0.1979ms | 5.0539 KOps/s | 5.0555 KOps/s | |
test_ones_like | 0.5241ms | 0.1978ms | 5.0547 KOps/s | 5.0585 KOps/s | |
test_clone | 0.7773ms | 0.4139ms | 2.4162 KOps/s | 2.4166 KOps/s | |
test_squeeze | 34.4210μs | 9.9865μs | 100.1349 KOps/s | 101.1337 KOps/s | |
test_unsqueeze | 0.2893ms | 76.1952μs | 13.1242 KOps/s | 13.2861 KOps/s | |
test_split | 0.5291ms | 0.1612ms | 6.2030 KOps/s | 6.1978 KOps/s | |
test_permute | 0.5698ms | 0.1797ms | 5.5633 KOps/s | 5.4873 KOps/s | |
test_stack | 1.2557ms | 0.8583ms | 1.1650 KOps/s | 1.1482 KOps/s | |
test_cat | 1.4038ms | 1.2315ms | 812.0058 Ops/s | 812.0496 Ops/s |
tensordict/nn/cudagraphs.py
Outdated
@@ -222,6 +222,8 @@ def _call(
        return result

    if not self._has_cuda or self.counter < self._warmup - 1:
        # We must clone the data because providing non-contiguous data will fail later when we clone
        tensordict = self._tensordict = tensordict.clone()
In this statement, we are making a clone of `tensordict` and assigning it to `tensordict`? (ignoring the assignment to `self._tensordict` for now)
Yep.
I'm doing that because otherwise you could have views in your tensordict and compile with inputs that are views. Then, when the CUDA graph is captured, the inputs are cloned, so they are not views anymore! Compile will then recompile and the whole warmup will be useless.
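To make the point about views concrete, here is a minimal sketch in plain PyTorch (not the tensordict code itself) of how a clone differs from the view it was made from. The tensor names and the `describe` helper are hypothetical, and whether a given difference actually triggers a recompile depends on the guards `torch.compile` installs, so treat this as an illustration of the mismatch rather than a reproduction of it.

```python
import torch

# A strided slice is a view: it shares storage with its source tensor.
base = torch.arange(12.0).reshape(3, 4)
view = base[:, ::2]
# clone() allocates fresh storage; the result is no longer a view.
cloned = view.clone()


def describe(t: torch.Tensor) -> dict:
    """Collect a few layout properties that differ between a view and its clone."""
    return {
        "is_view": t._base is not None,  # `_base` is None for non-view tensors
        "contiguous": t.is_contiguous(),
        "stride": t.stride(),
    }


view_info = describe(view)
clone_info = describe(cloned)
print("view :", view_info)
print("clone:", clone_info)
```

The values are identical but the layout is not, which is why warming up on already-cloned inputs, as the diff does, keeps the warmup and the later cloned calls consistent.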
ohh interesting! Thank you for the explanation.