[Quality] Fewer recompiles with tensordict #1015

Merged: 7 commits merged on Oct 3, 2024

@vmoens (Contributor) commented Sep 30, 2024

No description provided.

@facebook-github-bot added the CLA Signed label on Sep 30, 2024

github-actions bot commented Sep 30, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 222. Improved: 11. Worsened: 72.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_plain_set_nested | 52.5280μs | 25.4238μs | 39.3332 KOps/s | 50.6356 KOps/s | **-22.32%** |
| test_plain_set_stack_nested | 54.2810μs | 25.5983μs | 39.0652 KOps/s | 51.0738 KOps/s | **-23.51%** |
| test_plain_set_nested_inplace | 87.6840μs | 27.8584μs | 35.8958 KOps/s | 46.4838 KOps/s | **-22.78%** |
| test_plain_set_stack_nested_inplace | 67.7560μs | 27.8675μs | 35.8841 KOps/s | 46.7577 KOps/s | **-23.26%** |
| test_items | 22.8430μs | 4.2284μs | 236.4967 KOps/s | 237.9640 KOps/s | -0.62% |
| test_items_nested | 1.0146ms | 0.3888ms | 2.5718 KOps/s | 2.7244 KOps/s | **-5.60%** |
| test_items_nested_locked | 0.7068ms | 0.3880ms | 2.5775 KOps/s | 2.7401 KOps/s | **-5.93%** |
| test_items_nested_leaf | 0.1574ms | 80.9068μs | 12.3599 KOps/s | 14.6150 KOps/s | **-15.43%** |
| test_items_stack_nested | 0.6043ms | 0.3916ms | 2.5536 KOps/s | 2.6864 KOps/s | -4.94% |
| test_items_stack_nested_leaf | 0.2059ms | 81.9896μs | 12.1967 KOps/s | 14.2572 KOps/s | **-14.45%** |
| test_items_stack_nested_locked | 0.8242ms | 0.3897ms | 2.5663 KOps/s | 2.7190 KOps/s | **-5.62%** |
| test_keys | 17.1620μs | 3.5241μs | 283.7600 KOps/s | 283.2206 KOps/s | +0.19% |
| test_keys_nested | 0.2509ms | 0.1374ms | 7.2805 KOps/s | 10.0541 KOps/s | **-27.59%** |
| test_keys_nested_locked | 0.7868ms | 0.1427ms | 7.0057 KOps/s | 9.5525 KOps/s | **-26.66%** |
| test_keys_nested_leaf | 0.2207ms | 0.1197ms | 8.3540 KOps/s | 12.0936 KOps/s | **-30.92%** |
| test_keys_stack_nested | 0.2436ms | 0.1375ms | 7.2723 KOps/s | 10.0583 KOps/s | **-27.70%** |
| test_keys_stack_nested_leaf | 0.2105ms | 0.1192ms | 8.3889 KOps/s | 11.8403 KOps/s | **-29.15%** |
| test_keys_stack_nested_locked | 0.2999ms | 0.1416ms | 7.0615 KOps/s | 9.5520 KOps/s | **-26.07%** |
| test_values | 14.8156μs | 1.0374μs | 963.9754 KOps/s | 948.8590 KOps/s | +1.59% |
| test_values_nested | 0.1812ms | 93.3626μs | 10.7109 KOps/s | 13.1928 KOps/s | **-18.81%** |
| test_values_nested_locked | 0.1568ms | 91.9285μs | 10.8780 KOps/s | 13.1873 KOps/s | **-17.51%** |
| test_values_nested_leaf | 0.1452ms | 79.2201μs | 12.6231 KOps/s | 15.9650 KOps/s | **-20.93%** |
| test_values_stack_nested | 0.1936ms | 92.2655μs | 10.8383 KOps/s | 13.0281 KOps/s | **-16.81%** |
| test_values_stack_nested_leaf | 0.1482ms | 78.9214μs | 12.6708 KOps/s | 16.0242 KOps/s | **-20.93%** |
| test_values_stack_nested_locked | 0.1771ms | 93.2846μs | 10.7199 KOps/s | 12.7471 KOps/s | **-15.90%** |
| test_membership | 6.6439μs | 0.7568μs | 1.3213 MOps/s | 1.3491 MOps/s | -2.06% |
| test_membership_nested | 22.8120μs | 2.7387μs | 365.1401 KOps/s | 348.2162 KOps/s | +4.86% |
| test_membership_nested_leaf | 37.5100μs | 2.7510μs | 363.4992 KOps/s | 357.5259 KOps/s | +1.67% |
| test_membership_stacked_nested | 25.3270μs | 2.7637μs | 361.8325 KOps/s | 361.9061 KOps/s | -0.02% |
| test_membership_stacked_nested_leaf | 33.3830μs | 2.7464μs | 364.1086 KOps/s | 368.2879 KOps/s | -1.13% |
| test_membership_nested_last | 45.4150μs | 4.1904μs | 238.6422 KOps/s | 250.0458 KOps/s | -4.56% |
| test_membership_nested_leaf_last | 45.4340μs | 4.1551μs | 240.6652 KOps/s | 247.6444 KOps/s | -2.82% |
| test_membership_stacked_nested_last | 25.3170μs | 4.1902μs | 238.6539 KOps/s | 250.4574 KOps/s | -4.71% |
| test_membership_stacked_nested_leaf_last | 25.6780μs | 4.1752μs | 239.5083 KOps/s | 248.6750 KOps/s | -3.69% |
| test_nested_getleaf | 78.5870μs | 10.4906μs | 95.3235 KOps/s | 93.5569 KOps/s | +1.89% |
| test_nested_get | 44.5530μs | 10.2157μs | 97.8883 KOps/s | 97.7203 KOps/s | +0.17% |
| test_stacked_getleaf | 49.8430μs | 10.8574μs | 92.1032 KOps/s | 93.5413 KOps/s | -1.54% |
| test_stacked_get | 36.0570μs | 10.2756μs | 97.3179 KOps/s | 99.3885 KOps/s | -2.08% |
| test_nested_getitemleaf | 57.9690μs | 11.1672μs | 89.5478 KOps/s | 90.8757 KOps/s | -1.46% |
| test_nested_getitem | 49.2820μs | 10.4821μs | 95.4008 KOps/s | 97.5004 KOps/s | -2.15% |
| test_stacked_getitemleaf | 36.4280μs | 10.9320μs | 91.4749 KOps/s | 90.8016 KOps/s | +0.74% |
| test_stacked_getitem | 55.2030μs | 10.5930μs | 94.4024 KOps/s | 97.6922 KOps/s | -3.37% |
| test_lock_nested | 95.8604ms | 0.6257ms | 1.5981 KOps/s | 2.0043 KOps/s | **-20.27%** |
| test_lock_stack_nested | 0.5751ms | 0.4864ms | 2.0561 KOps/s | 2.1251 KOps/s | -3.25% |
| test_unlock_nested | 92.3215ms | 0.5208ms | 1.9202 KOps/s | 2.3966 KOps/s | **-19.88%** |
| test_unlock_stack_nested | 0.5040ms | 0.3991ms | 2.5056 KOps/s | 2.5641 KOps/s | -2.28% |
| test_flatten_speed | 0.2388ms | 0.1022ms | 9.7851 KOps/s | 11.5466 KOps/s | **-15.26%** |
| test_unflatten_speed | 0.7438ms | 0.5180ms | 1.9304 KOps/s | 2.1776 KOps/s | **-11.35%** |
| test_common_ops | 4.8549ms | 1.1665ms | 857.2712 Ops/s | 873.0206 Ops/s | -1.80% |
| test_creation | 32.8320μs | 2.0640μs | 484.4865 KOps/s | 459.2159 KOps/s | **+5.50%** |
| test_creation_empty | 57.3570μs | 19.0964μs | 52.3660 KOps/s | 57.6140 KOps/s | **-9.11%** |
| test_creation_nested_1 | 62.9380μs | 22.4894μs | 44.4654 KOps/s | 47.7511 KOps/s | **-6.88%** |
| test_creation_nested_2 | 89.4670μs | 27.0279μs | 36.9989 KOps/s | 39.0885 KOps/s | **-5.35%** |
| test_clone | 0.2677ms | 17.2934μs | 57.8257 KOps/s | 57.8280 KOps/s | -0.00% |
| test_getitem[int] | 1.3032ms | 17.2590μs | 57.9409 KOps/s | 58.6821 KOps/s | -1.26% |
| test_getitem[slice_int] | 0.1575ms | 31.2009μs | 32.0503 KOps/s | 31.7674 KOps/s | +0.89% |
| test_getitem[range] | 0.1930ms | 60.0571μs | 16.6508 KOps/s | 16.9484 KOps/s | -1.76% |
| test_getitem[tuple] | 0.2832ms | 27.3277μs | 36.5930 KOps/s | 38.4966 KOps/s | -4.95% |
| test_getitem[list] | 0.6674ms | 57.6022μs | 17.3604 KOps/s | 18.4638 KOps/s | **-5.98%** |
| test_setitem_dim[int] | 61.2740μs | 33.6754μs | 29.6953 KOps/s | 30.0420 KOps/s | -1.15% |
| test_setitem_dim[slice_int] | 0.1026ms | 62.7045μs | 15.9478 KOps/s | 16.0556 KOps/s | -0.67% |
| test_setitem_dim[range] | 0.1268ms | 84.6054μs | 11.8196 KOps/s | 11.6799 KOps/s | +1.20% |
| test_setitem_dim[tuple] | 0.1277ms | 52.0565μs | 19.2099 KOps/s | 20.0584 KOps/s | -4.23% |
| test_setitem | 0.3421ms | 31.5062μs | 31.7398 KOps/s | 34.1071 KOps/s | **-6.94%** |
| test_set | 0.3186ms | 30.7136μs | 32.5588 KOps/s | 34.7369 KOps/s | **-6.27%** |
| test_set_shared | 3.1929ms | 0.2199ms | 4.5468 KOps/s | 4.6999 KOps/s | -3.26% |
| test_update | 0.1912ms | 38.3671μs | 26.0640 KOps/s | 28.1376 KOps/s | **-7.37%** |
| test_update_nested | 0.4018ms | 50.2577μs | 19.8974 KOps/s | 21.7803 KOps/s | **-8.64%** |
| test_update__nested | 0.3299ms | 38.2712μs | 26.1293 KOps/s | 26.9184 KOps/s | -2.93% |
| test_set_nested | 78.5170μs | 32.8756μs | 30.4177 KOps/s | 31.6184 KOps/s | -3.80% |
| test_set_nested_new | 0.1018ms | 38.8190μs | 25.7606 KOps/s | 27.5345 KOps/s | **-6.44%** |
| test_select | 0.1153ms | 56.2961μs | 17.7632 KOps/s | 18.6830 KOps/s | -4.92% |
| test_select_nested | 0.1201ms | 59.8743μs | 16.7017 KOps/s | 16.7014 KOps/s | +0.00% |
| test_exclude_nested | 0.1493ms | 75.7426μs | 13.2026 KOps/s | 13.3628 KOps/s | -1.20% |
| test_empty[True] | 0.7206ms | 0.3593ms | 2.7834 KOps/s | 3.1811 KOps/s | **-12.50%** |
| test_empty[False] | 11.3765μs | 1.3322μs | 750.6229 KOps/s | 794.1036 KOps/s | **-5.48%** |
| test_unbind_speed | 0.4790ms | 0.3120ms | 3.2054 KOps/s | 3.2850 KOps/s | -2.42% |
| test_unbind_speed_stack0 | 0.7142ms | 0.3044ms | 3.2852 KOps/s | 3.2957 KOps/s | -0.32% |
| test_unbind_speed_stack1 | 96.1271ms | 0.8500ms | 1.1764 KOps/s | 1.3267 KOps/s | **-11.33%** |
| test_split | 3.2148ms | 2.0352ms | 491.3490 Ops/s | 451.7202 Ops/s | **+8.77%** |
| test_chunk | 0.1045s | 2.2313ms | 448.1766 Ops/s | 445.4621 Ops/s | +0.61% |
| test_creation[device0] | 0.2749ms | 0.1170ms | 8.5469 KOps/s | 8.3983 KOps/s | +1.77% |
| test_creation_from_tensor | 3.7066ms | 0.1207ms | 8.2820 KOps/s | 8.6276 KOps/s | -4.01% |
| test_add_one[memmap_tensor0] | 0.1195ms | 7.6861μs | 130.1050 KOps/s | 136.4302 KOps/s | -4.64% |
| test_contiguous[memmap_tensor0] | 21.1590μs | 1.8854μs | 530.3995 KOps/s | 524.9079 KOps/s | +1.05% |
| test_stack[memmap_tensor0] | 0.1138ms | 5.7498μs | 173.9196 KOps/s | 177.6803 KOps/s | -2.12% |
| test_memmaptd_index | 1.2492ms | 0.4184ms | 2.3902 KOps/s | 2.4647 KOps/s | -3.02% |
| test_memmaptd_index_astensor | 98.6600ms | 0.5763ms | 1.7352 KOps/s | 2.0359 KOps/s | **-14.77%** |
| test_memmaptd_index_op | 1.8819ms | 1.0908ms | 916.7726 Ops/s | 983.6657 Ops/s | **-6.80%** |
| test_serialize_model | 0.1303s | 0.1203s | 8.3130 Ops/s | 8.2267 Ops/s | +1.05% |
| test_serialize_model_pickle | 0.4746s | 0.3983s | 2.5105 Ops/s | 2.5050 Ops/s | +0.22% |
| test_serialize_weights | 0.1252s | 0.1177s | 8.4927 Ops/s | 7.7563 Ops/s | **+9.49%** |
| test_serialize_weights_returnearly | 0.1735s | 0.1613s | 6.2004 Ops/s | 6.3441 Ops/s | -2.27% |
| test_serialize_weights_pickle | 0.5297s | 0.4389s | 2.2783 Ops/s | 2.5403 Ops/s | **-10.32%** |
| test_serialize_weights_filesystem | 0.1458s | 0.1417s | 7.0583 Ops/s | 7.0527 Ops/s | +0.08% |
| test_serialize_model_filesystem | 0.1704s | 0.1514s | 6.6056 Ops/s | 6.2358 Ops/s | **+5.93%** |
| test_reshape_pytree | 91.5410μs | 40.2051μs | 24.8725 KOps/s | 25.7875 KOps/s | -3.55% |
| test_reshape_td | 0.1101ms | 46.3588μs | 21.5709 KOps/s | 21.3064 KOps/s | +1.24% |
| test_view_pytree | 87.5140μs | 40.0018μs | 24.9989 KOps/s | 25.9618 KOps/s | -3.71% |
| test_view_td | 0.1306ms | 53.3409μs | 18.7473 KOps/s | 19.2487 KOps/s | -2.60% |
| test_unbind_pytree | 91.7720μs | 37.9932μs | 26.3205 KOps/s | 27.5309 KOps/s | -4.40% |
| test_unbind_td | 0.3216ms | 45.7928μs | 21.8375 KOps/s | 22.1872 KOps/s | -1.58% |
| test_split_pytree | 82.1640μs | 38.7472μs | 25.8083 KOps/s | 26.6452 KOps/s | -3.14% |
| test_split_td | 0.2033ms | 58.6899μs | 17.0387 KOps/s | 17.1213 KOps/s | -0.48% |
| test_add_pytree | 96.7310μs | 46.8397μs | 21.3494 KOps/s | 22.4622 KOps/s | -4.95% |
| test_add_td | 0.1658ms | 89.2628μs | 11.2029 KOps/s | 12.2443 KOps/s | **-8.51%** |
| test_compile_add_one_nested[tensordict-compile] | 0.1542ms | 59.7138μs | 16.7465 KOps/s | 17.3623 KOps/s | -3.55% |
| test_compile_add_one_nested[tensordict-eager] | 0.3962ms | 0.1997ms | 5.0081 KOps/s | 5.6986 KOps/s | **-12.12%** |
| test_compile_add_one_nested[pytree-compile] | 0.1846ms | 57.8018μs | 17.3005 KOps/s | 17.8440 KOps/s | -3.05% |
| test_compile_add_one_nested[pytree-eager] | 0.2968ms | 0.1438ms | 6.9530 KOps/s | 6.9121 KOps/s | +0.59% |
| test_compile_copy_nested[tensordict-compile] | 65.5930μs | 24.0129μs | 41.6444 KOps/s | 47.1960 KOps/s | **-11.76%** |
| test_compile_copy_nested[tensordict-eager] | 0.1656ms | 74.6927μs | 13.3882 KOps/s | 15.1157 KOps/s | **-11.43%** |
| test_compile_copy_nested[pytree-compile] | 0.1968ms | 76.7510μs | 13.0292 KOps/s | 13.4275 KOps/s | -2.97% |
| test_compile_copy_nested[pytree-eager] | 0.1293ms | 69.9979μs | 14.2861 KOps/s | 14.8034 KOps/s | -3.49% |
| test_compile_add_one_flat[tensordict-compile] | 0.3870ms | 0.1834ms | 5.4532 KOps/s | 5.8249 KOps/s | **-6.38%** |
| test_compile_add_one_flat[tensordict-eager] | 0.4920ms | 0.2427ms | 4.1199 KOps/s | 5.2552 KOps/s | **-21.60%** |
| test_compile_add_one_flat[tensorclass-compile] | 0.1246ms | 49.9236μs | 20.0306 KOps/s | 21.5314 KOps/s | **-6.97%** |
| test_compile_add_one_flat[tensorclass-eager] | 0.1960ms | 79.1837μs | 12.6289 KOps/s | 14.0782 KOps/s | **-10.30%** |
| test_compile_add_one_flat[pytree-compile] | 0.4037ms | 0.1796ms | 5.5685 KOps/s | 5.6730 KOps/s | -1.84% |
| test_compile_add_one_flat[pytree-eager] | 0.4822ms | 0.2931ms | 3.4113 KOps/s | 3.4246 KOps/s | -0.39% |
| test_compile_add_self_flat[tensordict-eager] | 0.5525ms | 0.2802ms | 3.5684 KOps/s | 4.8574 KOps/s | **-26.54%** |
| test_compile_add_self_flat[tensordict-compile] | 0.6155ms | 0.1835ms | 5.4491 KOps/s | 5.8192 KOps/s | **-6.36%** |
| test_compile_add_self_flat[tensorclass-eager] | 0.1731ms | 75.0865μs | 13.3180 KOps/s | 15.9968 KOps/s | **-16.75%** |
| test_compile_add_self_flat[tensorclass-compile] | 0.1180ms | 49.1152μs | 20.3603 KOps/s | 20.9002 KOps/s | -2.58% |
| test_compile_add_self_flat[pytree-eager] | 0.5159ms | 0.2333ms | 4.2860 KOps/s | 4.2307 KOps/s | +1.31% |
| test_compile_add_self_flat[pytree-compile] | 0.2816ms | 0.1732ms | 5.7720 KOps/s | 5.7713 KOps/s | +0.01% |
| test_compile_copy_flat[tensordict-compile] | 0.1972ms | 0.1102ms | 9.0714 KOps/s | 9.7118 KOps/s | **-6.59%** |
| test_compile_copy_flat[tensordict-eager] | 0.1552ms | 79.0567μs | 12.6491 KOps/s | 17.2780 KOps/s | **-26.79%** |
| test_compile_copy_flat[pytree-compile] | 0.1599ms | 79.7467μs | 12.5397 KOps/s | 13.0808 KOps/s | -4.14% |
| test_compile_copy_flat[pytree-eager] | 0.1281ms | 70.4465μs | 14.1952 KOps/s | 14.4446 KOps/s | -1.73% |
| test_compile_assign_and_add[tensordict-compile] | 0.2905ms | 0.1925ms | 5.1940 KOps/s | 5.1037 KOps/s | +1.77% |
| test_compile_assign_and_add[tensordict-eager] | 2.3380ms | 1.7530ms | 570.4348 Ops/s | 608.9841 Ops/s | **-6.33%** |
| test_compile_assign_and_add[pytree-compile] | 0.3001ms | 0.1913ms | 5.2286 KOps/s | 5.2751 KOps/s | -0.88% |
| test_compile_assign_and_add[pytree-eager] | 1.3755ms | 1.0973ms | 911.3689 Ops/s | 898.9086 Ops/s | +1.39% |
| test_compile_assign_and_add_stack[compile] | 0.8140ms | 0.4165ms | 2.4010 KOps/s | 2.3875 KOps/s | +0.57% |
| test_compile_assign_and_add_stack[eager] | 4.4548ms | 4.2184ms | 237.0559 Ops/s | 272.4028 Ops/s | **-12.98%** |
| test_compile_indexing[tensor-tensordict-compile] | 0.1075ms | 34.1583μs | 29.2755 KOps/s | 29.4513 KOps/s | -0.60% |
| test_compile_indexing[tensor-tensordict-eager] | 1.1431ms | 49.0107μs | 20.4037 KOps/s | 20.7049 KOps/s | -1.45% |
| test_compile_indexing[tensor-tensorclass-compile] | 78.3270μs | 30.5901μs | 32.6903 KOps/s | 33.9611 KOps/s | -3.74% |
| test_compile_indexing[tensor-tensorclass-eager] | 89.3140μs | 30.1788μs | 33.1359 KOps/s | 34.5231 KOps/s | -4.02% |
| test_compile_indexing[tensor-pytree-compile] | 84.0980μs | 30.1528μs | 33.1644 KOps/s | 34.2421 KOps/s | -3.15% |
| test_compile_indexing[tensor-pytree-eager] | 0.1375ms | 30.8045μs | 32.4628 KOps/s | 35.1459 KOps/s | **-7.63%** |
| test_compile_indexing[slice-tensordict-compile] | 0.1407ms | 74.7249μs | 13.3824 KOps/s | 13.3555 KOps/s | +0.20% |
| test_compile_indexing[slice-tensordict-eager] | 0.4238ms | 29.4752μs | 33.9268 KOps/s | 35.2844 KOps/s | -3.85% |
| test_compile_indexing[slice-tensorclass-compile] | 0.1375ms | 68.6670μs | 14.5630 KOps/s | 14.5170 KOps/s | +0.32% |
| test_compile_indexing[slice-tensorclass-eager] | 65.0520μs | 24.9948μs | 40.0083 KOps/s | 42.4864 KOps/s | **-5.83%** |
| test_compile_indexing[slice-pytree-compile] | 0.1324ms | 68.2543μs | 14.6511 KOps/s | 14.6519 KOps/s | -0.01% |
| test_compile_indexing[slice-pytree-eager] | 80.7210μs | 24.2395μs | 41.2550 KOps/s | 42.5806 KOps/s | -3.11% |
| test_compile_indexing[int-tensordict-compile] | 0.2010ms | 74.3842μs | 13.4437 KOps/s | 13.4987 KOps/s | -0.41% |
| test_compile_indexing[int-tensordict-eager] | 1.2181ms | 28.4105μs | 35.1982 KOps/s | 35.6141 KOps/s | -1.17% |
| test_compile_indexing[int-tensorclass-compile] | 0.1284ms | 68.3769μs | 14.6248 KOps/s | 14.8663 KOps/s | -1.62% |
| test_compile_indexing[int-tensorclass-eager] | 89.5810μs | 24.2711μs | 41.2013 KOps/s | 42.5452 KOps/s | -3.16% |
| test_compile_indexing[int-pytree-compile] | 0.7682ms | 70.6882μs | 14.1466 KOps/s | 14.9674 KOps/s | **-5.48%** |
| test_compile_indexing[int-pytree-eager] | 73.4470μs | 24.3050μs | 41.1438 KOps/s | 43.1024 KOps/s | -4.54% |
| test_mod_add[eager] | 67.6860μs | 26.5746μs | 37.6299 KOps/s | 38.9380 KOps/s | -3.36% |
| test_mod_add[compile] | 0.1275ms | 38.6500μs | 25.8732 KOps/s | 25.9469 KOps/s | -0.28% |
| test_mod_add[compile-overhead] | 0.1103ms | 38.0750μs | 26.2640 KOps/s | 25.9125 KOps/s | +1.36% |
| test_mod_wrap[eager] | 0.3974ms | 0.2129ms | 4.6981 KOps/s | 4.8251 KOps/s | -2.63% |
| test_mod_wrap[compile] | 0.4515ms | 0.2354ms | 4.2487 KOps/s | 4.2652 KOps/s | -0.39% |
| test_mod_wrap[compile-overhead] | 1.2495ms | 0.2477ms | 4.0376 KOps/s | 4.3002 KOps/s | **-6.11%** |
| test_mod_wrap_and_backward[eager] | 12.3820ms | 10.6665ms | 93.7514 Ops/s | 91.3536 Ops/s | +2.62% |
| test_mod_wrap_and_backward[compile] | 13.9107ms | 10.7753ms | 92.8049 Ops/s | 85.5322 Ops/s | **+8.50%** |
| test_mod_wrap_and_backward[compile-overhead] | 12.4006ms | 10.8340ms | 92.3024 Ops/s | 81.5114 Ops/s | **+13.24%** |
| test_seq_add[eager] | 0.2246ms | 94.5870μs | 10.5723 KOps/s | 11.0526 KOps/s | -4.35% |
| test_seq_add[compile] | 0.1444ms | 64.3748μs | 15.5340 KOps/s | 15.5378 KOps/s | -0.02% |
| test_seq_add[compile-overhead] | 0.1605ms | 64.4678μs | 15.5116 KOps/s | 15.9670 KOps/s | -2.85% |
| test_seq_wrap[eager] | 0.6115ms | 0.3970ms | 2.5186 KOps/s | 2.5936 KOps/s | -2.89% |
| test_seq_wrap[compile] | 1.2852ms | 0.2725ms | 3.6692 KOps/s | 3.6500 KOps/s | +0.53% |
| test_seq_wrap[compile-overhead] | 1.2171ms | 0.2712ms | 3.6874 KOps/s | 3.3649 KOps/s | **+9.58%** |
| test_func_call_runtime[False-eager] | 0.9074ms | 0.5292ms | 1.8898 KOps/s | 1.8455 KOps/s | +2.40% |
| test_func_call_runtime[False-compile] | 0.9351ms | 0.5074ms | 1.9710 KOps/s | 1.9630 KOps/s | +0.41% |
| test_func_call_runtime[False-compile-overhead] | 0.8807ms | 0.5054ms | 1.9786 KOps/s | 1.9653 KOps/s | +0.68% |
| test_func_call_runtime[True-eager] | 1.2326ms | 0.7630ms | 1.3106 KOps/s | 1.3123 KOps/s | -0.12% |
| test_func_call_runtime[True-compile] | 0.7451ms | 0.5153ms | 1.9408 KOps/s | 1.9162 KOps/s | +1.28% |
| test_func_call_runtime[True-compile-overhead] | 0.8720ms | 0.5157ms | 1.9392 KOps/s | 1.9115 KOps/s | +1.45% |
| test_func_call_cm_runtime[False-eager] | 0.7117ms | 0.5280ms | 1.8938 KOps/s | 1.8673 KOps/s | +1.42% |
| test_func_call_cm_runtime[False-compile] | 0.6362ms | 0.5055ms | 1.9781 KOps/s | 1.9569 KOps/s | +1.08% |
| test_func_call_cm_runtime[False-compile-overhead] | 0.6367ms | 0.5055ms | 1.9782 KOps/s | 1.9593 KOps/s | +0.97% |
| test_func_call_cm_runtime[True-eager] | 1.2548ms | 0.9095ms | 1.0995 KOps/s | 1.1198 KOps/s | -1.81% |
| test_func_call_cm_runtime[True-compile] | 1.0941ms | 0.7535ms | 1.3271 KOps/s | 1.3299 KOps/s | -0.22% |
| test_func_call_cm_runtime[True-compile-overhead] | 1.1965ms | 0.7638ms | 1.3092 KOps/s | 1.3201 KOps/s | -0.82% |
| test_vmap_func_call_cm_runtime[eager] | 3.4389ms | 1.9951ms | 501.2325 Ops/s | 527.3464 Ops/s | -4.95% |
| test_vmap_func_call_cm_runtime[compile] | 2.7899ms | 1.9808ms | 504.8469 Ops/s | 509.2792 Ops/s | -0.87% |
| test_vmap_func_call_cm_runtime[compile-overhead] | 6.6689ms | 2.0762ms | 481.6406 Ops/s | 513.7446 Ops/s | **-6.25%** |
| test_distributed | 0.3691ms | 0.1241ms | 8.0603 KOps/s | 7.8696 KOps/s | +2.42% |
| test_tdmodule | 34.2040μs | 19.3201μs | 51.7596 KOps/s | 54.1781 KOps/s | -4.46% |
| test_tdmodule_dispatch | 60.1820μs | 39.1860μs | 25.5193 KOps/s | 27.6035 KOps/s | **-7.55%** |
| test_tdseq | 43.1110μs | 21.5145μs | 46.4803 KOps/s | 46.5453 KOps/s | -0.14% |
| test_tdseq_dispatch | 81.6430μs | 42.8608μs | 23.3314 KOps/s | 23.5672 KOps/s | -1.00% |
| test_instantiation_functorch | 1.7652ms | 1.6223ms | 616.4221 Ops/s | 623.3867 Ops/s | -1.12% |
| test_instantiation_td | 1.8757ms | 1.2031ms | 831.2062 Ops/s | 835.3457 Ops/s | -0.50% |
| test_exec_functorch | 0.2944ms | 0.1894ms | 5.2785 KOps/s | 5.1546 KOps/s | +2.41% |
| test_exec_functional_call | 0.2969ms | 0.1790ms | 5.5868 KOps/s | 5.5183 KOps/s | +1.24% |
| test_exec_td | 0.3819ms | 0.2068ms | 4.8359 KOps/s | 5.7331 KOps/s | **-15.65%** |
| test_exec_td_decorator | 1.1741ms | 0.2398ms | 4.1704 KOps/s | 4.3629 KOps/s | -4.41% |
| test_vmap_mlp_speed[True-True] | 1.9191ms | 0.7025ms | 1.4235 KOps/s | 1.5104 KOps/s | **-5.75%** |
| test_vmap_mlp_speed[True-False] | 0.9944ms | 0.6884ms | 1.4526 KOps/s | 1.5303 KOps/s | **-5.08%** |
| test_vmap_mlp_speed[False-True] | 0.8796ms | 0.5424ms | 1.8437 KOps/s | 1.9976 KOps/s | **-7.71%** |
| test_vmap_mlp_speed[False-False] | 0.8445ms | 0.5395ms | 1.8537 KOps/s | 1.9855 KOps/s | **-6.64%** |
| test_vmap_mlp_speed_decorator[True-True] | 0.9089ms | 0.6521ms | 1.5334 KOps/s | 1.5796 KOps/s | -2.93% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0291ms | 0.6507ms | 1.5369 KOps/s | 1.5732 KOps/s | -2.31% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8402ms | 0.5336ms | 1.8739 KOps/s | 1.9201 KOps/s | -2.41% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7785ms | 0.5327ms | 1.8772 KOps/s | 1.9214 KOps/s | -2.30% |
| test_to_module_speed[True] | 1.5057ms | 1.4113ms | 708.5466 Ops/s | 759.1578 Ops/s | **-6.67%** |
| test_to_module_speed[False] | 1.6392ms | 1.3667ms | 731.6943 Ops/s | 783.2558 Ops/s | **-6.58%** |
| test_tc_init | 87.3140μs | 47.1565μs | 21.2060 KOps/s | 23.4589 KOps/s | **-9.60%** |
| test_tc_init_nested | 0.1781ms | 93.9291μs | 10.6463 KOps/s | 12.4837 KOps/s | **-14.72%** |
| test_tc_first_layer_tensor | 13.4650μs | 1.5876μs | 629.8704 KOps/s | 664.2696 KOps/s | **-5.18%** |
| test_tc_first_layer_nontensor | 40.3850μs | 4.7671μs | 209.7718 KOps/s | 213.5247 KOps/s | -1.76% |
| test_tc_second_layer_tensor | 21.0000μs | 2.8561μs | 350.1292 KOps/s | 347.2367 KOps/s | +0.83% |
| test_tc_second_layer_nontensor | 49.2620μs | 6.1475μs | 162.6671 KOps/s | 164.7116 KOps/s | -1.24% |
| test_unbind | 0.4703s | 15.0249ms | 66.5560 Ops/s | 137.4057 Ops/s | **-51.56%** |
| test_full_like | 8.2996ms | 7.5221ms | 132.9412 Ops/s | 85.1622 Ops/s | **+56.10%** |
| test_zeros_like | 3.4132ms | 2.8825ms | 346.9211 Ops/s | 128.4417 Ops/s | **+170.10%** |
| test_ones_like | 3.6936ms | 3.2817ms | 304.7246 Ops/s | 130.2160 Ops/s | **+134.01%** |
| test_clone | 5.8623ms | 5.2524ms | 190.3908 Ops/s | 105.9270 Ops/s | **+79.74%** |
| test_squeeze | 65.0320μs | 12.6692μs | 78.9313 KOps/s | 81.9600 KOps/s | -3.70% |
| test_unsqueeze | 0.3572ms | 98.5459μs | 10.1476 KOps/s | 10.6506 KOps/s | -4.72% |
| test_split | 0.3821ms | 0.1978ms | 5.0566 KOps/s | 5.2849 KOps/s | -4.32% |
| test_permute | 0.3670ms | 0.2203ms | 4.5403 KOps/s | 4.5500 KOps/s | -0.21% |
| test_stack | 32.2810ms | 25.2936ms | 39.5357 Ops/s | 40.9458 Ops/s | -3.44% |
| test_cat | 28.3752ms | 24.7969ms | 40.3277 Ops/s | 41.5343 Ops/s | -2.91% |
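
The summary counts can be cross-checked against the table. On our reading, Ops is the reciprocal of Mean (for example, 1 / 25.4238 μs ≈ 39.3332 KOps/s), and Change is Ops divided by Ops on Repo HEAD, minus one. The bot's rule for flagging a row as improved or worsened is not stated in this PR; a ±5% cutoff is an assumption inferred from which rows are bolded, and with it the 222 CPU rows above split into 11 improved and 72 worsened, matching the summary line. A minimal Python sketch under that assumption, using a few rows copied from the table (the names `BenchRow`, `classify`, and `summarize` are illustrative, not part of tensordict or its benchmark tooling):

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: a row counts as improved/worsened when |Change| >= 5%.
# This cutoff is inferred from which rows are bolded; the bot does not document it here.
THRESHOLD_PCT = 5.0


@dataclass
class BenchRow:
    name: str
    ops: float       # throughput measured on this PR
    head_ops: float  # throughput on repo HEAD (same unit as `ops`, so units cancel)


def ops_kops_from_mean_us(mean_us: float) -> float:
    """Ops column in KOps/s, derived from the Mean column in microseconds (Ops = 1 / Mean)."""
    return 1e3 / mean_us


def change_pct(row: BenchRow) -> float:
    """Change column: relative throughput of this PR vs. HEAD, in percent."""
    return (row.ops / row.head_ops - 1.0) * 100.0


def classify(row: BenchRow, threshold: float = THRESHOLD_PCT) -> str:
    """Label a row using the assumed symmetric threshold."""
    c = change_pct(row)
    if c >= threshold:
        return "improved"
    if c <= -threshold:
        return "worsened"
    return "unchanged"


def summarize(rows: list[BenchRow]) -> Counter:
    """Count rows per label, mirroring the 'Improved / Worsened' summary line."""
    return Counter(classify(r) for r in rows)


# A few rows copied from the CPU table above.
SAMPLE = [
    BenchRow("test_plain_set_nested", 39.3332, 50.6356),
    BenchRow("test_keys", 283.7600, 283.2206),
    BenchRow("test_items_stack_nested", 2.5536, 2.6864),
    BenchRow("test_creation", 484.4865, 459.2159),
    BenchRow("test_zeros_like", 346.9211, 128.4417),
]

if __name__ == "__main__":
    for row in SAMPLE:
        print(f"{row.name:28s} {change_pct(row):+8.2f}%  {classify(row)}")
    print(dict(summarize(SAMPLE)))
```

Running the same classification over every CPU row reproduces the bot's 11 / 72 split; the GPU comment below is truncated, so its totals cannot be checked the same way from this page.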

github-actions bot commented Sep 30, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 228. Improved: 15. Worsened: 51.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | ---: | ---: | ---: | ---: | ---: |
| test_plain_set_nested | 0.1295ms | 17.7268μs | 56.4117 KOps/s | 73.1101 KOps/s | **-22.84%** |
| test_plain_set_stack_nested | 49.3010μs | 18.2956μs | 54.6578 KOps/s | 72.4975 KOps/s | **-24.61%** |
| test_plain_set_nested_inplace | 58.4910μs | 19.0279μs | 52.5545 KOps/s | 67.6940 KOps/s | **-22.36%** |
| test_plain_set_stack_nested_inplace | 63.6220μs | 19.1253μs | 52.2868 KOps/s | 68.0528 KOps/s | **-23.17%** |
| test_items | 33.4410μs | 2.8844μs | 346.6866 KOps/s | 346.2110 KOps/s | +0.14% |
| test_items_nested | 0.3963ms | 0.3377ms | 2.9613 KOps/s | 3.0178 KOps/s | -1.87% |
| test_items_nested_locked | 0.4000ms | 0.3385ms | 2.9538 KOps/s | 3.0004 KOps/s | -1.55% |
| test_items_nested_leaf | 93.9020μs | 62.0087μs | 16.1268 KOps/s | 17.9075 KOps/s | **-9.94%** |
| test_items_stack_nested | 0.4678ms | 0.3415ms | 2.9279 KOps/s | 3.0183 KOps/s | -3.00% |
| test_items_stack_nested_leaf | 0.1078ms | 63.4636μs | 15.7571 KOps/s | 17.6214 KOps/s | **-10.58%** |
| test_items_stack_nested_locked | 0.4158ms | 0.3421ms | 2.9233 KOps/s | 3.0278 KOps/s | -3.45% |
| test_keys | 30.0200μs | 3.4248μs | 291.9882 KOps/s | 268.1407 KOps/s | **+8.89%** |
| test_keys_nested | 97.7920μs | 70.8110μs | 14.1221 KOps/s | 17.7196 KOps/s | **-20.30%** |
| test_keys_nested_locked | 2.5311ms | 76.3900μs | 13.0907 KOps/s | 16.0578 KOps/s | **-18.48%** |
| test_keys_nested_leaf | 89.8420μs | 61.7497μs | 16.1944 KOps/s | 21.1920 KOps/s | **-23.58%** |
| test_keys_stack_nested | 0.1043ms | 71.6869μs | 13.9495 KOps/s | 17.9184 KOps/s | **-22.15%** |
| test_keys_stack_nested_leaf | 99.6220μs | 63.0468μs | 15.8612 KOps/s | 20.8985 KOps/s | **-24.10%** |
| test_keys_stack_nested_locked | 0.1141ms | 77.2788μs | 12.9402 KOps/s | 16.4976 KOps/s | **-21.56%** |
| test_values | 5.2618μs | 0.8399μs | 1.1906 MOps/s | 1.1818 MOps/s | +0.74% |
| test_values_nested | 78.3820μs | 48.7115μs | 20.5290 KOps/s | 24.3705 KOps/s | **-15.76%** |
| test_values_nested_locked | 94.0320μs | 49.9304μs | 20.0279 KOps/s | 23.3115 KOps/s | **-14.09%** |
| test_values_nested_leaf | 74.7210μs | 42.4158μs | 23.5761 KOps/s | 28.1672 KOps/s | **-16.30%** |
| test_values_stack_nested | 83.5920μs | 49.4504μs | 20.2223 KOps/s | 23.9571 KOps/s | **-15.59%** |
| test_values_stack_nested_leaf | 70.6620μs | 43.0671μs | 23.2196 KOps/s | 28.1014 KOps/s | **-17.37%** |
| test_values_stack_nested_locked | 0.1049ms | 51.0987μs | 19.5700 KOps/s | 23.0356 KOps/s | **-15.04%** |
| test_membership | 1.5810μs | 0.5084μs | 1.9669 MOps/s | 1.9920 MOps/s | -1.26% |
| test_membership_nested | 17.6900μs | 1.8827μs | 531.1531 KOps/s | 513.3789 KOps/s | +3.46% |
| test_membership_nested_leaf | 15.5605μs | 1.8821μs | 531.3137 KOps/s | 515.2510 KOps/s | +3.12% |
| test_membership_stacked_nested | 27.7410μs | 1.9428μs | 514.7146 KOps/s | 489.0889 KOps/s | **+5.24%** |
| test_membership_stacked_nested_leaf | 26.3700μs | 1.9201μs | 520.8064 KOps/s | 493.4820 KOps/s | **+5.54%** |
| test_membership_nested_last | 33.1500μs | 2.9915μs | 334.2791 KOps/s | 349.7596 KOps/s | -4.43% |
| test_membership_nested_leaf_last | 36.9510μs | 2.9419μs | 339.9145 KOps/s | 351.1857 KOps/s | -3.21% |
| test_membership_stacked_nested_last | 42.7710μs | 2.9597μs | 337.8706 KOps/s | 126.4678 KOps/s | **+167.16%** |
| test_membership_stacked_nested_leaf_last | 28.9000μs | 2.9740μs | 336.2476 KOps/s | 127.0423 KOps/s | **+164.67%** |
| test_nested_getleaf | 29.7210μs | 6.1183μs | 163.4446 KOps/s | 163.2544 KOps/s | +0.12% |
| test_nested_get | 47.1010μs | 5.8565μs | 170.7494 KOps/s | 172.5090 KOps/s | -1.02% |
| test_stacked_getleaf | 39.8110μs | 6.0496μs | 165.3003 KOps/s | 163.4084 KOps/s | +1.16% |
| test_stacked_get | 37.0610μs | 5.7424μs | 174.1427 KOps/s | 172.7656 KOps/s | +0.80% |
| test_nested_getitemleaf | 46.2410μs | 6.1657μs | 162.1867 KOps/s | 161.9558 KOps/s | +0.14% |
| test_nested_getitem | 29.1110μs | 5.8380μs | 171.2927 KOps/s | 170.7546 KOps/s | +0.32% |
| test_stacked_getitemleaf | 50.8500μs | 6.0911μs | 164.1739 KOps/s | 161.7063 KOps/s | +1.53% |
| test_stacked_getitem | 41.1410μs | 5.7902μs | 172.7047 KOps/s | 171.7185 KOps/s | +0.57% |
| test_lock_nested | 6.9541ms | 0.4413ms | 2.2663 KOps/s | 2.3240 KOps/s | -2.48% |
| test_lock_stack_nested | 0.4517ms | 0.3912ms | 2.5565 KOps/s | 2.6693 KOps/s | -4.23% |
| test_unlock_nested | 0.7649ms | 0.3671ms | 2.7239 KOps/s | 2.7464 KOps/s | -0.82% |
| test_unlock_stack_nested | 0.3858ms | 0.3296ms | 3.0339 KOps/s | 3.1937 KOps/s | **-5.00%** |
| test_flatten_speed | 0.1514ms | 76.3147μs | 13.1036 KOps/s | 14.4747 KOps/s | **-9.47%** |
| test_unflatten_speed | 0.3729ms | 0.3223ms | 3.1029 KOps/s | 3.5131 KOps/s | **-11.68%** |
| test_common_ops | 1.6654ms | 1.3511ms | 740.1272 Ops/s | 807.7316 Ops/s | **-8.37%** |
| test_creation | 23.7600μs | 1.4831μs | 674.2649 KOps/s | 679.0048 KOps/s | -0.70% |
| test_creation_empty | 52.2310μs | 17.9995μs | 55.5571 KOps/s | 72.2246 KOps/s | **-23.08%** |
| test_creation_nested_1 | 53.4010μs | 19.8246μs | 50.4424 KOps/s | 64.8024 KOps/s | **-22.16%** |
| test_creation_nested_2 | 54.3110μs | 22.2775μs | 44.8884 KOps/s | 55.7237 KOps/s | **-19.44%** |
| test_clone | 81.9210μs | 28.5557μs | 35.0192 KOps/s | 33.1163 KOps/s | **+5.75%** |
| test_getitem[int] | 1.2844ms | 16.4376μs | 60.8362 KOps/s | 59.0853 KOps/s | +2.96% |
| test_getitem[slice_int] | 0.1207ms | 28.4941μs | 35.0950 KOps/s | 34.4121 KOps/s | +1.98% |
| test_getitem[range] | 0.2442ms | 0.1128ms | 8.8629 KOps/s | 8.8091 KOps/s | +0.61% |
| test_getitem[tuple] | 0.1296ms | 24.2210μs | 41.2865 KOps/s | 39.9714 KOps/s | +3.29% |
| test_getitem[list] | 0.2001ms | 0.1004ms | 9.9634 KOps/s | 9.8696 KOps/s | +0.95% |
| test_setitem_dim[int] | 69.8310μs | 45.9728μs | 21.7520 KOps/s | 21.2424 KOps/s | +2.40% |
| test_setitem_dim[slice_int] | 94.8720μs | 68.7276μs | 14.5502 KOps/s | 14.3918 KOps/s | +1.10% |
| test_setitem_dim[range] | 0.1802ms | 0.1356ms | 7.3746 KOps/s | 7.6192 KOps/s | -3.21% |
| test_setitem_dim[tuple] | 99.4220μs | 65.2238μs | 15.3318 KOps/s | 15.9912 KOps/s | -4.12% |
| test_setitem | 96.3520μs | 46.0683μs | 21.7069 KOps/s | 23.9023 KOps/s | **-9.18%** |
| test_set | 0.1266ms | 45.5377μs | 21.9598 KOps/s | 24.6599 KOps/s | **-10.95%** |
| test_set_shared | 0.3569ms | 54.9143μs | 18.2102 KOps/s | 19.2565 KOps/s | **-5.43%** |
| test_update | 94.6520μs | 53.8904μs | 18.5562 KOps/s | 20.4450 KOps/s | **-9.24%** |
| test_update_nested | 96.2120μs | 60.0308μs | 16.6581 KOps/s | 17.4711 KOps/s | -4.65% |
| test_update__nested | 0.1007ms | 60.6191μs | 16.4965 KOps/s | 16.0931 KOps/s | +2.51% |
| test_set_nested | 88.4420μs | 44.0309μs | 22.7113 KOps/s | 23.3141 KOps/s | -2.59% |
| test_set_nested_new | 88.1320μs | 47.9703μs | 20.8463 KOps/s | 21.2548 KOps/s | -1.92% |
| test_select | 0.1075ms | 60.7252μs | 16.4676 KOps/s | 16.5350 KOps/s | -0.41% |
| test_select_nested | 79.5120μs | 41.5630μs | 24.0599 KOps/s | 23.6543 KOps/s | +1.71% |
| test_exclude_nested | 95.0320μs | 58.1005μs | 17.2115 KOps/s | 16.7956 KOps/s | +2.48% |
| test_empty[True] | 0.3253ms | 0.2557ms | 3.9112 KOps/s | 4.0625 KOps/s | -3.72% |
| test_empty[False] | 3.8441μs | 0.7359μs | 1.3590 MOps/s | 1.3498 MOps/s | +0.68% |
| test_to | 55.3710μs | 26.4093μs | 37.8654 KOps/s | 39.1832 KOps/s | -3.36% |
| test_to_nonblocking | 60.4610μs | 24.2927μs | 41.1646 KOps/s | 40.5563 KOps/s | +1.50% |
| test_unbind_speed | 1.3502ms | 0.2793ms | 3.5800 KOps/s | 3.4322 KOps/s | +4.30% |
| test_unbind_speed_stack0 | 0.4206ms | 0.2808ms | 3.5610 KOps/s | 3.5925 KOps/s | -0.88% |
| test_unbind_speed_stack1 | 91.4909ms | 0.7159ms | 1.3968 KOps/s | 1.4244 KOps/s | -1.94% |
| test_split | 94.1870ms | 2.2209ms | 450.2599 Ops/s | 444.3765 Ops/s | +1.32% |
| test_chunk | 94.2362ms | 2.2102ms | 452.4537 Ops/s | 440.5474 Ops/s | +2.70% |
| test_creation[device0] | 0.3472ms | 0.1279ms | 7.8198 KOps/s | 7.7716 KOps/s | +0.62% |
| test_creation_from_tensor | 0.3711ms | 0.1349ms | 7.4156 KOps/s | 7.6562 KOps/s | -3.14% |
| test_add_one[memmap_tensor0] | 0.1672ms | 8.6489μs | 115.6216 KOps/s | 103.9359 KOps/s | **+11.24%** |
| test_contiguous[memmap_tensor0] | 24.5210μs | 2.2164μs | 451.1879 KOps/s | 446.2074 KOps/s | +1.12% |
| test_stack[memmap_tensor0] | 35.4300μs | 6.8832μs | 145.2813 KOps/s | 144.7319 KOps/s | +0.38% |
| test_memmaptd_index | 1.1076ms | 0.4482ms | 2.2311 KOps/s | 2.2441 KOps/s | -0.58% |
| test_memmaptd_index_astensor | 0.7627ms | 0.5157ms | 1.9389 KOps/s | 1.9654 KOps/s | -1.35% |
| test_memmaptd_index_op | 1.5007ms | 1.0867ms | 920.2104 Ops/s | 970.9678 Ops/s | **-5.23%** |
| test_serialize_model | 0.1325s | 0.1310s | 7.6331 Ops/s | 7.6698 Ops/s | -0.48% |
| test_serialize_model_pickle | 1.3477s | 1.2126s | 0.8246 Ops/s | 0.8237 Ops/s | +0.12% |
test_serialize_weights 0.2212s 0.1437s 6.9593 Ops/s 7.7270 Ops/s $\textbf{\color{#d91a1a}-9.93\%}$
test_serialize_weights_returnearly 0.2134s 55.6685ms 17.9635 Ops/s 16.0743 Ops/s $\textbf{\color{#35bf28}+11.75\%}$
test_serialize_weights_pickle 1.3739s 1.2179s 0.8211 Ops/s 0.8211 Ops/s $+0.00\%$
test_reshape_pytree 67.4020μs 36.2227μs 27.6070 KOps/s 27.0232 KOps/s $\color{#35bf28}+2.16\%$
test_reshape_td 72.3620μs 41.4690μs 24.1144 KOps/s 23.7502 KOps/s $\color{#35bf28}+1.53\%$
test_view_pytree 69.4020μs 35.3752μs 28.2684 KOps/s 27.8543 KOps/s $\color{#35bf28}+1.49\%$
test_view_td 85.6420μs 45.3069μs 22.0717 KOps/s 21.0734 KOps/s $\color{#35bf28}+4.74\%$
test_unbind_pytree 64.4210μs 34.4974μs 28.9877 KOps/s 28.5639 KOps/s $\color{#35bf28}+1.48\%$
test_unbind_td 0.5412ms 42.3107μs 23.6347 KOps/s 22.9543 KOps/s $\color{#35bf28}+2.96\%$
test_split_pytree 89.8110μs 45.6109μs 21.9246 KOps/s 21.4859 KOps/s $\color{#35bf28}+2.04\%$
test_split_td 95.6492ms 66.6819μs 14.9966 KOps/s 17.1924 KOps/s $\textbf{\color{#d91a1a}-12.77\%}$
test_add_pytree 0.1026ms 55.7444μs 17.9390 KOps/s 17.7010 KOps/s $\color{#35bf28}+1.34\%$
test_add_td 0.1438ms 98.7911μs 10.1224 KOps/s 11.4198 KOps/s $\textbf{\color{#d91a1a}-11.36\%}$
test_compile_add_one_nested[tensordict-compile] 0.3133ms 0.1617ms 6.1861 KOps/s 4.5973 KOps/s $\textbf{\color{#35bf28}+34.56\%}$
test_compile_add_one_nested[tensordict-eager] 0.2828ms 0.1596ms 6.2639 KOps/s 6.6061 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_compile_add_one_nested[pytree-compile] 0.2114ms 0.1444ms 6.9238 KOps/s 6.6250 KOps/s $\color{#35bf28}+4.51\%$
test_compile_add_one_nested[pytree-eager] 0.2420ms 0.1818ms 5.4999 KOps/s 5.3214 KOps/s $\color{#35bf28}+3.35\%$
test_compile_copy_nested[tensordict-compile] 0.1233ms 22.0884μs 45.2726 KOps/s 46.2987 KOps/s $\color{#d91a1a}-2.22\%$
test_compile_copy_nested[tensordict-eager] 82.6110μs 48.7647μs 20.5066 KOps/s 22.6726 KOps/s $\textbf{\color{#d91a1a}-9.55\%}$
test_compile_copy_nested[pytree-compile] 0.2246ms 65.1696μs 15.3446 KOps/s 15.4837 KOps/s $\color{#d91a1a}-0.90\%$
test_compile_copy_nested[pytree-eager] 80.3920μs 49.5206μs 20.1936 KOps/s 20.2051 KOps/s $\color{#d91a1a}-0.06\%$
test_compile_add_one_flat[tensordict-compile] 0.3621ms 0.3195ms 3.1296 KOps/s 3.1225 KOps/s $\color{#35bf28}+0.23\%$
test_compile_add_one_flat[tensordict-eager] 0.3254ms 0.2337ms 4.2789 KOps/s 4.8000 KOps/s $\textbf{\color{#d91a1a}-10.86\%}$
test_compile_add_one_flat[tensorclass-compile] 0.1791ms 0.1280ms 7.8114 KOps/s 7.6835 KOps/s $\color{#35bf28}+1.66\%$
test_compile_add_one_flat[tensorclass-eager] 0.1152ms 63.5670μs 15.7314 KOps/s 16.8396 KOps/s $\textbf{\color{#d91a1a}-6.58\%}$
test_compile_add_one_flat[pytree-compile] 0.4139ms 0.3184ms 3.1406 KOps/s 3.1364 KOps/s $\color{#35bf28}+0.13\%$
test_compile_add_one_flat[pytree-eager] 0.6583ms 0.6110ms 1.6368 KOps/s 1.5773 KOps/s $\color{#35bf28}+3.77\%$
test_compile_add_self_flat[tensordict-eager] 0.3535ms 0.2809ms 3.5599 KOps/s 4.0144 KOps/s $\textbf{\color{#d91a1a}-11.32\%}$
test_compile_add_self_flat[tensordict-compile] 0.4729ms 0.3215ms 3.1105 KOps/s 3.0947 KOps/s $\color{#35bf28}+0.51\%$
test_compile_add_self_flat[tensorclass-eager] 0.1310ms 74.1377μs 13.4884 KOps/s 14.4687 KOps/s $\textbf{\color{#d91a1a}-6.78\%}$
test_compile_add_self_flat[tensorclass-compile] 0.1835ms 0.1294ms 7.7290 KOps/s 7.5376 KOps/s $\color{#35bf28}+2.54\%$
test_compile_add_self_flat[pytree-eager] 0.6366ms 0.5267ms 1.8987 KOps/s 1.8780 KOps/s $\color{#35bf28}+1.10\%$
test_compile_add_self_flat[pytree-compile] 0.3915ms 0.3178ms 3.1465 KOps/s 3.1410 KOps/s $\color{#35bf28}+0.17\%$
test_compile_copy_flat[tensordict-compile] 0.1037ms 19.1548μs 52.2061 KOps/s 55.8024 KOps/s $\textbf{\color{#d91a1a}-6.44\%}$
test_compile_copy_flat[tensordict-eager] 74.2110μs 37.7709μs 26.4754 KOps/s 36.3842 KOps/s $\textbf{\color{#d91a1a}-27.23\%}$
test_compile_copy_flat[pytree-compile] 0.1159ms 69.7753μs 14.3317 KOps/s 14.2740 KOps/s $\color{#35bf28}+0.40\%$
test_compile_copy_flat[pytree-eager] 88.6320μs 51.3590μs 19.4708 KOps/s 19.2397 KOps/s $\color{#35bf28}+1.20\%$
test_compile_assign_and_add[tensordict-compile] 2.3584ms 0.8303ms 1.2043 KOps/s 1.0861 KOps/s $\textbf{\color{#35bf28}+10.89\%}$
test_compile_assign_and_add[tensordict-eager] 3.3067ms 3.2344ms 309.1746 Ops/s 294.3255 Ops/s $\textbf{\color{#35bf28}+5.05\%}$
test_compile_assign_and_add[pytree-compile] 2.2944ms 0.8142ms 1.2282 KOps/s 1.1108 KOps/s $\textbf{\color{#35bf28}+10.57\%}$
test_compile_assign_and_add[pytree-eager] 3.3044ms 3.1902ms 313.4570 Ops/s 298.2858 Ops/s $\textbf{\color{#35bf28}+5.09\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1500ms 0.1083ms 9.2356 KOps/s 8.7228 KOps/s $\textbf{\color{#35bf28}+5.88\%}$
test_compile_indexing[tensor-tensordict-eager] 0.1905ms 62.8071μs 15.9218 KOps/s 15.1936 KOps/s $\color{#35bf28}+4.79\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1832ms 0.1027ms 9.7371 KOps/s 9.5832 KOps/s $\color{#35bf28}+1.61\%$
test_compile_indexing[tensor-tensorclass-eager] 85.2820μs 44.8216μs 22.3107 KOps/s 22.5693 KOps/s $\color{#d91a1a}-1.15\%$
test_compile_indexing[tensor-pytree-compile] 0.1464ms 0.1077ms 9.2818 KOps/s 9.4690 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_indexing[tensor-pytree-eager] 91.9420μs 45.4462μs 22.0041 KOps/s 22.6847 KOps/s $\color{#d91a1a}-3.00\%$
test_compile_indexing[slice-tensordict-compile] 0.1856ms 0.1370ms 7.2993 KOps/s 7.1428 KOps/s $\color{#35bf28}+2.19\%$
test_compile_indexing[slice-tensordict-eager] 0.1513ms 26.1657μs 38.2179 KOps/s 37.6465 KOps/s $\color{#35bf28}+1.52\%$
test_compile_indexing[slice-tensorclass-compile] 0.1777ms 0.1310ms 7.6341 KOps/s 7.4976 KOps/s $\color{#35bf28}+1.82\%$
test_compile_indexing[slice-tensorclass-eager] 50.0210μs 20.9494μs 47.7340 KOps/s 45.9984 KOps/s $\color{#35bf28}+3.77\%$
test_compile_indexing[slice-pytree-compile] 0.1726ms 0.1318ms 7.5877 KOps/s 7.4484 KOps/s $\color{#35bf28}+1.87\%$
test_compile_indexing[slice-pytree-eager] 52.3710μs 21.0618μs 47.4793 KOps/s 46.3085 KOps/s $\color{#35bf28}+2.53\%$
test_compile_indexing[int-tensordict-compile] 0.1794ms 0.1375ms 7.2730 KOps/s 7.0845 KOps/s $\color{#35bf28}+2.66\%$
test_compile_indexing[int-tensordict-eager] 0.4948ms 25.6478μs 38.9898 KOps/s 37.7604 KOps/s $\color{#35bf28}+3.26\%$
test_compile_indexing[int-tensorclass-compile] 0.2134ms 0.1335ms 7.4927 KOps/s 7.4363 KOps/s $\color{#35bf28}+0.76\%$
test_compile_indexing[int-tensorclass-eager] 49.9710μs 21.0072μs 47.6028 KOps/s 46.2095 KOps/s $\color{#35bf28}+3.02\%$
test_compile_indexing[int-pytree-compile] 0.1675ms 0.1313ms 7.6179 KOps/s 7.4466 KOps/s $\color{#35bf28}+2.30\%$
test_compile_indexing[int-pytree-eager] 49.3610μs 20.8915μs 47.8664 KOps/s 46.2590 KOps/s $\color{#35bf28}+3.47\%$
test_mod_add[eager] 83.3310μs 33.1822μs 30.1367 KOps/s 32.2527 KOps/s $\textbf{\color{#d91a1a}-6.56\%}$
test_mod_add[compile] 0.1179ms 70.5028μs 14.1838 KOps/s 13.6951 KOps/s $\color{#35bf28}+3.57\%$
test_mod_add[compile-overhead] 0.2651ms 0.1350ms 7.4101 KOps/s 6.5678 KOps/s $\textbf{\color{#35bf28}+12.83\%}$
test_mod_wrap[eager] 0.8864ms 0.7775ms 1.2862 KOps/s 1.2622 KOps/s $\color{#35bf28}+1.90\%$
test_mod_wrap[compile] 2.0354ms 0.8310ms 1.2033 KOps/s 1.1883 KOps/s $\color{#35bf28}+1.26\%$
test_mod_wrap[compile-overhead] 4.9553ms 3.0873ms 323.9059 Ops/s 322.5178 Ops/s $\color{#35bf28}+0.43\%$
test_mod_wrap_and_backward[eager] 4.5499ms 4.0493ms 246.9546 Ops/s 239.9512 Ops/s $\color{#35bf28}+2.92\%$
test_mod_wrap_and_backward[compile] 4.3180ms 4.0886ms 244.5843 Ops/s 241.1546 Ops/s $\color{#35bf28}+1.42\%$
test_mod_wrap_and_backward[compile-overhead] 1.3826ms 0.9755ms 1.0251 KOps/s 978.4724 Ops/s $\color{#35bf28}+4.77\%$
test_seq_add[eager] 0.1380ms 0.1007ms 9.9342 KOps/s 10.0315 KOps/s $\color{#d91a1a}-0.97\%$
test_seq_add[compile] 0.4795ms 82.3602μs 12.1418 KOps/s 12.0535 KOps/s $\color{#35bf28}+0.73\%$
test_seq_add[compile-overhead] 0.5560ms 0.1142ms 8.7549 KOps/s 8.5697 KOps/s $\color{#35bf28}+2.16\%$
test_seq_wrap[eager] 1.3484ms 0.9326ms 1.0723 KOps/s 1.0764 KOps/s $\color{#d91a1a}-0.38\%$
test_seq_wrap[compile] 0.9711ms 0.8546ms 1.1701 KOps/s 1.1605 KOps/s $\color{#35bf28}+0.83\%$
test_seq_wrap[compile-overhead] 0.6096ms 0.2224ms 4.4957 KOps/s 4.4364 KOps/s $\color{#35bf28}+1.34\%$
test_func_call_runtime[False-eager] 2.7758ms 2.3541ms 424.7984 Ops/s 413.1138 Ops/s $\color{#35bf28}+2.83\%$
test_func_call_runtime[False-compile] 2.8150ms 2.3735ms 421.3265 Ops/s 413.5533 Ops/s $\color{#35bf28}+1.88\%$
test_func_call_runtime[False-compile-overhead] 0.7599ms 0.3610ms 2.7700 KOps/s 2.7188 KOps/s $\color{#35bf28}+1.88\%$
test_func_call_runtime[True-eager] 2.9207ms 2.5119ms 398.1100 Ops/s 389.5727 Ops/s $\color{#35bf28}+2.19\%$
test_func_call_runtime[True-compile] 3.1641ms 2.3883ms 418.7025 Ops/s 411.2629 Ops/s $\color{#35bf28}+1.81\%$
test_func_call_runtime[True-compile-overhead] 0.4331ms 0.3831ms 2.6100 KOps/s 2.5990 KOps/s $\color{#35bf28}+0.42\%$
test_func_call_cm_runtime[False-eager] 2.7606ms 2.3350ms 428.2624 Ops/s 414.9328 Ops/s $\color{#35bf28}+3.21\%$
test_func_call_cm_runtime[False-compile] 2.7717ms 2.3863ms 419.0569 Ops/s 413.4612 Ops/s $\color{#35bf28}+1.35\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4118ms 0.3643ms 2.7449 KOps/s 2.7451 KOps/s $-0.01\%$
test_func_call_cm_runtime[True-eager] 3.0048ms 2.6192ms 381.8023 Ops/s 373.0363 Ops/s $\color{#35bf28}+2.35\%$
test_func_call_cm_runtime[True-compile] 2.8320ms 2.4250ms 412.3788 Ops/s 405.7524 Ops/s $\color{#35bf28}+1.63\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5519ms 0.4079ms 2.4514 KOps/s 2.4196 KOps/s $\color{#35bf28}+1.31\%$
test_vmap_func_call_cm_runtime[eager] 4.2439ms 3.7594ms 266.0017 Ops/s 263.3169 Ops/s $\color{#35bf28}+1.02\%$
test_vmap_func_call_cm_runtime[compile] 2.6707ms 2.4532ms 407.6304 Ops/s 404.8802 Ops/s $\color{#35bf28}+0.68\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5023ms 0.4115ms 2.4301 KOps/s 2.3937 KOps/s $\color{#35bf28}+1.52\%$
test_distributed 3.3686ms 0.1811ms 5.5214 KOps/s 8.3881 KOps/s $\textbf{\color{#d91a1a}-34.18\%}$
test_tdmodule 0.2895ms 16.7653μs 59.6468 KOps/s 69.5160 KOps/s $\textbf{\color{#d91a1a}-14.20\%}$
test_tdmodule_dispatch 61.2010μs 32.2316μs 31.0254 KOps/s 35.7074 KOps/s $\textbf{\color{#d91a1a}-13.11\%}$
test_tdseq 26.5310μs 17.4486μs 57.3112 KOps/s 65.3221 KOps/s $\textbf{\color{#d91a1a}-12.26\%}$
test_tdseq_dispatch 58.4610μs 35.0457μs 28.5342 KOps/s 32.5858 KOps/s $\textbf{\color{#d91a1a}-12.43\%}$
test_instantiation_functorch 2.0232ms 1.8657ms 535.9858 Ops/s 521.9969 Ops/s $\color{#35bf28}+2.68\%$
test_instantiation_td 1.8124ms 1.2044ms 830.3211 Ops/s 814.7713 Ops/s $\color{#35bf28}+1.91\%$
test_exec_functorch 1.0384ms 0.9787ms 1.0218 KOps/s 1.0085 KOps/s $\color{#35bf28}+1.32\%$
test_exec_functional_call 1.2243ms 1.0018ms 998.2288 Ops/s 1.0032 KOps/s $\color{#d91a1a}-0.50\%$
test_exec_td 1.1640ms 1.0199ms 980.4649 Ops/s 976.1708 Ops/s $\color{#35bf28}+0.44\%$
test_exec_td_decorator 1.7097ms 1.0447ms 957.1850 Ops/s 937.6960 Ops/s $\color{#35bf28}+2.08\%$
test_vmap_mlp_speed[True-True] 1.6668ms 1.2800ms 781.2582 Ops/s 793.0807 Ops/s $\color{#d91a1a}-1.49\%$
test_vmap_mlp_speed[True-False] 1.6573ms 1.2764ms 783.4237 Ops/s 796.2151 Ops/s $\color{#d91a1a}-1.61\%$
test_vmap_mlp_speed[False-True] 1.5374ms 1.1658ms 857.7574 Ops/s 873.1531 Ops/s $\color{#d91a1a}-1.76\%$
test_vmap_mlp_speed[False-False] 1.5713ms 1.1678ms 856.3278 Ops/s 876.2019 Ops/s $\color{#d91a1a}-2.27\%$
test_vmap_mlp_speed_decorator[True-True] 2.0079ms 1.2480ms 801.2851 Ops/s 805.6878 Ops/s $\color{#d91a1a}-0.55\%$
test_vmap_mlp_speed_decorator[True-False] 1.6461ms 1.2514ms 799.1183 Ops/s 804.6264 Ops/s $\color{#d91a1a}-0.68\%$
test_vmap_mlp_speed_decorator[False-True] 1.5467ms 1.1647ms 858.5989 Ops/s 863.0946 Ops/s $\color{#d91a1a}-0.52\%$
test_vmap_mlp_speed_decorator[False-False] 1.5597ms 1.1651ms 858.2823 Ops/s 864.3536 Ops/s $\color{#d91a1a}-0.70\%$
test_vmap_transformer_speed[True-True] 13.4664ms 13.1140ms 76.2545 Ops/s 76.2519 Ops/s $+0.00\%$
test_vmap_transformer_speed[True-False] 13.5502ms 13.1188ms 76.2264 Ops/s 76.2210 Ops/s $+0.01\%$
test_vmap_transformer_speed[False-True] 13.2576ms 12.8992ms 77.5244 Ops/s 77.6295 Ops/s $\color{#d91a1a}-0.14\%$
test_vmap_transformer_speed[False-False] 13.4001ms 12.9372ms 77.2964 Ops/s 77.8225 Ops/s $\color{#d91a1a}-0.68\%$
test_vmap_transformer_speed_decorator[True-True] 34.3837ms 33.8453ms 29.5462 Ops/s 29.4542 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed_decorator[True-False] 34.0018ms 33.7567ms 29.6237 Ops/s 29.4860 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed_decorator[False-True] 34.1620ms 33.6649ms 29.7046 Ops/s 29.5775 Ops/s $\color{#35bf28}+0.43\%$
test_vmap_transformer_speed_decorator[False-False] 33.9425ms 33.6803ms 29.6909 Ops/s 29.5524 Ops/s $\color{#35bf28}+0.47\%$
test_to_module_speed[True] 1.5207ms 1.0005ms 999.4692 Ops/s 1.0562 KOps/s $\textbf{\color{#d91a1a}-5.37\%}$
test_to_module_speed[False] 1.3625ms 0.9659ms 1.0353 KOps/s 1.0848 KOps/s $\color{#d91a1a}-4.56\%$
test_tc_init 72.7820μs 36.3091μs 27.5413 KOps/s 30.4177 KOps/s $\textbf{\color{#d91a1a}-9.46\%}$
test_tc_init_nested 0.4619ms 73.9677μs 13.5194 KOps/s 14.4449 KOps/s $\textbf{\color{#d91a1a}-6.41\%}$
test_tc_first_layer_tensor 54.1596μs 0.6697μs 1.4932 MOps/s 1.5036 MOps/s $\color{#d91a1a}-0.69\%$
test_tc_first_layer_nontensor 28.9200μs 2.2379μs 446.8575 KOps/s 455.2834 KOps/s $\color{#d91a1a}-1.85\%$
test_tc_second_layer_tensor 95.7270μs 1.3614μs 734.5636 KOps/s 737.3980 KOps/s $\color{#d91a1a}-0.38\%$
test_tc_second_layer_nontensor 0.1362ms 2.9186μs 342.6275 KOps/s 344.3621 KOps/s $\color{#d91a1a}-0.50\%$
test_unbind 0.1975s 12.1025ms 82.6277 Ops/s 92.0460 Ops/s $\textbf{\color{#d91a1a}-10.23\%}$
test_full_like 0.9463ms 0.5735ms 1.7437 KOps/s 1.7471 KOps/s $\color{#d91a1a}-0.19\%$
test_zeros_like 0.2582ms 0.1979ms 5.0539 KOps/s 5.0555 KOps/s $\color{#d91a1a}-0.03\%$
test_ones_like 0.5241ms 0.1978ms 5.0547 KOps/s 5.0585 KOps/s $\color{#d91a1a}-0.08\%$
test_clone 0.7773ms 0.4139ms 2.4162 KOps/s 2.4166 KOps/s $\color{#d91a1a}-0.02\%$
test_squeeze 34.4210μs 9.9865μs 100.1349 KOps/s 101.1337 KOps/s $\color{#d91a1a}-0.99\%$
test_unsqueeze 0.2893ms 76.1952μs 13.1242 KOps/s 13.2861 KOps/s $\color{#d91a1a}-1.22\%$
test_split 0.5291ms 0.1612ms 6.2030 KOps/s 6.1978 KOps/s $\color{#35bf28}+0.08\%$
test_permute 0.5698ms 0.1797ms 5.5633 KOps/s 5.4873 KOps/s $\color{#35bf28}+1.39\%$
test_stack 1.2557ms 0.8583ms 1.1650 KOps/s 1.1482 KOps/s $\color{#35bf28}+1.46\%$
test_cat 1.4038ms 1.2315ms 812.0058 Ops/s 812.0496 Ops/s $-0.01\%$
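The Ops and Change columns appear to be derived from the others. Ops looks like the reciprocal of Mean, and Change looks like the relative difference between Ops on this branch and Ops on Repo HEAD, where positive means faster. This reading is inferred from the rows themselves, not from the benchmark tool's documentation. Under that assumption, a minimal Python sketch reproduces the test_creation_empty row:

```python
def ops_per_second(mean_seconds: float) -> float:
    # Assumed relation: throughput is the reciprocal of the mean time per call
    return 1.0 / mean_seconds


def relative_change_pct(ops: float, ops_head: float) -> float:
    # Assumed relation: percent change of this branch's throughput versus HEAD
    return (ops / ops_head - 1.0) * 100.0


# Values copied from the test_creation_empty row
mean_s = 17.9995e-6      # Mean column: 17.9995 µs
ops_head_k = 72.2246     # Ops on Repo HEAD column, in KOps/s

ops_k = ops_per_second(mean_s) / 1e3
print(f"{ops_k:.2f} KOps/s")                               # 55.56 KOps/s
print(f"{relative_change_pct(ops_k, ops_head_k):+.2f}%")   # -23.08%
```

The same arithmetic matches other rows, for example test_clone (35.0192 vs 33.1163 KOps/s gives +5.75%), so green entries indicate higher throughput than HEAD and red entries lower.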

@@ -222,6 +222,8 @@ def _call(
            return result

        if not self._has_cuda or self.counter < self._warmup - 1:
            # We must clone the data because providing non-contiguous data will fail later when we clone
            tensordict = self._tensordict = tensordict.clone()
Contributor

In this statement, we are making a clone of tensordict and assigning it back to tensordict? (ignoring the assignment to self._tensordict for now)

Contributor Author

Yep.
I'm doing that because otherwise your tensordict could contain views, and compile would trace with inputs that are views. Then, when the CUDA graph is captured, the data is cloned and is no longer a view! So compile recompiles, and the whole warmup is wasted.
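To illustrate the explanation above with plain PyTorch (this is not code from the PR): a slice such as base[:, :4] is a non-contiguous view into base, while its clone owns fresh storage with a contiguous layout. Strides are among the tensor properties torch.compile can guard on, so a graph traced on the view may not be reusable for the clone. The helper describe is hypothetical, introduced only for this sketch.

```python
import torch


def describe(t: torch.Tensor) -> tuple:
    # Hypothetical helper: (is it a view?, is it contiguous?, strides)
    return (t._base is not None, t.is_contiguous(), t.stride())


base = torch.randn(4, 8)
view = base[:, :4]    # non-contiguous view into `base`, strides (8, 1)
copy = view.clone()   # owns its storage; not dense input, so laid out contiguously

print(describe(view))  # (True, False, (8, 1))
print(describe(copy))  # (False, True, (4, 1))
```

Cloning once up front, as tensordict.clone() does in the diff, means the warmup calls already receive the same kind of input (owning, contiguous) that later calls will see after the CUDA-graph capture clones the data.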

Contributor

ohh interesting! Thank you for the explanation.

vmoens added the Quality label Oct 1, 2024
vmoens merged commit 0cada70 into main Oct 3, 2024
9 of 13 checks passed
vmoens deleted the fix-recompile branch October 3, 2024 11:56
Labels: CLA Signed, Quality

3 participants