[Performance] Faster _is_tensor_collection in eager mode #1060

Merged (2 commits) on Oct 25, 2024

vmoens (Contributor) commented Oct 25, 2024

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 25, 2024
ghstack-source-id: 5a94e1265ecfa50cb528fb5cf3c905816894a18e
Pull Request resolved: #1060
facebook-github-bot added the CLA Signed label Oct 25, 2024
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Oct 25, 2024
ghstack-source-id: b81c1d243a7c72a9d5fd68bf8e65e97a934ae61c
Pull Request resolved: #1060

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 216. Improved: $\large\color{#35bf28}64$. Worsened: $\large\color{#d91a1a}22$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 48.2500μs 21.0667μs 47.4682 KOps/s 39.2580 KOps/s $\textbf{\color{#35bf28}+20.91\%}$
test_plain_set_stack_nested 59.5610μs 20.8463μs 47.9702 KOps/s 38.3060 KOps/s $\textbf{\color{#35bf28}+25.23\%}$
test_plain_set_nested_inplace 58.8300μs 22.8751μs 43.7157 KOps/s 34.9390 KOps/s $\textbf{\color{#35bf28}+25.12\%}$
test_plain_set_stack_nested_inplace 53.6500μs 22.7765μs 43.9049 KOps/s 35.2923 KOps/s $\textbf{\color{#35bf28}+24.40\%}$
test_items 40.3750μs 4.1177μs 242.8548 KOps/s 240.0884 KOps/s $\color{#35bf28}+1.15\%$
test_items_nested 0.7091ms 0.3396ms 2.9448 KOps/s 2.7995 KOps/s $\textbf{\color{#35bf28}+5.19\%}$
test_items_nested_locked 0.4021ms 0.3416ms 2.9278 KOps/s 2.7795 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_items_nested_leaf 0.1364ms 70.4245μs 14.1996 KOps/s 12.4789 KOps/s $\textbf{\color{#35bf28}+13.79\%}$
test_items_stack_nested 0.4744ms 0.3438ms 2.9084 KOps/s 2.7781 KOps/s $\color{#35bf28}+4.69\%$
test_items_stack_nested_leaf 0.1401ms 73.4047μs 13.6231 KOps/s 11.8696 KOps/s $\textbf{\color{#35bf28}+14.77\%}$
test_items_stack_nested_locked 0.5358ms 0.3432ms 2.9141 KOps/s 2.7521 KOps/s $\textbf{\color{#35bf28}+5.89\%}$
test_keys 43.1010μs 3.4933μs 286.2627 KOps/s 282.0129 KOps/s $\color{#35bf28}+1.51\%$
test_keys_nested 0.2272ms 0.1331ms 7.5115 KOps/s 5.4949 KOps/s $\textbf{\color{#35bf28}+36.70\%}$
test_keys_nested_locked 0.7383ms 0.1382ms 7.2378 KOps/s 5.3100 KOps/s $\textbf{\color{#35bf28}+36.31\%}$
test_keys_nested_leaf 0.2047ms 0.1138ms 8.7866 KOps/s 6.1738 KOps/s $\textbf{\color{#35bf28}+42.32\%}$
test_keys_stack_nested 0.2165ms 0.1310ms 7.6310 KOps/s 5.5049 KOps/s $\textbf{\color{#35bf28}+38.62\%}$
test_keys_stack_nested_leaf 0.2366ms 0.1120ms 8.9301 KOps/s 6.2364 KOps/s $\textbf{\color{#35bf28}+43.19\%}$
test_keys_stack_nested_locked 0.2583ms 0.1370ms 7.2991 KOps/s 5.3443 KOps/s $\textbf{\color{#35bf28}+36.58\%}$
test_values 6.0192μs 1.0232μs 977.3002 KOps/s 958.8754 KOps/s $\color{#35bf28}+1.92\%$
test_values_nested 0.1078ms 54.9309μs 18.2047 KOps/s 14.1172 KOps/s $\textbf{\color{#35bf28}+28.95\%}$
test_values_nested_locked 0.1040ms 54.7465μs 18.2660 KOps/s 14.2969 KOps/s $\textbf{\color{#35bf28}+27.76\%}$
test_values_nested_leaf 0.1060ms 59.7055μs 16.7489 KOps/s 11.7159 KOps/s $\textbf{\color{#35bf28}+42.96\%}$
test_values_stack_nested 0.1104ms 56.4951μs 17.7006 KOps/s 13.5126 KOps/s $\textbf{\color{#35bf28}+30.99\%}$
test_values_stack_nested_leaf 0.1093ms 58.8037μs 17.0057 KOps/s 12.0896 KOps/s $\textbf{\color{#35bf28}+40.66\%}$
test_values_stack_nested_locked 0.1111ms 56.7318μs 17.6268 KOps/s 13.8849 KOps/s $\textbf{\color{#35bf28}+26.95\%}$
test_membership 6.1229μs 0.7447μs 1.3428 MOps/s 1.0802 MOps/s $\textbf{\color{#35bf28}+24.31\%}$
test_membership_nested 40.3050μs 2.7546μs 363.0278 KOps/s 365.6285 KOps/s $\color{#d91a1a}-0.71\%$
test_membership_nested_leaf 40.3150μs 2.7667μs 361.4357 KOps/s 363.7756 KOps/s $\color{#d91a1a}-0.64\%$
test_membership_stacked_nested 23.1440μs 2.7543μs 363.0693 KOps/s 363.8361 KOps/s $\color{#d91a1a}-0.21\%$
test_membership_stacked_nested_leaf 19.9380μs 2.7545μs 363.0371 KOps/s 359.2849 KOps/s $\color{#35bf28}+1.04\%$
test_membership_nested_last 97.9530μs 4.1270μs 242.3057 KOps/s 232.5992 KOps/s $\color{#35bf28}+4.17\%$
test_membership_nested_leaf_last 0.1033ms 4.2224μs 236.8343 KOps/s 232.5689 KOps/s $\color{#35bf28}+1.83\%$
test_membership_stacked_nested_last 38.0820μs 13.0259μs 76.7700 KOps/s 234.8537 KOps/s $\textbf{\color{#d91a1a}-67.31\%}$
test_membership_stacked_nested_leaf_last 63.7390μs 12.9831μs 77.0233 KOps/s 237.6599 KOps/s $\textbf{\color{#d91a1a}-67.59\%}$
test_nested_getleaf 31.7100μs 10.5999μs 94.3408 KOps/s 93.6465 KOps/s $\color{#35bf28}+0.74\%$
test_nested_get 66.0840μs 9.9909μs 100.0906 KOps/s 98.0224 KOps/s $\color{#35bf28}+2.11\%$
test_stacked_getleaf 62.1070μs 10.2914μs 97.1681 KOps/s 93.8099 KOps/s $\color{#35bf28}+3.58\%$
test_stacked_get 0.1156ms 9.8284μs 101.7457 KOps/s 97.5586 KOps/s $\color{#35bf28}+4.29\%$
test_nested_getitemleaf 75.1210μs 10.8964μs 91.7734 KOps/s 89.6881 KOps/s $\color{#35bf28}+2.33\%$
test_nested_getitem 38.3910μs 10.1402μs 98.6170 KOps/s 95.9631 KOps/s $\color{#35bf28}+2.77\%$
test_stacked_getitemleaf 34.4040μs 10.7634μs 92.9077 KOps/s 90.8159 KOps/s $\color{#35bf28}+2.30\%$
test_stacked_getitem 43.7210μs 9.9139μs 100.8685 KOps/s 96.0182 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_lock_nested 0.8903ms 0.4858ms 2.0584 KOps/s 1.9805 KOps/s $\color{#35bf28}+3.93\%$
test_lock_stack_nested 0.6683ms 0.4421ms 2.2619 KOps/s 2.1583 KOps/s $\color{#35bf28}+4.80\%$
test_unlock_nested 0.7998ms 0.4091ms 2.4441 KOps/s 2.3713 KOps/s $\color{#35bf28}+3.07\%$
test_unlock_stack_nested 0.4398ms 0.3591ms 2.7848 KOps/s 2.6521 KOps/s $\textbf{\color{#35bf28}+5.00\%}$
test_flatten_speed 0.1930ms 91.1831μs 10.9669 KOps/s 9.8572 KOps/s $\textbf{\color{#35bf28}+11.26\%}$
test_unflatten_speed 0.8128ms 0.4748ms 2.1061 KOps/s 1.9510 KOps/s $\textbf{\color{#35bf28}+7.95\%}$
test_common_ops 6.7560ms 1.1131ms 898.4159 Ops/s 798.2737 Ops/s $\textbf{\color{#35bf28}+12.54\%}$
test_creation 17.9240μs 2.1287μs 469.7670 KOps/s 483.2972 KOps/s $\color{#d91a1a}-2.80\%$
test_creation_empty 0.1532ms 16.8329μs 59.4075 KOps/s 50.0455 KOps/s $\textbf{\color{#35bf28}+18.71\%}$
test_creation_nested_1 0.2316ms 19.5480μs 51.1561 KOps/s 42.5009 KOps/s $\textbf{\color{#35bf28}+20.36\%}$
test_creation_nested_2 57.1170μs 23.8911μs 41.8566 KOps/s 36.2703 KOps/s $\textbf{\color{#35bf28}+15.40\%}$
test_clone 1.3485ms 17.0067μs 58.8005 KOps/s 58.1392 KOps/s $\color{#35bf28}+1.14\%$
test_getitem[int] 0.7705ms 16.8307μs 59.4154 KOps/s 61.4012 KOps/s $\color{#d91a1a}-3.23\%$
test_getitem[slice_int] 0.1363ms 31.7463μs 31.4997 KOps/s 32.8008 KOps/s $\color{#d91a1a}-3.97\%$
test_getitem[range] 0.3331ms 61.3720μs 16.2941 KOps/s 18.1392 KOps/s $\textbf{\color{#d91a1a}-10.17\%}$
test_getitem[tuple] 0.1312ms 26.2278μs 38.1275 KOps/s 39.6729 KOps/s $\color{#d91a1a}-3.90\%$
test_getitem[list] 0.2894ms 55.0059μs 18.1799 KOps/s 19.4995 KOps/s $\textbf{\color{#d91a1a}-6.77\%}$
test_setitem_dim[int] 69.1190μs 34.8380μs 28.7043 KOps/s 30.4479 KOps/s $\textbf{\color{#d91a1a}-5.73\%}$
test_setitem_dim[slice_int] 0.1094ms 63.1414μs 15.8375 KOps/s 16.0440 KOps/s $\color{#d91a1a}-1.29\%$
test_setitem_dim[range] 0.1540ms 84.3772μs 11.8515 KOps/s 12.0970 KOps/s $\color{#d91a1a}-2.03\%$
test_setitem_dim[tuple] 96.8210μs 51.1167μs 19.5631 KOps/s 19.7998 KOps/s $\color{#d91a1a}-1.20\%$
test_setitem 0.1330ms 29.7376μs 33.6275 KOps/s 31.9162 KOps/s $\textbf{\color{#35bf28}+5.36\%}$
test_set 0.1523ms 28.9802μs 34.5063 KOps/s 32.0963 KOps/s $\textbf{\color{#35bf28}+7.51\%}$
test_set_shared 3.4310ms 0.2136ms 4.6818 KOps/s 4.5434 KOps/s $\color{#35bf28}+3.04\%$
test_update 0.1298ms 35.4735μs 28.1901 KOps/s 24.5163 KOps/s $\textbf{\color{#35bf28}+14.99\%}$
test_update_nested 0.1450ms 47.5345μs 21.0374 KOps/s 19.5299 KOps/s $\textbf{\color{#35bf28}+7.72\%}$
test_update__nested 0.3751ms 40.4692μs 24.7101 KOps/s 21.6401 KOps/s $\textbf{\color{#35bf28}+14.19\%}$
test_set_nested 76.9140μs 31.0509μs 32.2052 KOps/s 29.5770 KOps/s $\textbf{\color{#35bf28}+8.89\%}$
test_set_nested_new 0.1086ms 36.0845μs 27.7127 KOps/s 25.9822 KOps/s $\textbf{\color{#35bf28}+6.66\%}$
test_select 0.1472ms 53.9420μs 18.5384 KOps/s 17.7232 KOps/s $\color{#35bf28}+4.60\%$
test_select_nested 0.1183ms 60.5339μs 16.5197 KOps/s 16.3544 KOps/s $\color{#35bf28}+1.01\%$
test_exclude_nested 0.1370ms 75.8951μs 13.1761 KOps/s 13.1935 KOps/s $\color{#d91a1a}-0.13\%$
test_empty[True] 0.5032ms 0.3460ms 2.8904 KOps/s 2.5147 KOps/s $\textbf{\color{#35bf28}+14.94\%}$
test_empty[False] 10.8905μs 1.2353μs 809.5345 KOps/s 819.5789 KOps/s $\color{#d91a1a}-1.23\%$
test_unbind_speed 0.5091ms 0.3003ms 3.3298 KOps/s 3.3288 KOps/s $\color{#35bf28}+0.03\%$
test_unbind_speed_stack0 0.5958ms 0.2836ms 3.5262 KOps/s 3.4602 KOps/s $\color{#35bf28}+1.91\%$
test_unbind_speed_stack1 0.1017s 0.8665ms 1.1541 KOps/s 1.5257 KOps/s $\textbf{\color{#d91a1a}-24.35\%}$
test_split 0.1004s 2.2582ms 442.8390 Ops/s 446.3209 Ops/s $\color{#d91a1a}-0.78\%$
test_chunk 3.3105ms 2.0561ms 486.3483 Ops/s 450.9768 Ops/s $\textbf{\color{#35bf28}+7.84\%}$
test_creation[device0] 0.2477ms 0.1155ms 8.6560 KOps/s 8.4757 KOps/s $\color{#35bf28}+2.13\%$
test_creation_from_tensor 3.7533ms 0.1175ms 8.5097 KOps/s 8.5793 KOps/s $\color{#d91a1a}-0.81\%$
test_add_one[memmap_tensor0] 0.2423ms 7.0330μs 142.1868 KOps/s 143.7132 KOps/s $\color{#d91a1a}-1.06\%$
test_contiguous[memmap_tensor0] 16.9420μs 1.9206μs 520.6819 KOps/s 513.8350 KOps/s $\color{#35bf28}+1.33\%$
test_stack[memmap_tensor0] 61.4460μs 5.4817μs 182.4267 KOps/s 186.3303 KOps/s $\color{#d91a1a}-2.09\%$
test_memmaptd_index 1.1194ms 0.4064ms 2.4604 KOps/s 2.4788 KOps/s $\color{#d91a1a}-0.74\%$
test_memmaptd_index_astensor 0.9001ms 0.4868ms 2.0542 KOps/s 1.9697 KOps/s $\color{#35bf28}+4.29\%$
test_memmaptd_index_op 1.6935ms 1.0186ms 981.7012 Ops/s 939.0198 Ops/s $\color{#35bf28}+4.55\%$
test_serialize_model 0.2204s 0.1288s 7.7612 Ops/s 8.3625 Ops/s $\textbf{\color{#d91a1a}-7.19\%}$
test_serialize_model_pickle 0.4863s 0.4043s 2.4737 Ops/s 2.5539 Ops/s $\color{#d91a1a}-3.14\%$
test_serialize_weights 0.1281s 0.1180s 8.4770 Ops/s 7.5318 Ops/s $\textbf{\color{#35bf28}+12.55\%}$
test_serialize_weights_returnearly 0.1848s 0.1662s 6.0160 Ops/s 6.2356 Ops/s $\color{#d91a1a}-3.52\%$
test_serialize_weights_pickle 0.5092s 0.4124s 2.4248 Ops/s 1.1914 Ops/s $\textbf{\color{#35bf28}+103.52\%}$
test_serialize_weights_filesystem 0.2545s 0.1597s 6.2609 Ops/s 7.1234 Ops/s $\textbf{\color{#d91a1a}-12.11\%}$
test_serialize_model_filesystem 0.1579s 0.1492s 6.7029 Ops/s 6.4489 Ops/s $\color{#35bf28}+3.94\%$
test_reshape_pytree 94.8080μs 39.4816μs 25.3282 KOps/s 25.3056 KOps/s $\color{#35bf28}+0.09\%$
test_reshape_td 0.1214ms 47.0335μs 21.2614 KOps/s 21.8255 KOps/s $\color{#d91a1a}-2.58\%$
test_view_pytree 0.1070ms 39.6057μs 25.2489 KOps/s 25.6919 KOps/s $\color{#d91a1a}-1.72\%$
test_view_td 0.1119ms 52.8317μs 18.9280 KOps/s 19.2084 KOps/s $\color{#d91a1a}-1.46\%$
test_unbind_pytree 81.9930μs 35.7736μs 27.9536 KOps/s 28.0005 KOps/s $\color{#d91a1a}-0.17\%$
test_unbind_td 0.3176ms 44.4723μs 22.4859 KOps/s 22.6871 KOps/s $\color{#d91a1a}-0.89\%$
test_split_pytree 80.4810μs 37.7650μs 26.4795 KOps/s 26.5296 KOps/s $\color{#d91a1a}-0.19\%$
test_split_td 0.2022ms 58.7057μs 17.0341 KOps/s 17.6941 KOps/s $\color{#d91a1a}-3.73\%$
test_add_pytree 0.1809ms 45.7554μs 21.8554 KOps/s 23.1873 KOps/s $\textbf{\color{#d91a1a}-5.74\%}$
test_add_td 0.5826ms 84.8475μs 11.7859 KOps/s 11.7214 KOps/s $\color{#35bf28}+0.55\%$
test_compile_add_one_nested[tensordict-compile] 0.1340ms 71.7378μs 13.9397 KOps/s 13.6938 KOps/s $\color{#35bf28}+1.80\%$
test_compile_add_one_nested[tensordict-eager] 0.4022ms 0.1844ms 5.4231 KOps/s 4.9611 KOps/s $\textbf{\color{#35bf28}+9.31\%}$
test_compile_add_one_nested[pytree-compile] 0.1056ms 55.0855μs 18.1536 KOps/s 18.4879 KOps/s $\color{#d91a1a}-1.81\%$
test_compile_add_one_nested[pytree-eager] 0.4016ms 0.1453ms 6.8828 KOps/s 6.9508 KOps/s $\color{#d91a1a}-0.98\%$
test_compile_copy_nested[tensordict-compile] 96.2710μs 25.8852μs 38.6321 KOps/s 36.5849 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_compile_copy_nested[tensordict-eager] 0.1302ms 70.4170μs 14.2011 KOps/s 13.1710 KOps/s $\textbf{\color{#35bf28}+7.82\%}$
test_compile_copy_nested[pytree-compile] 0.1786ms 79.8043μs 12.5306 KOps/s 12.5246 KOps/s $\color{#35bf28}+0.05\%$
test_compile_copy_nested[pytree-eager] 0.1268ms 67.6045μs 14.7919 KOps/s 14.2204 KOps/s $\color{#35bf28}+4.02\%$
test_compile_add_one_flat[tensordict-compile] 0.1962ms 0.1145ms 8.7339 KOps/s 8.2124 KOps/s $\textbf{\color{#35bf28}+6.35\%}$
test_compile_add_one_flat[tensordict-eager] 0.3949ms 0.2055ms 4.8669 KOps/s 4.0256 KOps/s $\textbf{\color{#35bf28}+20.90\%}$
test_compile_add_one_flat[tensorclass-compile] 0.1203ms 54.5434μs 18.3340 KOps/s 18.3820 KOps/s $\color{#d91a1a}-0.26\%$
test_compile_add_one_flat[tensorclass-eager] 0.4801ms 69.6778μs 14.3518 KOps/s 13.2165 KOps/s $\textbf{\color{#35bf28}+8.59\%}$
test_compile_add_one_flat[pytree-compile] 0.1998ms 0.1122ms 8.9091 KOps/s 9.0338 KOps/s $\color{#d91a1a}-1.38\%$
test_compile_add_one_flat[pytree-eager] 0.6353ms 0.3020ms 3.3115 KOps/s 3.3048 KOps/s $\color{#35bf28}+0.20\%$
test_compile_add_self_flat[tensordict-eager] 0.5401ms 0.2223ms 4.4990 KOps/s 3.5354 KOps/s $\textbf{\color{#35bf28}+27.26\%}$
test_compile_add_self_flat[tensordict-compile] 0.1765ms 0.1138ms 8.7856 KOps/s 8.0882 KOps/s $\textbf{\color{#35bf28}+8.62\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1302ms 62.8456μs 15.9120 KOps/s 13.7504 KOps/s $\textbf{\color{#35bf28}+15.72\%}$
test_compile_add_self_flat[tensorclass-compile] 0.1162ms 55.5115μs 18.0143 KOps/s 18.7636 KOps/s $\color{#d91a1a}-3.99\%$
test_compile_add_self_flat[pytree-eager] 0.6510ms 0.2452ms 4.0788 KOps/s 4.0545 KOps/s $\color{#35bf28}+0.60\%$
test_compile_add_self_flat[pytree-compile] 0.3988ms 0.1140ms 8.7714 KOps/s 9.0682 KOps/s $\color{#d91a1a}-3.27\%$
test_compile_copy_flat[tensordict-compile] 99.0850μs 20.2872μs 49.2921 KOps/s 35.0785 KOps/s $\textbf{\color{#35bf28}+40.52\%}$
test_compile_copy_flat[tensordict-eager] 0.1201ms 59.4285μs 16.8269 KOps/s 12.8840 KOps/s $\textbf{\color{#35bf28}+30.60\%}$
test_compile_copy_flat[pytree-compile] 0.1735ms 81.3497μs 12.2926 KOps/s 12.1380 KOps/s $\color{#35bf28}+1.27\%$
test_compile_copy_flat[pytree-eager] 0.1526ms 68.9795μs 14.4971 KOps/s 14.5152 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_assign_and_add[tensordict-compile] 0.3159ms 0.2153ms 4.6445 KOps/s 4.7026 KOps/s $\color{#d91a1a}-1.24\%$
test_compile_assign_and_add[tensordict-eager] 3.0626ms 1.7280ms 578.7038 Ops/s 540.6940 Ops/s $\textbf{\color{#35bf28}+7.03\%}$
test_compile_assign_and_add[pytree-compile] 0.2828ms 0.2120ms 4.7178 KOps/s 4.8270 KOps/s $\color{#d91a1a}-2.26\%$
test_compile_assign_and_add[pytree-eager] 1.2393ms 1.1515ms 868.4358 Ops/s 858.7958 Ops/s $\color{#35bf28}+1.12\%$
test_compile_assign_and_add_stack[compile] 0.5378ms 0.4591ms 2.1782 KOps/s 2.2533 KOps/s $\color{#d91a1a}-3.33\%$
test_compile_assign_and_add_stack[eager] 4.0685ms 3.9331ms 254.2534 Ops/s 177.8062 Ops/s $\textbf{\color{#35bf28}+42.99\%}$
test_compile_indexing[tensor-tensordict-compile] 0.1073ms 45.3874μs 22.0326 KOps/s 23.6466 KOps/s $\textbf{\color{#d91a1a}-6.83\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5274ms 51.5643μs 19.3933 KOps/s 20.7981 KOps/s $\textbf{\color{#d91a1a}-6.75\%}$
test_compile_indexing[tensor-tensorclass-compile] 87.5640μs 37.9871μs 26.3248 KOps/s 27.8972 KOps/s $\textbf{\color{#d91a1a}-5.64\%}$
test_compile_indexing[tensor-tensorclass-eager] 67.4460μs 30.3214μs 32.9800 KOps/s 34.5550 KOps/s $\color{#d91a1a}-4.56\%$
test_compile_indexing[tensor-pytree-compile] 84.3780μs 38.2366μs 26.1530 KOps/s 26.9185 KOps/s $\color{#d91a1a}-2.84\%$
test_compile_indexing[tensor-pytree-eager] 95.4490μs 30.4062μs 32.8880 KOps/s 33.7286 KOps/s $\color{#d91a1a}-2.49\%$
test_compile_indexing[slice-tensordict-compile] 0.1874ms 78.6874μs 12.7085 KOps/s 12.8977 KOps/s $\color{#d91a1a}-1.47\%$
test_compile_indexing[slice-tensordict-eager] 0.6009ms 30.2885μs 33.0158 KOps/s 34.7950 KOps/s $\textbf{\color{#d91a1a}-5.11\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1488ms 71.9681μs 13.8950 KOps/s 14.1341 KOps/s $\color{#d91a1a}-1.69\%$
test_compile_indexing[slice-tensorclass-eager] 57.2070μs 23.9160μs 41.8130 KOps/s 42.0974 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_indexing[slice-pytree-compile] 0.1729ms 71.7288μs 13.9414 KOps/s 13.9376 KOps/s $\color{#35bf28}+0.03\%$
test_compile_indexing[slice-pytree-eager] 73.8390μs 23.7756μs 42.0600 KOps/s 42.4165 KOps/s $\color{#d91a1a}-0.84\%$
test_compile_indexing[int-tensordict-compile] 0.1473ms 79.6479μs 12.5553 KOps/s 12.6781 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_indexing[int-tensordict-eager] 0.8940ms 29.9858μs 33.3492 KOps/s 35.7493 KOps/s $\textbf{\color{#d91a1a}-6.71\%}$
test_compile_indexing[int-tensorclass-compile] 0.3861ms 74.5765μs 13.4091 KOps/s 14.0053 KOps/s $\color{#d91a1a}-4.26\%$
test_compile_indexing[int-tensorclass-eager] 62.7070μs 23.7300μs 42.1408 KOps/s 42.5013 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_indexing[int-pytree-compile] 0.1377ms 72.5381μs 13.7859 KOps/s 14.1959 KOps/s $\color{#d91a1a}-2.89\%$
test_compile_indexing[int-pytree-eager] 75.7010μs 24.1019μs 41.4904 KOps/s 42.4293 KOps/s $\color{#d91a1a}-2.21\%$
test_mod_add[eager] 70.0410μs 24.7384μs 40.4229 KOps/s 36.7966 KOps/s $\textbf{\color{#35bf28}+9.86\%}$
test_mod_add[compile] 0.2910ms 44.9061μs 22.2687 KOps/s 22.9547 KOps/s $\color{#d91a1a}-2.99\%$
test_mod_add[compile-overhead] 0.1028ms 43.8924μs 22.7830 KOps/s 22.9701 KOps/s $\color{#d91a1a}-0.81\%$
test_mod_wrap[eager] 0.3933ms 0.2156ms 4.6377 KOps/s 4.6658 KOps/s $\color{#d91a1a}-0.60\%$
test_mod_wrap[compile] 2.0011ms 0.2052ms 4.8735 KOps/s 4.9765 KOps/s $\color{#d91a1a}-2.07\%$
test_mod_wrap[compile-overhead] 2.0703ms 0.2072ms 4.8269 KOps/s 5.0190 KOps/s $\color{#d91a1a}-3.83\%$
test_mod_wrap_and_backward[eager] 12.7917ms 11.3613ms 88.0183 Ops/s 86.5272 Ops/s $\color{#35bf28}+1.72\%$
test_mod_wrap_and_backward[compile] 14.2751ms 12.4995ms 80.0031 Ops/s 79.8547 Ops/s $\color{#35bf28}+0.19\%$
test_mod_wrap_and_backward[compile-overhead] 16.4416ms 13.3595ms 74.8531 Ops/s 80.3805 Ops/s $\textbf{\color{#d91a1a}-6.88\%}$
test_seq_add[eager] 0.1699ms 91.0993μs 10.9770 KOps/s 10.3432 KOps/s $\textbf{\color{#35bf28}+6.13\%}$
test_seq_add[compile] 0.1477ms 60.9812μs 16.3985 KOps/s 17.3610 KOps/s $\textbf{\color{#d91a1a}-5.54\%}$
test_seq_add[compile-overhead] 0.1330ms 59.3165μs 16.8587 KOps/s 17.7466 KOps/s $\textbf{\color{#d91a1a}-5.00\%}$
test_seq_wrap[eager] 0.5070ms 0.3836ms 2.6071 KOps/s 2.5253 KOps/s $\color{#35bf28}+3.24\%$
test_seq_wrap[compile] 0.3288ms 0.2315ms 4.3191 KOps/s 4.4253 KOps/s $\color{#d91a1a}-2.40\%$
test_seq_wrap[compile-overhead] 0.4114ms 0.2320ms 4.3102 KOps/s 4.4359 KOps/s $\color{#d91a1a}-2.83\%$
test_func_call_runtime[False-eager] 1.5317ms 0.5585ms 1.7904 KOps/s 1.8274 KOps/s $\color{#d91a1a}-2.03\%$
test_func_call_runtime[False-compile] 0.5769ms 0.4314ms 2.3180 KOps/s 2.3517 KOps/s $\color{#d91a1a}-1.43\%$
test_func_call_runtime[False-compile-overhead] 0.5890ms 0.4304ms 2.3233 KOps/s 2.3686 KOps/s $\color{#d91a1a}-1.91\%$
test_func_call_runtime[True-eager] 1.0982ms 0.7751ms 1.2902 KOps/s 1.3222 KOps/s $\color{#d91a1a}-2.43\%$
test_func_call_runtime[True-compile] 0.7678ms 0.4756ms 2.1025 KOps/s 2.1690 KOps/s $\color{#d91a1a}-3.07\%$
test_func_call_runtime[True-compile-overhead] 0.5586ms 0.4703ms 2.1263 KOps/s 2.1577 KOps/s $\color{#d91a1a}-1.46\%$
test_func_call_cm_runtime[False-eager] 0.9458ms 0.5460ms 1.8314 KOps/s 1.8254 KOps/s $\color{#35bf28}+0.33\%$
test_func_call_cm_runtime[False-compile] 0.5450ms 0.4330ms 2.3093 KOps/s 2.3562 KOps/s $\color{#d91a1a}-1.99\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6401ms 0.4325ms 2.3124 KOps/s 2.3646 KOps/s $\color{#d91a1a}-2.21\%$
test_func_call_cm_runtime[True-eager] 1.4596ms 0.9016ms 1.1091 KOps/s 1.0999 KOps/s $\color{#35bf28}+0.84\%$
test_func_call_cm_runtime[True-compile] 1.0061ms 0.5070ms 1.9724 KOps/s 2.0499 KOps/s $\color{#d91a1a}-3.78\%$
test_func_call_cm_runtime[True-compile-overhead] 0.7812ms 0.4972ms 2.0114 KOps/s 2.0415 KOps/s $\color{#d91a1a}-1.48\%$
test_vmap_func_call_cm_runtime[eager] 2.6180ms 1.9216ms 520.3874 Ops/s 530.1247 Ops/s $\color{#d91a1a}-1.84\%$
test_vmap_func_call_cm_runtime[compile] 0.8812ms 0.5165ms 1.9361 KOps/s 1.9338 KOps/s $\color{#35bf28}+0.12\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.7318ms 0.5264ms 1.8996 KOps/s 1.9368 KOps/s $\color{#d91a1a}-1.92\%$
test_distributed 0.2988ms 0.1294ms 7.7296 KOps/s 7.8179 KOps/s $\color{#d91a1a}-1.13\%$
test_tdmodule 50.9360μs 17.8901μs 55.8969 KOps/s 49.8083 KOps/s $\textbf{\color{#35bf28}+12.22\%}$
test_tdmodule_dispatch 76.0120μs 34.7097μs 28.8104 KOps/s 25.4371 KOps/s $\textbf{\color{#35bf28}+13.26\%}$
test_tdseq 60.1320μs 19.9842μs 50.0395 KOps/s 44.8797 KOps/s $\textbf{\color{#35bf28}+11.50\%}$
test_tdseq_dispatch 74.4790μs 39.6027μs 25.2508 KOps/s 22.1753 KOps/s $\textbf{\color{#35bf28}+13.87\%}$
test_instantiation_functorch 2.8225ms 1.5556ms 642.8553 Ops/s 657.5005 Ops/s $\color{#d91a1a}-2.23\%$
test_exec_functorch 0.4630ms 0.1825ms 5.4801 KOps/s 5.4044 KOps/s $\color{#35bf28}+1.40\%$
test_exec_functional_call 0.2885ms 0.1758ms 5.6882 KOps/s 5.5691 KOps/s $\color{#35bf28}+2.14\%$
test_exec_td_decorator 0.5803ms 0.2311ms 4.3281 KOps/s 4.1865 KOps/s $\color{#35bf28}+3.38\%$
test_vmap_mlp_speed_decorator[True-True] 0.7846ms 0.6432ms 1.5548 KOps/s 1.4816 KOps/s $\color{#35bf28}+4.94\%$
test_vmap_mlp_speed_decorator[True-False] 0.9841ms 0.6491ms 1.5406 KOps/s 1.5376 KOps/s $\color{#35bf28}+0.19\%$
test_vmap_mlp_speed_decorator[False-True] 0.9726ms 0.5343ms 1.8716 KOps/s 1.8778 KOps/s $\color{#d91a1a}-0.33\%$
test_vmap_mlp_speed_decorator[False-False] 0.8011ms 0.5385ms 1.8570 KOps/s 1.8746 KOps/s $\color{#d91a1a}-0.94\%$
test_to_module_speed[True] 1.5513ms 1.3080ms 764.5231 Ops/s 710.2315 Ops/s $\textbf{\color{#35bf28}+7.64\%}$
test_to_module_speed[False] 1.4558ms 1.2573ms 795.3443 Ops/s 743.5256 Ops/s $\textbf{\color{#35bf28}+6.97\%}$
test_tc_init 0.1043ms 41.5140μs 24.0883 KOps/s 20.3004 KOps/s $\textbf{\color{#35bf28}+18.66\%}$
test_tc_init_nested 0.1525ms 80.9316μs 12.3561 KOps/s 9.8198 KOps/s $\textbf{\color{#35bf28}+25.83\%}$
test_tc_first_layer_tensor 48.1900μs 1.5060μs 663.9985 KOps/s 665.0462 KOps/s $\color{#d91a1a}-0.16\%$
test_tc_first_layer_nontensor 30.2260μs 4.6956μs 212.9675 KOps/s 215.2457 KOps/s $\color{#d91a1a}-1.06\%$
test_tc_second_layer_tensor 29.7760μs 2.7827μs 359.3633 KOps/s 358.4541 KOps/s $\color{#35bf28}+0.25\%$
test_tc_second_layer_nontensor 53.2390μs 5.9628μs 167.7055 KOps/s 161.7132 KOps/s $\color{#35bf28}+3.71\%$
test_unbind 0.2521s 16.1417ms 61.9515 Ops/s 84.4889 Ops/s $\textbf{\color{#d91a1a}-26.68\%}$
test_full_like 10.3004ms 8.5143ms 117.4490 Ops/s 142.3334 Ops/s $\textbf{\color{#d91a1a}-17.48\%}$
test_zeros_like 4.0160ms 3.3857ms 295.3576 Ops/s 366.0661 Ops/s $\textbf{\color{#d91a1a}-19.32\%}$
test_ones_like 4.6103ms 3.8859ms 257.3425 Ops/s 312.8564 Ops/s $\textbf{\color{#d91a1a}-17.74\%}$
test_clone 6.9317ms 5.6819ms 175.9976 Ops/s 201.6075 Ops/s $\textbf{\color{#d91a1a}-12.70\%}$
test_squeeze 89.5270μs 11.7123μs 85.3802 KOps/s 85.6542 KOps/s $\color{#d91a1a}-0.32\%$
test_unsqueeze 0.1973ms 89.4726μs 11.1766 KOps/s 11.1594 KOps/s $\color{#35bf28}+0.15\%$
test_split 0.4020ms 0.1983ms 5.0421 KOps/s 5.2946 KOps/s $\color{#d91a1a}-4.77\%$
test_permute 0.4850ms 0.2233ms 4.4780 KOps/s 4.6805 KOps/s $\color{#d91a1a}-4.33\%$
test_stack 31.6116ms 26.1855ms 38.1891 Ops/s 38.2287 Ops/s $\color{#d91a1a}-0.10\%$
test_cat 28.2903ms 26.1830ms 38.1928 Ops/s 40.1111 Ops/s $\color{#d91a1a}-4.78\%$
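
For readers checking the numbers: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and Ops appears to be the reciprocal of Mean. The sketch below reproduces a few rows from the table above under those assumptions. The helper names are hypothetical, and the 5% cutoff for a bold entry is inferred from the table (for example +5.00% is bold while +4.94% is not), not stated anywhere in this PR.

```python
def relative_change(ops_new: float, ops_head: float) -> float:
    """Percentage change of this PR's throughput relative to repo HEAD."""
    return (ops_new - ops_head) / ops_head * 100.0


def ops_from_mean(mean_seconds: float) -> float:
    """Throughput in ops/s, assuming Ops is simply 1 / Mean."""
    return 1.0 / mean_seconds


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row; the 5% threshold is an assumption inferred from the bold rows."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table
rows = {
    "test_plain_set_nested": (47.4682, 39.2580),
    "test_items": (242.8548, 240.0884),
    "test_membership_stacked_nested_last": (76.7700, 234.8537),
}

for name, (ops_new, ops_head) in rows.items():
    pct = relative_change(ops_new, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")

# Mean of test_plain_set_nested is 21.0667 microseconds
print(f"{ops_from_mean(21.0667e-6) / 1e3:.2f} KOps/s")
```

Because the comparison is a ratio of throughputs, a row such as test_membership_stacked_nested_last (mean about 13 μs against roughly 4 μs on HEAD) shows up as a large negative percentage even though the absolute difference is a few microseconds.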

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 222. Improved: $\large\color{#35bf28}60$. Worsened: $\large\color{#d91a1a}6$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.7410μs 13.2511μs 75.4654 KOps/s 64.3158 KOps/s $\textbf{\color{#35bf28}+17.34\%}$
test_plain_set_stack_nested 34.4610μs 13.4414μs 74.3972 KOps/s 62.7238 KOps/s $\textbf{\color{#35bf28}+18.61\%}$
test_plain_set_nested_inplace 0.1123ms 14.3783μs 69.5491 KOps/s 58.4347 KOps/s $\textbf{\color{#35bf28}+19.02\%}$
test_plain_set_stack_nested_inplace 0.1155ms 14.2812μs 70.0219 KOps/s 58.8784 KOps/s $\textbf{\color{#35bf28}+18.93\%}$
test_items 25.8510μs 2.8813μs 347.0664 KOps/s 339.3808 KOps/s $\color{#35bf28}+2.26\%$
test_items_nested 0.3905ms 0.3232ms 3.0936 KOps/s 3.0488 KOps/s $\color{#35bf28}+1.47\%$
test_items_nested_locked 0.3745ms 0.3229ms 3.0971 KOps/s 3.0108 KOps/s $\color{#35bf28}+2.87\%$
test_items_nested_leaf 0.1316ms 58.8213μs 17.0006 KOps/s 15.8392 KOps/s $\textbf{\color{#35bf28}+7.33\%}$
test_items_stack_nested 0.3714ms 0.3239ms 3.0874 KOps/s 3.0354 KOps/s $\color{#35bf28}+1.72\%$
test_items_stack_nested_leaf 87.0410μs 58.1053μs 17.2101 KOps/s 15.8691 KOps/s $\textbf{\color{#35bf28}+8.45\%}$
test_items_stack_nested_locked 0.4664ms 0.3257ms 3.0705 KOps/s 3.0248 KOps/s $\color{#35bf28}+1.51\%$
test_keys 26.4500μs 3.4663μs 288.4919 KOps/s 290.2263 KOps/s $\color{#d91a1a}-0.60\%$
test_keys_nested 0.1045ms 71.7823μs 13.9310 KOps/s 10.6284 KOps/s $\textbf{\color{#35bf28}+31.07\%}$
test_keys_nested_locked 0.7654ms 76.9944μs 12.9880 KOps/s 9.8713 KOps/s $\textbf{\color{#35bf28}+31.57\%}$
test_keys_nested_leaf 0.1032ms 61.5603μs 16.2442 KOps/s 11.6796 KOps/s $\textbf{\color{#35bf28}+39.08\%}$
test_keys_stack_nested 0.1120ms 71.2926μs 14.0267 KOps/s 10.5766 KOps/s $\textbf{\color{#35bf28}+32.62\%}$
test_keys_stack_nested_leaf 90.8710μs 61.6755μs 16.2139 KOps/s 11.6434 KOps/s $\textbf{\color{#35bf28}+39.25\%}$
test_keys_stack_nested_locked 0.1047ms 77.5286μs 12.8985 KOps/s 9.9447 KOps/s $\textbf{\color{#35bf28}+29.70\%}$
test_values 4.9768μs 0.8467μs 1.1810 MOps/s 1.1861 MOps/s $\color{#d91a1a}-0.43\%$
test_values_nested 80.7110μs 31.5281μs 31.7178 KOps/s 26.3006 KOps/s $\textbf{\color{#35bf28}+20.60\%}$
test_values_nested_locked 85.3110μs 32.9198μs 30.3769 KOps/s 25.3202 KOps/s $\textbf{\color{#35bf28}+19.97\%}$
test_values_nested_leaf 57.8710μs 33.5585μs 29.7987 KOps/s 22.0075 KOps/s $\textbf{\color{#35bf28}+35.40\%}$
test_values_stack_nested 0.2099ms 31.5333μs 31.7125 KOps/s 26.2321 KOps/s $\textbf{\color{#35bf28}+20.89\%}$
test_values_stack_nested_leaf 0.1307ms 33.8274μs 29.5618 KOps/s 21.3709 KOps/s $\textbf{\color{#35bf28}+38.33\%}$
test_values_stack_nested_locked 61.8600μs 33.3324μs 30.0008 KOps/s 25.1800 KOps/s $\textbf{\color{#35bf28}+19.15\%}$
test_membership 1.7316μs 0.5116μs 1.9546 MOps/s 1.9702 MOps/s $\color{#d91a1a}-0.79\%$
test_membership_nested 16.7055μs 1.9103μs 523.4672 KOps/s 511.2687 KOps/s $\color{#35bf28}+2.39\%$
test_membership_nested_leaf 66.2243μs 1.8944μs 527.8760 KOps/s 529.5573 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_stacked_nested 27.7000μs 1.9722μs 507.0483 KOps/s 511.9547 KOps/s $\color{#d91a1a}-0.96\%$
test_membership_stacked_nested_leaf 28.7300μs 1.9813μs 504.7072 KOps/s 517.1124 KOps/s $\color{#d91a1a}-2.40\%$
test_membership_nested_last 26.6000μs 2.8507μs 350.7876 KOps/s 335.2675 KOps/s $\color{#35bf28}+4.63\%$
test_membership_nested_leaf_last 29.9600μs 2.8484μs 351.0698 KOps/s 335.8328 KOps/s $\color{#35bf28}+4.54\%$
test_membership_stacked_nested_last 27.9910μs 2.8468μs 351.2674 KOps/s 333.0448 KOps/s $\textbf{\color{#35bf28}+5.47\%}$
test_membership_stacked_nested_leaf_last 23.1610μs 2.8249μs 353.9911 KOps/s 336.6053 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_nested_getleaf 36.0010μs 6.0295μs 165.8504 KOps/s 167.0368 KOps/s $\color{#d91a1a}-0.71\%$
test_nested_get 30.6600μs 5.7068μs 175.2311 KOps/s 176.8073 KOps/s $\color{#d91a1a}-0.89\%$
test_stacked_getleaf 25.1300μs 6.0180μs 166.1694 KOps/s 166.9578 KOps/s $\color{#d91a1a}-0.47\%$
test_stacked_get 32.4300μs 5.7076μs 175.2046 KOps/s 176.6381 KOps/s $\color{#d91a1a}-0.81\%$
test_nested_getitemleaf 26.3810μs 6.0873μs 164.2775 KOps/s 165.1851 KOps/s $\color{#d91a1a}-0.55\%$
test_nested_getitem 30.9610μs 5.7712μs 173.2744 KOps/s 173.6592 KOps/s $\color{#d91a1a}-0.22\%$
test_stacked_getitemleaf 38.1910μs 6.1328μs 163.0580 KOps/s 165.3179 KOps/s $\color{#d91a1a}-1.37\%$
test_stacked_getitem 0.1953ms 5.7705μs 173.2958 KOps/s 173.9332 KOps/s $\color{#d91a1a}-0.37\%$
test_lock_nested 1.1972ms 0.4202ms 2.3796 KOps/s 2.3067 KOps/s $\color{#35bf28}+3.16\%$
test_lock_stack_nested 0.5057ms 0.3887ms 2.5728 KOps/s 2.4783 KOps/s $\color{#35bf28}+3.82\%$
test_unlock_nested 0.7827ms 0.3546ms 2.8200 KOps/s 2.7107 KOps/s $\color{#35bf28}+4.03\%$
test_unlock_stack_nested 0.4297ms 0.3247ms 3.0802 KOps/s 2.9707 KOps/s $\color{#35bf28}+3.69\%$
test_flatten_speed 0.1155ms 73.1371μs 13.6729 KOps/s 12.8702 KOps/s $\textbf{\color{#35bf28}+6.24\%}$
test_unflatten_speed 0.3261ms 0.2943ms 3.3978 KOps/s 3.1245 KOps/s $\textbf{\color{#35bf28}+8.75\%}$
test_common_ops 1.7723ms 1.1617ms 860.8434 Ops/s 829.1264 Ops/s $\color{#35bf28}+3.83\%$
test_creation 26.8000μs 1.4888μs 671.6893 KOps/s 671.0429 KOps/s $\color{#35bf28}+0.10\%$
test_creation_empty 36.3400μs 13.2256μs 75.6107 KOps/s 72.3391 KOps/s $\color{#35bf28}+4.52\%$
test_creation_nested_1 38.1310μs 14.7279μs 67.8985 KOps/s 64.9717 KOps/s $\color{#35bf28}+4.50\%$
test_creation_nested_2 0.1590ms 17.8139μs 56.1361 KOps/s 55.9864 KOps/s $\color{#35bf28}+0.27\%$
test_clone 0.2137ms 28.3594μs 35.2617 KOps/s 33.7838 KOps/s $\color{#35bf28}+4.37\%$
test_getitem[int] 1.1198ms 16.1814μs 61.7992 KOps/s 59.4819 KOps/s $\color{#35bf28}+3.90\%$
test_getitem[slice_int] 0.1296ms 28.9717μs 34.5164 KOps/s 34.0418 KOps/s $\color{#35bf28}+1.39\%$
test_getitem[range] 0.2584ms 0.1155ms 8.6584 KOps/s 8.9562 KOps/s $\color{#d91a1a}-3.33\%$
test_getitem[tuple] 95.6467ms 31.0818μs 32.1732 KOps/s 39.1356 KOps/s $\textbf{\color{#d91a1a}-17.79\%}$
test_getitem[list] 0.2223ms 0.1015ms 9.8568 KOps/s 9.7472 KOps/s $\color{#35bf28}+1.12\%$
test_setitem_dim[int] 73.6010μs 44.6589μs 22.3919 KOps/s 22.6085 KOps/s $\color{#d91a1a}-0.96\%$
test_setitem_dim[slice_int] 92.3910μs 66.7575μs 14.9796 KOps/s 14.8573 KOps/s $\color{#35bf28}+0.82\%$
test_setitem_dim[range] 0.1624ms 0.1286ms 7.7778 KOps/s 7.8247 KOps/s $\color{#d91a1a}-0.60\%$
test_setitem_dim[tuple] 0.2073ms 60.7023μs 16.4739 KOps/s 16.6689 KOps/s $\color{#d91a1a}-1.17\%$
test_setitem 0.1872ms 39.6158μs 25.2424 KOps/s 24.8585 KOps/s $\color{#35bf28}+1.54\%$
test_set 0.1900ms 37.7828μs 26.4671 KOps/s 25.2042 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_set_shared 0.3197ms 49.7724μs 20.0914 KOps/s 18.6894 KOps/s $\textbf{\color{#35bf28}+7.50\%}$
test_update 0.1957ms 45.5377μs 21.9598 KOps/s 20.9882 KOps/s $\color{#35bf28}+4.63\%$
test_update_nested 0.1978ms 54.2353μs 18.4382 KOps/s 18.1476 KOps/s $\color{#35bf28}+1.60\%$
test_update__nested 0.4702ms 64.8449μs 15.4214 KOps/s 15.7770 KOps/s $\color{#d91a1a}-2.25\%$
test_set_nested 0.1910ms 41.5301μs 24.0789 KOps/s 23.7422 KOps/s $\color{#35bf28}+1.42\%$
test_set_nested_new 0.1909ms 43.5553μs 22.9593 KOps/s 21.8730 KOps/s $\color{#35bf28}+4.97\%$
test_select 0.2143ms 57.3646μs 17.4324 KOps/s 17.0572 KOps/s $\color{#35bf28}+2.20\%$
test_select_nested 66.5110μs 42.4638μs 23.5495 KOps/s 24.4498 KOps/s $\color{#d91a1a}-3.68\%$
test_exclude_nested 87.5310μs 59.4864μs 16.8106 KOps/s 17.1393 KOps/s $\color{#d91a1a}-1.92\%$
test_empty[True] 0.3001ms 0.2571ms 3.8892 KOps/s 3.5243 KOps/s $\textbf{\color{#35bf28}+10.36\%}$
test_empty[False] 2.9890μs 0.7571μs 1.3208 MOps/s 1.3243 MOps/s $\color{#d91a1a}-0.26\%$
test_to 51.0910μs 25.0869μs 39.8614 KOps/s 38.3712 KOps/s $\color{#35bf28}+3.88\%$
test_to_nonblocking 57.9210μs 23.9552μs 41.7447 KOps/s 40.7033 KOps/s $\color{#35bf28}+2.56\%$
test_unbind_speed 1.1173ms 0.2790ms 3.5847 KOps/s 3.6598 KOps/s $\color{#d91a1a}-2.05\%$
test_unbind_speed_stack0 0.3769ms 0.2747ms 3.6409 KOps/s 3.6276 KOps/s $\color{#35bf28}+0.37\%$
test_unbind_speed_stack1 94.1016ms 0.7043ms 1.4199 KOps/s 1.3986 KOps/s $\color{#35bf28}+1.52\%$
test_split 95.8361ms 2.1297ms 469.5537 Ops/s 445.1482 Ops/s $\textbf{\color{#35bf28}+5.48\%}$
test_chunk 97.1866ms 2.1474ms 465.6784 Ops/s 447.3076 Ops/s $\color{#35bf28}+4.11\%$
test_to[False] 3.3922ms 3.1777ms 314.6895 Ops/s 294.2220 Ops/s $\textbf{\color{#35bf28}+6.96\%}$
test_to[True] 4.5318ms 4.1691ms 239.8588 Ops/s 225.9464 Ops/s $\textbf{\color{#35bf28}+6.16\%}$
test_to_njt[False] 0.3252s 0.2462s 4.0610 Ops/s 4.3046 Ops/s $\textbf{\color{#d91a1a}-5.66\%}$
test_to_njt[True] 0.3610s 0.2751s 3.6344 Ops/s 3.5367 Ops/s $\color{#35bf28}+2.76\%$
test_creation[device0] 0.3381ms 0.1269ms 7.8799 KOps/s 7.7683 KOps/s $\color{#35bf28}+1.44\%$
test_creation_from_tensor 0.3901ms 0.1291ms 7.7439 KOps/s 7.7077 KOps/s $\color{#35bf28}+0.47\%$
test_add_one[memmap_tensor0] 0.1452ms 8.5055μs 117.5714 KOps/s 111.3350 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_contiguous[memmap_tensor0] 22.2210μs 2.1633μs 462.2524 KOps/s 457.3643 KOps/s $\color{#35bf28}+1.07\%$
test_stack[memmap_tensor0] 0.1590ms 6.8111μs 146.8189 KOps/s 142.3959 KOps/s $\color{#35bf28}+3.11\%$
test_memmaptd_index 1.0183ms 0.4138ms 2.4165 KOps/s 2.3423 KOps/s $\color{#35bf28}+3.17\%$
test_memmaptd_index_astensor 0.9500ms 0.4727ms 2.1154 KOps/s 2.0033 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_memmaptd_index_op 1.3387ms 0.9544ms 1.0478 KOps/s 1.0084 KOps/s $\color{#35bf28}+3.90\%$
test_serialize_model 0.1334s 0.1313s 7.6149 Ops/s 7.6623 Ops/s $\color{#d91a1a}-0.62\%$
test_serialize_model_pickle 1.3497s 1.2167s 0.8219 Ops/s 0.8416 Ops/s $\color{#d91a1a}-2.34\%$
test_serialize_weights 0.1311s 0.1298s 7.7016 Ops/s 7.6799 Ops/s $\color{#35bf28}+0.28\%$
test_serialize_weights_returnearly 0.2292s 55.9839ms 17.8623 Ops/s 21.2561 Ops/s $\textbf{\color{#d91a1a}-15.97\%}$
test_serialize_weights_pickle 1.3731s 1.1957s 0.8363 Ops/s 0.8223 Ops/s $\color{#35bf28}+1.70\%$
test_reshape_pytree 88.2710μs 34.9471μs 28.6147 KOps/s 26.7844 KOps/s $\textbf{\color{#35bf28}+6.83\%}$
test_reshape_td 0.1419ms 42.6565μs 23.4431 KOps/s 23.9993 KOps/s $\color{#d91a1a}-2.32\%$
test_view_pytree 0.1431ms 35.0451μs 28.5347 KOps/s 28.1478 KOps/s $\color{#35bf28}+1.37\%$
test_view_td 0.1130ms 45.0915μs 22.1771 KOps/s 21.7929 KOps/s $\color{#35bf28}+1.76\%$
test_unbind_pytree 0.1319ms 33.4365μs 29.9074 KOps/s 29.2008 KOps/s $\color{#35bf28}+2.42\%$
test_unbind_td 98.6471ms 48.5896μs 20.5805 KOps/s 22.9740 KOps/s $\textbf{\color{#d91a1a}-10.42\%}$
test_split_pytree 0.1748ms 45.4430μs 22.0056 KOps/s 21.7628 KOps/s $\color{#35bf28}+1.12\%$
test_split_td 0.1790ms 56.0053μs 17.8555 KOps/s 17.5085 KOps/s $\color{#35bf28}+1.98\%$
test_add_pytree 0.2057ms 55.1721μs 18.1251 KOps/s 17.4752 KOps/s $\color{#35bf28}+3.72\%$
test_add_td 0.2579ms 87.8822μs 11.3789 KOps/s 11.0807 KOps/s $\color{#35bf28}+2.69\%$
test_compile_add_one_nested[tensordict-compile] 0.3103ms 0.1611ms 6.2088 KOps/s 6.1492 KOps/s $\color{#35bf28}+0.97\%$
test_compile_add_one_nested[tensordict-eager] 0.3088ms 0.1514ms 6.6055 KOps/s 6.2683 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_compile_add_one_nested[pytree-compile] 0.2934ms 0.1527ms 6.5501 KOps/s 6.0905 KOps/s $\textbf{\color{#35bf28}+7.55\%}$
test_compile_add_one_nested[pytree-eager] 0.3247ms 0.1771ms 5.6480 KOps/s 5.4722 KOps/s $\color{#35bf28}+3.21\%$
test_compile_copy_nested[tensordict-compile] 0.1498ms 21.4729μs 46.5704 KOps/s 47.2176 KOps/s $\color{#d91a1a}-1.37\%$
test_compile_copy_nested[tensordict-eager] 81.2510μs 44.9625μs 22.2408 KOps/s 20.5597 KOps/s $\textbf{\color{#35bf28}+8.18\%}$
test_compile_copy_nested[pytree-compile] 0.4599ms 65.4711μs 15.2739 KOps/s 15.1315 KOps/s $\color{#35bf28}+0.94\%$
test_compile_copy_nested[pytree-eager] 79.5310μs 50.0717μs 19.9714 KOps/s 19.8567 KOps/s $\color{#35bf28}+0.58\%$
test_compile_add_one_flat[tensordict-compile] 0.4211ms 0.3098ms 3.2280 KOps/s 3.1400 KOps/s $\color{#35bf28}+2.80\%$
test_compile_add_one_flat[tensordict-eager] 0.3440ms 0.2140ms 4.6733 KOps/s 4.2935 KOps/s $\textbf{\color{#35bf28}+8.85\%}$
test_compile_add_one_flat[tensorclass-compile] 0.2850ms 0.1281ms 7.8046 KOps/s 7.6571 KOps/s $\color{#35bf28}+1.93\%$
test_compile_add_one_flat[tensorclass-eager] 0.1921ms 58.4381μs 17.1121 KOps/s 14.7166 KOps/s $\textbf{\color{#35bf28}+16.28\%}$
test_compile_add_one_flat[pytree-compile] 0.4654ms 0.3178ms 3.1469 KOps/s 3.0823 KOps/s $\color{#35bf28}+2.10\%$
test_compile_add_one_flat[pytree-eager] 0.7619ms 0.5987ms 1.6702 KOps/s 1.5210 KOps/s $\textbf{\color{#35bf28}+9.81\%}$
test_compile_add_self_flat[tensordict-eager] 0.4048ms 0.2539ms 3.9388 KOps/s 3.4986 KOps/s $\textbf{\color{#35bf28}+12.58\%}$
test_compile_add_self_flat[tensordict-compile] 0.4459ms 0.3116ms 3.2093 KOps/s 3.1342 KOps/s $\color{#35bf28}+2.40\%$
test_compile_add_self_flat[tensorclass-eager] 0.2181ms 68.3597μs 14.6285 KOps/s 13.1206 KOps/s $\textbf{\color{#35bf28}+11.49\%}$
test_compile_add_self_flat[tensorclass-compile] 0.3016ms 0.1292ms 7.7388 KOps/s 7.5526 KOps/s $\color{#35bf28}+2.47\%$
test_compile_add_self_flat[pytree-eager] 0.6757ms 0.5046ms 1.9816 KOps/s 1.7465 KOps/s $\textbf{\color{#35bf28}+13.46\%}$
test_compile_add_self_flat[pytree-compile] 0.4643ms 0.3206ms 3.1190 KOps/s 3.0767 KOps/s $\color{#35bf28}+1.37\%$
test_compile_copy_flat[tensordict-compile] 0.1655ms 17.9428μs 55.7325 KOps/s 50.7746 KOps/s $\textbf{\color{#35bf28}+9.76\%}$
test_compile_copy_flat[tensordict-eager] 64.1710μs 31.4903μs 31.7558 KOps/s 24.7094 KOps/s $\textbf{\color{#35bf28}+28.52\%}$
test_compile_copy_flat[pytree-compile] 0.1577ms 69.7690μs 14.3330 KOps/s 14.3202 KOps/s $\color{#35bf28}+0.09\%$
test_compile_copy_flat[pytree-eager] 81.3110μs 52.0342μs 19.2181 KOps/s 19.6041 KOps/s $\color{#d91a1a}-1.97\%$
test_compile_assign_and_add[tensordict-compile] 2.4619ms 0.8368ms 1.1950 KOps/s 1.1298 KOps/s $\textbf{\color{#35bf28}+5.77\%}$
test_compile_assign_and_add[tensordict-eager] 3.2627ms 2.9992ms 333.4188 Ops/s 311.7406 Ops/s $\textbf{\color{#35bf28}+6.95\%}$
test_compile_assign_and_add[pytree-compile] 2.3890ms 0.8322ms 1.2016 KOps/s 1.1054 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_compile_assign_and_add[pytree-eager] 3.3224ms 3.0910ms 323.5227 Ops/s 310.9027 Ops/s $\color{#35bf28}+4.06\%$
test_compile_indexing[tensor-tensordict-compile] 0.2320ms 0.1206ms 8.2926 KOps/s 8.3241 KOps/s $\color{#d91a1a}-0.38\%$
test_compile_indexing[tensor-tensordict-eager] 0.2208ms 60.8046μs 16.4461 KOps/s 15.7253 KOps/s $\color{#35bf28}+4.58\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2590ms 0.1132ms 8.8337 KOps/s 8.7393 KOps/s $\color{#35bf28}+1.08\%$
test_compile_indexing[tensor-tensorclass-eager] 0.1999ms 41.6314μs 24.0204 KOps/s 22.9450 KOps/s $\color{#35bf28}+4.69\%$
test_compile_indexing[tensor-pytree-compile] 0.2646ms 0.1140ms 8.7708 KOps/s 8.6229 KOps/s $\color{#35bf28}+1.71\%$
test_compile_indexing[tensor-pytree-eager] 0.1851ms 41.5988μs 24.0391 KOps/s 22.9716 KOps/s $\color{#35bf28}+4.65\%$
test_compile_indexing[slice-tensordict-compile] 0.3227ms 0.1517ms 6.5910 KOps/s 6.7149 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_indexing[slice-tensordict-eager] 0.1695ms 25.8097μs 38.7451 KOps/s 38.1579 KOps/s $\color{#35bf28}+1.54\%$
test_compile_indexing[slice-tensorclass-compile] 0.3275ms 0.1396ms 7.1639 KOps/s 6.9964 KOps/s $\color{#35bf28}+2.39\%$
test_compile_indexing[slice-tensorclass-eager] 0.1523ms 20.2823μs 49.3041 KOps/s 46.4260 KOps/s $\textbf{\color{#35bf28}+6.20\%}$
test_compile_indexing[slice-pytree-compile] 0.2778ms 0.1407ms 7.1051 KOps/s 6.9208 KOps/s $\color{#35bf28}+2.66\%$
test_compile_indexing[slice-pytree-eager] 0.1799ms 23.8332μs 41.9582 KOps/s 46.7144 KOps/s $\textbf{\color{#d91a1a}-10.18\%}$
test_compile_indexing[int-tensordict-compile] 0.3059ms 0.1495ms 6.6880 KOps/s 6.6291 KOps/s $\color{#35bf28}+0.89\%$
test_compile_indexing[int-tensordict-eager] 0.5167ms 25.5365μs 39.1596 KOps/s 37.4229 KOps/s $\color{#35bf28}+4.64\%$
test_compile_indexing[int-tensorclass-compile] 0.2883ms 0.1418ms 7.0501 KOps/s 6.9484 KOps/s $\color{#35bf28}+1.46\%$
test_compile_indexing[int-tensorclass-eager] 53.2710μs 20.2527μs 49.3762 KOps/s 46.4282 KOps/s $\textbf{\color{#35bf28}+6.35\%}$
test_compile_indexing[int-pytree-compile] 0.3035ms 0.1438ms 6.9518 KOps/s 6.9432 KOps/s $\color{#35bf28}+0.12\%$
test_compile_indexing[int-pytree-eager] 53.6410μs 20.0095μs 49.9763 KOps/s 46.3403 KOps/s $\textbf{\color{#35bf28}+7.85\%}$
test_mod_add[eager] 0.2029ms 31.3417μs 31.9063 KOps/s 32.9587 KOps/s $\color{#d91a1a}-3.19\%$
test_mod_add[compile] 0.2847ms 82.5584μs 12.1126 KOps/s 12.1125 KOps/s $+0.00\%$
test_mod_add[compile-overhead] 0.3079ms 0.1507ms 6.6365 KOps/s 5.8785 KOps/s $\textbf{\color{#35bf28}+12.89\%}$
test_mod_wrap[eager] 0.4402ms 0.2525ms 3.9601 KOps/s 4.0155 KOps/s $\color{#d91a1a}-1.38\%$
test_mod_wrap[compile] 0.6759ms 0.2885ms 3.4662 KOps/s 3.3185 KOps/s $\color{#35bf28}+4.45\%$
test_mod_wrap[compile-overhead] 7.5270ms 4.0217ms 248.6537 Ops/s 251.8105 Ops/s $\color{#d91a1a}-1.25\%$
test_mod_wrap_and_backward[eager] 1.5461ms 1.3580ms 736.3780 Ops/s 685.0928 Ops/s $\textbf{\color{#35bf28}+7.49\%}$
test_mod_wrap_and_backward[compile] 1.5795ms 1.3187ms 758.3118 Ops/s 684.7835 Ops/s $\textbf{\color{#35bf28}+10.74\%}$
test_mod_wrap_and_backward[compile-overhead] 1.3404ms 0.9020ms 1.1087 KOps/s 969.2714 Ops/s $\textbf{\color{#35bf28}+14.39\%}$
test_seq_add[eager] 0.2407ms 92.4745μs 10.8138 KOps/s 10.3659 KOps/s $\color{#35bf28}+4.32\%$
test_seq_add[compile] 0.2531ms 94.8202μs 10.5463 KOps/s 10.4640 KOps/s $\color{#35bf28}+0.79\%$
test_seq_add[compile-overhead] 0.3308ms 0.1251ms 7.9928 KOps/s 7.9163 KOps/s $\color{#35bf28}+0.97\%$
test_seq_wrap[eager] 0.5772ms 0.3804ms 2.6290 KOps/s 2.6106 KOps/s $\color{#35bf28}+0.70\%$
test_seq_wrap[compile] 0.4713ms 0.3178ms 3.1463 KOps/s 3.0011 KOps/s $\color{#35bf28}+4.84\%$
test_seq_wrap[compile-overhead] 0.3734ms 0.2197ms 4.5516 KOps/s 4.4365 KOps/s $\color{#35bf28}+2.60\%$
test_func_call_runtime[False-eager] 0.8943ms 0.7325ms 1.3652 KOps/s 1.3386 KOps/s $\color{#35bf28}+1.99\%$
test_func_call_runtime[False-compile] 0.9219ms 0.7734ms 1.2930 KOps/s 1.1895 KOps/s $\textbf{\color{#35bf28}+8.70\%}$
test_func_call_runtime[False-compile-overhead] 0.5553ms 0.3544ms 2.8215 KOps/s 2.7533 KOps/s $\color{#35bf28}+2.48\%$
test_func_call_runtime[True-eager] 1.0496ms 0.8852ms 1.1296 KOps/s 1.0914 KOps/s $\color{#35bf28}+3.50\%$
test_func_call_runtime[True-compile] 0.9700ms 0.7954ms 1.2572 KOps/s 1.2092 KOps/s $\color{#35bf28}+3.97\%$
test_func_call_runtime[True-compile-overhead] 0.4981ms 0.3727ms 2.6831 KOps/s 2.6208 KOps/s $\color{#35bf28}+2.38\%$
test_func_call_cm_runtime[False-eager] 0.9273ms 0.7293ms 1.3712 KOps/s 1.3425 KOps/s $\color{#35bf28}+2.14\%$
test_func_call_cm_runtime[False-compile] 0.9303ms 0.7712ms 1.2967 KOps/s 1.2328 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_func_call_cm_runtime[False-compile-overhead] 1.0153ms 0.3539ms 2.8253 KOps/s 2.7436 KOps/s $\color{#35bf28}+2.98\%$
test_func_call_cm_runtime[True-eager] 1.1635ms 0.9817ms 1.0186 KOps/s 970.3053 Ops/s $\color{#35bf28}+4.98\%$
test_func_call_cm_runtime[True-compile] 0.9830ms 0.8198ms 1.2198 KOps/s 1.1664 KOps/s $\color{#35bf28}+4.57\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5517ms 0.4005ms 2.4966 KOps/s 2.4477 KOps/s $\color{#35bf28}+1.99\%$
test_vmap_func_call_cm_runtime[eager] 2.5153ms 2.0689ms 483.3379 Ops/s 464.7094 Ops/s $\color{#35bf28}+4.01\%$
test_vmap_func_call_cm_runtime[compile] 0.9921ms 0.8335ms 1.1998 KOps/s 1.1472 KOps/s $\color{#35bf28}+4.58\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5334ms 0.4045ms 2.4721 KOps/s 2.4313 KOps/s $\color{#35bf28}+1.68\%$
test_distributed 2.5331ms 0.1775ms 5.6350 KOps/s 8.8305 KOps/s $\textbf{\color{#d91a1a}-36.19\%}$
test_tdmodule 0.3806ms 13.1183μs 76.2292 KOps/s 71.7911 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_tdmodule_dispatch 45.7610μs 25.6057μs 39.0538 KOps/s 36.3383 KOps/s $\textbf{\color{#35bf28}+7.47\%}$
test_tdseq 34.0910μs 13.9470μs 71.7002 KOps/s 66.4128 KOps/s $\textbf{\color{#35bf28}+7.96\%}$
test_tdseq_dispatch 48.3610μs 28.4277μs 35.1769 KOps/s 33.7274 KOps/s $\color{#35bf28}+4.30\%$
test_instantiation_functorch 2.0045ms 1.8327ms 545.6442 Ops/s 532.9391 Ops/s $\color{#35bf28}+2.38\%$
test_exec_functorch 0.3548ms 0.2051ms 4.8763 KOps/s 4.6787 KOps/s $\color{#35bf28}+4.22\%$
test_exec_functional_call 0.3790ms 0.2058ms 4.8579 KOps/s 4.6112 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_exec_td_decorator 0.4418ms 0.2514ms 3.9783 KOps/s 3.7933 KOps/s $\color{#35bf28}+4.88\%$
test_vmap_mlp_speed_decorator[True-True] 0.8253ms 0.6671ms 1.4991 KOps/s 1.4493 KOps/s $\color{#35bf28}+3.43\%$
test_vmap_mlp_speed_decorator[True-False] 0.8565ms 0.6658ms 1.5019 KOps/s 1.4569 KOps/s $\color{#35bf28}+3.09\%$
test_vmap_mlp_speed_decorator[False-True] 0.7525ms 0.5898ms 1.6956 KOps/s 1.6507 KOps/s $\color{#35bf28}+2.72\%$
test_vmap_mlp_speed_decorator[False-False] 0.7316ms 0.5886ms 1.6988 KOps/s 1.6497 KOps/s $\color{#35bf28}+2.98\%$
test_vmap_transformer_speed_decorator[True-True] 19.6868ms 19.4242ms 51.4821 Ops/s 50.7053 Ops/s $\color{#35bf28}+1.53\%$
test_vmap_transformer_speed_decorator[True-False] 19.5663ms 19.4292ms 51.4689 Ops/s 50.5223 Ops/s $\color{#35bf28}+1.87\%$
test_vmap_transformer_speed_decorator[False-True] 19.4454ms 19.2962ms 51.8238 Ops/s 50.9730 Ops/s $\color{#35bf28}+1.67\%$
test_vmap_transformer_speed_decorator[False-False] 19.5297ms 19.2996ms 51.8146 Ops/s 50.9791 Ops/s $\color{#35bf28}+1.64\%$
test_to_module_speed[True] 1.4365ms 0.9231ms 1.0833 KOps/s 1.0353 KOps/s $\color{#35bf28}+4.63\%$
test_to_module_speed[False] 0.9964ms 0.9041ms 1.1061 KOps/s 1.0442 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_tc_init 58.7810μs 31.7816μs 31.4647 KOps/s 30.1014 KOps/s $\color{#35bf28}+4.53\%$
test_tc_init_nested 0.1091ms 64.9444μs 15.3978 KOps/s 15.1522 KOps/s $\color{#35bf28}+1.62\%$
test_tc_first_layer_tensor 4.7044μs 0.6920μs 1.4451 MOps/s 1.4419 MOps/s $\color{#35bf28}+0.22\%$
test_tc_first_layer_nontensor 25.8200μs 2.3082μs 433.2301 KOps/s 428.7910 KOps/s $\color{#35bf28}+1.04\%$
test_tc_second_layer_tensor 7.8475μs 1.4188μs 704.8126 KOps/s 698.7333 KOps/s $\color{#35bf28}+0.87\%$
test_tc_second_layer_nontensor 35.5400μs 3.0307μs 329.9528 KOps/s 325.9762 KOps/s $\color{#35bf28}+1.22\%$
test_unbind 0.1998s 9.4600ms 105.7082 Ops/s 93.2536 Ops/s $\textbf{\color{#35bf28}+13.36\%}$
test_full_like 0.7579ms 0.5756ms 1.7373 KOps/s 1.7432 KOps/s $\color{#d91a1a}-0.34\%$
test_zeros_like 0.3414ms 0.1984ms 5.0408 KOps/s 5.0441 KOps/s $\color{#d91a1a}-0.07\%$
test_ones_like 0.3495ms 0.1981ms 5.0489 KOps/s 5.0545 KOps/s $\color{#d91a1a}-0.11\%$
test_clone 0.5668ms 0.4152ms 2.4084 KOps/s 2.4101 KOps/s $\color{#d91a1a}-0.07\%$
test_squeeze 37.1300μs 9.3870μs 106.5302 KOps/s 107.5889 KOps/s $\color{#d91a1a}-0.98\%$
test_unsqueeze 0.2114ms 68.5623μs 14.5853 KOps/s 14.4177 KOps/s $\color{#35bf28}+1.16\%$
test_split 0.4044ms 0.1603ms 6.2399 KOps/s 6.0579 KOps/s $\color{#35bf28}+3.00\%$
test_permute 0.2757ms 0.1719ms 5.8184 KOps/s 5.7960 KOps/s $\color{#35bf28}+0.39\%$
test_stack 1.2612ms 0.8433ms 1.1858 KOps/s 1.1580 KOps/s $\color{#35bf28}+2.40\%$
test_cat 1.3151ms 1.2314ms 812.1023 Ops/s 812.5440 Ops/s $\color{#d91a1a}-0.05\%$

@vmoens vmoens merged commit f70288a into gh/vmoens/31/base Oct 25, 2024
48 of 54 checks passed
vmoens added a commit that referenced this pull request Oct 25, 2024
ghstack-source-id: b81c1d243a7c72a9d5fd68bf8e65e97a934ae61c
Pull Request resolved: #1060
@vmoens vmoens deleted the gh/vmoens/31/head branch October 25, 2024 14:33
is_dynamo = is_dynamo_compiling()
out = None
if not is_dynamo:
    out = _TENSOR_COLLECTION_MEMO.get(datatype)
Contributor

Wondering why we need the if-condition. Does compile not support dict inserts?

Contributor Author

It's to avoid re-compiles.

In #1015 I completely removed these checks, but the eager-mode benchmarks suffered a lot from it (basically any iteration over tensordict leaves got 10x slower).

So I'm partially reverting that change. The compiler shouldn't care too much about this kind of optimization, and it would add a guard on _TENSOR_COLLECTION_MEMO, which would be counterproductive.

Contributor

Cool - thank you!

vmoens added a commit that referenced this pull request Nov 4, 2024
ghstack-source-id: b81c1d243a7c72a9d5fd68bf8e65e97a934ae61c
Pull Request resolved: #1060

(cherry picked from commit 3963e51)
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Performance
3 participants