[Feature] broadcast pointwise ops for tensor/tensordict mixed inputs #1166

Merged (2 commits) on Jan 8, 2025

vmoens (Contributor) commented Jan 7, 2025

[ghstack-poisoned]
facebook-github-bot added the "CLA Signed" label Jan 7, 2025
vmoens added a commit that referenced this pull request Jan 7, 2025
ghstack-source-id: 9d1446630ed08238e0a62a879222aeb6e161c425
Pull Request resolved: #1166

github-actions bot commented Jan 7, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}17$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 47.2280μs 21.0463μs 47.5143 KOps/s 49.9637 KOps/s $\color{#d91a1a}-4.90\%$
test_plain_set_stack_nested 67.2760μs 21.1947μs 47.1817 KOps/s 48.9615 KOps/s $\color{#d91a1a}-3.64\%$
test_plain_set_nested_inplace 0.1317ms 22.8563μs 43.7516 KOps/s 45.4554 KOps/s $\color{#d91a1a}-3.75\%$
test_plain_set_stack_nested_inplace 76.1920μs 22.7722μs 43.9132 KOps/s 45.3432 KOps/s $\color{#d91a1a}-3.15\%$
test_items 43.4620μs 4.1095μs 243.3412 KOps/s 238.9702 KOps/s $\color{#35bf28}+1.83\%$
test_items_nested 0.6945ms 0.4038ms 2.4763 KOps/s 2.4352 KOps/s $\color{#35bf28}+1.69\%$
test_items_nested_locked 0.6023ms 0.4040ms 2.4751 KOps/s 2.4335 KOps/s $\color{#35bf28}+1.71\%$
test_items_nested_leaf 0.1478ms 77.0581μs 12.9772 KOps/s 12.9764 KOps/s $+0.01\%$
test_items_stack_nested 0.8413ms 0.4067ms 2.4586 KOps/s 2.4203 KOps/s $\color{#35bf28}+1.58\%$
test_items_stack_nested_leaf 0.1510ms 79.9144μs 12.5134 KOps/s 12.2530 KOps/s $\color{#35bf28}+2.13\%$
test_items_stack_nested_locked 0.5537ms 0.4081ms 2.4503 KOps/s 2.4299 KOps/s $\color{#35bf28}+0.84\%$
test_keys 21.1300μs 3.4856μs 286.8914 KOps/s 283.0684 KOps/s $\color{#35bf28}+1.35\%$
test_keys_nested 0.2721ms 0.1654ms 6.0466 KOps/s 5.8665 KOps/s $\color{#35bf28}+3.07\%$
test_keys_nested_locked 1.8037ms 0.1721ms 5.8108 KOps/s 5.7410 KOps/s $\color{#35bf28}+1.22\%$
test_keys_nested_leaf 0.2345ms 0.1445ms 6.9208 KOps/s 6.8734 KOps/s $\color{#35bf28}+0.69\%$
test_keys_stack_nested 0.3263ms 0.1654ms 6.0447 KOps/s 5.9918 KOps/s $\color{#35bf28}+0.88\%$
test_keys_stack_nested_leaf 0.2327ms 0.1439ms 6.9469 KOps/s 6.8814 KOps/s $\color{#35bf28}+0.95\%$
test_keys_stack_nested_locked 0.2678ms 0.1710ms 5.8492 KOps/s 5.8043 KOps/s $\color{#35bf28}+0.77\%$
test_values 5.9212μs 1.0505μs 951.9314 KOps/s 951.2131 KOps/s $\color{#35bf28}+0.08\%$
test_values_nested 0.1330ms 63.5118μs 15.7451 KOps/s 15.5950 KOps/s $\color{#35bf28}+0.96\%$
test_values_nested_locked 0.1243ms 63.9641μs 15.6338 KOps/s 14.8120 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_values_nested_leaf 0.1608ms 73.0617μs 13.6871 KOps/s 13.8100 KOps/s $\color{#d91a1a}-0.89\%$
test_values_stack_nested 0.1236ms 65.5509μs 15.2553 KOps/s 15.7673 KOps/s $\color{#d91a1a}-3.25\%$
test_values_stack_nested_leaf 0.1281ms 74.7077μs 13.3855 KOps/s 13.6825 KOps/s $\color{#d91a1a}-2.17\%$
test_values_stack_nested_locked 0.1162ms 64.6818μs 15.4603 KOps/s 15.6584 KOps/s $\color{#d91a1a}-1.26\%$
test_membership 19.6470μs 0.8840μs 1.1312 MOps/s 1.1059 MOps/s $\color{#35bf28}+2.29\%$
test_membership_nested 29.4260μs 2.9489μs 339.1105 KOps/s 344.5831 KOps/s $\color{#d91a1a}-1.59\%$
test_membership_nested_leaf 30.4270μs 2.9948μs 333.9176 KOps/s 334.1849 KOps/s $\color{#d91a1a}-0.08\%$
test_membership_stacked_nested 25.0670μs 2.9422μs 339.8770 KOps/s 333.4512 KOps/s $\color{#35bf28}+1.93\%$
test_membership_stacked_nested_leaf 33.9730μs 2.9996μs 333.3747 KOps/s 346.7772 KOps/s $\color{#d91a1a}-3.86\%$
test_membership_nested_last 28.4540μs 4.4998μs 222.2345 KOps/s 227.7478 KOps/s $\color{#d91a1a}-2.42\%$
test_membership_nested_leaf_last 29.2540μs 4.5492μs 219.8168 KOps/s 224.3809 KOps/s $\color{#d91a1a}-2.03\%$
test_membership_stacked_nested_last 42.3800μs 4.4453μs 224.9572 KOps/s 227.8845 KOps/s $\color{#d91a1a}-1.28\%$
test_membership_stacked_nested_leaf_last 31.8200μs 4.4572μs 224.3566 KOps/s 225.0590 KOps/s $\color{#d91a1a}-0.31\%$
test_nested_getleaf 49.9030μs 11.1796μs 89.4486 KOps/s 92.0904 KOps/s $\color{#d91a1a}-2.87\%$
test_nested_get 50.8450μs 10.5874μs 94.4518 KOps/s 95.5312 KOps/s $\color{#d91a1a}-1.13\%$
test_stacked_getleaf 50.8550μs 11.1554μs 89.6430 KOps/s 91.4396 KOps/s $\color{#d91a1a}-1.96\%$
test_stacked_get 51.4070μs 10.5717μs 94.5923 KOps/s 96.6658 KOps/s $\color{#d91a1a}-2.15\%$
test_nested_getitemleaf 52.7690μs 11.5951μs 86.2434 KOps/s 90.2691 KOps/s $\color{#d91a1a}-4.46\%$
test_nested_getitem 51.9880μs 10.7994μs 92.5975 KOps/s 94.2462 KOps/s $\color{#d91a1a}-1.75\%$
test_stacked_getitemleaf 49.9940μs 11.6280μs 85.9997 KOps/s 90.1395 KOps/s $\color{#d91a1a}-4.59\%$
test_stacked_getitem 50.0430μs 11.5727μs 86.4106 KOps/s 94.0139 KOps/s $\textbf{\color{#d91a1a}-8.09\%}$
test_lock_nested 2.2180ms 0.4625ms 2.1623 KOps/s 2.1261 KOps/s $\color{#35bf28}+1.70\%$
test_lock_stack_nested 0.5810ms 0.4359ms 2.2943 KOps/s 2.2592 KOps/s $\color{#35bf28}+1.55\%$
test_unlock_nested 1.3990ms 0.3817ms 2.6198 KOps/s 2.5807 KOps/s $\color{#35bf28}+1.52\%$
test_unlock_stack_nested 0.6761ms 0.3529ms 2.8336 KOps/s 2.8326 KOps/s $\color{#35bf28}+0.03\%$
test_flatten_speed 0.2020ms 0.1015ms 9.8502 KOps/s 9.7812 KOps/s $\color{#35bf28}+0.70\%$
test_unflatten_speed 0.9163ms 0.5416ms 1.8465 KOps/s 1.8475 KOps/s $\color{#d91a1a}-0.05\%$
test_common_ops 1.6291ms 0.8038ms 1.2440 KOps/s 1.3051 KOps/s $\color{#d91a1a}-4.67\%$
test_creation 0.1226ms 2.7541μs 363.0956 KOps/s 395.8291 KOps/s $\textbf{\color{#d91a1a}-8.27\%}$
test_creation_empty 43.8820μs 12.3112μs 81.2272 KOps/s 100.1901 KOps/s $\textbf{\color{#d91a1a}-18.93\%}$
test_creation_nested_1 55.6140μs 15.0348μs 66.5122 KOps/s 77.3335 KOps/s $\textbf{\color{#d91a1a}-13.99\%}$
test_creation_nested_2 64.6210μs 19.9042μs 50.2407 KOps/s 56.8570 KOps/s $\textbf{\color{#d91a1a}-11.64\%}$
test_clone 51.7370μs 13.7297μs 72.8350 KOps/s 71.9938 KOps/s $\color{#35bf28}+1.17\%$
test_getitem[int] 1.1249ms 12.8592μs 77.7654 KOps/s 76.5643 KOps/s $\color{#35bf28}+1.57\%$
test_getitem[slice_int] 0.1549ms 25.5141μs 39.1940 KOps/s 40.0437 KOps/s $\color{#d91a1a}-2.12\%$
test_getitem[range] 0.1874ms 48.8755μs 20.4602 KOps/s 19.8601 KOps/s $\color{#35bf28}+3.02\%$
test_getitem[tuple] 0.1753ms 20.4314μs 48.9442 KOps/s 48.0661 KOps/s $\color{#35bf28}+1.83\%$
test_getitem[list] 0.1991ms 44.7157μs 22.3635 KOps/s 21.9816 KOps/s $\color{#35bf28}+1.74\%$
test_setitem_dim[int] 48.3110μs 24.6790μs 40.5202 KOps/s 38.1125 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_setitem_dim[slice_int] 97.0020μs 50.7209μs 19.7158 KOps/s 19.3656 KOps/s $\color{#35bf28}+1.81\%$
test_setitem_dim[range] 0.1213ms 73.2890μs 13.6446 KOps/s 13.3924 KOps/s $\color{#35bf28}+1.88\%$
test_setitem_dim[tuple] 81.8430μs 39.7639μs 25.1484 KOps/s 24.3531 KOps/s $\color{#35bf28}+3.27\%$
test_setitem 65.9630μs 20.4498μs 48.9001 KOps/s 50.0865 KOps/s $\color{#d91a1a}-2.37\%$
test_set 0.3961ms 20.2179μs 49.4612 KOps/s 52.1361 KOps/s $\textbf{\color{#d91a1a}-5.13\%}$
test_set_shared 2.1581ms 0.1712ms 5.8402 KOps/s 5.5960 KOps/s $\color{#35bf28}+4.36\%$
test_update 0.3711ms 23.3425μs 42.8403 KOps/s 46.7775 KOps/s $\textbf{\color{#d91a1a}-8.42\%}$
test_update_nested 0.3657ms 33.7708μs 29.6114 KOps/s 31.3871 KOps/s $\textbf{\color{#d91a1a}-5.66\%}$
test_update__nested 0.7612ms 33.5288μs 29.8251 KOps/s 29.1374 KOps/s $\color{#35bf28}+2.36\%$
test_set_nested 0.3813ms 22.5018μs 44.4410 KOps/s 46.0089 KOps/s $\color{#d91a1a}-3.41\%$
test_set_nested_new 0.3791ms 27.2467μs 36.7017 KOps/s 37.6014 KOps/s $\color{#d91a1a}-2.39\%$
test_select 0.3982ms 44.4787μs 22.4827 KOps/s 23.5550 KOps/s $\color{#d91a1a}-4.55\%$
test_select_nested 0.1216ms 64.1126μs 15.5976 KOps/s 15.4363 KOps/s $\color{#35bf28}+1.04\%$
test_exclude_nested 0.1629ms 83.4754μs 11.9796 KOps/s 12.1454 KOps/s $\color{#d91a1a}-1.37\%$
test_empty[True] 0.5229ms 0.4136ms 2.4177 KOps/s 2.3526 KOps/s $\color{#35bf28}+2.77\%$
test_empty[False] 13.2147μs 1.3903μs 719.2908 KOps/s 699.2205 KOps/s $\color{#35bf28}+2.87\%$
test_unbind_speed 0.4012ms 0.2730ms 3.6632 KOps/s 3.5981 KOps/s $\color{#35bf28}+1.81\%$
test_unbind_speed_stack0 0.3904ms 0.2732ms 3.6606 KOps/s 3.6622 KOps/s $\color{#d91a1a}-0.05\%$
test_unbind_speed_stack1 0.1185s 0.8520ms 1.1737 KOps/s 1.3027 KOps/s $\textbf{\color{#d91a1a}-9.90\%}$
test_split 1.7352ms 1.6074ms 622.1405 Ops/s 550.4681 Ops/s $\textbf{\color{#35bf28}+13.02\%}$
test_chunk 0.1213s 2.0246ms 493.9230 Ops/s 554.8626 Ops/s $\textbf{\color{#d91a1a}-10.98\%}$
test_consolidate_njt[False-None] 8.9948ms 8.2553ms 121.1341 Ops/s 116.7531 Ops/s $\color{#35bf28}+3.75\%$
test_creation[device0] 4.5713ms 94.5646μs 10.5748 KOps/s 10.7168 KOps/s $\color{#d91a1a}-1.33\%$
test_creation_from_tensor 0.4410ms 98.7902μs 10.1225 KOps/s 10.5411 KOps/s $\color{#d91a1a}-3.97\%$
test_add_one[memmap_tensor0] 0.1199ms 4.9682μs 201.2802 KOps/s 199.7852 KOps/s $\color{#35bf28}+0.75\%$
test_contiguous[memmap_tensor0] 8.1250μs 0.5214μs 1.9181 MOps/s 1.9466 MOps/s $\color{#d91a1a}-1.47\%$
test_stack[memmap_tensor0] 0.1307ms 3.5664μs 280.3985 KOps/s 292.7340 KOps/s $\color{#d91a1a}-4.21\%$
test_memmaptd_index 1.1395ms 0.2352ms 4.2508 KOps/s 4.1335 KOps/s $\color{#35bf28}+2.84\%$
test_memmaptd_index_astensor 0.8136ms 0.3250ms 3.0772 KOps/s 3.0032 KOps/s $\color{#35bf28}+2.46\%$
test_memmaptd_index_op 1.0806ms 0.6049ms 1.6531 KOps/s 1.7263 KOps/s $\color{#d91a1a}-4.24\%$
test_serialize_model 0.1258s 0.1202s 8.3182 Ops/s 8.2356 Ops/s $\color{#35bf28}+1.00\%$
test_serialize_model_pickle 0.4339s 0.3988s 2.5075 Ops/s 2.4975 Ops/s $\color{#35bf28}+0.40\%$
test_serialize_weights 0.1286s 0.1215s 8.2328 Ops/s 7.1621 Ops/s $\textbf{\color{#35bf28}+14.95\%}$
test_serialize_weights_returnearly 0.2823s 0.1832s 5.4588 Ops/s 6.4306 Ops/s $\textbf{\color{#d91a1a}-15.11\%}$
test_serialize_weights_pickle 0.4832s 0.4069s 2.4574 Ops/s 2.3460 Ops/s $\color{#35bf28}+4.75\%$
test_serialize_weights_filesystem 0.1595s 0.1466s 6.8209 Ops/s 6.8354 Ops/s $\color{#d91a1a}-0.21\%$
test_serialize_model_filesystem 0.1637s 0.1541s 6.4903 Ops/s 6.5987 Ops/s $\color{#d91a1a}-1.64\%$
test_reshape_pytree 85.7610μs 26.5151μs 37.7144 KOps/s 37.4850 KOps/s $\color{#35bf28}+0.61\%$
test_reshape_td 78.3570μs 33.0780μs 30.2316 KOps/s 29.7203 KOps/s $\color{#35bf28}+1.72\%$
test_view_pytree 0.1054ms 26.9401μs 37.1194 KOps/s 36.9710 KOps/s $\color{#35bf28}+0.40\%$
test_view_td 85.4600μs 37.8907μs 26.3917 KOps/s 25.1669 KOps/s $\color{#35bf28}+4.87\%$
test_unbind_pytree 65.0620μs 29.7801μs 33.5795 KOps/s 32.8652 KOps/s $\color{#35bf28}+2.17\%$
test_unbind_td 0.3509ms 40.6281μs 24.6135 KOps/s 24.6339 KOps/s $\color{#d91a1a}-0.08\%$
test_split_pytree 72.5460μs 29.6169μs 33.7645 KOps/s 33.5474 KOps/s $\color{#35bf28}+0.65\%$
test_split_td 0.6560ms 45.7118μs 21.8762 KOps/s 22.1434 KOps/s $\color{#d91a1a}-1.21\%$
test_add_pytree 83.4560μs 35.3287μs 28.3056 KOps/s 27.9565 KOps/s $\color{#35bf28}+1.25\%$
test_add_td 0.1369ms 61.0805μs 16.3718 KOps/s 18.4130 KOps/s $\textbf{\color{#d91a1a}-11.09\%}$
test_compile_add_one_nested[tensordict-compile] 0.2119ms 64.1445μs 15.5898 KOps/s 15.8017 KOps/s $\color{#d91a1a}-1.34\%$
test_compile_add_one_nested[tensordict-eager] 0.5456ms 0.1786ms 5.6006 KOps/s 5.7980 KOps/s $\color{#d91a1a}-3.40\%$
test_compile_add_one_nested[pytree-compile] 0.1066ms 45.4638μs 21.9955 KOps/s 21.2423 KOps/s $\color{#35bf28}+3.55\%$
test_compile_add_one_nested[pytree-eager] 0.2624ms 0.1184ms 8.4488 KOps/s 8.3823 KOps/s $\color{#35bf28}+0.79\%$
test_compile_copy_nested[tensordict-compile] 73.3470μs 25.9896μs 38.4770 KOps/s 36.8485 KOps/s $\color{#35bf28}+4.42\%$
test_compile_copy_nested[tensordict-eager] 0.1205ms 59.2236μs 16.8852 KOps/s 16.6983 KOps/s $\color{#35bf28}+1.12\%$
test_compile_copy_nested[pytree-compile] 0.3556ms 78.3428μs 12.7644 KOps/s 12.5487 KOps/s $\color{#35bf28}+1.72\%$
test_compile_copy_nested[pytree-eager] 0.1415ms 68.0511μs 14.6948 KOps/s 14.4212 KOps/s $\color{#35bf28}+1.90\%$
test_compile_add_one_flat[tensordict-compile] 0.1923ms 0.1054ms 9.4845 KOps/s 9.4167 KOps/s $\color{#35bf28}+0.72\%$
test_compile_add_one_flat[tensordict-eager] 0.4486ms 0.2215ms 4.5137 KOps/s 4.5650 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_add_one_flat[tensorclass-compile] 0.1344ms 44.6744μs 22.3842 KOps/s 21.6216 KOps/s $\color{#35bf28}+3.53\%$
test_compile_add_one_flat[tensorclass-eager] 0.4815ms 67.4723μs 14.8209 KOps/s 15.3332 KOps/s $\color{#d91a1a}-3.34\%$
test_compile_add_one_flat[pytree-compile] 0.1842ms 0.1023ms 9.7799 KOps/s 9.7106 KOps/s $\color{#35bf28}+0.71\%$
test_compile_add_one_flat[pytree-eager] 0.4375ms 0.2011ms 4.9725 KOps/s 4.9811 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_add_self_flat[tensordict-eager] 0.4689ms 0.2406ms 4.1555 KOps/s 4.2379 KOps/s $\color{#d91a1a}-1.94\%$
test_compile_add_self_flat[tensordict-compile] 0.2002ms 0.1042ms 9.5924 KOps/s 9.3541 KOps/s $\color{#35bf28}+2.55\%$
test_compile_add_self_flat[tensorclass-eager] 0.1526ms 68.5874μs 14.5799 KOps/s 16.8855 KOps/s $\textbf{\color{#d91a1a}-13.65\%}$
test_compile_add_self_flat[tensorclass-compile] 0.1168ms 47.1648μs 21.2022 KOps/s 21.8986 KOps/s $\color{#d91a1a}-3.18\%$
test_compile_add_self_flat[pytree-eager] 0.2527ms 0.1584ms 6.3114 KOps/s 6.3034 KOps/s $\color{#35bf28}+0.13\%$
test_compile_add_self_flat[pytree-compile] 0.1899ms 0.1062ms 9.4156 KOps/s 9.6242 KOps/s $\color{#d91a1a}-2.17\%$
test_compile_copy_flat[tensordict-compile] 79.0000μs 21.0505μs 47.5047 KOps/s 46.0732 KOps/s $\color{#35bf28}+3.11\%$
test_compile_copy_flat[tensordict-eager] 0.1337ms 66.2283μs 15.0993 KOps/s 15.0454 KOps/s $\color{#35bf28}+0.36\%$
test_compile_copy_flat[pytree-compile] 0.1506ms 83.3610μs 11.9960 KOps/s 12.0967 KOps/s $\color{#d91a1a}-0.83\%$
test_compile_copy_flat[pytree-eager] 0.1305ms 69.0873μs 14.4744 KOps/s 14.3643 KOps/s $\color{#35bf28}+0.77\%$
test_compile_assign_and_add[tensordict-compile] 0.3097ms 0.2054ms 4.8684 KOps/s 4.8120 KOps/s $\color{#35bf28}+1.17\%$
test_compile_assign_and_add[tensordict-eager] 1.6271ms 1.3423ms 744.9959 Ops/s 756.9032 Ops/s $\color{#d91a1a}-1.57\%$
test_compile_assign_and_add[pytree-compile] 0.3349ms 0.2021ms 4.9489 KOps/s 4.8556 KOps/s $\color{#35bf28}+1.92\%$
test_compile_assign_and_add[pytree-eager] 0.9792ms 0.7759ms 1.2888 KOps/s 1.2958 KOps/s $\color{#d91a1a}-0.54\%$
test_compile_assign_and_add_stack[compile] 0.5410ms 0.4499ms 2.2227 KOps/s 2.2030 KOps/s $\color{#35bf28}+0.89\%$
test_compile_assign_and_add_stack[eager] 3.5396ms 2.7415ms 364.7630 Ops/s 378.2797 Ops/s $\color{#d91a1a}-3.57\%$
test_compile_indexing[tensor-tensordict-compile] 90.1490μs 34.5361μs 28.9552 KOps/s 26.6725 KOps/s $\textbf{\color{#35bf28}+8.56\%}$
test_compile_indexing[tensor-tensordict-eager] 0.6850ms 32.9796μs 30.3218 KOps/s 28.9533 KOps/s $\color{#35bf28}+4.73\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1088ms 28.6509μs 34.9030 KOps/s 32.6979 KOps/s $\textbf{\color{#35bf28}+6.74\%}$
test_compile_indexing[tensor-tensorclass-eager] 0.1045ms 23.4919μs 42.5679 KOps/s 40.3790 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_compile_indexing[tensor-pytree-compile] 78.9380μs 29.3240μs 34.1017 KOps/s 31.9058 KOps/s $\textbf{\color{#35bf28}+6.88\%}$
test_compile_indexing[tensor-pytree-eager] 0.1036ms 23.4003μs 42.7345 KOps/s 42.1286 KOps/s $\color{#35bf28}+1.44\%$
test_compile_indexing[slice-tensordict-compile] 0.1324ms 50.2069μs 19.9176 KOps/s 18.7380 KOps/s $\textbf{\color{#35bf28}+6.30\%}$
test_compile_indexing[slice-tensordict-eager] 0.5812ms 20.7127μs 48.2796 KOps/s 49.3451 KOps/s $\color{#d91a1a}-2.16\%$
test_compile_indexing[slice-tensorclass-compile] 0.1006ms 42.4556μs 23.5540 KOps/s 22.0801 KOps/s $\textbf{\color{#35bf28}+6.68\%}$
test_compile_indexing[slice-tensorclass-eager] 99.1060μs 18.6670μs 53.5705 KOps/s 51.9637 KOps/s $\color{#35bf28}+3.09\%$
test_compile_indexing[slice-pytree-compile] 0.1219ms 43.4751μs 23.0017 KOps/s 21.7617 KOps/s $\textbf{\color{#35bf28}+5.70\%}$
test_compile_indexing[slice-pytree-eager] 98.8370μs 18.4206μs 54.2872 KOps/s 52.3899 KOps/s $\color{#35bf28}+3.62\%$
test_compile_indexing[int-tensordict-compile] 0.1587ms 51.3842μs 19.4612 KOps/s 18.2631 KOps/s $\textbf{\color{#35bf28}+6.56\%}$
test_compile_indexing[int-tensordict-eager] 1.0636ms 20.0797μs 49.8016 KOps/s 50.3002 KOps/s $\color{#d91a1a}-0.99\%$
test_compile_indexing[int-tensorclass-compile] 98.0640μs 43.9402μs 22.7582 KOps/s 21.4946 KOps/s $\textbf{\color{#35bf28}+5.88\%}$
test_compile_indexing[int-tensorclass-eager] 57.1570μs 18.5616μs 53.8747 KOps/s 52.8079 KOps/s $\color{#35bf28}+2.02\%$
test_compile_indexing[int-pytree-compile] 0.1225ms 44.1620μs 22.6439 KOps/s 21.5927 KOps/s $\color{#35bf28}+4.87\%$
test_compile_indexing[int-pytree-eager] 57.2570μs 18.5050μs 54.0395 KOps/s 52.9247 KOps/s $\color{#35bf28}+2.11\%$
test_mod_add[eager] 0.1268ms 33.7419μs 29.6367 KOps/s 29.4539 KOps/s $\color{#35bf28}+0.62\%$
test_mod_add[compile] 0.1420ms 47.1806μs 21.1951 KOps/s 19.9226 KOps/s $\textbf{\color{#35bf28}+6.39\%}$
test_mod_add[compile-overhead] 0.1028ms 46.4634μs 21.5223 KOps/s 19.7515 KOps/s $\textbf{\color{#35bf28}+8.97\%}$
test_mod_wrap[eager] 0.3644ms 0.2279ms 4.3882 KOps/s 4.3363 KOps/s $\color{#35bf28}+1.20\%$
test_mod_wrap[compile] 0.3012ms 0.2090ms 4.7858 KOps/s 4.8101 KOps/s $\color{#d91a1a}-0.50\%$
test_mod_wrap[compile-overhead] 0.3941ms 0.2088ms 4.7893 KOps/s 4.8451 KOps/s $\color{#d91a1a}-1.15\%$
test_mod_wrap_and_backward[eager] 18.1627ms 12.3833ms 80.7538 Ops/s 82.3454 Ops/s $\color{#d91a1a}-1.93\%$
test_mod_wrap_and_backward[compile] 14.0254ms 12.8747ms 77.6716 Ops/s 85.5255 Ops/s $\textbf{\color{#d91a1a}-9.18\%}$
test_mod_wrap_and_backward[compile-overhead] 13.5814ms 12.0057ms 83.2935 Ops/s 85.5481 Ops/s $\color{#d91a1a}-2.64\%$
test_seq_add[eager] 0.2254ms 0.1141ms 8.7621 KOps/s 8.4732 KOps/s $\color{#35bf28}+3.41\%$
test_seq_add[compile] 0.1710ms 62.0005μs 16.1289 KOps/s 15.5482 KOps/s $\color{#35bf28}+3.73\%$
test_seq_add[compile-overhead] 0.1315ms 60.6624μs 16.4847 KOps/s 16.0373 KOps/s $\color{#35bf28}+2.79\%$
test_seq_wrap[eager] 0.7984ms 0.4491ms 2.2269 KOps/s 2.2234 KOps/s $\color{#35bf28}+0.16\%$
test_seq_wrap[compile] 0.4060ms 0.2350ms 4.2554 KOps/s 4.2670 KOps/s $\color{#d91a1a}-0.27\%$
test_seq_wrap[compile-overhead] 0.4449ms 0.2304ms 4.3399 KOps/s 4.3555 KOps/s $\color{#d91a1a}-0.36\%$
test_func_call_runtime[False-eager] 0.9938ms 0.5540ms 1.8050 KOps/s 1.8035 KOps/s $\color{#35bf28}+0.08\%$
test_func_call_runtime[False-compile] 0.5507ms 0.4370ms 2.2882 KOps/s 2.3221 KOps/s $\color{#d91a1a}-1.46\%$
test_func_call_runtime[False-compile-overhead] 0.8455ms 0.4435ms 2.2549 KOps/s 2.3176 KOps/s $\color{#d91a1a}-2.71\%$
test_func_call_runtime[True-eager] 1.3102ms 0.7728ms 1.2940 KOps/s 1.2896 KOps/s $\color{#35bf28}+0.34\%$
test_func_call_runtime[True-compile] 0.6567ms 0.4822ms 2.0740 KOps/s 2.1007 KOps/s $\color{#d91a1a}-1.27\%$
test_func_call_runtime[True-compile-overhead] 0.6614ms 0.4808ms 2.0800 KOps/s 2.1202 KOps/s $\color{#d91a1a}-1.90\%$
test_func_call_cm_runtime[False-eager] 1.0068ms 0.5528ms 1.8088 KOps/s 1.8305 KOps/s $\color{#d91a1a}-1.18\%$
test_func_call_cm_runtime[False-compile] 0.6197ms 0.4411ms 2.2672 KOps/s 2.3194 KOps/s $\color{#d91a1a}-2.25\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6284ms 0.4389ms 2.2782 KOps/s 2.3288 KOps/s $\color{#d91a1a}-2.17\%$
test_func_call_cm_runtime[True-eager] 1.4757ms 0.9231ms 1.0833 KOps/s 1.0941 KOps/s $\color{#d91a1a}-0.98\%$
test_func_call_cm_runtime[True-compile] 0.6844ms 0.5041ms 1.9838 KOps/s 2.0042 KOps/s $\color{#d91a1a}-1.02\%$
test_func_call_cm_runtime[True-compile-overhead] 0.6691ms 0.5084ms 1.9671 KOps/s 1.9935 KOps/s $\color{#d91a1a}-1.32\%$
test_vmap_func_call_cm_runtime[eager] 2.8381ms 1.9645ms 509.0265 Ops/s 513.1605 Ops/s $\color{#d91a1a}-0.81\%$
test_vmap_func_call_cm_runtime[compile] 0.7546ms 0.5250ms 1.9049 KOps/s 1.8831 KOps/s $\color{#35bf28}+1.16\%$
test_vmap_func_call_cm_runtime[compile-overhead] 1.0866ms 0.5432ms 1.8408 KOps/s 1.8869 KOps/s $\color{#d91a1a}-2.44\%$
test_distributed 0.4017ms 0.1259ms 7.9428 KOps/s 7.6339 KOps/s $\color{#35bf28}+4.05\%$
test_tdmodule 44.3030μs 26.4544μs 37.8009 KOps/s 39.4531 KOps/s $\color{#d91a1a}-4.19\%$
test_tdmodule_dispatch 78.1160μs 48.6631μs 20.5495 KOps/s 21.7760 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_tdseq 60.1530μs 29.5336μs 33.8597 KOps/s 34.6601 KOps/s $\color{#d91a1a}-2.31\%$
test_tdseq_dispatch 88.7960μs 54.6123μs 18.3109 KOps/s 19.0228 KOps/s $\color{#d91a1a}-3.74\%$
test_instantiation_functorch 2.3641ms 1.5359ms 651.0947 Ops/s 634.2536 Ops/s $\color{#35bf28}+2.66\%$
test_exec_functorch 0.3368ms 0.1789ms 5.5887 KOps/s 5.3855 KOps/s $\color{#35bf28}+3.77\%$
test_exec_functional_call 0.3021ms 0.1756ms 5.6933 KOps/s 5.8182 KOps/s $\color{#d91a1a}-2.15\%$
test_exec_td_decorator 0.5451ms 0.2380ms 4.2017 KOps/s 4.3249 KOps/s $\color{#d91a1a}-2.85\%$
test_vmap_mlp_speed_decorator[True-True] 0.8908ms 0.6732ms 1.4853 KOps/s 1.5062 KOps/s $\color{#d91a1a}-1.39\%$
test_vmap_mlp_speed_decorator[True-False] 1.1819ms 0.6745ms 1.4826 KOps/s 1.5067 KOps/s $\color{#d91a1a}-1.60\%$
test_vmap_mlp_speed_decorator[False-True] 0.8562ms 0.5403ms 1.8507 KOps/s 1.8772 KOps/s $\color{#d91a1a}-1.41\%$
test_vmap_mlp_speed_decorator[False-False] 0.8561ms 0.5416ms 1.8463 KOps/s 1.8719 KOps/s $\color{#d91a1a}-1.36\%$
test_to_module_speed[True] 2.6468ms 1.3638ms 733.2409 Ops/s 738.8994 Ops/s $\color{#d91a1a}-0.77\%$
test_to_module_speed[False] 2.0798ms 1.3304ms 751.6407 Ops/s 725.9635 Ops/s $\color{#35bf28}+3.54\%$
test_tc_init 99.9270μs 45.6227μs 21.9189 KOps/s 22.8320 KOps/s $\color{#d91a1a}-4.00\%$
test_tc_init_nested 0.1944ms 94.0133μs 10.6368 KOps/s 11.3242 KOps/s $\textbf{\color{#d91a1a}-6.07\%}$
test_tc_first_layer_tensor 27.1710μs 1.5548μs 643.1902 KOps/s 652.9202 KOps/s $\color{#d91a1a}-1.49\%$
test_tc_first_layer_nontensor 27.6210μs 4.7010μs 212.7194 KOps/s 212.2733 KOps/s $\color{#35bf28}+0.21\%$
test_tc_second_layer_tensor 24.9870μs 2.8723μs 348.1473 KOps/s 350.8432 KOps/s $\color{#d91a1a}-0.77\%$
test_tc_second_layer_nontensor 52.4080μs 6.0868μs 164.2907 KOps/s 165.7808 KOps/s $\color{#d91a1a}-0.90\%$
test_unbind 0.2542s 14.2526ms 70.1629 Ops/s 62.7140 Ops/s $\textbf{\color{#35bf28}+11.88\%}$
test_full_like 12.9306ms 9.7799ms 102.2505 Ops/s 110.0829 Ops/s $\textbf{\color{#d91a1a}-7.12\%}$
test_zeros_like 4.2041ms 3.4926ms 286.3164 Ops/s 290.7328 Ops/s $\color{#d91a1a}-1.52\%$
test_ones_like 5.1076ms 4.2276ms 236.5388 Ops/s 149.7090 Ops/s $\textbf{\color{#35bf28}+58.00\%}$
test_clone 7.2973ms 6.4086ms 156.0411 Ops/s 111.2004 Ops/s $\textbf{\color{#35bf28}+40.32\%}$
test_squeeze 0.1000ms 12.1392μs 82.3780 KOps/s 81.8939 KOps/s $\color{#35bf28}+0.59\%$
test_unsqueeze 0.1729ms 92.7443μs 10.7823 KOps/s 10.8544 KOps/s $\color{#d91a1a}-0.66\%$
test_split 0.3985ms 0.1929ms 5.1845 KOps/s 5.1101 KOps/s $\color{#35bf28}+1.46\%$
test_permute 0.4027ms 0.2094ms 4.7760 KOps/s 4.7332 KOps/s $\color{#35bf28}+0.90\%$
test_stack 34.9362ms 28.6046ms 34.9594 Ops/s 34.8049 Ops/s $\color{#35bf28}+0.44\%$
test_cat 32.4270ms 28.2399ms 35.4109 Ops/s 35.0316 Ops/s $\color{#35bf28}+1.08\%$
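
The Change column and the Improved/Worsened totals above can be reproduced from the table itself. The following is a minimal sketch, not the benchmark workflow's actual code: the function names are hypothetical, and the 5% cutoff is an assumption inferred from which rows are rendered in bold (the 18 bold gains and 17 bold losses in the CPU table match the reported totals).

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of this PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not a documented value."""
    if change_pct >= threshold_pct:
        return "improved"
    if change_pct <= -threshold_pct:
        return "worsened"
    return "unchanged"


# (name, Ops, Ops on Repo HEAD) rows copied from the CPU table.
# Both values in a row share a unit, so the ratio is unit-free.
SAMPLE_ROWS = [
    ("test_plain_set_nested", 47.5143, 49.9637),
    ("test_creation_empty", 81.2272, 100.1901),
    ("test_ones_like", 236.5388, 149.7090),
]

for name, ops_pr, ops_head in SAMPLE_ROWS:
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under that assumed threshold, a row such as test_plain_set_nested (-4.90%) is shown in red but does not count toward the Worsened total.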

github-actions bot commented Jan 7, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 229. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}15$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 28.4610μs 11.3055μs 88.4529 KOps/s 78.5192 KOps/s $\textbf{\color{#35bf28}+12.65\%}$
test_plain_set_stack_nested 36.6100μs 11.4933μs 87.0075 KOps/s 77.1846 KOps/s $\textbf{\color{#35bf28}+12.73\%}$
test_plain_set_nested_inplace 44.7900μs 12.3651μs 80.8725 KOps/s 72.2716 KOps/s $\textbf{\color{#35bf28}+11.90\%}$
test_plain_set_stack_nested_inplace 38.4110μs 12.3154μs 81.1992 KOps/s 72.0625 KOps/s $\textbf{\color{#35bf28}+12.68\%}$
test_items 26.0600μs 2.8635μs 349.2191 KOps/s 343.0273 KOps/s $\color{#35bf28}+1.81\%$
test_items_nested 0.4022ms 0.3577ms 2.7955 KOps/s 2.7840 KOps/s $\color{#35bf28}+0.41\%$
test_items_nested_locked 0.4225ms 0.3540ms 2.8250 KOps/s 2.7651 KOps/s $\color{#35bf28}+2.17\%$
test_items_nested_leaf 79.5010μs 57.9045μs 17.2698 KOps/s 17.3530 KOps/s $\color{#d91a1a}-0.48\%$
test_items_stack_nested 0.3974ms 0.3613ms 2.7679 KOps/s 2.7580 KOps/s $\color{#35bf28}+0.36\%$
test_items_stack_nested_leaf 93.1020μs 59.9991μs 16.6669 KOps/s 16.7316 KOps/s $\color{#d91a1a}-0.39\%$
test_items_stack_nested_locked 0.4031ms 0.3590ms 2.7859 KOps/s 2.7419 KOps/s $\color{#35bf28}+1.60\%$
test_keys 24.6300μs 3.4399μs 290.7019 KOps/s 291.5825 KOps/s $\color{#d91a1a}-0.30\%$
test_keys_nested 0.1119ms 80.7549μs 12.3831 KOps/s 12.0827 KOps/s $\color{#35bf28}+2.49\%$
test_keys_nested_locked 0.8195ms 86.1642μs 11.6057 KOps/s 11.3455 KOps/s $\color{#35bf28}+2.29\%$
test_keys_nested_leaf 2.7085ms 72.1271μs 13.8644 KOps/s 13.7198 KOps/s $\color{#35bf28}+1.05\%$
test_keys_stack_nested 0.1159ms 82.2833μs 12.1531 KOps/s 12.0061 KOps/s $\color{#35bf28}+1.22\%$
test_keys_stack_nested_leaf 0.1183ms 72.8543μs 13.7260 KOps/s 13.4575 KOps/s $\color{#35bf28}+2.00\%$
test_keys_stack_nested_locked 0.1168ms 88.1604μs 11.3430 KOps/s 11.2839 KOps/s $\color{#35bf28}+0.52\%$
test_values 3.7490μs 0.8430μs 1.1863 MOps/s 1.1781 MOps/s $\color{#35bf28}+0.69\%$
test_values_nested 60.2110μs 34.5108μs 28.9764 KOps/s 29.0648 KOps/s $\color{#d91a1a}-0.30\%$
test_values_nested_locked 61.9110μs 35.9280μs 27.8334 KOps/s 27.6388 KOps/s $\color{#35bf28}+0.70\%$
test_values_nested_leaf 74.1310μs 39.1078μs 25.5704 KOps/s 25.3626 KOps/s $\color{#35bf28}+0.82\%$
test_values_stack_nested 61.7410μs 34.9548μs 28.6084 KOps/s 28.5087 KOps/s $\color{#35bf28}+0.35\%$
test_values_stack_nested_leaf 73.3610μs 39.6028μs 25.2507 KOps/s 25.5833 KOps/s $\color{#d91a1a}-1.30\%$
test_values_stack_nested_locked 67.7810μs 36.5744μs 27.3416 KOps/s 27.5046 KOps/s $\color{#d91a1a}-0.59\%$
test_membership 1.7230μs 0.5019μs 1.9922 MOps/s 1.9639 MOps/s $\color{#35bf28}+1.44\%$
test_membership_nested 31.4600μs 2.0136μs 496.6285 KOps/s 511.0686 KOps/s $\color{#d91a1a}-2.83\%$
test_membership_nested_leaf 15.4355μs 1.9560μs 511.2483 KOps/s 513.5656 KOps/s $\color{#d91a1a}-0.45\%$
test_membership_stacked_nested 27.0810μs 2.0290μs 492.8630 KOps/s 478.5833 KOps/s $\color{#35bf28}+2.98\%$
test_membership_stacked_nested_leaf 23.7510μs 2.0432μs 489.4320 KOps/s 485.8433 KOps/s $\color{#35bf28}+0.74\%$
test_membership_nested_last 27.2800μs 3.0604μs 326.7584 KOps/s 329.4252 KOps/s $\color{#d91a1a}-0.81\%$
test_membership_nested_leaf_last 27.6500μs 3.0155μs 331.6180 KOps/s 324.5486 KOps/s $\color{#35bf28}+2.18\%$
test_membership_stacked_nested_last 33.1410μs 3.0805μs 324.6221 KOps/s 147.4585 KOps/s $\textbf{\color{#35bf28}+120.14\%}$
test_membership_stacked_nested_leaf_last 36.8010μs 3.0594μs 326.8654 KOps/s 146.3000 KOps/s $\textbf{\color{#35bf28}+123.42\%}$
test_nested_getleaf 37.2310μs 6.0682μs 164.7926 KOps/s 165.6221 KOps/s $\color{#d91a1a}-0.50\%$
test_nested_get 29.9000μs 5.8083μs 172.1679 KOps/s 174.0449 KOps/s $\color{#d91a1a}-1.08\%$
test_stacked_getleaf 36.9410μs 6.1066μs 163.7585 KOps/s 165.2708 KOps/s $\color{#d91a1a}-0.92\%$
test_stacked_get 35.8410μs 5.8069μs 172.2083 KOps/s 174.0533 KOps/s $\color{#d91a1a}-1.06\%$
test_nested_getitemleaf 37.2410μs 6.1717μs 162.0306 KOps/s 159.5021 KOps/s $\color{#35bf28}+1.59\%$
test_nested_getitem 38.3910μs 5.9658μs 167.6211 KOps/s 170.6297 KOps/s $\color{#d91a1a}-1.76\%$
test_stacked_getitemleaf 36.0810μs 6.2178μs 160.8296 KOps/s 161.5666 KOps/s $\color{#d91a1a}-0.46\%$
test_stacked_getitem 35.8510μs 5.9115μs 169.1625 KOps/s 169.4064 KOps/s $\color{#d91a1a}-0.14\%$
test_lock_nested 2.4897ms 0.3669ms 2.7254 KOps/s 2.7169 KOps/s $\color{#35bf28}+0.31\%$
test_lock_stack_nested 0.3852ms 0.3396ms 2.9444 KOps/s 2.9781 KOps/s $\color{#d91a1a}-1.13\%$
test_unlock_nested 0.6449ms 0.3070ms 3.2574 KOps/s 3.2733 KOps/s $\color{#d91a1a}-0.49\%$
test_unlock_stack_nested 0.3193ms 0.2789ms 3.5849 KOps/s 3.6409 KOps/s $\color{#d91a1a}-1.54\%$
test_flatten_speed 0.1185ms 74.7354μs 13.3805 KOps/s 13.3075 KOps/s $\color{#35bf28}+0.55\%$
test_unflatten_speed 0.3592ms 0.3139ms 3.1861 KOps/s 3.1495 KOps/s $\color{#35bf28}+1.16\%$
test_common_ops 1.5438ms 0.5581ms 1.7917 KOps/s 1.6011 KOps/s $\textbf{\color{#35bf28}+11.90\%}$
test_creation 0.1049ms 1.7155μs 582.9296 KOps/s 585.1059 KOps/s $\color{#d91a1a}-0.37\%$
test_creation_empty 35.8410μs 6.3273μs 158.0456 KOps/s 105.7940 KOps/s $\textbf{\color{#35bf28}+49.39\%}$
test_creation_nested_1 31.7310μs 8.0371μs 124.4231 KOps/s 89.0497 KOps/s $\textbf{\color{#35bf28}+39.72\%}$
test_creation_nested_2 45.4110μs 10.6900μs 93.5454 KOps/s 72.0865 KOps/s $\textbf{\color{#35bf28}+29.77\%}$
test_clone 0.1339ms 10.3465μs 96.6508 KOps/s 96.3950 KOps/s $\color{#35bf28}+0.27\%$
test_getitem[int] 1.8925ms 10.4462μs 95.7285 KOps/s 94.5667 KOps/s $\color{#35bf28}+1.23\%$
test_getitem[slice_int] 0.1132ms 20.1614μs 49.5998 KOps/s 48.8534 KOps/s $\color{#35bf28}+1.53\%$
test_getitem[range] 0.1273ms 35.7358μs 27.9832 KOps/s 28.5043 KOps/s $\color{#d91a1a}-1.83\%$
test_getitem[tuple] 0.1095ms 17.8346μs 56.0708 KOps/s 56.8849 KOps/s $\color{#d91a1a}-1.43\%$
test_getitem[list] 0.1258ms 31.0987μs 32.1557 KOps/s 31.9050 KOps/s $\color{#35bf28}+0.79\%$
test_setitem_dim[int] 39.5410μs 19.4747μs 51.3488 KOps/s 58.3895 KOps/s $\textbf{\color{#d91a1a}-12.06\%}$
test_setitem_dim[slice_int] 60.7110μs 39.4593μs 25.3425 KOps/s 27.7685 KOps/s $\textbf{\color{#d91a1a}-8.74\%}$
test_setitem_dim[range] 83.3420μs 53.1100μs 18.8288 KOps/s 19.9883 KOps/s $\textbf{\color{#d91a1a}-5.80\%}$
test_setitem_dim[tuple] 53.6610μs 32.3456μs 30.9161 KOps/s 33.4253 KOps/s $\textbf{\color{#d91a1a}-7.51\%}$
test_setitem 0.1292ms 14.5082μs 68.9267 KOps/s 67.6088 KOps/s $\color{#35bf28}+1.95\%$
test_set 0.1264ms 13.1353μs 76.1305 KOps/s 67.5619 KOps/s $\textbf{\color{#35bf28}+12.68\%}$
test_set_shared 1.5529ms 0.1502ms 6.6581 KOps/s 6.6609 KOps/s $\color{#d91a1a}-0.04\%$
test_update 0.5479ms 14.9460μs 66.9075 KOps/s 54.1541 KOps/s $\textbf{\color{#35bf28}+23.55\%}$
test_update_nested 0.1214ms 20.1452μs 49.6396 KOps/s 42.6881 KOps/s $\textbf{\color{#35bf28}+16.28\%}$
test_update__nested 0.5557ms 24.2966μs 41.1581 KOps/s 41.5786 KOps/s $\color{#d91a1a}-1.01\%$
test_set_nested 0.1217ms 14.2145μs 70.3508 KOps/s 62.4676 KOps/s $\textbf{\color{#35bf28}+12.62\%}$
test_set_nested_new 0.1252ms 16.4815μs 60.6739 KOps/s 54.4067 KOps/s $\textbf{\color{#35bf28}+11.52\%}$
test_select 0.1626ms 27.7532μs 36.0319 KOps/s 33.1189 KOps/s $\textbf{\color{#35bf28}+8.80\%}$
test_select_nested 67.9810μs 43.1336μs 23.1838 KOps/s 22.8462 KOps/s $\color{#35bf28}+1.48\%$
test_exclude_nested 92.5610μs 60.9812μs 16.3985 KOps/s 16.2323 KOps/s $\color{#35bf28}+1.02\%$
test_empty[True] 0.3180ms 0.2846ms 3.5137 KOps/s 3.4351 KOps/s $\color{#35bf28}+2.29\%$
test_empty[False] 3.9140μs 0.8302μs 1.2045 MOps/s 1.2157 MOps/s $\color{#d91a1a}-0.92\%$
test_to 84.2220μs 55.1751μs 18.1241 KOps/s 17.9590 KOps/s $\color{#35bf28}+0.92\%$
test_to_nonblocking 94.8210μs 49.2319μs 20.3121 KOps/s 21.3437 KOps/s $\color{#d91a1a}-4.83\%$
test_unbind_speed 1.6725ms 0.2327ms 4.2968 KOps/s 4.3701 KOps/s $\color{#d91a1a}-1.68\%$
test_unbind_speed_stack0 0.2897ms 0.2350ms 4.2548 KOps/s 4.3059 KOps/s $\color{#d91a1a}-1.19\%$
test_unbind_speed_stack1 93.4147ms 0.6653ms 1.5031 KOps/s 1.5238 KOps/s $\color{#d91a1a}-1.36\%$
test_split 94.8030ms 1.5818ms 632.1982 Ops/s 588.0183 Ops/s $\textbf{\color{#35bf28}+7.51\%}$
test_chunk 94.9267ms 1.5839ms 631.3413 Ops/s 701.3201 Ops/s $\textbf{\color{#d91a1a}-9.98\%}$
test_consolidate[False-None] 97.4774ms 2.8930ms 345.6674 Ops/s 341.1321 Ops/s $\color{#35bf28}+1.33\%$
test_consolidate[default-None] 1.7385ms 1.6287ms 613.9828 Ops/s 612.8844 Ops/s $\color{#35bf28}+0.18\%$
test_consolidate[reduce-overhead-None] 2.0543ms 1.6567ms 603.5997 Ops/s 596.4995 Ops/s $\color{#35bf28}+1.19\%$
test_consolidate_njt[False-None] 6.6218ms 6.1975ms 161.3543 Ops/s 161.9332 Ops/s $\color{#d91a1a}-0.36\%$
test_to[False-False-None] 2.0846ms 1.6707ms 598.5464 Ops/s 584.6755 Ops/s $\color{#35bf28}+2.37\%$
test_to[True-False-None] 1.6664ms 1.2325ms 811.3674 Ops/s 811.7336 Ops/s $\color{#d91a1a}-0.05\%$
test_to[within-False-None] 4.3362ms 3.9379ms 253.9437 Ops/s 250.9985 Ops/s $\color{#35bf28}+1.17\%$
test_to[True-default-None] 5.4126ms 5.0060ms 199.7614 Ops/s 201.4812 Ops/s $\color{#d91a1a}-0.85\%$
test_to_njt[False-False-None] 7.1342ms 6.7495ms 148.1597 Ops/s 149.1523 Ops/s $\color{#d91a1a}-0.67\%$
test_to_njt[True-False-None] 5.6006ms 5.1813ms 193.0029 Ops/s 193.7095 Ops/s $\color{#d91a1a}-0.36\%$
test_to_njt[within-False-None] 11.8562ms 11.4682ms 87.1979 Ops/s 87.9750 Ops/s $\color{#d91a1a}-0.88\%$
test_creation[device0] 0.5437ms 78.6331μs 12.7173 KOps/s 12.8023 KOps/s $\color{#d91a1a}-0.66\%$
test_creation_from_tensor 0.6700ms 81.7413μs 12.2337 KOps/s 12.2106 KOps/s $\color{#35bf28}+0.19\%$
test_add_one[memmap_tensor0] 0.3183ms 6.4353μs 155.3920 KOps/s 156.6005 KOps/s $\color{#d91a1a}-0.77\%$
test_contiguous[memmap_tensor0] 20.2948μs 0.3952μs 2.5306 MOps/s 2.4889 MOps/s $\color{#35bf28}+1.67\%$
test_stack[memmap_tensor0] 21.9110μs 4.1377μs 241.6785 KOps/s 241.0583 KOps/s $\color{#35bf28}+0.26\%$
test_memmaptd_index 1.6827ms 0.2371ms 4.2171 KOps/s 4.1992 KOps/s $\color{#35bf28}+0.43\%$
test_memmaptd_index_astensor 0.5696ms 0.2959ms 3.3793 KOps/s 3.3601 KOps/s $\color{#35bf28}+0.57\%$
test_memmaptd_index_op 0.9905ms 0.5235ms 1.9104 KOps/s 1.7387 KOps/s $\textbf{\color{#35bf28}+9.88\%}$
test_serialize_model 0.1320s 0.1310s 7.6319 Ops/s 7.6805 Ops/s $\color{#d91a1a}-0.63\%$
test_serialize_model_pickle 1.3661s 1.2149s 0.8231 Ops/s 0.8240 Ops/s $\color{#d91a1a}-0.11\%$
test_serialize_weights 0.4169s 0.1713s 5.8365 Ops/s 7.7004 Ops/s $\textbf{\color{#d91a1a}-24.21\%}$
test_serialize_weights_returnearly 0.3353s 53.5928ms 18.6592 Ops/s 11.2767 Ops/s $\textbf{\color{#35bf28}+65.47\%}$
test_serialize_weights_pickle 1.3779s 1.2236s 0.8172 Ops/s 0.8376 Ops/s $\color{#d91a1a}-2.43\%$
test_reshape_pytree 65.5710μs 21.7255μs 46.0288 KOps/s 46.3895 KOps/s $\color{#d91a1a}-0.78\%$
test_reshape_td 47.4510μs 25.9695μs 38.5067 KOps/s 38.4272 KOps/s $\color{#35bf28}+0.21\%$
test_view_pytree 48.7810μs 21.7494μs 45.9784 KOps/s 46.8612 KOps/s $\color{#d91a1a}-1.88\%$
test_view_td 56.2010μs 28.8519μs 34.6598 KOps/s 32.2523 KOps/s $\textbf{\color{#35bf28}+7.46\%}$
test_unbind_pytree 55.1910μs 27.5688μs 36.2729 KOps/s 36.6336 KOps/s $\color{#d91a1a}-0.98\%$
test_unbind_td 0.7599ms 35.6447μs 28.0547 KOps/s 28.4816 KOps/s $\color{#d91a1a}-1.50\%$
test_split_pytree 58.2510μs 29.4295μs 33.9795 KOps/s 34.2073 KOps/s $\color{#d91a1a}-0.67\%$
test_split_td 0.9588ms 37.3500μs 26.7737 KOps/s 26.4918 KOps/s $\color{#35bf28}+1.06\%$
test_add_pytree 64.2210μs 33.3374μs 29.9963 KOps/s 30.5967 KOps/s $\color{#d91a1a}-1.96\%$
test_add_td 85.1010μs 44.5908μs 22.4261 KOps/s 21.1410 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_compile_add_one_nested[tensordict-compile] 0.1689ms 0.1216ms 8.2228 KOps/s 8.2794 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_add_one_nested[tensordict-eager] 0.2184ms 0.1287ms 7.7723 KOps/s 7.8883 KOps/s $\color{#d91a1a}-1.47\%$
test_compile_add_one_nested[pytree-compile] 0.1390ms 96.6842μs 10.3430 KOps/s 10.4600 KOps/s $\color{#d91a1a}-1.12\%$
test_compile_add_one_nested[pytree-eager] 2.2948ms 0.1451ms 6.8908 KOps/s 6.9369 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_copy_nested[tensordict-compile] 62.8410μs 23.8847μs 41.8678 KOps/s 47.2592 KOps/s $\textbf{\color{#d91a1a}-11.41\%}$
test_compile_copy_nested[tensordict-eager] 52.5210μs 28.7803μs 34.7460 KOps/s 34.5092 KOps/s $\color{#35bf28}+0.69\%$
test_compile_copy_nested[pytree-compile] 0.3892ms 63.3215μs 15.7924 KOps/s 15.6874 KOps/s $\color{#35bf28}+0.67\%$
test_compile_copy_nested[pytree-eager] 83.4810μs 48.8205μs 20.4832 KOps/s 20.2041 KOps/s $\color{#35bf28}+1.38\%$
test_compile_add_one_flat[tensordict-compile] 0.1829ms 0.1414ms 7.0742 KOps/s 7.1401 KOps/s $\color{#d91a1a}-0.92\%$
test_compile_add_one_flat[tensordict-eager] 0.3118ms 0.2142ms 4.6692 KOps/s 4.7505 KOps/s $\color{#d91a1a}-1.71\%$
test_compile_add_one_flat[tensorclass-compile] 0.1493ms 0.1001ms 9.9919 KOps/s 10.4791 KOps/s $\color{#d91a1a}-4.65\%$
test_compile_add_one_flat[tensorclass-eager] 0.1151ms 55.2482μs 18.1001 KOps/s 19.3240 KOps/s $\textbf{\color{#d91a1a}-6.33\%}$
test_compile_add_one_flat[pytree-compile] 0.2735ms 0.1342ms 7.4500 KOps/s 7.4865 KOps/s $\color{#d91a1a}-0.49\%$
test_compile_add_one_flat[pytree-eager] 0.4979ms 0.4605ms 2.1716 KOps/s 2.1576 KOps/s $\color{#35bf28}+0.65\%$
test_compile_add_self_flat[tensordict-eager] 0.3710ms 0.2578ms 3.8784 KOps/s 3.9053 KOps/s $\color{#d91a1a}-0.69\%$
test_compile_add_self_flat[tensordict-compile] 0.1893ms 0.1422ms 7.0301 KOps/s 7.1303 KOps/s $\color{#d91a1a}-1.40\%$
test_compile_add_self_flat[tensorclass-eager] 0.1485ms 68.3549μs 14.6295 KOps/s 15.9619 KOps/s $\textbf{\color{#d91a1a}-8.35\%}$
test_compile_add_self_flat[tensorclass-compile] 0.1376ms 98.9682μs 10.1043 KOps/s 10.3925 KOps/s $\color{#d91a1a}-2.77\%$
test_compile_add_self_flat[pytree-eager] 0.4575ms 0.3936ms 2.5404 KOps/s 2.5162 KOps/s $\color{#35bf28}+0.96\%$
test_compile_add_self_flat[pytree-compile] 0.1712ms 0.1342ms 7.4508 KOps/s 7.5820 KOps/s $\color{#d91a1a}-1.73\%$
test_compile_copy_flat[tensordict-compile] 46.6710μs 19.4782μs 51.3394 KOps/s 58.8694 KOps/s $\textbf{\color{#d91a1a}-12.79\%}$
test_compile_copy_flat[tensordict-eager] 58.8810μs 31.0452μs 32.2111 KOps/s 32.3612 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_copy_flat[pytree-compile] 0.2198ms 70.6344μs 14.1574 KOps/s 14.4122 KOps/s $\color{#d91a1a}-1.77\%$
test_compile_copy_flat[pytree-eager] 81.3110μs 51.9152μs 19.2622 KOps/s 19.5329 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_assign_and_add[tensordict-compile] 1.5901ms 0.3833ms 2.6093 KOps/s 2.2918 KOps/s $\textbf{\color{#35bf28}+13.85\%}$
test_compile_assign_and_add[tensordict-eager] 2.6451ms 2.5521ms 391.8343 Ops/s 386.9012 Ops/s $\color{#35bf28}+1.28\%$
test_compile_assign_and_add[pytree-compile] 1.5714ms 0.4311ms 2.3197 KOps/s 2.3203 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_assign_and_add[pytree-eager] 2.7349ms 2.5585ms 390.8599 Ops/s 384.8160 Ops/s $\color{#35bf28}+1.57\%$
test_compile_indexing[tensor-tensordict-compile] 0.1687ms 0.1142ms 8.7552 KOps/s 9.0499 KOps/s $\color{#d91a1a}-3.26\%$
test_compile_indexing[tensor-tensordict-eager] 0.5791ms 78.3966μs 12.7557 KOps/s 12.8341 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2136ms 0.1071ms 9.3342 KOps/s 9.7249 KOps/s $\color{#d91a1a}-4.02\%$
test_compile_indexing[tensor-tensorclass-eager] 0.5082ms 68.7782μs 14.5395 KOps/s 14.8178 KOps/s $\color{#d91a1a}-1.88\%$
test_compile_indexing[tensor-pytree-compile] 0.5607ms 0.1101ms 9.0855 KOps/s 9.6647 KOps/s $\textbf{\color{#d91a1a}-5.99\%}$
test_compile_indexing[tensor-pytree-eager] 0.4706ms 68.3683μs 14.6267 KOps/s 14.1395 KOps/s $\color{#35bf28}+3.45\%$
test_compile_indexing[slice-tensordict-compile] 0.5180ms 0.1030ms 9.7113 KOps/s 10.2092 KOps/s $\color{#d91a1a}-4.88\%$
test_compile_indexing[slice-tensordict-eager] 0.1428ms 15.8363μs 63.1460 KOps/s 60.7160 KOps/s $\color{#35bf28}+4.00\%$
test_compile_indexing[slice-tensorclass-compile] 0.3531ms 95.0011μs 10.5262 KOps/s 10.7237 KOps/s $\color{#d91a1a}-1.84\%$
test_compile_indexing[slice-tensorclass-eager] 41.3410μs 15.4608μs 64.6797 KOps/s 63.2509 KOps/s $\color{#35bf28}+2.26\%$
test_compile_indexing[slice-pytree-compile] 0.1396ms 98.6098μs 10.1410 KOps/s 10.5923 KOps/s $\color{#d91a1a}-4.26\%$
test_compile_indexing[slice-pytree-eager] 50.0710μs 15.4365μs 64.7816 KOps/s 64.2667 KOps/s $\color{#35bf28}+0.80\%$
test_compile_indexing[int-tensordict-compile] 0.1401ms 98.6363μs 10.1383 KOps/s 10.0931 KOps/s $\color{#35bf28}+0.45\%$
test_compile_indexing[int-tensordict-eager] 0.5705ms 16.3988μs 60.9800 KOps/s 58.5174 KOps/s $\color{#35bf28}+4.21\%$
test_compile_indexing[int-tensorclass-compile] 0.1389ms 95.3209μs 10.4909 KOps/s 10.5853 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_indexing[int-tensorclass-eager] 51.4400μs 15.2819μs 65.4367 KOps/s 64.5751 KOps/s $\color{#35bf28}+1.33\%$
test_compile_indexing[int-pytree-compile] 0.1479ms 95.0216μs 10.5239 KOps/s 10.6002 KOps/s $\color{#d91a1a}-0.72\%$
test_compile_indexing[int-pytree-eager] 44.8610μs 15.3757μs 65.0378 KOps/s 64.4444 KOps/s $\color{#35bf28}+0.92\%$
test_mod_add[eager] 0.1684ms 35.1677μs 28.4352 KOps/s 26.3182 KOps/s $\textbf{\color{#35bf28}+8.04\%}$
test_mod_add[compile] 0.2132ms 81.6481μs 12.2477 KOps/s 12.8050 KOps/s $\color{#d91a1a}-4.35\%$
test_mod_add[compile-overhead] 0.3185ms 0.1697ms 5.8910 KOps/s 5.7977 KOps/s $\color{#35bf28}+1.61\%$
test_mod_wrap[eager] 0.3205ms 0.2388ms 4.1882 KOps/s 4.0474 KOps/s $\color{#35bf28}+3.48\%$
test_mod_wrap[compile] 0.3483ms 0.2795ms 3.5779 KOps/s 3.4559 KOps/s $\color{#35bf28}+3.53\%$
test_mod_wrap[compile-overhead] 7.1909ms 3.6595ms 273.2603 Ops/s 274.5701 Ops/s $\color{#d91a1a}-0.48\%$
test_mod_wrap_and_backward[eager] 1.6099ms 1.3477ms 742.0096 Ops/s 724.4654 Ops/s $\color{#35bf28}+2.42\%$
test_mod_wrap_and_backward[compile] 1.3452ms 1.2255ms 816.0079 Ops/s 799.9640 Ops/s $\color{#35bf28}+2.01\%$
test_mod_wrap_and_backward[compile-overhead] 1.3437ms 0.9103ms 1.0986 KOps/s 1.0770 KOps/s $\color{#35bf28}+2.00\%$
test_seq_add[eager] 0.2098ms 0.1062ms 9.4150 KOps/s 8.8469 KOps/s $\textbf{\color{#35bf28}+6.42\%}$
test_seq_add[compile] 0.1836ms 85.3461μs 11.7170 KOps/s 11.6339 KOps/s $\color{#35bf28}+0.71\%$
test_seq_add[compile-overhead] 0.2349ms 0.1317ms 7.5933 KOps/s 7.9350 KOps/s $\color{#d91a1a}-4.31\%$
test_seq_wrap[eager] 0.5209ms 0.4112ms 2.4319 KOps/s 2.4195 KOps/s $\color{#35bf28}+0.52\%$
test_seq_wrap[compile] 0.4052ms 0.2928ms 3.4150 KOps/s 3.4236 KOps/s $\color{#d91a1a}-0.25\%$
test_seq_wrap[compile-overhead] 0.3050ms 0.2187ms 4.5721 KOps/s 4.5785 KOps/s $\color{#d91a1a}-0.14\%$
test_func_call_runtime[False-eager] 0.7837ms 0.7138ms 1.4009 KOps/s 1.3911 KOps/s $\color{#35bf28}+0.70\%$
test_func_call_runtime[False-compile] 0.8063ms 0.7179ms 1.3929 KOps/s 1.3970 KOps/s $\color{#d91a1a}-0.29\%$
test_func_call_runtime[False-compile-overhead] 0.4048ms 0.3526ms 2.8359 KOps/s 2.8504 KOps/s $\color{#d91a1a}-0.51\%$
test_func_call_runtime[True-eager] 0.9561ms 0.8805ms 1.1357 KOps/s 1.1346 KOps/s $\color{#35bf28}+0.10\%$
test_func_call_runtime[True-compile] 0.8135ms 0.7367ms 1.3574 KOps/s 1.3538 KOps/s $\color{#35bf28}+0.27\%$
test_func_call_runtime[True-compile-overhead] 0.4468ms 0.3747ms 2.6690 KOps/s 2.6843 KOps/s $\color{#d91a1a}-0.57\%$
test_func_call_cm_runtime[False-eager] 0.7687ms 0.7150ms 1.3985 KOps/s 1.4007 KOps/s $\color{#d91a1a}-0.16\%$
test_func_call_cm_runtime[False-compile] 0.8367ms 0.7216ms 1.3858 KOps/s 1.3925 KOps/s $\color{#d91a1a}-0.48\%$
test_func_call_cm_runtime[False-compile-overhead] 0.3960ms 0.3562ms 2.8072 KOps/s 2.8370 KOps/s $\color{#d91a1a}-1.05\%$
test_func_call_cm_runtime[True-eager] 1.0663ms 0.9767ms 1.0238 KOps/s 1.0076 KOps/s $\color{#35bf28}+1.61\%$
test_func_call_cm_runtime[True-compile] 0.8037ms 0.7644ms 1.3083 KOps/s 1.3083 KOps/s $-0.01\%$
test_func_call_cm_runtime[True-compile-overhead] 0.4470ms 0.3987ms 2.5081 KOps/s 2.4996 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_func_call_cm_runtime[eager] 2.4867ms 2.0437ms 489.3026 Ops/s 482.8714 Ops/s $\color{#35bf28}+1.33\%$
test_vmap_func_call_cm_runtime[compile] 0.8365ms 0.7822ms 1.2784 KOps/s 1.2793 KOps/s $\color{#d91a1a}-0.07\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.4524ms 0.4029ms 2.4818 KOps/s 2.4782 KOps/s $\color{#35bf28}+0.14\%$
test_distributed 3.1187ms 0.2071ms 4.8277 KOps/s 7.8828 KOps/s $\textbf{\color{#d91a1a}-38.76\%}$
test_tdmodule 0.3577ms 18.9214μs 52.8503 KOps/s 51.4240 KOps/s $\color{#35bf28}+2.77\%$
test_tdmodule_dispatch 55.2110μs 32.8150μs 30.4739 KOps/s 28.1731 KOps/s $\textbf{\color{#35bf28}+8.17\%}$
test_tdseq 27.5000μs 18.9159μs 52.8654 KOps/s 47.4550 KOps/s $\textbf{\color{#35bf28}+11.40\%}$
test_tdseq_dispatch 62.8410μs 35.2202μs 28.3928 KOps/s 25.6071 KOps/s $\textbf{\color{#35bf28}+10.88\%}$
test_instantiation_functorch 1.6245ms 1.5125ms 661.1401 Ops/s 669.1552 Ops/s $\color{#d91a1a}-1.20\%$
test_exec_functorch 0.1809ms 0.1391ms 7.1872 KOps/s 7.1912 KOps/s $\color{#d91a1a}-0.05\%$
test_exec_functional_call 0.1734ms 0.1335ms 7.4919 KOps/s 7.6330 KOps/s $\color{#d91a1a}-1.85\%$
test_exec_td_decorator 0.4056ms 0.1815ms 5.5093 KOps/s 5.5332 KOps/s $\color{#d91a1a}-0.43\%$
test_vmap_mlp_speed_decorator[True-True] 0.7601ms 0.6697ms 1.4931 KOps/s 1.4824 KOps/s $\color{#35bf28}+0.72\%$
test_vmap_mlp_speed_decorator[True-False] 0.8448ms 0.6711ms 1.4902 KOps/s 1.4802 KOps/s $\color{#35bf28}+0.67\%$
test_vmap_mlp_speed_decorator[False-True] 0.7030ms 0.5836ms 1.7136 KOps/s 1.7081 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed_decorator[False-False] 0.7018ms 0.5818ms 1.7188 KOps/s 1.7034 KOps/s $\color{#35bf28}+0.90\%$
test_vmap_transformer_speed_decorator[True-True] 19.2270ms 18.8864ms 52.9482 Ops/s 52.6929 Ops/s $\color{#35bf28}+0.48\%$
test_vmap_transformer_speed_decorator[True-False] 19.0696ms 18.9503ms 52.7697 Ops/s 52.5462 Ops/s $\color{#35bf28}+0.43\%$
test_vmap_transformer_speed_decorator[False-True] 18.9485ms 18.8230ms 53.1264 Ops/s 53.0754 Ops/s $\color{#35bf28}+0.10\%$
test_vmap_transformer_speed_decorator[False-False] 18.8633ms 18.7853ms 53.2331 Ops/s 52.9574 Ops/s $\color{#35bf28}+0.52\%$
test_to_module_speed[True] 1.0687ms 0.9605ms 1.0412 KOps/s 1.0509 KOps/s $\color{#d91a1a}-0.93\%$
test_to_module_speed[False] 1.3370ms 0.9553ms 1.0468 KOps/s 1.0759 KOps/s $\color{#d91a1a}-2.70\%$
test_tc_init 64.1310μs 32.3728μs 30.8901 KOps/s 29.0539 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_tc_init_nested 0.1137ms 66.1733μs 15.1118 KOps/s 14.2634 KOps/s $\textbf{\color{#35bf28}+5.95\%}$
test_tc_first_layer_tensor 5.0843μs 0.7028μs 1.4228 MOps/s 1.4667 MOps/s $\color{#d91a1a}-2.99\%$
test_tc_first_layer_nontensor 23.8200μs 2.2436μs 445.7060 KOps/s 454.9922 KOps/s $\color{#d91a1a}-2.04\%$
test_tc_second_layer_tensor 9.5533μs 1.4446μs 692.2125 KOps/s 710.6785 KOps/s $\color{#d91a1a}-2.60\%$
test_tc_second_layer_nontensor 0.1388ms 2.9962μs 333.7574 KOps/s 337.1868 KOps/s $\color{#d91a1a}-1.02\%$
test_unbind 0.2217s 10.1712ms 98.3170 Ops/s 146.2863 Ops/s $\textbf{\color{#d91a1a}-32.79\%}$
test_full_like 9.8135ms 9.1927ms 108.7818 Ops/s 107.6308 Ops/s $\color{#35bf28}+1.07\%$
test_zeros_like 4.8998ms 4.3290ms 231.0023 Ops/s 234.6626 Ops/s $\color{#d91a1a}-1.56\%$
test_ones_like 4.6579ms 4.3359ms 230.6325 Ops/s 230.7104 Ops/s $\color{#d91a1a}-0.03\%$
test_clone 6.7027ms 6.4046ms 156.1377 Ops/s 109.5115 Ops/s $\textbf{\color{#35bf28}+42.58\%}$
test_squeeze 57.8410μs 9.5536μs 104.6730 KOps/s 110.5279 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_unsqueeze 5.0006ms 71.4757μs 13.9908 KOps/s 14.9839 KOps/s $\textbf{\color{#d91a1a}-6.63\%}$
test_split 0.2526ms 0.1566ms 6.3858 KOps/s 6.6421 KOps/s $\color{#d91a1a}-3.86\%$
test_permute 0.2174ms 0.1751ms 5.7120 KOps/s 5.8879 KOps/s $\color{#d91a1a}-2.99\%$
test_stack 51.2982ms 50.7511ms 19.7040 Ops/s 19.5978 Ops/s $\color{#35bf28}+0.54\%$
test_cat 51.0116ms 50.6818ms 19.7309 Ops/s 19.7718 Ops/s $\color{#d91a1a}-0.21\%$

@vmoens
Contributor Author

vmoens commented Jan 8, 2025

The following behavior is deprecated as part of this PR:

from tensordict import TensorDict

td0 = TensorDict(batch_size=[3, 4])
td1 = TensorDict(batch_size=[3, 4, 2])
td0 == td1  # now raises an error: [3, 4] and [3, 4, 2] cannot be broadcast

Previously, __eq__ and other pointwise methods ignored the batch sizes of the TDs. Now the TD shapes are broadcast against each other; in this example no compatible shape exists, so an error is raised.

This change is necessary because the following is the expected behavior:

import torch
from tensordict import TensorDict

td0 = TensorDict(a=torch.randn(3, 4), batch_size=[3, 4])
td1 = TensorDict(a=torch.randn(4), batch_size=[4])
td0 == td1  # works because td1 is broadcast to (3, 4)
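
To make the two cases above concrete, here is a minimal pure-Python sketch of the right-aligned broadcasting rule used by NumPy and PyTorch. It assumes that TensorDict batch sizes are broadcast under that same rule, which is consistent with both examples in this comment but is not taken from the PR's code. The helper name broadcast_batch_sizes is hypothetical and is not part of the tensordict API.

```python
from itertools import zip_longest


def broadcast_batch_sizes(a, b):
    """Return the broadcast of two shapes, or raise ValueError.

    Hypothetical helper for illustration only: it mirrors the standard
    NumPy/PyTorch rule, where shapes are aligned on their trailing
    dimensions and a missing dimension or a dimension of size 1
    stretches to match the other operand.
    """
    result = []
    # Walk both shapes from the last dimension backwards, padding the
    # shorter one with 1s.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} cannot be broadcast")
    return tuple(reversed(result))


# Second example from the comment: [4] is stretched to match [3, 4].
print(broadcast_batch_sizes([3, 4], [4]))

# First example from the comment: the trailing dimensions 4 and 2 clash.
try:
    broadcast_batch_sizes([3, 4], [3, 4, 2])
except ValueError as err:
    print("error:", err)
```

Under this rule the trailing dimensions are compared first, which is why [3, 4] against [3, 4, 2] fails even though the leading dimensions agree.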

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 8, 2025
ghstack-source-id: bbefbb1a2e9841847c618bb9cf49160ff1a5c36a
Pull Request resolved: #1166
vmoens merged commit ea86a4b into gh/vmoens/46/base Jan 8, 2025
50 of 55 checks passed
vmoens deleted the gh/vmoens/46/head branch January 8, 2025 13:54
Labels: BC-breaking, CLA Signed