[Feature] Better dtype coverage #834

Merged
merged 2 commits into from
Jun 25, 2024


@vmoens (Contributor) commented Jun 25, 2024

closes #833

@facebook-github-bot added the CLA Signed label Jun 25, 2024
@vmoens added the enhancement label Jun 25, 2024

github-actions bot commented Jun 25, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: 15. Worsened: 19.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 33.3220μs | 17.6406μs | 56.6876 KOps/s | 60.9173 KOps/s | **-6.94%** |
| test_plain_set_stack_nested | 61.2940μs | 17.6439μs | 56.6769 KOps/s | 60.9882 KOps/s | **-7.07%** |
| test_plain_set_nested_inplace | 57.4980μs | 19.7057μs | 50.7468 KOps/s | 54.0795 KOps/s | **-6.16%** |
| test_plain_set_stack_nested_inplace | 54.9920μs | 19.7685μs | 50.5856 KOps/s | 54.2124 KOps/s | **-6.69%** |
| test_items | 22.8330μs | 2.5405μs | 393.6167 KOps/s | 399.8147 KOps/s | -1.55% |
| test_items_nested | 1.2688ms | 0.2663ms | 3.7552 KOps/s | 3.7904 KOps/s | -0.93% |
| test_items_nested_locked | 0.5090ms | 0.2671ms | 3.7445 KOps/s | 3.7474 KOps/s | -0.08% |
| test_items_nested_leaf | 0.1418ms | 76.8830μs | 13.0068 KOps/s | 13.0226 KOps/s | -0.12% |
| test_items_stack_nested | 0.5029ms | 0.2692ms | 3.7142 KOps/s | 3.7283 KOps/s | -0.38% |
| test_items_stack_nested_leaf | 0.3378ms | 80.7692μs | 12.3810 KOps/s | 13.2810 KOps/s | **-6.78%** |
| test_items_stack_nested_locked | 1.2876ms | 0.2667ms | 3.7496 KOps/s | 3.7524 KOps/s | -0.07% |
| test_keys | 35.8570μs | 3.9396μs | 253.8346 KOps/s | 262.4316 KOps/s | -3.28% |
| test_keys_nested | 0.2227ms | 0.1361ms | 7.3486 KOps/s | 7.2301 KOps/s | +1.64% |
| test_keys_nested_locked | 0.7091ms | 0.1419ms | 7.0463 KOps/s | 6.9539 KOps/s | +1.33% |
| test_keys_nested_leaf | 0.3413ms | 0.1165ms | 8.5861 KOps/s | 8.5687 KOps/s | +0.20% |
| test_keys_stack_nested | 0.2234ms | 0.1372ms | 7.2896 KOps/s | 7.3642 KOps/s | -1.01% |
| test_keys_stack_nested_leaf | 0.2395ms | 0.1166ms | 8.5789 KOps/s | 8.4965 KOps/s | +0.97% |
| test_keys_stack_nested_locked | 0.2529ms | 0.1420ms | 7.0406 KOps/s | 7.1424 KOps/s | -1.43% |
| test_values | 10.7225μs | 1.1710μs | 853.9843 KOps/s | 872.0107 KOps/s | -2.07% |
| test_values_nested | 0.1051ms | 50.3682μs | 19.8538 KOps/s | 19.6896 KOps/s | +0.83% |
| test_values_nested_locked | 0.1158ms | 50.3826μs | 19.8481 KOps/s | 19.6532 KOps/s | +0.99% |
| test_values_nested_leaf | 83.0650μs | 45.3897μs | 22.0314 KOps/s | 21.8534 KOps/s | +0.81% |
| test_values_stack_nested | 0.1201ms | 51.7158μs | 19.3364 KOps/s | 18.9330 KOps/s | +2.13% |
| test_values_stack_nested_leaf | 84.9490μs | 46.0438μs | 21.7184 KOps/s | 21.6871 KOps/s | +0.14% |
| test_values_stack_nested_locked | 0.1064ms | 51.5299μs | 19.4062 KOps/s | 19.0250 KOps/s | +2.00% |
| test_membership | 33.9630μs | 1.3502μs | 740.6242 KOps/s | 730.3180 KOps/s | +1.41% |
| test_membership_nested | 29.4950μs | 3.4416μs | 290.5616 KOps/s | 287.7178 KOps/s | +0.99% |
| test_membership_nested_leaf | 27.1100μs | 3.4653μs | 288.5752 KOps/s | 291.9922 KOps/s | -1.17% |
| test_membership_stacked_nested | 34.4040μs | 3.4014μs | 293.9966 KOps/s | 289.7410 KOps/s | +1.47% |
| test_membership_stacked_nested_leaf | 27.2210μs | 3.4481μs | 290.0159 KOps/s | 249.2783 KOps/s | **+16.34%** |
| test_membership_nested_last | 28.5640μs | 4.1524μs | 240.8225 KOps/s | 233.7240 KOps/s | +3.04% |
| test_membership_nested_leaf_last | 36.2370μs | 4.1696μs | 239.8301 KOps/s | 236.7697 KOps/s | +1.29% |
| test_membership_stacked_nested_last | 36.7290μs | 4.1297μs | 242.1458 KOps/s | 75.9188 KOps/s | **+218.95%** |
| test_membership_stacked_nested_leaf_last | 28.5540μs | 4.1516μs | 240.8715 KOps/s | 75.5629 KOps/s | **+218.77%** |
| test_nested_getleaf | 44.3720μs | 10.4991μs | 95.2467 KOps/s | 95.2474 KOps/s | -0.00% |
| test_nested_get | 58.7480μs | 9.9796μs | 100.2046 KOps/s | 99.8959 KOps/s | +0.31% |
| test_stacked_getleaf | 51.1150μs | 10.5903μs | 94.4263 KOps/s | 94.8685 KOps/s | -0.47% |
| test_stacked_get | 45.3950μs | 9.8193μs | 101.8398 KOps/s | 100.0066 KOps/s | +1.83% |
| test_nested_getitemleaf | 45.0540μs | 11.0718μs | 90.3197 KOps/s | 87.8632 KOps/s | +2.80% |
| test_nested_getitem | 49.7830μs | 10.1566μs | 98.4584 KOps/s | 97.0905 KOps/s | +1.41% |
| test_stacked_getitemleaf | 42.8400μs | 11.0851μs | 90.2110 KOps/s | 90.2928 KOps/s | -0.09% |
| test_stacked_getitem | 30.0160μs | 10.0602μs | 99.4016 KOps/s | 97.3083 KOps/s | +2.15% |
| test_lock_nested | 49.0483ms | 0.3864ms | 2.5882 KOps/s | 2.9567 KOps/s | **-12.46%** |
| test_lock_stack_nested | 0.4900ms | 0.3096ms | 3.2298 KOps/s | 3.3749 KOps/s | -4.30% |
| test_unlock_nested | 0.9955ms | 0.3537ms | 2.8272 KOps/s | 2.9057 KOps/s | -2.70% |
| test_unlock_stack_nested | 0.6848ms | 0.3196ms | 3.1290 KOps/s | 3.2976 KOps/s | **-5.11%** |
| test_flatten_speed | 0.1755ms | 96.1303μs | 10.4025 KOps/s | 10.3257 KOps/s | +0.74% |
| test_unflatten_speed | 0.9068ms | 0.4113ms | 2.4314 KOps/s | 2.4551 KOps/s | -0.97% |
| test_common_ops | 4.2398ms | 0.7244ms | 1.3805 KOps/s | 1.4414 KOps/s | -4.22% |
| test_creation | 15.7290μs | 1.9030μs | 525.4857 KOps/s | 504.2000 KOps/s | +4.22% |
| test_creation_empty | 37.7010μs | 11.1548μs | 89.6473 KOps/s | 108.5557 KOps/s | **-17.42%** |
| test_creation_nested_1 | 52.4380μs | 13.9047μs | 71.9183 KOps/s | 83.1157 KOps/s | **-13.47%** |
| test_creation_nested_2 | 0.1788ms | 17.4905μs | 57.1740 KOps/s | 65.8842 KOps/s | **-13.22%** |
| test_clone | 0.1182ms | 13.4154μs | 74.5411 KOps/s | 73.1633 KOps/s | +1.88% |
| test_getitem[int] | 45.2540μs | 11.1878μs | 89.3827 KOps/s | 86.1007 KOps/s | +3.81% |
| test_getitem[slice_int] | 85.5290μs | 22.4031μs | 44.6367 KOps/s | 43.1471 KOps/s | +3.45% |
| test_getitem[range] | 80.6200μs | 59.2124μs | 16.8884 KOps/s | 16.5402 KOps/s | +2.10% |
| test_getitem[tuple] | 49.2510μs | 18.8450μs | 53.0645 KOps/s | 52.2920 KOps/s | +1.48% |
| test_getitem[list] | 0.1398ms | 39.5393μs | 25.2913 KOps/s | 24.0768 KOps/s | **+5.04%** |
| test_setitem_dim[int] | 61.5140μs | 34.4614μs | 29.0180 KOps/s | 31.5197 KOps/s | **-7.94%** |
| test_setitem_dim[slice_int] | 0.1154ms | 60.9041μs | 16.4192 KOps/s | 16.7584 KOps/s | -2.02% |
| test_setitem_dim[range] | 0.1385ms | 82.4719μs | 12.1253 KOps/s | 12.1462 KOps/s | -0.17% |
| test_setitem_dim[tuple] | 0.1010ms | 50.0891μs | 19.9644 KOps/s | 20.9771 KOps/s | -4.83% |
| test_setitem | 57.7970μs | 20.6999μs | 48.3095 KOps/s | 50.3892 KOps/s | -4.13% |
| test_set | 96.2790μs | 20.3214μs | 49.2092 KOps/s | 46.6368 KOps/s | **+5.52%** |
| test_set_shared | 1.5681ms | 0.1383ms | 7.2298 KOps/s | 7.0194 KOps/s | +3.00% |
| test_update | 85.1490μs | 23.2378μs | 43.0334 KOps/s | 47.4416 KOps/s | **-9.29%** |
| test_update_nested | 76.3220μs | 32.2410μs | 31.0165 KOps/s | 32.8176 KOps/s | **-5.49%** |
| test_update__nested | 83.9570μs | 25.3262μs | 39.4847 KOps/s | 39.5709 KOps/s | -0.22% |
| test_set_nested | 66.0530μs | 22.2481μs | 44.9476 KOps/s | 47.0241 KOps/s | -4.42% |
| test_set_nested_new | 84.3370μs | 26.1634μs | 38.2214 KOps/s | 38.9203 KOps/s | -1.80% |
| test_select | 0.1120ms | 40.8958μs | 24.4524 KOps/s | 24.5734 KOps/s | -0.49% |
| test_select_nested | 0.1246ms | 59.9362μs | 16.6844 KOps/s | 16.4051 KOps/s | +1.70% |
| test_exclude_nested | 0.2380ms | 0.1194ms | 8.3719 KOps/s | 8.1861 KOps/s | +2.27% |
| test_empty[True] | 0.7267ms | 0.3976ms | 2.5152 KOps/s | 2.5755 KOps/s | -2.34% |
| test_empty[False] | 6.3468μs | 1.1661μs | 857.5635 KOps/s | 856.1258 KOps/s | +0.17% |
| test_unbind_speed | 0.4118ms | 0.2546ms | 3.9283 KOps/s | 3.9633 KOps/s | -0.88% |
| test_unbind_speed_stack0 | 0.5002ms | 0.2502ms | 3.9968 KOps/s | 4.1264 KOps/s | -3.14% |
| test_unbind_speed_stack1 | 0.9197ms | 0.6367ms | 1.5706 KOps/s | 1.4340 KOps/s | **+9.53%** |
| test_split | 70.0719ms | 1.5988ms | 625.4621 Ops/s | 583.3214 Ops/s | **+7.22%** |
| test_chunk | 71.5994ms | 1.6002ms | 624.9150 Ops/s | 630.7567 Ops/s | -0.93% |
| test_creation[device0] | 0.2521ms | 85.3757μs | 11.7129 KOps/s | 11.7124 KOps/s | +0.00% |
| test_creation_from_tensor | 3.4505ms | 86.2165μs | 11.5987 KOps/s | 11.9709 KOps/s | -3.11% |
| test_add_one[memmap_tensor0] | 89.4460μs | 5.2436μs | 190.7082 KOps/s | 175.2390 KOps/s | **+8.83%** |
| test_contiguous[memmap_tensor0] | 21.7410μs | 0.6321μs | 1.5820 MOps/s | 1.5813 MOps/s | +0.04% |
| test_stack[memmap_tensor0] | 16.9220μs | 3.5596μs | 280.9334 KOps/s | 265.2076 KOps/s | **+5.93%** |
| test_memmaptd_index | 0.9549ms | 0.2538ms | 3.9407 KOps/s | 3.9607 KOps/s | -0.51% |
| test_memmaptd_index_astensor | 0.7716ms | 0.3266ms | 3.0621 KOps/s | 3.0830 KOps/s | -0.68% |
| test_memmaptd_index_op | 0.8960ms | 0.6149ms | 1.6263 KOps/s | 1.6772 KOps/s | -3.03% |
| test_serialize_model | 0.1750s | 0.1112s | 8.9918 Ops/s | 8.9016 Ops/s | +1.01% |
| test_serialize_model_pickle | 0.4663s | 0.3850s | 2.5972 Ops/s | 2.6222 Ops/s | -0.95% |
| test_serialize_weights | 0.1079s | 0.1018s | 9.8209 Ops/s | 8.7553 Ops/s | **+12.17%** |
| test_serialize_weights_returnearly | 0.1930s | 0.1354s | 7.3837 Ops/s | 7.3855 Ops/s | -0.02% |
| test_serialize_weights_pickle | 0.8475s | 0.4971s | 2.0115 Ops/s | 1.5489 Ops/s | **+29.87%** |
| test_serialize_weights_filesystem | 0.1503s | 97.4455ms | 10.2621 Ops/s | 10.2794 Ops/s | -0.17% |
| test_serialize_model_filesystem | 0.1004s | 94.7147ms | 10.5580 Ops/s | 10.9260 Ops/s | -3.37% |
| test_reshape_pytree | 54.7930μs | 25.3869μs | 39.3904 KOps/s | 38.0322 KOps/s | +3.57% |
| test_reshape_td | 0.1096ms | 34.1862μs | 29.2515 KOps/s | 28.8925 KOps/s | +1.24% |
| test_view_pytree | 55.3230μs | 25.2340μs | 39.6290 KOps/s | 39.2903 KOps/s | +0.86% |
| test_view_td | 0.1017ms | 38.4660μs | 25.9970 KOps/s | 25.0416 KOps/s | +3.81% |
| test_unbind_pytree | 75.5810μs | 29.6453μs | 33.7322 KOps/s | 34.0307 KOps/s | -0.88% |
| test_unbind_td | 0.4276ms | 37.6249μs | 26.5781 KOps/s | 26.5193 KOps/s | +0.22% |
| test_split_pytree | 71.2120μs | 29.0923μs | 34.3733 KOps/s | 34.5544 KOps/s | -0.52% |
| test_split_td | 0.5844ms | 40.3137μs | 24.8055 KOps/s | 24.3541 KOps/s | +1.85% |
| test_add_pytree | 0.1137ms | 34.2533μs | 29.1943 KOps/s | 28.5863 KOps/s | +2.13% |
| test_add_td | 0.1191ms | 53.9339μs | 18.5412 KOps/s | 18.9175 KOps/s | -1.99% |
| test_distributed | 0.2556ms | 0.1025ms | 9.7558 KOps/s | 9.8269 KOps/s | -0.72% |
| test_tdmodule | 86.9020μs | 18.1419μs | 55.1209 KOps/s | 58.6647 KOps/s | **-6.04%** |
| test_tdmodule_dispatch | 71.0320μs | 34.8932μs | 28.6588 KOps/s | 29.5248 KOps/s | -2.93% |
| test_tdseq | 39.3130μs | 20.8895μs | 47.8710 KOps/s | 42.9639 KOps/s | **+11.42%** |
| test_tdseq_dispatch | 79.9090μs | 41.4587μs | 24.1204 KOps/s | 25.5743 KOps/s | **-5.69%** |
| test_instantiation_functorch | 1.5655ms | 1.3058ms | 765.7923 Ops/s | 760.6286 Ops/s | +0.68% |
| test_instantiation_td | 2.0610ms | 1.0476ms | 954.5771 Ops/s | 991.2174 Ops/s | -3.70% |
| test_exec_functorch | 0.3133ms | 0.1588ms | 6.2963 KOps/s | 6.2404 KOps/s | +0.90% |
| test_exec_functional_call | 0.2787ms | 0.1470ms | 6.8045 KOps/s | 6.6428 KOps/s | +2.43% |
| test_exec_td | 0.2963ms | 0.1423ms | 7.0270 KOps/s | 6.8309 KOps/s | +2.87% |
| test_exec_td_decorator | 0.6566ms | 0.2195ms | 4.5556 KOps/s | 4.5460 KOps/s | +0.21% |
| test_vmap_mlp_speed[True-True] | 0.8162ms | 0.5016ms | 1.9938 KOps/s | 2.0817 KOps/s | -4.22% |
| test_vmap_mlp_speed[True-False] | 0.5962ms | 0.4771ms | 2.0960 KOps/s | 2.1133 KOps/s | -0.82% |
| test_vmap_mlp_speed[False-True] | 0.6692ms | 0.3881ms | 2.5765 KOps/s | 2.5759 KOps/s | +0.02% |
| test_vmap_mlp_speed[False-False] | 0.7284ms | 0.3928ms | 2.5459 KOps/s | 2.5677 KOps/s | -0.85% |
| test_vmap_mlp_speed_decorator[True-True] | 1.0313ms | 0.5497ms | 1.8190 KOps/s | 1.8211 KOps/s | -0.11% |
| test_vmap_mlp_speed_decorator[True-False] | 1.0375ms | 0.5532ms | 1.8077 KOps/s | 1.8251 KOps/s | -0.95% |
| test_vmap_mlp_speed_decorator[False-True] | 0.7745ms | 0.4519ms | 2.2127 KOps/s | 2.1619 KOps/s | +2.35% |
| test_vmap_mlp_speed_decorator[False-False] | 0.5844ms | 0.4514ms | 2.2153 KOps/s | 2.2254 KOps/s | -0.45% |
| test_to_module_speed[True] | 2.6682ms | 1.6793ms | 595.4995 Ops/s | 595.1880 Ops/s | +0.05% |
| test_to_module_speed[False] | 1.8689ms | 1.6617ms | 601.7841 Ops/s | 595.1387 Ops/s | +1.12% |
| test_tc_init | 62.5360μs | 30.7900μs | 32.4781 KOps/s | 37.4213 KOps/s | **-13.21%** |
| test_tc_init_nested | 0.1536ms | 63.2081μs | 15.8208 KOps/s | 19.0623 KOps/s | **-17.00%** |
| test_tc_first_layer_tensor | 5.1109μs | 0.6861μs | 1.4575 MOps/s | 1.5354 MOps/s | **-5.08%** |
| test_tc_first_layer_nontensor | 1.9100μs | 0.6722μs | 1.4875 MOps/s | 1.4861 MOps/s | +0.10% |
| test_tc_second_layer_tensor | 14.0560μs | 1.8349μs | 544.9962 KOps/s | 549.7330 KOps/s | -0.86% |
| test_tc_second_layer_nontensor | 11.6557μs | 1.5101μs | 662.2217 KOps/s | 673.8859 KOps/s | -1.73% |
| test_unbind | 74.8064ms | 6.1297ms | 163.1411 Ops/s | 190.4981 Ops/s | **-14.36%** |
| test_full_like | 17.7217ms | 11.1250ms | 89.8879 Ops/s | 91.0798 Ops/s | -1.31% |
| test_zeros_like | 8.5218ms | 5.6388ms | 177.3441 Ops/s | 181.7638 Ops/s | -2.43% |
| test_ones_like | 13.0622ms | 6.2063ms | 161.1257 Ops/s | 164.3370 Ops/s | -1.95% |
| test_clone | 11.6775ms | 7.4011ms | 135.1145 Ops/s | 129.8253 Ops/s | +4.07% |
| test_squeeze | 60.1930μs | 14.1203μs | 70.8202 KOps/s | 72.3273 KOps/s | -2.08% |
| test_unsqueeze | 0.1151ms | 60.8481μs | 16.4344 KOps/s | 16.5537 KOps/s | -0.72% |
| test_split | 0.2167ms | 0.1113ms | 8.9834 KOps/s | 8.4940 KOps/s | **+5.76%** |
| test_permute | 0.1893ms | 0.1258ms | 7.9516 KOps/s | 7.9465 KOps/s | +0.06% |
| test_stack | 27.7363ms | 20.7999ms | 48.0773 Ops/s | 44.7744 Ops/s | **+7.38%** |
| test_cat | 33.9267ms | 21.3976ms | 46.7343 Ops/s | 44.4662 Ops/s | **+5.10%** |
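The Change column appears to be derived from the two throughput columns: Ops is the reciprocal of Mean, and Change is the relative difference between Ops on this branch and Ops on Repo HEAD. The bold entries appear to be exactly those whose change exceeds ±5%, which agrees with the Improved/Worsened totals reported for both the CPU and GPU runs. The Python sketch below reproduces that arithmetic; the helper names and the 5% threshold are assumptions inferred from the table, not taken from the benchmark bot's own code.

```python
# Sketch of the arithmetic behind the benchmark table. The helper names and the
# 5% significance threshold are assumptions inferred from the table itself.
SIGNIFICANCE_THRESHOLD_PCT = 5.0  # assumed: every bold row exceeds +/-5%


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by the mean time per call (Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_on_head: float) -> float:
    """Relative throughput change of this branch versus repo HEAD, in percent."""
    return (ops / ops_on_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = SIGNIFICANCE_THRESHOLD_PCT) -> str:
    """Label a change as improved, worsened, or unchanged (within the threshold)."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # test_plain_set_nested (CPU row): Mean 17.6406 us, Ops 56.6876 vs 60.9173 KOps/s
    print(f"Ops from Mean: {ops_per_second(17.6406e-6) / 1e3:.4f} KOps/s")
    change = percent_change(56.6876, 60.9173)
    print(f"Change: {change:+.2f}% ({classify(change)})")
```

For the first row, test_plain_set_nested, this gives about -6.94%, which the assumed 5% threshold classifies as worsened, consistent with its bold entry.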


github-actions bot commented Jun 25, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: 7. Worsened: 28.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---:|---:|---:|---:|---:|
| test_plain_set_nested | 33.2310μs | 13.5662μs | 73.7125 KOps/s | 83.3950 KOps/s | **-11.61%** |
| test_plain_set_stack_nested | 28.6610μs | 13.6059μs | 73.4976 KOps/s | 81.5875 KOps/s | **-9.92%** |
| test_plain_set_nested_inplace | 37.9410μs | 14.9620μs | 66.8359 KOps/s | 75.0313 KOps/s | **-10.92%** |
| test_plain_set_stack_nested_inplace | 34.5900μs | 14.9730μs | 66.7867 KOps/s | 74.5433 KOps/s | **-10.41%** |
| test_items | 25.8000μs | 4.6680μs | 214.2244 KOps/s | 207.9323 KOps/s | +3.03% |
| test_items_nested | 0.3637ms | 0.3366ms | 2.9713 KOps/s | 2.9901 KOps/s | -0.63% |
| test_items_nested_locked | 0.5182ms | 0.3377ms | 2.9616 KOps/s | 2.9298 KOps/s | +1.08% |
| test_items_nested_leaf | 0.1043ms | 83.2033μs | 12.0187 KOps/s | 12.2562 KOps/s | -1.94% |
| test_items_stack_nested | 0.3909ms | 0.3398ms | 2.9429 KOps/s | 2.9893 KOps/s | -1.55% |
| test_items_stack_nested_leaf | 0.1019ms | 84.2188μs | 11.8738 KOps/s | 12.0136 KOps/s | -1.16% |
| test_items_stack_nested_locked | 0.3906ms | 0.3413ms | 2.9299 KOps/s | 2.9545 KOps/s | -0.83% |
| test_keys | 19.4690μs | 4.3700μs | 228.8351 KOps/s | 229.4254 KOps/s | -0.26% |
| test_keys_nested | 99.2510μs | 66.8735μs | 14.9536 KOps/s | 15.0372 KOps/s | -0.56% |
| test_keys_nested_locked | 2.2080ms | 71.3377μs | 14.0178 KOps/s | 13.9912 KOps/s | +0.19% |
| test_keys_nested_leaf | 79.2300μs | 57.2263μs | 17.4745 KOps/s | 17.4281 KOps/s | +0.27% |
| test_keys_stack_nested | 83.3010μs | 66.8179μs | 14.9660 KOps/s | 14.9979 KOps/s | -0.21% |
| test_keys_stack_nested_leaf | 75.5520μs | 57.6293μs | 17.3523 KOps/s | 17.4967 KOps/s | -0.83% |
| test_keys_stack_nested_locked | 94.1510μs | 72.0801μs | 13.8735 KOps/s | 14.1117 KOps/s | -1.69% |
| test_values | 8.6870μs | 1.8163μs | 550.5657 KOps/s | 554.7816 KOps/s | -0.76% |
| test_values_nested | 58.9700μs | 35.4805μs | 28.1845 KOps/s | 28.5509 KOps/s | -1.28% |
| test_values_nested_locked | 62.9400μs | 37.5731μs | 26.6148 KOps/s | 26.7301 KOps/s | -0.43% |
| test_values_nested_leaf | 50.3900μs | 31.8022μs | 31.4443 KOps/s | 31.7758 KOps/s | -1.04% |
| test_values_stack_nested | 76.1200μs | 36.3199μs | 27.5331 KOps/s | 27.7163 KOps/s | -0.66% |
| test_values_stack_nested_leaf | 60.3410μs | 32.3118μs | 30.9485 KOps/s | 30.9990 KOps/s | -0.16% |
| test_values_stack_nested_locked | 63.4200μs | 38.3462μs | 26.0782 KOps/s | 26.4579 KOps/s | -1.44% |
| test_membership | 3.3700μs | 0.7225μs | 1.3842 MOps/s | 1.4078 MOps/s | -1.68% |
| test_membership_nested | 28.7910μs | 2.6202μs | 381.6528 KOps/s | 387.8971 KOps/s | -1.61% |
| test_membership_nested_leaf | 18.3710μs | 2.6361μs | 379.3537 KOps/s | 385.9314 KOps/s | -1.70% |
| test_membership_stacked_nested | 19.6900μs | 2.6518μs | 377.0984 KOps/s | 386.1520 KOps/s | -2.34% |
| test_membership_stacked_nested_leaf | 37.8400μs | 2.6489μs | 377.5185 KOps/s | 384.6815 KOps/s | -1.86% |
| test_membership_nested_last | 29.5000μs | 3.2012μs | 312.3845 KOps/s | 320.0066 KOps/s | -2.38% |
| test_membership_nested_leaf_last | 31.9800μs | 3.1812μs | 314.3473 KOps/s | 319.1405 KOps/s | -1.50% |
| test_membership_stacked_nested_last | 19.7800μs | 3.1710μs | 315.3555 KOps/s | 234.6146 KOps/s | **+34.41%** |
| test_membership_stacked_nested_leaf_last | 33.8210μs | 3.1682μs | 315.6403 KOps/s | 236.0892 KOps/s | **+33.70%** |
| test_nested_getleaf | 25.0300μs | 8.3633μs | 119.5706 KOps/s | 120.1045 KOps/s | -0.44% |
| test_nested_get | 45.6410μs | 7.8876μs | 126.7810 KOps/s | 126.8527 KOps/s | -0.06% |
| test_stacked_getleaf | 38.0100μs | 8.3946μs | 119.1246 KOps/s | 118.2514 KOps/s | +0.74% |
| test_stacked_get | 40.1410μs | 7.8802μs | 126.9004 KOps/s | 126.0947 KOps/s | +0.64% |
| test_nested_getitemleaf | 35.0000μs | 8.5079μs | 117.5374 KOps/s | 118.2628 KOps/s | -0.61% |
| test_nested_getitem | 31.0310μs | 8.0482μs | 124.2519 KOps/s | 123.6422 KOps/s | +0.49% |
| test_stacked_getitemleaf | 25.9010μs | 8.5920μs | 116.3875 KOps/s | 116.7206 KOps/s | -0.29% |
| test_stacked_getitem | 35.5010μs | 8.0715μs | 123.8930 KOps/s | 124.7772 KOps/s | -0.71% |
| test_lock_nested | 58.8248ms | 0.4062ms | 2.4616 KOps/s | 2.5009 KOps/s | -1.58% |
| test_lock_stack_nested | 0.3481ms | 0.3021ms | 3.3097 KOps/s | 3.3433 KOps/s | -1.00% |
| test_unlock_nested | 61.6587ms | 0.4115ms | 2.4302 KOps/s | 2.4679 KOps/s | -1.53% |
| test_unlock_stack_nested | 0.3529ms | 0.3101ms | 3.2247 KOps/s | 3.2499 KOps/s | -0.78% |
| test_flatten_speed | 0.3021ms | 0.1017ms | 9.8376 KOps/s | 9.9973 KOps/s | -1.60% |
| test_unflatten_speed | 0.3578ms | 0.2903ms | 3.4452 KOps/s | 3.3997 KOps/s | +1.34% |
| test_common_ops | 1.0621ms | 0.6074ms | 1.6464 KOps/s | 1.7973 KOps/s | **-8.40%** |
| test_creation | 33.2300μs | 1.6545μs | 604.4077 KOps/s | 609.4030 KOps/s | -0.82% |
| test_creation_empty | 38.4300μs | 10.1478μs | 98.5435 KOps/s | 140.8598 KOps/s | **-30.04%** |
| test_creation_nested_1 | 34.6300μs | 11.9276μs | 83.8391 KOps/s | 112.2319 KOps/s | **-25.30%** |
| test_creation_nested_2 | 32.6900μs | 14.1901μs | 70.4716 KOps/s | 89.7790 KOps/s | **-21.51%** |
| test_clone | 81.3810μs | 11.6947μs | 85.5088 KOps/s | 84.4036 KOps/s | +1.31% |
| test_getitem[int] | 36.8600μs | 10.9560μs | 91.2745 KOps/s | 92.5361 KOps/s | -1.36% |
| test_getitem[slice_int] | 52.1310μs | 20.6762μs | 48.3649 KOps/s | 48.5250 KOps/s | -0.33% |
| test_getitem[range] | 64.0820μs | 46.7890μs | 21.3725 KOps/s | 20.7663 KOps/s | +2.92% |
| test_getitem[tuple] | 56.4010μs | 18.7530μs | 53.3249 KOps/s | 54.3952 KOps/s | -1.97% |
| test_getitem[list] | 0.1427ms | 33.7724μs | 29.6099 KOps/s | 29.6511 KOps/s | -0.14% |
| test_setitem_dim[int] | 54.2600μs | 32.1359μs | 31.1179 KOps/s | 37.7243 KOps/s | **-17.51%** |
| test_setitem_dim[slice_int] | 76.2310μs | 51.9384μs | 19.2536 KOps/s | 20.9877 KOps/s | **-8.26%** |
| test_setitem_dim[range] | 0.1069ms | 70.9950μs | 14.0855 KOps/s | 14.8046 KOps/s | -4.86% |
| test_setitem_dim[tuple] | 70.8310μs | 46.1334μs | 21.6763 KOps/s | 23.9616 KOps/s | **-9.54%** |
| test_setitem | 62.7400μs | 17.4831μs | 57.1981 KOps/s | 64.2658 KOps/s | **-11.00%** |
| test_set | 43.5810μs | 16.7761μs | 59.6087 KOps/s | 65.9859 KOps/s | **-9.66%** |
| test_set_shared | 1.2277ms | 0.1002ms | 9.9815 KOps/s | 9.4426 KOps/s | **+5.71%** |
| test_update | 64.9000μs | 20.3655μs | 49.1026 KOps/s | 57.6639 KOps/s | **-14.85%** |
| test_update_nested | 59.7620μs | 25.5348μs | 39.1623 KOps/s | 44.1737 KOps/s | **-11.34%** |
| test_update__nested | 48.5100μs | 22.4008μs | 44.6412 KOps/s | 43.6775 KOps/s | +2.21% |
| test_set_nested | 48.5910μs | 17.8434μs | 56.0432 KOps/s | 60.3646 KOps/s | **-7.16%** |
| test_set_nested_new | 43.9910μs | 20.6221μs | 48.4918 KOps/s | 51.5140 KOps/s | **-5.87%** |
| test_select | 63.0320μs | 33.2793μs | 30.0487 KOps/s | 30.7363 KOps/s | -2.24% |
| test_select_nested | 86.7220μs | 55.6506μs | 17.9693 KOps/s | 18.4641 KOps/s | -2.68% |
| test_exclude_nested | 0.1517ms | 0.1101ms | 9.0866 KOps/s | 9.1189 KOps/s | -0.35% |
| test_empty[True] | 0.4127ms | 0.3481ms | 2.8727 KOps/s | 2.9031 KOps/s | -1.05% |
| test_empty[False] | 3.1670μs | 0.9530μs | 1.0493 MOps/s | 1.0692 MOps/s | -1.86% |
| test_to | 0.1010ms | 76.2662μs | 13.1120 KOps/s | 13.1711 KOps/s | -0.45% |
| test_to_nonblocking | 93.3210μs | 61.8057μs | 16.1797 KOps/s | 16.0619 KOps/s | +0.73% |
| test_unbind_speed | 1.4791ms | 0.2687ms | 3.7216 KOps/s | 3.7851 KOps/s | -1.68% |
| test_unbind_speed_stack0 | 0.3159ms | 0.2684ms | 3.7259 KOps/s | 3.8319 KOps/s | -2.76% |
| test_unbind_speed_stack1 | 75.7465ms | 0.8042ms | 1.2435 KOps/s | 1.2626 KOps/s | -1.52% |
| test_split | 76.4065ms | 1.6637ms | 601.0695 Ops/s | 598.0208 Ops/s | +0.51% |
| test_chunk | 76.1278ms | 1.6612ms | 601.9825 Ops/s | 600.0465 Ops/s | +0.32% |
| test_creation[device0] | 0.1279ms | 58.2331μs | 17.1724 KOps/s | 16.8620 KOps/s | +1.84% |
| test_creation_from_tensor | 0.1286ms | 54.7347μs | 18.2700 KOps/s | 18.4779 KOps/s | -1.13% |
| test_add_one[memmap_tensor0] | 65.4120μs | 6.9965μs | 142.9283 KOps/s | 143.8599 KOps/s | -0.65% |
| test_contiguous[memmap_tensor0] | 23.5910μs | 0.7143μs | 1.4000 MOps/s | 1.4758 MOps/s | **-5.13%** |
| test_stack[memmap_tensor0] | 36.6800μs | 5.0449μs | 198.2192 KOps/s | 201.9077 KOps/s | -1.83% |
| test_memmaptd_index | 0.5136ms | 0.2926ms | 3.4172 KOps/s | 3.4302 KOps/s | -0.38% |
| test_memmaptd_index_astensor | 0.6573ms | 0.3653ms | 2.7371 KOps/s | 2.7875 KOps/s | -1.81% |
| test_memmaptd_index_op | 1.2978ms | 0.6930ms | 1.4429 KOps/s | 1.5946 KOps/s | **-9.51%** |
| test_serialize_model | 0.1816s | 0.1102s | 9.0739 Ops/s | 8.4651 Ops/s | **+7.19%** |
| test_serialize_model_pickle | 1.3487s | 1.2360s | 0.8090 Ops/s | 0.8087 Ops/s | +0.04% |
| test_serialize_weights | 0.1803s | 0.1088s | 9.1936 Ops/s | 8.8520 Ops/s | +3.86% |
| test_serialize_weights_returnearly | 0.2729s | 0.1039s | 9.6218 Ops/s | 9.8012 Ops/s | -1.83% |
| test_serialize_weights_pickle | 1.3868s | 1.2525s | 0.7984 Ops/s | 0.7970 Ops/s | +0.18% |
| test_reshape_pytree | 0.2465ms | 26.3664μs | 37.9271 KOps/s | 37.7697 KOps/s | +0.42% |
| test_reshape_td | 52.7900μs | 31.8387μs | 31.4084 KOps/s | 31.4455 KOps/s | -0.12% |
| test_view_pytree | 0.2276ms | 26.0627μs | 38.3691 KOps/s | 37.7970 KOps/s | +1.51% |
| test_view_td | 58.7410μs | 35.9768μs | 27.7957 KOps/s | 27.1878 KOps/s | +2.24% |
| test_unbind_pytree | 57.7100μs | 32.0707μs | 31.1811 KOps/s | 30.7119 KOps/s | +1.53% |
| test_unbind_td | 0.4209ms | 41.0300μs | 24.3724 KOps/s | 24.7660 KOps/s | -1.59% |
| test_split_pytree | 57.4410μs | 35.2803μs | 28.3444 KOps/s | 28.3405 KOps/s | +0.01% |
| test_split_td | 0.1056ms | 40.8907μs | 24.4555 KOps/s | 25.3880 KOps/s | -3.67% |
| test_add_pytree | 61.8410μs | 37.7558μs | 26.4860 KOps/s | 26.5476 KOps/s | -0.23% |
| test_add_td | 0.2676ms | 56.1302μs | 17.8157 KOps/s | 22.1084 KOps/s | **-19.42%** |
| test_distributed | 0.1929ms | 66.5051μs | 15.0364 KOps/s | 14.8831 KOps/s | +1.03% |
| test_tdmodule | 30.3600μs | 15.7616μs | 63.4453 KOps/s | 70.5356 KOps/s | **-10.05%** |
| test_tdmodule_dispatch | 56.1610μs | 30.7455μs | 32.5251 KOps/s | 36.9676 KOps/s | **-12.02%** |
| test_tdseq | 40.2000μs | 17.6321μs | 56.7149 KOps/s | 62.5041 KOps/s | **-9.26%** |
| test_tdseq_dispatch | 0.2340ms | 34.5700μs | 28.9269 KOps/s | 32.4874 KOps/s | **-10.96%** |
| test_instantiation_functorch | 1.7359ms | 1.5458ms | 646.8973 Ops/s | 649.7274 Ops/s | -0.44% |
| test_instantiation_td | 1.5432ms | 1.0417ms | 959.9343 Ops/s | 953.3234 Ops/s | +0.69% |
| test_exec_functorch | 0.3607ms | 0.1517ms | 6.5917 KOps/s | 6.6281 KOps/s | -0.55% |
| test_exec_functional_call | 0.1736ms | 0.1356ms | 7.3730 KOps/s | 7.1842 KOps/s | +2.63% |
| test_exec_td | 0.3331ms | 0.1365ms | 7.3241 KOps/s | 7.3611 KOps/s | -0.50% |
| test_exec_td_decorator | 0.7415ms | 0.2073ms | 4.8249 KOps/s | 4.8525 KOps/s | -0.57% |
| test_vmap_mlp_speed[True-True] | 0.7720ms | 0.5804ms | 1.7230 KOps/s | 1.7550 KOps/s | -1.82% |
| test_vmap_mlp_speed[True-False] | 0.8190ms | 0.5783ms | 1.7292 KOps/s | 1.7650 KOps/s | -2.03% |
| test_vmap_mlp_speed[False-True] | 0.6947ms | 0.4982ms | 2.0071 KOps/s | 1.9639 KOps/s | +2.20% |
| test_vmap_mlp_speed[False-False] | 0.6963ms | 0.5030ms | 1.9879 KOps/s | 1.9255 KOps/s | +3.24% |
| test_vmap_mlp_speed_decorator[True-True] | 1.1316ms | 0.6348ms | 1.5753 KOps/s | 1.5749 KOps/s | +0.03% |
| test_vmap_mlp_speed_decorator[True-False] | 0.8230ms | 0.6348ms | 1.5753 KOps/s | 1.5782 KOps/s | -0.19% |
| test_vmap_mlp_speed_decorator[False-True] | 0.8814ms | 0.5777ms | 1.7309 KOps/s | 1.7289 KOps/s | +0.11% |
| test_vmap_mlp_speed_decorator[False-False] | 0.8107ms | 0.5611ms | 1.7822 KOps/s | 1.6827 KOps/s | **+5.91%** |
| test_vmap_transformer_speed[True-True] | 7.6620ms | 7.4639ms | 133.9790 Ops/s | 126.9154 Ops/s | **+5.57%** |
| test_vmap_transformer_speed[True-False] | 7.6857ms | 7.4443ms | 134.3303 Ops/s | 129.3541 Ops/s | +3.85% |
| test_vmap_transformer_speed[False-True] | 7.5400ms | 7.3692ms | 135.6991 Ops/s | 130.6223 Ops/s | +3.89% |
| test_vmap_transformer_speed[False-False] | 7.5936ms | 7.3734ms | 135.6232 Ops/s | 130.2411 Ops/s | +4.13% |
| test_vmap_transformer_speed_decorator[True-True] | 18.3660ms | 18.0961ms | 55.2605 Ops/s | 53.6315 Ops/s | +3.04% |
| test_vmap_transformer_speed_decorator[True-False] | 18.3861ms | 18.1234ms | 55.1774 Ops/s | 53.6449 Ops/s | +2.86% |
| test_vmap_transformer_speed_decorator[False-True] | 18.1394ms | 18.0227ms | 55.4856 Ops/s | 53.9365 Ops/s | +2.87% |
| test_vmap_transformer_speed_decorator[False-False] | 18.6512ms | 17.9843ms | 55.6040 Ops/s | 53.9914 Ops/s | +2.99% |
| test_to_module_speed[True] | 1.7921ms | 1.5360ms | 651.0611 Ops/s | 635.8809 Ops/s | +2.39% |
| test_to_module_speed[False] | 1.7130ms | 1.5138ms | 660.5840 Ops/s | 646.0900 Ops/s | +2.24% |
| test_tc_init | 48.4000μs | 28.0543μs | 35.6452 KOps/s | 45.1343 KOps/s | **-21.02%** |
| test_tc_init_nested | 0.2803ms | 57.4651μs | 17.4019 KOps/s | 22.4039 KOps/s | **-22.33%** |
| test_tc_first_layer_tensor | 6.1226μs | 0.3565μs | 2.8054 MOps/s | 2.7745 MOps/s | +1.11% |
| test_tc_first_layer_nontensor | 16.2326μs | 0.3849μs | 2.5980 MOps/s | 2.5569 MOps/s | +1.61% |
| test_tc_second_layer_tensor | 41.2628μs | 0.9635μs | 1.0379 MOps/s | 1.0171 MOps/s | +2.05% |
| test_tc_second_layer_nontensor | 11.2466μs | 0.7931μs | 1.2609 MOps/s | 1.2126 MOps/s | +3.98% |
| test_unbind | 0.1061s | 9.1107ms | 109.7612 Ops/s | 121.2908 Ops/s | **-9.51%** |
| test_full_like | 11.5544ms | 11.1641ms | 89.5728 Ops/s | 75.9560 Ops/s | **+17.93%** |
| test_zeros_like | 8.4378ms | 7.9201ms | 126.2617 Ops/s | 127.1458 Ops/s | -0.70% |
| test_ones_like | 8.6782ms | 7.9518ms | 125.7570 Ops/s | 126.8979 Ops/s | -0.90% |
| test_clone | 10.1409ms | 9.3540ms | 106.9064 Ops/s | 106.6972 Ops/s | +0.20% |
| test_squeeze | 71.0400μs | 11.2952μs | 88.5329 KOps/s | 89.5777 KOps/s | -1.17% |
| test_unsqueeze | 0.3198ms | 55.0650μs | 18.1604 KOps/s | 19.1503 KOps/s | **-5.17%** |
| test_split | 0.2027ms | 98.1774μs | 10.1856 KOps/s | 10.1744 KOps/s | +0.11% |
| test_permute | 0.4014ms | 0.1134ms | 8.8173 KOps/s | 9.1131 KOps/s | -3.25% |
| test_stack | 27.3211ms | 26.8643ms | 37.2241 Ops/s | 37.2306 Ops/s | -0.02% |
| test_cat | 27.3210ms | 26.8892ms | 37.1896 Ops/s | 37.1868 Ops/s | +0.01% |

# Conflicts:
#	tensordict/utils.py
@vmoens merged commit 15ff3a4 into main Jun 25, 2024
18 of 20 checks passed
@vmoens deleted the better-dtypes branch June 25, 2024 08:38
Labels: CLA Signed, enhancement
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support torch.uint16 dtype
2 participants