[Performance] Fewer syncs during calls to `to` #819

Open
wants to merge 2 commits into
base: main

@vmoens (Contributor) commented Jun 18, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Jun 18, 2024

github-actions bot commented Jun 18, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}11$. Worsened: $\large\color{#d91a1a}16$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 27.4520μs 16.9719μs 58.9210 KOps/s 60.1777 KOps/s $\color{#d91a1a}-2.09\%$
test_plain_set_stack_nested 47.1480μs 17.9692μs 55.6508 KOps/s 58.2602 KOps/s $\color{#d91a1a}-4.48\%$
test_plain_set_nested_inplace 48.0300μs 20.0397μs 49.9008 KOps/s 52.1245 KOps/s $\color{#d91a1a}-4.27\%$
test_plain_set_stack_nested_inplace 45.1840μs 20.0824μs 49.7948 KOps/s 51.4836 KOps/s $\color{#d91a1a}-3.28\%$
test_items 21.4900μs 2.6558μs 376.5401 KOps/s 370.7209 KOps/s $\color{#35bf28}+1.57\%$
test_items_nested 1.2875ms 0.2697ms 3.7075 KOps/s 3.7934 KOps/s $\color{#d91a1a}-2.26\%$
test_items_nested_locked 0.4232ms 0.2713ms 3.6857 KOps/s 3.7998 KOps/s $\color{#d91a1a}-3.00\%$
test_items_nested_leaf 0.1498ms 77.5556μs 12.8940 KOps/s 13.0905 KOps/s $\color{#d91a1a}-1.50\%$
test_items_stack_nested 0.3307ms 0.2733ms 3.6597 KOps/s 3.7117 KOps/s $\color{#d91a1a}-1.40\%$
test_items_stack_nested_leaf 0.1516ms 79.5025μs 12.5782 KOps/s 12.6924 KOps/s $\color{#d91a1a}-0.90\%$
test_items_stack_nested_locked 0.4243ms 0.2708ms 3.6926 KOps/s 3.7499 KOps/s $\color{#d91a1a}-1.53\%$
test_keys 49.6840μs 3.7922μs 263.6985 KOps/s 257.4315 KOps/s $\color{#35bf28}+2.43\%$
test_keys_nested 0.2848ms 0.1413ms 7.0784 KOps/s 7.2194 KOps/s $\color{#d91a1a}-1.95\%$
test_keys_nested_locked 0.7179ms 0.1459ms 6.8548 KOps/s 7.0620 KOps/s $\color{#d91a1a}-2.93\%$
test_keys_nested_leaf 0.2355ms 0.1205ms 8.2986 KOps/s 8.6306 KOps/s $\color{#d91a1a}-3.85\%$
test_keys_stack_nested 0.2034ms 0.1399ms 7.1471 KOps/s 7.2552 KOps/s $\color{#d91a1a}-1.49\%$
test_keys_stack_nested_leaf 0.2410ms 0.1202ms 8.3214 KOps/s 8.6069 KOps/s $\color{#d91a1a}-3.32\%$
test_keys_stack_nested_locked 0.2553ms 0.1442ms 6.9342 KOps/s 7.0825 KOps/s $\color{#d91a1a}-2.09\%$
test_values 8.7224μs 1.1674μs 856.5931 KOps/s 876.7103 KOps/s $\color{#d91a1a}-2.29\%$
test_values_nested 94.7880μs 51.2814μs 19.5002 KOps/s 19.4146 KOps/s $\color{#35bf28}+0.44\%$
test_values_nested_locked 0.1052ms 51.7041μs 19.3408 KOps/s 19.5007 KOps/s $\color{#d91a1a}-0.82\%$
test_values_nested_leaf 93.4070μs 46.4710μs 21.5188 KOps/s 21.4449 KOps/s $\color{#35bf28}+0.34\%$
test_values_stack_nested 94.8780μs 52.4486μs 19.0663 KOps/s 19.1127 KOps/s $\color{#d91a1a}-0.24\%$
test_values_stack_nested_leaf 98.1440μs 46.6711μs 21.4265 KOps/s 21.2884 KOps/s $\color{#35bf28}+0.65\%$
test_values_stack_nested_locked 0.1074ms 52.1143μs 19.1886 KOps/s 19.2780 KOps/s $\color{#d91a1a}-0.46\%$
test_membership 30.3770μs 1.3611μs 734.6876 KOps/s 754.4887 KOps/s $\color{#d91a1a}-2.62\%$
test_membership_nested 32.5510μs 3.4793μs 287.4120 KOps/s 294.1604 KOps/s $\color{#d91a1a}-2.29\%$
test_membership_nested_leaf 23.0430μs 3.5253μs 283.6643 KOps/s 281.6909 KOps/s $\color{#35bf28}+0.70\%$
test_membership_stacked_nested 31.8200μs 3.4933μs 286.2582 KOps/s 296.6883 KOps/s $\color{#d91a1a}-3.52\%$
test_membership_stacked_nested_leaf 41.0370μs 3.4425μs 290.4903 KOps/s 295.0962 KOps/s $\color{#d91a1a}-1.56\%$
test_membership_nested_last 45.0620μs 4.2040μs 237.8698 KOps/s 241.1007 KOps/s $\color{#d91a1a}-1.34\%$
test_membership_nested_leaf_last 47.5250μs 4.2421μs 235.7331 KOps/s 237.8992 KOps/s $\color{#d91a1a}-0.91\%$
test_membership_stacked_nested_last 40.3450μs 4.8098μs 207.9091 KOps/s 187.4677 KOps/s $\textbf{\color{#35bf28}+10.90\%}$
test_membership_stacked_nested_leaf_last 44.8740μs 4.8588μs 205.8114 KOps/s 189.6819 KOps/s $\textbf{\color{#35bf28}+8.50\%}$
test_nested_getleaf 47.2690μs 11.3476μs 88.1244 KOps/s 95.5364 KOps/s $\textbf{\color{#d91a1a}-7.76\%}$
test_nested_get 39.2930μs 10.7850μs 92.7211 KOps/s 99.5649 KOps/s $\textbf{\color{#d91a1a}-6.87\%}$
test_stacked_getleaf 31.6500μs 11.3625μs 88.0085 KOps/s 97.0107 KOps/s $\textbf{\color{#d91a1a}-9.28\%}$
test_stacked_get 50.3750μs 10.7035μs 93.4277 KOps/s 101.9007 KOps/s $\textbf{\color{#d91a1a}-8.31\%}$
test_nested_getitemleaf 43.0710μs 11.7907μs 84.8124 KOps/s 89.7102 KOps/s $\textbf{\color{#d91a1a}-5.46\%}$
test_nested_getitem 34.9760μs 11.0513μs 90.4868 KOps/s 97.1827 KOps/s $\textbf{\color{#d91a1a}-6.89\%}$
test_stacked_getitemleaf 41.9290μs 11.7690μs 84.9689 KOps/s 90.7619 KOps/s $\textbf{\color{#d91a1a}-6.38\%}$
test_stacked_getitem 36.9290μs 11.0143μs 90.7911 KOps/s 98.1992 KOps/s $\textbf{\color{#d91a1a}-7.54\%}$
test_lock_nested 0.7709ms 0.3437ms 2.9096 KOps/s 2.9384 KOps/s $\color{#d91a1a}-0.98\%$
test_lock_stack_nested 0.4281ms 0.3105ms 3.2207 KOps/s 3.2712 KOps/s $\color{#d91a1a}-1.54\%$
test_unlock_nested 0.7538ms 0.3451ms 2.8974 KOps/s 2.9018 KOps/s $\color{#d91a1a}-0.15\%$
test_unlock_stack_nested 0.3772ms 0.3177ms 3.1479 KOps/s 3.2020 KOps/s $\color{#d91a1a}-1.69\%$
test_flatten_speed 0.5498ms 95.8806μs 10.4296 KOps/s 10.3721 KOps/s $\color{#35bf28}+0.56\%$
test_unflatten_speed 0.6185ms 0.4246ms 2.3552 KOps/s 2.4576 KOps/s $\color{#d91a1a}-4.17\%$
test_common_ops 3.3784ms 0.7144ms 1.3999 KOps/s 1.4228 KOps/s $\color{#d91a1a}-1.61\%$
test_creation 15.5390μs 1.8912μs 528.7603 KOps/s 525.6850 KOps/s $\color{#35bf28}+0.58\%$
test_creation_empty 34.7660μs 10.5216μs 95.0427 KOps/s 96.8261 KOps/s $\color{#d91a1a}-1.84\%$
test_creation_nested_1 57.8490μs 13.3237μs 75.0544 KOps/s 76.5377 KOps/s $\color{#d91a1a}-1.94\%$
test_creation_nested_2 37.5710μs 16.5537μs 60.4093 KOps/s 60.7456 KOps/s $\color{#d91a1a}-0.55\%$
test_clone 97.1220μs 13.5201μs 73.9637 KOps/s 74.5563 KOps/s $\color{#d91a1a}-0.79\%$
test_getitem[int] 35.1160μs 11.4210μs 87.5582 KOps/s 85.2826 KOps/s $\color{#35bf28}+2.67\%$
test_getitem[slice_int] 64.0800μs 21.9760μs 45.5042 KOps/s 42.4712 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_getitem[range] 78.8080μs 58.8326μs 16.9974 KOps/s 16.7296 KOps/s $\color{#35bf28}+1.60\%$
test_getitem[tuple] 48.0610μs 18.5085μs 54.0292 KOps/s 51.8040 KOps/s $\color{#35bf28}+4.30\%$
test_getitem[list] 0.1202ms 41.0517μs 24.3595 KOps/s 23.8675 KOps/s $\color{#35bf28}+2.06\%$
test_setitem_dim[int] 64.5610μs 34.4546μs 29.0237 KOps/s 27.0361 KOps/s $\textbf{\color{#35bf28}+7.35\%}$
test_setitem_dim[slice_int] 99.0560μs 59.8697μs 16.7029 KOps/s 15.3170 KOps/s $\textbf{\color{#35bf28}+9.05\%}$
test_setitem_dim[range] 0.1288ms 83.0458μs 12.0415 KOps/s 11.5278 KOps/s $\color{#35bf28}+4.46\%$
test_setitem_dim[tuple] 91.8320μs 49.4098μs 20.2389 KOps/s 19.1029 KOps/s $\textbf{\color{#35bf28}+5.95\%}$
test_setitem 58.9410μs 20.0971μs 49.7584 KOps/s 49.6156 KOps/s $\color{#35bf28}+0.29\%$
test_set 62.3170μs 19.7327μs 50.6774 KOps/s 50.9618 KOps/s $\color{#d91a1a}-0.56\%$
test_set_shared 1.1408ms 0.1413ms 7.0750 KOps/s 6.9247 KOps/s $\color{#35bf28}+2.17\%$
test_update 0.1057ms 21.6336μs 46.2244 KOps/s 47.2356 KOps/s $\color{#d91a1a}-2.14\%$
test_update_nested 67.6770μs 30.3228μs 32.9785 KOps/s 33.2471 KOps/s $\color{#d91a1a}-0.81\%$
test_update__nested 55.8050μs 26.1034μs 38.3091 KOps/s 39.7347 KOps/s $\color{#d91a1a}-3.59\%$
test_set_nested 54.1620μs 22.2873μs 44.8685 KOps/s 46.3313 KOps/s $\color{#d91a1a}-3.16\%$
test_set_nested_new 61.1050μs 26.3505μs 37.9500 KOps/s 38.7554 KOps/s $\color{#d91a1a}-2.08\%$
test_select 0.9548ms 41.9560μs 23.8345 KOps/s 23.4570 KOps/s $\color{#35bf28}+1.61\%$
test_select_nested 0.1221ms 60.2343μs 16.6018 KOps/s 16.3883 KOps/s $\color{#35bf28}+1.30\%$
test_exclude_nested 0.1954ms 0.1202ms 8.3216 KOps/s 8.1897 KOps/s $\color{#35bf28}+1.61\%$
test_empty[True] 0.4738ms 0.3966ms 2.5212 KOps/s 2.5124 KOps/s $\color{#35bf28}+0.35\%$
test_empty[False] 7.0732μs 1.1717μs 853.4628 KOps/s 871.1122 KOps/s $\color{#d91a1a}-2.03\%$
test_unbind_speed 0.4084ms 0.2539ms 3.9392 KOps/s 3.9213 KOps/s $\color{#35bf28}+0.46\%$
test_unbind_speed_stack0 0.3485ms 0.2527ms 3.9567 KOps/s 4.0123 KOps/s $\color{#d91a1a}-1.39\%$
test_unbind_speed_stack1 68.6579ms 0.7236ms 1.3819 KOps/s 1.3934 KOps/s $\color{#d91a1a}-0.82\%$
test_split 63.6140ms 1.5800ms 632.9043 Ops/s 619.8271 Ops/s $\color{#35bf28}+2.11\%$
test_chunk 63.9552ms 1.5890ms 629.3255 Ops/s 620.1725 Ops/s $\color{#35bf28}+1.48\%$
test_creation[device0] 0.1530ms 83.5224μs 11.9728 KOps/s 11.7932 KOps/s $\color{#35bf28}+1.52\%$
test_creation_from_tensor 0.2163ms 84.0522μs 11.8974 KOps/s 11.6311 KOps/s $\color{#35bf28}+2.29\%$
test_add_one[memmap_tensor0] 65.8630μs 5.3638μs 186.4352 KOps/s 187.9410 KOps/s $\color{#d91a1a}-0.80\%$
test_contiguous[memmap_tensor0] 6.7820μs 0.6369μs 1.5701 MOps/s 1.6052 MOps/s $\color{#d91a1a}-2.19\%$
test_stack[memmap_tensor0] 22.8120μs 3.6477μs 274.1436 KOps/s 281.5431 KOps/s $\color{#d91a1a}-2.63\%$
test_memmaptd_index 0.9910ms 0.2554ms 3.9160 KOps/s 3.9734 KOps/s $\color{#d91a1a}-1.44\%$
test_memmaptd_index_astensor 0.7441ms 0.3288ms 3.0415 KOps/s 3.0633 KOps/s $\color{#d91a1a}-0.71\%$
test_memmaptd_index_op 1.1634ms 0.6124ms 1.6330 KOps/s 1.6728 KOps/s $\color{#d91a1a}-2.38\%$
test_serialize_model 0.1723s 0.1135s 8.8127 Ops/s 8.6590 Ops/s $\color{#35bf28}+1.77\%$
test_serialize_model_pickle 0.4471s 0.3796s 2.6344 Ops/s 2.6218 Ops/s $\color{#35bf28}+0.48\%$
test_serialize_weights 0.1722s 0.1115s 8.9706 Ops/s 9.5646 Ops/s $\textbf{\color{#d91a1a}-6.21\%}$
test_serialize_weights_returnearly 0.1833s 0.1301s 7.6849 Ops/s 7.3784 Ops/s $\color{#35bf28}+4.15\%$
test_serialize_weights_pickle 0.6937s 0.4857s 2.0587 Ops/s 2.3532 Ops/s $\textbf{\color{#d91a1a}-12.51\%}$
test_serialize_weights_filesystem 0.1598s 0.1011s 9.8893 Ops/s 9.9802 Ops/s $\color{#d91a1a}-0.91\%$
test_serialize_model_filesystem 94.3306ms 92.0260ms 10.8665 Ops/s 10.7636 Ops/s $\color{#35bf28}+0.96\%$
test_reshape_pytree 62.9380μs 25.2270μs 39.6401 KOps/s 39.7371 KOps/s $\color{#d91a1a}-0.24\%$
test_reshape_td 95.4290μs 34.2551μs 29.1927 KOps/s 28.9931 KOps/s $\color{#35bf28}+0.69\%$
test_view_pytree 63.5390μs 25.4574μs 39.2813 KOps/s 40.0870 KOps/s $\color{#d91a1a}-2.01\%$
test_view_td 82.5960μs 38.1396μs 26.2195 KOps/s 25.7785 KOps/s $\color{#35bf28}+1.71\%$
test_unbind_pytree 66.2050μs 28.9829μs 34.5031 KOps/s 34.4760 KOps/s $\color{#35bf28}+0.08\%$
test_unbind_td 0.3982ms 37.9507μs 26.3500 KOps/s 26.3833 KOps/s $\color{#d91a1a}-0.13\%$
test_split_pytree 66.5150μs 29.5963μs 33.7880 KOps/s 34.8065 KOps/s $\color{#d91a1a}-2.93\%$
test_split_td 0.1227ms 40.7699μs 24.5279 KOps/s 24.2802 KOps/s $\color{#35bf28}+1.02\%$
test_add_pytree 0.1099ms 37.1185μs 26.9407 KOps/s 28.8627 KOps/s $\textbf{\color{#d91a1a}-6.66\%}$
test_add_td 0.1022ms 56.6162μs 17.6628 KOps/s 17.7298 KOps/s $\color{#d91a1a}-0.38\%$
test_distributed 0.2282ms 0.1002ms 9.9763 KOps/s 9.8006 KOps/s $\color{#35bf28}+1.79\%$
test_tdmodule 29.8860μs 18.0783μs 55.3149 KOps/s 50.2899 KOps/s $\textbf{\color{#35bf28}+9.99\%}$
test_tdmodule_dispatch 65.8330μs 35.9529μs 27.8142 KOps/s 28.6405 KOps/s $\color{#d91a1a}-2.89\%$
test_tdseq 44.3530μs 21.4608μs 46.5965 KOps/s 47.7909 KOps/s $\color{#d91a1a}-2.50\%$
test_tdseq_dispatch 64.1310μs 41.3412μs 24.1890 KOps/s 21.2054 KOps/s $\textbf{\color{#35bf28}+14.07\%}$
test_instantiation_functorch 2.7877ms 1.3253ms 754.5553 Ops/s 770.1453 Ops/s $\color{#d91a1a}-2.02\%$
test_instantiation_td 70.0751ms 1.1044ms 905.4358 Ops/s 989.6496 Ops/s $\textbf{\color{#d91a1a}-8.51\%}$
test_exec_functorch 0.2249ms 0.1634ms 6.1188 KOps/s 6.1792 KOps/s $\color{#d91a1a}-0.98\%$
test_exec_functional_call 0.3177ms 0.1503ms 6.6524 KOps/s 6.6214 KOps/s $\color{#35bf28}+0.47\%$
test_exec_td 0.2556ms 0.1466ms 6.8201 KOps/s 6.8317 KOps/s $\color{#d91a1a}-0.17\%$
test_exec_td_decorator 0.3500ms 0.2243ms 4.4581 KOps/s 4.5178 KOps/s $\color{#d91a1a}-1.32\%$
test_vmap_mlp_speed[True-True] 0.9067ms 0.4888ms 2.0460 KOps/s 2.0489 KOps/s $\color{#d91a1a}-0.14\%$
test_vmap_mlp_speed[True-False] 0.6940ms 0.4849ms 2.0624 KOps/s 2.0542 KOps/s $\color{#35bf28}+0.40\%$
test_vmap_mlp_speed[False-True] 0.5284ms 0.3922ms 2.5498 KOps/s 2.5195 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_mlp_speed[False-False] 0.6521ms 0.3938ms 2.5392 KOps/s 2.5306 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_mlp_speed_decorator[True-True] 0.9865ms 0.5539ms 1.8053 KOps/s 1.7778 KOps/s $\color{#35bf28}+1.55\%$
test_vmap_mlp_speed_decorator[True-False] 0.8642ms 0.5519ms 1.8120 KOps/s 1.6283 KOps/s $\textbf{\color{#35bf28}+11.28\%}$
test_vmap_mlp_speed_decorator[False-True] 0.6293ms 0.4519ms 2.2127 KOps/s 2.1309 KOps/s $\color{#35bf28}+3.83\%$
test_vmap_mlp_speed_decorator[False-False] 0.7777ms 0.4546ms 2.1999 KOps/s 2.0591 KOps/s $\textbf{\color{#35bf28}+6.84\%}$
test_to_module_speed[True] 2.3973ms 1.7173ms 582.3160 Ops/s 596.3287 Ops/s $\color{#d91a1a}-2.35\%$
test_to_module_speed[False] 2.3065ms 1.6822ms 594.4620 Ops/s 602.0977 Ops/s $\color{#d91a1a}-1.27\%$
test_tc_init 60.4840μs 29.6378μs 33.7407 KOps/s 35.6483 KOps/s $\textbf{\color{#d91a1a}-5.35\%}$
test_tc_init_nested 0.1459ms 61.4368μs 16.2769 KOps/s 18.2784 KOps/s $\textbf{\color{#d91a1a}-10.95\%}$
test_tc_first_layer_tensor 4.7289μs 0.6984μs 1.4318 MOps/s 1.4685 MOps/s $\color{#d91a1a}-2.50\%$
test_tc_first_layer_nontensor 1.8289μs 0.6743μs 1.4831 MOps/s 1.5083 MOps/s $\color{#d91a1a}-1.67\%$
test_tc_second_layer_tensor 21.7410μs 1.8586μs 538.0396 KOps/s 541.7695 KOps/s $\color{#d91a1a}-0.69\%$
test_tc_second_layer_nontensor 8.9970μs 1.5321μs 652.7104 KOps/s 657.7871 KOps/s $\color{#d91a1a}-0.77\%$
test_unbind 80.5611ms 6.3890ms 156.5196 Ops/s 140.5268 Ops/s $\textbf{\color{#35bf28}+11.38\%}$
test_full_like 16.4915ms 10.8209ms 92.4136 Ops/s 97.7183 Ops/s $\textbf{\color{#d91a1a}-5.43\%}$
test_zeros_like 11.8999ms 5.5794ms 179.2316 Ops/s 171.0368 Ops/s $\color{#35bf28}+4.79\%$
test_ones_like 14.0965ms 5.9528ms 167.9894 Ops/s 164.8356 Ops/s $\color{#35bf28}+1.91\%$
test_clone 12.2555ms 7.4496ms 134.2349 Ops/s 129.5448 Ops/s $\color{#35bf28}+3.62\%$
test_squeeze 68.2780μs 14.9362μs 66.9514 KOps/s 70.9312 KOps/s $\textbf{\color{#d91a1a}-5.61\%}$
test_unsqueeze 0.1110ms 60.5221μs 16.5229 KOps/s 16.2168 KOps/s $\color{#35bf28}+1.89\%$
test_split 0.1696ms 0.1120ms 8.9263 KOps/s 8.9043 KOps/s $\color{#35bf28}+0.25\%$
test_permute 0.2093ms 0.1260ms 7.9338 KOps/s 7.8436 KOps/s $\color{#35bf28}+1.15\%$
test_stack 24.7824ms 20.8681ms 47.9199 Ops/s 46.7340 Ops/s $\color{#35bf28}+2.54\%$
test_cat 24.4549ms 20.8878ms 47.8748 Ops/s 46.3966 Ops/s $\color{#35bf28}+3.19\%$
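
The Change column is consistent with the relative difference between the Ops and Ops on Repo HEAD columns, and the Improved and Worsened totals match the number of bold rows, all of which lie beyond roughly ±5%. Below is a minimal sketch of that arithmetic. It is not the benchmark workflow's actual code: the function names and the 5% threshold are assumptions inferred from the rows above.

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a benchmark; the 5% threshold is an inferred assumption."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
# Units cancel out, so KOps/s and Ops/s rows can be treated alike.
rows = {
    "test_plain_set_nested": (58.9210, 60.1777),
    "test_membership_stacked_nested_last": (207.9091, 187.4677),
    "test_serialize_weights_pickle": (2.0587, 2.3532),
}

for name, (ops, ops_head) in rows.items():
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this assumed threshold, the bold rows of the CPU table (11 green, 16 red) line up with the reported 11 improved and 16 worsened benchmarks; the remaining rows fall inside the band and are not counted.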


github-actions bot commented Jun 18, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 152. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}25$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.1141ms 13.6671μs 73.1682 KOps/s 78.0721 KOps/s $\textbf{\color{#d91a1a}-6.28\%}$
test_plain_set_stack_nested 28.9510μs 13.9635μs 71.6152 KOps/s 76.6288 KOps/s $\textbf{\color{#d91a1a}-6.54\%}$
test_plain_set_nested_inplace 39.1300μs 15.0363μs 66.5056 KOps/s 70.9863 KOps/s $\textbf{\color{#d91a1a}-6.31\%}$
test_plain_set_stack_nested_inplace 39.7810μs 15.0733μs 66.3425 KOps/s 70.2583 KOps/s $\textbf{\color{#d91a1a}-5.57\%}$
test_items 23.0100μs 4.6166μs 216.6110 KOps/s 217.5550 KOps/s $\color{#d91a1a}-0.43\%$
test_items_nested 0.3831ms 0.3360ms 2.9759 KOps/s 2.9795 KOps/s $\color{#d91a1a}-0.12\%$
test_items_nested_locked 0.3931ms 0.3386ms 2.9534 KOps/s 2.9827 KOps/s $\color{#d91a1a}-0.98\%$
test_items_nested_leaf 0.1028ms 82.4014μs 12.1357 KOps/s 12.1664 KOps/s $\color{#d91a1a}-0.25\%$
test_items_stack_nested 0.4040ms 0.3428ms 2.9170 KOps/s 2.9640 KOps/s $\color{#d91a1a}-1.59\%$
test_items_stack_nested_leaf 0.1104ms 84.0762μs 11.8940 KOps/s 12.1617 KOps/s $\color{#d91a1a}-2.20\%$
test_items_stack_nested_locked 0.4155ms 0.3434ms 2.9124 KOps/s 2.9696 KOps/s $\color{#d91a1a}-1.92\%$
test_keys 31.5310μs 4.2978μs 232.6765 KOps/s 230.2486 KOps/s $\color{#35bf28}+1.05\%$
test_keys_nested 91.2810μs 67.7513μs 14.7599 KOps/s 14.9879 KOps/s $\color{#d91a1a}-1.52\%$
test_keys_nested_locked 2.0737ms 72.9738μs 13.7036 KOps/s 13.9437 KOps/s $\color{#d91a1a}-1.72\%$
test_keys_nested_leaf 86.9510μs 57.7867μs 17.3050 KOps/s 17.5431 KOps/s $\color{#d91a1a}-1.36\%$
test_keys_stack_nested 96.6420μs 67.8984μs 14.7279 KOps/s 14.9547 KOps/s $\color{#d91a1a}-1.52\%$
test_keys_stack_nested_leaf 95.0210μs 58.4006μs 17.1231 KOps/s 17.4023 KOps/s $\color{#d91a1a}-1.60\%$
test_keys_stack_nested_locked 0.1030ms 72.3150μs 13.8284 KOps/s 13.9753 KOps/s $\color{#d91a1a}-1.05\%$
test_values 12.3037μs 1.8059μs 553.7391 KOps/s 549.4052 KOps/s $\color{#35bf28}+0.79\%$
test_values_nested 65.2100μs 35.1922μs 28.4153 KOps/s 28.2309 KOps/s $\color{#35bf28}+0.65\%$
test_values_nested_locked 61.5910μs 36.8270μs 27.1540 KOps/s 27.1737 KOps/s $\color{#d91a1a}-0.07\%$
test_values_nested_leaf 48.3710μs 31.0585μs 32.1973 KOps/s 31.9263 KOps/s $\color{#35bf28}+0.85\%$
test_values_stack_nested 62.2110μs 35.8252μs 27.9133 KOps/s 27.8208 KOps/s $\color{#35bf28}+0.33\%$
test_values_stack_nested_leaf 71.9810μs 31.7500μs 31.4961 KOps/s 31.0021 KOps/s $\color{#35bf28}+1.59\%$
test_values_stack_nested_locked 65.1310μs 37.7592μs 26.4836 KOps/s 26.5730 KOps/s $\color{#d91a1a}-0.34\%$
test_membership 1.6405μs 0.7002μs 1.4281 MOps/s 1.4289 MOps/s $\color{#d91a1a}-0.05\%$
test_membership_nested 13.9910μs 2.4741μs 404.1912 KOps/s 403.0279 KOps/s $\color{#35bf28}+0.29\%$
test_membership_nested_leaf 15.8400μs 2.4679μs 405.2100 KOps/s 401.8616 KOps/s $\color{#35bf28}+0.83\%$
test_membership_stacked_nested 26.8310μs 2.5236μs 396.2577 KOps/s 397.7349 KOps/s $\color{#d91a1a}-0.37\%$
test_membership_stacked_nested_leaf 32.8710μs 2.4729μs 404.3821 KOps/s 400.9003 KOps/s $\color{#35bf28}+0.87\%$
test_membership_nested_last 16.3190μs 2.9954μs 333.8439 KOps/s 333.1456 KOps/s $\color{#35bf28}+0.21\%$
test_membership_nested_leaf_last 43.1010μs 3.0109μs 332.1263 KOps/s 334.2476 KOps/s $\color{#d91a1a}-0.63\%$
test_membership_stacked_nested_last 15.9800μs 3.0028μs 333.0225 KOps/s 334.4661 KOps/s $\color{#d91a1a}-0.43\%$
test_membership_stacked_nested_leaf_last 26.5500μs 3.0054μs 332.7296 KOps/s 334.7012 KOps/s $\color{#d91a1a}-0.59\%$
test_nested_getleaf 38.9300μs 8.3285μs 120.0695 KOps/s 120.3894 KOps/s $\color{#d91a1a}-0.27\%$
test_nested_get 31.5500μs 7.7818μs 128.5045 KOps/s 127.2977 KOps/s $\color{#35bf28}+0.95\%$
test_stacked_getleaf 31.4000μs 8.3107μs 120.3263 KOps/s 119.6821 KOps/s $\color{#35bf28}+0.54\%$
test_stacked_get 36.0310μs 7.8422μs 127.5148 KOps/s 126.8039 KOps/s $\color{#35bf28}+0.56\%$
test_nested_getitemleaf 25.0410μs 8.4593μs 118.2132 KOps/s 116.8286 KOps/s $\color{#35bf28}+1.19\%$
test_nested_getitem 31.7800μs 8.0956μs 123.5244 KOps/s 124.2038 KOps/s $\color{#d91a1a}-0.55\%$
test_stacked_getitemleaf 34.8710μs 8.7148μs 114.7473 KOps/s 117.2304 KOps/s $\color{#d91a1a}-2.12\%$
test_stacked_getitem 23.9600μs 8.0673μs 123.9571 KOps/s 124.8569 KOps/s $\color{#d91a1a}-0.72\%$
test_lock_nested 58.8258ms 0.3960ms 2.5254 KOps/s 2.4835 KOps/s $\color{#35bf28}+1.69\%$
test_lock_stack_nested 0.3650ms 0.2976ms 3.3599 KOps/s 3.3498 KOps/s $\color{#35bf28}+0.30\%$
test_unlock_nested 60.9245ms 0.4036ms 2.4778 KOps/s 2.4605 KOps/s $\color{#35bf28}+0.70\%$
test_unlock_stack_nested 0.3616ms 0.3087ms 3.2389 KOps/s 3.2517 KOps/s $\color{#d91a1a}-0.39\%$
test_flatten_speed 0.3420ms 0.1008ms 9.9226 KOps/s 9.9772 KOps/s $\color{#d91a1a}-0.55\%$
test_unflatten_speed 0.3455ms 0.2915ms 3.4300 KOps/s 3.4453 KOps/s $\color{#d91a1a}-0.44\%$
test_common_ops 1.0860ms 0.6060ms 1.6501 KOps/s 1.7184 KOps/s $\color{#d91a1a}-3.97\%$
test_creation 36.5210μs 1.6044μs 623.2779 KOps/s 625.1654 KOps/s $\color{#d91a1a}-0.30\%$
test_creation_empty 24.6610μs 10.3911μs 96.2358 KOps/s 113.1088 KOps/s $\textbf{\color{#d91a1a}-14.92\%}$
test_creation_nested_1 30.5510μs 12.0518μs 82.9755 KOps/s 93.4381 KOps/s $\textbf{\color{#d91a1a}-11.20\%}$
test_creation_nested_2 41.8610μs 14.3056μs 69.9029 KOps/s 78.2383 KOps/s $\textbf{\color{#d91a1a}-10.65\%}$
test_clone 71.6810μs 11.3747μs 87.9146 KOps/s 88.3223 KOps/s $\color{#d91a1a}-0.46\%$
test_getitem[int] 35.8600μs 10.5382μs 94.8926 KOps/s 91.9311 KOps/s $\color{#35bf28}+3.22\%$
test_getitem[slice_int] 36.6000μs 20.0597μs 49.8513 KOps/s 48.7748 KOps/s $\color{#35bf28}+2.21\%$
test_getitem[range] 64.7200μs 45.7488μs 21.8585 KOps/s 21.7729 KOps/s $\color{#35bf28}+0.39\%$
test_getitem[tuple] 42.1000μs 18.2530μs 54.7856 KOps/s 54.9595 KOps/s $\color{#d91a1a}-0.32\%$
test_getitem[list] 0.1592ms 33.8962μs 29.5019 KOps/s 30.6659 KOps/s $\color{#d91a1a}-3.80\%$
test_setitem_dim[int] 49.5610μs 32.4252μs 30.8402 KOps/s 35.7706 KOps/s $\textbf{\color{#d91a1a}-13.78\%}$
test_setitem_dim[slice_int] 73.9610μs 52.9934μs 18.8703 KOps/s 20.4952 KOps/s $\textbf{\color{#d91a1a}-7.93\%}$
test_setitem_dim[range] 96.7500μs 71.6504μs 13.9566 KOps/s 15.4394 KOps/s $\textbf{\color{#d91a1a}-9.60\%}$
test_setitem_dim[tuple] 67.0110μs 45.6037μs 21.9280 KOps/s 24.3818 KOps/s $\textbf{\color{#d91a1a}-10.06\%}$
test_setitem 57.4510μs 16.8511μs 59.3432 KOps/s 61.3303 KOps/s $\color{#d91a1a}-3.24\%$
test_set 47.7010μs 16.3718μs 61.0808 KOps/s 64.7284 KOps/s $\textbf{\color{#d91a1a}-5.64\%}$
test_set_shared 1.3730ms 99.0246μs 10.0985 KOps/s 10.2984 KOps/s $\color{#d91a1a}-1.94\%$
test_update 95.5210μs 19.9502μs 50.1247 KOps/s 56.3009 KOps/s $\textbf{\color{#d91a1a}-10.97\%}$
test_update_nested 87.0720μs 24.5887μs 40.6690 KOps/s 43.7156 KOps/s $\textbf{\color{#d91a1a}-6.97\%}$
test_update__nested 52.3510μs 21.5634μs 46.3748 KOps/s 45.9103 KOps/s $\color{#35bf28}+1.01\%$
test_set_nested 61.5710μs 17.1380μs 58.3499 KOps/s 59.6343 KOps/s $\color{#d91a1a}-2.15\%$
test_set_nested_new 95.5720μs 20.0822μs 49.7952 KOps/s 52.0651 KOps/s $\color{#d91a1a}-4.36\%$
test_select 67.1610μs 32.4807μs 30.7875 KOps/s 31.1908 KOps/s $\color{#d91a1a}-1.29\%$
test_select_nested 89.3320μs 54.9990μs 18.1822 KOps/s 18.2351 KOps/s $\color{#d91a1a}-0.29\%$
test_exclude_nested 0.1563ms 0.1104ms 9.0576 KOps/s 9.0741 KOps/s $\color{#d91a1a}-0.18\%$
test_empty[True] 0.4027ms 0.3397ms 2.9436 KOps/s 2.9382 KOps/s $\color{#35bf28}+0.18\%$
test_empty[False] 2.8911μs 0.9237μs 1.0826 MOps/s 1.0901 MOps/s $\color{#d91a1a}-0.69\%$
test_to 97.2020μs 71.1886μs 14.0472 KOps/s 12.9805 KOps/s $\textbf{\color{#35bf28}+8.22\%}$
test_to_nonblocking 0.1143ms 64.4849μs 15.5075 KOps/s 16.5345 KOps/s $\textbf{\color{#d91a1a}-6.21\%}$
test_unbind_speed 0.2930ms 0.2598ms 3.8491 KOps/s 3.8159 KOps/s $\color{#35bf28}+0.87\%$
test_unbind_speed_stack0 0.3705ms 0.2617ms 3.8210 KOps/s 3.8346 KOps/s $\color{#d91a1a}-0.35\%$
test_unbind_speed_stack1 75.8264ms 0.7970ms 1.2547 KOps/s 1.2563 KOps/s $\color{#d91a1a}-0.13\%$
test_split 76.2312ms 1.6960ms 589.6111 Ops/s 590.5487 Ops/s $\color{#d91a1a}-0.16\%$
test_chunk 76.1631ms 1.7091ms 585.1034 Ops/s 591.2135 Ops/s $\color{#d91a1a}-1.03\%$
test_creation[device0] 0.1319ms 56.3764μs 17.7379 KOps/s 17.8680 KOps/s $\color{#d91a1a}-0.73\%$
test_creation_from_tensor 0.1340ms 52.8389μs 18.9255 KOps/s 18.8382 KOps/s $\color{#35bf28}+0.46\%$
test_add_one[memmap_tensor0] 79.1610μs 6.8268μs 146.4806 KOps/s 149.4470 KOps/s $\color{#d91a1a}-1.98\%$
test_contiguous[memmap_tensor0] 10.2010μs 0.6282μs 1.5919 MOps/s 1.5684 MOps/s $\color{#35bf28}+1.50\%$
test_stack[memmap_tensor0] 28.8700μs 4.7401μs 210.9675 KOps/s 215.7849 KOps/s $\color{#d91a1a}-2.23\%$
test_memmaptd_index 1.0617ms 0.2816ms 3.5512 KOps/s 3.5211 KOps/s $\color{#35bf28}+0.85\%$
test_memmaptd_index_astensor 0.6121ms 0.3506ms 2.8519 KOps/s 2.8241 KOps/s $\color{#35bf28}+0.98\%$
test_memmaptd_index_op 0.9681ms 0.6836ms 1.4629 KOps/s 1.5446 KOps/s $\textbf{\color{#d91a1a}-5.29\%}$
test_serialize_model 0.1830s 0.1109s 9.0183 Ops/s 8.5146 Ops/s $\textbf{\color{#35bf28}+5.92\%}$
test_serialize_model_pickle 1.3579s 1.2357s 0.8093 Ops/s 0.8085 Ops/s $\color{#35bf28}+0.10\%$
test_serialize_weights 0.1806s 0.1080s 9.2550 Ops/s 8.7874 Ops/s $\textbf{\color{#35bf28}+5.32\%}$
test_serialize_weights_returnearly 0.3002s 0.1043s 9.5919 Ops/s 10.3566 Ops/s $\textbf{\color{#d91a1a}-7.38\%}$
test_serialize_weights_pickle 1.3539s 1.2485s 0.8009 Ops/s 0.8087 Ops/s $\color{#d91a1a}-0.96\%$
test_reshape_pytree 48.8610μs 25.6972μs 38.9148 KOps/s 38.3664 KOps/s $\color{#35bf28}+1.43\%$
test_reshape_td 74.6610μs 30.1510μs 33.1664 KOps/s 32.2733 KOps/s $\color{#35bf28}+2.77\%$
test_view_pytree 90.6910μs 25.8604μs 38.6692 KOps/s 39.2348 KOps/s $\color{#d91a1a}-1.44\%$
test_view_td 61.2610μs 35.5705μs 28.1132 KOps/s 28.2579 KOps/s $\color{#d91a1a}-0.51\%$
test_unbind_pytree 54.3710μs 31.4364μs 31.8102 KOps/s 31.9740 KOps/s $\color{#d91a1a}-0.51\%$
test_unbind_td 0.4767ms 40.3232μs 24.7996 KOps/s 24.1878 KOps/s $\color{#35bf28}+2.53\%$
test_split_pytree 59.3710μs 34.3811μs 29.0857 KOps/s 28.1578 KOps/s $\color{#35bf28}+3.30\%$
test_split_td 0.1061ms 38.7872μs 25.7817 KOps/s 25.5367 KOps/s $\color{#35bf28}+0.96\%$
test_add_pytree 74.3510μs 37.9485μs 26.3515 KOps/s 26.4985 KOps/s $\color{#d91a1a}-0.55\%$
test_add_td 85.1420μs 57.3453μs 17.4382 KOps/s 19.7347 KOps/s $\textbf{\color{#d91a1a}-11.64\%}$
test_distributed 2.8447ms 88.4177μs 11.3100 KOps/s 14.9865 KOps/s $\textbf{\color{#d91a1a}-24.53\%}$
test_tdmodule 30.9410μs 15.5608μs 64.2640 KOps/s 67.5866 KOps/s $\color{#d91a1a}-4.92\%$
test_tdmodule_dispatch 52.9200μs 30.8563μs 32.4083 KOps/s 34.7692 KOps/s $\textbf{\color{#d91a1a}-6.79\%}$
test_tdseq 33.4100μs 17.4805μs 57.2067 KOps/s 60.3959 KOps/s $\textbf{\color{#d91a1a}-5.28\%}$
test_tdseq_dispatch 51.5920μs 34.3819μs 29.0851 KOps/s 31.1933 KOps/s $\textbf{\color{#d91a1a}-6.76\%}$
test_instantiation_functorch 1.6502ms 1.5217ms 657.1749 Ops/s 657.5885 Ops/s $\color{#d91a1a}-0.06\%$
test_instantiation_td 1.5694ms 1.0524ms 950.2055 Ops/s 961.1444 Ops/s $\color{#d91a1a}-1.14\%$
test_exec_functorch 0.2351ms 0.1488ms 6.7196 KOps/s 6.6723 KOps/s $\color{#35bf28}+0.71\%$
test_exec_functional_call 0.1832ms 0.1349ms 7.4104 KOps/s 7.3576 KOps/s $\color{#35bf28}+0.72\%$
test_exec_td 0.1677ms 0.1348ms 7.4169 KOps/s 7.4480 KOps/s $\color{#d91a1a}-0.42\%$
test_exec_td_decorator 0.7006ms 0.2072ms 4.8258 KOps/s 4.8106 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed[True-True] 0.7542ms 0.5755ms 1.7376 KOps/s 1.7722 KOps/s $\color{#d91a1a}-1.95\%$
test_vmap_mlp_speed[True-False] 0.7149ms 0.5770ms 1.7331 KOps/s 1.7481 KOps/s $\color{#d91a1a}-0.86\%$
test_vmap_mlp_speed[False-True] 0.5952ms 0.5217ms 1.9167 KOps/s 2.0136 KOps/s $\color{#d91a1a}-4.81\%$
test_vmap_mlp_speed[False-False] 0.6070ms 0.5186ms 1.9283 KOps/s 2.0294 KOps/s $\color{#d91a1a}-4.99\%$
test_vmap_mlp_speed_decorator[True-True] 1.3773ms 0.6363ms 1.5716 KOps/s 1.5967 KOps/s $\color{#d91a1a}-1.57\%$
test_vmap_mlp_speed_decorator[True-False] 0.7556ms 0.6324ms 1.5813 KOps/s 1.5985 KOps/s $\color{#d91a1a}-1.08\%$
test_vmap_mlp_speed_decorator[False-True] 0.7134ms 0.5618ms 1.7801 KOps/s 1.8157 KOps/s $\color{#d91a1a}-1.96\%$
test_vmap_mlp_speed_decorator[False-False] 0.7254ms 0.5674ms 1.7625 KOps/s 1.8021 KOps/s $\color{#d91a1a}-2.20\%$
test_vmap_transformer_speed[True-True] 7.6819ms 7.3985ms 135.1632 Ops/s 134.2127 Ops/s $\color{#35bf28}+0.71\%$
test_vmap_transformer_speed[True-False] 7.6557ms 7.3623ms 135.8265 Ops/s 129.0384 Ops/s $\textbf{\color{#35bf28}+5.26\%}$
test_vmap_transformer_speed[False-True] 7.8032ms 7.4004ms 135.1281 Ops/s 133.1076 Ops/s $\color{#35bf28}+1.52\%$
test_vmap_transformer_speed[False-False] 7.7914ms 7.4244ms 134.6906 Ops/s 133.4651 Ops/s $\color{#35bf28}+0.92\%$
test_vmap_transformer_speed_decorator[True-True] 18.7091ms 18.1782ms 55.0111 Ops/s 54.3071 Ops/s $\color{#35bf28}+1.30\%$
test_vmap_transformer_speed_decorator[True-False] 18.4415ms 18.0661ms 55.3522 Ops/s 53.9662 Ops/s $\color{#35bf28}+2.57\%$
test_vmap_transformer_speed_decorator[False-True] 18.5552ms 18.0265ms 55.4739 Ops/s 54.5605 Ops/s $\color{#35bf28}+1.67\%$
test_vmap_transformer_speed_decorator[False-False] 18.8151ms 18.0841ms 55.2972 Ops/s 54.8361 Ops/s $\color{#35bf28}+0.84\%$
test_to_module_speed[True] 1.6920ms 1.5440ms 647.6740 Ops/s 630.1736 Ops/s $\color{#35bf28}+2.78\%$
test_to_module_speed[False] 2.0284ms 1.5262ms 655.2077 Ops/s 647.1554 Ops/s $\color{#35bf28}+1.24\%$
test_tc_init 50.9910μs 29.1908μs 34.2574 KOps/s 38.1760 KOps/s $\textbf{\color{#d91a1a}-10.26\%}$
test_tc_init_nested 88.6620μs 57.4431μs 17.4085 KOps/s 18.1597 KOps/s $\color{#d91a1a}-4.14\%$
test_tc_first_layer_tensor 3.2050μs 0.3965μs 2.5218 MOps/s 2.8059 MOps/s $\textbf{\color{#d91a1a}-10.12\%}$
test_tc_first_layer_nontensor 1.4724μs 0.3897μs 2.5662 MOps/s 2.5836 MOps/s $\color{#d91a1a}-0.67\%$
test_tc_second_layer_tensor 3.8680μs 0.9687μs 1.0324 MOps/s 936.8412 KOps/s $\textbf{\color{#35bf28}+10.20\%}$
test_tc_second_layer_nontensor 2.6539μs 0.8064μs 1.2400 MOps/s 1.2162 MOps/s $\color{#35bf28}+1.95\%$
test_unbind 0.1102s 6.6968ms 149.3243 Ops/s 204.3132 Ops/s $\textbf{\color{#d91a1a}-26.91\%}$
test_full_like 13.7695ms 13.1701ms 75.9298 Ops/s 75.3013 Ops/s $\color{#35bf28}+0.83\%$
test_zeros_like 7.9992ms 7.8265ms 127.7714 Ops/s 128.3861 Ops/s $\color{#d91a1a}-0.48\%$
test_ones_like 8.0176ms 7.8113ms 128.0200 Ops/s 126.9401 Ops/s $\color{#35bf28}+0.85\%$
test_clone 9.5457ms 9.3962ms 106.4257 Ops/s 107.3349 Ops/s $\color{#d91a1a}-0.85\%$
test_squeeze 71.5220μs 10.5980μs 94.3570 KOps/s 94.1865 KOps/s $\color{#35bf28}+0.18\%$
test_unsqueeze 0.1196ms 49.8651μs 20.0541 KOps/s 20.0138 KOps/s $\color{#35bf28}+0.20\%$
test_split 0.1478ms 98.2883μs 10.1742 KOps/s 9.9351 KOps/s $\color{#35bf28}+2.41\%$
test_permute 0.1622ms 0.1096ms 9.1279 KOps/s 8.6173 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_stack 27.6940ms 27.3483ms 36.5653 Ops/s 36.7886 Ops/s $\color{#d91a1a}-0.61\%$
test_cat 27.3653ms 27.1452ms 36.8389 Ops/s 36.8376 Ops/s $+0.00\%$
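
When reading these tables, note that the Ops column appears to be the reciprocal of the Mean column, so the two express the same measurement in different units. The sketch below checks this against a few GPU rows. The helper names are hypothetical, and small mismatches are expected because the table shows rounded means.

```python
import math

# Suffixes are checked in insertion order, so "s" must come last.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
_UNITS = {"\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(text: str) -> float:
    """Parse a duration such as '13.6671μs', '27.1452ms' or '1.2357s'."""
    for suffix, scale in _UNITS.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognised duration: {text!r}")


def ops_per_second(mean: str) -> float:
    """Throughput implied by a mean time per call."""
    return 1.0 / to_seconds(mean)


# (Mean, Ops in operations per second) copied from the GPU table above.
rows = {
    "test_plain_set_nested": ("13.6671\u03bcs", 73.1682e3),
    "test_cat": ("27.1452ms", 36.8389),
    "test_serialize_model_pickle": ("1.2357s", 0.8093),
}

for name, (mean, reported) in rows.items():
    implied = ops_per_second(mean)
    # Allow 0.1% slack for rounding in the displayed mean.
    print(name, math.isclose(implied, reported, rel_tol=1e-3))
```

Because Change is computed from throughput, a negative percentage means the PR is slower than repo HEAD on that benchmark, and a positive one means it is faster.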

Labels: CLA Signed, Performance
2 participants