[Performance] Faster dispatch #487

Merged: vmoens merged 4 commits into main from faster_dispatch on Dec 4, 2023

@vmoens (Contributor) commented Jul 11, 2023

No description provided.

@facebook-github-bot added the "CLA Signed" label Jul 11, 2023

github-actions bot commented Jul 11, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 113. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}2$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 30.5260μs 15.6958μs 63.7114 KOps/s 61.9288 KOps/s $\color{#35bf28}+2.88\%$
test_plain_set_stack_nested 0.2428ms 0.1414ms 7.0709 KOps/s 6.8254 KOps/s $\color{#35bf28}+3.60\%$
test_plain_set_nested_inplace 43.5120μs 18.2504μs 54.7933 KOps/s 53.7699 KOps/s $\color{#35bf28}+1.90\%$
test_plain_set_stack_nested_inplace 0.3471ms 0.1747ms 5.7255 KOps/s 5.5595 KOps/s $\color{#35bf28}+2.99\%$
test_items 18.3650μs 2.4134μs 414.3484 KOps/s 414.9788 KOps/s $\color{#d91a1a}-0.15\%$
test_items_nested 0.4152ms 0.2658ms 3.7619 KOps/s 3.7083 KOps/s $\color{#35bf28}+1.45\%$
test_items_nested_locked 1.0286ms 0.2663ms 3.7547 KOps/s 3.6711 KOps/s $\color{#35bf28}+2.28\%$
test_items_nested_leaf 0.5890ms 0.1659ms 6.0278 KOps/s 5.9852 KOps/s $\color{#35bf28}+0.71\%$
test_items_stack_nested 2.5178ms 1.5200ms 657.8966 Ops/s 675.0379 Ops/s $\color{#d91a1a}-2.54\%$
test_items_stack_nested_leaf 2.2768ms 1.3650ms 732.5766 Ops/s 737.5453 Ops/s $\color{#d91a1a}-0.67\%$
test_items_stack_nested_locked 2.0061ms 0.7595ms 1.3166 KOps/s 1.2946 KOps/s $\color{#35bf28}+1.69\%$
test_keys 24.5250μs 3.8789μs 257.8063 KOps/s 257.8284 KOps/s $-0.01\%$
test_keys_nested 0.5844ms 0.1397ms 7.1596 KOps/s 6.6960 KOps/s $\textbf{\color{#35bf28}+6.92\%}$
test_keys_nested_locked 0.2592ms 0.1383ms 7.2302 KOps/s 7.1588 KOps/s $\color{#35bf28}+1.00\%$
test_keys_nested_leaf 0.3842ms 0.1390ms 7.1939 KOps/s 7.1261 KOps/s $\color{#35bf28}+0.95\%$
test_keys_stack_nested 2.1886ms 1.4125ms 707.9519 Ops/s 710.0191 Ops/s $\color{#d91a1a}-0.29\%$
test_keys_stack_nested_leaf 1.5480ms 1.4064ms 711.0130 Ops/s 713.1970 Ops/s $\color{#d91a1a}-0.31\%$
test_keys_stack_nested_locked 0.8370ms 0.6745ms 1.4827 KOps/s 1.4768 KOps/s $\color{#35bf28}+0.40\%$
test_values 18.8150μs 1.1682μs 856.0403 KOps/s 878.3878 KOps/s $\color{#d91a1a}-2.54\%$
test_values_nested 0.1498ms 48.9412μs 20.4327 KOps/s 19.2628 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_values_nested_locked 0.1098ms 49.1303μs 20.3541 KOps/s 19.1211 KOps/s $\textbf{\color{#35bf28}+6.45\%}$
test_values_nested_leaf 58.0880μs 43.8428μs 22.8087 KOps/s 21.9550 KOps/s $\color{#35bf28}+3.89\%$
test_values_stack_nested 1.3629ms 1.1946ms 837.1037 Ops/s 826.4048 Ops/s $\color{#35bf28}+1.29\%$
test_values_stack_nested_leaf 1.4554ms 1.1922ms 838.7543 Ops/s 837.9293 Ops/s $\color{#35bf28}+0.10\%$
test_values_stack_nested_locked 0.9492ms 0.5069ms 1.9726 KOps/s 1.9646 KOps/s $\color{#35bf28}+0.41\%$
test_membership 40.1150μs 1.3232μs 755.7500 KOps/s 746.0375 KOps/s $\color{#35bf28}+1.30\%$
test_membership_nested 25.7680μs 2.7471μs 364.0214 KOps/s 352.7022 KOps/s $\color{#35bf28}+3.21\%$
test_membership_nested_leaf 25.7980μs 2.7653μs 361.6307 KOps/s 353.6573 KOps/s $\color{#35bf28}+2.25\%$
test_membership_stacked_nested 44.2320μs 11.6669μs 85.7128 KOps/s 83.8660 KOps/s $\color{#35bf28}+2.20\%$
test_membership_stacked_nested_leaf 51.8570μs 11.7869μs 84.8400 KOps/s 83.6280 KOps/s $\color{#35bf28}+1.45\%$
test_membership_nested_last 24.9360μs 5.9095μs 169.2190 KOps/s 153.2159 KOps/s $\textbf{\color{#35bf28}+10.44\%}$
test_membership_nested_leaf_last 31.4980μs 5.9197μs 168.9267 KOps/s 158.0767 KOps/s $\textbf{\color{#35bf28}+6.86\%}$
test_membership_stacked_nested_last 0.2346ms 0.1668ms 5.9958 KOps/s 5.7948 KOps/s $\color{#35bf28}+3.47\%$
test_membership_stacked_nested_leaf_last 38.1810μs 13.8143μs 72.3890 KOps/s 71.1061 KOps/s $\color{#35bf28}+1.80\%$
test_nested_getleaf 36.6390μs 10.5426μs 94.8532 KOps/s 93.8267 KOps/s $\color{#35bf28}+1.09\%$
test_nested_get 37.6610μs 10.0242μs 99.7582 KOps/s 99.1306 KOps/s $\color{#35bf28}+0.63\%$
test_stacked_getleaf 1.1108ms 0.6408ms 1.5605 KOps/s 1.5404 KOps/s $\color{#35bf28}+1.31\%$
test_stacked_get 0.7868ms 0.6117ms 1.6348 KOps/s 1.6227 KOps/s $\color{#35bf28}+0.75\%$
test_nested_getitemleaf 33.9340μs 10.7571μs 92.9623 KOps/s 92.7375 KOps/s $\color{#35bf28}+0.24\%$
test_nested_getitem 37.3290μs 10.1335μs 98.6824 KOps/s 98.1794 KOps/s $\color{#35bf28}+0.51\%$
test_stacked_getitemleaf 0.7437ms 0.6406ms 1.5611 KOps/s 1.5355 KOps/s $\color{#35bf28}+1.67\%$
test_stacked_getitem 0.7179ms 0.6128ms 1.6319 KOps/s 1.6195 KOps/s $\color{#35bf28}+0.76\%$
test_lock_nested 68.1233ms 0.6250ms 1.6000 KOps/s 1.7374 KOps/s $\textbf{\color{#d91a1a}-7.90\%}$
test_lock_stack_nested 10.0283ms 5.1915ms 192.6216 Ops/s 188.7329 Ops/s $\color{#35bf28}+2.06\%$
test_unlock_nested 1.1159ms 0.4455ms 2.2446 KOps/s 2.2012 KOps/s $\color{#35bf28}+1.97\%$
test_unlock_stack_nested 88.1510ms 7.5359ms 132.6974 Ops/s 130.0962 Ops/s $\color{#35bf28}+2.00\%$
test_flatten_speed 0.5608ms 0.2655ms 3.7665 KOps/s 3.7330 KOps/s $\color{#35bf28}+0.90\%$
test_unflatten_speed 0.5305ms 0.4518ms 2.2133 KOps/s 2.1607 KOps/s $\color{#35bf28}+2.43\%$
test_common_ops 4.2245ms 0.6770ms 1.4772 KOps/s 1.3824 KOps/s $\textbf{\color{#35bf28}+6.85\%}$
test_creation 26.3390μs 2.4319μs 411.2070 KOps/s 397.7633 KOps/s $\color{#35bf28}+3.38\%$
test_creation_empty 24.0850μs 8.5216μs 117.3491 KOps/s 112.6013 KOps/s $\color{#35bf28}+4.22\%$
test_creation_nested_1 32.8110μs 11.7469μs 85.1286 KOps/s 83.1161 KOps/s $\color{#35bf28}+2.42\%$
test_creation_nested_2 41.7480μs 15.5627μs 64.2560 KOps/s 64.0741 KOps/s $\color{#35bf28}+0.28\%$
test_clone 63.8900μs 13.0330μs 76.7284 KOps/s 72.7030 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_getitem[int] 34.4840μs 13.1768μs 75.8909 KOps/s 74.9631 KOps/s $\color{#35bf28}+1.24\%$
test_getitem[slice_int] 61.9260μs 25.9399μs 38.5507 KOps/s 38.4083 KOps/s $\color{#35bf28}+0.37\%$
test_getitem[range] 0.1022ms 44.3927μs 22.5262 KOps/s 22.0677 KOps/s $\color{#35bf28}+2.08\%$
test_getitem[tuple] 68.7390μs 20.0260μs 49.9352 KOps/s 48.5040 KOps/s $\color{#35bf28}+2.95\%$
test_getitem[list] 0.1069ms 40.2161μs 24.8656 KOps/s 24.5739 KOps/s $\color{#35bf28}+1.19\%$
test_setitem_dim[int] 51.7870μs 28.1146μs 35.5687 KOps/s 33.4296 KOps/s $\textbf{\color{#35bf28}+6.40\%}$
test_setitem_dim[slice_int] 80.5610μs 53.0344μs 18.8557 KOps/s 18.6405 KOps/s $\color{#35bf28}+1.15\%$
test_setitem_dim[range] 0.1095ms 69.6835μs 14.3506 KOps/s 13.5184 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_setitem_dim[tuple] 66.0830μs 41.2589μs 24.2372 KOps/s 23.5753 KOps/s $\color{#35bf28}+2.81\%$
test_setitem 0.1347ms 18.3469μs 54.5051 KOps/s 52.3642 KOps/s $\color{#35bf28}+4.09\%$
test_set 0.1048ms 17.7077μs 56.4727 KOps/s 53.4342 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_set_shared 3.3297ms 0.1443ms 6.9281 KOps/s 6.6625 KOps/s $\color{#35bf28}+3.99\%$
test_update 0.1722ms 19.2014μs 52.0796 KOps/s 48.5285 KOps/s $\textbf{\color{#35bf28}+7.32\%}$
test_update_nested 0.1163ms 26.7969μs 37.3177 KOps/s 34.9906 KOps/s $\textbf{\color{#35bf28}+6.65\%}$
test_set_nested 0.1045ms 19.4948μs 51.2958 KOps/s 48.4564 KOps/s $\textbf{\color{#35bf28}+5.86\%}$
test_set_nested_new 0.1165ms 24.5027μs 40.8119 KOps/s 37.6817 KOps/s $\textbf{\color{#35bf28}+8.31\%}$
test_select 0.1525ms 50.3697μs 19.8532 KOps/s 19.3081 KOps/s $\color{#35bf28}+2.82\%$
test_unbind_speed 0.5041ms 0.3696ms 2.7058 KOps/s 2.6641 KOps/s $\color{#35bf28}+1.57\%$
test_unbind_speed_stack0 77.6024ms 4.8405ms 206.5923 Ops/s 215.0144 Ops/s $\color{#d91a1a}-3.92\%$
test_unbind_speed_stack1 1.9647μs 0.6467μs 1.5464 MOps/s 1.4957 MOps/s $\color{#35bf28}+3.39\%$
test_split 67.3025ms 1.7939ms 557.4445 Ops/s 539.8391 Ops/s $\color{#35bf28}+3.26\%$
test_chunk 69.5471ms 1.7569ms 569.1771 Ops/s 554.8260 Ops/s $\color{#35bf28}+2.59\%$
test_creation[device0] 0.7357ms 0.2995ms 3.3387 KOps/s 3.3473 KOps/s $\color{#d91a1a}-0.26\%$
test_creation_from_tensor 4.8286ms 0.3399ms 2.9423 KOps/s 2.9828 KOps/s $\color{#d91a1a}-1.36\%$
test_add_one[memmap_tensor0] 95.2280μs 25.8432μs 38.6949 KOps/s 38.7606 KOps/s $\color{#d91a1a}-0.17\%$
test_contiguous[memmap_tensor0] 24.6160μs 5.6784μs 176.1065 KOps/s 175.9022 KOps/s $\color{#35bf28}+0.12\%$
test_stack[memmap_tensor0] 0.1246ms 19.4475μs 51.4204 KOps/s 51.9102 KOps/s $\color{#d91a1a}-0.94\%$
test_memmaptd_index 0.2746ms 0.2013ms 4.9665 KOps/s 4.9756 KOps/s $\color{#d91a1a}-0.18\%$
test_memmaptd_index_astensor 0.3811ms 0.2624ms 3.8107 KOps/s 3.8317 KOps/s $\color{#d91a1a}-0.55\%$
test_memmaptd_index_op 0.6397ms 0.5149ms 1.9420 KOps/s 1.9159 KOps/s $\color{#35bf28}+1.36\%$
test_reshape_pytree 54.9520μs 22.6726μs 44.1061 KOps/s 43.2106 KOps/s $\color{#35bf28}+2.07\%$
test_reshape_td 0.1001ms 31.9581μs 31.2909 KOps/s 30.6752 KOps/s $\color{#35bf28}+2.01\%$
test_view_pytree 53.1800μs 22.6555μs 44.1394 KOps/s 43.0792 KOps/s $\color{#35bf28}+2.46\%$
test_view_td 27.7820μs 4.8826μs 204.8097 KOps/s 204.5462 KOps/s $\color{#35bf28}+0.13\%$
test_unbind_pytree 1.5129ms 26.3819μs 37.9048 KOps/s 38.3672 KOps/s $\color{#d91a1a}-1.21\%$
test_unbind_td 0.1220ms 59.1526μs 16.9054 KOps/s 16.3627 KOps/s $\color{#35bf28}+3.32\%$
test_split_pytree 70.8620μs 26.0919μs 38.3261 KOps/s 38.4914 KOps/s $\color{#d91a1a}-0.43\%$
test_split_td 0.1065ms 47.0455μs 21.2560 KOps/s 21.1228 KOps/s $\color{#35bf28}+0.63\%$
test_add_pytree 81.0010μs 32.4108μs 30.8539 KOps/s 31.2902 KOps/s $\color{#d91a1a}-1.39\%$
test_add_td 95.9280μs 46.6120μs 21.4537 KOps/s 20.8235 KOps/s $\color{#35bf28}+3.03\%$
test_distributed 24.3650μs 6.2148μs 160.9066 KOps/s 164.4790 KOps/s $\color{#d91a1a}-2.17\%$
test_tdmodule 0.1357ms 20.6830μs 48.3488 KOps/s 46.2319 KOps/s $\color{#35bf28}+4.58\%$
test_tdmodule_dispatch 0.2028ms 40.1882μs 24.8830 KOps/s 24.7973 KOps/s $\color{#35bf28}+0.35\%$
test_tdseq 56.9760μs 23.5455μs 42.4710 KOps/s 40.1976 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_tdseq_dispatch 0.4909ms 43.5487μs 22.9628 KOps/s 22.5591 KOps/s $\color{#35bf28}+1.79\%$
test_instantiation_functorch 1.3817ms 1.2620ms 792.3864 Ops/s 761.6305 Ops/s $\color{#35bf28}+4.04\%$
test_instantiation_td 2.1840ms 1.0061ms 993.9611 Ops/s 967.8178 Ops/s $\color{#35bf28}+2.70\%$
test_exec_functorch 0.2471ms 0.1566ms 6.3860 KOps/s 6.1369 KOps/s $\color{#35bf28}+4.06\%$
test_exec_functional_call 0.2240ms 0.1469ms 6.8061 KOps/s 6.6538 KOps/s $\color{#35bf28}+2.29\%$
test_exec_td 0.2255ms 0.1430ms 6.9937 KOps/s 6.6993 KOps/s $\color{#35bf28}+4.40\%$
test_exec_td_decorator 78.4193ms 0.2002ms 4.9950 KOps/s 5.3828 KOps/s $\textbf{\color{#d91a1a}-7.20\%}$
test_vmap_mlp_speed[True-True] 1.2088ms 0.8917ms 1.1215 KOps/s 1.1007 KOps/s $\color{#35bf28}+1.89\%$
test_vmap_mlp_speed[True-False] 0.6983ms 0.4650ms 2.1504 KOps/s 2.0878 KOps/s $\color{#35bf28}+3.00\%$
test_vmap_mlp_speed[False-True] 1.0660ms 0.7752ms 1.2900 KOps/s 1.2435 KOps/s $\color{#35bf28}+3.74\%$
test_vmap_mlp_speed[False-False] 0.5836ms 0.3890ms 2.5710 KOps/s 2.5353 KOps/s $\color{#35bf28}+1.41\%$
test_vmap_mlp_speed_decorator[True-True] 2.7496ms 1.7888ms 559.0249 Ops/s 544.0713 Ops/s $\color{#35bf28}+2.75\%$
test_vmap_mlp_speed_decorator[True-False] 1.1699ms 0.5139ms 1.9458 KOps/s 1.8694 KOps/s $\color{#35bf28}+4.09\%$
test_vmap_mlp_speed_decorator[False-True] 2.0782ms 1.4733ms 678.7629 Ops/s 638.5780 Ops/s $\textbf{\color{#35bf28}+6.29\%}$
test_vmap_mlp_speed_decorator[False-False] 1.2733ms 0.4019ms 2.4881 KOps/s 2.4594 KOps/s $\color{#35bf28}+1.17\%$
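
The Change column and the Improved/Worsened counts can be reproduced from the two Ops columns. The snippet below is a minimal sketch, not the benchmark bot's actual code: the function names are hypothetical, and the 5% threshold is an assumption inferred from the table (the bolded rows all exceed it in magnitude, and their number matches the 16 improved and 2 worsened benchmarks reported above).

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a benchmark; the 5% threshold is an assumption, not a documented value."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from three rows of the CPU table.
rows = {
    "test_plain_set_nested": (63.7114, 61.9288),
    "test_membership_nested_last": (169.2190, 153.2159),
    "test_lock_nested": (1.6000, 1.7374),
}

for name, (ops_pr, ops_head) in rows.items():
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the table values are already rounded, a recomputed percentage can differ from the reported one in the last digit.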

@vmoens changed the title from "[WI] Faster dispatch" to "[WIP] Faster dispatch" Jul 26, 2023
# Conflicts:
#	tensordict/nn/common.py
#	tensordict/tensordict.py
@vmoens changed the title from "[WIP] Faster dispatch" to "[Performance] Faster dispatch" Dec 4, 2023

github-actions bot commented Dec 4, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}5$.

Detailed results:
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.5137ms 12.7531μs 78.4123 KOps/s 78.5245 KOps/s $\color{#d91a1a}-0.14\%$
test_plain_set_stack_nested 0.1564ms 0.1159ms 8.6262 KOps/s 8.3392 KOps/s $\color{#35bf28}+3.44\%$
test_plain_set_nested_inplace 35.7100μs 14.0447μs 71.2011 KOps/s 70.6553 KOps/s $\color{#35bf28}+0.77\%$
test_plain_set_stack_nested_inplace 0.1721ms 0.1446ms 6.9136 KOps/s 6.8927 KOps/s $\color{#35bf28}+0.30\%$
test_items 60.4200μs 4.7425μs 210.8606 KOps/s 213.0502 KOps/s $\color{#d91a1a}-1.03\%$
test_items_nested 0.3832ms 0.3358ms 2.9779 KOps/s 2.9476 KOps/s $\color{#35bf28}+1.03\%$
test_items_nested_locked 0.3889ms 0.3408ms 2.9345 KOps/s 2.9129 KOps/s $\color{#35bf28}+0.74\%$
test_items_nested_leaf 0.2187ms 0.1984ms 5.0405 KOps/s 4.9850 KOps/s $\color{#35bf28}+1.11\%$
test_items_stack_nested 1.7790ms 1.4971ms 667.9760 Ops/s 670.9979 Ops/s $\color{#d91a1a}-0.45\%$
test_items_stack_nested_leaf 1.3719ms 1.3198ms 757.6999 Ops/s 763.0325 Ops/s $\color{#d91a1a}-0.70\%$
test_items_stack_nested_locked 0.8804ms 0.8397ms 1.1909 KOps/s 1.1831 KOps/s $\color{#35bf28}+0.66\%$
test_keys 19.7700μs 4.6033μs 217.2354 KOps/s 216.0365 KOps/s $\color{#35bf28}+0.55\%$
test_keys_nested 3.3810ms 90.6469μs 11.0318 KOps/s 11.1171 KOps/s $\color{#d91a1a}-0.77\%$
test_keys_nested_locked 0.1204ms 90.0355μs 11.1067 KOps/s 11.1980 KOps/s $\color{#d91a1a}-0.82\%$
test_keys_nested_leaf 41.1171ms 87.2258μs 11.4645 KOps/s 12.2559 KOps/s $\textbf{\color{#d91a1a}-6.46\%}$
test_keys_stack_nested 1.3705ms 1.2988ms 769.9358 Ops/s 759.5057 Ops/s $\color{#35bf28}+1.37\%$
test_keys_stack_nested_leaf 1.3339ms 1.2860ms 777.6083 Ops/s 775.1688 Ops/s $\color{#35bf28}+0.31\%$
test_keys_stack_nested_locked 0.6835ms 0.6386ms 1.5659 KOps/s 1.5579 KOps/s $\color{#35bf28}+0.51\%$
test_values 11.6567μs 1.8739μs 533.6542 KOps/s 527.7192 KOps/s $\color{#35bf28}+1.12\%$
test_values_nested 63.0410μs 43.3559μs 23.0649 KOps/s 23.1825 KOps/s $\color{#d91a1a}-0.51\%$
test_values_nested_locked 71.4210μs 45.7255μs 21.8696 KOps/s 21.9513 KOps/s $\color{#d91a1a}-0.37\%$
test_values_nested_leaf 58.4310μs 37.6098μs 26.5888 KOps/s 26.6280 KOps/s $\color{#d91a1a}-0.15\%$
test_values_stack_nested 1.1903ms 1.1546ms 866.1262 Ops/s 874.1020 Ops/s $\color{#d91a1a}-0.91\%$
test_values_stack_nested_leaf 1.2249ms 1.1322ms 883.2507 Ops/s 888.0929 Ops/s $\color{#d91a1a}-0.55\%$
test_values_stack_nested_locked 0.5473ms 0.5133ms 1.9483 KOps/s 1.9729 KOps/s $\color{#d91a1a}-1.25\%$
test_membership 5.1202μs 0.9479μs 1.0550 MOps/s 1.0326 MOps/s $\color{#35bf28}+2.16\%$
test_membership_nested 51.5510μs 2.2511μs 444.2255 KOps/s 445.8706 KOps/s $\color{#d91a1a}-0.37\%$
test_membership_nested_leaf 21.5305μs 2.1482μs 465.5068 KOps/s 465.2635 KOps/s $\color{#35bf28}+0.05\%$
test_membership_stacked_nested 53.7410μs 11.1687μs 89.5361 KOps/s 89.7294 KOps/s $\color{#d91a1a}-0.22\%$
test_membership_stacked_nested_leaf 61.8910μs 11.1228μs 89.9054 KOps/s 90.0321 KOps/s $\color{#d91a1a}-0.14\%$
test_membership_nested_last 31.2300μs 4.6532μs 214.9038 KOps/s 216.9416 KOps/s $\color{#d91a1a}-0.94\%$
test_membership_nested_leaf_last 31.5300μs 4.6738μs 213.9578 KOps/s 215.4698 KOps/s $\color{#d91a1a}-0.70\%$
test_membership_stacked_nested_last 0.1603ms 0.1345ms 7.4359 KOps/s 7.4054 KOps/s $\color{#35bf28}+0.41\%$
test_membership_stacked_nested_leaf_last 46.4400μs 13.0688μs 76.5183 KOps/s 77.2260 KOps/s $\color{#d91a1a}-0.92\%$
test_nested_getleaf 56.2810μs 8.3933μs 119.1424 KOps/s 118.6573 KOps/s $\color{#35bf28}+0.41\%$
test_nested_get 23.3000μs 7.9481μs 125.8166 KOps/s 125.6936 KOps/s $\color{#35bf28}+0.10\%$
test_stacked_getleaf 0.5993ms 0.5702ms 1.7538 KOps/s 1.7798 KOps/s $\color{#d91a1a}-1.46\%$
test_stacked_get 0.7201ms 0.5291ms 1.8899 KOps/s 1.8945 KOps/s $\color{#d91a1a}-0.24\%$
test_nested_getitemleaf 30.9810μs 8.4792μs 117.9350 KOps/s 118.3127 KOps/s $\color{#d91a1a}-0.32\%$
test_nested_getitem 32.6800μs 8.0201μs 124.6874 KOps/s 125.1163 KOps/s $\color{#d91a1a}-0.34\%$
test_stacked_getitemleaf 0.6217ms 0.5633ms 1.7752 KOps/s 1.7780 KOps/s $\color{#d91a1a}-0.15\%$
test_stacked_getitem 0.5866ms 0.5315ms 1.8815 KOps/s 1.8997 KOps/s $\color{#d91a1a}-0.96\%$
test_lock_nested 3.1556ms 0.5502ms 1.8175 KOps/s 1.8188 KOps/s $\color{#d91a1a}-0.08\%$
test_lock_stack_nested 81.1771ms 7.1552ms 139.7585 Ops/s 138.2470 Ops/s $\color{#35bf28}+1.09\%$
test_unlock_nested 2.3154ms 0.4301ms 2.3249 KOps/s 2.3308 KOps/s $\color{#d91a1a}-0.25\%$
test_unlock_stack_nested 66.8682ms 6.2050ms 161.1606 Ops/s 163.1291 Ops/s $\color{#d91a1a}-1.21\%$
test_flatten_speed 0.2235ms 0.1860ms 5.3775 KOps/s 5.3501 KOps/s $\color{#35bf28}+0.51\%$
test_unflatten_speed 0.4021ms 0.3656ms 2.7349 KOps/s 2.7563 KOps/s $\color{#d91a1a}-0.78\%$
test_common_ops 1.1094ms 0.6193ms 1.6146 KOps/s 1.6127 KOps/s $\color{#35bf28}+0.12\%$
test_creation 51.3110μs 2.1167μs 472.4301 KOps/s 474.3712 KOps/s $\color{#d91a1a}-0.41\%$
test_creation_empty 25.0200μs 7.2725μs 137.5042 KOps/s 139.5736 KOps/s $\color{#d91a1a}-1.48\%$
test_creation_nested_1 30.0900μs 9.6389μs 103.7462 KOps/s 105.4041 KOps/s $\color{#d91a1a}-1.57\%$
test_creation_nested_2 34.5300μs 12.3994μs 80.6488 KOps/s 82.4804 KOps/s $\color{#d91a1a}-2.22\%$
test_clone 86.5710μs 14.8482μs 67.3481 KOps/s 66.2146 KOps/s $\color{#35bf28}+1.71\%$
test_getitem[int] 32.4200μs 12.1158μs 82.5366 KOps/s 80.1325 KOps/s $\color{#35bf28}+3.00\%$
test_getitem[slice_int] 68.3000μs 23.6910μs 42.2101 KOps/s 40.4622 KOps/s $\color{#35bf28}+4.32\%$
test_getitem[range] 69.3700μs 42.4412μs 23.5620 KOps/s 24.0487 KOps/s $\color{#d91a1a}-2.02\%$
test_getitem[tuple] 59.5110μs 20.7867μs 48.1078 KOps/s 48.2770 KOps/s $\color{#d91a1a}-0.35\%$
test_getitem[list] 0.2847ms 38.3327μs 26.0874 KOps/s 26.5028 KOps/s $\color{#d91a1a}-1.57\%$
test_setitem_dim[int] 56.5820μs 28.1540μs 35.5190 KOps/s 37.0973 KOps/s $\color{#d91a1a}-4.25\%$
test_setitem_dim[slice_int] 67.0910μs 48.1196μs 20.7816 KOps/s 21.2626 KOps/s $\color{#d91a1a}-2.26\%$
test_setitem_dim[range] 83.3910μs 64.7650μs 15.4404 KOps/s 15.6291 KOps/s $\color{#d91a1a}-1.21\%$
test_setitem_dim[tuple] 57.9910μs 41.1404μs 24.3070 KOps/s 24.7801 KOps/s $\color{#d91a1a}-1.91\%$
test_setitem 83.3700μs 18.8871μs 52.9463 KOps/s 52.5325 KOps/s $\color{#35bf28}+0.79\%$
test_set 82.7210μs 18.4901μs 54.0830 KOps/s 53.2258 KOps/s $\color{#35bf28}+1.61\%$
test_set_shared 2.8528ms 0.1061ms 9.4288 KOps/s 8.5721 KOps/s $\textbf{\color{#35bf28}+9.99\%}$
test_update 0.1100ms 19.7668μs 50.5898 KOps/s 50.1241 KOps/s $\color{#35bf28}+0.93\%$
test_update_nested 85.1210μs 26.3728μs 37.9178 KOps/s 38.0203 KOps/s $\color{#d91a1a}-0.27\%$
test_set_nested 77.7010μs 19.4939μs 51.2981 KOps/s 47.5042 KOps/s $\textbf{\color{#35bf28}+7.99\%}$
test_set_nested_new 75.0910μs 23.9359μs 41.7782 KOps/s 38.8219 KOps/s $\textbf{\color{#35bf28}+7.61\%}$
test_select 0.1628ms 47.3357μs 21.1257 KOps/s 21.2659 KOps/s $\color{#d91a1a}-0.66\%$
test_to 76.0810μs 55.2496μs 18.0997 KOps/s 18.5178 KOps/s $\color{#d91a1a}-2.26\%$
test_to_nonblocking 0.1728ms 36.0692μs 27.7245 KOps/s 28.3106 KOps/s $\color{#d91a1a}-2.07\%$
test_unbind_speed 0.4036ms 0.3608ms 2.7713 KOps/s 2.8308 KOps/s $\color{#d91a1a}-2.10\%$
test_unbind_speed_stack0 62.3145ms 4.2709ms 234.1453 Ops/s 249.6919 Ops/s $\textbf{\color{#d91a1a}-6.23\%}$
test_unbind_speed_stack1 1.6065μs 0.5276μs 1.8954 MOps/s 1.9174 MOps/s $\color{#d91a1a}-1.15\%$
test_split 54.0970ms 1.7695ms 565.1360 Ops/s 568.3826 Ops/s $\color{#d91a1a}-0.57\%$
test_chunk 53.5724ms 1.7511ms 571.0843 Ops/s 573.1520 Ops/s $\color{#d91a1a}-0.36\%$
test_creation[device0] 0.5353ms 0.3103ms 3.2232 KOps/s 3.2419 KOps/s $\color{#d91a1a}-0.58\%$
test_creation[device1] 54.9833ms 0.3362ms 2.9748 KOps/s 3.2061 KOps/s $\textbf{\color{#d91a1a}-7.21\%}$
test_creation_from_tensor 0.6700ms 0.3381ms 2.9577 KOps/s 2.9667 KOps/s $\color{#d91a1a}-0.31\%$
test_add_one[memmap_tensor0] 0.1535ms 24.7970μs 40.3275 KOps/s 40.4631 KOps/s $\color{#d91a1a}-0.34\%$
test_add_one[memmap_tensor1] 0.2042ms 74.3051μs 13.4580 KOps/s 13.7965 KOps/s $\color{#d91a1a}-2.45\%$
test_contiguous[memmap_tensor0] 32.0900μs 6.1196μs 163.4092 KOps/s 167.1838 KOps/s $\color{#d91a1a}-2.26\%$
test_contiguous[memmap_tensor1] 0.1662ms 22.9552μs 43.5631 KOps/s 44.7876 KOps/s $\color{#d91a1a}-2.73\%$
test_stack[memmap_tensor0] 52.0590μs 20.1529μs 49.6206 KOps/s 50.0033 KOps/s $\color{#d91a1a}-0.77\%$
test_stack[memmap_tensor1] 0.1647ms 74.2727μs 13.4639 KOps/s 13.5986 KOps/s $\color{#d91a1a}-0.99\%$
test_memmaptd_index 0.2666ms 0.2416ms 4.1399 KOps/s 4.2706 KOps/s $\color{#d91a1a}-3.06\%$
test_memmaptd_index_astensor 0.3339ms 0.2959ms 3.3792 KOps/s 3.4285 KOps/s $\color{#d91a1a}-1.44\%$
test_memmaptd_index_op 0.6462ms 0.5885ms 1.6991 KOps/s 1.7267 KOps/s $\color{#d91a1a}-1.60\%$
test_reshape_pytree 63.3090μs 20.9864μs 47.6500 KOps/s 47.3137 KOps/s $\color{#35bf28}+0.71\%$
test_reshape_td 52.2290μs 31.1779μs 32.0740 KOps/s 32.3355 KOps/s $\color{#d91a1a}-0.81\%$
test_view_pytree 49.5190μs 20.6910μs 48.3301 KOps/s 48.0140 KOps/s $\color{#35bf28}+0.66\%$
test_view_td 17.3690μs 4.0640μs 246.0654 KOps/s 246.8623 KOps/s $\color{#d91a1a}-0.32\%$
test_unbind_pytree 46.4500μs 26.2314μs 38.1222 KOps/s 38.6842 KOps/s $\color{#d91a1a}-1.45\%$
test_unbind_td 89.0410μs 56.7894μs 17.6089 KOps/s 17.4791 KOps/s $\color{#35bf28}+0.74\%$
test_split_pytree 44.1600μs 24.6879μs 40.5056 KOps/s 41.0953 KOps/s $\color{#d91a1a}-1.43\%$
test_split_td 71.2610μs 44.1815μs 22.6339 KOps/s 22.3686 KOps/s $\color{#35bf28}+1.19\%$
test_add_pytree 88.7010μs 33.8527μs 29.5397 KOps/s 30.1407 KOps/s $\color{#d91a1a}-1.99\%$
test_add_td 0.1640ms 47.5740μs 21.0199 KOps/s 21.2159 KOps/s $\color{#d91a1a}-0.92\%$
test_distributed 22.3810μs 5.4322μs 184.0890 KOps/s 183.2855 KOps/s $\color{#35bf28}+0.44\%$
test_tdmodule 88.3210μs 17.2330μs 58.0282 KOps/s 59.8419 KOps/s $\color{#d91a1a}-3.03\%$
test_tdmodule_dispatch 0.2007ms 33.9577μs 29.4484 KOps/s 30.0871 KOps/s $\color{#d91a1a}-2.12\%$
test_tdseq 50.2100μs 20.4098μs 48.9961 KOps/s 50.6939 KOps/s $\color{#d91a1a}-3.35\%$
test_tdseq_dispatch 56.8100μs 36.9858μs 27.0374 KOps/s 27.5299 KOps/s $\color{#d91a1a}-1.79\%$
test_instantiation_functorch 1.7363ms 1.6881ms 592.3811 Ops/s 591.2192 Ops/s $\color{#35bf28}+0.20\%$
test_instantiation_td 1.6609ms 1.1927ms 838.4273 Ops/s 837.4719 Ops/s $\color{#35bf28}+0.11\%$
test_exec_functorch 0.2163ms 0.1612ms 6.2038 KOps/s 6.1525 KOps/s $\color{#35bf28}+0.83\%$
test_exec_functional_call 0.2102ms 0.1602ms 6.2426 KOps/s 6.2531 KOps/s $\color{#d91a1a}-0.17\%$
test_exec_td 0.1842ms 0.1479ms 6.7623 KOps/s 6.7717 KOps/s $\color{#d91a1a}-0.14\%$
test_exec_td_decorator 0.8268ms 0.1870ms 5.3479 KOps/s 5.3063 KOps/s $\color{#35bf28}+0.78\%$
test_vmap_mlp_speed[True-True] 1.2408ms 1.0615ms 942.0459 Ops/s 953.3747 Ops/s $\color{#d91a1a}-1.19\%$
test_vmap_mlp_speed[True-False] 0.7547ms 0.6058ms 1.6508 KOps/s 1.6662 KOps/s $\color{#d91a1a}-0.92\%$
test_vmap_mlp_speed[False-True] 1.1140ms 0.9684ms 1.0327 KOps/s 1.0407 KOps/s $\color{#d91a1a}-0.77\%$
test_vmap_mlp_speed[False-False] 0.6589ms 0.5365ms 1.8639 KOps/s 1.8994 KOps/s $\color{#d91a1a}-1.87\%$
test_vmap_mlp_speed_decorator[True-True] 67.3704ms 2.1700ms 460.8278 Ops/s 498.7410 Ops/s $\textbf{\color{#d91a1a}-7.60\%}$
test_vmap_mlp_speed_decorator[True-False] 1.1773ms 0.6521ms 1.5335 KOps/s 1.5518 KOps/s $\color{#d91a1a}-1.18\%$
test_vmap_mlp_speed_decorator[False-True] 2.2404ms 1.7650ms 566.5815 Ops/s 575.9941 Ops/s $\color{#d91a1a}-1.63\%$
test_vmap_mlp_speed_decorator[False-False] 1.0433ms 0.5538ms 1.8058 KOps/s 1.8436 KOps/s $\color{#d91a1a}-2.05\%$
test_vmap_transformer_speed[True-True] 12.6508ms 12.5271ms 79.8270 Ops/s 81.4172 Ops/s $\color{#d91a1a}-1.95\%$
test_vmap_transformer_speed[True-False] 8.3593ms 8.2163ms 121.7087 Ops/s 123.9095 Ops/s $\color{#d91a1a}-1.78\%$
test_vmap_transformer_speed[False-True] 14.2216ms 12.5113ms 79.9276 Ops/s 81.5993 Ops/s $\color{#d91a1a}-2.05\%$
test_vmap_transformer_speed[False-False] 8.3206ms 8.1458ms 122.7623 Ops/s 125.1145 Ops/s $\color{#d91a1a}-1.88\%$
test_vmap_transformer_speed_decorator[True-True] 65.3222ms 64.2980ms 15.5526 Ops/s 14.6537 Ops/s $\textbf{\color{#35bf28}+6.13\%}$
test_vmap_transformer_speed_decorator[True-False] 98.4854ms 21.4509ms 46.6181 Ops/s 50.9391 Ops/s $\textbf{\color{#d91a1a}-8.48\%}$
test_vmap_transformer_speed_decorator[False-True] 59.5347ms 58.4805ms 17.0997 Ops/s 17.3815 Ops/s $\color{#d91a1a}-1.62\%$
test_vmap_transformer_speed_decorator[False-False] 21.6782ms 19.5073ms 51.2627 Ops/s 52.1782 Ops/s $\color{#d91a1a}-1.75\%$
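
The Mean and Ops columns in these tables appear to be related by Ops = 1 / Mean, which is how pytest-benchmark defines operations per second. The snippet below is a minimal sketch of that conversion, assuming the tables follow that convention; the helper names are hypothetical and are not part of the benchmark tooling.

```python
# Duration suffixes seen in the tables, longest first so "ms" is not read as "s".
# Both the Greek mu and the micro sign are accepted for microseconds.
_UNITS = (("\u03bcs", 1e-6), ("\u00b5s", 1e-6), ("ms", 1e-3), ("s", 1.0))


def parse_duration(text: str) -> float:
    """Convert a cell such as '7.1552ms' into seconds."""
    for suffix, scale in _UNITS:
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognized duration: {text!r}")


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean runtime (assumed convention: 1 / mean)."""
    return 1.0 / mean_seconds


def format_ops(ops: float) -> str:
    """Render throughput with the Ops/s, KOps/s, MOps/s units used in the tables."""
    if ops >= 1e6:
        return f"{ops / 1e6:.4f} MOps/s"
    if ops >= 1e3:
        return f"{ops / 1e3:.4f} KOps/s"
    return f"{ops:.4f} Ops/s"


# Mean values copied from two rows of the GPU table.
for mean in ("12.7531\u03bcs", "7.1552ms"):
    print(mean, "->", format_ops(ops_per_second(parse_duration(mean))))
```

Since the Mean values are rounded to four decimals, the recomputed throughput may differ from the reported Ops value in the last digit.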

@vmoens merged commit 08597bc into main Dec 4, 2023
41 of 45 checks passed
@vmoens deleted the faster_dispatch branch December 4, 2023 14:59
Labels: CLA Signed, Performance
3 participants