[Performance] Minor efficiency improvements #703

Merged: 1 commit, Mar 8, 2024

@vmoens commented Mar 8, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Mar 8, 2024
@vmoens marked this pull request as ready for review March 8, 2024 10:21
@vmoens merged commit be7c991 into main Mar 8, 2024
16 of 33 checks passed

github-actions bot commented Mar 8, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}28$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 53.2790μs | 16.7948μs | 59.5421 KOps/s | 59.3768 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_plain_set_stack_nested | 49.0590μs | 16.8181μs | 59.4598 KOps/s | 59.8188 KOps/s | $\color{#d91a1a}-0.60\%$ |
| test_plain_set_nested_inplace | 46.6270μs | 19.1037μs | 52.3457 KOps/s | 52.5112 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_plain_set_stack_nested_inplace | 59.3610μs | 19.0214μs | 52.5724 KOps/s | 52.5418 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_items | 34.4040μs | 2.3238μs | 430.3339 KOps/s | 429.5628 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_items_nested | 0.4760ms | 0.2699ms | 3.7055 KOps/s | 3.7804 KOps/s | $\color{#d91a1a}-1.98\%$ |
| test_items_nested_locked | 0.3463ms | 0.2700ms | 3.7035 KOps/s | 3.7906 KOps/s | $\color{#d91a1a}-2.30\%$ |
| test_items_nested_leaf | 0.3395ms | 0.1654ms | 6.0472 KOps/s | 6.1905 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_items_stack_nested | 0.3429ms | 0.2711ms | 3.6884 KOps/s | 3.8277 KOps/s | $\color{#d91a1a}-3.64\%$ |
| test_items_stack_nested_leaf | 0.3059ms | 0.1663ms | 6.0124 KOps/s | 6.0949 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_items_stack_nested_locked | 0.3726ms | 0.2712ms | 3.6874 KOps/s | 3.8388 KOps/s | $\color{#d91a1a}-3.94\%$ |
| test_keys | 29.3440μs | 3.7209μs | 268.7534 KOps/s | 267.0487 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_keys_nested | 0.8200ms | 0.1439ms | 6.9501 KOps/s | 6.9868 KOps/s | $\color{#d91a1a}-0.53\%$ |
| test_keys_nested_locked | 0.2412ms | 0.1464ms | 6.8286 KOps/s | 6.7330 KOps/s | $\color{#35bf28}+1.42\%$ |
| test_keys_nested_leaf | 37.2150ms | 0.1300ms | 7.6947 KOps/s | 8.0764 KOps/s | $\color{#d91a1a}-4.73\%$ |
| test_keys_stack_nested | 0.2113ms | 0.1437ms | 6.9614 KOps/s | 6.8573 KOps/s | $\color{#35bf28}+1.52\%$ |
| test_keys_stack_nested_leaf | 0.2240ms | 0.1275ms | 7.8420 KOps/s | 7.7884 KOps/s | $\color{#35bf28}+0.69\%$ |
| test_keys_stack_nested_locked | 0.2172ms | 0.1533ms | 6.5247 KOps/s | 6.7255 KOps/s | $\color{#d91a1a}-2.99\%$ |
| test_values | 6.8446μs | 1.1218μs | 891.4559 KOps/s | 880.9146 KOps/s | $\color{#35bf28}+1.20\%$ |
| test_values_nested | 96.0890μs | 50.8660μs | 19.6595 KOps/s | 20.2135 KOps/s | $\color{#d91a1a}-2.74\%$ |
| test_values_nested_locked | 95.1760μs | 50.6384μs | 19.7478 KOps/s | 19.6699 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_values_nested_leaf | 84.9880μs | 45.7654μs | 21.8506 KOps/s | 22.0703 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_values_stack_nested | 0.1005ms | 51.0894μs | 19.5735 KOps/s | 19.9743 KOps/s | $\color{#d91a1a}-2.01\%$ |
| test_values_stack_nested_leaf | 93.1330μs | 45.5138μs | 21.9714 KOps/s | 22.3772 KOps/s | $\color{#d91a1a}-1.81\%$ |
| test_values_stack_nested_locked | 0.1183ms | 51.7904μs | 19.3086 KOps/s | 19.5553 KOps/s | $\color{#d91a1a}-1.26\%$ |
| test_membership | 28.9730μs | 1.3292μs | 752.3521 KOps/s | 769.7910 KOps/s | $\color{#d91a1a}-2.27\%$ |
| test_membership_nested | 27.7020μs | 3.3711μs | 296.6392 KOps/s | 291.9265 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_membership_nested_leaf | 34.6040μs | 3.4143μs | 292.8817 KOps/s | 289.8442 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_membership_stacked_nested | 26.9800μs | 3.4044μs | 293.7408 KOps/s | 241.0081 KOps/s | $\textbf{\color{#35bf28}+21.88\%}$ |
| test_membership_stacked_nested_leaf | 26.1180μs | 3.3952μs | 294.5316 KOps/s | 283.1872 KOps/s | $\color{#35bf28}+4.01\%$ |
| test_membership_nested_last | 33.5620μs | 4.1875μs | 238.8080 KOps/s | 233.0439 KOps/s | $\color{#35bf28}+2.47\%$ |
| test_membership_nested_leaf_last | 36.0870μs | 4.2018μs | 237.9932 KOps/s | 235.0371 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_membership_stacked_nested_last | 23.5830μs | 4.1911μs | 238.6009 KOps/s | 240.3064 KOps/s | $\color{#d91a1a}-0.71\%$ |
| test_membership_stacked_nested_leaf_last | 32.7110μs | 4.1781μs | 239.3413 KOps/s | 237.6010 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_nested_getleaf | 47.2370μs | 10.5358μs | 94.9145 KOps/s | 97.3747 KOps/s | $\color{#d91a1a}-2.53\%$ |
| test_nested_get | 41.0060μs | 9.8264μs | 101.7662 KOps/s | 100.5304 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_stacked_getleaf | 51.7540μs | 10.9491μs | 91.3314 KOps/s | 94.1009 KOps/s | $\color{#d91a1a}-2.94\%$ |
| test_stacked_get | 27.3410μs | 9.9294μs | 100.7115 KOps/s | 103.9720 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_nested_getitemleaf | 35.4360μs | 10.9399μs | 91.4085 KOps/s | 91.2843 KOps/s | $\color{#35bf28}+0.14\%$ |
| test_nested_getitem | 30.7370μs | 10.4969μs | 95.2663 KOps/s | 96.6390 KOps/s | $\color{#d91a1a}-1.42\%$ |
| test_stacked_getitemleaf | 30.7770μs | 10.9587μs | 91.2518 KOps/s | 91.8325 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_stacked_getitem | 45.3540μs | 10.3934μs | 96.2151 KOps/s | 96.8220 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_lock_nested | 0.7058ms | 0.3246ms | 3.0808 KOps/s | 3.0284 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_lock_stack_nested | 0.4135ms | 0.2957ms | 3.3815 KOps/s | 3.3492 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_unlock_nested | 74.9622ms | 0.4062ms | 2.4621 KOps/s | 2.4357 KOps/s | $\color{#35bf28}+1.08\%$ |
| test_unlock_stack_nested | 0.5010ms | 0.3047ms | 3.2814 KOps/s | 3.2523 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_flatten_speed | 0.5951ms | 0.2832ms | 3.5315 KOps/s | 3.6646 KOps/s | $\color{#d91a1a}-3.63\%$ |
| test_unflatten_speed | 0.6101ms | 0.4025ms | 2.4846 KOps/s | 2.5987 KOps/s | $\color{#d91a1a}-4.39\%$ |
| test_common_ops | 4.9187ms | 0.7067ms | 1.4150 KOps/s | 1.4967 KOps/s | $\textbf{\color{#d91a1a}-5.46\%}$ |
| test_creation | 16.3200μs | 1.7963μs | 556.7074 KOps/s | 570.5859 KOps/s | $\color{#d91a1a}-2.43\%$ |
| test_creation_empty | 48.0990μs | 10.9198μs | 91.5770 KOps/s | 105.6021 KOps/s | $\textbf{\color{#d91a1a}-13.28\%}$ |
| test_creation_nested_1 | 36.2770μs | 13.4935μs | 74.1098 KOps/s | 80.7965 KOps/s | $\textbf{\color{#d91a1a}-8.28\%}$ |
| test_creation_nested_2 | 44.8630μs | 16.5979μs | 60.2487 KOps/s | 63.7997 KOps/s | $\textbf{\color{#d91a1a}-5.57\%}$ |
| test_clone | 59.5810μs | 12.7096μs | 78.6806 KOps/s | 76.9423 KOps/s | $\color{#35bf28}+2.26\%$ |
| test_getitem[int] | 37.7200μs | 10.6934μs | 93.5153 KOps/s | 90.8638 KOps/s | $\color{#35bf28}+2.92\%$ |
| test_getitem[slice_int] | 60.3920μs | 22.0614μs | 45.3280 KOps/s | 44.9791 KOps/s | $\color{#35bf28}+0.78\%$ |
| test_getitem[range] | 0.1532ms | 43.4105μs | 23.0359 KOps/s | 24.1646 KOps/s | $\color{#d91a1a}-4.67\%$ |
| test_getitem[tuple] | 49.3120μs | 18.0811μs | 55.3065 KOps/s | 53.9962 KOps/s | $\color{#35bf28}+2.43\%$ |
| test_getitem[list] | 0.1613ms | 37.4906μs | 26.6734 KOps/s | 27.8949 KOps/s | $\color{#d91a1a}-4.38\%$ |
| test_setitem_dim[int] | 67.1940μs | 36.2458μs | 27.5894 KOps/s | 31.1813 KOps/s | $\textbf{\color{#d91a1a}-11.52\%}$ |
| test_setitem_dim[slice_int] | 0.1030ms | 63.1862μs | 15.8262 KOps/s | 17.4055 KOps/s | $\textbf{\color{#d91a1a}-9.07\%}$ |
| test_setitem_dim[range] | 0.1147ms | 83.2869μs | 12.0067 KOps/s | 12.6599 KOps/s | $\textbf{\color{#d91a1a}-5.16\%}$ |
| test_setitem_dim[tuple] | 87.0320μs | 51.5092μs | 19.4140 KOps/s | 20.3265 KOps/s | $\color{#d91a1a}-4.49\%$ |
| test_setitem | 73.4970μs | 20.0347μs | 49.9133 KOps/s | 53.5022 KOps/s | $\textbf{\color{#d91a1a}-6.71\%}$ |
| test_set | 70.9820μs | 19.4956μs | 51.2935 KOps/s | 55.0577 KOps/s | $\textbf{\color{#d91a1a}-6.84\%}$ |
| test_set_shared | 3.4358ms | 0.1388ms | 7.2039 KOps/s | 7.1631 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_update | 87.7030μs | 22.8901μs | 43.6869 KOps/s | 46.3136 KOps/s | $\textbf{\color{#d91a1a}-5.67\%}$ |
| test_update_nested | 84.5670μs | 29.7726μs | 33.5879 KOps/s | 35.6086 KOps/s | $\textbf{\color{#d91a1a}-5.67\%}$ |
| test_set_nested | 83.8750μs | 21.5288μs | 46.4494 KOps/s | 49.7746 KOps/s | $\textbf{\color{#d91a1a}-6.68\%}$ |
| test_set_nested_new | 60.9830μs | 24.8120μs | 40.3031 KOps/s | 42.3334 KOps/s | $\color{#d91a1a}-4.80\%$ |
| test_select | 86.7010μs | 39.1428μs | 25.5475 KOps/s | 26.8230 KOps/s | $\color{#d91a1a}-4.76\%$ |
| test_select_nested | 0.1586ms | 58.1795μs | 17.1882 KOps/s | 17.4385 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_exclude_nested | 0.2672ms | 0.1168ms | 8.5590 KOps/s | 8.8690 KOps/s | $\color{#d91a1a}-3.49\%$ |
| test_empty[True] | 0.7846ms | 0.4040ms | 2.4754 KOps/s | 2.4846 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_empty[False] | 5.0374μs | 1.0205μs | 979.9268 KOps/s | 1.0069 MOps/s | $\color{#d91a1a}-2.68\%$ |
| test_unbind_speed | 0.5116ms | 0.2383ms | 4.1959 KOps/s | 4.1918 KOps/s | $\color{#35bf28}+0.10\%$ |
| test_unbind_speed_stack0 | 0.3919ms | 0.2385ms | 4.1926 KOps/s | 4.1322 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_unbind_speed_stack1 | 0.1211s | 0.6744ms | 1.4827 KOps/s | 1.4973 KOps/s | $\color{#d91a1a}-0.97\%$ |
| test_split | 0.1052s | 1.5880ms | 629.7209 Ops/s | 629.3406 Ops/s | $\color{#35bf28}+0.06\%$ |
| test_chunk | 1.5112ms | 1.3812ms | 723.9841 Ops/s | 714.4765 Ops/s | $\color{#35bf28}+1.33\%$ |
| test_creation[device0] | 3.3523ms | 0.1024ms | 9.7633 KOps/s | 9.8519 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_creation_from_tensor | 0.1711ms | 80.5514μs | 12.4144 KOps/s | 12.2989 KOps/s | $\color{#35bf28}+0.94\%$ |
| test_add_one[memmap_tensor0] | 92.6520μs | 5.2961μs | 188.8179 KOps/s | 188.1702 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_contiguous[memmap_tensor0] | 10.8910μs | 0.6134μs | 1.6302 MOps/s | 1.6020 MOps/s | $\color{#35bf28}+1.76\%$ |
| test_stack[memmap_tensor0] | 21.0890μs | 3.4342μs | 291.1913 KOps/s | 269.0497 KOps/s | $\textbf{\color{#35bf28}+8.23\%}$ |
| test_memmaptd_index | 1.1414ms | 0.2298ms | 4.3517 KOps/s | 4.1371 KOps/s | $\textbf{\color{#35bf28}+5.19\%}$ |
| test_memmaptd_index_astensor | 0.6284ms | 0.2922ms | 3.4218 KOps/s | 3.3254 KOps/s | $\color{#35bf28}+2.90\%$ |
| test_memmaptd_index_op | 1.1774ms | 0.5923ms | 1.6884 KOps/s | 1.7039 KOps/s | $\color{#d91a1a}-0.91\%$ |
| test_serialize_model | 0.2081s | 0.1115s | 8.9663 Ops/s | 8.8171 Ops/s | $\color{#35bf28}+1.69\%$ |
| test_serialize_model_pickle | 0.4524s | 0.3753s | 2.6643 Ops/s | 2.6335 Ops/s | $\color{#35bf28}+1.17\%$ |
| test_serialize_weights | 0.1085s | 0.1003s | 9.9728 Ops/s | 10.2583 Ops/s | $\color{#d91a1a}-2.78\%$ |
| test_serialize_weights_returnearly | 0.2344s | 0.1314s | 7.6086 Ops/s | 7.2721 Ops/s | $\color{#35bf28}+4.63\%$ |
| test_serialize_weights_pickle | 1.0109s | 0.5860s | 1.7065 Ops/s | 2.4168 Ops/s | $\textbf{\color{#d91a1a}-29.39\%}$ |
| test_serialize_weights_filesystem | 0.1019s | 91.5194ms | 10.9266 Ops/s | 10.6056 Ops/s | $\color{#35bf28}+3.03\%$ |
| test_serialize_model_filesystem | 95.2376ms | 89.9942ms | 11.1118 Ops/s | 10.7953 Ops/s | $\color{#35bf28}+2.93\%$ |
| test_reshape_pytree | 45.2840μs | 20.6860μs | 48.3420 KOps/s | 50.3734 KOps/s | $\color{#d91a1a}-4.03\%$ |
| test_reshape_td | 81.8830μs | 30.8450μs | 32.4202 KOps/s | 33.2837 KOps/s | $\color{#d91a1a}-2.59\%$ |
| test_view_pytree | 67.8360μs | 20.6098μs | 48.5206 KOps/s | 50.1535 KOps/s | $\color{#d91a1a}-3.26\%$ |
| test_view_td | 0.1202s | 58.1472μs | 17.1977 KOps/s | 17.9954 KOps/s | $\color{#d91a1a}-4.43\%$ |
| test_unbind_pytree | 71.5730μs | 24.2345μs | 41.2635 KOps/s | 44.7255 KOps/s | $\textbf{\color{#d91a1a}-7.74\%}$ |
| test_unbind_td | 0.4184ms | 35.6790μs | 28.0277 KOps/s | 26.5691 KOps/s | $\textbf{\color{#35bf28}+5.49\%}$ |
| test_split_pytree | 69.3490μs | 23.8176μs | 41.9857 KOps/s | 43.5307 KOps/s | $\color{#d91a1a}-3.55\%$ |
| test_split_td | 0.1480ms | 38.9018μs | 25.7058 KOps/s | 26.1930 KOps/s | $\color{#d91a1a}-1.86\%$ |
| test_add_pytree | 66.8140μs | 29.5203μs | 33.8750 KOps/s | 35.5674 KOps/s | $\color{#d91a1a}-4.76\%$ |
| test_add_td | 0.1644ms | 56.0561μs | 17.8393 KOps/s | 19.6850 KOps/s | $\textbf{\color{#d91a1a}-9.38\%}$ |
| test_distributed | 0.3782ms | 98.5473μs | 10.1474 KOps/s | 10.3650 KOps/s | $\color{#d91a1a}-2.10\%$ |
| test_tdmodule | 68.4770μs | 18.1236μs | 55.1766 KOps/s | 58.1555 KOps/s | $\textbf{\color{#d91a1a}-5.12\%}$ |
| test_tdmodule_dispatch | 60.7220μs | 34.1705μs | 29.2650 KOps/s | 31.7089 KOps/s | $\textbf{\color{#d91a1a}-7.71\%}$ |
| test_tdseq | 49.4720μs | 20.8243μs | 48.0209 KOps/s | 52.0778 KOps/s | $\textbf{\color{#d91a1a}-7.79\%}$ |
| test_tdseq_dispatch | 72.6850μs | 39.7685μs | 25.1455 KOps/s | 26.4804 KOps/s | $\textbf{\color{#d91a1a}-5.04\%}$ |
| test_instantiation_functorch | 2.0346ms | 1.3031ms | 767.4175 Ops/s | 790.6928 Ops/s | $\color{#d91a1a}-2.94\%$ |
| test_instantiation_td | 6.3793ms | 0.9908ms | 1.0093 KOps/s | 1.0459 KOps/s | $\color{#d91a1a}-3.50\%$ |
| test_exec_functorch | 0.2480ms | 0.1574ms | 6.3537 KOps/s | 6.4934 KOps/s | $\color{#d91a1a}-2.15\%$ |
| test_exec_functional_call | 0.2908ms | 0.1496ms | 6.6836 KOps/s | 7.2649 KOps/s | $\textbf{\color{#d91a1a}-8.00\%}$ |
| test_exec_td | 0.3435ms | 0.1415ms | 7.0648 KOps/s | 7.3531 KOps/s | $\color{#d91a1a}-3.92\%$ |
| test_exec_td_decorator | 0.5375ms | 0.1941ms | 5.1531 KOps/s | 5.4406 KOps/s | $\textbf{\color{#d91a1a}-5.28\%}$ |
| test_vmap_mlp_speed[True-True] | 0.6527ms | 0.4758ms | 2.1018 KOps/s | 2.2168 KOps/s | $\textbf{\color{#d91a1a}-5.19\%}$ |
| test_vmap_mlp_speed[True-False] | 0.8998ms | 0.4759ms | 2.1012 KOps/s | 2.2206 KOps/s | $\textbf{\color{#d91a1a}-5.38\%}$ |
| test_vmap_mlp_speed[False-True] | 2.8308ms | 0.3911ms | 2.5568 KOps/s | 2.7174 KOps/s | $\textbf{\color{#d91a1a}-5.91\%}$ |
| test_vmap_mlp_speed[False-False] | 0.6258ms | 0.3854ms | 2.5944 KOps/s | 2.7411 KOps/s | $\textbf{\color{#d91a1a}-5.35\%}$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.9554ms | 0.4963ms | 2.0147 KOps/s | 2.1279 KOps/s | $\textbf{\color{#d91a1a}-5.32\%}$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.7598ms | 0.4954ms | 2.0187 KOps/s | 2.1263 KOps/s | $\textbf{\color{#d91a1a}-5.06\%}$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.5445ms | 0.4004ms | 2.4977 KOps/s | 2.6137 KOps/s | $\color{#d91a1a}-4.44\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7839ms | 0.4021ms | 2.4871 KOps/s | 2.6271 KOps/s | $\textbf{\color{#d91a1a}-5.33\%}$ |
| test_to_module_speed[True] | 2.1659ms | 1.3673ms | 731.3583 Ops/s | 756.6089 Ops/s | $\color{#d91a1a}-3.34\%$ |
| test_to_module_speed[False] | 1.8075ms | 1.3624ms | 733.9827 Ops/s | 746.4541 Ops/s | $\color{#d91a1a}-1.67\%$ |
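
For readers checking the numbers: the Change column appears to be the relative difference between the Ops value measured on this PR and the Ops value on repo HEAD. The sketch below reproduces three rows under that reading. The function names and the 5% cutoff are assumptions made for illustration (the cutoff is inferred from which rows are bold and from the summary counts of 4 improved and 28 worsened); this is not the benchmark bot's actual code.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; `threshold` is an assumed cutoff, not a documented one."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (name, Ops on this PR, Ops on repo HEAD); both values of a row share one unit.
rows = [
    ("test_plain_set_nested", 59.5421, 59.3768),
    ("test_membership_stacked_nested", 293.7408, 241.0081),
    ("test_serialize_weights_pickle", 1.7065, 2.4168),
]

for name, ops_pr, ops_head in rows:
    pct = relative_change(ops_pr, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Both Ops values must be in the same unit before they are compared. The `test_empty[False]` row mixes KOps/s and MOps/s, so one of the two would need converting first.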

@vmoens deleted the minor-efficiency branch March 8, 2024 14:36
vmoens added a commit that referenced this pull request Mar 24, 2024
vmoens added a commit that referenced this pull request Mar 25, 2024
Labels: CLA Signed, Performance