[BugFix] consistent use of non_blocking in tensordict and torch.Tensor #734

Merged 2 commits on Apr 18, 2024

@vmoens (Contributor) commented Apr 18, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Apr 18, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}6$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 46.0660μs 16.0714μs 62.2224 KOps/s 60.1791 KOps/s $\color{#35bf28}+3.40\%$
test_plain_set_stack_nested 38.6630μs 15.9613μs 62.6517 KOps/s 58.9656 KOps/s $\textbf{\color{#35bf28}+6.25\%}$
test_plain_set_nested_inplace 51.2960μs 18.7123μs 53.4408 KOps/s 52.3316 KOps/s $\color{#35bf28}+2.12\%$
test_plain_set_stack_nested_inplace 40.0650μs 19.0158μs 52.5877 KOps/s 51.9327 KOps/s $\color{#35bf28}+1.26\%$
test_items 28.6340μs 2.4990μs 400.1585 KOps/s 392.8515 KOps/s $\color{#35bf28}+1.86\%$
test_items_nested 0.4895ms 0.2690ms 3.7170 KOps/s 3.7101 KOps/s $\color{#35bf28}+0.19\%$
test_items_nested_locked 6.6776ms 0.2693ms 3.7130 KOps/s 3.6773 KOps/s $\color{#35bf28}+0.97\%$
test_items_nested_leaf 0.1488ms 76.7337μs 13.0321 KOps/s 12.7672 KOps/s $\color{#35bf28}+2.07\%$
test_items_stack_nested 1.3256ms 0.2715ms 3.6830 KOps/s 3.6542 KOps/s $\color{#35bf28}+0.79\%$
test_items_stack_nested_leaf 0.1525ms 78.5861μs 12.7249 KOps/s 13.1298 KOps/s $\color{#d91a1a}-3.08\%$
test_items_stack_nested_locked 0.3991ms 0.2722ms 3.6743 KOps/s 3.6420 KOps/s $\color{#35bf28}+0.89\%$
test_keys 32.2300μs 3.9301μs 254.4457 KOps/s 253.7211 KOps/s $\color{#35bf28}+0.29\%$
test_keys_nested 0.2377ms 0.1358ms 7.3632 KOps/s 7.3512 KOps/s $\color{#35bf28}+0.16\%$
test_keys_nested_locked 0.8783ms 0.1408ms 7.1007 KOps/s 7.1427 KOps/s $\color{#d91a1a}-0.59\%$
test_keys_nested_leaf 0.1984ms 0.1152ms 8.6780 KOps/s 8.7017 KOps/s $\color{#d91a1a}-0.27\%$
test_keys_stack_nested 0.2581ms 0.1370ms 7.3008 KOps/s 7.4877 KOps/s $\color{#d91a1a}-2.50\%$
test_keys_stack_nested_leaf 0.1944ms 0.1149ms 8.7055 KOps/s 8.8714 KOps/s $\color{#d91a1a}-1.87\%$
test_keys_stack_nested_locked 0.2690ms 0.1410ms 7.0911 KOps/s 7.2576 KOps/s $\color{#d91a1a}-2.29\%$
test_values 7.7620μs 1.1585μs 863.1550 KOps/s 862.4270 KOps/s $\color{#35bf28}+0.08\%$
test_values_nested 0.1050ms 50.6321μs 19.7503 KOps/s 19.5438 KOps/s $\color{#35bf28}+1.06\%$
test_values_nested_locked 91.2610μs 50.5478μs 19.7832 KOps/s 19.6746 KOps/s $\color{#35bf28}+0.55\%$
test_values_nested_leaf 93.5150μs 45.7974μs 21.8353 KOps/s 21.8742 KOps/s $\color{#d91a1a}-0.18\%$
test_values_stack_nested 0.1432ms 51.7879μs 19.3095 KOps/s 19.2234 KOps/s $\color{#35bf28}+0.45\%$
test_values_stack_nested_leaf 0.1004ms 45.7943μs 21.8368 KOps/s 22.2885 KOps/s $\color{#d91a1a}-2.03\%$
test_values_stack_nested_locked 83.7670μs 51.3221μs 19.4848 KOps/s 19.4272 KOps/s $\color{#35bf28}+0.30\%$
test_membership 17.1720μs 1.3590μs 735.8101 KOps/s 762.7415 KOps/s $\color{#d91a1a}-3.53\%$
test_membership_nested 28.9140μs 3.4539μs 289.5249 KOps/s 288.9927 KOps/s $\color{#35bf28}+0.18\%$
test_membership_nested_leaf 43.7820μs 3.4698μs 288.1975 KOps/s 282.4631 KOps/s $\color{#35bf28}+2.03\%$
test_membership_stacked_nested 29.4950μs 3.4930μs 286.2843 KOps/s 281.2370 KOps/s $\color{#35bf28}+1.79\%$
test_membership_stacked_nested_leaf 32.0000μs 3.4276μs 291.7473 KOps/s 289.8649 KOps/s $\color{#35bf28}+0.65\%$
test_membership_nested_last 29.2550μs 4.2474μs 235.4355 KOps/s 234.3747 KOps/s $\color{#35bf28}+0.45\%$
test_membership_nested_leaf_last 56.4390μs 4.2939μs 232.8889 KOps/s 231.6506 KOps/s $\color{#35bf28}+0.53\%$
test_membership_stacked_nested_last 28.3530μs 7.2851μs 137.2668 KOps/s 73.9942 KOps/s $\textbf{\color{#35bf28}+85.51\%}$
test_membership_stacked_nested_leaf_last 43.5820μs 7.3169μs 136.6702 KOps/s 74.1721 KOps/s $\textbf{\color{#35bf28}+84.26\%}$
test_nested_getleaf 40.3460μs 10.7366μs 93.1393 KOps/s 92.5121 KOps/s $\color{#35bf28}+0.68\%$
test_nested_get 32.0700μs 10.0224μs 99.7768 KOps/s 96.6559 KOps/s $\color{#35bf28}+3.23\%$
test_stacked_getleaf 44.4640μs 10.6192μs 94.1689 KOps/s 94.4023 KOps/s $\color{#d91a1a}-0.25\%$
test_stacked_get 95.0380μs 10.1014μs 98.9959 KOps/s 98.3904 KOps/s $\color{#35bf28}+0.62\%$
test_nested_getitemleaf 48.2500μs 11.2256μs 89.0819 KOps/s 87.9032 KOps/s $\color{#35bf28}+1.34\%$
test_nested_getitem 37.2190μs 10.3996μs 96.1577 KOps/s 96.2683 KOps/s $\color{#d91a1a}-0.11\%$
test_stacked_getitemleaf 29.5860μs 11.1651μs 89.5651 KOps/s 88.5772 KOps/s $\color{#35bf28}+1.12\%$
test_stacked_getitem 94.5070μs 10.4661μs 95.5462 KOps/s 92.4303 KOps/s $\color{#35bf28}+3.37\%$
test_lock_nested 50.7443ms 0.3951ms 2.5308 KOps/s 2.9129 KOps/s $\textbf{\color{#d91a1a}-13.12\%}$
test_lock_stack_nested 0.4254ms 0.3025ms 3.3053 KOps/s 3.4111 KOps/s $\color{#d91a1a}-3.10\%$
test_unlock_nested 99.8771ms 0.4499ms 2.2227 KOps/s 2.2003 KOps/s $\color{#35bf28}+1.02\%$
test_unlock_stack_nested 0.4525ms 0.3123ms 3.2016 KOps/s 3.2945 KOps/s $\color{#d91a1a}-2.82\%$
test_flatten_speed 0.5859ms 92.5311μs 10.8072 KOps/s 10.8323 KOps/s $\color{#d91a1a}-0.23\%$
test_unflatten_speed 0.6103ms 0.4097ms 2.4409 KOps/s 2.4842 KOps/s $\color{#d91a1a}-1.74\%$
test_common_ops 4.8074ms 0.6764ms 1.4785 KOps/s 1.4060 KOps/s $\textbf{\color{#35bf28}+5.15\%}$
test_creation 14.6780μs 1.9086μs 523.9375 KOps/s 537.1130 KOps/s $\color{#d91a1a}-2.45\%$
test_creation_empty 33.4820μs 8.7742μs 113.9710 KOps/s 99.6015 KOps/s $\textbf{\color{#35bf28}+14.43\%}$
test_creation_nested_1 40.3060μs 11.6237μs 86.0311 KOps/s 79.3169 KOps/s $\textbf{\color{#35bf28}+8.47\%}$
test_creation_nested_2 41.2570μs 14.8691μs 67.2534 KOps/s 61.4990 KOps/s $\textbf{\color{#35bf28}+9.36\%}$
test_clone 0.1097ms 13.6995μs 72.9953 KOps/s 73.7388 KOps/s $\color{#d91a1a}-1.01\%$
test_getitem[int] 29.6250μs 11.4072μs 87.6642 KOps/s 86.9499 KOps/s $\color{#35bf28}+0.82\%$
test_getitem[slice_int] 93.9290μs 22.7518μs 43.9525 KOps/s 42.7840 KOps/s $\color{#35bf28}+2.73\%$
test_getitem[range] 82.2040μs 42.5656μs 23.4932 KOps/s 23.6840 KOps/s $\color{#d91a1a}-0.81\%$
test_getitem[tuple] 63.1480μs 18.8821μs 52.9602 KOps/s 53.6705 KOps/s $\color{#d91a1a}-1.32\%$
test_getitem[list] 0.4537ms 38.3524μs 26.0740 KOps/s 26.0151 KOps/s $\color{#35bf28}+0.23\%$
test_setitem_dim[int] 67.1860μs 32.3917μs 30.8721 KOps/s 28.8167 KOps/s $\textbf{\color{#35bf28}+7.13\%}$
test_setitem_dim[slice_int] 0.1050ms 57.0606μs 17.5252 KOps/s 16.2082 KOps/s $\textbf{\color{#35bf28}+8.13\%}$
test_setitem_dim[range] 0.1391ms 75.3641μs 13.2689 KOps/s 12.7165 KOps/s $\color{#35bf28}+4.34\%$
test_setitem_dim[tuple] 91.6820μs 48.2449μs 20.7276 KOps/s 20.2070 KOps/s $\color{#35bf28}+2.58\%$
test_setitem 0.1495ms 19.3586μs 51.6565 KOps/s 48.5863 KOps/s $\textbf{\color{#35bf28}+6.32\%}$
test_set 0.1772ms 19.1330μs 52.2658 KOps/s 49.2695 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_set_shared 1.6746ms 0.1416ms 7.0620 KOps/s 6.9757 KOps/s $\color{#35bf28}+1.24\%$
test_update 0.1423ms 19.7734μs 50.5729 KOps/s 44.9133 KOps/s $\textbf{\color{#35bf28}+12.60\%}$
test_update_nested 0.1542ms 27.6864μs 36.1188 KOps/s 32.5868 KOps/s $\textbf{\color{#35bf28}+10.84\%}$
test_update__nested 0.1385ms 26.0721μs 38.3551 KOps/s 40.3502 KOps/s $\color{#d91a1a}-4.94\%$
test_set_nested 0.1208ms 20.1564μs 49.6121 KOps/s 45.1859 KOps/s $\textbf{\color{#35bf28}+9.80\%}$
test_set_nested_new 0.1459ms 24.5539μs 40.7267 KOps/s 38.5447 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_select 1.1191ms 39.3876μs 25.3887 KOps/s 24.8411 KOps/s $\color{#35bf28}+2.20\%$
test_select_nested 0.1194ms 59.3850μs 16.8393 KOps/s 16.6388 KOps/s $\color{#35bf28}+1.20\%$
test_exclude_nested 0.2700ms 0.1199ms 8.3425 KOps/s 8.4730 KOps/s $\color{#d91a1a}-1.54\%$
test_empty[True] 0.5607ms 0.3903ms 2.5620 KOps/s 2.5637 KOps/s $\color{#d91a1a}-0.07\%$
test_empty[False] 9.6030μs 1.0612μs 942.2968 KOps/s 959.8328 KOps/s $\color{#d91a1a}-1.83\%$
test_unbind_speed 1.8589ms 0.2507ms 3.9890 KOps/s 4.0156 KOps/s $\color{#d91a1a}-0.66\%$
test_unbind_speed_stack0 0.4808ms 0.2440ms 4.0985 KOps/s 4.1840 KOps/s $\color{#d91a1a}-2.04\%$
test_unbind_speed_stack1 0.1297s 0.6927ms 1.4436 KOps/s 1.4856 KOps/s $\color{#d91a1a}-2.83\%$
test_split 1.7122ms 1.4950ms 668.9110 Ops/s 576.0954 Ops/s $\textbf{\color{#35bf28}+16.11\%}$
test_chunk 0.1318s 1.6966ms 589.4188 Ops/s 663.9852 Ops/s $\textbf{\color{#d91a1a}-11.23\%}$
test_creation[device0] 0.2467ms 0.1020ms 9.8000 KOps/s 9.3962 KOps/s $\color{#35bf28}+4.30\%$
test_creation_from_tensor 5.7008ms 83.0387μs 12.0426 KOps/s 11.8533 KOps/s $\color{#35bf28}+1.60\%$
test_add_one[memmap_tensor0] 0.1032ms 5.5758μs 179.3458 KOps/s 171.6828 KOps/s $\color{#35bf28}+4.46\%$
test_contiguous[memmap_tensor0] 7.8950μs 0.6455μs 1.5493 MOps/s 1.5418 MOps/s $\color{#35bf28}+0.48\%$
test_stack[memmap_tensor0] 39.7040μs 3.5487μs 281.7901 KOps/s 271.3755 KOps/s $\color{#35bf28}+3.84\%$
test_memmaptd_index 0.9921ms 0.2443ms 4.0939 KOps/s 4.1867 KOps/s $\color{#d91a1a}-2.22\%$
test_memmaptd_index_astensor 0.7291ms 0.3080ms 3.2464 KOps/s 3.3141 KOps/s $\color{#d91a1a}-2.04\%$
test_memmaptd_index_op 1.1796ms 0.5805ms 1.7226 KOps/s 1.6667 KOps/s $\color{#35bf28}+3.35\%$
test_serialize_model 0.1147s 0.1037s 9.6418 Ops/s 8.2787 Ops/s $\textbf{\color{#35bf28}+16.47\%}$
test_serialize_model_pickle 0.4480s 0.3778s 2.6467 Ops/s 2.6246 Ops/s $\color{#35bf28}+0.84\%$
test_serialize_weights 0.1102s 0.1019s 9.8149 Ops/s 9.6875 Ops/s $\color{#35bf28}+1.31\%$
test_serialize_weights_returnearly 0.1355s 0.1237s 8.0819 Ops/s 6.9568 Ops/s $\textbf{\color{#35bf28}+16.17\%}$
test_serialize_weights_pickle 0.5801s 0.4506s 2.2194 Ops/s 2.3231 Ops/s $\color{#d91a1a}-4.46\%$
test_serialize_weights_filesystem 0.2307s 0.1127s 8.8702 Ops/s 10.9648 Ops/s $\textbf{\color{#d91a1a}-19.10\%}$
test_serialize_model_filesystem 0.1006s 93.4403ms 10.7020 Ops/s 10.3244 Ops/s $\color{#35bf28}+3.66\%$
test_reshape_pytree 66.5250μs 21.1236μs 47.3404 KOps/s 47.0566 KOps/s $\color{#35bf28}+0.60\%$
test_reshape_td 74.2600μs 33.5718μs 29.7869 KOps/s 30.0081 KOps/s $\color{#d91a1a}-0.74\%$
test_view_pytree 54.7720μs 21.0341μs 47.5418 KOps/s 46.5941 KOps/s $\color{#35bf28}+2.03\%$
test_view_td 0.1375s 68.4285μs 14.6138 KOps/s 15.8680 KOps/s $\textbf{\color{#d91a1a}-7.90\%}$
test_unbind_pytree 69.4900μs 24.5754μs 40.6911 KOps/s 41.1327 KOps/s $\color{#d91a1a}-1.07\%$
test_unbind_td 0.1706s 51.6720μs 19.3529 KOps/s 27.6053 KOps/s $\textbf{\color{#d91a1a}-29.89\%}$
test_split_pytree 56.9470μs 23.9957μs 41.6742 KOps/s 42.1310 KOps/s $\color{#d91a1a}-1.08\%$
test_split_td 0.1274ms 41.4795μs 24.1083 KOps/s 24.3467 KOps/s $\color{#d91a1a}-0.98\%$
test_add_pytree 0.1071ms 30.0780μs 33.2469 KOps/s 32.0342 KOps/s $\color{#35bf28}+3.79\%$
test_add_td 0.1173ms 51.3217μs 19.4849 KOps/s 18.6348 KOps/s $\color{#35bf28}+4.56\%$
test_distributed 0.2440ms 0.1007ms 9.9274 KOps/s 9.7970 KOps/s $\color{#35bf28}+1.33\%$
test_tdmodule 81.9540μs 16.7677μs 59.6385 KOps/s 57.5032 KOps/s $\color{#35bf28}+3.71\%$
test_tdmodule_dispatch 57.3970μs 33.0593μs 30.2486 KOps/s 29.4662 KOps/s $\color{#35bf28}+2.66\%$
test_tdseq 46.5670μs 19.4814μs 51.3310 KOps/s 49.7710 KOps/s $\color{#35bf28}+3.13\%$
test_tdseq_dispatch 74.1290μs 37.7541μs 26.4872 KOps/s 25.7731 KOps/s $\color{#35bf28}+2.77\%$
test_instantiation_functorch 1.5830ms 1.3047ms 766.4729 Ops/s 763.5343 Ops/s $\color{#35bf28}+0.38\%$
test_instantiation_td 1.6662ms 1.0233ms 977.2028 Ops/s 994.6530 Ops/s $\color{#d91a1a}-1.75\%$
test_exec_functorch 0.4182ms 0.1646ms 6.0738 KOps/s 6.2512 KOps/s $\color{#d91a1a}-2.84\%$
test_exec_functional_call 0.3173ms 0.1479ms 6.7591 KOps/s 6.8374 KOps/s $\color{#d91a1a}-1.14\%$
test_exec_td 0.2577ms 0.1451ms 6.8904 KOps/s 7.0272 KOps/s $\color{#d91a1a}-1.95\%$
test_exec_td_decorator 0.8829ms 0.2010ms 4.9750 KOps/s 5.0514 KOps/s $\color{#d91a1a}-1.51\%$
test_vmap_mlp_speed[True-True] 0.7057ms 0.4720ms 2.1188 KOps/s 2.0863 KOps/s $\color{#35bf28}+1.56\%$
test_vmap_mlp_speed[True-False] 0.7506ms 0.4685ms 2.1343 KOps/s 2.0826 KOps/s $\color{#35bf28}+2.48\%$
test_vmap_mlp_speed[False-True] 0.6248ms 0.3842ms 2.6029 KOps/s 2.5656 KOps/s $\color{#35bf28}+1.46\%$
test_vmap_mlp_speed[False-False] 0.6194ms 0.3859ms 2.5912 KOps/s 2.5525 KOps/s $\color{#35bf28}+1.52\%$
test_vmap_mlp_speed_decorator[True-True] 0.8924ms 0.4911ms 2.0364 KOps/s 1.9885 KOps/s $\color{#35bf28}+2.41\%$
test_vmap_mlp_speed_decorator[True-False] 0.7966ms 0.4940ms 2.0243 KOps/s 1.9802 KOps/s $\color{#35bf28}+2.23\%$
test_vmap_mlp_speed_decorator[False-True] 0.6869ms 0.4211ms 2.3749 KOps/s 2.4422 KOps/s $\color{#d91a1a}-2.75\%$
test_vmap_mlp_speed_decorator[False-False] 0.6685ms 0.4034ms 2.4789 KOps/s 2.4403 KOps/s $\color{#35bf28}+1.58\%$
test_to_module_speed[True] 2.5723ms 1.5608ms 640.6998 Ops/s 714.8989 Ops/s $\textbf{\color{#d91a1a}-10.38\%}$
test_to_module_speed[False] 2.8347ms 1.4103ms 709.0915 Ops/s 724.3699 Ops/s $\color{#d91a1a}-2.11\%$
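
For readers checking the table by hand, the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below recomputes it for a few rows copied from the table. The function names and the 5% cutoff are assumptions: the cutoff is inferred from the fact that exactly 18 bolded gains and 6 bolded losses in the table exceed 5% in magnitude, which matches the summary counts, but the benchmark bot's actual rule is not stated on this page.

```python
def percent_change(ops_pr: float, ops_head: float) -> float:
    """Relative change in throughput of the PR versus repo HEAD, in percent."""
    return (ops_pr / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the table."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "within noise"


# (Ops, Ops on Repo HEAD) pairs copied from the table above; units cancel out.
rows = {
    "test_plain_set_nested": (62.2224, 60.1791),
    "test_membership_stacked_nested_last": (137.2668, 73.9942),
    "test_update__nested": (38.3551, 40.3502),
    "test_unbind_td": (19.3529, 27.6053),
}

for name, (ops_pr, ops_head) in rows.items():
    change = percent_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under this reading, small swings such as the many entries between -3% and +3% do not count toward the Improved or Worsened totals.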

@vmoens added the bug label Apr 18, 2024
@vmoens merged commit 6f56bd2 into main Apr 18, 2024
41 of 48 checks passed
@vmoens deleted the fix-non_blocking branch October 21, 2024 14:03