[Performance] Faster _is_tensor_collection in eager mode #1060
ghstack-source-id: 5a94e1265ecfa50cb528fb5cf3c905816894a18e Pull Request resolved: #1060
ghstack-source-id: b81c1d243a7c72a9d5fd68bf8e65e97a934ae61c Pull Request resolved: #1060
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 48.2500μs | 21.0667μs | 47.4682 KOps/s | 39.2580 KOps/s | |
test_plain_set_stack_nested | 59.5610μs | 20.8463μs | 47.9702 KOps/s | 38.3060 KOps/s | |
test_plain_set_nested_inplace | 58.8300μs | 22.8751μs | 43.7157 KOps/s | 34.9390 KOps/s | |
test_plain_set_stack_nested_inplace | 53.6500μs | 22.7765μs | 43.9049 KOps/s | 35.2923 KOps/s | |
test_items | 40.3750μs | 4.1177μs | 242.8548 KOps/s | 240.0884 KOps/s | |
test_items_nested | 0.7091ms | 0.3396ms | 2.9448 KOps/s | 2.7995 KOps/s | |
test_items_nested_locked | 0.4021ms | 0.3416ms | 2.9278 KOps/s | 2.7795 KOps/s | |
test_items_nested_leaf | 0.1364ms | 70.4245μs | 14.1996 KOps/s | 12.4789 KOps/s | |
test_items_stack_nested | 0.4744ms | 0.3438ms | 2.9084 KOps/s | 2.7781 KOps/s | |
test_items_stack_nested_leaf | 0.1401ms | 73.4047μs | 13.6231 KOps/s | 11.8696 KOps/s | |
test_items_stack_nested_locked | 0.5358ms | 0.3432ms | 2.9141 KOps/s | 2.7521 KOps/s | |
test_keys | 43.1010μs | 3.4933μs | 286.2627 KOps/s | 282.0129 KOps/s | |
test_keys_nested | 0.2272ms | 0.1331ms | 7.5115 KOps/s | 5.4949 KOps/s | |
test_keys_nested_locked | 0.7383ms | 0.1382ms | 7.2378 KOps/s | 5.3100 KOps/s | |
test_keys_nested_leaf | 0.2047ms | 0.1138ms | 8.7866 KOps/s | 6.1738 KOps/s | |
test_keys_stack_nested | 0.2165ms | 0.1310ms | 7.6310 KOps/s | 5.5049 KOps/s | |
test_keys_stack_nested_leaf | 0.2366ms | 0.1120ms | 8.9301 KOps/s | 6.2364 KOps/s | |
test_keys_stack_nested_locked | 0.2583ms | 0.1370ms | 7.2991 KOps/s | 5.3443 KOps/s | |
test_values | 6.0192μs | 1.0232μs | 977.3002 KOps/s | 958.8754 KOps/s | |
test_values_nested | 0.1078ms | 54.9309μs | 18.2047 KOps/s | 14.1172 KOps/s | |
test_values_nested_locked | 0.1040ms | 54.7465μs | 18.2660 KOps/s | 14.2969 KOps/s | |
test_values_nested_leaf | 0.1060ms | 59.7055μs | 16.7489 KOps/s | 11.7159 KOps/s | |
test_values_stack_nested | 0.1104ms | 56.4951μs | 17.7006 KOps/s | 13.5126 KOps/s | |
test_values_stack_nested_leaf | 0.1093ms | 58.8037μs | 17.0057 KOps/s | 12.0896 KOps/s | |
test_values_stack_nested_locked | 0.1111ms | 56.7318μs | 17.6268 KOps/s | 13.8849 KOps/s | |
test_membership | 6.1229μs | 0.7447μs | 1.3428 MOps/s | 1.0802 MOps/s | |
test_membership_nested | 40.3050μs | 2.7546μs | 363.0278 KOps/s | 365.6285 KOps/s | |
test_membership_nested_leaf | 40.3150μs | 2.7667μs | 361.4357 KOps/s | 363.7756 KOps/s | |
test_membership_stacked_nested | 23.1440μs | 2.7543μs | 363.0693 KOps/s | 363.8361 KOps/s | |
test_membership_stacked_nested_leaf | 19.9380μs | 2.7545μs | 363.0371 KOps/s | 359.2849 KOps/s | |
test_membership_nested_last | 97.9530μs | 4.1270μs | 242.3057 KOps/s | 232.5992 KOps/s | |
test_membership_nested_leaf_last | 0.1033ms | 4.2224μs | 236.8343 KOps/s | 232.5689 KOps/s | |
test_membership_stacked_nested_last | 38.0820μs | 13.0259μs | 76.7700 KOps/s | 234.8537 KOps/s | |
test_membership_stacked_nested_leaf_last | 63.7390μs | 12.9831μs | 77.0233 KOps/s | 237.6599 KOps/s | |
test_nested_getleaf | 31.7100μs | 10.5999μs | 94.3408 KOps/s | 93.6465 KOps/s | |
test_nested_get | 66.0840μs | 9.9909μs | 100.0906 KOps/s | 98.0224 KOps/s | |
test_stacked_getleaf | 62.1070μs | 10.2914μs | 97.1681 KOps/s | 93.8099 KOps/s | |
test_stacked_get | 0.1156ms | 9.8284μs | 101.7457 KOps/s | 97.5586 KOps/s | |
test_nested_getitemleaf | 75.1210μs | 10.8964μs | 91.7734 KOps/s | 89.6881 KOps/s | |
test_nested_getitem | 38.3910μs | 10.1402μs | 98.6170 KOps/s | 95.9631 KOps/s | |
test_stacked_getitemleaf | 34.4040μs | 10.7634μs | 92.9077 KOps/s | 90.8159 KOps/s | |
test_stacked_getitem | 43.7210μs | 9.9139μs | 100.8685 KOps/s | 96.0182 KOps/s | |
test_lock_nested | 0.8903ms | 0.4858ms | 2.0584 KOps/s | 1.9805 KOps/s | |
test_lock_stack_nested | 0.6683ms | 0.4421ms | 2.2619 KOps/s | 2.1583 KOps/s | |
test_unlock_nested | 0.7998ms | 0.4091ms | 2.4441 KOps/s | 2.3713 KOps/s | |
test_unlock_stack_nested | 0.4398ms | 0.3591ms | 2.7848 KOps/s | 2.6521 KOps/s | |
test_flatten_speed | 0.1930ms | 91.1831μs | 10.9669 KOps/s | 9.8572 KOps/s | |
test_unflatten_speed | 0.8128ms | 0.4748ms | 2.1061 KOps/s | 1.9510 KOps/s | |
test_common_ops | 6.7560ms | 1.1131ms | 898.4159 Ops/s | 798.2737 Ops/s | |
test_creation | 17.9240μs | 2.1287μs | 469.7670 KOps/s | 483.2972 KOps/s | |
test_creation_empty | 0.1532ms | 16.8329μs | 59.4075 KOps/s | 50.0455 KOps/s | |
test_creation_nested_1 | 0.2316ms | 19.5480μs | 51.1561 KOps/s | 42.5009 KOps/s | |
test_creation_nested_2 | 57.1170μs | 23.8911μs | 41.8566 KOps/s | 36.2703 KOps/s | |
test_clone | 1.3485ms | 17.0067μs | 58.8005 KOps/s | 58.1392 KOps/s | |
test_getitem[int] | 0.7705ms | 16.8307μs | 59.4154 KOps/s | 61.4012 KOps/s | |
test_getitem[slice_int] | 0.1363ms | 31.7463μs | 31.4997 KOps/s | 32.8008 KOps/s | |
test_getitem[range] | 0.3331ms | 61.3720μs | 16.2941 KOps/s | 18.1392 KOps/s | |
test_getitem[tuple] | 0.1312ms | 26.2278μs | 38.1275 KOps/s | 39.6729 KOps/s | |
test_getitem[list] | 0.2894ms | 55.0059μs | 18.1799 KOps/s | 19.4995 KOps/s | |
test_setitem_dim[int] | 69.1190μs | 34.8380μs | 28.7043 KOps/s | 30.4479 KOps/s | |
test_setitem_dim[slice_int] | 0.1094ms | 63.1414μs | 15.8375 KOps/s | 16.0440 KOps/s | |
test_setitem_dim[range] | 0.1540ms | 84.3772μs | 11.8515 KOps/s | 12.0970 KOps/s | |
test_setitem_dim[tuple] | 96.8210μs | 51.1167μs | 19.5631 KOps/s | 19.7998 KOps/s | |
test_setitem | 0.1330ms | 29.7376μs | 33.6275 KOps/s | 31.9162 KOps/s | |
test_set | 0.1523ms | 28.9802μs | 34.5063 KOps/s | 32.0963 KOps/s | |
test_set_shared | 3.4310ms | 0.2136ms | 4.6818 KOps/s | 4.5434 KOps/s | |
test_update | 0.1298ms | 35.4735μs | 28.1901 KOps/s | 24.5163 KOps/s | |
test_update_nested | 0.1450ms | 47.5345μs | 21.0374 KOps/s | 19.5299 KOps/s | |
test_update__nested | 0.3751ms | 40.4692μs | 24.7101 KOps/s | 21.6401 KOps/s | |
test_set_nested | 76.9140μs | 31.0509μs | 32.2052 KOps/s | 29.5770 KOps/s | |
test_set_nested_new | 0.1086ms | 36.0845μs | 27.7127 KOps/s | 25.9822 KOps/s | |
test_select | 0.1472ms | 53.9420μs | 18.5384 KOps/s | 17.7232 KOps/s | |
test_select_nested | 0.1183ms | 60.5339μs | 16.5197 KOps/s | 16.3544 KOps/s | |
test_exclude_nested | 0.1370ms | 75.8951μs | 13.1761 KOps/s | 13.1935 KOps/s | |
test_empty[True] | 0.5032ms | 0.3460ms | 2.8904 KOps/s | 2.5147 KOps/s | |
test_empty[False] | 10.8905μs | 1.2353μs | 809.5345 KOps/s | 819.5789 KOps/s | |
test_unbind_speed | 0.5091ms | 0.3003ms | 3.3298 KOps/s | 3.3288 KOps/s | |
test_unbind_speed_stack0 | 0.5958ms | 0.2836ms | 3.5262 KOps/s | 3.4602 KOps/s | |
test_unbind_speed_stack1 | 0.1017s | 0.8665ms | 1.1541 KOps/s | 1.5257 KOps/s | |
test_split | 0.1004s | 2.2582ms | 442.8390 Ops/s | 446.3209 Ops/s | |
test_chunk | 3.3105ms | 2.0561ms | 486.3483 Ops/s | 450.9768 Ops/s | |
test_creation[device0] | 0.2477ms | 0.1155ms | 8.6560 KOps/s | 8.4757 KOps/s | |
test_creation_from_tensor | 3.7533ms | 0.1175ms | 8.5097 KOps/s | 8.5793 KOps/s | |
test_add_one[memmap_tensor0] | 0.2423ms | 7.0330μs | 142.1868 KOps/s | 143.7132 KOps/s | |
test_contiguous[memmap_tensor0] | 16.9420μs | 1.9206μs | 520.6819 KOps/s | 513.8350 KOps/s | |
test_stack[memmap_tensor0] | 61.4460μs | 5.4817μs | 182.4267 KOps/s | 186.3303 KOps/s | |
test_memmaptd_index | 1.1194ms | 0.4064ms | 2.4604 KOps/s | 2.4788 KOps/s | |
test_memmaptd_index_astensor | 0.9001ms | 0.4868ms | 2.0542 KOps/s | 1.9697 KOps/s | |
test_memmaptd_index_op | 1.6935ms | 1.0186ms | 981.7012 Ops/s | 939.0198 Ops/s | |
test_serialize_model | 0.2204s | 0.1288s | 7.7612 Ops/s | 8.3625 Ops/s | |
test_serialize_model_pickle | 0.4863s | 0.4043s | 2.4737 Ops/s | 2.5539 Ops/s | |
test_serialize_weights | 0.1281s | 0.1180s | 8.4770 Ops/s | 7.5318 Ops/s | |
test_serialize_weights_returnearly | 0.1848s | 0.1662s | 6.0160 Ops/s | 6.2356 Ops/s | |
test_serialize_weights_pickle | 0.5092s | 0.4124s | 2.4248 Ops/s | 1.1914 Ops/s | |
test_serialize_weights_filesystem | 0.2545s | 0.1597s | 6.2609 Ops/s | 7.1234 Ops/s | |
test_serialize_model_filesystem | 0.1579s | 0.1492s | 6.7029 Ops/s | 6.4489 Ops/s | |
test_reshape_pytree | 94.8080μs | 39.4816μs | 25.3282 KOps/s | 25.3056 KOps/s | |
test_reshape_td | 0.1214ms | 47.0335μs | 21.2614 KOps/s | 21.8255 KOps/s | |
test_view_pytree | 0.1070ms | 39.6057μs | 25.2489 KOps/s | 25.6919 KOps/s | |
test_view_td | 0.1119ms | 52.8317μs | 18.9280 KOps/s | 19.2084 KOps/s | |
test_unbind_pytree | 81.9930μs | 35.7736μs | 27.9536 KOps/s | 28.0005 KOps/s | |
test_unbind_td | 0.3176ms | 44.4723μs | 22.4859 KOps/s | 22.6871 KOps/s | |
test_split_pytree | 80.4810μs | 37.7650μs | 26.4795 KOps/s | 26.5296 KOps/s | |
test_split_td | 0.2022ms | 58.7057μs | 17.0341 KOps/s | 17.6941 KOps/s | |
test_add_pytree | 0.1809ms | 45.7554μs | 21.8554 KOps/s | 23.1873 KOps/s | |
test_add_td | 0.5826ms | 84.8475μs | 11.7859 KOps/s | 11.7214 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1340ms | 71.7378μs | 13.9397 KOps/s | 13.6938 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.4022ms | 0.1844ms | 5.4231 KOps/s | 4.9611 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.1056ms | 55.0855μs | 18.1536 KOps/s | 18.4879 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.4016ms | 0.1453ms | 6.8828 KOps/s | 6.9508 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 96.2710μs | 25.8852μs | 38.6321 KOps/s | 36.5849 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1302ms | 70.4170μs | 14.2011 KOps/s | 13.1710 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1786ms | 79.8043μs | 12.5306 KOps/s | 12.5246 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1268ms | 67.6045μs | 14.7919 KOps/s | 14.2204 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1962ms | 0.1145ms | 8.7339 KOps/s | 8.2124 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3949ms | 0.2055ms | 4.8669 KOps/s | 4.0256 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1203ms | 54.5434μs | 18.3340 KOps/s | 18.3820 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.4801ms | 69.6778μs | 14.3518 KOps/s | 13.2165 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.1998ms | 0.1122ms | 8.9091 KOps/s | 9.0338 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.6353ms | 0.3020ms | 3.3115 KOps/s | 3.3048 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.5401ms | 0.2223ms | 4.4990 KOps/s | 3.5354 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.1765ms | 0.1138ms | 8.7856 KOps/s | 8.0882 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.1302ms | 62.8456μs | 15.9120 KOps/s | 13.7504 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1162ms | 55.5115μs | 18.0143 KOps/s | 18.7636 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6510ms | 0.2452ms | 4.0788 KOps/s | 4.0545 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.3988ms | 0.1140ms | 8.7714 KOps/s | 9.0682 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 99.0850μs | 20.2872μs | 49.2921 KOps/s | 35.0785 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1201ms | 59.4285μs | 16.8269 KOps/s | 12.8840 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1735ms | 81.3497μs | 12.2926 KOps/s | 12.1380 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1526ms | 68.9795μs | 14.4971 KOps/s | 14.5152 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.3159ms | 0.2153ms | 4.6445 KOps/s | 4.7026 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.0626ms | 1.7280ms | 578.7038 Ops/s | 540.6940 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.2828ms | 0.2120ms | 4.7178 KOps/s | 4.8270 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.2393ms | 1.1515ms | 868.4358 Ops/s | 858.7958 Ops/s | |
test_compile_assign_and_add_stack[compile] | 0.5378ms | 0.4591ms | 2.1782 KOps/s | 2.2533 KOps/s | |
test_compile_assign_and_add_stack[eager] | 4.0685ms | 3.9331ms | 254.2534 Ops/s | 177.8062 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.1073ms | 45.3874μs | 22.0326 KOps/s | 23.6466 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5274ms | 51.5643μs | 19.3933 KOps/s | 20.7981 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 87.5640μs | 37.9871μs | 26.3248 KOps/s | 27.8972 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 67.4460μs | 30.3214μs | 32.9800 KOps/s | 34.5550 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 84.3780μs | 38.2366μs | 26.1530 KOps/s | 26.9185 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 95.4490μs | 30.4062μs | 32.8880 KOps/s | 33.7286 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1874ms | 78.6874μs | 12.7085 KOps/s | 12.8977 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.6009ms | 30.2885μs | 33.0158 KOps/s | 34.7950 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1488ms | 71.9681μs | 13.8950 KOps/s | 14.1341 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 57.2070μs | 23.9160μs | 41.8130 KOps/s | 42.0974 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1729ms | 71.7288μs | 13.9414 KOps/s | 13.9376 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 73.8390μs | 23.7756μs | 42.0600 KOps/s | 42.4165 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1473ms | 79.6479μs | 12.5553 KOps/s | 12.6781 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.8940ms | 29.9858μs | 33.3492 KOps/s | 35.7493 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.3861ms | 74.5765μs | 13.4091 KOps/s | 14.0053 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 62.7070μs | 23.7300μs | 42.1408 KOps/s | 42.5013 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1377ms | 72.5381μs | 13.7859 KOps/s | 14.1959 KOps/s | |
test_compile_indexing[int-pytree-eager] | 75.7010μs | 24.1019μs | 41.4904 KOps/s | 42.4293 KOps/s | |
test_mod_add[eager] | 70.0410μs | 24.7384μs | 40.4229 KOps/s | 36.7966 KOps/s | |
test_mod_add[compile] | 0.2910ms | 44.9061μs | 22.2687 KOps/s | 22.9547 KOps/s | |
test_mod_add[compile-overhead] | 0.1028ms | 43.8924μs | 22.7830 KOps/s | 22.9701 KOps/s | |
test_mod_wrap[eager] | 0.3933ms | 0.2156ms | 4.6377 KOps/s | 4.6658 KOps/s | |
test_mod_wrap[compile] | 2.0011ms | 0.2052ms | 4.8735 KOps/s | 4.9765 KOps/s | |
test_mod_wrap[compile-overhead] | 2.0703ms | 0.2072ms | 4.8269 KOps/s | 5.0190 KOps/s | |
test_mod_wrap_and_backward[eager] | 12.7917ms | 11.3613ms | 88.0183 Ops/s | 86.5272 Ops/s | |
test_mod_wrap_and_backward[compile] | 14.2751ms | 12.4995ms | 80.0031 Ops/s | 79.8547 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 16.4416ms | 13.3595ms | 74.8531 Ops/s | 80.3805 Ops/s | |
test_seq_add[eager] | 0.1699ms | 91.0993μs | 10.9770 KOps/s | 10.3432 KOps/s | |
test_seq_add[compile] | 0.1477ms | 60.9812μs | 16.3985 KOps/s | 17.3610 KOps/s | |
test_seq_add[compile-overhead] | 0.1330ms | 59.3165μs | 16.8587 KOps/s | 17.7466 KOps/s | |
test_seq_wrap[eager] | 0.5070ms | 0.3836ms | 2.6071 KOps/s | 2.5253 KOps/s | |
test_seq_wrap[compile] | 0.3288ms | 0.2315ms | 4.3191 KOps/s | 4.4253 KOps/s | |
test_seq_wrap[compile-overhead] | 0.4114ms | 0.2320ms | 4.3102 KOps/s | 4.4359 KOps/s | |
test_func_call_runtime[False-eager] | 1.5317ms | 0.5585ms | 1.7904 KOps/s | 1.8274 KOps/s | |
test_func_call_runtime[False-compile] | 0.5769ms | 0.4314ms | 2.3180 KOps/s | 2.3517 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.5890ms | 0.4304ms | 2.3233 KOps/s | 2.3686 KOps/s | |
test_func_call_runtime[True-eager] | 1.0982ms | 0.7751ms | 1.2902 KOps/s | 1.3222 KOps/s | |
test_func_call_runtime[True-compile] | 0.7678ms | 0.4756ms | 2.1025 KOps/s | 2.1690 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.5586ms | 0.4703ms | 2.1263 KOps/s | 2.1577 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.9458ms | 0.5460ms | 1.8314 KOps/s | 1.8254 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.5450ms | 0.4330ms | 2.3093 KOps/s | 2.3562 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.6401ms | 0.4325ms | 2.3124 KOps/s | 2.3646 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.4596ms | 0.9016ms | 1.1091 KOps/s | 1.0999 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.0061ms | 0.5070ms | 1.9724 KOps/s | 2.0499 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.7812ms | 0.4972ms | 2.0114 KOps/s | 2.0415 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.6180ms | 1.9216ms | 520.3874 Ops/s | 530.1247 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.8812ms | 0.5165ms | 1.9361 KOps/s | 1.9338 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.7318ms | 0.5264ms | 1.8996 KOps/s | 1.9368 KOps/s | |
test_distributed | 0.2988ms | 0.1294ms | 7.7296 KOps/s | 7.8179 KOps/s | |
test_tdmodule | 50.9360μs | 17.8901μs | 55.8969 KOps/s | 49.8083 KOps/s | |
test_tdmodule_dispatch | 76.0120μs | 34.7097μs | 28.8104 KOps/s | 25.4371 KOps/s | |
test_tdseq | 60.1320μs | 19.9842μs | 50.0395 KOps/s | 44.8797 KOps/s | |
test_tdseq_dispatch | 74.4790μs | 39.6027μs | 25.2508 KOps/s | 22.1753 KOps/s | |
test_instantiation_functorch | 2.8225ms | 1.5556ms | 642.8553 Ops/s | 657.5005 Ops/s | |
test_exec_functorch | 0.4630ms | 0.1825ms | 5.4801 KOps/s | 5.4044 KOps/s | |
test_exec_functional_call | 0.2885ms | 0.1758ms | 5.6882 KOps/s | 5.5691 KOps/s | |
test_exec_td_decorator | 0.5803ms | 0.2311ms | 4.3281 KOps/s | 4.1865 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.7846ms | 0.6432ms | 1.5548 KOps/s | 1.4816 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.9841ms | 0.6491ms | 1.5406 KOps/s | 1.5376 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.9726ms | 0.5343ms | 1.8716 KOps/s | 1.8778 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.8011ms | 0.5385ms | 1.8570 KOps/s | 1.8746 KOps/s | |
test_to_module_speed[True] | 1.5513ms | 1.3080ms | 764.5231 Ops/s | 710.2315 Ops/s | |
test_to_module_speed[False] | 1.4558ms | 1.2573ms | 795.3443 Ops/s | 743.5256 Ops/s | |
test_tc_init | 0.1043ms | 41.5140μs | 24.0883 KOps/s | 20.3004 KOps/s | |
test_tc_init_nested | 0.1525ms | 80.9316μs | 12.3561 KOps/s | 9.8198 KOps/s | |
test_tc_first_layer_tensor | 48.1900μs | 1.5060μs | 663.9985 KOps/s | 665.0462 KOps/s | |
test_tc_first_layer_nontensor | 30.2260μs | 4.6956μs | 212.9675 KOps/s | 215.2457 KOps/s | |
test_tc_second_layer_tensor | 29.7760μs | 2.7827μs | 359.3633 KOps/s | 358.4541 KOps/s | |
test_tc_second_layer_nontensor | 53.2390μs | 5.9628μs | 167.7055 KOps/s | 161.7132 KOps/s | |
test_unbind | 0.2521s | 16.1417ms | 61.9515 Ops/s | 84.4889 Ops/s | |
test_full_like | 10.3004ms | 8.5143ms | 117.4490 Ops/s | 142.3334 Ops/s | |
test_zeros_like | 4.0160ms | 3.3857ms | 295.3576 Ops/s | 366.0661 Ops/s | |
test_ones_like | 4.6103ms | 3.8859ms | 257.3425 Ops/s | 312.8564 Ops/s | |
test_clone | 6.9317ms | 5.6819ms | 175.9976 Ops/s | 201.6075 Ops/s | |
test_squeeze | 89.5270μs | 11.7123μs | 85.3802 KOps/s | 85.6542 KOps/s | |
test_unsqueeze | 0.1973ms | 89.4726μs | 11.1766 KOps/s | 11.1594 KOps/s | |
test_split | 0.4020ms | 0.1983ms | 5.0421 KOps/s | 5.2946 KOps/s | |
test_permute | 0.4850ms | 0.2233ms | 4.4780 KOps/s | 4.6805 KOps/s | |
test_stack | 31.6116ms | 26.1855ms | 38.1891 Ops/s | 38.2287 Ops/s | |
test_cat | 28.2903ms | 26.1830ms | 38.1928 Ops/s | 40.1111 Ops/s |
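
The Change column is empty in these results, but a relative change can be derived from the Ops and Ops on Repo HEAD columns. The following is a minimal sketch, assuming that change means (Ops - Ops on Repo HEAD) / Ops on Repo HEAD; the benchmark tooling may define it differently. The helper names `parse_ops` and `relative_change` are hypothetical and are not part of this repository.

```python
import re

# Multipliers for the throughput units that appear in the tables.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '47.4682 KOps/s' to operations per second."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([KM]?Ops/s)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognized throughput cell: {cell!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Assumed definition: (PR - HEAD) / HEAD, so positive means faster."""
    head = parse_ops(ops_head)
    return (parse_ops(ops_pr) - head) / head


# One row copied from the table above.
row = "test_plain_set_nested | 48.2500μs | 21.0667μs | 47.4682 KOps/s | 39.2580 KOps/s | |"
cells = [c.strip() for c in row.split("|")]
change = relative_change(cells[3], cells[4])
print(f"{cells[0]}: {change:+.1%}")
```

Normalizing units before dividing matters because some rows mix scales across runs, for example KOps/s in one table and MOps/s in the other for the same test.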

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 36.7410μs | 13.2511μs | 75.4654 KOps/s | 64.3158 KOps/s | |
test_plain_set_stack_nested | 34.4610μs | 13.4414μs | 74.3972 KOps/s | 62.7238 KOps/s | |
test_plain_set_nested_inplace | 0.1123ms | 14.3783μs | 69.5491 KOps/s | 58.4347 KOps/s | |
test_plain_set_stack_nested_inplace | 0.1155ms | 14.2812μs | 70.0219 KOps/s | 58.8784 KOps/s | |
test_items | 25.8510μs | 2.8813μs | 347.0664 KOps/s | 339.3808 KOps/s | |
test_items_nested | 0.3905ms | 0.3232ms | 3.0936 KOps/s | 3.0488 KOps/s | |
test_items_nested_locked | 0.3745ms | 0.3229ms | 3.0971 KOps/s | 3.0108 KOps/s | |
test_items_nested_leaf | 0.1316ms | 58.8213μs | 17.0006 KOps/s | 15.8392 KOps/s | |
test_items_stack_nested | 0.3714ms | 0.3239ms | 3.0874 KOps/s | 3.0354 KOps/s | |
test_items_stack_nested_leaf | 87.0410μs | 58.1053μs | 17.2101 KOps/s | 15.8691 KOps/s | |
test_items_stack_nested_locked | 0.4664ms | 0.3257ms | 3.0705 KOps/s | 3.0248 KOps/s | |
test_keys | 26.4500μs | 3.4663μs | 288.4919 KOps/s | 290.2263 KOps/s | |
test_keys_nested | 0.1045ms | 71.7823μs | 13.9310 KOps/s | 10.6284 KOps/s | |
test_keys_nested_locked | 0.7654ms | 76.9944μs | 12.9880 KOps/s | 9.8713 KOps/s | |
test_keys_nested_leaf | 0.1032ms | 61.5603μs | 16.2442 KOps/s | 11.6796 KOps/s | |
test_keys_stack_nested | 0.1120ms | 71.2926μs | 14.0267 KOps/s | 10.5766 KOps/s | |
test_keys_stack_nested_leaf | 90.8710μs | 61.6755μs | 16.2139 KOps/s | 11.6434 KOps/s | |
test_keys_stack_nested_locked | 0.1047ms | 77.5286μs | 12.8985 KOps/s | 9.9447 KOps/s | |
test_values | 4.9768μs | 0.8467μs | 1.1810 MOps/s | 1.1861 MOps/s | |
test_values_nested | 80.7110μs | 31.5281μs | 31.7178 KOps/s | 26.3006 KOps/s | |
test_values_nested_locked | 85.3110μs | 32.9198μs | 30.3769 KOps/s | 25.3202 KOps/s | |
test_values_nested_leaf | 57.8710μs | 33.5585μs | 29.7987 KOps/s | 22.0075 KOps/s | |
test_values_stack_nested | 0.2099ms | 31.5333μs | 31.7125 KOps/s | 26.2321 KOps/s | |
test_values_stack_nested_leaf | 0.1307ms | 33.8274μs | 29.5618 KOps/s | 21.3709 KOps/s | |
test_values_stack_nested_locked | 61.8600μs | 33.3324μs | 30.0008 KOps/s | 25.1800 KOps/s | |
test_membership | 1.7316μs | 0.5116μs | 1.9546 MOps/s | 1.9702 MOps/s | |
test_membership_nested | 16.7055μs | 1.9103μs | 523.4672 KOps/s | 511.2687 KOps/s | |
test_membership_nested_leaf | 66.2243μs | 1.8944μs | 527.8760 KOps/s | 529.5573 KOps/s | |
test_membership_stacked_nested | 27.7000μs | 1.9722μs | 507.0483 KOps/s | 511.9547 KOps/s | |
test_membership_stacked_nested_leaf | 28.7300μs | 1.9813μs | 504.7072 KOps/s | 517.1124 KOps/s | |
test_membership_nested_last | 26.6000μs | 2.8507μs | 350.7876 KOps/s | 335.2675 KOps/s | |
test_membership_nested_leaf_last | 29.9600μs | 2.8484μs | 351.0698 KOps/s | 335.8328 KOps/s | |
test_membership_stacked_nested_last | 27.9910μs | 2.8468μs | 351.2674 KOps/s | 333.0448 KOps/s | |
test_membership_stacked_nested_leaf_last | 23.1610μs | 2.8249μs | 353.9911 KOps/s | 336.6053 KOps/s | |
test_nested_getleaf | 36.0010μs | 6.0295μs | 165.8504 KOps/s | 167.0368 KOps/s | |
test_nested_get | 30.6600μs | 5.7068μs | 175.2311 KOps/s | 176.8073 KOps/s | |
test_stacked_getleaf | 25.1300μs | 6.0180μs | 166.1694 KOps/s | 166.9578 KOps/s | |
test_stacked_get | 32.4300μs | 5.7076μs | 175.2046 KOps/s | 176.6381 KOps/s | |
test_nested_getitemleaf | 26.3810μs | 6.0873μs | 164.2775 KOps/s | 165.1851 KOps/s | |
test_nested_getitem | 30.9610μs | 5.7712μs | 173.2744 KOps/s | 173.6592 KOps/s | |
test_stacked_getitemleaf | 38.1910μs | 6.1328μs | 163.0580 KOps/s | 165.3179 KOps/s | |
test_stacked_getitem | 0.1953ms | 5.7705μs | 173.2958 KOps/s | 173.9332 KOps/s | |
test_lock_nested | 1.1972ms | 0.4202ms | 2.3796 KOps/s | 2.3067 KOps/s | |
test_lock_stack_nested | 0.5057ms | 0.3887ms | 2.5728 KOps/s | 2.4783 KOps/s | |
test_unlock_nested | 0.7827ms | 0.3546ms | 2.8200 KOps/s | 2.7107 KOps/s | |
test_unlock_stack_nested | 0.4297ms | 0.3247ms | 3.0802 KOps/s | 2.9707 KOps/s | |
test_flatten_speed | 0.1155ms | 73.1371μs | 13.6729 KOps/s | 12.8702 KOps/s | |
test_unflatten_speed | 0.3261ms | 0.2943ms | 3.3978 KOps/s | 3.1245 KOps/s | |
test_common_ops | 1.7723ms | 1.1617ms | 860.8434 Ops/s | 829.1264 Ops/s | |
test_creation | 26.8000μs | 1.4888μs | 671.6893 KOps/s | 671.0429 KOps/s | |
test_creation_empty | 36.3400μs | 13.2256μs | 75.6107 KOps/s | 72.3391 KOps/s | |
test_creation_nested_1 | 38.1310μs | 14.7279μs | 67.8985 KOps/s | 64.9717 KOps/s | |
test_creation_nested_2 | 0.1590ms | 17.8139μs | 56.1361 KOps/s | 55.9864 KOps/s | |
test_clone | 0.2137ms | 28.3594μs | 35.2617 KOps/s | 33.7838 KOps/s | |
test_getitem[int] | 1.1198ms | 16.1814μs | 61.7992 KOps/s | 59.4819 KOps/s | |
test_getitem[slice_int] | 0.1296ms | 28.9717μs | 34.5164 KOps/s | 34.0418 KOps/s | |
test_getitem[range] | 0.2584ms | 0.1155ms | 8.6584 KOps/s | 8.9562 KOps/s | |
test_getitem[tuple] | 95.6467ms | 31.0818μs | 32.1732 KOps/s | 39.1356 KOps/s | |
test_getitem[list] | 0.2223ms | 0.1015ms | 9.8568 KOps/s | 9.7472 KOps/s | |
test_setitem_dim[int] | 73.6010μs | 44.6589μs | 22.3919 KOps/s | 22.6085 KOps/s | |
test_setitem_dim[slice_int] | 92.3910μs | 66.7575μs | 14.9796 KOps/s | 14.8573 KOps/s | |
test_setitem_dim[range] | 0.1624ms | 0.1286ms | 7.7778 KOps/s | 7.8247 KOps/s | |
test_setitem_dim[tuple] | 0.2073ms | 60.7023μs | 16.4739 KOps/s | 16.6689 KOps/s | |
test_setitem | 0.1872ms | 39.6158μs | 25.2424 KOps/s | 24.8585 KOps/s | |
test_set | 0.1900ms | 37.7828μs | 26.4671 KOps/s | 25.2042 KOps/s | |
test_set_shared | 0.3197ms | 49.7724μs | 20.0914 KOps/s | 18.6894 KOps/s | |
test_update | 0.1957ms | 45.5377μs | 21.9598 KOps/s | 20.9882 KOps/s | |
test_update_nested | 0.1978ms | 54.2353μs | 18.4382 KOps/s | 18.1476 KOps/s | |
test_update__nested | 0.4702ms | 64.8449μs | 15.4214 KOps/s | 15.7770 KOps/s | |
test_set_nested | 0.1910ms | 41.5301μs | 24.0789 KOps/s | 23.7422 KOps/s | |
test_set_nested_new | 0.1909ms | 43.5553μs | 22.9593 KOps/s | 21.8730 KOps/s | |
test_select | 0.2143ms | 57.3646μs | 17.4324 KOps/s | 17.0572 KOps/s | |
test_select_nested | 66.5110μs | 42.4638μs | 23.5495 KOps/s | 24.4498 KOps/s | |
test_exclude_nested | 87.5310μs | 59.4864μs | 16.8106 KOps/s | 17.1393 KOps/s | |
test_empty[True] | 0.3001ms | 0.2571ms | 3.8892 KOps/s | 3.5243 KOps/s | |
test_empty[False] | 2.9890μs | 0.7571μs | 1.3208 MOps/s | 1.3243 MOps/s | |
test_to | 51.0910μs | 25.0869μs | 39.8614 KOps/s | 38.3712 KOps/s | |
test_to_nonblocking | 57.9210μs | 23.9552μs | 41.7447 KOps/s | 40.7033 KOps/s | |
test_unbind_speed | 1.1173ms | 0.2790ms | 3.5847 KOps/s | 3.6598 KOps/s | |
test_unbind_speed_stack0 | 0.3769ms | 0.2747ms | 3.6409 KOps/s | 3.6276 KOps/s | |
test_unbind_speed_stack1 | 94.1016ms | 0.7043ms | 1.4199 KOps/s | 1.3986 KOps/s | |
test_split | 95.8361ms | 2.1297ms | 469.5537 Ops/s | 445.1482 Ops/s | |
test_chunk | 97.1866ms | 2.1474ms | 465.6784 Ops/s | 447.3076 Ops/s | |
test_to[False] | 3.3922ms | 3.1777ms | 314.6895 Ops/s | 294.2220 Ops/s | |
test_to[True] | 4.5318ms | 4.1691ms | 239.8588 Ops/s | 225.9464 Ops/s | |
test_to_njt[False] | 0.3252s | 0.2462s | 4.0610 Ops/s | 4.3046 Ops/s | |
test_to_njt[True] | 0.3610s | 0.2751s | 3.6344 Ops/s | 3.5367 Ops/s | |
test_creation[device0] | 0.3381ms | 0.1269ms | 7.8799 KOps/s | 7.7683 KOps/s | |
test_creation_from_tensor | 0.3901ms | 0.1291ms | 7.7439 KOps/s | 7.7077 KOps/s | |
test_add_one[memmap_tensor0] | 0.1452ms | 8.5055μs | 117.5714 KOps/s | 111.3350 KOps/s | |
test_contiguous[memmap_tensor0] | 22.2210μs | 2.1633μs | 462.2524 KOps/s | 457.3643 KOps/s | |
test_stack[memmap_tensor0] | 0.1590ms | 6.8111μs | 146.8189 KOps/s | 142.3959 KOps/s | |
test_memmaptd_index | 1.0183ms | 0.4138ms | 2.4165 KOps/s | 2.3423 KOps/s | |
test_memmaptd_index_astensor | 0.9500ms | 0.4727ms | 2.1154 KOps/s | 2.0033 KOps/s | |
test_memmaptd_index_op | 1.3387ms | 0.9544ms | 1.0478 KOps/s | 1.0084 KOps/s | |
test_serialize_model | 0.1334s | 0.1313s | 7.6149 Ops/s | 7.6623 Ops/s | |
test_serialize_model_pickle | 1.3497s | 1.2167s | 0.8219 Ops/s | 0.8416 Ops/s | |
test_serialize_weights | 0.1311s | 0.1298s | 7.7016 Ops/s | 7.6799 Ops/s | |
test_serialize_weights_returnearly | 0.2292s | 55.9839ms | 17.8623 Ops/s | 21.2561 Ops/s | |
test_serialize_weights_pickle | 1.3731s | 1.1957s | 0.8363 Ops/s | 0.8223 Ops/s | |
test_reshape_pytree | 88.2710μs | 34.9471μs | 28.6147 KOps/s | 26.7844 KOps/s | |
test_reshape_td | 0.1419ms | 42.6565μs | 23.4431 KOps/s | 23.9993 KOps/s | |
test_view_pytree | 0.1431ms | 35.0451μs | 28.5347 KOps/s | 28.1478 KOps/s | |
test_view_td | 0.1130ms | 45.0915μs | 22.1771 KOps/s | 21.7929 KOps/s | |
test_unbind_pytree | 0.1319ms | 33.4365μs | 29.9074 KOps/s | 29.2008 KOps/s | |
test_unbind_td | 98.6471ms | 48.5896μs | 20.5805 KOps/s | 22.9740 KOps/s | |
test_split_pytree | 0.1748ms | 45.4430μs | 22.0056 KOps/s | 21.7628 KOps/s | |
test_split_td | 0.1790ms | 56.0053μs | 17.8555 KOps/s | 17.5085 KOps/s | |
test_add_pytree | 0.2057ms | 55.1721μs | 18.1251 KOps/s | 17.4752 KOps/s | |
test_add_td | 0.2579ms | 87.8822μs | 11.3789 KOps/s | 11.0807 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.3103ms | 0.1611ms | 6.2088 KOps/s | 6.1492 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 0.3088ms | 0.1514ms | 6.6055 KOps/s | 6.2683 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 0.2934ms | 0.1527ms | 6.5501 KOps/s | 6.0905 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.3247ms | 0.1771ms | 5.6480 KOps/s | 5.4722 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 0.1498ms | 21.4729μs | 46.5704 KOps/s | 47.2176 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 81.2510μs | 44.9625μs | 22.2408 KOps/s | 20.5597 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.4599ms | 65.4711μs | 15.2739 KOps/s | 15.1315 KOps/s | |
test_compile_copy_nested[pytree-eager] | 79.5310μs | 50.0717μs | 19.9714 KOps/s | 19.8567 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.4211ms | 0.3098ms | 3.2280 KOps/s | 3.1400 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.3440ms | 0.2140ms | 4.6733 KOps/s | 4.2935 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.2850ms | 0.1281ms | 7.8046 KOps/s | 7.6571 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1921ms | 58.4381μs | 17.1121 KOps/s | 14.7166 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.4654ms | 0.3178ms | 3.1469 KOps/s | 3.0823 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.7619ms | 0.5987ms | 1.6702 KOps/s | 1.5210 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.4048ms | 0.2539ms | 3.9388 KOps/s | 3.4986 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.4459ms | 0.3116ms | 3.2093 KOps/s | 3.1342 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2181ms | 68.3597μs | 14.6285 KOps/s | 13.1206 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.3016ms | 0.1292ms | 7.7388 KOps/s | 7.5526 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.6757ms | 0.5046ms | 1.9816 KOps/s | 1.7465 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.4643ms | 0.3206ms | 3.1190 KOps/s | 3.0767 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 0.1655ms | 17.9428μs | 55.7325 KOps/s | 50.7746 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 64.1710μs | 31.4903μs | 31.7558 KOps/s | 24.7094 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1577ms | 69.7690μs | 14.3330 KOps/s | 14.3202 KOps/s | |
test_compile_copy_flat[pytree-eager] | 81.3110μs | 52.0342μs | 19.2181 KOps/s | 19.6041 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 2.4619ms | 0.8368ms | 1.1950 KOps/s | 1.1298 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 3.2627ms | 2.9992ms | 333.4188 Ops/s | 311.7406 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 2.3890ms | 0.8322ms | 1.2016 KOps/s | 1.1054 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 3.3224ms | 3.0910ms | 323.5227 Ops/s | 310.9027 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 0.2320ms | 0.1206ms | 8.2926 KOps/s | 8.3241 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.2208ms | 60.8046μs | 16.4461 KOps/s | 15.7253 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.2590ms | 0.1132ms | 8.8337 KOps/s | 8.7393 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 0.1999ms | 41.6314μs | 24.0204 KOps/s | 22.9450 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.2646ms | 0.1140ms | 8.7708 KOps/s | 8.6229 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.1851ms | 41.5988μs | 24.0391 KOps/s | 22.9716 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.3227ms | 0.1517ms | 6.5910 KOps/s | 6.7149 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.1695ms | 25.8097μs | 38.7451 KOps/s | 38.1579 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.3275ms | 0.1396ms | 7.1639 KOps/s | 6.9964 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 0.1523ms | 20.2823μs | 49.3041 KOps/s | 46.4260 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.2778ms | 0.1407ms | 7.1051 KOps/s | 6.9208 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 0.1799ms | 23.8332μs | 41.9582 KOps/s | 46.7144 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.3059ms | 0.1495ms | 6.6880 KOps/s | 6.6291 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.5167ms | 25.5365μs | 39.1596 KOps/s | 37.4229 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.2883ms | 0.1418ms | 7.0501 KOps/s | 6.9484 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 53.2710μs | 20.2527μs | 49.3762 KOps/s | 46.4282 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.3035ms | 0.1438ms | 6.9518 KOps/s | 6.9432 KOps/s | |
test_compile_indexing[int-pytree-eager] | 53.6410μs | 20.0095μs | 49.9763 KOps/s | 46.3403 KOps/s | |
test_mod_add[eager] | 0.2029ms | 31.3417μs | 31.9063 KOps/s | 32.9587 KOps/s | |
test_mod_add[compile] | 0.2847ms | 82.5584μs | 12.1126 KOps/s | 12.1125 KOps/s | |
test_mod_add[compile-overhead] | 0.3079ms | 0.1507ms | 6.6365 KOps/s | 5.8785 KOps/s | |
test_mod_wrap[eager] | 0.4402ms | 0.2525ms | 3.9601 KOps/s | 4.0155 KOps/s | |
test_mod_wrap[compile] | 0.6759ms | 0.2885ms | 3.4662 KOps/s | 3.3185 KOps/s | |
test_mod_wrap[compile-overhead] | 7.5270ms | 4.0217ms | 248.6537 Ops/s | 251.8105 Ops/s | |
test_mod_wrap_and_backward[eager] | 1.5461ms | 1.3580ms | 736.3780 Ops/s | 685.0928 Ops/s | |
test_mod_wrap_and_backward[compile] | 1.5795ms | 1.3187ms | 758.3118 Ops/s | 684.7835 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 1.3404ms | 0.9020ms | 1.1087 KOps/s | 969.2714 Ops/s | |
test_seq_add[eager] | 0.2407ms | 92.4745μs | 10.8138 KOps/s | 10.3659 KOps/s | |
test_seq_add[compile] | 0.2531ms | 94.8202μs | 10.5463 KOps/s | 10.4640 KOps/s | |
test_seq_add[compile-overhead] | 0.3308ms | 0.1251ms | 7.9928 KOps/s | 7.9163 KOps/s | |
test_seq_wrap[eager] | 0.5772ms | 0.3804ms | 2.6290 KOps/s | 2.6106 KOps/s | |
test_seq_wrap[compile] | 0.4713ms | 0.3178ms | 3.1463 KOps/s | 3.0011 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3734ms | 0.2197ms | 4.5516 KOps/s | 4.4365 KOps/s | |
test_func_call_runtime[False-eager] | 0.8943ms | 0.7325ms | 1.3652 KOps/s | 1.3386 KOps/s | |
test_func_call_runtime[False-compile] | 0.9219ms | 0.7734ms | 1.2930 KOps/s | 1.1895 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.5553ms | 0.3544ms | 2.8215 KOps/s | 2.7533 KOps/s | |
test_func_call_runtime[True-eager] | 1.0496ms | 0.8852ms | 1.1296 KOps/s | 1.0914 KOps/s | |
test_func_call_runtime[True-compile] | 0.9700ms | 0.7954ms | 1.2572 KOps/s | 1.2092 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.4981ms | 0.3727ms | 2.6831 KOps/s | 2.6208 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.9273ms | 0.7293ms | 1.3712 KOps/s | 1.3425 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.9303ms | 0.7712ms | 1.2967 KOps/s | 1.2328 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 1.0153ms | 0.3539ms | 2.8253 KOps/s | 2.7436 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.1635ms | 0.9817ms | 1.0186 KOps/s | 970.3053 Ops/s | |
test_func_call_cm_runtime[True-compile] | 0.9830ms | 0.8198ms | 1.2198 KOps/s | 1.1664 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 0.5517ms | 0.4005ms | 2.4966 KOps/s | 2.4477 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 2.5153ms | 2.0689ms | 483.3379 Ops/s | 464.7094 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.9921ms | 0.8335ms | 1.1998 KOps/s | 1.1472 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.5334ms | 0.4045ms | 2.4721 KOps/s | 2.4313 KOps/s | |
test_distributed | 2.5331ms | 0.1775ms | 5.6350 KOps/s | 8.8305 KOps/s | |
test_tdmodule | 0.3806ms | 13.1183μs | 76.2292 KOps/s | 71.7911 KOps/s | |
test_tdmodule_dispatch | 45.7610μs | 25.6057μs | 39.0538 KOps/s | 36.3383 KOps/s | |
test_tdseq | 34.0910μs | 13.9470μs | 71.7002 KOps/s | 66.4128 KOps/s | |
test_tdseq_dispatch | 48.3610μs | 28.4277μs | 35.1769 KOps/s | 33.7274 KOps/s | |
test_instantiation_functorch | 2.0045ms | 1.8327ms | 545.6442 Ops/s | 532.9391 Ops/s | |
test_exec_functorch | 0.3548ms | 0.2051ms | 4.8763 KOps/s | 4.6787 KOps/s | |
test_exec_functional_call | 0.3790ms | 0.2058ms | 4.8579 KOps/s | 4.6112 KOps/s | |
test_exec_td_decorator | 0.4418ms | 0.2514ms | 3.9783 KOps/s | 3.7933 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8253ms | 0.6671ms | 1.4991 KOps/s | 1.4493 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8565ms | 0.6658ms | 1.5019 KOps/s | 1.4569 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.7525ms | 0.5898ms | 1.6956 KOps/s | 1.6507 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7316ms | 0.5886ms | 1.6988 KOps/s | 1.6497 KOps/s | |
test_vmap_transformer_speed_decorator[True-True] | 19.6868ms | 19.4242ms | 51.4821 Ops/s | 50.7053 Ops/s | |
test_vmap_transformer_speed_decorator[True-False] | 19.5663ms | 19.4292ms | 51.4689 Ops/s | 50.5223 Ops/s | |
test_vmap_transformer_speed_decorator[False-True] | 19.4454ms | 19.2962ms | 51.8238 Ops/s | 50.9730 Ops/s | |
test_vmap_transformer_speed_decorator[False-False] | 19.5297ms | 19.2996ms | 51.8146 Ops/s | 50.9791 Ops/s | |
test_to_module_speed[True] | 1.4365ms | 0.9231ms | 1.0833 KOps/s | 1.0353 KOps/s | |
test_to_module_speed[False] | 0.9964ms | 0.9041ms | 1.1061 KOps/s | 1.0442 KOps/s | |
test_tc_init | 58.7810μs | 31.7816μs | 31.4647 KOps/s | 30.1014 KOps/s | |
test_tc_init_nested | 0.1091ms | 64.9444μs | 15.3978 KOps/s | 15.1522 KOps/s | |
test_tc_first_layer_tensor | 4.7044μs | 0.6920μs | 1.4451 MOps/s | 1.4419 MOps/s | |
test_tc_first_layer_nontensor | 25.8200μs | 2.3082μs | 433.2301 KOps/s | 428.7910 KOps/s | |
test_tc_second_layer_tensor | 7.8475μs | 1.4188μs | 704.8126 KOps/s | 698.7333 KOps/s | |
test_tc_second_layer_nontensor | 35.5400μs | 3.0307μs | 329.9528 KOps/s | 325.9762 KOps/s | |
test_unbind | 0.1998s | 9.4600ms | 105.7082 Ops/s | 93.2536 Ops/s | |
test_full_like | 0.7579ms | 0.5756ms | 1.7373 KOps/s | 1.7432 KOps/s | |
test_zeros_like | 0.3414ms | 0.1984ms | 5.0408 KOps/s | 5.0441 KOps/s | |
test_ones_like | 0.3495ms | 0.1981ms | 5.0489 KOps/s | 5.0545 KOps/s | |
test_clone | 0.5668ms | 0.4152ms | 2.4084 KOps/s | 2.4101 KOps/s | |
test_squeeze | 37.1300μs | 9.3870μs | 106.5302 KOps/s | 107.5889 KOps/s | |
test_unsqueeze | 0.2114ms | 68.5623μs | 14.5853 KOps/s | 14.4177 KOps/s | |
test_split | 0.4044ms | 0.1603ms | 6.2399 KOps/s | 6.0579 KOps/s | |
test_permute | 0.2757ms | 0.1719ms | 5.8184 KOps/s | 5.7960 KOps/s | |
test_stack | 1.2612ms | 0.8433ms | 1.1858 KOps/s | 1.1580 KOps/s | |
test_cat | 1.3151ms | 1.2314ms | 812.1023 Ops/s | 812.5440 Ops/s |
is_dynamo = is_dynamo_compiling()
out = None
if not is_dynamo:
    out = _TENSOR_COLLECTION_MEMO.get(datatype)
Why do we need the if-condition? Does compile not support dict inserts?
It's to avoid recompiles.
In #1015 I removed these checks entirely, but the eager-mode benchmarks suffered a lot (basically any iteration over tensordict leaves got 10x slower).
So I'm partially reverting that change. The compiler shouldn't care much about this kind of optimization, and reading the memo under compile would add a guard on _TENSOR_COLLECTION_MEMO,
which would be counterproductive.
Cool - thank you!
Stack from ghstack (oldest at bottom):