## Description

This PR implements `match` for string slices, including a radix trie optimization, and is a task of #5110.

For example, a simple `match` like

```
fn return_match_on_str_slice(param: str) -> u64 {
    match param {
        "get_a" => { 1u64 },
        "get_a_b" => { 2u64 },
        "get_b" => { 3u64 },
        _ => { 1000u64 },
    }
}
```

will generate code following this logic:

```
let packed_string = "get_a_b"
if str.len() == 5
    if str[0..4] == "get_" at packed_string[0]
        if str[4..5] == "b" at packed_string[6]
            return branch 2
        if str[4..5] == "a" at packed_string[4]
            return branch 0
        return wildcard branch
    return wildcard branch
if str.len() == 7
    if str[0..7] == "get_a_b" at packed_string[0]
        return branch 1
    return wildcard branch
return wildcard branch
```

In logical terms, this boils down to checking the length and doing an `O(N)` check on the string, although the bytecode will be more complex because of all the branches.

Another interesting optimization is the "packed string literal", which coalesces all match-arm string slices into just one string. In the case above, one of the arms (`"get_a_b"`) contains all the strings needed for the other comparisons, so we create just one string literal, saving a lot of bytes in the data section.

A section below describes how `rustc` deals with this desugaring. I think these choices make more sense to us for two reasons:

1 - Avoiding testing common prefixes multiple times will spend less gas in general (needs more testing);
2 - Packing all strings will decrease the data section size.
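As a rough illustration of the decision logic above, here is a hand-written Rust sketch (Rust rather than Sway, purely so it can be compiled standalone). The names `PACKED` and `match_packed` are hypothetical, and this is not the code the compiler emits; it only mirrors the length check, the single shared-prefix test, and the reuse of one packed literal.

```rust
// Hypothetical packed literal: every arm is a slice of this one string.
const PACKED: &[u8] = b"get_a_b";

// Hand-written equivalent of the decision tree for the three arms
// "get_a" => 1, "get_a_b" => 2, "get_b" => 3, wildcard => 1000.
fn match_packed(param: &str) -> u64 {
    let s = param.as_bytes();
    match s.len() {
        5 => {
            // The shared prefix "get_" is tested once, against PACKED[0..4].
            if s[0..4] != PACKED[0..4] {
                return 1000;
            }
            if s[4] == PACKED[6] {
                3 // "b" at PACKED[6] -> "get_b"
            } else if s[4] == PACKED[4] {
                1 // "a" at PACKED[4] -> "get_a"
            } else {
                1000
            }
        }
        // Only one arm has length 7, so compare the whole packed string.
        7 => {
            if s == PACKED {
                2 // "get_a_b"
            } else {
                1000
            }
        }
        _ => 1000,
    }
}
```

For instance, `match_packed("get_b")` evaluates to `3` and `match_packed("get_c")` falls through to `1000`; in both cases the `get_` prefix is compared only once.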
This is the bytecode generated in this case:

```
fn return_match_on_str_slice(param: str) -> u64 {
    match param {
        "get_a" => { 1u64 },
        "get_a_b" => { 2u64 },
        "get_b" => { 3u64 },
        _ => { 1000u64 },
    }
}
@ /home/xunilrj/github/sway/test/src/e2e_vm_tests/test_programs/should_pass/language/match_expressions_all/src/main.sw:22:1
0x0000017c PSHL 0xf                            ;; [149, 0, 0, 15]
0x00000180 PSHH 0x80000                        ;; [150, 8, 0, 0]
0x00000184 MOVE R59 $sp                        ;; [26, 236, 80, 0]
0x00000188 CFEI 0x90                           ;; [145, 0, 0, 144]
0x0000018c MOVE $writable R58                  ;; [26, 67, 160, 0]
0x00000190 MOVE R19 R62                        ;; [26, 79, 224, 0]
match param {
    "get_a" => { 1u64 },
    "get_a_b" => { 2u64 },
    "get_b" => { 3u64 },
    _ => { 1000u64 },
}
@ /home/xunilrj/github/sway/test/src/e2e_vm_tests/test_programs/should_pass/language/match_expressions_all/src/main.sw:23:5
0x00000194 ADDI R17 R59 0x80                   ;;
0x00000198 MOVI R18 0x10                       ;;
0x0000019c MCP R17 $writable R18               ;;
0x000001a0 MOVI R17 0x7                        ;; 0x7 = "get_a_b".len()
@ <autogenerated>:1:1
0x000001a4 LW $writable R59 0x11               ;; R59 + 0x11 = a.len()
0x000001a8 EQ $writable $writable R17          ;; a.len() == 0x7
0x000001ac JNZF $writable $zero 0x3c           ;; if false jump to 2a0?
0x000001b0 MOVI R17 0x5                        ;; we have two arms with length equals 0x5
0x000001b4 LW $writable R59 0x11               ;; R59 + 0x11 = a.len()
0x000001b8 EQ $writable $writable R17          ;; a.len() == 0x5
0x000001bc MOVI R17 0x3e8                      ;; 0x3e8 = 1000 (wildcard return value)
0x000001c0 JNZF $writable $zero 0x1            ;; if true jump to 1c8
0x000001c4 JMPF $zero 0x35                     ;; if false jump to 29c (will return R17)
0x000001c8 LW $writable R63 0x3                ;; R63 = start of data section, will load 13c
0x000001cc ADD $writable $writable $pc         ;; $writable = 0x308 = packed strings
0x000001d0 ADDI R17 R59 0x20                   ;;
0x000001d4 SW R59 $writable 0x4                ;; R59 + 0x4 = packed strings
0x000001d8 MOVI $writable 0x7                  ;;
0x000001dc SW R59 $writable 0x5                ;; R59 + 0x5 = 0x7
0x000001e0 ADDI $writable R59 0x30             ;;
0x000001e4 MOVI R18 0x10                       ;;
0x000001e8 MCP $writable R17 R18               ;; R59 + 0x30 = R59 + 0x20
0x000001ec MOVI R18 0x4                        ;; 0x4 = "get_".len()
0x000001f0 LW $writable R59 0x10               ;;
0x000001f4 ADDI $writable $writable 0x0        ;;
0x000001f8 LW R17 R59 0x6                      ;; R17 = a.ptr()
0x000001fc ADDI R17 R17 0x0                    ;;
0x00000200 MEQ $writable $writable R17 R18     ;; a[0..4] == packed[0..4]
0x00000204 MOVI R17 0x3e8                      ;; 0x3e8 = 1000 (wildcard return value)
0x00000208 JNZF $writable $zero 0x1            ;; if true jump to 210
0x0000020c JMPF $zero 0x23                     ;; if false jump to 29c (will return R17)
....
.data_section:
0x00000300 .bytes as hex ([]), len i0, as ascii ""
0x00000300 .word i18446744073709486084, as hex be bytes ([FF, FF, FF, FF, FF, FF, 00, 04])
0x00000308 .bytes as hex ([67, 65, 74, 5F, 61, 5F, 62]), len i7, as ascii "get_a_b"
0x00000310 .word i500, as hex be bytes ([00, 00, 00, 00, 00, 00, 01, F4])
0x00000318 .word i316, as hex be bytes ([00, 00, 00, 00, 00, 00, 01, 3C])
0x00000320 .word i244, as hex be bytes ([00, 00, 00, 00, 00, 00, 00, F4])
0x00000328 .word i176, as hex be bytes ([00, 00, 00, 00, 00, 00, 00, B0])
0x00000330 .word i100, as hex be bytes ([00, 00, 00, 00, 00, 00, 00, 64])
```

## How `rustc` desugars `match`

For comparison, this is the generated ASM, with comments on how Rust tackles this. First, this is the function used:

```
#[inline(never)]
fn f(a: &str) -> u64 {
    match a {
        "get_method" => 0,
        "get_tokens" => 1,
        "get_something_else" => 2,
        "get_tokens_2" => 3,
        "clear" => 4,
        "get_m" => 5,
        _ => 6,
    }
}
```

This is the generated LLVM IR. It switches on the length of the string slice, with one case per distinct arm length. The valid lengths range from 5 to 18; everything outside this range goes to the wildcard match arm. This range will be important later.
```
define internal fastcc noundef i64 @example::f::hdb860bcd6d383112(ptr noalias nocapture noundef nonnull readonly align 1 %a.0, i64 noundef %a.1) unnamed_addr {
start:
  switch i64 %a.1, label %bb13 [
    i64 10, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit"
    i64 18, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit30"
    i64 12, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit35"
    i64 5, label %"_ZN73_$LT$$u5b$A$u5d$$u20$as$u20$core..slice..cmp..SlicePartialEq$LT$B$GT$$GT$5equal17h510120b4d3581de7E.exit40"
  ]
```

This is how `f` is called:

```
mov rbx, qword ptr [rsp + 32]
mov r14, qword ptr [rsp + 40]
mov rsi, qword ptr [rsp + 48]    <- length of the string slice
mov rdi, r14                     <- ptr to string slice
call _ZN4main1f17h126a5dfd4e318ebcE
```

In the body of `f` below, `ja .LBB8_12` jumps to a simple return that leaves `EAX` as 6, the wildcard return value. The clever part is that when `RSI` is smaller than 5, `add rsi, -5` makes it negative, which wraps into a huge unsigned integer and therefore also triggers `JA` (which stands for `Jump Above`). The jump is thus taken whenever the slice length is outside the expected range of 5 to 18. After that, the code uses a jump table indexed by the string length minus 5. For every invalid string length, the jump address is `.LBB8_12`, which still returns `EAX` as 6.
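The single-comparison range check described above can be sketched in Rust as follows. This is only an illustration of the wrapping trick, not output from `rustc`; the function name `jump_table_index` is hypothetical, and the bound 13 is simply 18 - 5.

```rust
// Returns the jump-table index for a slice length, or None when the
// length can only match the wildcard arm.
fn jump_table_index(len: usize) -> Option<usize> {
    // Mirrors `add rsi, -5`: lengths below 5 wrap around to huge values.
    let idx = len.wrapping_sub(5);
    // Mirrors `cmp rsi, 13` followed by `ja`: one unsigned comparison
    // rejects both len < 5 and len > 18.
    if idx > 13 {
        None
    } else {
        Some(idx)
    }
}
```

With this sketch, a length of 10 maps to index 5, while lengths 4 and 19 both map to `None`.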
```
_ZN4main1f17h126a5dfd4e318ebcE:
    .cfi_startproc
    mov eax, 6
    add rsi, -5
    cmp rsi, 13
    ja .LBB8_12
    lea rcx, [rip + .LJTI8_0]
    movsxd rdx, dword ptr [rcx + 4*rsi]
    add rdx, rcx
    jmp rdx
```

```
.LBB8_12:
    ret
```

This is the jump table used:

```
.LJTI8_0:
    .long .LBB8_9-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_2-.LJTI8_0     <- index 5 is length = 10 (remember we add -5 to the length)
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_8-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_12-.LJTI8_0
    .long .LBB8_6-.LJTI8_0
```

The interesting entry is index 5, which covers two strings: "get_method" and "get_tokens". Here we can see that `rustc` actually compares the complete string slice twice, even though the two strings share a common prefix.

```
.LBB8_2:
    movabs rcx, 7526752397670245735=6874656D5F746567="htem_teg" (inverted "get_meth")
    xor rcx, qword ptr [rdi]
    movzx edx, word ptr [rdi + 8]
    xor rdx, 25711=646F="do" (inverted "od")
    or rdx, rcx
    je .LBB8_3
    movabs rcx, 7308057365947114855=656B6F745F746567="ekot_teg" (inverted "get_toke")
    xor rcx, qword ptr [rdi]
    movzx edx, word ptr [rdi + 8]
    xor rdx, 29550=736E="sn" (inverted "ns")
    or rdx, rcx
    je .LBB8_5
```

```
.LBB8_3:
    xor eax, eax    <- returns 0
    ret
```

```
.LBB8_5:
    mov eax, 1      <- returns 1
    ret
```

This is comparable to what `clang` is doing: rust-lang/rust#61961

## Code and Bytecode

This PR also prints the originating source code alongside the bytecode when bytecode is printed. For now this is only enabled for tests.
It generates something like:

```
match param {
    "get_a" => { 1u64 },
    "get_a_b" => { 2u64 },
    "get_b" => { 3u64 },
    _ => { 1000u64 },
}
@ /home/xunilrj/github/sway/test/src/e2e_vm_tests/test_programs/should_pass/language/match_expressions_all/src/main.sw:23:5
0x00000194 ADDI R17 R59 0x80                   ;;
0x00000198 MOVI R18 0x10                       ;;
0x0000019c MCP R17 $writable R18               ;;
0x000001a0 MOVI R17 0x7                        ;; 0x7 = "get_a_b".len()
@ <autogenerated>:1:1
0x000001a4 LW $writable R59 0x11               ;; R59 + 0x11 = a.len()
0x000001a8 EQ $writable $writable R17          ;; a.len() == 0x7
```

As we can see, the output is not great, but it is helpful nonetheless. We can (should?) improve this by better "carrying" spans through all transformations and lowerings.

## Checklist

- [x] I have linked to any relevant issues.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [ ] I have added tests that prove my fix is effective or that my feature works.
- [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant.
- [ ] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [ ] I have requested a review from the relevant team or maintainers.

---------

Co-authored-by: Joshua Batty <joshpbatty@gmail.com>
Co-authored-by: IGI-111 <igi-111@protonmail.com>