Use multi-reg load/store for EncodeToUtf8 #95513

Merged
merged 5 commits into from
Jan 10, 2024

SwapnilGaikwad
Contributor

This implements the encode to UTF8 algorithm here.

dotnet-issue-labeler bot added the needs-area-label label Dec 1, 2023
ghost added the community-contribution label Dec 1, 2023
@SwapnilGaikwad
Contributor Author

Hi @kunalspathak, this is an initial version of encode-to-UTF8 using multi-register loads/stores. It currently fails some asserts in the LSRA phase during crossgen of System.Private.CoreLib (SPC), while emitting StoreVector128x4AndZip. I'm not sure, but it seems the allocator cannot find four consecutive registers to emit ST4.
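
For context, ST4 takes a list of four vector registers that must be consecutively numbered (on AArch64 the list wraps modulo 32), so the allocator has to find such a run among the free registers. A minimal sketch of that search, assuming a 32-bit free-register mask; `ConsecutiveRegisters` and `FindRun` are hypothetical names for illustration and are not JIT code:

```csharp
using System;

// Hypothetical model: bit n of freeMask set means vector register vN is free.
public static class ConsecutiveRegisters
{
    // Returns the first register of a run of `count` free, consecutively
    // numbered registers (wrapping from v31 to v0), or -1 if none exists.
    public static int FindRun(uint freeMask, int count)
    {
        for (int first = 0; first < 32; first++)
        {
            bool allFree = true;
            for (int i = 0; i < count; i++)
            {
                int reg = (first + i) % 32; // register lists wrap modulo 32
                if ((freeMask & (1u << reg)) == 0)
                {
                    allFree = false;
                    break;
                }
            }
            if (allFree)
            {
                return first;
            }
        }
        return -1;
    }
}
```

With only v24 to v27 free, `FindRun(0xFu << 24, 4)` returns 24, which corresponds to the `{v24.16b, v25.16b, v26.16b, v27.16b}` list in the disassembly shared later in this thread.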

@danmoseley
Member

Does this need an entry in their part notices file?

Change-Id: Ie56b1786cdf8ac8d2067c0ba1fdfd3924dd9ca13
@SwapnilGaikwad
Contributor Author

> Does this need an entry in their part notices file?

Sorry @danmoseley , I didn't understand which part notices file you're referring to.

@SwapnilGaikwad
Contributor Author

Initial benchmarking on an N1 system shows good performance results.

| Method                          | Toolchain                                                                    | NumberOfBytes | Mean     | Error   | StdDev  | Median   | Min      | Max      | Ratio | MannWhitney(2%) |
|-------------------------------- |----------------------------------------------------------------------------- |-------------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|---------------- |
| Base64Encode                    | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun    | 1000          | 363.7 ns | 0.27 ns | 0.24 ns | 363.8 ns | 363.1 ns | 363.9 ns |  1.00 | Base            |
| Base64Encode                    | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000          | 185.7 ns | 0.26 ns | 0.22 ns | 185.6 ns | 185.5 ns | 186.2 ns |  0.51 | Faster          |
|                                 |                                                                              |               |          |         |         |          |          |          |       |                 |
| Base64EncodeDestinationTooSmall | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun    | 1000          | 369.3 ns | 0.09 ns | 0.09 ns | 369.4 ns | 369.2 ns | 369.4 ns |  1.00 | Base            |
| Base64EncodeDestinationTooSmall | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000          | 196.1 ns | 0.05 ns | 0.04 ns | 196.1 ns | 196.1 ns | 196.2 ns |  0.53 | Faster          |
|                                 |                                                                              |               |          |         |         |          |          |          |       |                 |
| ConvertToBase64CharArray        | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun    | 1000          | 560.9 ns | 0.29 ns | 0.23 ns | 561.0 ns | 560.4 ns | 561.1 ns |  1.00 | Base            |
| ConvertToBase64CharArray        | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000          | 400.0 ns | 0.20 ns | 0.19 ns | 399.9 ns | 399.7 ns | 400.3 ns |  0.71 | Faster          |

    if (src == srcEnd)
        goto DoneExit;
}

end = srcMax - 16;
if ((Ssse3.IsSupported || AdvSimd.Arm64.IsSupported) && BitConverter.IsLittleEndian && (end >= src))
Contributor

I'm not sure if we should remove the AdvSimd check here.

With this PR, it will use the vector128 version if the buffer length is <48 && >16.

That's probably the best option for speed, but results in a bigger library.
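
The trade-off can be pictured as a three-way choice on the remaining input length. A rough sketch, assuming the multi-register loop needs at least 48 input bytes per iteration and the Vector128 path roughly 16 or more (thresholds inferred from this comment and the `srcMax - 16` check above); `EncodePath` and `PathSelection.Choose` are hypothetical and only model the decision, not the actual control flow of the encoder:

```csharp
using System;

// Hypothetical labels for the three code paths discussed in this comment.
public enum EncodePath { Scalar, Vector128, MultiRegister }

public static class PathSelection
{
    // Picks the widest path that the hardware supports and the remaining
    // input can feed. Thresholds are assumptions, not values from the PR.
    public static EncodePath Choose(int remainingBytes, bool advSimdArm64, bool vector128)
    {
        if (advSimdArm64 && remainingBytes >= 48)
        {
            return EncodePath.MultiRegister; // 48 bytes in, 64 characters out per iteration
        }
        if (vector128 && remainingBytes >= 16)
        {
            return EncodePath.Vector128;
        }
        return EncodePath.Scalar;
    }
}
```

Keeping the AdvSimd check means an Arm64 build carries both vector paths, which is the library-size cost mentioned above.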

@kunalspathak
Member

@SwapnilGaikwad - do you mind sharing the disassembly?

@SwapnilGaikwad
Contributor Author

SwapnilGaikwad commented Dec 13, 2023

> @SwapnilGaikwad - do you mind sharing the disassembly?

A separately compiled AdvSimdEncode emits the following assembly.

Full assembly
; Assembly listing for method JIT.HardwareIntrinsics.Arm._AdvSimd.Program:AdvSimdEncode(byref,byref,ulong,int,int,ulong,ulong,ulong) (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Unix
; FullOpts code
; optimized code
; fp based frame
; fully interruptible
; No PGO data
; 4 inlinees with PGO data; 13 single block inlinees; 4 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T03] (  4,  4   )   byref  ->   x0         single-def
;  V01 arg1         [V01,T04] (  4,  4   )   byref  ->   x1         single-def
;  V02 arg2         [V02,T02] (  3, 10   )    long  ->   x2         single-def
;* V03 arg3         [V03    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V04 arg4         [V04    ] (  0,  0   )     int  ->  zero-ref    single-def
;* V05 arg5         [V05    ] (  0,  0   )    long  ->  zero-ref    single-def
;* V06 arg6         [V06    ] (  0,  0   )    long  ->  zero-ref    single-def
;* V07 arg7         [V07    ] (  0,  0   )    long  ->  zero-ref    single-def
;  V08 loc0         [V08,T00] (  6, 34   )    long  ->   x3
;  V09 loc1         [V09,T01] (  5, 26   )    long  ->   x4
;* V10 loc2         [V10    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V11 loc3         [V11    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V12 loc4         [V12    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V13 loc5         [V13    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V14 loc6         [V14    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V15 loc7         [V15    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;* V16 loc8         [V16    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V17 loc9         [V17,T06] (  5, 33   )  simd16  ->  d16         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V18 loc10        [V18,T07] (  5, 33   )  simd16  ->  d17         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V19 loc11        [V19,T08] (  5, 33   )  simd16  ->  d18         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;  V20 loc12        [V20,T09] (  5, 33   )  simd16  ->  d19         HFA(simd16)  <System.Runtime.Intrinsics.Vector128`1[ubyte]>
;# V21 OutArgs      [V21    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V22 tmp1         [V22    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V23 tmp2         [V23    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V24 tmp3         [V24    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V25 tmp4         [V25    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V26 tmp5         [V26    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V27 tmp6         [V27    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V28 tmp7         [V28    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "NewObj constructor temp" <System.ReadOnlySpan`1[ubyte]>
;* V29 tmp8         [V29    ] (  0,  0   )  simd16  ->  zero-ref    "spilled call-like call argument"
;* V30 tmp9         [V30    ] (  0,  0   )  struct (48) zero-ref    HFA(simd16)  multireg-ret "Return value temp for multireg return" <System.ValueTuple`3[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V31 tmp10        [V31    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V32 tmp11        [V32    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V33 tmp12        [V33    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V34 tmp13        [V34    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V35 tmp14        [V35    ] (  0,  0   )  struct (64) zero-ref    HFA(simd16)  ld-addr-op "NewObj constructor temp" <System.ValueTuple`4[System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte],System.Runtime.Intrinsics.Vector128`1[ubyte]]>
;* V36 tmp15        [V36    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V37 tmp16        [V37    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V38 tmp17        [V38    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V39 tmp18        [V39    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V40 tmp19        [V40    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V41 tmp20        [V41    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V42 tmp21        [V42    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V43 tmp22        [V43    ] (  0,  0   )  struct (16) zero-ref    ld-addr-op "Inlining Arg" <System.ReadOnlySpan`1[ubyte]>
;* V44 tmp23        [V44    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V22._reference (fldOffset=0x0)" P-INDEP
;* V45 tmp24        [V45    ] (  0,  0   )     int  ->  zero-ref    single-def "field V22._length (fldOffset=0x8)" P-INDEP
;* V46 tmp25        [V46    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V24._reference (fldOffset=0x0)" P-INDEP
;* V47 tmp26        [V47    ] (  0,  0   )     int  ->  zero-ref    single-def "field V24._length (fldOffset=0x8)" P-INDEP
;* V48 tmp27        [V48    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V26._reference (fldOffset=0x0)" P-INDEP
;* V49 tmp28        [V49    ] (  0,  0   )     int  ->  zero-ref    single-def "field V26._length (fldOffset=0x8)" P-INDEP
;* V50 tmp29        [V50    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V28._reference (fldOffset=0x0)" P-INDEP
;* V51 tmp30        [V51    ] (  0,  0   )     int  ->  zero-ref    single-def "field V28._length (fldOffset=0x8)" P-INDEP
;  V52 tmp31        [V52,T11] (  3, 24   )  simd16  ->  d21         HFA(simd16)  "field V30.Item1 (fldOffset=0x0)" P-INDEP
;  V53 tmp32        [V53,T12] (  3, 24   )  simd16  ->  d22         HFA(simd16)  "field V30.Item2 (fldOffset=0x10)" P-INDEP
;  V54 tmp33        [V54,T13] (  3, 24   )  simd16  ->  d23         HFA(simd16)  "field V30.Item3 (fldOffset=0x20)" P-INDEP
;* V55 tmp34        [V55    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item1 (fldOffset=0x0)" P-INDEP
;* V56 tmp35        [V56    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item2 (fldOffset=0x10)" P-INDEP
;* V57 tmp36        [V57    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item3 (fldOffset=0x20)" P-INDEP
;* V58 tmp37        [V58    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V31.Item4 (fldOffset=0x30)" P-INDEP
;* V59 tmp38        [V59    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item1 (fldOffset=0x0)" P-INDEP
;* V60 tmp39        [V60    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item2 (fldOffset=0x10)" P-INDEP
;* V61 tmp40        [V61    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item3 (fldOffset=0x20)" P-INDEP
;* V62 tmp41        [V62    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V32.Item4 (fldOffset=0x30)" P-INDEP
;* V63 tmp42        [V63    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item1 (fldOffset=0x0)" P-INDEP
;* V64 tmp43        [V64    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item2 (fldOffset=0x10)" P-INDEP
;* V65 tmp44        [V65    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item3 (fldOffset=0x20)" P-INDEP
;* V66 tmp45        [V66    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V33.Item4 (fldOffset=0x30)" P-INDEP
;* V67 tmp46        [V67    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item1 (fldOffset=0x0)" P-INDEP
;* V68 tmp47        [V68    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item2 (fldOffset=0x10)" P-INDEP
;* V69 tmp48        [V69    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item3 (fldOffset=0x20)" P-INDEP
;* V70 tmp49        [V70    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V34.Item4 (fldOffset=0x30)" P-INDEP
;* V71 tmp50        [V71    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item1 (fldOffset=0x0)" P-INDEP
;* V72 tmp51        [V72    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item2 (fldOffset=0x10)" P-INDEP
;* V73 tmp52        [V73    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item3 (fldOffset=0x20)" P-INDEP
;* V74 tmp53        [V74    ] (  0,  0   )  simd16  ->  zero-ref    HFA(simd16)  "field V35.Item4 (fldOffset=0x30)" P-INDEP
;* V75 tmp54        [V75    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V36._reference (fldOffset=0x0)" P-INDEP
;* V76 tmp55        [V76    ] (  0,  0   )     int  ->  zero-ref    single-def "field V36._length (fldOffset=0x8)" P-INDEP
;* V77 tmp56        [V77    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V37._reference (fldOffset=0x0)" P-INDEP
;* V78 tmp57        [V78    ] (  0,  0   )     int  ->  zero-ref    "field V37._length (fldOffset=0x8)" P-INDEP
;* V79 tmp58        [V79    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V38._reference (fldOffset=0x0)" P-INDEP
;* V80 tmp59        [V80    ] (  0,  0   )     int  ->  zero-ref    single-def "field V38._length (fldOffset=0x8)" P-INDEP
;* V81 tmp60        [V81    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V39._reference (fldOffset=0x0)" P-INDEP
;* V82 tmp61        [V82    ] (  0,  0   )     int  ->  zero-ref    "field V39._length (fldOffset=0x8)" P-INDEP
;* V83 tmp62        [V83    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V40._reference (fldOffset=0x0)" P-INDEP
;* V84 tmp63        [V84    ] (  0,  0   )     int  ->  zero-ref    single-def "field V40._length (fldOffset=0x8)" P-INDEP
;* V85 tmp64        [V85    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V41._reference (fldOffset=0x0)" P-INDEP
;* V86 tmp65        [V86    ] (  0,  0   )     int  ->  zero-ref    "field V41._length (fldOffset=0x8)" P-INDEP
;* V87 tmp66        [V87    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V42._reference (fldOffset=0x0)" P-INDEP
;* V88 tmp67        [V88    ] (  0,  0   )     int  ->  zero-ref    single-def "field V42._length (fldOffset=0x8)" P-INDEP
;* V89 tmp68        [V89    ] (  0,  0   )   byref  ->  zero-ref    single-def "field V43._reference (fldOffset=0x0)" P-INDEP
;* V90 tmp69        [V90    ] (  0,  0   )     int  ->  zero-ref    "field V43._length (fldOffset=0x8)" P-INDEP
;* V91 cse0         [V91,T05] (  0,  0   )    long  ->  zero-ref    "CSE - aggressive"
;  V92 cse1         [V92,T10] (  4, 25   )  simd16  ->  d20         hoist "CSE - aggressive"
;
; Lcl frame size = 0

G_M25339_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
						;; size=8 bbWeight=1 PerfScore 1.50
G_M25339_IG02:  ;; offset=0x0008
            ldr     x3, [x0]
            ldr     x4, [x1]
            ldr     q16, [@RWD00]
            ldr     q17, [@RWD16]
            ldr     q18, [@RWD32]
            ldr     q19, [@RWD48]
            movi    v20.16b, #0x3F
            b       G_M25339_IG03
            align   [0 bytes for IG03]
            align   [0 bytes]
            align   [0 bytes]
            align   [0 bytes]
						;; size=32 bbWeight=1 PerfScore 15.50
G_M25339_IG03:  ;; offset=0x0028
            ld3     {v21.16b, v22.16b, v23.16b}, [x3]
            ushr    v24.16b, v21.16b, #2
            tbl     v24.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v24.16b
            ushr    v25.16b, v22.16b, #4
            sli     v25.16b, v21.16b, #4
            and     v21.16b, v25.16b, v20.16b
            tbl     v21.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v21.16b
            ushr    v25.16b, v23.16b, #6
            sli     v25.16b, v22.16b, #2
            and     v22.16b, v25.16b, v20.16b
            tbl     v22.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v22.16b
            and     v23.16b, v23.16b, v20.16b
            tbl     v23.16b, {v16.16b, v17.16b, v18.16b, v19.16b}, v23.16b
            mov     v25.16b, v21.16b
            mov     v26.16b, v22.16b
            mov     v27.16b, v23.16b
            st4     {v24.16b, v25.16b, v26.16b, v27.16b}, [x4]
            add     x3, x3, #48
            add     x4, x4, #64
            cmp     x3, x2
            bls     G_M25339_IG03
						;; size=84 bbWeight=8 PerfScore 284.00
G_M25339_IG04:  ;; offset=0x007C
            str     x3, [x0]
            str     x4, [x1]
						;; size=8 bbWeight=1 PerfScore 2.00
G_M25339_IG05:  ;; offset=0x0084
            ldp     fp, lr, [sp], #0x10
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00
RWD00  	dq	4847464544434241h, 504F4E4D4C4B4A49h
RWD16  	dq	5857565554535251h, 6665646362615A59h
RWD32  	dq	6E6D6C6B6A696867h, 767574737271706Fh
RWD48  	dq	333231307A797877h, 2F2B393837363534h


; Total bytes of code 140, prolog size 8, PerfScore 319.00, instruction count 39, allocated bytes for code 140 (MethodHash=7c379d04) for method JIT.HardwareIntrinsics.Arm._AdvSimd.Program:AdvSimdEncode(byref,byref,ulong,int,int,ulong,ulong,ulong) (FullOpts)
; ============================================================
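
The IG03 loop above can be read lane by lane: `ld3` de-interleaves 48 input bytes into three vectors (first, second, and third byte of each 3-byte group), the `ushr`/`sli`/`and` steps build four 6-bit indices, `tbl` looks each index up in the 64-byte table held in v16 to v19, and `st4` interleaves the four results back into 64 output bytes. A scalar sketch of what one lane computes, for illustration only; `Base64LaneModel` and its methods are hypothetical names and this is not the code from the PR:

```csharp
using System;
using System.Text;

// Hypothetical scalar model of the vector loop; one "lane" is one 3-byte group.
public static class Base64LaneModel
{
    // The RWD00..RWD48 constants in the listing hold this 64-byte alphabet.
    private static readonly byte[] Table = Encoding.ASCII.GetBytes(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");

    // Encodes one 3-byte group into 4 output characters.
    public static void EncodeTriple(byte b0, byte b1, byte b2, Span<byte> dest)
    {
        int i0 = b0 >> 2;                        // ushr #2
        int i1 = ((b1 >> 4) | (b0 << 4)) & 0x3F; // ushr #4, sli #4, and 0x3F
        int i2 = ((b2 >> 6) | (b1 << 2)) & 0x3F; // ushr #6, sli #2, and 0x3F
        int i3 = b2 & 0x3F;                      // and 0x3F
        dest[0] = Table[i0];                     // each lookup models one tbl
        dest[1] = Table[i1];
        dest[2] = Table[i2];
        dest[3] = Table[i3];
    }

    // Encodes whole 48-byte blocks (16 lanes each) and returns the number of
    // input bytes consumed; leftovers are left for another code path.
    public static int EncodeBlocks(ReadOnlySpan<byte> src, Span<byte> dest)
    {
        int s = 0;
        int d = 0;
        while (src.Length - s >= 48)
        {
            for (int lane = 0; lane < 16; lane++)
            {
                EncodeTriple(src[s + 3 * lane], src[s + 3 * lane + 1], src[s + 3 * lane + 2],
                             dest.Slice(d + 4 * lane, 4));
            }
            s += 48; // add x3, x3, #48
            d += 64; // add x4, x4, #64
        }
        return s;
    }
}
```

The loop bound here is simplified to a length check; the listing instead compares the source pointer against an end pointer (`cmp x3, x2` / `bls`). The three `mov` instructions before `st4` have no counterpart in the model: they copy results into v25 to v27 so that the store sees four consecutively numbered registers.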

@kunalspathak
Member

Thanks @SwapnilGaikwad for sharing the disassembly. I wanted to see the code quality when consecutive registers are involved.

Vector128<byte> res4;
Vector128<byte> tbl_enc1 = Vector128.Create("ABCDEFGHIJKLMNOP"u8).AsByte();
Vector128<byte> tbl_enc2 = Vector128.Create("QRSTUVWXYZabcdef"u8).AsByte();
Vector128<byte> tbl_enc3 = Vector128.Create("ghijklmnopqrstuv"u8).AsByte();
Contributor Author

We could load this encoding table from EncodingMap (line 774) instead. That could reduce code size, but loading from memory/cache would be slightly slower than using registers. Benchmarks didn't show a significant difference. In the table below, the from-mem-artifacts toolchain loads the table from EncodingMap.

| Method                          | Toolchain                                                                             | NumberOfBytes | Mean     | Error   | StdDev  | Median   | Min      | Max      | Ratio | MannWhitney(2%) |
|-------------------------------- |-------------------------------------------------------------------------------------- |-------------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|---------------- |
| Base64Encode                    | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun             | 1000          | 363.8 ns | 0.14 ns | 0.12 ns | 363.8 ns | 363.5 ns | 363.9 ns |  1.00 | Base            |
| Base64Encode                    | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun          | 1000          | 196.8 ns | 0.02 ns | 0.02 ns | 196.8 ns | 196.8 ns | 196.9 ns |  0.54 | Faster          |
| Base64Encode                    | /runtime/from-mem-artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000          | 195.3 ns | 0.08 ns | 0.07 ns | 195.3 ns | 195.2 ns | 195.5 ns |  0.54 | Faster          |
|                                 |                                                                                       |               |          |         |         |          |          |          |       |                 |
| Base64EncodeDestinationTooSmall | /main/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun             | 1000          | 368.2 ns | 0.30 ns | 0.28 ns | 368.2 ns | 367.8 ns | 368.7 ns |  1.00 | Base            |
| Base64EncodeDestinationTooSmall | /runtime/artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun          | 1000          | 191.5 ns | 0.04 ns | 0.04 ns | 191.5 ns | 191.5 ns | 191.6 ns |  0.52 | Faster          |
| Base64EncodeDestinationTooSmall | /runtime/from-mem-artifacts/tests/coreclr/linux.arm64.Release/Tests/Core_Root/corerun | 1000          | 194.0 ns | 0.22 ns | 0.20 ns | 193.9 ns | 193.7 ns | 194.4 ns |  0.53 | Faster          |

Would you suggest reading the encoding table from EncodingMap?
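
Either way, the four table vectors hold the same 64 bytes: the Base64 alphabet split into 16-byte chunks. The sketch below checks that against the `RWD00` to `RWD48` quadwords from the disassembly shared earlier, assuming they are stored little-endian; `TableChunks`, `Chunk`, and `FromQuadwords` are hypothetical helpers, not part of the PR:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical helpers relating the 64-byte alphabet to the four table vectors.
public static class TableChunks
{
    public const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Returns the 16 bytes that table vector `index` (0..3) would contain.
    public static byte[] Chunk(int index)
    {
        return Encoding.ASCII.GetBytes(Alphabet.Substring(index * 16, 16));
    }

    // Rebuilds 16 bytes from the two little-endian quadwords of one RWD entry.
    public static byte[] FromQuadwords(ulong low, ulong high)
    {
        byte[] bytes = new byte[16];
        BinaryPrimitives.WriteUInt64LittleEndian(bytes.AsSpan(0, 8), low);
        BinaryPrimitives.WriteUInt64LittleEndian(bytes.AsSpan(8, 8), high);
        return bytes;
    }
}
```

For instance, `FromQuadwords(0x4847464544434241UL, 0x504F4E4D4C4B4A49UL)` yields the bytes of "ABCDEFGHIJKLMNOP", the same as `Chunk(0)`.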

@kunalspathak
Member

/azp run runtime-coreclr libraries-jitstress

@kunalspathak
Member

/azp run runtime-coreclr libraries-jitstress2-jitstressregs


Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Member

/azp run runtime-coreclr libraries-jitstressregs


Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Member

> > Does this need an entry in their part notices file?
>
> Sorry @danmoseley , I didn't understand which part notices file you're referring to.

I guess he is talking about the reference made in https://github.com/dotnet/runtime/pull/95513/files#diff-b3b9edcf4c0d62e78954d826c44005cffb306b6ccf155f1a9228669229b7e765R496, but I'm not sure where exactly to add this. @danmoseley - can you please confirm?

teo-tsirpanis added area-System.Buffers and removed needs-area-label labels Dec 31, 2023
@ghost

ghost commented Dec 31, 2023

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

This implements the encode to UTF8 algorithm here.

Author: SwapnilGaikwad
Assignees: -
Labels:

area-System.Buffers, community-contribution

Milestone: -

@kunalspathak
Member

ping @danmoseley

@danmoseley
Member

Oops, yes, that was what caught my eye. Generally if we use significant ideas/code from elsewhere we add a credit in THIRD-PARTY-NOTICES.TXT at the root. Up to you.

@a74nh
Contributor

a74nh commented Jan 9, 2024

> Oops, yes, that was what caught my eye. Generally if we use significant ideas/code from elsewhere we add a credit in THIRD-PARTY-NOTICES.TXT at the root. Up to you.

Looks like there is already an entry in there, but it's not immediately obvious.

License notice for vectorized base64 encoding / decoding

Contains an extra copy of the license from https://github.com/aklomp/base64/blob/master/LICENSE

Member

@kunalspathak kunalspathak left a comment

LGTM

[CompExactlyDependsOn(typeof(AdvSimd.Arm64))]
private static unsafe void AdvSimdEncode(ref byte* srcBytes, ref byte* destBytes, byte* srcEnd, int sourceLength, int destLength, byte* srcStart, byte* destStart)
{
// C# implementatino of https://github.com/aklomp/base64/blob/3a5add8652076612a8407627a42c768736a4263f/lib/arch/neon64/enc_loop.c
Member

nit

Suggested change
- // C# implementatino of https://github.com/aklomp/base64/blob/3a5add8652076612a8407627a42c768736a4263f/lib/arch/neon64/enc_loop.c
+ // C# implementation of https://github.com/aklomp/base64/blob/3a5add8652076612a8407627a42c768736a4263f/lib/arch/neon64/enc_loop.c

Member

you can do this in a follow-up PR

@kunalspathak kunalspathak merged commit fdb03ca into dotnet:main Jan 10, 2024
178 checks passed
@SwapnilGaikwad SwapnilGaikwad deleted the github-encode-utf8 branch January 10, 2024 12:16
kunalspathak added a commit to kunalspathak/runtime that referenced this pull request Jan 13, 2024
kunalspathak added a commit that referenced this pull request Jan 16, 2024
* Revert "[libs] Skip AdvSimdEncode on Mono (#96829)"

This reverts commit 1a76e37.

* Revert "Use multi-reg load/store for EncodeToUtf8 (#95513)"

This reverts commit fdb03ca.

* Wrap load/store vector APIs in '#if false'

* Disable load/store vector tests

* remove the trailing space
tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
* Use multi-reg load/store for EncodeToUtf8

* Use the fixed version of multi-reg store

* Update variable naming
tmds pushed a commit to tmds/runtime that referenced this pull request Jan 23, 2024
…#96944)

* Revert "[libs] Skip AdvSimdEncode on Mono (dotnet#96829)"

This reverts commit 1a76e37.

* Revert "Use multi-reg load/store for EncodeToUtf8 (dotnet#95513)"

This reverts commit fdb03ca.

* Wrap load/store vector APIs in '#if false'

* Disable load/store vector tests

* remove the trailing space
@github-actions github-actions bot locked and limited conversation to collaborators Feb 10, 2024
@richlander
Member

I assume this is the API being discussed. If so, it would be good to put it in the initial comment so that it is easy for folks to find.

https://learn.microsoft.com/dotnet/api/system.buffers.text.base64.encodetoutf8
