cg: OW does not detect a code sequence for a byte swap (endian change) #1321

winspool · 2024-08-08T22:26:42Z

Because of the current work on ELF64 support, which might need to swap endianness,
I tested a code example that swaps the byte order of a 64-bit value.
(I extended the example with a 32-bit and a 16-bit version.)

Additionally, I added endian conversion code (htons/htonl)
and the SWAPNC_16/SWAPNC_32/SWAPNC_64 macros from the OW source.
bswap_demo.c.txt

OW does not recognize such a code sequence as a byte swap.
The code generated by clang -m32 is much more compact (-O2, -O3, -Os):
a rol for 16-bit, one bswap for 32-bit, and two bswap for 64-bit.
(bswap has been available since the 486.)
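
For readers without the attachment: the following is only a rough sketch of the kind of shift-and-mask sequence meant here, not the contents of bswap_demo.c. The names swap16_shift, swap32_shift, swap64_shift and swap_shift_check are hypothetical and do not come from the attachment or from the OW source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from bswap_demo.c. */

/* 16-bit: exchange the two bytes (a compiler may turn this into rol 8). */
uint16_t swap16_shift(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* 32-bit: move each byte to its mirrored position (candidate for bswap). */
uint32_t swap32_shift(uint32_t x)
{
    return ((x >> 24) & 0x000000FFu) |
           ((x >>  8) & 0x0000FF00u) |
           ((x <<  8) & 0x00FF0000u) |
           ((x << 24) & 0xFF000000u);
}

/* 64-bit: swap each 32-bit half, then exchange the halves. */
uint64_t swap64_shift(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)swap32_shift(lo) << 32) | swap32_shift(hi);
}

/* Small self-check of the three helpers. */
void swap_shift_check(void)
{
    assert(swap16_shift(0x1234u) == 0x3412u);
    assert(swap32_shift(0x12345678u) == 0x78563412u);
    assert(swap64_shift(0x0102030405060708ull) == 0x0807060504030201ull);
}
```

Splitting the 64-bit case into two 32-bit swaps mirrors the two bswap instructions in the clang output for a 32-bit target.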

bswap_demo_clang32-Os.o:     file format elf32-i386
Disassembly of section .text:
00000000 <my_htons>:
   0:	0f b7 44 24 04       	movzwl 0x4(%esp),%eax
   5:	66 c1 c0 08          	rol    $0x8,%ax
   9:	c3                   	ret

0000000a <my_htonl>:
   a:	8b 44 24 04          	mov    0x4(%esp),%eax
   e:	0f c8                	bswap  %eax
  10:	c3                   	ret

00000011 <use_SWAPNC_16>:
  11:	0f b7 44 24 04       	movzwl 0x4(%esp),%eax
  16:	66 c1 c0 08          	rol    $0x8,%ax
  1a:	c3                   	ret

0000001b <use_SWAPNC_32>:
  1b:	8b 44 24 04          	mov    0x4(%esp),%eax
  1f:	0f c8                	bswap  %eax
  21:	c3                   	ret

00000022 <use_SWAPNC_64>:
  22:	8b 54 24 04          	mov    0x4(%esp),%edx
  26:	8b 44 24 08          	mov    0x8(%esp),%eax
  2a:	0f c8                	bswap  %eax
  2c:	0f ca                	bswap  %edx
  2e:	c3                   	ret

The code generated by OW does not use rol or bswap (wcc386 is passed -6r):

$ wdis bswap_demo_owcc-Os.o 
0000				my_htons_:
0000  52				push		edx
0001  0F B7 D0				movzx		edx,ax
0004  30 E4				xor		ah,ah
0006  C1 FA 08				sar		edx,0x08
0009  0F B7 C0				movzx		eax,ax
000C  81 E2 FF 00 00 00			and		edx,0x000000ff
0012  C1 E0 08				shl		eax,0x08
0015				L$1:
0015  09 C2				or		edx,eax
0017  89 D0				mov		eax,edx
0019  5A				pop		edx
001A  C3				ret
Routine Size: 27 bytes,    Routine Base: _TEXT + 0000

001B				my_htonl_:
001B  51				push		ecx
001C  52				push		edx
001D  89 C2				mov		edx,eax
001F  89 C1				mov		ecx,eax
0021  C1 EA 10				shr		edx,0x10
0024  C1 E9 18				shr		ecx,0x18
0027  81 E2 FF 00 00 00			and		edx,0x000000ff
002D  81 E1 FF 00 00 00			and		ecx,0x000000ff
0033  C1 E2 08				shl		edx,0x08
0036  09 D1				or		ecx,edx
0038  89 C2				mov		edx,eax
003A  C1 EA 08				shr		edx,0x08
003D  81 E2 FF 00 00 00			and		edx,0x000000ff
0043  25 FF 00 00 00			and		eax,0x000000ff
0048  C1 E2 10				shl		edx,0x10
004B  C1 E0 18				shl		eax,0x18
004E				L$2:
004E  09 CA				or		edx,ecx
0050  09 D0				or		eax,edx
0052  5A				pop		edx
0053  59				pop		ecx
0054  C3				ret
Routine Size: 58 bytes,    Routine Base: _TEXT + 001B

0055				use_SWAPNC_16_:
0055  52				push		edx
0056  89 C2				mov		edx,eax
0058  30 E6				xor		dh,ah
005A  30 C0				xor		al,al
005C  0F B7 D2				movzx		edx,dx
005F  0F B7 C0				movzx		eax,ax
0062  C1 E2 08				shl		edx,0x08
0065  C1 E8 08				shr		eax,0x08
0068  EB AB				jmp		L$1
Routine Size: 21 bytes,    Routine Base: _TEXT + 0055

006A				use_SWAPNC_32_:
006A  51				push		ecx
006B  52				push		edx
006C  89 C1				mov		ecx,eax
006E  89 C2				mov		edx,eax
0070  81 E1 FF 00 00 00			and		ecx,0x000000ff
0076  81 E2 00 FF 00 00			and		edx,0x0000ff00
007C  C1 E1 18				shl		ecx,0x18
007F  C1 E2 08				shl		edx,0x08
0082  09 CA				or		edx,ecx
0084  89 C1				mov		ecx,eax
0086  81 E1 00 00 FF 00			and		ecx,0x00ff0000
008C  25 00 00 00 FF			and		eax,0xff000000
0091  C1 E9 08				shr		ecx,0x08
0094  C1 E8 18				shr		eax,0x18
0097  EB B5				jmp		L$2
Routine Size: 47 bytes,    Routine Base: _TEXT + 006A

0099				use_SWAPNC_64_:
...
01AA  C3				ret
Routine Size: 274 bytes,    Routine Base: _TEXT + 0099

(Other functions skipped)

OpenWatcom does poorly in the examples that use the SWAPNC_* macros from the OW source
(21 / 47 / 274 bytes, with 7 subfunction calls; args in registers)
compared to clang (10 / 7 / 13 bytes; args on stack).

Unfortunately, I have no idea how the target code is selected in the OW code generator.

There might be other examples that OW handles better, but I am not aware of any.

Such code sequences are probably rare,
but the OW code is very large (args in registers)
compared to the clang-generated code (args on stack).


winspool · 2024-08-08T22:53:33Z

Might adding intrinsics for byte swap (16/32/64 bits) be a simple way to handle such cases?
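
To illustrate what an intrinsic-based approach could look like from the caller's side, here is a minimal sketch. It assumes the GCC/clang builtins __builtin_bswap16/32/64 and falls back to plain shifts elsewhere; it does not assume that OW provides any such intrinsic. The names demo_bswap16, demo_bswap32, demo_bswap64, demo_bswap_check and the macro DEMO_HAVE_BSWAP_BUILTIN are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC and clang provide __builtin_bswap16/32/64.
 * All DEMO_/demo_ names are hypothetical and not part of any OW header. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_HAVE_BSWAP_BUILTIN 1
#endif

uint16_t demo_bswap16(uint16_t x)
{
#ifdef DEMO_HAVE_BSWAP_BUILTIN
    return __builtin_bswap16(x);
#else
    /* Fallback: exchange the two bytes by shifting. */
    return (uint16_t)((x >> 8) | (x << 8));
#endif
}

uint32_t demo_bswap32(uint32_t x)
{
#ifdef DEMO_HAVE_BSWAP_BUILTIN
    return __builtin_bswap32(x);
#else
    /* Fallback: mirror the four bytes by shifting and masking. */
    return ((x >> 24) & 0x000000FFu) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | ((x << 24) & 0xFF000000u);
#endif
}

uint64_t demo_bswap64(uint64_t x)
{
#ifdef DEMO_HAVE_BSWAP_BUILTIN
    return __builtin_bswap64(x);
#else
    /* Fallback: swap both 32-bit halves and exchange them. */
    return ((uint64_t)demo_bswap32((uint32_t)x) << 32) |
           demo_bswap32((uint32_t)(x >> 32));
#endif
}

/* Small self-check; both code paths must give the same results. */
void demo_bswap_check(void)
{
    assert(demo_bswap16(0x1234u) == 0x3412u);
    assert(demo_bswap32(0x12345678u) == 0x78563412u);
    assert(demo_bswap64(0x0102030405060708ull) == 0x0807060504030201ull);
}
```

With a wrapper like this, callers would not depend on the code generator recognizing the shift-and-mask pattern.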

jmalak · 2024-08-08T22:58:54Z

I don't understand what you are reporting.
