Potential optimization: Interesting code generation quirk #115

Open
asiekierka opened this issue Jun 24, 2022 · 0 comments

I haven't yet wrapped this into a form reproducible with vanilla gcc-ia16, but here are some notes:

// simplified
#include <stdint.h>

void main(void) {
	const uint8_t __far *font_data = asset_map(0);
	for (uint16_t i = 0; i < 2048; i++) {
		// Near-pointer store; * 0x101 copies the byte into both halves of the word.
		((uint16_t*) 0x2000)[i] = font_data[i] * 0x101;
	}
}

// asset_map and dependencies:
extern const void *__rom_bank_offset;
#define BANK_INDEX(x) (((uint8_t) (uint16_t) (&__rom_bank_offset)) | (x))

static inline void outportb(uint8_t port, uint8_t value) {
	__asm volatile (
		"outb %0, %1"
		:
		: "Ral" (value), "Nd" ((uint16_t) port)
	);
}

static inline const void __far* asset_map(uint32_t position) {
	uint8_t idx = BANK_INDEX(position >> 16);
	outportb(IO_ROM_BANK0, idx);
	outportb(IO_ROM_BANK1, idx + 1);
	asm volatile("" ::: "memory");
	return MK_FP(0x2000 | ((position >> 4) & 0xFFF), (position & 0xF));
}

This is translated into:

   fff81:	ba fe 00             	mov    $0xfe,%dx
   fff84:	88 d4                	mov    %dl,%ah
   fff86:	88 d0                	mov    %dl,%al
   fff88:	e6 c2                	out    %al,$0xc2
   fff8a:	fe c0                	inc    %al
   fff8c:	e6 c3                	out    %al,$0xc3
   fff8e:	ba 00 08             	mov    $0x800,%dx
   fff91:	31 db                	xor    %bx,%bx
   fff93:	b9 00 20             	mov    $0x2000,%cx
   fff96:	89 de                	mov    %bx,%si
   fff98:	d1 e6                	shl    %si
   fff9a:	8e d9                	mov    %cx,%ds
   fff9c:	8a 07                	mov    (%bx),%al
   fff9e:	30 e4                	xor    %ah,%ah
   fffa0:	00 c4                	add    %al,%ah
   fffa2:	36 89 84 00 20       	mov    %ax,%ss:0x2000(%si)
   fffa7:	43                   	inc    %bx
   fffa8:	4a                   	dec    %dx
   fffa9:	75 eb                	jne    fff96 <main+0x16>

The "quirk" is somewhere in the fff81-fff8c area: the use of %dx. Since I'm passing a constant through a linker-side pointer, it makes sense that the compiler loads a word; however, I don't see why it doesn't load straight into %ax, and I especially don't see why it also loads %ah, whose value is never read (the loop clears %ah at fff9e before using it). This is consistent across -O2, -O3, and -Os.
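As an aid for reading the loop body in the listing above: multiplying a byte by 0x101 copies it into both halves of the 16-bit result, which is what the `xor %ah,%ah` / `add %al,%ah` pair computes once %al holds the byte. A minimal host-side sketch of that equivalence follows; `replicate_mul` and `replicate_regs` are hypothetical helper names introduced here, not part of the project or of gcc-ia16.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors `font_data[i] * 0x101` from the issue. */
uint16_t replicate_mul(uint8_t b) {
	return (uint16_t) (b * 0x101u);
}

/* Hypothetical helper: models the register sequence from the listing,
 * with AL already holding the byte that was loaded from the far pointer. */
uint16_t replicate_regs(uint8_t b) {
	uint8_t al = b;
	uint8_t ah = 0;               /* xor %ah,%ah */
	ah = (uint8_t) (ah + al);     /* add %al,%ah */
	return (uint16_t) (((uint16_t) ah << 8) | al);   /* AX = AH:AL */
}
```

Both functions return the same value for every input byte; for instance, 0xAB becomes 0xABAB.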

EDIT: In an even more bizarre twist, if I add a while (1) {} loop to the end of main, the generated code changes as follows:

   fff81:	ba fe 00             	mov    $0xfe,%dx
   fff84:	88 d4                	mov    %dl,%ah
   fff86:	88 d0                	mov    %dl,%al
   fff88:	e6 c2                	out    %al,$0xc2
   fff8a:	fe c0                	inc    %al
   fff8c:	e6 c3                	out    %al,$0xc3
   fff8e:	b9 00 08             	mov    $0x800,%cx
   fff91:	31 db                	xor    %bx,%bx
   fff93:	ba 00 20             	mov    $0x2000,%dx
   fff96:	89 de                	mov    %bx,%si
   fff98:	d1 e6                	shl    %si
   fff9a:	8e da                	mov    %dx,%ds
   fff9c:	8a 07                	mov    (%bx),%al
   fff9e:	b4 00                	mov    $0x0,%ah
   fffa0:	89 84 00 20          	mov    %ax,0x2000(%si)
   fffa4:	43                   	inc    %bx
   fffa5:	49                   	dec    %cx
   fffa6:	75 ee                	jne    fff96 <main+0x16>
   fffa8:	eb fe                	jmp    fffa8 <main+0x28>

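See anything different? The ss: override on the store is now gone. Since %ds has just been loaded with 0x2000 (the far pointer's segment), the store to 0x2000(%si) now goes through %ds instead of %ss, causing broken code.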
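To make the effect of the missing override concrete, here is a rough host-side model of real-mode address translation (linear = segment * 16 + offset, wrapped to 20 bits) together with the segment and offset arithmetic from `asset_map`. The function names are hypothetical, and the %ss value used in the note below is an assumption chosen only for illustration; the actual value depends on the target's startup code.

```c
#include <stdint.h>

/* Real-mode 8086 address translation, wrapped to a 20-bit address space. */
uint32_t linear_address(uint16_t segment, uint16_t offset) {
	return (((uint32_t) segment << 4) + offset) & 0xFFFFFu;
}

/* Segment half of the far pointer built by asset_map() in the issue. */
uint16_t asset_segment(uint32_t position) {
	return (uint16_t) (0x2000u | ((position >> 4) & 0xFFFu));
}

/* Offset half of the far pointer built by asset_map() in the issue. */
uint16_t asset_offset(uint32_t position) {
	return (uint16_t) (position & 0xFu);
}
```

Assuming %ss is 0x0000, the intended store for i = 0 lands at `linear_address(0x0000, 0x2000)`, which is 0x02000. Without the override it lands at `linear_address(0x2000, 0x2000)`, which is 0x22000, inside the segment that `asset_segment(0)` (0x2000) selects for the font data.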

Thank you!
