RFC: Integer types of unknown signedness for PseudoC #8

Open
maximumspatium opened this issue Dec 25, 2017 · 1 comment

@maximumspatium
Contributor

maximumspatium commented Dec 25, 2017

The PseudoC description states that its type system currently supports either signed or unsigned operands. If no type is specified, unsigned is assumed by default.

Memory accesses have to be type-cast like this:

$r11 = *(u32*)$r5

The issue is that on some architectures (PowerPC, x86 etc.) the same load instruction is used to access both signed and unsigned 32bit memory operands. At the same time, these architectures define other load instructions that indicate the operand's signedness explicitly:

lhz --> (PowerPC, loads an unsigned 16bit operand and zero-extends it to 32 bits)
lha --> (PowerPC, loads a signed 16bit operand and sign-extends it to 32 bits)
movzx --> (x86, the operand is unsigned/zero-extended; written as movz with size suffixes in AT&T syntax)
movsx --> (x86, the operand is signed/sign-extended; written as movs with size suffixes in AT&T syntax)
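To make the difference between the two kinds of extending loads concrete, here is a minimal Python sketch of zero-extension versus sign-extension of a 16bit value to 32 bits. The helper names `zero_extend` and `sign_extend` are made up for this illustration and are not part of PseudoC or any tool.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Keep the low `from_bits` bits and fill the rest with zeros
    (the behaviour of a zero-extending load such as PowerPC lhz)."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int, to_bits: int = 32) -> int:
    """Copy the sign bit of a `from_bits`-wide value into the upper bits
    (the behaviour of a sign-extending load such as PowerPC lha).
    Returns the resulting `to_bits`-wide bit pattern."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    as_signed = (value ^ sign_bit) - sign_bit  # negative if the sign bit was set
    return as_signed & ((1 << to_bits) - 1)


halfword = 0xFFFE  # 65534 if unsigned, -2 if signed
print(hex(zero_extend(halfword, 16)))  # 0xfffe
print(hex(sign_extend(halfword, 16)))  # 0xfffffffe
```

The same 16 bits in memory end up as two different 32bit register values, which is why the choice of instruction is a reliable signedness hint.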

A type inference algorithm usually employs the following steps:

  • gathering type hints from library calls and already known function prototypes
  • collecting type hints from low-level code, that is, searching the IR for unambiguous type properties (see below)
  • building type constraints for each variable
  • mapping type constraints to particular types of the target language
  • type propagation

Signedness is an important type property. It can be derived from specific instructions, including:

  • comparisons and branches
  • size conversions like zero-extension (for unsigned operands) and sign-extension (for signed operands)
  • memory loads and stores
  • arithmetic and logical operators

There are still many instructions that operate in exactly the same way regardless of whether their operands are signed or unsigned.
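As a small illustration of this point, the Python sketch below models 32bit two's-complement arithmetic and shows that addition produces the same bit pattern under either interpretation, while a less-than comparison does not. The helpers `as_signed` and `add32` are hypothetical names used only for this example.

```python
MASK32 = 0xFFFFFFFF


def as_signed(x: int) -> int:
    """Reinterpret a 32bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add32(a: int, b: int) -> int:
    """32bit wrap-around addition on raw bit patterns."""
    return (a + b) & MASK32


a, b = 0xFFFFFFFF, 0x00000001

# Addition: the resulting bits are identical either way,
# so an add instruction carries no signedness hint.
unsigned_sum = add32(a, b)
signed_sum = (as_signed(a) + as_signed(b)) & MASK32
print(unsigned_sum == signed_sum)  # True

# Comparison: the outcome depends on the interpretation,
# so a signed or unsigned compare does carry a hint.
print(a < b)                        # False (4294967295 < 1)
print(as_signed(a) < as_signed(b))  # True  (-1 < 1)
```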

It may therefore be useful to extend PseudoC's type system with a third class indicating unknown operand signedness: x32, x16 etc.

The memory load example above could then be rewritten as:

$r11 = *(x32*)$r5

Now it's clear that the memory pointed to by $r5 contains a 32bit word whose signedness is unknown (it can be signed or unsigned).
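One way to picture how an unknown-signedness class could fit into type inference is a small lattice in which every variable starts as unknown and is refined as hints arrive. The Python sketch below is only an illustration under stated assumptions: the opcode names in `HINTS`, the `infer` and `type_name` helpers, and the `i` prefix for signed types are all hypothetical (this proposal only shows `u32` and `x32`), and none of it reflects existing PseudoC tooling.

```python
from enum import Enum


class Sign(Enum):
    UNKNOWN = "x"    # no evidence yet (the proposed x32, x16, ...)
    SIGNED = "i"     # assumed prefix for signed types
    UNSIGNED = "u"
    CONFLICT = "?"   # contradictory evidence was seen


def merge(a: Sign, b: Sign) -> Sign:
    """Combine two signedness facts about the same variable."""
    if a == b or b is Sign.UNKNOWN:
        return a
    if a is Sign.UNKNOWN:
        return b
    return Sign.CONFLICT


# Hypothetical opcode names mapped to the hint each one provides.
HINTS = {
    "sext": Sign.SIGNED,
    "zext": Sign.UNSIGNED,
    "cmp_lt_signed": Sign.SIGNED,
    "cmp_lt_unsigned": Sign.UNSIGNED,
    "load": Sign.UNKNOWN,  # a plain 32bit load says nothing
    "add": Sign.UNKNOWN,   # same bits for signed and unsigned
}


def infer(instructions):
    """Fold hints over (opcode, variable) pairs into a per-variable result."""
    result = {}
    for opcode, var in instructions:
        hint = HINTS.get(opcode, Sign.UNKNOWN)
        result[var] = merge(result.get(var, Sign.UNKNOWN), hint)
    return result


def type_name(sign: Sign, bits: int = 32) -> str:
    """Map a resolved lattice value to a type name such as x32 or u32."""
    if sign is Sign.CONFLICT:
        raise ValueError("conflicting signedness hints")
    return f"{sign.value}{bits}"


program = [
    ("load", "$r11"), ("add", "$r11"),          # never refined
    ("load", "$r3"), ("cmp_lt_unsigned", "$r3"),
]
types = infer(program)
print(type_name(types["$r11"]))  # x32
print(type_name(types["$r3"]))   # u32
```

A variable that never receives a signed or unsigned hint simply keeps its x32 type, which is the case the proposed class is meant to express.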

@pfalcon
Owner

pfalcon commented Dec 27, 2017

In general, that sounds good. Naming is ok by me. Feel free to prepare spec patches with a "(tentative)" mark, unless you want to prepare a patch for the current codebase to actually handle them ;-).

@pfalcon changed the title from "PseudoC and low-level types" to "RFC: Integer types of unknown signedness for PseudoC" on Dec 27, 2017