RFC: Integer types of unknown signedness for PseudoC #8

maximumspatium · 2017-12-25T22:43:45Z

The PseudoC description currently states that its type system supports either signed or unsigned operands. If no type is specified, unsigned is assumed by default.

Memory accesses have to be written with an explicit type cast, like this:

$r11 = *(u32*)$r5

The issue is that on some architectures (PowerPC, x86, etc.) the same load instruction is used to access both signed and unsigned 32-bit memory operands. At the same time, these architectures define other load instructions that indicate the operand's signedness explicitly:

lhz --> (PowerPC, loads an unsigned 16-bit operand and zero-extends it to 32 bits)
lha --> (PowerPC, loads a signed 16-bit operand and sign-extends it to 32 bits)
movz --> (x86, movzx in Intel syntax; the operand is unsigned/zero-extended)
movs --> (x86, movsx in Intel syntax; the operand is signed/sign-extended)
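
To make the difference between the two kinds of extending loads concrete, here is a minimal Python sketch. The function names are made up for illustration and are not part of PseudoC or any tool; the 32-bit register value is modeled as a non-negative Python integer.

```python
def zero_extend16(halfword: int) -> int:
    """Model a zero-extending 16-bit load (lhz-style): upper 16 bits become 0."""
    return halfword & 0xFFFF


def sign_extend16(halfword: int) -> int:
    """Model a sign-extending 16-bit load (lha-style): bit 15 is copied upward."""
    value = halfword & 0xFFFF
    if value & 0x8000:
        # Negative 16-bit value: fill the upper half of the register with ones.
        value |= 0xFFFF0000
    return value


raw = 0xFFFE  # the same 16 bits in memory
print(hex(zero_extend16(raw)))  # 0xfffe
print(hex(sign_extend16(raw)))  # 0xfffffffe
```

The same 16 bits in memory end up as two different register values, which is why such instructions are an unambiguous signedness hint, while a plain 32-bit load carries no such information.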

A type inference algorithm usually employs the following steps:

gathering type hints from library calls and already known function prototypes
collecting type hints from low-level code, that is, searching the IR for unambiguous type properties (see below)
building type constraints for each variable
mapping type constraints to particular types of the target language
type propagation
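
The steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the algorithm of any existing tool: only signedness is tracked, register copies are treated as "both sides have the same signedness", and all names (Sign, merge, infer) are hypothetical.

```python
from enum import Enum


class Sign(Enum):
    UNKNOWN = "unknown"
    UNSIGNED = "unsigned"
    SIGNED = "signed"
    CONFLICT = "conflict"  # contradictory hints were seen


def merge(a: Sign, b: Sign) -> Sign:
    """Combine two pieces of signedness knowledge about the same value."""
    if a is b:
        return a
    if a is Sign.UNKNOWN:
        return b
    if b is Sign.UNKNOWN:
        return a
    return Sign.CONFLICT


def infer(hints, copies):
    """hints: (variable, Sign) pairs; copies: (dst, src) register copies."""
    types = {}
    # Collect hints and build a per-variable constraint.
    for var, sign in hints:
        types[var] = merge(types.get(var, Sign.UNKNOWN), sign)
    # Propagate along copies until nothing changes any more.
    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            merged = merge(types.get(dst, Sign.UNKNOWN),
                           types.get(src, Sign.UNKNOWN))
            for var in (dst, src):
                if types.get(var, Sign.UNKNOWN) is not merged:
                    types[var] = merged
                    changed = True
    return types


result = infer(
    hints=[("$r3", Sign.SIGNED)],
    copies=[("$r4", "$r3"), ("$r5", "$r4")],
)
print(result["$r5"])  # Sign.SIGNED
```

Variables that never receive a hint, directly or through propagation, simply stay unknown, which is exactly the situation the proposed types are meant to express.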

Signedness is a key property for this process. It can be derived from specific instructions, including:

comparisons and branches
size conversions like zero-extension (for unsigned operands) and sign-extension (for signed operands)
memory loads and stores
arithmetic and logical operators
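
As a rough illustration of how such hints could be collected, the sketch below maps a few PowerPC mnemonics to a signedness hint. The table and the function name are hypothetical and cover only a handful of instructions; a real lifter would need a complete, architecture-specific table.

```python
# Hypothetical hint table: mnemonic -> signedness hint for its operand.
HINTS = {
    "lha": "signed",      # load halfword algebraic: sign-extends
    "lhz": "unsigned",    # load halfword and zero: zero-extends
    "cmpw": "signed",     # signed word comparison
    "cmplw": "unsigned",  # logical (unsigned) word comparison
    "lwz": "unknown",     # full 32-bit load on 32-bit PowerPC: nothing to extend
    "add": "unknown",     # two's-complement addition is the same for both
}


def signedness_hint(mnemonic: str) -> str:
    """Return the hint for a mnemonic; anything not listed stays unknown."""
    return HINTS.get(mnemonic, "unknown")


for insn in ("lha", "cmplw", "add"):
    print(insn, "->", signedness_hint(insn))
```

Treating unlisted instructions as unknown is a deliberate choice in this sketch: guessing a signedness would inject wrong constraints into the later inference steps.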

There are still many instructions that operate in exactly the same way regardless of whether their operands are signed or unsigned.

It may therefore be useful to extend PseudoC's type system with a third class indicating unknown operand signedness: x32, x16, etc.

So the above-mentioned memory load example could be rewritten as:

$r11 = *(x32*)$r5

Now it's clear that the memory pointed to by $r5 contains a 32-bit word whose signedness is unknown (it can be signed or unsigned).
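
A later analysis pass could then narrow such a type once a hint turns up. The sketch below shows one possible way to do that on type names. It is only an assumption about how the proposal might be used: the helper names are made up, and the "i" prefix for signed types is assumed here, since this issue only shows u32 and x32.

```python
import re


def parse_type(name: str):
    """Split a type name such as 'x32' into (prefix, width in bits)."""
    match = re.fullmatch(r"([iux])(\d+)", name)
    if match is None:
        raise ValueError(f"not an integer type name: {name!r}")
    return match.group(1), int(match.group(2))


def refine(type_name: str, hint: str) -> str:
    """Narrow a type using a hint: 'i' (signed), 'u' (unsigned), 'x' (none)."""
    if hint not in ("i", "u", "x"):
        raise ValueError(f"unknown hint: {hint!r}")
    prefix, width = parse_type(type_name)
    if hint == "x" or hint == prefix:
        return type_name          # the hint adds no new information
    if prefix == "x":
        return f"{hint}{width}"   # unknown signedness becomes known
    raise ValueError(f"{type_name} conflicts with hint {hint!r}")


print(refine("x32", "u"))  # u32
print(refine("x16", "x"))  # x16
```

A conflict (for example refining u32 with a signed hint) raises an error in this sketch; a real implementation might instead record it and leave the decision to a later pass.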

pfalcon · 2017-12-27T20:36:36Z

In general, that sounds good. Naming is ok by me. Feel free to prepare spec patches with a "(tentative)" mark, unless you want to prepare a patch for the current codebase to actually handle them ;-).

pfalcon changed the title ~~PseudoC and low-level types~~ RFC: Integer types of unknown signedness for PseudoC Dec 27, 2017

pfalcon mentioned this issue Jan 7, 2018

Erroneous expression propagation in the presence of type casts #22

Closed
