The PseudoC description currently states that its type system supports only signed or unsigned operands. If no type is specified, unsigned is assumed by default.
Memory accesses have to be type-cast like this:
$r11 = *(u32*)$r5
The issue is that on some architectures (PowerPC, x86 etc.) the same load instruction is used to access both signed and unsigned 32-bit memory operands. At the same time, these architectures define other load instructions that indicate the operand's signedness explicitly:
lhz --> (PowerPC, loads an unsigned 16-bit operand and zero-extends it to 32 bits)
lha --> (PowerPC, loads a signed 16-bit operand and sign-extends it to 32 bits)
movz --> (x86, the operand is unsigned/zero-extended; movzx in Intel syntax)
movs --> (x86, the operand is signed/sign-extended; movsx in Intel syntax)
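To make the difference between the two kinds of extending loads concrete, here is a minimal Python sketch. The helper names (zero_extend, sign_extend, as_u32) are hypothetical and only model the arithmetic; they are not part of PseudoC or of any existing tool.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of `value` as an unsigned integer."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of `value` as a two's-complement integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    # Flipping the sign bit and subtracting it back yields a negative
    # Python int whenever the sign bit was set.
    return (value ^ sign_bit) - sign_bit


def as_u32(value: int) -> int:
    """Show the 32-bit register contents as a raw bit pattern."""
    return value & 0xFFFFFFFF


halfword = 0xFFFE  # the same 16 bits in memory in both cases

lhz_result = as_u32(zero_extend(halfword, 16))  # what a zero-extending load leaves in the register
lha_result = as_u32(sign_extend(halfword, 16))  # what a sign-extending load leaves in the register

print(hex(lhz_result), hex(lha_result))
```

The same memory contents end up as different register values, which is why the choice of an extending load is a reliable signedness hint, while a plain full-width load carries no such information.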
A type inference algorithm usually employs the following steps:
gathering type hints from library calls and already known function prototypes
collecting type hints from low-level code, that is, searching the IR for unambiguous type properties (see below)
building type constraints for each variable
mapping type constraints to particular types of the target language
type propagation
The signedness property is very important. It can be derived from specific instructions including
comparisons and branches
size conversions like zero-extension (for unsigned operands) and sign-extension (for signed operands)
memory loads and stores
arithmetic and logical operators
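The hint-collection idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the operation names in SIGNEDNESS_HINTS, the (operation, operands) tuple format, and the collect_hints function are hypothetical and do not correspond to an existing IR or to the current codebase.

```python
# Hypothetical IR operations mapped to the signedness they imply for their
# operands. None means the operation works identically for both.
SIGNEDNESS_HINTS = {
    "sext": "signed",             # sign-extension
    "zext": "unsigned",           # zero-extension
    "cmp_lt_signed": "signed",    # signed comparison or branch
    "cmp_lt_unsigned": "unsigned",
    "shr_arith": "signed",        # arithmetic shift right
    "shr_logical": "unsigned",    # logical shift right
    "add": None,
    "and": None,
    "load32": None,               # full-width load, no extension involved
}


def collect_hints(instructions):
    """Return {register: set of signedness hints} for a list of (op, operands)."""
    hints = {}
    for op, operands in instructions:
        hint = SIGNEDNESS_HINTS.get(op)
        for reg in operands:
            seen = hints.setdefault(reg, set())
            if hint is not None:
                seen.add(hint)
    return hints


program = [
    ("load32", ["$r5"]),
    ("add", ["$r5", "$r6"]),
    ("cmp_lt_signed", ["$r6"]),
    ("zext", ["$r7"]),
]

hints = collect_hints(program)
for reg, seen in sorted(hints.items()):
    print(reg, sorted(seen) or "no hint")
```

In this sketch $r5 is only touched by operations that do not depend on signedness, so its hint set stays empty, while $r6 and $r7 each pick up one hint. A register that collects both "signed" and "unsigned" would indicate a conflict that a real algorithm has to resolve.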
There are still many instructions that operate in exactly the same way regardless of whether their operands are signed or unsigned.
It may therefore be useful to extend PseudoC's type system with a third class indicating unknown operand signedness: x32, x16 etc.
So, the above-mentioned example with the memory load could be rewritten as:
$r11 = *(x32*)$r5
Now it's clear that the memory pointed to by $r5 contains a 32-bit word whose signedness is unknown (it can be signed or unsigned).
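One way a later pass could refine such a type once a hint becomes available is sketched below in Python. This is a sketch under stated assumptions, not the proposed implementation: the refine function is hypothetical, and the "i" prefix for signed types (i32, i16) is assumed by analogy with u32.

```python
# Assumed naming: "x" = unknown signedness, "u" = unsigned, "i" = signed.
PREFIX_FOR_HINT = {"signed": "i", "unsigned": "u"}


def refine(type_name, hint):
    """Narrow a type such as "x32" using a signedness hint, if there is one."""
    prefix, width = type_name[0], type_name[1:]
    if hint is None:
        return type_name            # nothing learned, keep the type as it is
    wanted = PREFIX_FOR_HINT[hint]
    if prefix == "x":
        return wanted + width       # unknown signedness becomes concrete
    if prefix != wanted:
        raise ValueError(f"{type_name} conflicts with hint {hint!r}")
    return type_name                # hint agrees with what is already known


print(refine("x32", None))        # stays unknown
print(refine("x32", "signed"))    # refined to the signed type
print(refine("x16", "unsigned"))  # refined to the unsigned type
```

Keeping x32 distinct from u32 means a missing hint stays visible as "unknown" rather than being silently treated as unsigned, and a contradicting hint on an already resolved type can be reported rather than ignored.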
In general, that sounds good. Naming is ok by me. Feel free to prepare spec patches with a "(tentative)" mark, unless you want to prepare a patch for the current codebase to actually handle them ;-).
pfalcon changed the title from "PseudoC and low-level types" to "RFC: Integer types of unknown signedness for PseudoC" on Dec 27, 2017