GR0, like PR0, is hardwired: GR0 always reads as 0 (PR0 always reads as 1), and writing to GR0 triggers a processor exception.
GR1 is called the Global Pointer. It points to the current function's global variables, which is needed because Itanium has no absolute addressing mode.
In the Win32 calling convention for Itanium, GR8...GR11 are used for return values and GR12 is the stack pointer (I don't know whether that holds for Itanium generally or for Win32 only).
The NaT ("Not a Thing") bit is used for speculative execution to indicate that the value of a register isn't valid yet. Using such a register in, for example, an arithmetic operation spreads the NaT bit to the destination register as well, and a lot of instructions disallow NaT'ed registers, meaning that an uninitialized variable access could lead to a program crash.
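To make the propagation rule concrete, here is a minimal Python model of it. This is a sketch under assumptions, not how the hardware is specified: `Reg`, `add`, `store` and `NaTConsumptionFault` are hypothetical names invented for this note, and `store` merely stands in for "one of the instructions that disallow NaT'ed registers".

```python
from dataclasses import dataclass


class NaTConsumptionFault(Exception):
    """Hypothetical stand-in for the fault raised when a NaT'ed register is consumed."""


@dataclass(frozen=True)
class Reg:
    value: int = 0
    nat: bool = False  # True means the value is "Not a Thing" (not valid)


def add(a: Reg, b: Reg) -> Reg:
    # Arithmetic does not fault: it spreads the bit to the destination register.
    return Reg((a.value + b.value) & 0xFFFFFFFFFFFFFFFF, a.nat or b.nat)


def store(memory: dict, address: Reg, source: Reg) -> None:
    # Modelled as an instruction that refuses NaT'ed operands.
    if address.nat or source.nat:
        raise NaTConsumptionFault("store used a NaT'ed register")
    memory[address.value] = source.value
```

In this model, adding a valid register to a NaT'ed one gives a NaT'ed result without any error, and the crash only shows up later, when that result reaches `store`.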
FR0 is hardwired to 0.0 and FR1 is hardwired to 1.0.
For both GRs and FRs, registers 0 through 31 are static and registers 32 through 127 can rotate (for the GRs these are also the stacked registers). Under the Win32 calling convention, however, FR0...FR5 and FR16...FR31 are preserved across calls; the others are scratch.
PR0...PR15 are static; PR16...PR63 are rotating.
In the Win32 calling convention, PR0...PR5 are preserved while PR6...PR63 are scratch.
In the Win32 calling convention, BR0 holds the return address; it is set automatically when br.call is executed.
In the Win32 calling convention, BR1...BR5 are preserved while BR6 and BR7 are scratch.
BSP is an Application Register (AR) which is called ia64's second stack pointer. The second stack grows upwards, as opposed to the normal stack, which grows downwards, and it's used to store register states from long ago. I speculate that this is where the RSE (Register Stack Engine) saves registers when an allocation requires more registers than are available.
A stop indicates that the instructions after it may rely on data produced by the instructions before it, which means that the instructions before the stop do not depend on each other and can be executed in parallel.
A sequence of instructions without a single stop is called an instruction group.
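As a small illustration of the definition, the sketch below cuts a listing into instruction groups. It assumes the usual assembler notation where a stop is written as `;;` at the end of a line; `split_groups` is a hypothetical helper for this note, not part of any real tool.

```python
def split_groups(lines):
    """Split assembly lines into instruction groups at each ';;' stop."""
    groups, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        has_stop = text.endswith(";;")
        if has_stop:
            text = text[:-2].strip()  # drop the stop marker itself
        if text:
            current.append(text)
        if has_stop and current:
            groups.append(current)  # the stop closes the current group
            current = []
    if current:
        groups.append(current)  # trailing instructions with no final stop
    return groups
```

Given two independent instructions followed by a stop and then a third instruction that uses their results, this returns two groups: the first pair, which may run in parallel, and the consumer on its own.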
Exceptions to the "no dependencies within an instruction group" rule:
Branch instructions are allowed to depend on PRs and/or BRs set up earlier in the group.
The result of a successful ld.c may be used without a stop.
Whatever this means:
"Comparison instructions .and, .andcm, .or, and .orcm are allowed to combine with others of the same type into the same targets. (In other words, you can combine two .ands, but not an .and and an .or.)"
Writing to a register that was read earlier in the group is allowed.
Two instructions in the same group are not allowed to write to the same register.
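The rules above can be sketched as a tiny checker for a single instruction group. This is only my reading of them: `Instr` and `check_group` are hypothetical names, the read and write sets are filled in by hand, the `combine` tag is my interpretation of "same type" for the comparison instructions, and the branch and ld.c exceptions are not modelled at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Instr:
    text: str
    reads: Set[str] = field(default_factory=set)
    writes: Set[str] = field(default_factory=set)
    combine: Optional[str] = None  # e.g. "and" or "or" for combinable compares


def check_group(group: List[Instr]) -> List[str]:
    """Return the rule violations found inside one instruction group."""
    problems = []
    writer = {}  # register -> earlier instruction in this group that wrote it
    for ins in group:
        for reg in sorted(ins.reads):
            if reg in writer:  # depends on a result from the same group
                problems.append(f"read-after-write on {reg}: {ins.text}")
        for reg in sorted(ins.writes):
            prev = writer.get(reg)
            same_type = (prev is not None and ins.combine is not None
                         and ins.combine == prev.combine)
            if prev is not None and not same_type:
                problems.append(f"double write to {reg}: {ins.text}")
            writer[reg] = ins
    return problems
```

Writing a register that was only read earlier produces no complaint, because the checker remembers earlier writers and never earlier readers. Two compares tagged with the same `combine` value may share targets, while a pair tagged "and" and "or" is reported as a double write.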
CONCEPTUAL SO FAR
On entry to a function, the parameters are found in the stacked registers, which begin at GR32. Assuming the function takes 2 parameters, GR32 is parameter 1 and GR33 is parameter 2. Immediately afterwards come the private local registers: assuming the function requires 4 registers for private use, GR34, GR35, GR36 and GR37 would be the local registers. After those come the output registers. Let's assume the function wants to call a function which takes 3 parameters: it would put those into GR38, GR39 and GR40. So a function has to account for which functions it calls, and allocate enough output registers to hold the parameters of whichever of them takes the most.
The input and local registers are collectively called the local region; the input, local and output registers together are known as the register frame.
Any registers higher than the last output register are off limits to the function: as far as the function is concerned they do not exist, and trying to access them is disallowed.
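The numbering in the example above can be reproduced with a few lines of Python. `frame_layout` is a hypothetical helper written for this note; the only facts it relies on are that the stacked registers start at GR32 and end at GR127, so a frame can hold at most 96 registers.

```python
FIRST_STACKED = 32   # stacked registers begin at GR32
MAX_FRAME = 96       # GR32 through GR127


def frame_layout(inputs, locals_, outputs):
    """Return the GR names of the input, local and output registers."""
    if min(inputs, locals_, outputs) < 0:
        raise ValueError("register counts cannot be negative")
    if inputs + locals_ + outputs > MAX_FRAME:
        raise ValueError("frame does not fit in GR32...GR127")

    def names(start, count):
        return [f"GR{n}" for n in range(start, start + count)]

    first_local = FIRST_STACKED + inputs
    first_output = first_local + locals_
    return {
        "input": names(FIRST_STACKED, inputs),
        "local": names(first_local, locals_),
        "output": names(first_output, outputs),
    }
```

Calling `frame_layout(2, 4, 3)` gives GR32 and GR33 as inputs, GR34 through GR37 as locals, and GR38 through GR40 as outputs, matching the walkthrough. Anything past the last name in "output" is outside the register frame.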
The alloc instruction takes, in order: the register in which to store the previous register frame state, the number of input registers, the number of local registers, the number of output registers, and lastly the number of rotating registers to allocate for the function.
Afterwards the return address is immediately saved, like so: mov r<x> = b0
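To see how the operands line up, here is a sketch that prints the two prologue instructions for a given frame. Several things are assumptions of mine rather than anything stated above: `prologue` is a hypothetical helper, the previous frame state is written as `ar.pfs`, the first local register receives it and the second local receives b0, and the rotating count is required to be a multiple of 8 that fits in the frame.

```python
def prologue(inputs, locals_, outputs, rotating=0):
    """Return the alloc and return-address-save lines as assembly text."""
    frame = inputs + locals_ + outputs
    if locals_ < 2:
        raise ValueError("this sketch needs two locals: one for ar.pfs, one for b0")
    if frame > 96:
        raise ValueError("frame does not fit in r32...r127")
    if rotating % 8 != 0 or rotating > frame:
        raise ValueError("rotating count must be a multiple of 8 within the frame")
    first_local = 32 + inputs  # assumed home for the saved frame state
    return [
        f"alloc r{first_local} = ar.pfs, {inputs}, {locals_}, {outputs}, {rotating}",
        f"mov r{first_local + 1} = b0",  # save the return address in the next local
    ]
```

For the running example, `prologue(2, 4, 3)` yields an alloc that stores the old frame state in r34 and declares 2 inputs, 4 locals, 3 outputs and 0 rotating registers, followed by the mov that copies b0 into r35.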
stopped here:
on 3
END OF CONCEPTUAL
Sources
The Itanium processor, part 1: Warming up
The Itanium processor, part 2: Instruction encoding, templates, and stops
The Itanium processor, part 3: The Windows calling convention, how parameters are passed
The Itanium processor, part 3b: How does spilling actually work?
The Itanium processor, part 4: The Windows calling convention, leaf functions
The Itanium processor, part 5: The GP register, calling functions, and function pointers
The Itanium processor, part 6: Calculating conditionals
The Itanium processor, part 7: Speculative loads
The Itanium processor, part 8: Advanced loads
The Itanium processor, part 9: Counted loops and loop pipelining
The Itanium processor, part 10: Register rotation