Research: Findings from the Microsoft Blogs #6

Open
Eeveelution opened this issue Nov 8, 2022 · 0 comments

GR0, like PR0, is hardwired: GR0 always reads as 0 (PR0 always reads as 1), and writing to GR0 triggers a processor exception.

GR1 is called the Global Pointer. It points to the current function's global variables, which is needed because Itanium has no absolute addressing mode.

In the Win32 calling convention for Itanium, GR8...GR11 are used for return values.
GR12 is the stack pointer (unknown whether this holds for Itanium generally or for Win32 only).

The NaT ("Not a Thing") bit is used for speculative execution to indicate that the value of a register isn't valid yet. Using such a register in, for example, an arithmetic operation spreads the NaT bit to the destination register as well, and a lot of instructions disallow NaT'ed registers, meaning that an uninitialized variable access could lead to a program crash.
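The propagation rule above can be sketched as a tiny model. This is only an illustration, not how the hardware is implemented: `Reg`, `add`, `store` and `NaTConsumptionFault` are hypothetical names made up for this sketch, and the store is simply assumed to be one of the instructions that reject NaT'ed operands.

```python
from dataclasses import dataclass


class NaTConsumptionFault(Exception):
    """Raised when an instruction that disallows NaT sees a NaT operand."""


@dataclass(frozen=True)
class Reg:
    """Hypothetical model of a general register: a 64-bit value plus a NaT bit."""
    value: int = 0
    nat: bool = False  # True means "this value is not valid"


def add(a: Reg, b: Reg) -> Reg:
    # Arithmetic does not fault; it spreads the NaT bit to the result.
    return Reg((a.value + b.value) & (2**64 - 1), a.nat or b.nat)


def store(memory: dict, address: Reg, source: Reg) -> None:
    # Assumption: an ordinary store is modelled as rejecting NaT operands.
    if address.nat or source.nat:
        raise NaTConsumptionFault("NaT operand in store")
    memory[address.value] = source.value
```

For example, `add(Reg(40), Reg(nat=True))` yields a register whose `nat` field is set, and passing that register to `store` raises the fault, which is the "program crash" case described above.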

FR0 is hardwired to 0.0
FR1 is hardwired to 1.0

The same split applies to GRs and FRs: registers 0 through 31 are static, 32 through 127 are rotating. In the Win32 calling convention, however, FR0...FR5 (FR0 and FR1 being the hardwired constants) and FR16...FR31 are preserved across calls; the others are scratch.

PR0...PR15 are static; PR16...PR63 are rotating.

In the Win32 calling convention, PR0...PR5 are preserved while PR6...PR63 are scratch.

In the Win32 calling convention, BR0 holds the return address; it is set automatically when br.call is executed.

In the Win32 calling convention, BR1...BR5 are preserved while BR6 and BR7 are scratch.

BSP is an Application Register (AR) that is described as ia64's second stack pointer. The stack it points to grows upwards, as opposed to the normal stack, which grows downwards, and it is used to store register state from older frames. I speculate that this is where the RSE (Register Stack Engine) saves registers when an allocation requires more registers than are available.


A stop indicates that the instructions after it may rely on data produced by the instructions before it. Conversely, the instructions between two stops do not depend on each other, which means they can be executed in parallel.

A sequence of instructions without a single stop in it is called an instruction group.

  • Exception to the "no dependencies within an instruction group" rule: branch instructions are allowed to depend on PRs and/or BRs set up earlier in the same group
  • The result of a successful ld.c may be used without a stop
  • Parallel comparisons (quoted from the blog, not fully understood yet):
    "Comparison instructions .and, .andcm, .or, and .orcm are allowed to combine with others of the same type into the same targets. (In other words, you can combine two .ands, but not an .and and an .or.)"
  • Writing to a register that was read earlier in the group is allowed
  • Two instructions in the same group are not allowed to write to the same register
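As a rough way to check one's understanding of these rules, here is a minimal sketch of a checker for a single instruction group. `Insn` and `check_group` are hypothetical helpers invented for this note, the read/write sets have to be supplied by hand, and the exceptions listed above (branches on PRs/BRs, ld.c, parallel comparisons) are deliberately not modelled.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    """Hypothetical description of one instruction and the registers it touches."""
    text: str
    reads: frozenset
    writes: frozenset


def check_group(group):
    """Return a list of dependency violations found in one instruction group."""
    problems = []
    written = {}  # register name -> text of the instruction that wrote it
    for insn in group:
        # Reading a register written earlier in the same group needs a stop.
        for reg in sorted(insn.reads):
            if reg in written:
                problems.append(
                    f"RAW on {reg}: '{insn.text}' reads result of '{written[reg]}'"
                )
        # Two writes to the same register in one group are not allowed.
        for reg in sorted(insn.writes):
            if reg in written:
                problems.append(
                    f"WAW on {reg}: '{insn.text}' and '{written[reg]}'"
                )
            written[reg] = insn.text
    return problems


example = [
    Insn("add r14 = r32, r33", frozenset({"r32", "r33"}), frozenset({"r14"})),
    # Writes r32 after it was read above: allowed.
    Insn("add r32 = r34, r35", frozenset({"r34", "r35"}), frozenset({"r32"})),
    # Reads r14, which was written in this group: would need a stop first.
    Insn("add r15 = r14, r36", frozenset({"r14", "r36"}), frozenset({"r15"})),
]
```

Calling `check_group(example)` reports only the read of r14; the write to r32 after it was read passes, matching the "writing to registers read previously is allowed" rule.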

CONCEPTUAL SO FAR
On entry to a function, the stacked registers begin at GR32, and this is where the function parameters go. Assuming the function takes 2 parameters, GR32 is parameter 1 and GR33 is parameter 2. Immediately afterwards come the private local registers: assuming the function requires 4 registers for private use, GR34, GR35, GR36 and GR37 would be the local registers. After those come the output registers. Let's assume the function wants to call a function which takes 3 parameters: it would put those into GR38, GR39 and GR40. A function therefore needs to account for the functions it calls, so that it allocates enough output registers to hold the parameters of whichever callee takes the most.

The input and local registers are collectively called the local region; the input, local and output registers are collectively known as the register frame.

Any registers higher than the last output register are off limits to the function, they do not exist and trying to access them is disallowed.

The alloc instruction takes, in order: the register in which to store the previous register frame state, the number of input registers, the number of local registers, the number of output registers, and lastly the number of rotating registers to allocate for the function.

Afterwards the return address is immediately saved, like so: mov r<x> = b0
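To double-check the register numbering in the example above, here is a small sketch that computes which GRs fall into each region of a register frame. `frame_layout` is a hypothetical helper written for these notes; it only does the arithmetic and ignores rotating registers entirely.

```python
FIRST_STACKED = 32   # GR32 is the first stacked register
LAST_STACKED = 127   # GR127 is the last general register


def frame_layout(inputs: int, locals_: int, outputs: int) -> dict:
    """Map each region of a register frame to the GR numbers it covers."""
    size = inputs + locals_ + outputs
    if min(inputs, locals_, outputs) < 0:
        raise ValueError("region sizes must not be negative")
    if FIRST_STACKED + size - 1 > LAST_STACKED:
        raise ValueError("frame does not fit in GR32...GR127")
    first_local = FIRST_STACKED + inputs
    first_output = first_local + locals_
    end = first_output + outputs  # first register that is off limits
    return {
        "input": list(range(FIRST_STACKED, first_local)),
        "local": list(range(first_local, first_output)),
        "output": list(range(first_output, end)),
    }
```

With 2 inputs, 4 locals and 3 outputs, `frame_layout(2, 4, 3)` gives GR32 and GR33 as inputs, GR34 through GR37 as locals, and GR38 through GR40 as outputs; everything from GR41 upwards is outside the frame.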

Stopped here: [image], on 3

END OF CONCEPTUAL

Sources

The Itanium processor, part 1: Warming up
The Itanium processor, part 2: Instruction encoding, templates, and stops
The Itanium processor, part 3: The Windows calling convention, how parameters are passed
The Itanium processor, part 3b: How does spilling actually work?
The Itanium processor, part 4: The Windows calling convention, leaf functions
The Itanium processor, part 5: The GP register, calling functions, and function pointers
The Itanium processor, part 6: Calculating conditionals
The Itanium processor, part 7: Speculative loads
The Itanium processor, part 8: Advanced loads
The Itanium processor, part 9: Counted loops and loop pipelining
The Itanium processor, part 10: Register rotation
