basic bitwise operations (and/xor in particular) be dependent on CCRegClass (condition code) registers in x86 (alongside src1, src2, etc). #398
Replies: 2 comments
-
This is definitely a performance bug in gem5, but it is specific to the X86 ISA model; it doesn't happen with the ARM ISA model. In X86, the registers RAX, EAX, AH, and AL refer to the same 64-bit register, so they are mapped to the same physical register. If one instruction writes to RAX and the next one writes only the lower 8 bits, the second instruction must wait for the first to complete before it can update those lower 8 bits, because the whole register is treated as one 64-bit register. To ensure correctness, every destination is first read and then updated. This limits the ILP of the X86 model.
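To make the read-before-write concrete, here is a minimal Python sketch (not gem5 code; `merge_partial_write`, `write_alias` and the `ALIASES` table are hypothetical names) of what a byte-granular write into a 64-bit register image has to do. The old register value is an input to the result, which is why the writing instruction ends up with the previous producer of that physical register as a source. EAX is left out on purpose: on x86-64 hardware a 32-bit write zero-extends into the full 64-bit register, so architecturally only the 8- and 16-bit aliases (AL, AH, AX) need to preserve old bits.

```python
# Hypothetical alias table: name -> (byte offset, size in bytes) within RAX.
# EAX is omitted: 32-bit writes zero-extend on x86-64, i.e. they are full writes.
ALIASES = {"RAX": (0, 8), "AX": (0, 2), "AL": (0, 1), "AH": (1, 1)}


def merge_partial_write(old_reg: int, value: int, offset: int, size: int) -> int:
    """Return the 64-bit register image after writing `size` bytes of `value`
    at byte `offset`, keeping every other byte of `old_reg` unchanged.

    `old_reg` is an input here, so a partial write carries a true data
    dependence on whichever instruction last wrote the physical register.
    """
    mask = ((1 << (8 * size)) - 1) << (8 * offset)
    return (old_reg & ~mask) | ((value << (8 * offset)) & mask)


def write_alias(old_rax: int, name: str, value: int) -> int:
    """Write `value` through the named alias of RAX (hypothetical helper)."""
    offset, size = ALIASES[name]
    return merge_partial_write(old_rax, value, offset, size)


if __name__ == "__main__":
    rax = 0x1122334455667788
    print(hex(write_alias(rax, "AH", 0xBB)))  # 0x112233445566bb88
    print(hex(write_alias(rax, "RAX", 0xDEADBEEF)))  # 0xdeadbeef
```

Note that a full RAX write ignores the old value entirely; only a model that forces every write to read its destination first, as described above, makes it wait anyway.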
-
Just so this is recorded, here is a copy-and-paste summary of this bug and two potential solutions, written by @bgodala:
Problem: In the X86 ISA, 64-bit registers like RAX have aliases such as EAX, AX, AL, and AH, which address various subcomponents of the 64-bit register. Since all of these refer to the same register, a single 64-bit physical register is used to map all of them. In gem5, register dependences are based on physical registers, so all these sub-registers are treated as if they write the full register. To prevent partial writes from losing values, an additional dependency is added for every write: for example, an instruction that writes to EAX also reads EAX before writing. This limits scheduling freedom when a partial write to a 64-bit register is completely overwritten by a full write to the same register from a subsequent instruction.
Solution 1: One way to handle this problem is to treat each register as composed of sub-registers, for example a 64-bit register as eight 8-bit registers. A write to the 64-bit register would update all of its 8-bit subcomponents atomically. This would break the 1:1 relation between architectural and physical registers: a rename function would return a list of atomic sub-registers instead of a single register. This solution might involve updating the code base of every ISA.
Solution 2: Instead of treating each register as composed of sub-registers, associate a mask with each register. A write would update only the unmasked portions of the register. This probably involves updating the dependence-checking logic to look at the mask value before inserting a dependence edge.
I hope the above text serves as a starting point for documenting this issue. Please feel free to correct me if I'm wrong.
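The difference between the current behaviour and Solution 2 can be sketched as a toy dependence model in Python. Everything here is hypothetical (the `MASKS` table, the `(op, alias)` program encoding, and both function names), and it tracks only the RAX, AX, AL and AH aliases. `deps_whole_register` treats every write as a read-then-write of the whole register, as the summary describes; `deps_byte_mask` keeps one last-writer entry per byte, so a write never waits on an older writer and a read depends only on the writers of the bytes it actually reads.

```python
# Hypothetical byte masks: bit b set means the alias covers byte b of RAX.
MASKS = {"RAX": 0xFF, "AX": 0x03, "AL": 0x01, "AH": 0x02}


def deps_whole_register(program):
    """Current model: every access, including every write, depends on the
    last writer of the (single) physical register.

    program: list of (op, alias) with op "r" (read) or "w" (write).
    Returns one set of producer indices per instruction.
    """
    last_writer = None
    out = []
    for i, (op, _alias) in enumerate(program):
        out.append(set() if last_writer is None else {last_writer})
        if op == "w":
            last_writer = i
    return out


def deps_byte_mask(program):
    """Mask model (Solution 2 sketch): track the last writer of each byte.
    Writes update only their own bytes and add no dependence edge; reads
    depend on the last writers of the bytes they cover."""
    last_writer = [None] * 8
    out = []
    for i, (op, alias) in enumerate(program):
        lanes = [b for b in range(8) if (MASKS[alias] >> b) & 1]
        if op == "r":
            out.append({last_writer[b] for b in lanes if last_writer[b] is not None})
        else:
            for b in lanes:
                last_writer[b] = i
            out.append(set())
    return out


if __name__ == "__main__":
    prog = [("w", "AL"), ("w", "RAX"), ("r", "RAX")]
    print(deps_whole_register(prog))  # [set(), {0}, {1}]
    print(deps_byte_mask(prog))       # [set(), set(), {1}]
```

With a partial write followed by a full overwrite, the whole-register model serializes the two writes while the mask model lets them issue independently. The cost shows up when a wide read follows a narrow write: for `[("w", "RAX"), ("w", "AL"), ("r", "RAX")]` the reader has two producers, `{0, 1}`, so a real implementation would also need a merge step.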
-
I am writing this discussion post to forward a question asked on gem5 Slack.
Original question (https://gem5-workspace.slack.com/archives/C03K26BGTKL/p1695938402187719):
Hi all,
We have been seeing basic bitwise operations (AND/XOR in particular) depend on CCRegClass (condition code) registers in x86, in addition to src1, src2, etc. We suspect this happens in the rename step but have not been able to figure out exactly where. In particular, this affects cmp and prevents us from executing arithmetic instructions during branch speculation (without everything being squashed early).
Is there a reason this dependency exists?
Further Clarification provided (https://gem5-workspace.slack.com/archives/C03K26BGTKL/p1696026211102579):
Just to clarify the question above: we observe that instructions like AND(r1,r2) or XOR(r1,r2) in gem5 have a source dependency on the CC (condition code) register in addition to r1 and r2. I believe this should not be the case according to the Intel documentation, where the only sources are r1 and r2. Is this a gem5 bug or a misunderstanding on our part? This would significantly affect behavior under speculation.
For example, consider the code sequence below, where the generation of rax is delayed so that CMP and JZ cannot execute; one would expect L0 to execute speculatively. In gem5, however, we observe that AND r1, r2 is blocked by a supposed dependency on the CC register (the gem5 debug logs say it has three source registers, of which only r1 and r2 are ready; the CC register is not). This seems like unexpected behavior.
CMP rax, 0
JZ L1
L0:
AND r1, r2
XOR r3, r4
L1:
...
We see this behavior on the latest stable version of gem5, and a similar question was raised in a gem5 mailing list post a few years ago. Could someone please help us understand what's happening here? Is this a gem5 bug?
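One possible explanation for the extra CC source, offered as an assumption rather than a reading of the gem5 source, is the partial-write mechanism from the first reply applied to flags. Architecturally, CMP writes CF, PF, AF, ZF, SF and OF, while AND and XOR clear CF and OF, set SF, ZF and PF from the result, and leave AF undefined. If a model packs several flags into one CC register and the AND/XOR micro-ops write only some of its bits (for example, leaving AF as it was), they must read the old CC value to keep the rest, which makes them wait on the CMP that is itself waiting on rax. The Python sketch below models this with a single hypothetical CC register holding all six status flags; `reads_old_cc`, `cc_deps` and the register grouping are illustrative, not gem5 names.

```python
# EFLAGS bit positions of the six arithmetic status flags.
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11
STATUS = CF | PF | AF | ZF | SF | OF

# Hypothetical grouping: one CC register holds all six status flags.
CC_REG_BITS = STATUS

CMP_WRITES = STATUS
# Assumption: the AND/XOR micro-ops leave AF untouched (it is architecturally undefined).
LOGIC_WRITES = CF | PF | ZF | SF | OF


def reads_old_cc(written: int, cc_reg_bits: int = CC_REG_BITS) -> bool:
    """True if writing only `written` bits forces a read of the old CC value
    to preserve the other bits held in the same CC register."""
    return (cc_reg_bits & ~written) != 0


def cc_deps(seq):
    """seq: flag-writing instructions as (name, written_bits) in program order.
    Returns, per instruction, the earlier indices it waits on via the CC register."""
    last_writer = None
    deps = []
    for i, (_name, written) in enumerate(seq):
        if last_writer is not None and reads_old_cc(written):
            deps.append({last_writer})
        else:
            deps.append(set())
        last_writer = i
    return deps


if __name__ == "__main__":
    seq = [("CMP", CMP_WRITES), ("AND", LOGIC_WRITES), ("XOR", LOGIC_WRITES)]
    print(cc_deps(seq))  # [set(), {0}, {1}]
```

In this model, an AND/XOR that wrote every bit of its CC register (for instance by giving AF a fixed value, which is allowed because AF is undefined after these instructions) would carry no CC source at all. Whether gem5's actual flag grouping and micro-op definitions match this model would need to be checked against the x86 ISA description in the gem5 source.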