Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#1569 AArch64: Implement trace optimisation. #2442

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions core/arch/aarch64/decode.c
Original file line number Diff line number Diff line change
Expand Up @@ -118,8 +118,9 @@ decode_sizeof(dcontext_t *dcontext, byte *pc, int *num_prefixes)
byte *
decode_raw(dcontext_t *dcontext, byte *pc, instr_t *instr)
{
ASSERT_NOT_IMPLEMENTED(false); /* FIXME i#1569 */
return NULL;
instr_set_opcode(instr, OP_UNDECODED);
instr_set_raw_bits(instr, pc, AARCH64_INSTR_SIZE);
return (pc + AARCH64_INSTR_SIZE);
}

bool
Expand All @@ -136,6 +137,20 @@ decode_raw_jmp_target(dcontext_t *dcontext, byte *pc)
return pc + ((enc & 0x1ffffff) << 2) - ((enc & 0x2000000) << 2);
}

bool
decode_raw_is_cond_branch_zero(dcontext_t *dcontext, byte *pc)
{
uint enc = *(uint *)pc;
return ((enc & 0x7e000000) == 0x34000000); /* CBZ or CBNZ */
}

byte *
decode_raw_cond_branch_zero_target(dcontext_t *dcontext, byte *pc)
{
uint enc = *(uint *)pc;
return pc + ((enc & 0xffffe0) >> 5 << 2);
}

const instr_info_t *
instr_info_extra_opnds(const instr_info_t *info)
{
Expand Down
44 changes: 33 additions & 11 deletions core/arch/aarch64/emit_utils.c
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f,
/* FIXME i#1575: coarse-grain NYI on ARM */
ASSERT_NOT_IMPLEMENTED(!TEST(FRAG_COARSE_GRAIN, f->flags));
if (LINKSTUB_DIRECT(l_flags)) {
/* We put a NOP here for future linking. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But linking was already working fine, so it is unclear why we'd need a nop now. Please elaborate in the comment -- maybe best in the pre-function comment. Putting an asm listing is also helpful (esp with this stub getting very large: 8 instructions now). This is unusual.

*pc++ = 0xd503201f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a named constant

/* stp x0, x1, [x(stolen), #(offs)] */
*pc++ = (0xa9000000 | 0 | 1 << 10 | (dr_reg_stolen - DR_REG_X0) << 5 |
TLS_REG0_SLOT >> 3 << 15);
Expand All @@ -119,6 +121,8 @@ insert_exit_stub_other_flags(dcontext_t *dcontext, fragment_t *f,
/* br x1 */
*pc++ = 0xd61f0000 | 1 << 5;
} else {
/* We put a NOP here for trace building. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does trace building need a nop? I think there's some code missing from this diff (e.g., fixup_indirect_trace_exit): I can't get a clear picture of what the trace building strategy is on A64 vs x86 and why a nop would help.

*pc++ = 0xd503201f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a named constant

/* Stub starts out unlinked. */
cache_pc exit_target = get_unlinked_entry(dcontext,
EXIT_TARGET_TAG(dcontext, f, l));
Expand Down Expand Up @@ -184,9 +188,8 @@ stub_is_patched(fragment_t *f, cache_pc stub_pc)
void
unpatch_stub(fragment_t *f, cache_pc stub_pc, bool hot_patch)
{
/* Restore the stp x0, x1, [x28] instruction. */
*(uint *)stub_pc = (0xa9000000 | 0 | 1 << 10 | (dr_reg_stolen - DR_REG_X0) << 5 |
TLS_REG0_SLOT >> 3 << 15);
/* Restore the NOP instruction. */
*(uint *)stub_pc = 0xd503201f;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a named constant

if (hot_patch)
machine_cache_sync(stub_pc, stub_pc + AARCH64_INSTR_SIZE, true);
}
Expand Down Expand Up @@ -269,8 +272,14 @@ indirect_linkstub_stub_pc(dcontext_t *dcontext, fragment_t *f, linkstub_t *l)
cache_pc cti = EXIT_CTI_PC(f, l);
if (!EXIT_HAS_STUB(l->flags, f->flags))
return NULL;
ASSERT(decode_raw_is_jmp(dcontext, cti));
return decode_raw_jmp_target(dcontext, cti);
if (decode_raw_is_jmp(dcontext, cti))
return decode_raw_jmp_target(dcontext, cti);
/* In trace, we might have cbz/cbnz to indirect linkstubs. */
if (decode_raw_is_cond_branch_zero(dcontext, cti))
return decode_raw_cond_branch_zero_target(dcontext, cti);
/* There should be no other types of branch to linkstubs. */
ASSERT_NOT_REACHED();
return NULL;
}

cache_pc
Expand Down Expand Up @@ -343,12 +352,25 @@ insert_fragment_prefix(dcontext_t *dcontext, fragment_t *f)
/* Always use prefix on AArch64 as there is no load to PC. */
byte *pc = (byte *)f->start_pc;
ASSERT(f->prefix_size == 0);
/* ldr x0, [x(stolen), #(off)] */
*(uint *)pc = (0xf9400000 | (ENTRY_PC_REG - DR_REG_X0) |
(dr_reg_stolen - DR_REG_X0) << 5 |
ENTRY_PC_SPILL_SLOT >> 3 << 10);
pc += 4;
f->prefix_size = (byte)(((cache_pc)pc) - f->start_pc);
if (use_ibt_prefix(f->flags)) {
/* ldr x0, [x(stolen), #(off)] */
*(uint *)pc = (0xf9400000 | (ENTRY_PC_REG - DR_REG_X0) |
(dr_reg_stolen - DR_REG_X0) << 5 |
ENTRY_PC_SPILL_SLOT >> 3 << 10);
pc += 4;
f->prefix_size = (byte)((cache_pc)pc - f->start_pc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks identical to the else code below which looks identical to the original code: so the original was sufficient and there's nothing to change here, right?

} else {
#ifdef CLIENT_INTERFACE
/* FIXME i#1569 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this FIXME refer to? Is there something incomplete about the code below? (i#1569 is very general so it doesn't help. The comment should explain.)

#endif
/* ldr x0, [x(stolen), #(off)] */
*(uint *)pc = (0xf9400000 | (ENTRY_PC_REG - DR_REG_X0) |
(dr_reg_stolen - DR_REG_X0) << 5 |
ENTRY_PC_SPILL_SLOT >> 3 << 10);
pc += 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/4/AARCH64_INSTR_SIZE/

f->prefix_size = (byte)((cache_pc)pc - f->start_pc);
}
/* Make sure emitted size matches size we requested. */
ASSERT(f->prefix_size == fragment_prefix_size(f->flags));
}

Expand Down
24 changes: 23 additions & 1 deletion core/arch/aarch64/instr.c
Original file line number Diff line number Diff line change
Expand Up @@ -292,10 +292,32 @@ instr_is_undefined(instr_t *instr)
return false;
}

dr_pred_type_t
instr_invert_predicate(dr_pred_type_t pred)
{
CLIENT_ASSERT(pred != DR_PRED_NONE && pred != DR_PRED_AL &&
pred != DR_PRED_NV, "invalid cbr predicate");
/* Flipping the bottom bit inverts a predicate */
return (dr_pred_type_t)(DR_PRED_EQ + (((uint)pred - DR_PRED_EQ) ^ 0x1));
}

void
instr_invert_cbr(instr_t *instr)
{
ASSERT_NOT_IMPLEMENTED(false); /* FIXME i#1569 */
int opc = instr_get_opcode(instr);
dr_pred_type_t pred = instr_get_predicate(instr);
CLIENT_ASSERT(instr_is_cbr(instr), "instr_invert_cbr: instr not a cbr");
if (opc == OP_cbnz) {
instr_set_opcode(instr, OP_cbz);
} else if (opc == OP_cbz) {
instr_set_opcode(instr, OP_cbnz);
} else if (opc == OP_tbnz) {
instr_set_opcode(instr, OP_tbz);
} else if (opc == OP_tbz) {
instr_set_opcode(instr, OP_tbnz);
} else {
instr_set_predicate(instr, instr_invert_predicate(pred));
}
}

bool
Expand Down
7 changes: 7 additions & 0 deletions core/arch/aarch64/instr_create.h
Original file line number Diff line number Diff line change
Expand Up @@ -399,6 +399,13 @@
instr_create_0dst_2src((dc), OP_cbz, (pc), (reg))
#define INSTR_CREATE_cmp(dc, rn, rm_or_imm) \
instr_create_1dst_2src(dc, OP_subs, OPND_CREATE_ZR(rn), rn, rm_or_imm)
#define INSTR_CREATE_eor(dc, d, s) \
INSTR_CREATE_eor_shift(dc, d, d, s, \
OPND_CREATE_INT8(DR_SHIFT_LSL), OPND_CREATE_INT8(0))
#define INSTR_CREATE_eor_shift(dc, rd, rn, rm, sht, sha) \
instr_create_1dst_4src(dc, OP_eor, rd, rn, \
opnd_create_reg_ex(opnd_get_reg(rm), 0, DR_OPND_SHIFTED), \
opnd_add_flags(sht, DR_OPND_IS_SHIFT), sha)
#define INSTR_CREATE_ldp(dc, rt1, rt2, mem) \
instr_create_2dst_1src(dc, OP_ldp, rt1, rt2, mem)
#define INSTR_CREATE_ldr(dc, Rd, mem) \
Expand Down
5 changes: 4 additions & 1 deletion core/arch/arch_exports.h
Original file line number Diff line number Diff line change
Expand Up @@ -1493,7 +1493,7 @@ decode_init(void);
# define AARCH64_INSTR_SIZE 4
# define FRAGMENT_BASE_PREFIX_SIZE(flags) AARCH64_INSTR_SIZE
# define DIRECT_EXIT_STUB_SIZE(flags) \
(7 * AARCH64_INSTR_SIZE) /* see insert_exit_stub_other_flags */
(8 * AARCH64_INSTR_SIZE) /* see insert_exit_stub_other_flags */
# define DIRECT_EXIT_STUB_DATA_SZ 0
# else
# define FRAGMENT_BASE_PREFIX_SIZE(flags) \
Expand Down Expand Up @@ -1839,6 +1839,9 @@ void interp(dcontext_t *dcontext);
uint extend_trace(dcontext_t *dcontext, fragment_t *f, linkstub_t *prev_l);
int append_trace_speculate_last_ibl(dcontext_t *dcontext, instrlist_t *trace,
app_pc speculate_next_tag, bool record_translation);
#ifdef AARCH64
int fixup_indirect_trace_exit(dcontext_t *dcontext, instrlist_t *trace);
#endif

uint
forward_eflags_analysis(dcontext_t *dcontext, instrlist_t *ilist, instr_t *instr);
Expand Down
4 changes: 4 additions & 0 deletions core/arch/decode.h
Original file line number Diff line number Diff line change
Expand Up @@ -420,6 +420,10 @@ const instr_info_t * opcode_to_encoding_info(uint opc,
_IF_ARM(bool it_block));
bool decode_raw_is_jmp(dcontext_t *dcontext, byte *pc);
byte *decode_raw_jmp_target(dcontext_t *dcontext, byte *pc);
#ifdef AARCH64
bool decode_raw_is_cond_branch_zero(dcontext_t *dcontext, byte *pc);
byte *decode_raw_cond_branch_zero_target(dcontext_t *dcontext, byte *pc);
#endif

/* DR_API EXPORT TOFILE dr_ir_utils.h */

Expand Down
5 changes: 5 additions & 0 deletions core/arch/emit_utils_shared.c
Original file line number Diff line number Diff line change
Expand Up @@ -1328,6 +1328,11 @@ update_indirect_exit_stub(dcontext_t *dcontext, fragment_t *f, linkstub_t *l)
int
fragment_prefix_size(uint flags)
{
#ifdef AARCH64
/* For AArch64, there is no need to save the flags
* so we always have the same ibt prefix. */
return fragment_ibt_prefix_size(flags);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code needs to be here: use_ibt_prefix should be true always for AARCH64, right? So this routine was already returning fragment_ibt_prefix_size(flags). Cleaner to not have an ifdef.

#endif
if (use_ibt_prefix(flags)) {
return fragment_ibt_prefix_size(flags);
} else {
Expand Down
2 changes: 1 addition & 1 deletion core/arch/instr.h
Original file line number Diff line number Diff line change
Expand Up @@ -295,7 +295,7 @@ typedef enum _dr_pred_type_t {
#define PREFIX_PRED_BITS 5
#define PREFIX_PRED_BITPOS (32 - PREFIX_PRED_BITS)
#define PREFIX_PRED_MASK \
(((1 << PREFIX_PRED_BITS)-1) << PREFIX_PRED_BITPOS) /*0xf8000000 */
((((uint)1 << PREFIX_PRED_BITS)-1) << PREFIX_PRED_BITPOS) /*0xf8000000 */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just 1U

/* DR_API EXPORT BEGIN */

/**
Expand Down
3 changes: 2 additions & 1 deletion core/arch/instr_shared.c
Original file line number Diff line number Diff line change
Expand Up @@ -705,7 +705,8 @@ instr_get_predicate(instr_t *instr)
instr_t *
instr_set_predicate(instr_t *instr, dr_pred_type_t pred)
{
instr->prefixes |= ((pred << PREFIX_PRED_BITPOS) & PREFIX_PRED_MASK);
instr->prefixes = ((instr->prefixes & ~PREFIX_PRED_MASK ) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect: I think there are other bits in prefixes besides predicates, which this routine should not clobber

((pred << PREFIX_PRED_BITPOS) & PREFIX_PRED_MASK));
return instr;
}

Expand Down
Loading