Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add more atomic support: 64-bit add for 32-bit code; atomic reads and writes for aarchxx #4215

Open
derekbruening opened this issue Mar 22, 2020 · 7 comments

Comments

@derekbruening
Copy link
Contributor

We have several use cases where we want better atomic support from DR:

  • 32-bit support for dr_atomic_add64()
  • 32-bit support for DRX_COUNTER_LOCK for 64-bit counters in drx_insert_counter_update()
  • atomic and relatively prompt loads and stores for aarchxx for global flags accessed from gencode
@derekbruening
Copy link
Contributor Author

Adding:

  • Implement DRX_COUNTER_LOCK in the first place for aarchxx

@derekbruening
Copy link
Contributor Author

For the aarchxx xref #2502 where DR core needs such utilities.

@derekbruening derekbruening self-assigned this Mar 22, 2020
derekbruening added a commit that referenced this issue Mar 23, 2020
Adds 32-bit and 64-bit load and store atomics to core DR and exports
them to the client API.

Adds missing barriers to the existing store atomic macros on both ARM
and AArch64.

Adds a sanity test that at least exercises the functions.  Adds the
test sequence to client.thread and client.tls (only client.tls is
currently enabled on AArchXX).

Uses dr_atomic_load* in drwrap in place of a previously overkill
dr_atomic_add*.

These will also help with #2502.

The next step is to add gencode versions of these, either to the core
or drx.

Issue: #4215, #2502
derekbruening added a commit that referenced this issue Mar 23, 2020
Adds 32-bit and 64-bit load and store atomics to core DR and exports
them to the client API.

Adds missing barriers to the existing store atomic macros on both ARM
and AArch64.

Adds a sanity test that at least exercises the functions.  Adds the
test sequence to client.thread and client.tls (only client.tls is
currently enabled on AArchXX).

Uses dr_atomic_load* in drwrap in place of a previously overkill
dr_atomic_add*.

These will also help with #2502.

The next step is to add gencode versions of these, either to the core
or drx.

Issue: #4215, #2502
@derekbruening
Copy link
Contributor Author

For gencode load/store: in drx, or in core? Thinking of in DR core b/c DR needs it for #2502.
Some are multi-instr (incl loop!) so should model after instrlist_insert_mov_immed_ptrsz().
So sthg like:

instrlist_insert_atomic_load64(void *drcontext, opnd_t memref, 
                               reg_id_t dst, IF_NOT_X64_(reg_id_t dst_hi)
                               instrlist_t *ilist, instr_t *where, 
                               instr_t **first OUT, instr_t **last OUT);

For 32-bit it takes 2 dest reg params.
Might need a temp reg too for LDREX?

@johnfxgalea
Copy link
Contributor

How many registers will be needed for instrlist_insert_atomic_load32 and instrlist_insert_atomic_load64 for x86 and arm please?

@derekbruening
Copy link
Contributor Author

For the data size equaling the pointer size, for plain load and for plain store (i.e., no add or other read-modify-write), just one data register in all cases (often the address is in a separate register though).

32-bit arm dealing with 64-bit data: load needs a second register; store needs several and needs a loop.

32-bit x86 and 64-bit data: there are choices to make. cmpxchg8b needs all 4 of eax,ebx,ecx,edx while using fildll and fistpll messes w/ floating state.

Maybe limit your interface to not support the data size being larger than the pointer size if you need constraints on registers.

@derekbruening
Copy link
Contributor Author

Xref #4264 about using C11 atomics. For C++ clients, we should document that std::atomic is safe for clients so long as is_lock_free().

@derekbruening
Copy link
Contributor Author

Another question is whether to try to adapt the x86 LOCK() wrapper, allowing for granular use around various loads and stores. It would only work for single-instr constructs, and it would have to support changing the opcode: sthg like from LDR to LDAR on AArch64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants