Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up detach by reducing translation retries #4232

Open
derekbruening opened this issue Mar 27, 2020 · 1 comment
Open

Speed up detach by reducing translation retries #4232

derekbruening opened this issue Mar 27, 2020 · 1 comment

Comments

@derekbruening
Copy link
Contributor

Separating out from #4219 which is about correctness problems.

With the stats from PR #4228 on a large ~600-thread app under drcachesim, I observed ~500 retries (40x more than plain DR) on detach, and a detach time increase of >2x vs plain DR. This issue trying to reduce that by adding translation handling for more suspend locations so we don't have to retry to suspend.

Xref #37, #38 on ptrace attach/detach which might also speed things up vs signals.

@derekbruening
Copy link
Contributor Author

Xref #4669 showing frequent translation failures on block prefixes (that was -disable_traces) and rseq mangling. Exit stubs are also missing translation (#4675).

derekbruening added a commit that referenced this issue Dec 15, 2022
Adds new labels delimiting clean call sequences.
Converts into a translation record flag when storing translations.

Uses the new labels and flag to precisely identify clean call
mangling, replacing the previous scheme which incorrectly thought
mangled tool pc-relative was a clean call, resulting in incorrect
translations and crashes.

Adds a test case to api.detach_state by adding a client (by converting
it to use static DR) which inserts a pc-relative load.  This
reproduces the crash on detach, and is fixed with this fix.
The added instrumentation caused periodic detach failures which were
solved by setting the translation and adding a restore-state event:
i#4232 covers trying to improve the situation.

Issue: #5786, #4232
Fixes #5786
derekbruening added a commit that referenced this issue Jan 14, 2023
Adds new labels delimiting clean call sequences.
Converts into a translation record flag when storing translations.

Uses the new labels and flag to precisely identify clean call
mangling, replacing the previous scheme which incorrectly thought
mangled tool pc-relative was a clean call, resulting in incorrect
translations and crashes.

Adds a test case to api.detach_state by adding a client (by converting
it to use static DR) which inserts a pc-relative load.  This
reproduces the crash on detach, and is fixed with this fix.
The added instrumentation caused periodic detach failures which were
solved by setting the translation and adding a restore-state event:
i#4232 covers trying to improve the situation.

Adds a new instr_t.offset field.
Stops using instr_t.note to hold encoding offsets for pc-releative
operands.  Adds a new field instr_t.offset which is used for this
purpose.  This leaves note values in place across encodings, which is
needed for new clean call marking labels and also simplifies rseq
handling code.

This instr_t field is a compatibility break and we bump the version and 
OLDEST_COMPATIBLE_VERSION here to 990.

Updates dr_get_note docs.

Augments logging of xl8 info with new flag info.

Reduces DR_NOTE_FIRST_RESERVED to give DR more reserved labels.
This is another compatibility break, while at it.

Fixes several issues hit in tests that happened to trigger on the
heap bucket size and other changes:
+ Fixes a rank order violation at loglevel 5: xref #1649
+ Writes real xstate_bv into signal frame when setting the xstate context to
   avoid lazy AVX restore problems.
+ Tweaks the thread_churn test to work around non-linearities.

Issue: #5786, #4232
Fixes #5786
dolanzhao pushed a commit that referenced this issue Jan 30, 2023
Adds new labels delimiting clean call sequences.
Converts into a translation record flag when storing translations.

Uses the new labels and flag to precisely identify clean call
mangling, replacing the previous scheme which incorrectly thought
mangled tool pc-relative was a clean call, resulting in incorrect
translations and crashes.

Adds a test case to api.detach_state by adding a client (by converting
it to use static DR) which inserts a pc-relative load.  This
reproduces the crash on detach, and is fixed with this fix.
The added instrumentation caused periodic detach failures which were
solved by setting the translation and adding a restore-state event:
i#4232 covers trying to improve the situation.

Adds a new instr_t.offset field.
Stops using instr_t.note to hold encoding offsets for pc-releative
operands.  Adds a new field instr_t.offset which is used for this
purpose.  This leaves note values in place across encodings, which is
needed for new clean call marking labels and also simplifies rseq
handling code.

This instr_t field is a compatibility break and we bump the version and 
OLDEST_COMPATIBLE_VERSION here to 990.

Updates dr_get_note docs.

Augments logging of xl8 info with new flag info.

Reduces DR_NOTE_FIRST_RESERVED to give DR more reserved labels.
This is another compatibility break, while at it.

Fixes several issues hit in tests that happened to trigger on the
heap bucket size and other changes:
+ Fixes a rank order violation at loglevel 5: xref #1649
+ Writes real xstate_bv into signal frame when setting the xstate context to
   avoid lazy AVX restore problems.
+ Tweaks the thread_churn test to work around non-linearities.

Issue: #5786, #4232
Fixes #5786
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant