Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use ZYAN_ASSERT for user passed arguments #297

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

flobernd
Copy link
Member

No description provided.

@flobernd flobernd force-pushed the improve-user-input-validation branch 4 times, most recently from e5fa624 to 615a8f5 Compare December 21, 2021 14:28
@flobernd flobernd requested a review from athre0z December 21, 2021 15:06
@flobernd
Copy link
Member Author

We probably should add a new fuzzer project to test the ZydisDecoderDecodeOperands() function with randomly crafted inputs (context and instruction).

@flobernd flobernd marked this pull request as ready for review December 21, 2021 15:08
*/
const void* definition;
ZyanU16 definition_id;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows us to actually fuzz the code (open todo) and as well guarantees we never read invalid memory, even if the user passes nonsense data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And all while requiring less space in the struct! Nice.

@flobernd flobernd force-pushed the improve-user-input-validation branch from f05434e to 906f0f1 Compare December 22, 2021 09:11
@athre0z
Copy link
Member

athre0z commented Jan 3, 2022

While I see why this is necessary under our current API guarantees, I really can't say that I like it too much. It adds a pretty significant amount of branches to our already way too branchy code. I ran a benchmark and the performance regression is pretty noticeable. :/

Performance impact

Full decode + Formatting

Clang

Before

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     13.963 s ±  0.090 s    [User: 13.941 s, System: 0.007 s]
  Range (min … max):   13.851 s … 14.122 s    10 runs

After

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     15.572 s ±  0.034 s    [User: 15.549 s, System: 0.007 s]
  Range (min … max):   15.528 s … 15.614 s    10 runs

Regression

~12%

GCC

Before

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     14.245 s ±  0.086 s    [User: 14.221 s, System: 0.007 s]
  Range (min … max):   14.094 s … 14.362 s    10 runs

After

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     15.503 s ±  0.089 s    [User: 15.478 s, System: 0.007 s]
  Range (min … max):   15.301 s … 15.602 s    10 runs

Regression

~9%

*/
const void* definition;
ZyanU16 definition_id;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And all while requiring less space in the struct! Nice.

Comment on lines -1003 to +1006
ZYAN_ASSERT(definition->size[context->eosz_index] ||
(instruction->meta.category == ZYDIS_CATEGORY_AMX_TILE));
// ZYAN_ASSERT(definition->size[context->eosz_index] || (instruction->meta.category == ZYDIS_CATEGORY_AMX_TILE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally commented out? I feel like commented out asserts will confuse us in the future. I'd either reinstate it as a return condition or get rid of it entirely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still a valid assertion, but as instruction is now passed by the user, it might contain invalid data, so we can't rely on this one anymore. On the other hand I don't want to remove it entirely, because it caused an issue in the past. Might convert it to a plain comment.

@flobernd
Copy link
Member Author

flobernd commented Jan 3, 2022

@athre0z

I ran a benchmark and the performance regression is pretty noticeable. :/

That's not nice. Have you been able to profile it? Probably the new branches in ZydisGetInstructionDefinition might be the issue. I will check on Windows as well.

While I see why this is necessary under our current API guarantees...

This was actually my initial reason to do everything in one single pass when we started. Sadly, each approach has some drawbacks. Let's try to find out what's causing the regression and optimize it as good as we can.

@athre0z
Copy link
Member

athre0z commented Jan 3, 2022

Here's a perf diff. Note that -4% doesn't mean that ZydisDecoderDecodeInstruction got faster -- it's percentages that are being compared. In other words: ZydisDecoderDecodeOperands etc got heavier, so the share of cycles spent in ZydisDecoderDecodeInstruction decreased in comparison.

$ perf diff build-master/perf.data build-flo/perf.data

# Event 'cpu_core/cycles:u/'
#
# Baseline  Delta Abs  Shared Object         Symbol
# ........  .........  ....................  .....................................
#
    52.25%     -4.17%  bench-zydis-full-fmt  [.] ZydisDecoderDecodeInstruction
     1.90%     +2.76%  bench-zydis-full-fmt  [.] ZydisDecoderDecodeOperands
     1.15%     +1.25%  bench-zydis-full-fmt  [.] ZydisGetInstructionDefinition
     4.17%     +1.13%  libc-2.33.so          [.] __memset_avx2_unaligned_erms
    21.56%     -0.56%  bench-zydis-full-fmt  [.] ZydisDecodeOperands
     0.96%     +0.37%  bench-zydis-full-fmt  [.] ZydisDecodeOperandMemory
     1.77%     -0.27%  bench-zydis-full-fmt  [.] ZydisRegisterEncode
     4.27%     -0.19%  bench-zydis-full-fmt  [.] main
     0.84%     -0.11%  bench-zydis-full-fmt  [.] ZydisGetAccessedFlags
     0.65%     -0.10%  bench-zydis-full-fmt  [.] ZydisReadDisplacement
     0.60%     -0.09%  bench-zydis-full-fmt  [.] ZydisGetElementInfo
               +0.09%  bench-zydis-full-fmt  [.] ZydisCalcRegisterId
     0.68%     -0.08%  bench-zydis-full-fmt  [.] ZydisGetInstructionEncodingInfo
     0.43%     +0.07%  bench-zydis-full-fmt  [.] memset@plt
     2.05%     -0.05%  bench-zydis-full-fmt  [.] ZydisRegisterGetWidth
     0.89%     -0.04%  bench-zydis-full-fmt  [.] ZydisFormatterFormatInstruction
     1.69%     -0.02%  bench-zydis-full-fmt  [.] ZydisFormatterFormatInstructionEx
     2.76%     -0.01%  bench-zydis-full-fmt  [.] ZydisDecoderTreeGetChildNode
     0.76%     -0.01%  bench-zydis-full-fmt  [.] ZydisReadImmediate
     0.61%     -0.00%  bench-zydis-full-fmt  [.] ZydisGetOperandDefinitions
               +0.00%  ld-2.33.so            [.] update_usable.constprop.0
     0.00%     -0.00%  ld-2.33.so            [.] _dl_start
     0.00%     -0.00%  ld-2.33.so            [.] _start
     0.00%             ld-2.33.so            [.] init_cpu_features.constprop.0

@flobernd
Copy link
Member Author

flobernd commented Jul 2, 2022

@athre0z How do we proceed with this PR?

@athre0z
Copy link
Member

athre0z commented Jul 3, 2022

Well, I'm not really excited about a 10% perf regression for everyone for what is essentially "make debugging easier". The single commit in this PR does a bunch of different things, some of which are entirely uncontroversial and will likely improve rather then regress performance, e.g. using lookup tables instead of chains of ternary expressions. We should probably start by splitting this one commit into multiple smaller ones and then do partial merges.

As for the argument validation, we'll have to make a decision as to whether we simply define this as undefined behavior and keep performance intact or burn a ton of CPU on validation.

@flobernd
Copy link
Member Author

flobernd commented Jul 3, 2022

Just go with undefined behavior is a bad idea, because it will often lead to invalid memory access or other critical side effects. We don't see these effects on master yet, because our fuzzer does not test this. Imho we have to continue using sanity checks for public APIs.

There must be a way to mitigate the performance issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants