Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes lifting disassembler bug for Pcmp #1222

Merged
merged 1 commit into from
Sep 24, 2020

Conversation

percontation
Copy link
Contributor

@percontation percontation commented Sep 21, 2020

There's a decoding error in some Pcmp instructions, and the same type error in Pcmp & Ppackedbinop semantics (left-shift of a vector-register-sized operand by an 8-bit shift value).

@percontation
Copy link
Contributor Author

percontation commented Sep 21, 2020

(Am I correct in thinking bil wants both sides of shifts to have the same type? If that has changed, it could just be that I'm working with older code that wants to make that assumption. Either way, the pcmp decoder error fixed here is definitely a problem.)

@percontation percontation changed the title Fixes some bugs in the pcmp semantics Fixes some bugs in the Pcmp & Ppackedbinop semantics Sep 21, 2020
@ivg
Copy link
Member

ivg commented Sep 21, 2020

Am I correct in thinking bil wants both sides of shifts to have the same type?

Nope, they can have different types, no problem here.

Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove the .merlin files and provide some references that identify what "some bugs" you're fixing? E.g., what is the expected behavior and how the current behavior differs from it, and how it is reflected in the ISA manual.

lib/.merlin Outdated Show resolved Hide resolved
plugins/x86/x86_lifter.ml Outdated Show resolved Hide resolved
@@ -999,9 +999,9 @@ let parse_instr mode mem addr =
(Punpck(prefix.mopsize, elemt, order, r, rm, rv), na)
| 0x64 | 0x65 | 0x66 | 0x74 | 0x75 | 0x76 as o ->
let r, rm, rv, na = parse_modrm_vec None na in
let elet = match o land 0x6 with | 0x4 -> reg8_t | 0x5 -> reg16_t | 0x6 -> reg32_t | _ ->
let elet = match o land 0x0F with | 0x4 -> reg8_t | 0x5 -> reg16_t | 0x6 -> reg32_t | _ ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any references or explanations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was a typo, this is what the author meant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you can check the behavior against http://ref.x86asm.net/coder64.html ; pretty clearly the case that leaves elet=reg16_t isn't supposed to be dead code :P )

@percontation
Copy link
Contributor Author

Am I correct in thinking bil wants both sides of shifts to have the same type?

Nope, they can have different types, no problem here.

Ah, ok. I'll note that I think this type difference is causing primus to send our code stuff like extract(hi=31, lo=0, <8-bit constant>). Should it always be the case that over-extraction is interpreted as "zero pad to recover non-existent bits"?

There's a typo in the disassembler.
@percontation percontation changed the title Fixes some bugs in the Pcmp & Ppackedbinop semantics Fixes lifting disassembler bug for Pcmp Sep 23, 2020
@ivg
Copy link
Member

ivg commented Sep 24, 2020

Thanks!

@ivg ivg merged commit 4eb72ae into BinaryAnalysisPlatform:master Sep 24, 2020
@ivg
Copy link
Member

ivg commented Sep 24, 2020

Am I correct in thinking bil wants both sides of shifts to have the same type?

Nope, they can have different types, no problem here.

Ah, ok. I'll note that I think this type difference is causing primus to send our code stuff like extract(hi=31, lo=0, <8-bit constant>).

Yes, since 2.1.0 we relaxed the type system of the Primus interpreter and it now permits binary operations, including the arithmetic operations, on operands with different sizes. To enable this, the interpreter coerces operands to the same bitwidth. We apply this coercion to all operations, even those that permit operands of different sizes, which is probably too conservative and results in unnecessary operations such as extract(hi=31, lo=0, <8-bit constant>) that you mentioned. If those bogus coercions introduce any issues on your side or bother you, I can remove them real quick, no problem here. Do you want to make this coercions more precise?

Should it always be the case that over-extraction is interpreted as "zero pad to recover non-existent bits"?

Yes, if the resulting width is larger than the original it is the same as zero-padding, e.g.,

# #install_printer Word.pp_hex_full;;
# x;;
- : word = 0xA5:8u
# Word.extract_exn ~hi:31 ~lo:0 x;;
- : word = 0xA5:32u

@percontation
Copy link
Contributor Author

percontation commented Sep 25, 2020

👍 Thanks for the info.

In our particular case, removing the "bogus" coercions would actually just cause errors elsewhere in our code, because our expressions use traditional width-strictness in bitvec operands (and the code assumes that Primus is as strict as it was previously).

So what I'd suggest, is that Primus be updated to send the "correct" coercion (cast or extract), rather than omitting them. An approach like that would allow Primus to avoid breaking the API, even if operator definitions are relaxed/otherwise expanded in the future.

@ivg
Copy link
Member

ivg commented Sep 30, 2020

+1 Thanks for the info.

In our particular case, removing the "bogus" coercions would actually just cause errors elsewhere in our code, because our expressions use traditional width-strictness in bitvec operands (and the code assumes that Primus is as strict as it was previously).

I am not saying about removing all coercions, just removing coercions for shifts, which are not necessary from the perspective of the BIL type system, but indeed are necessary from the perspective of SMTLIB, where shifts are having sort (op2 (_ BitVec m) (_ BitVec m) (_ BitVec m)), so indeed there is some impedance mismatch here.

So what I'd suggest, is that Primus be updated to send the "correct" coercion (cast or extract), rather than omitting them. An approach like that would allow Primus to avoid breaking the API, even if operator definitions are relaxed/otherwise expanded in the future.

In other words, instead of the swiss-knife extract you want to have either cast unsigned m x when we zero extend and extract m 0 x when we do the narrow cast?

I can do this, but here is the caveat. Even if Primus will begin to emit more precise coercions it doesn't guarantee that someone else will not be using extract for zero extending. In fact, all our lifters do this and do this a lot. So anyways you have to handle extracts correctly, because this is the semantics of the extract operation in BIL. For the reference, here is how extract shall be lifted to Z3,

  module Bitv = Z3.BitVector

  let extract hi lo x =
    let xs = Bitv.get_size (Expr.get_sort x)
    and ns = hi-lo+1 in
    if ns > xs
    then if lo = 0
      then Bitv.mk_zero_ext ctxt (ns-xs) x
      else
        Bitv.mk_extract ctxt hi lo @@
        Bitv.mk_zero_ext ctxt (ns-xs) x
    else
      Bitv.mk_extract ctxt hi lo x

See also our symbolic executor formula module that maps Primus observations to Z3 formulae, which we should probably make public.

And sorry for the late reply, I missed somehow the notification :(

@percontation percontation deleted the pcmp_fix branch November 18, 2020 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants