Redesign of ESIL, or use of alternatives #277

Closed
ghost opened this issue Dec 27, 2020 · 8 comments · Fixed by #1663
Assignees
Labels
ESIL refactor Refactoring requests

@ghost

ghost commented Dec 27, 2020

Is your feature request related to a problem? Please describe.
ESIL is imperfect (slow, no float support, not well tested).
In theory, if Rizin used a good IL, we could have a simplified analysis (as in Ghidra).

"By modeling in this way, the analysis of different processors is put into a common framework, facilitating the development of retargetable analysis algorithms and applications" - ghidra

Describe the solution you'd like
Using alternative IL:

  • p-code
  • bil
  • openreil
  • ...

Describe alternatives you've considered
Modify the ESIL specification...

Additional context
https://ghidra.re/courses/languages/html/pcoderef.html

@XVilka
Member

XVilka commented Dec 28, 2020

While I agree with the general idea, P-code is very old, and so is OpenREIL (the OpenREIL project is recent, but REIL itself is very old). BIL is more modern but has still been superseded by the Knowledge Base. Falcon opted for a modification of RREIL instead (not to be mistaken for REIL; it's completely different). The Radeco middle layer is also worth checking.
If we decide to switch to another IL/IR, it's better to keep in mind the current research.
See also my suggestion for Ghidra itself about describing the architecture/lifting rules using Datalog: NationalSecurityAgency/ghidra#948

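I am strongly in favor of something resembling Core Theory (something like an extension of SMT) and the Knowledge Base; see more in the BAP documentation:
http://binaryanalysisplatform.github.io/bap/api/odoc/bap-knowledge/Bap_knowledge/Knowledge/index.html
http://binaryanalysisplatform.github.io/bap/api/odoc/bap-core-theory/Bap_core_theory/index.html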

@XVilka
Member

XVilka commented Dec 28, 2020

This is what I had in mind if we modify the ESIL itself:

  • Separate RZ_API rz_analysis_esil_* into librz/include/rz_esil.h
  • Consider isolating the ESIL handling into a separate library, like SIOL was
  • Provide a better API to distinguish between the various kinds of ESIL operations
  • Provide an API for "un-RPN"-ing ESIL expressions for printing, perhaps as a more readable representation that is useful for debugging, etc.
  • Provide an easier way to distinguish between memory accesses and register accesses in the API
  • Add a location/flag ESIL operation to place symbolic markers at specific points in the ESIL stream
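
As a rough illustration of the "un-RPN" idea from the list above, here is a minimal Python sketch that turns a small ESIL-like RPN string into infix text and renders memory accesses differently from register names. It is not Rizin code: the function name `unrpn`, the operator subset, and the `memN[...]` output notation are assumptions made for this example, as is the rule that the top of the stack is the left-hand (destination) operand.

```python
from typing import List

# Toy subset of operators; real ESIL has many more (assumption for this sketch).
BINARY_OPS = {"+", "-", "*", "&", "|", "^"}
ASSIGN_OPS = {"=", "+=", "-="}


def unrpn(expr: str) -> List[str]:
    """Return one infix statement per assignment or memory write in expr.

    Assumes comma-separated tokens where the top of the stack is the
    left-hand (destination) operand, e.g. "1,eax,+=" means eax += 1.
    """
    stack: List[str] = []
    statements: List[str] = []
    for tok in expr.split(","):
        if tok in BINARY_OPS:
            lhs = stack.pop()
            rhs = stack.pop()
            stack.append(f"({lhs} {tok} {rhs})")
        elif tok in ASSIGN_OPS:
            dst = stack.pop()
            src = stack.pop()
            statements.append(f"{dst} {tok} {src}")
        elif tok.startswith("[") and tok.endswith("]"):
            # Memory read of N bytes: "esp,[4]" becomes mem4[esp].
            addr = stack.pop()
            stack.append(f"mem{tok[1:-1]}[{addr}]")
        elif tok.startswith("=[") and tok.endswith("]"):
            # Memory write of N bytes: "eax,esp,=[4]" becomes mem4[esp] = eax.
            addr = stack.pop()
            val = stack.pop()
            statements.append(f"mem{tok[2:-1]}[{addr}] = {val}")
        else:
            # Anything else is treated as a register name or a constant.
            stack.append(tok)
    return statements


print(unrpn("4,esp,+,[4],eax,="))
```

Because memory reads and writes go through their own branches, the same loop could just as well return structured records instead of strings, which is one way an API could expose the memory/register distinction.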

@ghost
Author

ghost commented Dec 28, 2020

Falcon opted for the modification of RREIL (not to mistake it as REIL - it's completely different) instead.

"Falcon IL does not support floating point operations."
If the IL could support floating point operations, that would be great.

Also worth checking Radeco middle layer as well.

As I understand it, this is an "AST" of the ESIL expression?

See also my suggestion for Ghidra itself about describing the architecture/lifting rules using Datalog: NationalSecurityAgency/ghidra#948

Okay, I love it.

In theory, if we use a simple IL, could we compute the side effects with Datalog?
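
To make the question concrete, here is a small sketch of what "side effects as Datalog rules" could look like, with Python set comprehensions standing in for a Datalog engine. The fact names `reads` and `writes`, the pseudo-location `"mem"`, and both rules are hypothetical and only illustrate the idea; nothing here comes from Rizin or Ghidra.

```python
# Facts a lifter might emit for a simple IL (all names hypothetical):
#   writes(insn, loc) and reads(insn, loc),
# where insn is an instruction index and loc is a register name or "mem".
writes = {(0, "eax"), (1, "ebx"), (2, "mem")}
reads = {(1, "eax"), (2, "ebx"), (2, "esp")}


def depends(reads, writes):
    """Datalog-style rule, evaluated as a join:

        depends(B, A) :- writes(A, L), reads(B, L), A < B.

    This ignores intervening redefinitions of L, so it over-approximates.
    """
    return {
        (b, a)
        for (a, written) in writes
        for (b, read) in reads
        if written == read and a < b
    }


def side_effects(writes):
    """side_effect(I) :- writes(I, "mem")."""
    return {insn for (insn, loc) in writes if loc == "mem"}


print(sorted(depends(reads, writes)))
print(sorted(side_effects(writes)))
```

The simpler the IL, the easier it is for a lifter to emit such facts mechanically, which is where a small operation set would help.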

I am strongly in favor of something resembling Core Theory (something like an extension of SMT) and Knowledge Base, see more at BAP documentation:
http://binaryanalysisplatform.github.io/bap/api/odoc/bap-knowledge/Bap_knowledge/Knowledge/index.html
http://binaryanalysisplatform.github.io/bap/api/odoc/bap-core-theory/Bap_core_theory/index.html

To be honest, I don't understand what it says.

@ghost
Author

ghost commented Dec 28, 2020

Consider isolating the ESIL handling into a separate library, like SIOL was

"ESIL handling" ?

Provide a better API to distinguish between the various kinds of ESIL operations

Yes, maybe with some new operations.

Provide an easier way to distinguish between memory accesses and register accesses in the API

+1

Add a location/flag ESIL operation to place symbolic markers at specific points in the ESIL stream

Can I have more details?

@XVilka
Member

XVilka commented Dec 30, 2020

"ESIL handling" ?

Yes: parsing, running an ESIL VM, maybe some other APIs. Whatever we choose, it might be beneficial to provide a simple C library with a stable API for other projects to use. This way, more tools apart from Rizin and Cutter could use this library for emulation purposes.

Can I have more details?

Currently, ESIL is a continuous stream without a proper means of identifying which instruction was lifted into which part of it. We could add a way to embed location, and maybe source-level information, into the IL. The location information should be compact, though, so as not to add too much overhead to the already slow uplifting and emulation.
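
A minimal sketch of the marker idea, in Python for brevity: the IL stream carries one small location record per lifted instruction, and a consumer can then map operations back to addresses. The `Loc` and `Op` types and the `ops_by_instruction` helper are hypothetical names invented for this illustration, not an existing or proposed Rizin API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Loc:
    """Hypothetical marker: the ops that follow were lifted from `addr`."""
    addr: int
    size: int  # instruction length in bytes


@dataclass(frozen=True)
class Op:
    """Hypothetical IL operation, kept as opaque text for this sketch."""
    text: str


def ops_by_instruction(stream: List[Union[Loc, Op]]) -> Dict[int, List[str]]:
    """Group IL ops under the address of the instruction that produced them."""
    groups: Dict[int, List[str]] = {}
    current = None
    for item in stream:
        if isinstance(item, Loc):
            current = item.addr
            groups.setdefault(current, [])
        elif current is None:
            raise ValueError("operation before any location marker")
        else:
            groups[current].append(item.text)
    return groups


stream = [
    Loc(0x1000, 1), Op("8,rsp,-="), Op("rbp,rsp,=[8]"),
    Loc(0x1001, 3), Op("rsp,rbp,="),
]
print(ops_by_instruction(stream))
```

One marker per instruction, rather than one per operation, is a way to keep the overhead small: here it is two integers for each lifted instruction.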

To be honest, I don't understand what it says.

TL;DR: Core Theory is an SMT-like representation with Effects.
See, for example, the "minimal" Theory:

And the corresponding Effects (included in the Minimal theory):

It also has a representation of floating point operations:

All these (along with a few others) merge into the whole Core Theory:

The Core Theory signature includes operations on booleans, bitvectors, floating point numbers, and memories, as well as denotations of various control-flow and data-flow effects.
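
To give a feel for the "values plus effects" split, here is a deliberately tiny Python sketch. It is not BAP's API (BAP is written in OCaml and its signature is far richer); the class names `Const`, `Var`, `Add`, `Set`, `Jmp`, `Seq` and the `run` interpreter are assumptions chosen for this example. Values are pure bitvector expressions, while effects describe data flow (`Set`) and control flow (`Jmp`).

```python
from dataclasses import dataclass
from typing import Dict, Union


# Pure bitvector values (no state change when evaluated).
@dataclass(frozen=True)
class Const:
    value: int
    width: int


@dataclass(frozen=True)
class Var:
    name: str
    width: int


@dataclass(frozen=True)
class Add:
    lhs: "Value"
    rhs: "Value"


Value = Union[Const, Var, Add]


# Effects: a data effect, a control effect, and sequencing.
@dataclass(frozen=True)
class Set:
    var: Var
    value: Value


@dataclass(frozen=True)
class Jmp:
    target: Value


@dataclass(frozen=True)
class Seq:
    first: "Effect"
    second: "Effect"


Effect = Union[Set, Jmp, Seq]


def width_of(v: Value) -> int:
    """Width in bits; an Add takes the width of its left operand."""
    return width_of(v.lhs) if isinstance(v, Add) else v.width


def eval_value(v: Value, env: Dict[str, int]) -> int:
    """Evaluate a value with wrap-around at its bit width."""
    mask = (1 << width_of(v)) - 1
    if isinstance(v, Const):
        return v.value & mask
    if isinstance(v, Var):
        return env[v.name] & mask
    return (eval_value(v.lhs, env) + eval_value(v.rhs, env)) & mask


def run(effect: Effect, env: Dict[str, int]) -> Dict[str, int]:
    """Apply an effect to a copy of env; a jump is modeled as a write to 'pc'."""
    env = dict(env)
    if isinstance(effect, Set):
        env[effect.var.name] = eval_value(effect.value, env)
    elif isinstance(effect, Jmp):
        env["pc"] = eval_value(effect.target, env)
    else:
        env = run(effect.second, run(effect.first, env))
    return env


al = Var("al", 8)
prog = Seq(Set(al, Add(al, Const(1, 8))), Jmp(Const(0x2000, 32)))
print(run(prog, {"al": 0xFF, "pc": 0}))
```

Keeping values free of side effects is what makes such a representation easy to hand to a solver or a simplifier, while the effect layer stays small enough to enumerate.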

The sources are located at lib/bap_core_theory.

@ghost
Author

ghost commented Dec 30, 2020

Yes: parsing, running an ESIL VM, maybe some other APIs. Whatever we choose, it might be beneficial to provide a simple C library with a stable API for other projects to use. This way, more tools apart from Rizin and Cutter could use this library for emulation purposes.

+1

Currently, ESIL is a continuous stream without a proper means of identifying which instruction was lifted into which part of it. We could add a way to embed location, and maybe source-level information, into the IL. The location information should be compact, though, so as not to add too much overhead to the already slow uplifting and emulation.

okay

TL;DR: Core Theory is an SMT-like representation with Effects.

I will look at it.
Thank you.

PS: I am working on a new ESIL specification.

@XVilka
Member

XVilka commented Jan 16, 2021

See also rizinorg/cutter#1133 for the visualization within Cutter.

@XVilka
Member

XVilka commented Jun 1, 2021

This is the list of plugins that currently lift to ESIL (found with rg -l "esil = true" librz/analysis/p).
Priority ones:

librz/analysis/p/analysis_mips_gnu.c
librz/analysis/p/analysis_mips_cs.c
librz/analysis/p/analysis_riscv_cs.c
librz/analysis/p/analysis_avr.c
librz/analysis/p/analysis_arm_cs.c
librz/analysis/p/analysis_x86_cs.c

The rest:

librz/analysis/p/analysis_xtensa.c
librz/analysis/p/analysis_wasm.c
librz/analysis/p/analysis_v850.c
librz/analysis/p/analysis_v810.c
librz/analysis/p/analysis_sparc_cs.c
librz/analysis/p/analysis_sh.c
librz/analysis/p/analysis_rsp.c
librz/analysis/p/analysis_pic.c
librz/analysis/p/analysis_ppc_cs.c
librz/analysis/p/analysis_h8300.c
librz/analysis/p/analysis_gb.c
librz/analysis/p/analysis_bf.c
librz/analysis/p/analysis_8051.c
librz/analysis/p/analysis_6502.c

Projects
Status: Done
2 participants