Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: crucible-llvm: Parameterize over memory and pointer types #1194

Closed

Conversation

langston-barrett
Copy link
Contributor

Note that the code changes are very WIP. Further notes below.

The idea is that we have a lot of big changes we'd potentially like to
make to the memory model:

The problem is that these are all very hard to get started on, because
they would involve sweeping changes across Crucible-LLVM and all
downstream packages.

We could also consider alternative memory models for different
use-cases:

  • A "flat" model based on a single SMT array with a bump allocator, to
    let the SMT solver handle more of the hairy details
  • A "lazy" model in the style of UC-KLEE for under-constrained symbolic
    execution
  • A "fresh" model that always returns fresh symbolic constants when read

All of these would have interesting trade-offs, but can't effectively be
explored at the moment.

This PR is the first step in making it easier to run such
experiments. The goal is to parameterize Crucible-LLVM over a typeclass
that provides (Crucible and Haskell) types and operations on memory and
pointers.

To that end, it introduces a new mem type variable. At the moment, a
lot of places in the code add the abstraction-breaking mem ~ Mem
constraint. The goal would be to gradually add operations to the Mem
class and remove such equality constraints.

Some notable downsides:

  • Increased complexity. See e.g., the pattern matches in
    Translation.Instruction, which I couldn't figure out how to
    adequately replace. Counterpoint: programming against an well-defined
    interface (i.e., a typeclass) can reduce mental complexity by
    abstracting superfluous detail.
  • Code churn within Crucible
  • API churn for consumers

Notes on progress:

  • This PR introduces the mem type variable and Mem typeclass.
  • This PR far enough to compile, I haven't run the tests but there should be no behvioral changes (modulo one explicit TODO item).
  • Translation is actually parametric over Mem, but evaluation just forces mem to be explicitly equal to the current memory model. More operations would be needed inside the Mem typeclass to make evaluation more parametric.
  • Obviously, lots of formatting is seriously messed up. If we like this direction, I can clean that up.

The idea is that we have a lot of big changes we'd potentially like to
make to the memory model:

- Issue no. 917: Decouple memory from control flow
- Issue no. 404: Improve underlying symbolic type for pointer representation
- Issue no. 399: Refactoring the memory model
- Issue no. 366: Bit-level undef/poison

The problem is that these are all very hard to get started on, because
they would involve sweeping changes across Crucible-LLVM and all
downstream packages.

We could also consider alternative memory models for different
use-cases:

- A "flat" model based on a single SMT array with a bump allocator, to
  let the SMT solver handle more of the hairy details
- A "lazy" model in the style of UC-KLEE for under-constrained symbolic
  execution
- A "fresh" model that always returns fresh symbolic constants when read

All of these would have interesting trade-offs, but can't effectively be
explored at the moment.

This commit is the first step in making it easier to run such
experiments. The goal is to parameterize Crucible-LLVM over a typeclass
that provides (Crucible and Haskell) types and operations on memory and
pointers.

To that end, it introduces a new `mem` type variable. At the moment, a
lot of places in the code add the abstraction-breaking `mem ~ Mem`
constraint. The goal would be to gradually add operations to the `Mem`
class and remove such equality constraints.

Some notable downsides:

- Increased complexity. See e.g., the pattern matches in
  `Translation.Instruction`, which I couldn't figure out how to
  adequately replace. Counterpoint: programming against an well-defined
  interface (i.e., a typeclass) can reduce mental complexity by
  abstracting superfluous detail.
- Code churn within Crucible
- API churn for consumers

Mostly done with:

    mogglo-haskell \
      '${{ focus:kind() == "type" and not string.find(focus:parent():parent():text(), "mem") and focus:text() == "LLVM" }}' \
      --replace '${{ "(LLVM mem)" }}'\
      --confirm path/to/file.hs
Evaluation just assumes `mem ~ Mem`, but translation doesn't need very
many operations.
@langston-barrett
Copy link
Contributor Author

@RyanGlScott Could you take a look and see if you like the direction this is headed? This still needs:

  • Formatting changes
  • Changes to downstream libraries

But at that point, I think it will be a self-contained first step towards parameterizing the memory model, the next steps being removing the mem ~ Mem constraints in favor of adding more operations to the Mem typeclass.

@RyanGlScott
Copy link
Contributor

Some initial thoughts:

  • There is also a mem type variable used in macaw-symbolic (here), which is typically instantiated with LLVMMemory. I'm not quite sure if the mem used here (which will typically be instantiated with the similarly-named Mem) is the same notion of mem used in macaw-symbolic... we may want to think about how to reconcile these two.
  • This is a big change—probably the biggest since the great sym/bak split a while ago. As such, I think we'd want to at least have draft PRs for the other big MTV libraries (Macaw, SAW, etc.) ready before we think about adopting this.
  • Before we commit to this, I think it would be helpful to come up with a proof-of-concept test that swaps out the existing LLVM memory model with an alternative one. It's easy to forget to update something and not realize it until you actually try using it in practice.

@langston-barrett
Copy link
Contributor Author

Closing as this has merge conflicts and I don't have time to push it to completion right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants