Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uc-crux-llvm: User-provided specifications for external functions #982

Closed
langston-barrett opened this issue May 12, 2022 · 5 comments · Fixed by #1003
Closed

uc-crux-llvm: User-provided specifications for external functions #982

langston-barrett opened this issue May 12, 2022 · 5 comments · Fixed by #1003
Assignees
Labels
enhancement uc-crux Issues specific to under-constrained crux

Comments

@langston-barrett
Copy link
Contributor

UC-Crux has the ability to skip over execution of declared but not defined functions, e.g., from a third-party library or libc. This is practical, but it has its disadvantages:

  • It's not overapproximate (sound): the execution of those functions might modify their pointer arguments or global variables. This means the safety claims made by UC-Crux after functions are skipped are much less strong.
  • It's also not underapproximate, meaning that any bugs found by UC-Crux may not actually be feasible in practice.

For libc functions in particular, we could manually implement overrides (in Haskell) for some of them. Sound overrides could live in crucible-llvm, and unsound ones in UC-Crux. However, this takes considerable effort and still doesn't work for other third-party libraries.

Instead, we should implement deserialization of function postconditions (#981), probably from YAML, along with specifications of whether the provided postconditions are overapproximate, underapproximate, both, or neither (for the sake of #932 and tracking of unsoundness). This would mostly require making non-GADT versions of UC-Crux's data structures and working out how to convert to/from them.

@langston-barrett
Copy link
Contributor Author

After a bit more reflection I now think that the ideal input language for this feature would be more rich than what UC-Crux currently provides via the Postcond type. The ideal is probably closer to SAW's notion of a collection of overrides. For each external function, the user could provide n specifications, where a spec consists of:

  • A precondition (the existing Precond type)
  • An optional postcondition (i.e., a summary of the function's effect on the program state)
  • A soundness tag, one of
    • Overapproximate
    • Underapproximate
    • Neither
    • Both

The precondition would be a constraint on the program state which specifies when the spec applies (e.g., f returns a positive number, but only if given a positive number as an input). Together with these n specs, the user could provide the intended behavior if none of their preconditions match, which might be either (1) an error (like SAW does if no overrides match) or (2) fallback to the current unsound function-skipping behavior.

For example, the following is a spec for a division function, stating that the output is symbolic and non-negative if the numerator is non-negative and denominator is positive or if both the numerator is non-positive and denominator is negative.

- name: div
  specs:
  - # spec 1: non-negative / positive = non-negative 
    pre:
    - args:
      - # numerator
        signed-greater-or-equal: 0
      - # denominator
        signed-greater: 0
    post:
    - ret:
        symbolic:
          signed-greater-or-equal: 0
  - # spec 2: non-positive / negative = non-negative
    pre:
    - args:
      - # numerator
        signed-less-or-equal: 0
      - # denominator
        signed-less: 0
    post:
    - ret:
        symbolic:
          greater-or-equal: 0
  # neither under-approximate nor over-approximate
  soundness: neither
  # failure to match all specs is an error (1), also see below
  match-failure: error

If the behavior in case of failure to match preconditions is an error, then UC-Crux could also use the specs to infer preconditions on callers of the external function, i.e., the caller should guarantee that it satisfies the intersection of the preconditions of the specs for the external function at the callsite. For example, in the above spec, match-failure: error indicates that it is an error to pass a denominator of zero (since doing so would not match the preconditions of any of the specs).

If the user doesn't provide a postcondition for an override, the default behavior would also be to fall back to the unsound skipping. It would only really make sense to provide such overrides if failure to match preconditions is treated as an error.

It would also be nice to provide a way to specify values that aren't concrete nor symbolic, but are derived from existing values. For instance, fgets(3) says:

char *fgets(char *restrict s, int size, FILE *restrict stream);

fgets() returns s on success, and NULL on error or when end  of  file  occurs  while  no
characters have been read.

which might be specified like so:

- name: fgets
  specs:
  - pre: {}
    post:
    - ret:
        # NULL, i.e., EOF or error
        constant: 0
  - pre: {}
    post:
    - ret:
        # s, i.e., success
        arg: 0
  soundness: neither

@travitch
Copy link
Contributor

travitch commented Jun 1, 2022

Are there any existing formats we might take advantage of for this? I don't have a list of candidates offhand, but it is worth us thinking about collectively

@langston-barrett
Copy link
Contributor Author

langston-barrett commented Jun 2, 2022

Are there any existing formats we might take advantage of for this? I don't have a list of candidates offhand, but it is worth us thinking about collectively

Great question. I also don't know of any, but I looked into a few verification tools. I couldn't find anything for 2LS or Seahorn, but it looks like none of Angr, CBMC, KLEE, Infer, or SMACK use a common or reusable format for procedure summaries.

Angr

Angr allows symbolic function summaries (example: fgetc), which are very much like Crucible's Haskell overrides.

CBMC

CBMC provides simple C models of libc functions. These appear to be sound and precise. From their docs, it sounds to me like they expect to have C definitions for all functions, and don't support over- or under-approximate user-provided specs.

Quotes from CMBC docs

A Note About Compilers and the ANSI-C Library

Most C programs make use of functions provided by a library; instances are functions from the standard ANSI-C library such as malloc or printf. The verification of programs that use such functions has two requirements:

  • Appropriate header files have to be provided. These header files contain declarations of the functions that are to be used.
  • Appropriate definitions have to be provided.

Linking Libraries

Some software projects come with their own libraries. Also, the goal may be to analyze a library by itself. For this purpose it is possible to use goto-cc to link multiple model files into a library of model files. An object file can then be linked against this model library. For this purpose, goto-cc also features a linker mode.

KLEE

KLEE relies on being able to call external functions. It can concretize symbolic arguments in order to do so.

Infer

Infer has a bunch of different analyses, at least one allows specifying really basic properties of external functions as command-line arguments, e.g.:

--pulse-model-return-first-arg string

Regex of methods that should be modelled as returning the first argument in Pulse

SMACK

SMACK allows inline Boogie code, which can be used to provide definitions of external functions.

@kquick
Copy link
Member

kquick commented Jun 3, 2022

I don't know if you've locked in terminology, but reading soundness: neither was confusing. I'd suggest some alternatives that are more informative in a standalone usage:

Overapproximate
Underapproximate
Approximate   (replacing "Both")
Precise   (replacing "Neither")

@langston-barrett
Copy link
Contributor Author

langston-barrett commented Jun 15, 2022

I've been exploring the space of possible implementations of this feature. It's actually a little complicated! The following are notes mostly to myself.

Implementation notes

The general idea would be to replace the postcondition (UPostcond) that gets fed into a Skip override with a NonEmpty Spec, where a Spec is what goes in an entry in the specs: list in the YAML examples above.

Seems easy enough, but the issue is that we'd end up with have a Map (FuncSymbol m) (Some Spec), and then when creating a Skip override we'd have to somehow "typecheck" the spec against the function it's overriding. This was easy enough to do with UPostcond because it was designed to make this easy, and has less stuff in it than a Spec (the current implementation with UPostcond is similar to, and shares some of the disadvantages with, option (2) below).

So here are the alternatives for generating or preserving evidence that a Spec is well-typed with respect to a specific function/CFG:

  1. Enhance the maps in ModuleContext and elsewhere with MapFs with typed keys, i.e. keys with symbol (function name) and function signature type indices

    a. Keys are (SymbolRepr, FuncSigRepr) pairs

    • Pros:
      • No unsafeCoerce required
      • Simple to implement
      • Extracting a function name from a key stays at O(1)
    • Cons:
      • Keys are big, and are used across several maps - lots of memory!
      • Equality comparisons are slow

    b. Keys are Nonces

    • Pros:
      • Fast
      • No unsafeCoerce in my code (though it is used for Nonce)
    • Cons:
      • Nonce isn't actually a singleton, so TestEquality is unlawful
      • Multiple maps with Nonce keys won't be guaranteed to share keys, defeating the partiality-avoiding lookup guarantees of the m type parameter as in DeclMap.
      • Extracting a function name from a key goes from O(1) to O(log(n))

    c. Use Data.Parameterized.HashTable

    • Pros:
      • Low space overhead
      • No unsafeCoerce required
    • Cons:
      • Computational overhead of hashing
      • Have to implement HashableF for FuncSigRepr - probably straightforward?
      • Have to look up (SymbolRepr, FuncSigRepr) pairs from function names, then use those to look up things like Specs.
      • It would be nice to have the m type parameter correspond to the s in the HashTable - this would require unsafeCoerce though

    d. Keys are hash(function name, function sig) using some kind of Hash newtype in UC-Crux

    • Pros:
      • Low space and time overhead
    • Cons:
      • Requires reimplementing hash table interface
      • Requires use of unsafeCoerce
      • Have to implement HashableF for FuncSigRepr - probably straightforward?
      • Have to look up (SymbolRepr, FuncSigRepr) pairs from function names, then use those to look up things like Specs.
  2. Type-check Spec against a FuncSigRepr when creating a Skip override

    • Pros:
      • Reasonably straightforward
      • No extra memory consumption (as opposed to the (SymbolRepr, FuncSigRepr) key scheme described above in (1a))
    • Cons:
      • Either causes some duplication with the type-checking code in View, or somewhat complicates that code to avoid such duplication
      • Does require repeated, unnecessary computation (it's a violation of "parse, don't validate")
      • Creates a situation in which the spec could conceivably not match the function, introducing panics when typechecking fails
  3. Pair each Spec with a FuncSigRepr, both jointly existentially quantified over the function signature type parameter, then check the FuncSigRepr against the function/CFG's FuncSigRepr when creating a Skip override

    • Pros:
      • Reasonably straightforward
      • No unsafeCoerce required
    • Cons:
      • Uses more memory to duplicate the FuncSigRepr
      • Also requires repeated, unnecessary computation (it's a violation of "parse, don't validate")
      • Creates a situation in which the spec could conceivably not match the function, introducing panics when typechecking fails

We'd have this same issue even if the code wasn't dependently-typed - it's just that our only option would be (2), type-checking every time we want to use a spec.

Currently leaning towards (1c).

langston-barrett added a commit that referenced this issue Jun 16, 2022
This functionality will be useful for implementing #982.
langston-barrett added a commit that referenced this issue Jun 16, 2022
This functionality will be useful for implementing #982.
langston-barrett added a commit that referenced this issue Jun 16, 2022
This functionality will be useful for implementing #982.
langston-barrett added a commit that referenced this issue Jun 17, 2022
This functionality will be useful for implementing #982.
langston-barrett added a commit that referenced this issue Jun 17, 2022
langston-barrett added a commit that referenced this issue Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement uc-crux Issues specific to under-constrained crux
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants