Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strongly update-able calloc blobs for more precise zstd analysis #702

Open
sim642 opened this issue Apr 26, 2022 · 3 comments
Open

Strongly update-able calloc blobs for more precise zstd analysis #702

sim642 opened this issue Apr 26, 2022 · 3 comments

Comments

@sim642
Copy link
Member

sim642 commented Apr 26, 2022

Problem

Our analysis of zstd ends up with unexpectedly many unknown pointers due to the following sequence of coincidences.

Thread pool and its queue

In zstd's POOL_create_advanced, the thread pool and its queue are allocated on the heap as follows (simplified):

ctx = (POOL_ctx*)ZSTD_customCalloc(sizeof(POOL_ctx), customMem);
ctx->queueSize = queueSize + 1;
ctx->queue = (POOL_job*)ZSTD_customMalloc(ctx->queueSize * sizeof(POOL_job), customMem);
ctx->queueHead = 0;
// ...

Here ctx and both alloc variables are in the local state. Thanks to #686 the alloc variables escape into global state correctly.

When jobs are added to the queue, their function pointers are nicely joined into the queue's alloc variable, which precisely contains known function pointers and no NULL or unknown pointer.

However later in the thread pool threads, they retrieve jobs from the queue using ctx->queue[ctx->queueHead], i.e. *(ctx->queue + ctx->queueHead) after CIL normalization. Confusingly, the function pointer in the returned job struct contains an additional unknown pointer, despite the queue's alloc variable not containing one.

Extensive invalidation

Turns out this is because ctx's alloc variable also has a NULL pointer possible for the queue field. When pointer arithmetic is applied (and queueHead is no longer constant either), the known alloc pointer is nicely kept, but the NULL pointer is replaced by an unknown pointer using this code:

let default = function
| Addr.NullPtr when GobOption.exists (BI.equal BI.zero) (ID.to_int n) -> Addr.NullPtr
| _ -> Addr.UnknownPtr
in
match Addr.to_var_offset addr with
| Some (x, o) -> Addr.from_var_offset (x, addToOffset n (Some x.vtype) o)
| None -> default addr

Analyzing zstd, the unknown function pointer is absolutely disastrous effect:

  1. An unknown function pointer call is analyzed by special to invalidate its arguments.
  2. In zstd, one of the thread pool jobs is passed an argument, which inside it contains a pointer to the same thread pool ctx.
  3. The resulting invalidation is extensive and invalidates:
    • ctx alloc variable (so an unknown pointer is added to queue's address set directly as well),
    • queue alloc variable (so NULL and unknown pointers are added to the job's function pointer and argument sets),
    • all the argument structs for all the jobs (holding all of the relevant data) are totally invalidated.

Culprit

The culprit of all of this is the NULL pointer in the queue field. If we didn't have that there, none of this invalidation would have happened!
Tracking this down was a two-day job since the cyclic pointing structure means that everything ends up being invalidated and it's impossible to tell from the final results, where this imprecision originated from. Moreover, it's practically impossible to trace Goblint when the problematic analysis takes over 40 minutes and produces way too much tracing output (and not to mention being slowed down by all the I/O).

Solution

Ideally, Goblint should know that queue cannot be a NULL pointer, avoiding all of the above. The source of the NULL pointer in the first place is that ctx is allocated using calloc. But clearly that NULL pointer is overwritten before ctx is seen by any other code, so it should be avoidable.

Although simple on the surface, to be able to strongly update the queue field requires many big pieces to work together:

  1. Uniqueness analysis for Juliet suite std_thread wrappers #186 is needed to know that we have a uniquely allocated blob, otherwise strong updates wouldn't be sound.
  2. ValueDomain.update_offset must use that uniqueness information to replace the weak (joining) update with a strong (replacing) one here:
    | `Blob (x,s,orig), _ ->
    begin
    let l', o' = shift_one_over l o in
    let x = zero_init_calloced_memory orig x t in
    mu (`Blob (join x (do_update_offset ask x offs value exp l' o' v t), s, orig))
  3. Even that would not be enough, because calloc creates an abstract array of blobs:
    set_many ~ctx (Analyses.ask_of_ctx ctx) gs st [(add_null (AD.from_var heap_var), TVoid [], `Array (CArrays.make (IdxDom.of_int (Cilfacade.ptrdiff_ikind ()) BI.one) (`Blob (VD.bot (), eval_int (Analyses.ask_of_ctx ctx) gs st size, false))));

    Therefore that array update also needs to be strong, not weak. In zstd's case (and very commonly), calloc is used with a count of 1, so we get an array of length 1. Hence the array domain(s) should be specialized for the length 1 case to just do strong updates instead.
    (After Fixes/calloc #145 there's probably a bug in that calloc code because it always creates an array of length 1, instead of using the count from the calloc argument.)
  4. Even with all of the above, I'm not sure if we'd be precise enough, because in zstd thread pools may be created when Goblint is in multithreaded mode, so it might even be necessary to incorporate Primitive malloc freshness analysis #690 into the uniqueness and privatization to always have a local strongly update-able copy of the blob.
@michael-schwarz
Copy link
Member

  1. and 2. are actually things that are all tackled within Uniqueness analysis for Juliet suite std_thread wrappers #186 by our Master's Practical student who will hopefully make a PR any day now. His final presentation is scheduled for next week. I think on top of 1 and 2, one needs to ensure that the entire blob is written at the same time (by checking the size of the pointer that is used to write) to actually perform a string update.
    Otherwise, one might just be updating one part of the blob and not all of it.

(After #145 there's probably a bug in that calloc code because it always creates an array of length 1, instead of using the count from the calloc argument.)

I don't think so, since the allocated blob already has the product of both calloc arguments as its length. The problem is that nothing enforces that what is allocated via calloc is actually used an array.

@sim642
Copy link
Member Author

sim642 commented Apr 26, 2022

  1. and 2. are actually things that are all tackled within Uniqueness analysis for Juliet suite std_thread wrappers #186 by our Master's Practical student who will hopefully make a PR any day now.

I guess the ultimate test of the student's work will be whether it's able to fix this whole zstd imprecision.

I think on top of 1 and 2, one needs to ensure that the entire blob is written at the same time (by checking the size of the pointer that is used to write) to actually perform a string update.
Otherwise, one might just be updating one part of the blob and not all of it.

Something along the lines might be necessary indeed. Although this is even more crucial for general things like #696.

I don't think so, since the allocated blob already has the product of both calloc arguments as its length.

This is actually exactly what #145 changed: it used to be a product, but it was changed to be kept separate (which is more general), but it looks like the Calloc comment in base analysis was forgotten to be removed.

@michael-schwarz
Copy link
Member

We should investigate how much of this is resolved by #722.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants