Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and fix race access distribution #1084

Merged
merged 54 commits into from
Jun 27, 2023
Merged

Conversation

sim642
Copy link
Member

@sim642 sim642 commented Jun 13, 2023

This is the result of reverse-engineering the existing race access distribution code. That code is cleaned up, simplified and commented.

This does not yet do anything clever to avoid distributing accesses to many memory locations, but makes the rules for doing so a lot clearer, hopefully simplifying the redesign.
However, the number of accesses, memory locations and races may change as the result of this PR, in particular due to some included soundness fixes.

Changes

  1. Define a new Access.Memo type for precisely representing memory locations as used by the race analysis.

    The previous representation used two options, of which not all four combinations were possible, and both of them could contain offsets, which were duplicated.
    The new representation has a varinfo or typ as root, followed by unit-indexed offsets. This also allows reusing features from Refactor offsets, lvals and addresses #1067.

  2. Completely rewrite Access.add_propagate as add_distribute_outer.

    Its previous logic was very confusing and completely unexplained. In some ways it had an overly specific special case, while the general case was both too precise or too imprecise, depending on the situation. The new logic generalizes this to two general principles according to which type-based accesses are distributed:

    1. A type-based access is distributed to all global variables of that type.
    2. A type-based access is distributed to all struct fields of that type. This aspect is recursive, i.e. those are also distributed to global variables, etc.

    The new logic fixes both soundness and precision issues that existed previously.

  3. Fix unsoundness regarding distribution to struct fields with array type.

    Previously accesses via int* didn't account for the possibility that it might point to and thus race with an element of a struct field with type int[2]. Now arbitrarily deep multi-arrays are unwrapped.

  4. Rename and flip ana.mutex.disjoint_types to ana.race.direct-arithmetic.

    This should somewhat clarify the meaning of the option and avoid double negation (not !unsound). Disjoint types is rather nondescriptive since it really just controls the behavior of direct (i.e. non-field) accesses to arithmetic-type–based memory locations.

  5. Moves the handling of ana.race.direct-arithmetic to just one place.

    Previously two partial checks were in two different places, obfuscating the logic.

  6. Exclude accesses to some anonymous pthread type internals, which show up as __anonstruct and __anonunion.

    Existing exclusions are based on TNamed, so simply typedef names. But anonymous structs are precisely the ones that don't have such names.

  7. Remove polymorphic Hashtbl usage from Access.

    Although it didn't cause issues as the hashtables were with typsig keys, it's still nicer to avoid the possibility explicitly. typsigs are still used to make sure that all type-based accesses are collected regardless of any typedefs.

  8. Add access tracing.

  9. Add option warn.deterministic.

    This makes the order of printed messages deterministic, by delaying the printing and later printing them in sorted (by text, etc) order. This is useful for writing cram tests such that the test doesn't spuriously break (e.g. on OSX) due to different varinfo IDs being in different order in some hashtables that get iterated.

TODO

  • Remove system-dependent cram tests.

karoliineh and others added 30 commits June 6, 2023 11:13
Co-authored-by: Simmo Saan <simmo.saan@gmail.com>
Co-authored-by: Simmo Saan <simmo.saan@gmail.com>
Co-authored-by: Simmo Saan <simmo.saan@gmail.com>
Co-authored-by: Simmo Saan <simmo.saan@gmail.com>
@sim642 sim642 added cleanup Refactoring, clean-up bug unsound precision documentation Documentation, comments labels Jun 13, 2023
sim642 added 3 commits June 14, 2023 10:28
Currently the order of messages may depend on varinfo IDs in hashtables or whatnot.
Since these differ on Linux and OSX, so does the message order.
module Memo =
struct
include Printable.StdLeaf
type t = [`Var of CilType.Varinfo.t | `Type of CilType.Typ.t] * Offset.Unit.t [@@deriving eq, ord, hash]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a polymorphic variant type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be changed, but I don't want to do it here right now since #1089 and more follow-up already build on this, so it's best to avoid massive conflicts there.
Once all the big usability improvements to the race analysis are done, we'll see which types remain at all.

Copy link
Member

@michael-schwarz michael-schwarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely looks a lot more reasonable now. I am a bit unhappy with the number of TODO comments we're adding here though! I don't think we'll ever be in a better situation to address these than now.

@sim642
Copy link
Member Author

sim642 commented Jun 22, 2023

I am a bit unhappy with the number of TODO comments we're adding here though! I don't think we'll ever be in a better situation to address these than now.

None of the added TODOs should be new problems, they're existing (possible) problems, so it's better to mark them than completely ignore them and I cannot fix all of them in a single PR.
#1089 and its follow up should address some of them. It doesn't make sense to try to fix them here if the whole system ends up getting thrown away and redesigned from scratch.
Others have been and are open questions that I still don't have answers to. For example Access.get_type is still a complete mystery even after days of staring at it.

Co-authored-by: Michael Schwarz <michael.schwarz93@gmail.com>
@sim642 sim642 added this to the v2.2.0 milestone Jun 27, 2023
@sim642 sim642 merged commit 6709f82 into master Jun 27, 2023
@sim642 sim642 deleted the access-analysis-propagate branch June 27, 2023 08:58
sim642 added a commit to sim642/opam-repository that referenced this pull request Sep 13, 2023
CHANGES:

* Add `setjmp`/`longjmp` analysis (goblint/analyzer#887, goblint/analyzer#970, goblint/analyzer#1015, goblint/analyzer#1019).
* Refactor race analysis to lazy distribution (goblint/analyzer#1084, goblint/analyzer#1089, goblint/analyzer#1136, goblint/analyzer#1016).
* Add thread-unsafe library function call analysis (goblint/analyzer#723, goblint/analyzer#1082).
* Add mutex type analysis and mutex API analysis (goblint/analyzer#800, goblint/analyzer#839, goblint/analyzer#1073).
* Add interval set domain and string literals domain (goblint/analyzer#901, goblint/analyzer#966, goblint/analyzer#994, goblint/analyzer#1048).
* Add affine equalities analysis (goblint/analyzer#592).
* Add use-after-free analysis (goblint/analyzer#1050, goblint/analyzer#1114).
* Add dead code elimination transformation (goblint/analyzer#850, goblint/analyzer#979).
* Add taint analysis for partial contexts (goblint/analyzer#553, goblint/analyzer#952).
* Add YAML witness validation via unassume (goblint/analyzer#796, goblint/analyzer#977, goblint/analyzer#1044, goblint/analyzer#1045, goblint/analyzer#1124).
* Add incremental analysis rename detection (goblint/analyzer#774, goblint/analyzer#777).
* Fix address sets unsoundness (goblint/analyzer#822, goblint/analyzer#967, goblint/analyzer#564, goblint/analyzer#1032, goblint/analyzer#998, goblint/analyzer#1031).
* Fix thread escape analysis unsoundness (goblint/analyzer#939, goblint/analyzer#984, goblint/analyzer#1074, goblint/analyzer#1078).
* Fix many incremental analysis issues (goblint/analyzer#627, goblint/analyzer#836, goblint/analyzer#835, goblint/analyzer#841, goblint/analyzer#932, goblint/analyzer#678, goblint/analyzer#942, goblint/analyzer#949, goblint/analyzer#950, goblint/analyzer#957, goblint/analyzer#955, goblint/analyzer#954, goblint/analyzer#960, goblint/analyzer#959, goblint/analyzer#1004, goblint/analyzer#558, goblint/analyzer#1010, goblint/analyzer#1091).
* Fix server mode for abstract debugging (goblint/analyzer#983, goblint/analyzer#990, goblint/analyzer#997, goblint/analyzer#1000, goblint/analyzer#1001, goblint/analyzer#1013, goblint/analyzer#1018, goblint/analyzer#1017, goblint/analyzer#1026, goblint/analyzer#1027).
* Add documentation for configuration JSON schema and OCaml API (goblint/analyzer#999, goblint/analyzer#1054, goblint/analyzer#1055, goblint/analyzer#1053).
* Add many library function specifications (goblint/analyzer#962, goblint/analyzer#996, goblint/analyzer#1028, goblint/analyzer#1079, goblint/analyzer#1121, goblint/analyzer#1135, goblint/analyzer#1138).
* Add OCaml 5.0 support (goblint/analyzer#1003, goblint/analyzer#945, goblint/analyzer#1162).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug cleanup Refactoring, clean-up documentation Documentation, comments precision unsound
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants