Refactor and fix race access distribution #1084

sim642 · 2023-06-13T13:18:38Z

This PR is the result of reverse-engineering the existing race access distribution code, which it cleans up, simplifies and comments.

This does not yet do anything clever to avoid distributing accesses to many memory locations, but makes the rules for doing so a lot clearer, hopefully simplifying the redesign.
However, the number of accesses, memory locations and races may change as a result of this PR, in particular due to some included soundness fixes.

Changes

Define a new Access.Memo type for precisely representing memory locations as used by the race analysis.

The previous representation used two options, of which not all four combinations were possible, and both of them could contain offsets, which were duplicated.
The new representation has a varinfo or typ as root, followed by unit-indexed offsets. This also allows reusing features from Refactor offsets, lvals and addresses #1067.
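To make the shape of the new representation concrete, here is a minimal sketch using simplified stand-in types. `varinfo`, `typ`, `offset` and `show_memo` below are hypothetical simplifications for illustration only, not the actual `CilType`/`Offset.Unit` definitions.

```ocaml
(* Simplified stand-ins for the CIL/Goblint types; all names are hypothetical. *)
type varinfo = { vname : string }
type typ = TInt | TComp of string

(* Unit-indexed offset: array indices carry no information. *)
type offset =
  | NoOffset
  | Field of string * offset
  | Index of unit * offset

(* The root is either a variable or a type, followed by exactly one offset. *)
type memo = [ `Var of varinfo | `Type of typ ] * offset

let rec show_offset = function
  | NoOffset -> ""
  | Field (f, o) -> "." ^ f ^ show_offset o
  | Index ((), o) -> "[?]" ^ show_offset o

let show_typ = function
  | TInt -> "int"
  | TComp name -> "struct " ^ name

let show_memo ((root, offset) : memo) =
  let root_str =
    match root with
    | `Var v -> v.vname
    | `Type t -> "(" ^ show_typ t ^ ")"
  in
  root_str ^ show_offset offset
```

With this shape there is no way to build a location without a root or with two separate offsets, which is what the two-option representation allowed.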
Completely rewrite Access.add_propagate as add_distribute_outer.

Its previous logic was very confusing and completely unexplained. In some ways it had an overly specific special case, while the general case was either too precise or too imprecise, depending on the situation. The new logic replaces this with two general principles according to which type-based accesses are distributed:
1. A type-based access is distributed to all global variables of that type.
2. A type-based access is distributed to all struct fields of that type. This aspect is recursive, i.e. those are also distributed to global variables, etc.
The new logic fixes both soundness and precision issues that existed previously.
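The two principles can be sketched on a toy model as follows. This is only an illustration under assumptions: the `globals` and `structs` tables, the simplified types and the `distribute` function are all hypothetical, and whether the real `add_distribute_outer` also records the direct type-based location in the same way is an assumption of this sketch.

```ocaml
(* Toy model; every name here is hypothetical. *)
type typ = TInt | TComp of string
type offset = NoOffset | Field of string * offset
type root = Var of string | Type of typ
type memo = root * offset

(* Global variables with their types. *)
let globals = [ ("g", TInt); ("s1", TComp "inner"); ("o1", TComp "outer") ]

(* Struct definitions: name and fields. *)
let structs =
  [ ("inner", [ ("x", TInt) ]);
    ("outer", [ ("i", TComp "inner") ]) ]

(* Distribute a type-based access to type [t] at [offset].
   Terminates because C structs cannot contain themselves by value. *)
let rec distribute (t : typ) (offset : offset) : memo list =
  let direct = (Type t, offset) in
  (* Principle 1: all global variables of that type. *)
  let to_globals =
    List.filter_map
      (fun (g, gt) -> if gt = t then Some (Var g, offset) else None)
      globals
  in
  (* Principle 2: all struct fields of that type, recursively. *)
  let to_fields =
    List.concat_map
      (fun (s, fields) ->
         List.concat_map
           (fun (f, ft) ->
              if ft = t then distribute (TComp s) (Field (f, offset)) else [])
           fields)
      structs
  in
  direct :: (to_globals @ to_fields)
```

In this model, `distribute TInt NoOffset` reaches `g`, `s1.x` and `o1.i.x`, as well as the type-based locations `(struct inner).x` and `(struct outer).i.x`.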
Fix unsoundness regarding distribution to struct fields with array type.

Previously, accesses via int* didn't account for the possibility that the pointer might point to, and thus race with, an element of a struct field with type int[2]. Now arbitrarily deep multidimensional arrays are unwrapped.
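A rough sketch of the unwrapping idea, on simplified stand-in types: `unwrap_arrays` is a hypothetical helper and not the code in this PR. It strips every array layer from a field's type and records one unit index per layer, so a field of type int[2][3] is found when looking for int.

```ocaml
(* Simplified stand-ins; all names are hypothetical. *)
type typ = TInt | TArray of typ * int | TComp of string
type offset = NoOffset | Index of unit * offset

(* Strip all array layers, returning the element type and the
   unit-indexed offset needed to reach an element. *)
let rec unwrap_arrays = function
  | TArray (elem, _) ->
    let (t, o) = unwrap_arrays elem in
    (t, Index ((), o))
  | t -> (t, NoOffset)

(* A field may hold the target of an access to type [t] if its
   element type, after unwrapping, is [t]. *)
let field_may_contain (t : typ) (field_typ : typ) =
  fst (unwrap_arrays field_typ) = t
```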
Rename and flip ana.mutex.disjoint_types to ana.race.direct-arithmetic.

This should somewhat clarify the meaning of the option and avoid double negation (not !unsound). "Disjoint types" is rather nondescriptive, since the option really just controls the behavior of direct (i.e. non-field) accesses to arithmetic-type–based memory locations.
Move the handling of ana.race.direct-arithmetic to just one place.

Previously two partial checks were in two different places, obfuscating the logic.
Exclude accesses to some anonymous pthread type internals, which show up as __anonstruct and __anonunion.

Existing exclusions are based on TNamed, i.e. simply on typedef names. But anonymous structs are precisely the ones that don't have such names.
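One way such matching could look, as a sketch only. It assumes that anonymous composite names have the shape `__anonstruct_<name>_<id>` or `__anonunion_<name>_<id>`; that shape, the `ignorable_names` list and all helper names below are assumptions for illustration, not what `Access.is_ignorable_type` actually does.

```ocaml
(* Return the remainder of [s] after [prefix], if [s] starts with it. *)
let strip_prefix ~prefix s =
  let n = String.length prefix in
  if String.length s >= n && String.sub s 0 n = prefix
  then Some (String.sub s n (String.length s - n))
  else None

(* Assumed name shape: "__anonstruct_<name>_<id>" or "__anonunion_<name>_<id>".
   Returns the embedded <name>, if any. *)
let anon_comp_base_name cname =
  let rest =
    match strip_prefix ~prefix:"__anonstruct_" cname with
    | Some r -> Some r
    | None -> strip_prefix ~prefix:"__anonunion_" cname
  in
  match rest with
  | None -> None
  | Some r ->
    (match String.rindex_opt r '_' with
     | Some i -> Some (String.sub r 0 i)
     | None -> None)

(* Hypothetical list of names whose internals should be ignored. *)
let ignorable_names = [ "pthread_mutex_t"; "pthread_cond_t" ]

let is_ignorable_comp_name cname =
  match anon_comp_base_name cname with
  | Some base -> List.mem base ignorable_names
  | None -> false
```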
Remove polymorphic Hashtbl usage from Access.

Although it didn't cause issues, since the hashtables had typsig keys, it's still nicer to rule out the possibility explicitly. typsigs are still used to make sure that all type-based accesses are collected regardless of any typedefs.
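For illustration, this is the general pattern of replacing the polymorphic `Hashtbl` with a table specialized via the standard `Hashtbl.Make` functor. The `Typsig` module here is a hypothetical, simplified stand-in for `CilType.Typsig`, and `add_access` is made up for the example.

```ocaml
(* Hypothetical, simplified stand-in for a type signature key. *)
module Typsig = struct
  type t = TSInt | TSPtr of t | TSComp of string

  (* Explicit structural equality instead of polymorphic (=). *)
  let rec equal a b =
    match a, b with
    | TSInt, TSInt -> true
    | TSPtr x, TSPtr y -> equal x y
    | TSComp x, TSComp y -> String.equal x y
    | _ -> false

  (* Explicit hash consistent with [equal]. *)
  let rec hash = function
    | TSInt -> 0
    | TSPtr x -> 31 * hash x + 1
    | TSComp name -> 31 * Hashtbl.hash name + 2
end

(* Table specialized to Typsig keys; other key types are a type error. *)
module TypsigTable = Hashtbl.Make (Typsig)

let accesses : string list TypsigTable.t = TypsigTable.create 16

(* Record an access description under the given type signature. *)
let add_access key descr =
  let old = Option.value (TypsigTable.find_opt accesses key) ~default:[] in
  TypsigTable.replace accesses key (descr :: old)
```

The specialized table makes the equality and hash functions explicit, so a later change of the key type cannot silently fall back to polymorphic comparison.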
Add access tracing.
Add option warn.deterministic.

This makes the order of printed messages deterministic by delaying the printing and later printing them in sorted (by text, etc.) order. This is useful for writing cram tests such that the test doesn't spuriously break (e.g. on OSX) due to varinfo IDs being in a different order in some hashtables that get iterated.
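The delay-and-sort idea can be sketched as below. The `message` record, the `deterministic` flag (standing in for `warn.deterministic`) and the function names are hypothetical; the actual message type and sort key in Goblint may differ.

```ocaml
(* Hypothetical, simplified message type. *)
type message = { text : string; line : int }

(* Stand-in for the warn.deterministic option. *)
let deterministic = ref true

(* Messages collected while printing is delayed. *)
let delayed : message list ref = ref []

let print_message m = Printf.printf "%s (line %d)\n" m.text m.line

(* Either buffer the message or print it immediately. *)
let warn m =
  if !deterministic then delayed := m :: !delayed
  else print_message m

(* Order by text first, then by line, independent of arrival order. *)
let compare_message a b =
  match String.compare a.text b.text with
  | 0 -> Int.compare a.line b.line
  | c -> c

let sorted_delayed () = List.sort compare_message !delayed

(* Print all delayed messages in sorted order and clear the buffer. *)
let flush_messages () =
  List.iter print_message (sorted_delayed ());
  delayed := []
```

Because the sort key depends only on message contents, the printed order no longer depends on the iteration order of any hashtable.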

TODO

Remove system-dependent cram tests.


src/analyses/accessAnalysis.ml

michael-schwarz · 2023-06-21T22:09:41Z

src/domains/access.ml

+module Memo =
+struct
+  include Printable.StdLeaf
+  type t = [`Var of CilType.Varinfo.t | `Type of CilType.Typ.t] * Offset.Unit.t [@@deriving eq, ord, hash]


Why is this a polymorphic variant type?

This could be changed, but I don't want to do it here right now since #1089 and more follow-up already build on this, so it's best to avoid massive conflicts there.
Once all the big usability improvements to the race analysis are done, we'll see which types remain at all.

src/analyses/raceAnalysis.ml

src/domains/access.ml

src/util/options.schema.json

michael-schwarz

Definitely looks a lot more reasonable now. I am a bit unhappy with the number of TODO comments we're adding here though! I don't think we'll ever be in a better situation to address these than now.

sim642 · 2023-06-22T09:57:49Z

> I am a bit unhappy with the number of TODO comments we're adding here though! I don't think we'll ever be in a better situation to address these than now.

None of the added TODOs should be new problems. They're existing (possible) problems, so it's better to mark them than to completely ignore them, and I cannot fix all of them in a single PR.
#1089 and its follow-ups should address some of them. It doesn't make sense to try to fix them here if the whole system ends up getting thrown away and redesigned from scratch.
Others have been and are open questions that I still don't have answers to. For example Access.get_type is still a complete mystery even after days of staring at it.

Co-authored-by: Michael Schwarz <michael.schwarz93@gmail.com>

karoliineh and others added 30 commits June 6, 2023 11:13

Remove unnecessary type definitions

4e884c6

Replace failwith with exception and catch that instead

2d7d09c

Refactor pattern matching

96f67f5

Refactor: remove typeSig function calls

8270028

Add type race cram tests

e07ee47

Co-authored-by: Simmo Saan <simmo.saan@gmail.com>

Remove unused function parameters from access

41c2b11

Co-authored-by: Simmo Saan <simmo.saan@gmail.com>

Refactor offset option type in access

96c26ff

Co-authored-by: Simmo Saan <simmo.saan@gmail.com>

Refactor pattern matching in add in access

913a2a7

Co-authored-by: Simmo Saan <simmo.saan@gmail.com>

Refactor compinfo in add_propagate in access

c75de0f

Add unsound race test with array and struct

f4a692e

Merge branch 'master' into access-analysis-propagate

467a2e3

Refactor Access memory location type to avoid offset redundancy

e07933e

Move Access.Memo up to add_one

4e16e1f

Simplify Access.add_struct

64d69c3

Add Access.Memo.type_of

5d11f6f

Fix Access.add_struct struct case matching

5fd977f

Add test case for typedef

23ed02d

Disable races from free in chrony-name2ipaddress

74a036f

5d11f6f is a soundness fix.

Move Access.Memo up to add_struct

36c91c5

Revert "Refactor: remove typeSig function calls"

035b7e3

This reverts commit 8270028.

Replace general Hashtbl with a specialized table with CilType.Typsig keys

8ac30e9

Move Access.Memo up to add_propagate

717778a

Merge remote-tracking branch 'upstream/access-analysis-propagate' into access-analysis-propagate

219825b

Remove now-unused Access.type_from_type_offset

2ca99f3

Make `Var case more direct in Access

9728852

Make Access fallback type lazy

f38c4d6

Add TODOs to Access.get_type

2aa2837

Add Access ignorable type TODOs

491d01c

Add Access tracing

0fd1c3f

Remove unused code from Access

bbf4df0

sim642 added 4 commits June 13, 2023 14:40

Remove Access.Type_offset_error

24a559d

Rename and flip option ana.mutex.disjoint_types -> ana.race.direct-arithmetic

a1c87c6

Extract Access.nested_offsets

db42e88

Implement nicer anonstruct and anonunion matching in Access.is_ignorable_type

36e47f6

sim642 added the cleanup, bug, unsound, precision and documentation labels Jun 13, 2023

sim642 added 3 commits June 14, 2023 10:28

Remove system-dependent parts of cram tests

d13acea

Disable info messages in type-based access cram tests

4bdc294

Try to make cram test output deterministic

eb0e1c7

Currently the order of messages may depend on varinfo IDs in hashtables or whatnot. Since these differ on Linux and OSX, so does the message order.

karoliineh mentioned this pull request Jun 15, 2023

Use TrieDomain to distribute accesses to contained fields #1089

Merged

michael-schwarz self-requested a review June 21, 2023 22:02