Refactor heap and cache outers #12608

liufengyun · 2021-05-26T08:54:09Z

Some refactoring:

Refactor heap
Cache outers
Use opaque types for domain definitions

Using the heap as a global cache is safe because the heap is monotonic and field abstractions are immutable.
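For readers unfamiliar with the "opaque types for domain definitions" item, here is a minimal sketch of the pattern in Scala 3. It is not the code from this PR: `Addr` and `Objekt` are simplified stand-ins that use strings where the checker uses compiler symbols, and the names `hasObject`, `lookup`, `store` and `allocate` are hypothetical.

```scala
import scala.collection.mutable

object OpaqueHeapSketch {
  // Hypothetical stand-ins: the checker uses compiler symbols, not strings.
  final case class Addr(klass: String)
  final case class Objekt(klass: String, fields: mutable.Map[String, String])

  object Heap {
    // Outside this object, Heap is an abstract type; the Map is hidden.
    opaque type Heap = mutable.Map[Addr, Objekt]

    def empty: Heap = mutable.Map.empty[Addr, Objekt]

    extension (heap: Heap) {
      def hasObject(addr: Addr): Boolean = heap.contains(addr)
      def lookup(addr: Addr): Objekt = heap(addr)
      def store(addr: Addr, obj: Objekt): Unit = heap.update(addr, obj)
    }
  }

  import Heap.*

  // Client code can only go through the three operations above.
  def allocate(heap: Heap, klass: String): Addr = {
    val addr = Addr(klass)
    if (!heap.hasObject(addr)) heap.store(addr, Objekt(klass, mutable.Map.empty))
    addr
  }
}
```

The point of the pattern is that the representation (a mutable map) can change later without touching client code, and clients cannot accidentally remove entries, which would break monotonicity.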

liufengyun · 2021-05-26T08:59:12Z

@olhotak @EnzeXing This PR will make the experiments easier.

Allowing non-hot values as arguments to primary constructors requires remembering the values of the parameters in the heap. Thus reintroducing the heap and addresses is inevitable. It also helps with performance, as it enables more caching.

olhotak · 2021-05-26T16:44:59Z

In extending the abstract heap to reason about multiple objects, a key concern is the design of addresses. Since we want must-information here (the set of fields that must have been assigned), each abstract address needs to represent at most one concrete object. ThisRef(C) is a good address in that it represents the unique object currently pointed to by this. All-objects-of-class-B is a bad address in that there may be multiple concrete objects of class B at a time.

Since we want the heap to reason about objects whose constructors we are analyzing, perhaps the design of addresses could be based on the stack of constructors that we analyze. For example, for the program

class A {
  val b = new B(A.this)
}

the call stack in our analysis might look like this:

1: constructor of B called from
0: constructor of A

Then our addresses could be something like ThisRef(0) and ThisRef(1), referring to the values of this in the two frames of the call stack. (Only constructor frames would give rise to addresses. Frames for methods would not because a method call does not create a new object.)
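A rough sketch of this idea in Scala, under stated assumptions: the frame types and the `addresses` helper are hypothetical, classes are named by strings, and the index in `ThisRef` is taken to be the frame's position in the whole stack (the comment above does not say whether method frames are counted).

```scala
object ConstructorStackSketch {
  sealed trait Frame
  final case class ConstructorFrame(klass: String) extends Frame
  final case class MethodFrame(name: String) extends Frame

  // Address of the `this` of the constructor frame at the given stack position.
  final case class ThisRef(frameIndex: Int)

  // stack.head is the outermost frame (index 0).
  // Only constructor frames give rise to addresses; method frames are skipped.
  def addresses(stack: List[Frame]): Map[ThisRef, String] =
    stack.zipWithIndex.collect {
      case (ConstructorFrame(klass), index) => ThisRef(index) -> klass
    }.toMap
}
```

Because each constructor frame constructs exactly one object, each such address stands for at most one concrete object, which is what must-information needs.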

liufengyun · 2021-05-26T18:10:51Z

@olhotak Thanks for the feedback. This PR is essentially the same as the master --- the heap enables more caching. It would be good to motivate the proposal above with concrete examples which are related to concerns about expressiveness, soundness or termination. Let's discuss more tomorrow.

olhotak · 2021-05-28T14:49:39Z

I thought some more about how to design the Heap and Addr.

We have two analyses, init and eval. Only init mutates the fields of the current Objekt. eval reads them when it encounters ThisRef, but it doesn't change them. Also, init mutates the fields of only one single Objekt.

This suggests keeping a single Objekt with mutable fields.

But then we want to leak non-hot values to methods (method parameters), constructors, and to local variables of methods. For this, we need to keep track of which method parameters, constructor parameters (fields), and local variables have non-hot values. So we need an Environment. An Environment should be a map from parameters/variables to Values. Also, outer, outer.outer, outer.outer.outer, ... are like variables, so they can also be keys of the Environment. Perhaps this Environment is what the generalized Heap was intended to be. And Addr were intended to be its keys. But parameters/variables/outer are not abstractions of concrete addresses, so this is confusing.

Unlike Objekt/Heap, such an Environment is immutable in that it is passed as input to eval and to init, but does not have mutable fields. The Objekt/Heap is passed to init for reading and writing. eval needs to read it but not mutate it.
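To make the split concrete, here is a toy sketch, not the checker's code. The two-point `Value` domain, the `Expr` type and both signatures are assumptions made for illustration. Variables missing from the environment are treated as hot, and reading an unassigned field yields `Cold` here, whereas a real analysis would more likely report an error.

```scala
import scala.collection.mutable

object EnvironmentSketch {
  sealed trait Value
  case object Hot extends Value
  case object Cold extends Value

  sealed trait Expr
  final case class Var(name: String) extends Expr
  final case class ThisField(name: String) extends Expr

  // Immutable: passed as input to both eval and init.
  type Env = Map[String, Value]

  // Mutable: only init writes to it.
  final class Objekt { val fields = mutable.Map.empty[String, Value] }

  // eval reads the environment and the object, but mutates neither.
  def eval(expr: Expr, env: Env, obj: Objekt): Value = expr match {
    case Var(name)       => env.getOrElse(name, Hot) // assumption: untracked means hot
    case ThisField(name) => obj.fields.getOrElse(name, Cold) // assumption: unassigned reads as Cold
  }

  // init is the only function that assigns fields of the current object.
  def init(field: String, rhs: Expr, env: Env, obj: Objekt): Unit =
    obj.fields(field) = eval(rhs, env, obj)
}
```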

Then perhaps such an Environment can replace thisV. Although the type of thisV is a Value, it can really only be ThisRef(C) or Warm(C, outer: Value), indicating that outer is hot or outer has some other value, respectively. If we generalize to allow variables/parameters other than outer to be non-hot, then the temperature of outer can be recorded together with the temperature of other variables/parameters, rather than on thisV.

I find Warm a bit confusing because it is not used in general for warm values (all fields assigned with not necessarily hot values) but only for the special case of a ThisRef whose outer is non-hot.

After this, one possible further precision improvement could be to extend the Value lattice with something between Warm and Cold, an object with some fields assigned and others not, much like Objekt, but as a Value, and thus with immutable fields. It could be an immutable view of Objekt. This would be useful if we want to allow leaking a this with not all fields initialized into methods/constructors and we want to allow accessing specific fields of that this in those methods/constructors.

liufengyun · 2021-05-30T19:42:55Z

Thank you @olhotak for the many good points, I agree with many of them.

At a high level, many styles of analysis are equivalent in essence,
but they differ considerably in engineering qualities. E.g., the
master (based on abstract definitional interpreters, ADI) and the
type-and-effect system are both receiver-polyvariant and have the same
time complexity, but the former has better engineering qualities:

the length restriction is gone
simplicity and maintainability
extensibility

In addition to local reasoning, another aspect that is heavily
employed in both the previous type-and-effect analysis and the current
ADI is heap monotonicity. Thanks to it, in the ADI version, we can
cache without making the heap part of the key and get away from
fixed-point computation. That is also the reason why we don't need to
thread through the heap, but instead use it as a global cache. It's
just one line of difference in code, but it's of far-reaching
importance.

Note that it's not enough that eval does not mutate the heap. Heap
monotonicity in addition ensures that if we have a more initialized
heap, the result of eval for the same expression and environment
should still produce a sound abstraction for the concrete value of the
expression computed from the old heap. That enables caching without
the heap as part of the key and without fixed-point computation.
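A toy model of that idea, with every name hypothetical and no claim that the checker caches exactly this way: the heap is a global set of assigned fields that only grows, and the cache is keyed by the expression alone. A result computed under an earlier, less initialized heap may be less precise later, but it stays a conservative answer.

```scala
import scala.collection.mutable

object MonotoneCacheSketch {
  sealed trait Value
  case object Hot extends Value
  case object Cold extends Value // conservative: "may be uninitialized"

  // Global abstract heap: fields known to be assigned. Entries are only ever added.
  private val assigned = mutable.Set.empty[String]

  // Cache keyed by the expression only; the heap is deliberately not part of the key.
  private val cache = mutable.Map.empty[String, Value]

  // Counts how often a field access is actually evaluated rather than served from cache.
  var evaluations = 0

  def assign(field: String): Unit = assigned += field

  def evalField(field: String): Value =
    cache.getOrElseUpdate(field, {
      evaluations += 1
      if (assigned.contains(field)) Hot else Cold
    })
}
```

If the heap could shrink, a cached `Hot` could become wrong, and the cache key would have to include the heap (or the analysis would need a fixed-point iteration).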

I conjecture that many analyses fall into the category of heap-monotonic
analyses and can thus be implemented in the same style as initialization
analysis. For example, in type analysis, mutation does not play a role
in type inference --- the type of the initializer of a field determines
its type.

But then we want to leak non-hot values to methods (method parameters), constructors, and to local variables of methods. For this, we need to keep track of which method parameters, constructor parameters (fields), and local variables have non-hot values. So we need an Environment. An Environment should be a map from parameters/variables to Values.

Yes, that's correct. Introducing an environment is a natural step. We
need to introduce it carefully to ensure soundness, termination and
good performance. For example, Fun may now capture the environment, so
finitizing measures must be taken. The same holds for local classes.
BTW, local variables and parameters can be handled differently; we can
discuss the details when we reach that point.

As a first step, to support passing non-hot values to primary
constructors, we can actually avoid introducing environments; that
will be a straightforward extension based on this PR.

To support non-hot values for secondary constructors, we may still
keep the abstract domain for functions and (local) classes the
same. But care must be taken for closures and local classes in
secondary constructors, which should be rare in practice but still
possible.

Also, outer, outer.outer, outer.outer.outer.... are like variables, so also can be keys of the Environment.

Conceptually, it makes sense. However, it ignores an important
semantic property of inner classes if the outer is always a stable
path: i.e. the outers of super-classes are all determined by the
immediate outer of the concrete class of an object. That property
enables a more concise representation of outers. That is also the
reason why the abstract address Warm only contains one outer as part
of the key to the abstract heap (summary).

Then perhaps such an Environment can replace thisV.

I agree, conceptually it's possible to treat this as parameter 0
of methods. However, there are no engineering benefits in doing
so. Even coarse-grained analyses like type systems handle
this differently from parameters. In our setting, the analysis is
receiver-sensitive (or receiver-polyvariant), thus it is more
justified to keep it as a distinguished environment value. Otherwise,
we just need more zig-zags in the code.

If we generalize to allow variables/parameters other than outer to be non-hot, then the temperature of outer can be recorded together with the temperature of other variables/parameters, rather than on thisV.

This does not sound like a good idea from the engineering point of
view. There are many equivalent ways to view things abstractly, but
some representations are more natural and simpler than others. Taken to
the extreme, we can also eliminate the concept of object in OOPL
semantics with environments and closures, but there is no real
engineering benefit in doing so.

The fact that we associate outers with objects in the concrete semantics
shows that it's a natural and convenient representation.

On the other hand, a design consideration of ADI is to be as close as
possible to the concrete semantics in the design. Doing things
otherwise will be inconsistent and reaps no engineering benefit.

I find Warm a bit confusing because it is not used in general for warm values (all fields assigned with not necessarily hot values) but only for the special case of a ThisRef whose outer is non-hot.

This is because currently we only support warm objects of inner
classes. We can easily support leaking non-hot values to constructors,
thus have Warm represent other values as well. Warm is just a key
to the abstraction summary of similarly initialized concrete objects.

After this, one possible further precision improvement could be to extend the Value lattice with something between Warm and Cold, an object with some fields assigned and others not, much like Objekt, but as a Value, and thus with immutable fields.

It's tempting to allow such use cases. However, we need to be cautious
here, because the fact that the analysis can do something does not imply
that it should. There are two concerns:

Whether allowing subtle interference of two objects is a good practice
It has a big impact on worst-case performance (care also needs to be taken for termination)

Coming back to the PR itself, this PR is intended to achieve the following
engineering benefits:

Enable caching for outers of warm objects and simplify the logic
Uniformly handle Warm and ThisRef in semantics
Make it easier to handle warm objects where the class parameters can be non-hot
Use opaque types for some abstract domains

I believe the current PR is the simplest from the perspective of
engineering and it paves the way for future extension. It benefits a
lot from our discussions --- see how far we have come from the
type-and-effect system. That said, I'm always open to better
proposals that enjoy the same engineering qualities.

olhotak · 2021-06-01T16:01:41Z

compiler/src/dotty/tools/dotc/transform/init/Semantic.scala

-  case class Objekt(klass: ClassSymbol, val fields: mutable.Map[Symbol, Value]) {
-    val promotedValues = mutable.Set.empty[Value]
-  }
+  case class Objekt(klass: ClassSymbol, fields: mutable.Map[Symbol, Value], outers: mutable.Map[ClassSymbol, Value])

  /** Abstract heap stores abstract objects
   *
   *  As in the OOPSLA paper, the abstract heap is monotonistic.
   *
   *  This is only one object we need to care about, hence it's just `Objekt`.


Remove this line of the comment.

Good catch, now it's removed

olhotak · 2021-06-01T16:08:30Z

compiler/src/dotty/tools/dotc/transform/init/Semantic.scala

          if target.isOneOf(Flags.Method | Flags.Lazy) then
            if target.hasSource then
+              val cls = target.owner.enclosingClass.asClass


This changed from target.owner.asClass to target.owner.enclosingClass.asClass. Is that intended?

Yes, it's intentional -- here we reuse the code for calling local methods.

The related code change can be found below for case id: Ident =>.

olhotak · 2021-06-01T16:14:13Z

compiler/src/dotty/tools/dotc/transform/init/Semantic.scala

          val value = Warm(klass, outer)
+          if !heap.contains(value) then
+            val obj = Objekt(klass, fields = mutable.Map.empty, outers = mutable.Map(klass -> outer))


Will this Objekt continue to have empty fields and outers forever since we never call init on the new value?

The call happens at line 367 -- it's a constructor call on the object.
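For context, the pattern under review can be sketched in isolation as follows. This is a simplified model rather than the code in Semantic.scala: class symbols are replaced by strings, the `Value` domain is reduced to two cases, and `instantiate` is a hypothetical name. As noted in the reply above, the constructor call that fills in the fields happens after the entry has been created.

```scala
import scala.collection.mutable

object WarmHeapSketch {
  sealed trait Value
  case object Hot extends Value
  // Warm(klass, outer) is a value and, at the same time, the key of its heap entry.
  final case class Warm(klass: String, outer: Value) extends Value

  final case class Objekt(
    klass: String,
    fields: mutable.Map[String, Value],
    outers: mutable.Map[String, Value]
  )

  val heap = mutable.Map.empty[Value, Objekt]

  def instantiate(klass: String, outer: Value): Warm = {
    val value = Warm(klass, outer)
    if (!heap.contains(value)) {
      // Fields start empty; analyzing the constructor call on `value` fills them in.
      heap(value) = Objekt(klass, mutable.Map.empty, mutable.Map(klass -> outer))
    }
    value
  }
}
```

Instantiating the same class with the same outer twice reuses the existing entry, which is what lets the heap double as a cache for warm objects.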

liufengyun added 7 commits May 22, 2021 09:15

WIP - reintroduce addresses

a568573

Use opaque type for type aliases

7cf68c0

Create warm object in heap if absent

13f2634

Fix class for evaluating local methods

4edadb2

Fix key for this resolution

ad0da27

Make sure widening does not refer non-existent abstract objects

c17bbfe

Heap may serve as global cache for warm objects

0c45807

This is safe because heap is monotonistic and fields abstractions are immutable

Handle outers of nested Java interfaces

ca724f7

liufengyun marked this pull request as ready for review May 26, 2021 10:06

olhotak reviewed May 31, 2021

olhotak approved these changes Jun 1, 2021

Address review comments

9527eac

liufengyun merged commit a00fb3f into scala:master Jun 1, 2021

liufengyun deleted the refactor-heap branch June 1, 2021 18:35

Kordyjan added this to the 3.0.2 milestone Aug 2, 2023
