Refactor heap and cache outers #12608

Merged 9 commits on Jun 1, 2021

Conversation

liufengyun
Contributor

Some refactoring:

  • Refactor heap
  • Cache outers
  • Use opaque types for domain definitions

@liufengyun
Contributor Author

@olhotak @EnzeXing This PR will make the experiments easier.

Allowing non-hot values as arguments to primary constructors requires remembering the values of the parameters in the heap. Thus reintroducing the heap and addresses is inevitable. It also helps with performance, as it enables more caching.

@liufengyun liufengyun marked this pull request as ready for review May 26, 2021 10:06
@olhotak
Contributor

olhotak commented May 26, 2021

In extending the abstract heap to reason about multiple objects, a key concern is the design of addresses. Since in this case we want "must" information (sets of fields that must have been assigned), each abstract address needs to represent at most one concrete object. ThisRef(C) is a good address in that it represents the unique object currently pointed to by this. All-objects-of-class-B is a bad address in that there may be multiple concrete objects of class B at a time.

Since we want the heap to reason about objects whose constructors we are analyzing, perhaps the design of addresses could be based on the stack of constructors that we analyze. For example, for the program

class A {
  val b = new B(A.this)
}

the call stack in our analysis might look like this:

1: constructor of B called from
0: constructor of A

Then our addresses could be something like ThisRef(0) and ThisRef(1), referring to the values of this in the two frames of the call stack. (Only constructor frames would give rise to addresses. Frames for methods would not because a method call does not create a new object.)
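
For illustration only, the frame-indexed addressing idea above might be sketched as follows. This is not code from the PR: `FrameAddr`, `AbstractObj`, `ConstructorStack`, and the field name `owner` are hypothetical, and class and field names are plain strings for brevity.

```scala
import scala.collection.mutable

// Hypothetical: an address names the `this` of one constructor frame,
// so it stands for at most one concrete object.
final case class FrameAddr(depth: Int)

// "Must" information: the fields that have definitely been assigned.
final class AbstractObj(val className: String):
  val assigned: mutable.Set[String] = mutable.Set.empty

final class ConstructorStack:
  private val frames = mutable.ArrayBuffer.empty[AbstractObj]

  /** Entering a constructor pushes a frame and yields its address. */
  def enterConstructor(className: String): FrameAddr =
    frames += AbstractObj(className)
    FrameAddr(frames.length - 1)

  /** Leaving a constructor pops the innermost frame. */
  def exitConstructor(): Unit =
    frames.remove(frames.length - 1)
    ()

  def lookup(addr: FrameAddr): AbstractObj = frames(addr.depth)

/** Replays the A/B example: A's constructor calls B's constructor. */
def demo(): Set[String] =
  val stack = ConstructorStack()
  val a = stack.enterConstructor("A")   // frame 0: constructor of A
  val b = stack.enterConstructor("B")   // frame 1: constructor of B, called from A
  stack.lookup(b).assigned += "owner"   // hypothetical field assigned inside B
  stack.exitConstructor()               // B's constructor returns
  stack.lookup(a).assigned += "b"       // now `val b` of A is assigned
  stack.lookup(a).assigned.toSet
```

Method calls would not push a frame in this sketch, matching the remark that only constructor frames give rise to addresses.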

@liufengyun
Contributor Author

@olhotak Thanks for the feedback. This PR is essentially the same as master: the heap enables more caching. It would be good to motivate the proposal above with concrete examples related to concerns about expressiveness, soundness or termination. Let's discuss more tomorrow.

@olhotak
Contributor

olhotak commented May 28, 2021

I thought some more about how to design the Heap and Addr.

We have two analyses, init and eval. Only init mutates the fields of the current Objekt. eval reads them when it encounters ThisRef, but it doesn't change them. Also, init mutates the fields of only one single Objekt.

This suggests keeping a single Objekt with mutable fields.

But then we want to leak non-hot values to methods (method parameters), constructors, and to local variables of methods. For this, we need to keep track of which method parameters, constructor parameters (fields), and local variables have non-hot values. So we need an Environment. An Environment should be a map from parameters/variables to Values. Also, outer, outer.outer, outer.outer.outer, ... are like variables, so they can also be keys of the Environment. Perhaps this Environment is what the generalized Heap was intended to be, and Addr were intended to be its keys. But parameters/variables/outer are not abstractions of concrete addresses, so this is confusing.

Unlike Objekt/Heap, such an Environment is immutable in that it is passed as input to eval and to init, but does not have mutable fields. The Objekt/Heap is passed to init for reading and writing. eval needs to read it but not mutate it.
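
For illustration only, the split between an immutable Environment and a mutable Objekt might look like the sketch below. The two-point `Value` enum, the `Expr` cases, and the signatures of `eval` and `init` are simplified assumptions and not the PR's definitions. Treating unrecorded variables as hot and unassigned fields as cold is also an assumption made here.

```scala
import scala.collection.mutable

// Hypothetical, simplified abstract values.
enum Value:
  case Hot, Cold

// Mutable: the single object under initialization.
final class Objekt(val fields: mutable.Map[String, Value] = mutable.Map.empty)

// Immutable: maps parameters and local variables to abstract values.
type Env = Map[String, Value]

enum Expr:
  case Var(name: String)        // parameter or local variable
  case ThisField(name: String)  // this.name

/** eval reads the environment and the object, but mutates neither. */
def eval(e: Expr, env: Env, obj: Objekt): Value = e match
  // Assumption: a variable that is not recorded in the environment is hot.
  case Expr.Var(n)       => env.getOrElse(n, Value.Hot)
  // Assumption: a field that is not yet assigned is conservatively cold.
  case Expr.ThisField(n) => obj.fields.getOrElse(n, Value.Cold)

/** init is the only place that writes to the object's fields. */
def init(field: String, rhs: Expr, env: Env, obj: Objekt): Unit =
  obj.fields(field) = eval(rhs, env, obj)
```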

Then perhaps such an Environment can replace thisV. Although the type of thisV is a Value, it can really only be ThisRef(C) or Warm(C, outer: Value), indicating that outer is hot or outer has some other value, respectively. If we generalize to allow variables/parameters other than outer to be non-hot, then the temperature of outer can be recorded together with the temperature of other variables/parameters, rather than on thisV.

I find Warm a bit confusing because it is not used in general for warm values (all fields assigned with not necessarily hot values) but only for the special case of a ThisRef whose outer is non-hot.

After this, one possible further precision improvement could be to extend the Value lattice with something between Warm and Cold, an object with some fields assigned and others not, much like Objekt, but as a Value, and thus with immutable fields. It could be an immutable view of Objekt. This would be useful if we want to allow leaking a this with not all fields initialized into methods/constructors and we want to allow accessing specific fields of that this in those methods/constructors.

@liufengyun
Contributor Author

Thank you @olhotak for the many good points, I agree with many of them.

At a high level, many styles of analysis are equivalent in essence,
but they differ greatly in engineering qualities. E.g.,
master (based on abstract definitional interpreters, ADI) and the
type-and-effect system are both receiver-polyvariant and have the same
time complexity, but the former has better engineering qualities:

  • the length restriction is gone
  • simplicity and maintainability
  • extensibility

In addition to local reasoning, another property that is heavily
employed in both the previous type-and-effect analysis and the current
ADI is heap monotonicity. Thanks to it, in the ADI version, we can
cache without making the heap part of the key and avoid
fixed-point computation. That is also the reason why we don't need to
thread the heap through, but instead use it as a global cache. It's
just one line of difference in code, but it's of far-reaching
importance.

Note that it's not enough that eval does not mutate the heap. Heap
monotonicity additionally ensures that, given a more initialized
heap, the result of eval for the same expression and environment
is still a sound abstraction of the concrete value of the
expression computed from the old heap. That enables caching without
the heap as part of the key and without fixed-point computation.
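
For illustration only, the mechanism of caching with a key that omits the heap might be sketched as follows. `Analyzer`, the two-point `Value` enum, and the one-field `Expr` are hypothetical simplifications. The sketch shows only how the cache is keyed; the soundness argument is the heap-monotonicity property described above.

```scala
import scala.collection.mutable

// Hypothetical, simplified domains; not the PR's actual definitions.
enum Value:
  case Hot, Cold

// An expression that reads one field of `this`; `id` distinguishes occurrences.
final case class Expr(id: Int, fieldRead: String)

final class Analyzer:
  // Global heap that only grows: fields are added, never removed.
  val heap: mutable.Map[String, Value] = mutable.Map.empty
  // The cache key is (expression, receiver); the heap is deliberately not part of it.
  private val cache = mutable.Map.empty[(Expr, Value), Value]
  // Counts how often an expression is actually evaluated (cache misses).
  var evalCount = 0

  def eval(e: Expr, thisV: Value): Value =
    cache.getOrElseUpdate((e, thisV), {
      evalCount += 1
      // Assumption of this sketch: an unassigned field reads as Cold.
      heap.getOrElse(e.fieldRead, Value.Cold)
    })
```

In this toy model, a result cached before a field was assigned stays `Cold` after the heap grows, which is less precise but still conservative, so no entry has to be invalidated or recomputed.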

I conjecture that many analyses fall into the category of heap-monotonic
analyses, and thus can be implemented in the same style as the initialization
analysis. For example, in type analysis, mutation does not play a role
in type inference: the type of the initializer of a field determines
its type.

> But then we want to leak non-hot values to methods (method parameters), constructors, and to local variables of methods. For this, we need to keep track of which method parameters, constructor parameters (fields), and local variables have non-hot values. So we need an Environment. An Environment should be a map from parameters/variables to Values.

Yes, that's correct. Introducing an environment is a natural step. We
need to be careful with the introduction to ensure soundness,
termination and good performance. For example, Fun may now capture
the environment, so finitizing measures must be taken. The same holds for
local classes. BTW, local variables and parameters can be handled
differently; we can discuss the details when we reach that point.

As a first step, to support passing non-hot values to primary constructors,
we can actually avoid introducing environments; this will be a
straightforward extension based on this PR.

To support non-hot values for secondary constructors, we may still
keep the abstract domain for functions and (local) classes the
same. But care must be taken for closures and local classes in
secondary constructors, which should be rare in practice but still
possible.

> Also, outer, outer.outer, outer.outer.outer.... are like variables, so also can be keys of the Environment.

Conceptually, it makes sense. However, it ignores an important
semantic property of inner classes when the outer is always a stable
path: the outers of all super-classes are determined by the
immediate outer of the concrete class of an object. That property
enables a more concise representation of outers. That is also the
reason why the abstract address Warm only contains one outer as part
of the key to the abstract heap (summary).

> Then perhaps such an Environment can replace thisV.

I agree, conceptually it's possible to treat this as parameter 0
of methods. However, there are no engineering benefits in doing
so. Even coarse-grained analyses like type systems handle
this differently from parameters. In our setting, the analysis is
receiver-sensitive (or receiver-polyvariant), thus it is more
justified to keep it as a separate environment value. Otherwise,
we just need more zig-zags in the code.

> If we generalize to allow variables/parameters other than outer to be non-hot, then the temperature of outer can be recorded together with the temperature of other variables/parameters, rather than on thisV.

This does not sound like a good idea from an engineering point of
view. There are many equivalent ways to view things abstractly, but
some representations are more natural and simpler than others. In the
extreme, we could also eliminate the concept of objects from OOPL
semantics using environments and closures, but there is no real
engineering benefit in doing so.

The fact that we associate outers with objects in the concrete semantics
shows that it's a natural and convenient representation.

On the other hand, a design consideration of ADI is to stay as close as
possible to the concrete semantics. Doing things
otherwise would be inconsistent and reap no engineering benefit.

> I find Warm a bit confusing because it is not used in general for warm values (all fields assigned with not necessarily hot values) but only for the special case of a ThisRef whose outer is non-hot.

This is because currently we only support warm objects of inner
classes. We can easily support leaking non-hot values to constructors,
thus have Warm represent other values as well. Warm is just a key
to the abstraction summary of similarly initialized concrete objects.

> After this, one possible further precision improvement could be to extend the Value lattice with something between Warm and Cold, an object with some fields assigned and others not, much like Objekt, but as a Value, and thus with immutable fields.

It's tempting to allow such use cases. However, we need to be cautious
here, because what the analysis can do does not imply what it should
do. There are two concerns:

  • Whether allowing subtle interference of two objects is a good practice
  • It has a big impact on worst-case performance (care also needs to be taken for termination)

Coming back to the PR itself, this PR is intended to achieve the following
engineering benefits:

  • Enable caching for outers of warm objects and simplify the logic
  • Uniformly handle Warm and ThisRef in semantics
  • Make it easier to handle warm objects where the class parameters can be non-hot
  • Use opaque types for some abstract domains
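
For illustration only, the last point (opaque types for abstract domains) could be sketched like this in Scala 3. The representation chosen here, an immutable map from a string address to its set of assigned fields, and the method names `isAllocated`, `assignedFields`, and `assign` are hypothetical and are not the PR's definitions.

```scala
object Heap:
  // Outside this object, Heap is abstract: clients cannot see that it is a Map.
  opaque type Heap = Map[String, Set[String]]   // address -> assigned fields

  val empty: Heap = Map.empty

  extension (heap: Heap)
    def isAllocated(addr: String): Boolean = heap.contains(addr)

    def assignedFields(addr: String): Set[String] =
      heap.getOrElse(addr, Set.empty)

    /** Returns a heap in which `field` of `addr` is recorded as assigned. */
    def assign(addr: String, field: String): Heap =
      heap.updated(addr, heap.getOrElse(addr, Set.empty) + field)

import Heap.*

def demo(): Set[String] =
  val h = Heap.empty.assign("this", "x").assign("this", "y")
  h.assignedFields("this")
```

The benefit of this style is that the domain's operations are the only way to inspect or extend it, so the underlying representation can change without touching the callers.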

I believe the current PR is the simplest from the perspective of
engineering and it paves the way for future extension. It has benefited a
lot from our discussions: see how far we have come from the
type-and-effect system. That said, I'm always open to better
proposals that enjoy the same engineering qualities.

case class Objekt(klass: ClassSymbol, val fields: mutable.Map[Symbol, Value]) {
val promotedValues = mutable.Set.empty[Value]
}
case class Objekt(klass: ClassSymbol, fields: mutable.Map[Symbol, Value], outers: mutable.Map[ClassSymbol, Value])

/** Abstract heap stores abstract objects
*
* As in the OOPSLA paper, the abstract heap is monotonistic.
*
* This is only one object we need to care about, hence it's just `Objekt`.
Contributor

Remove this line of the comment.

Contributor Author

Good catch, now it's removed

if target.isOneOf(Flags.Method | Flags.Lazy) then
  if target.hasSource then
    val cls = target.owner.enclosingClass.asClass
Contributor

This changed from target.owner.asClass to target.owner.enclosingClass.asClass. Is that intended?

Contributor Author

Yes, it's intentional -- here we reuse the code for calling local methods.

The related code change can be found below for case id: Ident =>.

val value = Warm(klass, outer)
if !heap.contains(value) then
  val obj = Objekt(klass, fields = mutable.Map.empty, outers = mutable.Map(klass -> outer))
Contributor

Will this Objekt continue to have empty fields and outers forever since we never call init on the new value?

Contributor Author

The call happens at line 367 -- it's a constructor call on the object.
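
For illustration only, the pattern discussed in this thread (allocate an empty summary the first time a warm value is seen, then let the constructor call fill it in) might be sketched as below. The domains are simplified assumptions: strings stand in for class and field symbols, `instantiate` is a hypothetical helper, and `fieldInits` stands in for the effect of the real constructor call.

```scala
import scala.collection.mutable

// Hypothetical, simplified domains; not the PR's actual definitions.
sealed trait Value
case object Hot extends Value
final case class Warm(klass: String, outer: Value) extends Value

final class Objekt(
  val klass: String,
  val fields: mutable.Map[String, Value],
  val outers: mutable.Map[String, Value]
)

// The abstract heap: one summary per warm value.
val heap = mutable.Map.empty[Value, Objekt]

/** Allocates the summary for a warm object once, then runs its "constructor". */
def instantiate(klass: String, outer: Value, fieldInits: Map[String, Value]): Warm =
  val value = Warm(klass, outer)
  if !heap.contains(value) then
    heap(value) = Objekt(klass, mutable.Map.empty, mutable.Map(klass -> outer))
  // Stand-in for the constructor call: this is what populates the fields.
  heap(value).fields ++= fieldInits
  value
```

In this sketch the summary starts with empty fields, but it does not stay empty, because the step after allocation writes into the same heap entry.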

@liufengyun liufengyun merged commit a00fb3f into scala:master Jun 1, 2021
@liufengyun liufengyun deleted the refactor-heap branch June 1, 2021 18:35
@Kordyjan Kordyjan added this to the 3.0.2 milestone Aug 2, 2023