Link time sets and maps #371

Open
eernstg opened this issue May 23, 2019 · 6 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@eernstg
Member

eernstg commented May 23, 2019

In response to #369, this issue proposes link time sets and maps as a feature that enables a small amount of pre-main computation beyond what's already possible with const. The point is that this allows a "well-known" declaration D (in some library that is widely used) to offer a registration service, in the sense that any library where D is imported can contribute to the value of D.

In #369, @davidmorgan mentioned some important situations where this is useful:

Some frameworks need a way to run initialization code for all
leaf code using the framework. For example, dependency injection
frameworks want to build a map of injectable types, and serialization
frameworks want to know about all serializable types.

The feature proposed here is built directly on top of the support for constants in Dart, and it will not enable execution of user-written code. However it is sufficiently powerful to satisfy the need for modular expression (for instance, the framework can build a map of all injectable types, even though they are declared in libraries not imported by the framework), and it allows the main program to use the data whenever it wishes to do so (there is no inherent startup cost, because any work associated with the registered entities is performed by explicit code called from main, which can use any algorithm we'd want, e.g., to support laziness).

In order to avoid problems with excessively large programs (in terms of code size or heap size), it is crucial that this mechanism allows for conditional contributions: For instance, the transitive closure of some useful imports may include a very large number of serializable types, but any given program may only actually use a few of them.

This feature relies on tree-shaking to enable inclusion of approximately the smallest possible set of entities. It should be noted that tree-shaking is an implementation dependent concept, but it is required to have a soundness property: No tree-shaking algorithm is allowed to remove any part of the program which is actually used at run-time.

Link Time Sets and Maps, by Example

This feature adds a new kind of top-level declaration that introduces a link time set or map:

Set<T> linkTimeSet of const;

Map<K, V> linkTimeMap of const;

linkTimeSet and linkTimeMap are not constant (so they cannot be used in constant expressions), but they can contain elements (linkTimeSet) and key/value bindings (linkTimeMap) which are specified elsewhere, in so-called contributing declarations (described below). A link time set or map is immutable.

A contributing declaration for a link time set or map is a declaration that contributes a single element to the set, respectively a single binding for the map:

const linkTimeSet.add(c);
const linkTimeMap[c1] = c2;

It is an error unless c, c1 and c2 are constant expressions. These declarations can only have the form shown above, so any attempt to write general code to manipulate the set/map is an error—this feature only supports building the set/map one element/binding at a time.

The usual static errors apply, as if the contributing declaration had been a statement in the same binding environment. For instance, it is a compile-time error unless c is assignable to T.

A conditional contributing declaration includes a condition which lists some entities (such as classes or functions):

const linkTimeSet.add(c) if E1, ..., Ek;
const linkTimeMap[c1] = c2 if E1, ..., Ek;

Again, conditional contributing declarations can only have the forms shown above (we can only use add and []=), the expressions c, c1, and c2 must be potentially constant, and the usual static errors apply just like they do in the unconditional case. E1 .. Ek are constant expressions denoting types or functions.

The value of linkTimeSet is an immutable set of type Set<T> containing the elements which are added using contributing declarations. Several contributing declarations may add the same element; this is not an error. No guarantees are given for the iteration order.

The value of linkTimeMap is an immutable map of type Map<K, V> containing the bindings which are added by contributing declarations. It is a link time error if two contributed bindings have the same key but different values. It is not an error if two contributed bindings have the same keys and the same values. Again, no guarantees are given for the iteration order.

(A link time error is a new concept in the specification of Dart. It just needs to occur before the execution of main starts, but specific tool chains may offer a specific link operation where the error can occur. In any case, the error will be detected if the resulting program is executed at all, not just if the program gets some specific input.)
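
The merge rules above can be modelled in ordinary Dart. The following is only a sketch of the described semantics, not the proposed implementation: the helper names buildLinkTimeSet and buildLinkTimeMap are hypothetical, contributions are passed in as plain iterables, and a StateError stands in for the link time error.

```dart
/// Models a link time set: duplicate contributions collapse into one element.
Set<T> buildLinkTimeSet<T>(Iterable<T> contributions) =>
    Set<T>.unmodifiable(contributions);

/// Models a link time map. The same key may be contributed more than once
/// only if every contribution binds it to the same value.
Map<K, V> buildLinkTimeMap<K, V>(Iterable<MapEntry<K, V>> contributions) {
  final result = <K, V>{};
  for (final entry in contributions) {
    if (result.containsKey(entry.key) && result[entry.key] != entry.value) {
      // Stand-in for the link time error described in the proposal.
      throw StateError('Conflicting bindings for key ${entry.key}');
    }
    result[entry.key] = entry.value;
  }
  return Map<K, V>.unmodifiable(result);
}

void main() {
  // Adding the same element twice is not an error.
  print(buildLinkTimeSet([1, 2, 2]));

  // Same key and same value: accepted.
  print(buildLinkTimeMap([MapEntry('C', 1), MapEntry('C', 1)]));

  // Same key but different values: rejected.
  try {
    buildLinkTimeMap([MapEntry('C', 1), MapEntry('C', 2)]);
  } on StateError catch (e) {
    print(e.message);
  }
}
```

The sketch compares values with ==, which is an approximation: the proposal only says "the same values", and for constants that would presumably mean identical canonicalized objects.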

The process whereby link time sets and maps receive elements respectively bindings runs in phases. In the first phase, all conditional contributing declarations are ignored. At that point a tree-shaking procedure may run, which marks a subset of the entities in the program as unreachable. Now each conditional contributing declaration is visited, and if all of E1 .. Ek are marked as unreachable then it is still ignored, otherwise the requested contribution takes place, which may cause some entities that were previously marked as unreachable to be included in the program. This step is repeated until it has no effect.

Note that there are no constraints on the relationship between the condition and the object/binding which is contributed. This means that it is possible, for instance, to specify that a deserialization helper function deserializeC can be found under the key "C" in the map deserializers if the program uses the class C as follows:

// In some core library of the serialization package.
Map<String, Object Function(String)> deserializers of const;

// In some third party library.
class C {...}
C deserializeC(String s) {...}
const deserializers["C"] = deserializeC if C;

Different applications may need to use different techniques in order to enable the inclusion of deserializeC exactly if that is needed. For instance, it could be the case that the program simply uses C in that case, by creating new instances of C explicitly, or by using C as a type annotation. It could also be necessary to force usage of certain classes like C, say, because some supertypes of C are used explicitly but C itself is one of a number of implementation classes, and we only know which implementation classes are needed because that's a business level "contract" with some cloud services that this application is using. In that case we may need to mention the required classes in main:

main() {
  used(C);
}

A little bit of careful coding is needed whenever there is a need to interact with implementation dependent mechanisms like tree-shaking, but the point is that used should be written in such a way that no optimizations can eliminate the dependency on C.

We would need to standardize how to do such things, but surely it will be possible to achieve the desired effect: That C is included after all optimizations are complete.
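
As a rough illustration only, one way to write used in today's Dart is to store the argument somewhere observable. This is an assumption, not something the proposal specifies: the name _retained is hypothetical, and nothing here guarantees that a particular compiler will keep C. That is exactly the part that would need to be standardized.

```dart
/// Hypothetical storage that makes the arguments of [used] observable.
final List<Object?> _retained = [];

/// Records [entity] so that the reference to it has a visible effect.
void used(Object? entity) => _retained.add(entity);

class C {}

void main() {
  used(C); // Mentions the Type object for C.
  print(_retained.length); // Reads the list so the store is not dead.
}
```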

Initial Feature Specification Proposal

Syntax

The grammar is modified as follows:

<topLevelDefinition> ::= // Add two new alternatives.
    ... |
    'const' <linkTimeCollectionContribution> ';' |
    <linkTimeCollectionDeclaration> ';'

<linkTimeCollectionContribution> ::= // New rule.
    (<typeIdentifier> '.')? <identifier> <linkTimeCollectionAction>
    ('if' <expressionList>)?

<linkTimeCollectionAction> ::= // New rule.
    '.' <identifier> '(' <expression> ')' |
    '[' <expression> ']' '=' <expression>

<linkTimeCollectionDeclaration> ::= // New rule.
    <identifier> <typeArguments> <identifier> 'of' 'const'

In order to experiment with the new syntax and see the complete grammar, please consult this CL.

Note that the type arguments are not optional. This is an opinionated choice, based on the assumption that semi-dynamically typed link time collections should be avoided, or at least explicitly typed as Set<dynamic> or Map<dynamic, dynamic>. Similarly there is no way to omit the type entirely.

Static Analysis

We introduce the implementation dependent notion of program inclusion.

This is usually associated with an algorithm known as tree-shaking. In general, program inclusion proceeds by marking a program element E as included if some potential execution starting from main may depend on E, e.g., by executing it if E is an expression or statement, or by referring to it if E is a declaration. At the end of a fixpoint iteration where no more entities are included, all the entities which were not included may be removed from the program before deployment. Any size of program element may be eligible for inclusion, e.g., a single subexpression or an entire class, but this feature only relies on the inclusion of a type or a function as a whole, and it does not matter whether parts of said type or function have been eliminated.

An implementation is required to use a sound algorithm to compute which entities are included in a program, but it is allowed to eliminate any program element which is guaranteed to be unable to observably influence the execution.

Events like 'out of memory' may be influenced by tree-shaking. But they are also implementation dependent, and hence they are not considered observable.

It is a bug if any program element is actually used, but it was eliminated by tree-shaking. On the other hand, it is allowed for an implementation to use the trivial algorithm which does not eliminate any program elements at all. This means that different implementations may offer a different quality of service in this respect, but given that each deployed program was actually produced by a specific tool chain, and tree-shaking is completed before deployment, it is always possible to assess the actual quality of tree-shaking.

Consider a <linkTimeCollectionDeclaration> D of the form id<T1, .. Tk> c of const. It is a compile-time error unless each of T1 .. Tk is a constant type expression. It is a compile-time error unless one of the following holds:

  • id is Set and k is 1.
  • id is Map and k is 2.

The effect of D is to introduce the identifier c into the library scope of the enclosing library, with the declared type id<T1, .. Tk>.

Consider a top level definition derived from 'const' <linkTimeCollectionContribution> ';' D of the form const q.id(e);. It is a compile-time error unless id is add. It is a compile-time error unless e is a constant expression. It is a compile-time error unless q denotes a link-time collection declaration whose type is of the form Set<T>, where Set denotes the built-in set type. It is a compile-time error unless the type of e is a subtype of T.

Consider the case where D is of the form const q.id(e) if E1, ..., Ek;. In this case exactly the same compile-time errors exist for id, q, and for the type of e as for the form with no if. Moreover, it is a compile-time error unless e is a potentially constant expression, and it is a compile-time error unless each Ej for j in 1..k is a constant expression denoting a type or a function.

Consider a top level definition derived from 'const' <linkTimeCollectionContribution> ';' D of the form const q[e1] = e2;. It is a compile-time error unless e1 and e2 are constant expressions. It is a compile-time error unless q denotes a link-time collection declaration whose type is of the form Map<K, V>, where Map denotes the built-in map type. It is a compile-time error unless the type of the value of e1 is a subtype of K and the type of the value of e2 is a subtype of V.

Consider the case where D is of the form const q[e1] = e2 if E1, ..., Ek;. In this case exactly the same compile-time errors exist for q, and for the types of e1 and e2 as for the form with no if. Moreover, it is a compile-time error unless e1 and e2 are potentially constant expressions, and it is a compile-time error unless each Ej for j in 1..k is a constant expression denoting a type or a function.

It is an error that must be raised before the execution of main (possibly at compile time or link time, if that concept is relevant for a given tool) if two link time map contributions bind the same key k to two different values v1 and v2.

We say that a contributing declaration is conditional when it includes the part starting with if. Other contributing declarations are said to be unconditional.

Consider the case where a script S is given (that is, a library which declares a main function and defines which libraries are included in a complete program based on the transitive closure of its imports). Assume that program inclusion has been computed for a version of the program where every conditional contribution declaration is ignored (as if it had been commented out).

The effect of conditional contribution declarations is then specified by a repeated application of the following step, until it has no effect:

Conditional contribution inclusion step: Consider a conditional contribution declaration D whose condition is of the form if E1, ..., Ek. If program inclusion has marked any of E1, ..., Ek as included then the unconditional contributing declaration corresponding to D is added to the program at the same location as D, and D is removed; otherwise D is still ignored. Next, the program inclusion algorithm is repeated (such that all entities which are potentially reachable starting from the newly added unconditional contribution declarations are also included).

When the iteration is complete, any remaining conditional contributing declarations are removed from the program.
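
The iteration can be illustrated with a toy model in plain Dart. Everything in it is an assumption made for the sake of the example: entities are just strings, program inclusion is plain reachability over a reference graph, and the names ConditionalContribution, computeInclusion and activate are hypothetical. The sketch checks all pending contributions against one inclusion result before recomputing, which reaches the same fixpoint because inclusion only grows.

```dart
/// A conditional contribution in this toy model: if any entity in
/// [conditions] is included, [contributed] becomes an additional root.
class ConditionalContribution {
  final Set<String> conditions;
  final String contributed;
  const ConditionalContribution(this.conditions, this.contributed);
}

/// Toy program inclusion: everything reachable from [roots] via [refs].
Set<String> computeInclusion(Set<String> roots, Map<String, Set<String>> refs) {
  final included = <String>{};
  final worklist = [...roots];
  while (worklist.isNotEmpty) {
    final entity = worklist.removeLast();
    if (included.add(entity)) {
      worklist.addAll(refs[entity] ?? const <String>{});
    }
  }
  return included;
}

/// Repeats the inclusion step until it has no effect and returns the
/// contributions that were turned into unconditional ones.
List<ConditionalContribution> activate(
  Set<String> roots,
  Map<String, Set<String>> refs,
  List<ConditionalContribution> pending,
) {
  final activeRoots = {...roots};
  final remaining = [...pending];
  final activated = <ConditionalContribution>[];
  var changed = true;
  while (changed) {
    changed = false;
    final included = computeInclusion(activeRoots, refs);
    for (final c in [...remaining]) {
      if (c.conditions.any(included.contains)) {
        remaining.remove(c);
        activated.add(c);
        activeRoots.add(c.contributed);
        changed = true;
      }
    }
  }
  return activated;
}

void main() {
  // main uses C; deserializeC in turn uses D; nothing uses E.
  final refs = {
    'main': {'C'},
    'deserializeC': {'D'},
  };
  final pending = [
    const ConditionalContribution({'C'}, 'deserializeC'),
    const ConditionalContribution({'D'}, 'deserializeD'),
    const ConditionalContribution({'E'}, 'deserializeE'),
  ];
  final activated = activate({'main'}, refs, pending);
  print(activated.map((c) => c.contributed).toList());
}
```

In this run the contribution guarded by D only becomes active in the second round, after deserializeC has pulled D into the program, and the one guarded by E is never activated.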

Note that the addition of an unconditional contributing declaration to the program may cause an error, because certain expressions must now be constant rather than just potentially constant. Also note that every object contained by a link-time collection is constant, but the name of a link-time collection is not a constant expression. It follows that they are not canonicalized.

Dynamic Semantics

Note that there is no dynamic semantics for conditional contributing declarations, because they have all been transformed into unconditional ones or removed from the program.

Let s be the name of a link-time collection declaration of type Set<T>. Evaluation of s at run time yields an instance of a subtype of Set<T> which is not a subtype of Set<S> for any S unless T <: S. That instance is an immutable set. (Hence, any attempt to modify it at run-time causes a dynamic error.)

The elements contained in s are exactly the objects that are mentioned in contributing declarations for s of the form const s.add(c) or const p.s.add(c), in any library which is transitively imported by the entry point (which is the "main" library of the program). No guarantees are given with respect to the iteration order of s.

Let m be the name of a link-time collection declaration of type Map<K, V>. Evaluation of m at run time yields an instance of a subtype of Map<K, V> which is not a subtype of Map<K1, V1> for any K1, V1, unless K <: K1 and V <: V1. That instance is an immutable map. (Hence, any attempt to modify it at run-time causes a dynamic error.)

The map elements contained in m are exactly the ones that hold a key k and a value v which occur in contributing declarations for m of the form const m[k] = v or const p.m[k] = v, in any library which is transitively imported by the entry point. No guarantees are given with respect to the iteration order of m.
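
The run-time behaviour described here resembles that of Dart's existing unmodifiable collections. The sketch below uses Set.unmodifiable and Map.unmodifiable as stand-ins; whether an implementation of link time collections would reuse these classes is an assumption.

```dart
void main() {
  // Stand-in for a link time collection declared as `Set<int> s of const;`.
  final Set<int> s = Set<int>.unmodifiable({1, 2, 3});

  // The instance is a Set<S> only for supertypes S of the element type.
  print(s is Set<num>); // int is a subtype of num
  print(s is Set<String>); // int is not a subtype of String

  // Any attempt to modify the set fails with a dynamic error.
  try {
    s.add(4);
  } on UnsupportedError {
    print('set modification rejected');
  }

  // Stand-in for `Map<String, int> m of const;`.
  final Map<String, int> m = Map<String, int>.unmodifiable({'a': 1});
  try {
    m['b'] = 2;
  } on UnsupportedError {
    print('map modification rejected');
  }
}
```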

Updates

Jun 7th 2019: Added support for conditions, as a way to provide language level access to tree-shaking.

@amirh

amirh commented Sep 27, 2019

Link time sets/maps may also simplify the way federated Flutter plugins work (https://flutter.dev/go/federated-plugins), as we should be able to gather the list of active platform implementations with minimal and cleaner code generation.

@eernstg
Member Author

eernstg commented Oct 2, 2019

@amirh, I looked at https://flutter.dev/go/federated-plugins, and there are some elements that could be expressed in a more convenient and flexible manner using link-time collections.

Interestingly, it doesn't even need to use access to tree-shaking information, so in that sense it's purely a matter of using the modularity properties of link-time collections.

Some elements from federated-plugins

One of the elements of the proposal is that code like the following will be generated based on a given application:

import 'package:gtk_webview_flutter/gtk_webview_flutter.dart';
import 'package:gtk_path_provider/gtk_path_provider.dart';

class GeneratedPluginRegistrant {
  static void registerPlugins(PluginRegistrar registrar) {
    GtkWebViewFlutter.registerPlugin(registrar);
    GtkPathProvider.registerPlugin(registrar);
  }
}

The static method registerPlugins calls methods like this one (which would be declared in platform specific plugins):

class GtkWebViewFlutter implements PluginRegistrar {
  @override
  static void registerPlugin(PluginRegistrar registrar) {
    WebView.platform = GtkWebViewPlatform();
  }
}

In other words, this setup makes it possible to call a number of registerPlugin methods, and each of them will initialize a generally accessible static variable (SomePluginClass.platform), such that anyone who has access to WebView can access the platform specific object for that plugin, and similarly for all the other plugins. It's likely that this platform specific object shouldn't be used directly, but that's just a matter of using standard OO encapsulation and abstraction; the important part is that we have the platform specific object in the first place.

Expressing a similar setup with link-time collections

If we use a link-time collection to create a similar setup there is no need to set up methods to perform this registration work: We just make each platform specific plugin class register itself.

First, we have the 'webview_flutter.dart' library where the platform specific object is made available:

// FILE 'webview_flutter/lib/webview_flutter.dart'.

Map<Null, WebViewPlatform> webviewPlatform of const;

class WebView extends StatefulWidget {
  ...
  static WebViewPlatform _platform = webviewPlatform[null]!;
}

We get the platform specific object from webviewPlatform which is a link-time map.

We use a map from Null because the type Null is the singleton domain, and in this case we wish to specify a unique object. In other words, webviewPlatform is like a set whose size is guaranteed to be zero or one. We could use other types (and drop the built-in zero-or-one guarantee), but the point is that we want to get an error at link time if this unique object is ambiguous (that is: more than one entity wants to define it).

Next, we need to populate webviewPlatform:

// FILE: The one denoted 'gtk_webview_flutter' in the dependency graph.
import 'package:webview_flutter/webview_flutter.dart';

const webviewPlatform[null] = const GtkWebViewPlatform();

// Note that this class could be private, if it does not contain platform specific extras.
class GtkWebViewPlatform implements WebViewPlatform {
  ...
}

If GtkWebViewPlatform cannot be const then we define a factory function and put that one into webviewPlatform, and then we call it in the definition of _platform:

  static WebViewPlatform _platform = webviewPlatform[null]!();

Every library that provides a plugin implementation for the given plugin would have such a contributing declaration that puts the platform object into webviewPlatform.

Now we need to ensure that exactly one such library is part of the application: If there are two of them then they will both try to define the unique entry in webviewPlatform, so we get a link time error.

We could have an 'endorsed' approach where someone, possibly the owner of the frontend, takes responsibility for defining which platform implementation is suitable for each platform:

// FILE 'bind_webview_flutter.dart'.

// Assume that every platform will set exactly one of `dart.platform.*`.
import 'empty.dart'
    if (dart.platform.gtk) 'gtk_webview_flutter.dart'
    if (dart.platform.macos) 'macos_webview_flutter.dart'
    if (dart.platform.windowsPhone) 'windowsPhone_webview_flutter.dart' show nothing;
export 'package:webview_flutter/webview_flutter.dart';

The application developer would then import 'bind_webview_flutter.dart' rather than 'webview_flutter.dart' (and get the same imported name space), and exactly one of the platform implementations would be part of the application, and that one would set up webviewPlatform to deliver the corresponding plugin platform object.

Note that the configuration specific import does not import anything (show nothing really means "show nothing" as long as we don't have anything called nothing), because each of these platform implementation libraries will do their own registration with webviewPlatform. We just need to have one of them in the program, no matter how that happens. 'empty.dart' is just an empty file.

Alternatively, an "unendorsed" approach could be achieved by importing 'webview_flutter.dart' directly into the application (rather than 'bind_webview_flutter.dart'), and importing any desired set of platform implementation libraries. The application developers would then be able to control exactly which platform implementations they want for the given plugin. Again, all they need is the import because the chosen library will self-register.

As a special case, consider an application A that covers k platforms, and the developers of A know about a plugin implementation for an additional platform and wish to support that one as well: They would simply have their own import of the additional platform implementation, which will be added to the k platforms that are supported already (the endorsed ones):

// FILE: Any library of _A_, could be the entry point.
import 'package:webview_flutter/bind_webview_flutter.dart';

// This platform is simply added to the ones in 'bind_webview_flutter.dart'.
import 'empty.dart'
    if (dart.platform.newThing) 'newThing_webview_flutter.dart' show nothing;
...
main() {...}

The code in 'newThing_webview_flutter.dart' will self-register, just like the other platform implementation libraries; it just needs to import 'webview_flutter.dart' in order to have access to webviewPlatform, and it doesn't matter where in the application 'newThing_webview_flutter.dart' is imported.

On the platform 'newThing' the endorsed list does not provide an implementation (so the configurable import just imports 'empty.dart' in 'bind_webview_flutter.dart'), but the additional configurable import in "main" will provide the 'newThing' platform implementation when compiled for the 'newThing' platform.

As mentioned, we'll get a link-time error if the imports are such that there are two implementations that both want to control which unique object webviewPlatform holds.

On an unsupported platform, that is, when there is no match in any of the configurable imports that contribute to the implementations of this plugin, webviewPlatform will not have any entries. This is not so easy to turn into a compile-time error (using link-time collections), but it is platform specific so it would be detected for any and every execution of the application on that platform, so it's nearly as good as a compile-time error. The error will be that the non-null assertion on webviewPlatform[null]! fails, which means that this platform is not supported. If WebView has any initialization actions it could check whether webviewPlatform[null] is non-null, such that it would be detected at startup.
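
Such a startup check could look roughly like the following. This is a sketch under assumptions: the helper name requirePlatform is hypothetical, ordinary maps stand in for the link-time map webviewPlatform, and strings stand in for platform objects.

```dart
/// Hypothetical helper: returns the single registered platform object, or
/// fails with a descriptive error if no implementation was linked in.
T requirePlatform<T extends Object>(Map<Null, T> registry, String plugin) {
  final platform = registry[null];
  if (platform == null) {
    throw UnsupportedError('No $plugin implementation for this platform');
  }
  return platform;
}

void main() {
  // Plain maps stand in for the link-time map `webviewPlatform`.
  final supported = <Null, String>{null: 'GtkWebViewPlatform'};
  final unsupported = <Null, String>{};

  print(requirePlatform(supported, 'webview_flutter'));
  try {
    requirePlatform(unsupported, 'webview_flutter');
  } on UnsupportedError catch (e) {
    print(e.message);
  }
}
```

Compared to a bare non-null assertion, the explicit check lets the error name the plugin that lacks an implementation.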

With respect to the requirements:

  • Using a language mechanism (and avoiding the need for generated code) should certainly preserve the property that there is no extra installation step for app developers.
  • The use of configurable imports gives rise to a static dependency graph for each platform; the graphs do not differ in any other way than by importing a different platform implementation library; they also ensure that only the library which fits the given platform is included in the program (plus, of course, whatever each platform implementation library needs to import itself—but we don't include any of the libraries which are designed for other platforms).
  • Consent is not needed for extending a plugin to cover a new platform implementation, as illustrated in the last example.

I think that all the concepts around packages (that one package can implement another one, etc) and the special treatment of implementation/"frontend" packages at pub.dev is an orthogonal topic: Those things could be done independently of whether the core language mechanisms are link-time collections or a bit of generated code, and it sounds like they'd be useful in any case.

But I certainly think that link-time collections will give a considerable amount of flexibility, and that it will take away the reasons for generated code that I have noticed for federated plugins.

The whole thing can be programmed in much more sophisticated ways, of course. For instance, it is not so hard to come up with an approach where the standard set up has one plugin platform implementation per platform as a default, but if the application developers make the choice to import a library on a specific platform then it gets to define webviewPlatform on that platform, because the "default" one gets disabled. So that's just another example of the added flexibility.

WDYT?

@goderbauer

Link-time maps could also help us in Flutter to make anonymous routes restorable. I wrote http://flutter.dev/go/restoring-anonymous-routes to explain the problem. The document outlines a solution using Ephemerons, but those are also currently not available in Dart. We could potentially use link-time maps to solve the problem, though. I do have some questions about the proposal to evaluate if it would work in our context:

  • Can regular Strings or integers be used as keys in the link-time map? I am asking because some other "magical" concepts like this in Dart don't allow those primitive types (e.g. Expando). I believe this is because of limitations in JavaScript.
  • To look up a value in the map during runtime, do I need to provide a const object as key? Or would it be okay to e.g. read a string from disk and use that string as a key to retrieve a value from the link-time map?
  • During runtime can I iterate over the keys or values that end up in the map? This is not a strict requirement to make route restoration work, I am just curious.

@eernstg
Member Author

eernstg commented Mar 24, 2020

A link-time collection would be similar to a const top-level declaration (it isn't constant, because there should not be any dependencies on a link-time collection from a constant expression, but it is very similar).

So it can't be modified, but a regular String or int, obtained by evaluating an expression at run-time or a constant expression (that doesn't matter), can certainly be used for lookups in a link-time map.
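
For illustration, here is how such a lookup behaves with an ordinary constant map today. The sketch assumes that a link-time map would compare keys with == and hashCode like other Dart maps; the map contents are made up.

```dart
void main() {
  // A const map stands in for a link-time map keyed by class name.
  const deserializers = <String, String>{'C': 'deserializeC'};

  // Build the key at run time instead of writing a constant.
  final key = String.fromCharCodes([0x43]); // the single character "C"

  // Strings are compared by value, so the computed key finds the entry.
  print(deserializers[key]);

  // A key that no contribution provided is simply absent.
  print(deserializers.containsKey('Unknown'));
}
```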

I did not intend to restrict the ability to iterate over the keys or values of a link-time map at run time, and I cannot see any strong motivation for having such a restriction. However, I have heard arguments to the effect that it is too confusing if the result of tree-shaking is directly observable at run time. Moreover, a restriction on the ability to iterate over a link-time collection would be even more limiting in the case of a Set. So I think it's possible and desirable to avoid such a restriction.

@goderbauer

Thanks for the details, @eernstg. Sounds like link-time maps with conditional contributing declarations would work for Flutter's use case. Alternatively, we could also solve Flutter's use case with Ephemerons as described in http://flutter.dev/go/restoring-anonymous-routes. I posted a proposal to add those to Dart here: dart-lang/sdk#41198. However, we would only need one of the mechanisms: Either link-time maps as described here or Ephemerons would do the job.

@leafpetersen
Member

However, I have heard arguments to the effect that it is too confusing if the result of tree-shaking is directly observable at run time

Note that if you can use computed keys (e.g. Strings computed at runtime) as indexes into the map, then you can still directly observe whether tree-shaking has occurred or not, regardless of whether you can iterate the map.


4 participants