Provide a way to gather static initializers #369
The implications on tree shaking and modular compilation should be carefully evaluated here. For tree shaking, any app using these methods would have to retain all annotated consts, regardless of whether they are ever actually used. For modular compilation, any library using these methods now has a dependency on effectively all transitive libraries (any library could add a new const at any time).
Angular was originally built this way, and it was a huge [expletive] mistake. I understand it "feels" nice to be able to do aspect-oriented programming like this, but it does not pay off when you care about code-size and initial startup, as Dart's users do.
I've written an outline of a feature, #371, based on the discussions that we had at the summit and input from several others. It uses a new kind of declaration. There is no control other than "exists in the program": if you import L1 anywhere and that library adds something to a link time set, it's up to some code in the program to handle it. #371 covers link time maps as well. I suspect that they wouldn't be much harder to support than link time sets, and this might be a useful basic "mechanism" in the above sense.
I agree that it would be breaking modularity, and in general I feel iffy about the fact that the presence of an 'import' can change the semantics of a program. Wrt. modularity: one solution for DDC would be to generate a static initializer for each library that registers global consts and is run before main at program startup time (I suppose it already does something like this for constant canonicalization). The advantage of this compared to general static initialization is that const evaluation has no side effects.
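To make the suggestion concrete, here is a rough sketch of what such compiler-generated per-library registration could look like if written by hand. `globalConsts` and the `_registerLib*` functions are invented names for illustration, not actual DDC output.

```dart
// Sketch (hypothetical names): a per-library registration step that a
// compiler could emit, registering each library's annotated consts
// before user `main` runs. Because the values are consts, running the
// registration has no observable side effects beyond filling the set.

final Set<Object> globalConsts = {};

// Hypothetical generated code for library A.
void _registerLibA() {
  globalConsts.add(const Duration(seconds: 1));
}

// Hypothetical generated code for library B.
void _registerLibB() {
  globalConsts.add(const Duration(seconds: 2));
}

void main() {
  // The compiler would insert these calls ahead of user code.
  _registerLibA();
  _registerLibB();
  print(globalConsts.length); // 2
}
```

Because consts are canonicalized, registering the same const twice is harmless: the set deduplicates it.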
Thanks Erik! That looks like a good fit.

Re: tree shaking, I expect that frameworks will provide custom tree shaking code. This is already being looked at for protos. My intuition is that a whole program kernel transform is the correct place to do this work.

Re: modular compile performance, these collections should only be accessed in framework initialization code, which should have a 1:1 correspondence to app entrypoints. Compilation performance should be better than what we have today--because today we are implementing something like this on an ad hoc basis. The compiler should be able to do it more quickly since it already does this work for const canonicalization.

Re: Angular, code size should be addressed by custom tree shaking; startup time needs consideration. If the initializer list is "flat" then startup time will necessarily be proportional to the list size. But the proposal allows for a map; if this is set up so initialization can be done lazily based on the key, then I think startup time should be okay.

I think a good next step for the proposal would be for CFE folks to check if my hand waving about "custom tree shaking code" is at all feasible :) ... I'm not exactly sure what form this would take, but protos are a good example. e.g. perhaps the app needs to supply some additional data at compile time to help the tree shaker figure out what to keep; it could for example say "these packages are the feature code; tree shake const initialization exactly for protos not mentioned here".
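The "lazy based on the key" idea can be sketched in plain Dart today. This is an illustrative stand-in only; `_factories`, `_cache`, and `lookup` are invented names, not part of the proposal.

```dart
// Sketch: a registry keyed by Type whose per-entry initialization work
// runs lazily, on first lookup, so startup cost is not proportional to
// the number of registered entries.

final Map<Type, Object Function()> _factories = {
  int: () => 42,
  String: () => 'initialized lazily',
};

final Map<Type, Object> _cache = {};

Object lookup(Type key) =>
    _cache.putIfAbsent(key, () => _factories[key]!());

void main() {
  // Only the `int` entry's factory ever runs here; the `String`
  // entry's initialization is never paid for.
  print(lookup(int)); // 42
}
```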
@matanlurey wrote, and @sigurdm commented on:
I'd also like to hear more about what went wrong. Note that the mechanism in #371 will not give rise to additional space for constant objects (every object which is added to a link time set/map is obtained from a constant expression, and presumably you'd need that constant anyway), so the only added space consumption is for some sets/maps to hold those values. Also, no computation takes place before `main` runs. We could store anything in those sets/maps as long as it is a constant, but I'd expect things like functions (that some initialization code in the framework would call, when appropriate) or objects that could be understood as commands (so maybe there's a list of k values, a list of k types, and a function, and then the initialization code should call that function with those arguments). Was there something in the original approach taken by Angular which would still be inherently difficult to handle?
Re: tree-shaking. Meaning that it would not count towards keeping objects alive, and would return `null` for entries that were shaken away.
@sigurdm wrote:
In this context I envision a typical situation where the link-time sets and maps are rather small, because they would typically contain a bunch of functions to run, or a bunch of "command" objects (again specifying some computation). Running those computations might be very expensive, but it should be possible to partition the tasks into manageable chunks and run them on demand. Couldn't this enable the same kind of resource usage improvements as a tree-shaking feature? We wouldn't run a command before we need the data that it produces, so if it's never needed we've "tree-shaken" it away. Of course, it might be a complex task to partition the "start-up" computations and perform them lazily at the right point in time, but it would be done in regular Dart code so we can use whatever techniques we can think of.
This assumption is wrong, and it is where the Angular approach fell apart. The issue was that a library would statically declare 10x the entities it would actually use at runtime. Note that there was a similar attempt in the past by @sigmundch (but it never got adopted). The idea was to provide a new kind of map, `LookupMap`, whose unused entries the compiler could remove.
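For readers unfamiliar with the `LookupMap` idea (the since-discontinued `package:lookup_map`), here is a hand-written stand-in showing the API shape. The class below is my own sketch, not the package's actual code; the interesting part — a compiler that understands the abstraction and prunes entries whose key type is provably unused — happens in the toolchain, not in this code.

```dart
// Stand-in sketch of the LookupMap idea: a const map-like object whose
// entries are a flat [key, value, key, value, ...] list. A cooperating
// compiler may drop entries whose keys are never referenced elsewhere,
// and lookups for dropped keys would return null.

class LookupMap<K, V> {
  final List<Object?> _entries; // flat key/value pairs
  const LookupMap(this._entries);

  V? lookup(K key) {
    for (var i = 0; i < _entries.length; i += 2) {
      if (_entries[i] == key) return _entries[i + 1] as V?;
    }
    return null;
  }
}

class A {}
class B {}

const injectors = LookupMap<Type, int>([A, 1, B, 2]);

void main() {
  print(injectors.lookup(A)); // 1
}
```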
This should straight up and simply not be done. It is a mistake. Interestingly enough, I don't see much of a problem statement here, only a solution. We've talked to lots of teams, both external and internal, who thought they wanted this, only to be (easily) talked out of it when the costs and benefits were weighed out. (While it seems simple enough when limited to constants, ultimately teams will want to register non-const objects and functions, and then all assumptions are completely out the window, as you have accidentally implemented mirrors by another name.)
This is a terrible expectation. Writing a generator is tough enough; expecting teams and frameworks to write global analysis steps is, full-stop, a non-starter. The only reason it works for protos is because of the small scope of the project (only for the VM, and written/maintained by the Kernel/VM team).
Sorry, this is just a really bad assumption and idea. I don't have a better way of stating it. We spent literally 5 years thinking of different ways to mitigate the cost of collecting metadata and decided the only way forward was to deprecate it.
There is a bit of history in these docs:
I'd be happy to GVC/meet in person to go over details.
This discussion originated with a very concrete problem: Decoding protobuf messages with extensions, where the decoder doesn't know about which extensions are available. See b/131893790 for further details.
That depends on the project, the team, and the benefit it brings. If there is a huge savings in size or increase in performance, a "full-stop, non-starter" becomes "the right thing to do". Given that Angular seems to have a similar problem: what solution do you currently use? What solution do you recommend?
IMO, this is a flaw in how the protobuf library was originally developed, not in Dart. For example, @yjbanov's original gAPI-compatible "Streamy" library (now deprecated) never had this problem: if you didn't use certain entity objects (read: protos) or fields, the information about decoding/encoding them was tree-shaken. This was back in Dart 1.x to boot. (As far as I can tell, the JSPB/JS-Closure solution internally also supports tree-shaking w/o hacks.)
A combination of things:
As for how this relates to protocol buffers, I am not sure there is an easy fix. I know @ferhatb and others have talked about having a "Nano Proto" or "Proto lite" library similar to what was developed for Java internally, and I'm a huge proponent of this idea.
I think it's worth exploring whether there is a protobuf specific design for the problem of extensions, first. For example, as @jakemac53 suggested elsewhere, could decoding be more lazy when it encounters extensions? Indeed, if we do end up considering static initializers, we need to be extremely cautious about tree-shaking. I also want to share a couple thoughts from previous approaches:
A very raw idea, but a similar UX can be achieved without introducing reflection-like features by making build/codegen more tightly integrated with the SDK — say, running code generation automatically as part of the standard build. I have no idea about the SDK-level implications of this, but the goal in this case is to make codegen seamless for the user so that it becomes a non-issue.
Thanks @sigmundch for the background there! @pulyaevskiy: This isn't related to code generation being idiomatic or not. |
@matanlurey I'm sure you have more context on this subject, but I'm wondering if you could elaborate on why you think it's different? Just to quote the problem description from the original post:
The above tells me that the main reason for this request is that there is a required manual action needed for frameworks to compose their registries, and we'd like to automate it — in this case by introducing reflection. I just wanted to point out that if we can automate this manual action then it'd become a non-issue. Though I'd defer to @davidmorgan to either confirm or deny my assumption, since he's driving this.
I wrote, and @yjbanov responded:
That's obviously a very important experience. So that assumption was wrong for Angular, and it sounds like the situation ("the program imports many classes of this kind, will use a small subset of them") could easily occur in other large systems as well, especially in situations where those link time collections contain such classes. So there is no danger in using a link time collection to hold a bunch of objects with no/few dependencies, but it does not scale up when they have any "potentially shakeable" dependencies.
If that idea does actually work (even though it wasn't adopted previously), and if it can be made sound ("couldn't possibly" must be true) then it could directly be used for link time maps. But do we then conclude that a mechanism like link time collections is never useful in practice? Or do we conclude that it doesn't scale in some situations, but it is actually useful in other situations, and we can develop a sufficiently effective culture of how to use it? @matanlurey wrote:
There's a clear message! ;-) And, again, it's obvious that there's a lot of experience backing up this conclusion. Matan, trying to get at your core reason for this message, I noted the following:
My interpretation is that your main concern is similar to Yegor's: You're convinced that this mechanism does not scale up, because it will inevitably cause programs to hold on to a large number of unused entities.
Thanks, that was helpful! About how Angular currently handles this issue:
For the case where a collection like this is used: does this imply that your advice would be to always use code generation in the situations where the naive developer would otherwise be tempted to use a link time collection? ;-)

@sigmundch wrote:
Sounds like the use of that package could very easily cause the same kind of tree-shaking problems that we have on the table here already. But if it were ever appropriate to use that package then I would assume that link time collections could be used to achieve the same result (by collecting a set of initializing functions into a link time set; the only manual step would then be to add the call that runs them).

@pulyaevskiy wrote:
With something like link time collections there would still be a need for a manual action. The point would be that there is no need for the developer who writes an application to know which ones of all the libraries that are imported (directly or indirectly) have which "things" that need to be gathered: that's something that the relevant library/package developers will register (e.g., using something like a link time set). In that sense it's more about reducing the amount of "administration" that the application developer is required to perform (that is: it's more about improving the encapsulation of libraries-that-register-something) than it is about avoiding a tiny manual step (which is simpler, because it's per framework, not per library).

But if we have a language construct like this, we could make this mechanism dependent on the outcome of the overall tree-shaking process, using something like the following:

```dart
// Specify the classes/functions that will enable the addition of `c`.
const myLinkTimeSet.add(c) if MyClass, myFunction;
```

The effect would then be that `c` is added to `myLinkTimeSet` only if `MyClass` or `myFunction` is retained by tree-shaking. This implies that tree-shaking would be extended from the current approach to iteratively add each contribution to a link time collection, but that algorithm already needs to propagate dependencies transitively. In any case, the choice of dependencies is sound in the following sense: if the named declarations are not retained, the contribution simply never occurs. So maybe that would allow us to avoid some custom tree-shaking, and still have our cake? ;-)
I would not conclude that based on my experience, since we didn't get a chance to put it into practice.
@yjbanov wrote:
I discussed this comparison of `LookupMap` and a link time map:

```dart
// Using LookupMap.
const map1 = LookupMap([A, 1], [B, 2], [C, 3]);

// Using a link time map.
Map<Type, int> map2 of const;
const map2[A] = 1 if A;
const map2[B] = 2 if B;
const map2[C] = 3 if C;
```

The point is that the declaration of `map2` can be spread out over multiple libraries, each contributing the entries it is concerned with. The tree shaking properties could be the same: with `LookupMap` the compiler drops entries whose keys are otherwise unused, and with a link time map the `if A` conditions express the same dependencies explicitly.
Thanks everyone. Dependency management is hard. Automatically pulling in dependencies errs on the side of correctness, at the risk of bloated code. Manually maintained dependencies can be wrong in both ways: there can be unneeded or missing dependencies. I've also seen automatic dependencies with manual blacklisting for code size.

FWIW, my guess is that the best compromise between performance and usability is to end up with a checked-in list of dependencies which is automatically updated, e.g. you add a new serializable type and your presubmit says "add it to the list of serializable types (y/n)?". The configuration also needs to blacklist things you don't want to be serializable (without referring to them!), for the benefit of the presubmit. We actually could do this with the language feature, by forcing you to say which code you're keeping and which code you're dropping in order to gain access to the link time set/map. Kind of a weird language feature :)
I think we could use it (or something similar) for protobuf extensions; they are represented as a kind of descriptor objects. You access an extension field of a message using its descriptor object. If that object is never referenced, we can assume that extension is not used.
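To illustrate the shape of this (not the actual `package:protobuf` API — all names below are made up for the sketch): extension descriptors would live in a collection the decoder consults, and a descriptor that is never referenced anywhere else could, together with its entry, be dropped by a tree shaker.

```dart
// Illustrative sketch with invented names; not the protobuf API.

class ExtensionInfo {
  final int tagNumber;
  final String name;
  const ExtensionInfo(this.tagNumber, this.name);
}

// Stand-in for a link time set: in today's Dart this has to be
// populated by hand, which is exactly the manual step under discussion.
final Set<ExtensionInfo> knownExtensions = {
  const ExtensionInfo(100, 'myExtension'),
};

ExtensionInfo? lookupByTag(int tag) {
  for (final e in knownExtensions) {
    if (e.tagNumber == tag) return e;
  }
  return null; // unknown extension: a decoder would keep the raw bytes
}

void main() {
  print(lookupByTag(100)?.name); // myExtension
}
```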
@davidmorgan wrote:
If that's desired then we could use the conditional contributions that I mentioned here to express that:

```dart
// Make it an error to include the types `Foo` and `Bar`.
Set<int> _excludedTypes of const;
const _excludedTypes.add(1 ~/ 0) if Foo, Bar;
```

The basic idea is that the added expression must be potentially constant in all cases, but it will not be evaluated unless `Foo` and `Bar` are included in the program (in which case the `1 ~/ 0` fails during constant evaluation). Of course, we could declare some more descriptive way to signal the error.

Inclusion of certain types in the program (based on tree shaking) could be achieved by any usage of the desired types which doesn't get optimized away:

```dart
main() {
  // Force inclusion of the following types in the program.
  var useTypes = [foo.Foo, foo.Bar];
  ...
}
```

Again, we'd probably need a twisted statement or two to make sure that `useTypes` is not optimized away. Forced inclusion may or may not be needed: for instance, if a particular program is capable of creating and serializing instances of a set of classes M then tree-shaking will yield a program that includes those classes (because code creating an instance of each of them exists). Note that we wouldn't need to worry about how an included type will be used.
Thanks Erik. I think what you are describing fits the bill. I'm not sure if there is a comparable case for protos; extensions are different and I don't know how they're used in practice / how they relate to client code: https://developers.google.com/protocol-buffers/docs/proto#extensions

Getting back to the proposal: I think there are three types of code change that we'd like to be able to 'catch', and by that I mean, highlight to the person making the change and the person reviewing it. Forcing a change to where the link time map is accessed, explicitly listing/rejecting types, would be one way to do this.
@davidmorgan wrote:
If initializable types are written in such a way that each of them, if not tree-shaken away, will add itself to a specific link-time collection, then the application can check that collection:

```dart
main() {
  // If this fails, check for unwanted elements in `initializables`.
  assert(initializables.size == 31);
  ...
}
```

More precision could be achieved by comparing the set (or the keys if it's a map) to a manually written set of expected classes, or by checking that particular expected entries are present.
For soundness it would not be possible to omit something which is considered reachable by the tree-shaking algorithm. So this sounds like something the author of that code would need to handle. I believe this could be addressed using the same kind of techniques as those for bullet 1.
Good point, we can write a test that sits next to `main`. But I expect we could come up with a working/workable pattern to recommend. For me this is a good approach to the problem: collect the configuration for the whole app in one place, and make whatever asserts on it you like, in a test. It's true that if you don't follow best practices (and write such a test), your app risks ending up bloated; but I think that's true of all answers we have for this problem, including fully manual configuration. I'm curious as to what others on the thread think.
FYI: I just added the ability to rely on tree-shaking to the proposal in #371.
If someone could summarize the difficulties with adding an initialization mechanism it would be appreciated (the discussion is very long). The following is what I have wished for more than once:

```dart
default A.initialize();

class A {
  static initialize() {}
}
```

Essentially I would expect Dart to invoke all packages' defaults before running main. Is there any important consideration why something like that cannot be included? Since there is "lazy loading", this could be thought of as "proactive loading". Alternative wording could be "expedite" or "proactive". Currently I'm only able to achieve initialization through an "unhygienic" hack.
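The pattern that works today, without any new language feature, is to make the step explicit: each package exposes an `initialize()` function that the application's `main` calls before anything else. The names below are illustrative, not an established convention.

```dart
// The explicit pattern available today: the wished-for
// `default A.initialize();` would simply make this call implicit.

class A {
  static bool initialized = false;
  static void initialize() {
    initialized = true;
  }
}

void main() {
  A.initialize(); // the manual step the feature request wants to remove
  print(A.initialized); // true
}
```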
The issue which usually causes problems with the model where each module (whatever notion of module a language might have) has one or more initialization code blocks that are executed implicitly before `main` is deciding on the order in which those blocks run, especially in the presence of import cycles.
Thanks, I understand better now.

**Short answer**

Resolve defaults in the same way Python resolves module imports, but through static analysis (even better). If there is circularity among default method invocations, it is forbidden. The order is resolved by the library import order, which is not ideal (because import order matters), but it is a simple solution and I'm pretty sure that default methods writing the same variables would be fairly uncommon. Being practical, keep in mind Python works like this and is one of the most popular languages in the world.

It is possible to constrain initialization more by allowing at most one initialization invocation per library. Constrain it even more by allowing only native var types and only writing uninitialized vars defined within the scope of the library. Think of it like a class constructor where the vars are initialized but for the scope of the library (the vars are the "scope class" members). Whichever mechanism is chosen should be simple and direct, keeping in mind that most use cases will not run into the problems that arise from initialization, which are edge cases reached by an exploring minority.

**Long deliberation (initialization trick)**

The problem is that initialization is currently already possible; it's just not elegant. For example, I do the following:

```dart
_initialize = () {
  // perform initialization here
  return null;
}();

// Somewhere else in the code
class A {
  final x;
  A(): x = _initialize ?? 0;
}
```

I have no idea when exactly `_initialize` is evaluated, whether preemptively or lazily; I just know that it is working for me. `_initialize` is included only so that it is not branch cut. That kind of initialization does give the compiler a notion of ordering, which I presume is the Pythonic one explained above. Given that, I would expect the default invocation to implicitly have the same behavior as placing the above trick over all members in a library (including everything that comes from the library, also types). Like:

```dart
library lib1;

var x = 2;

// Used in class A constructor
_initA = () {
  x += 1;
}

// Used in class B constructor
_initB = () {
  x *= x;
}
```

When `x` is accessed, which value will it have: 2, 3, 4, 5 or 9? There is no way to intuitively know this (there are different valid intuitions). If class B is accessed first in the code, is `_initB` executed first? Most developers are probably clueless about how the compiler resolves these cases. Any default execution mechanism is better than the `_init` trick, which is not transparent because the value of `x` at the very least depends on the result of the branch cut; a default method would always be invoked if any of the vars it writes are not branch cut (the way it should be). Now imagine this:

```dart
// imports lib1
library lib2;

_initA2 = () {
  lib1.x *= 3;
}
```

Whatever rule currently exists to resolve this should be considered inconsistent, since it's too complex for a person to resolve. The fact that there is no easy initialization mechanism is preventing this odd behavior from frequently happening in libraries by hiding the problem, but the problem is still there. Python approached this by transparently explaining exactly how the package dependencies resolve; I'm not sure how the cases above resolve in Dart.

**Personal thoughts**

Personally I do not like at all the fact that import order matters, because it's a hidden effect that cannot be predicted unless internal details of the packages are known, which breaks the package encapsulation ideal and adds an asterisk to the import statement definition. But that's a wish; being realistic, initialization is highly required and the short answer explained above is a decent solution, where the benefits outweigh the inconveniences, unless I'm missing something big. I'm building a library and I do miss an official var initialization mechanism. Package dependency order mattering shouldn't be thought of only as something negative; it also opens the possibility to transparently create "initialization libraries" which tweak values of mainstream libraries.
Note that your initialization trick doesn't actually work because the closure is evaluated lazily, see https://dartpad.dartlang.org/d036b1c1e58bf0e5172c54ce0cc3d8e5.
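The laziness is easy to demonstrate without the dartpad link: in Dart, a top-level variable's initializer only runs when the variable is first read, not at program startup. This is a self-contained reconstruction of the point, not the linked example itself.

```dart
// Top-level variables are lazily initialized in Dart: the IIFE below
// does NOT run at startup, only on the first read of `_initialize`.

var log = <String>[];

var _initialize = () {
  log.add('initializer ran');
  return null;
}();

void main() {
  log.add('main started');
  _initialize; // first read triggers the initializer
  print(log); // [main started, initializer ran]
}
```

If the initializer ran eagerly, 'initializer ran' would be logged first; because it is lazy, 'main started' comes first.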
I have been using the trick in the constructor of the key classes that require the initialized variables. It can be done in any part of the code that is reached from the visible API of the library. That explains, though, how Dart deals with these values, and the same rule could apply for any initializer: the initializer is run lazily as soon as the first member or type from a package is referenced. Assume an initializer is implicitly run that way. In any case, as of today, with lazy loading (Python also offers lazy loading of imports) Dart can have initialization circularity errors. It's not different from actual initialization methods producing an error.
This issue was originally more around the use case highlighted in my dartpad example "frameworks need a way to run initialization code for all leaf code using the framework". If you already know all the variables that you want to be "initialized" then the trick you provided is already available and is the recommended solution. Manipulating global state in an order dependent manner inside of those initialization closures should generally be avoided for the reasons you have highlighted.
I wanted to open a request for initialization but I found this one, which seems more specific to me, unless I'm misunderstanding something. If there was some kind of initialization, the following could be done:

```dart
initialize {
  registrations = Platform.registeredConsts();
}
```

The initialization could be lazily evaluated, as you proved happens with the closures, but like I said, I think lazy initialization is less consistent. The trick I posted is exactly why I wanted to ask for an official language mechanism to achieve this with readable syntax. I think initialization is a common enough requirement to deserve its own language mechanism. I prefer package initialization resolution by import order over execution-dependent lazy evaluation.
Some frameworks need a way to run initialization code for all leaf code using the framework. For example, dependency injection frameworks want to build a map of injectable types, and serialization frameworks want to know about all serializable types.
Currently frameworks either use codegen or manual initialization to do this.
I discussed with @eernstg and @sigurdm; the idea we got to was that it might be possible to build this on top of existing `const` canonicalization support, giving something that's both cheap to implement and sufficient for the use cases we care about.

Leaf code wanting to register itself would declare a `const` in some new special way. When the framework is initialized it can ask for all `@register`ed `const` values and iterate over them doing whatever initialization work it wants to.

This is sufficient for e.g. all injectable types or serializable types to register themselves. Note that the value registered could be any const, e.g. a function, or an instance of a const class. A possible refinement would be to make it possible to ask for all registered consts of a particular type -- arguably this removes the need for the `register` annotation since you can request a framework-specific type that is only used for registration.
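A hypothetical sketch of how the registration surface described above might look. `@register`, `Registered.allOf`, `Serializer`, and `Foo` are all invented names for illustration; none of this exists in Dart or is fixed by the proposal.

```dart
// Hypothetical syntax/API, for illustration only.

// Leaf code registers a const value:
@register
const mySerializer = Serializer<Foo>();

// Framework initialization iterates all registered consts,
// optionally filtered by a framework-specific type:
void initFramework() {
  for (final s in Registered.allOf<Serializer>()) {
    s.install();
  }
}
```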