-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance #2850
Comments
To update on the progress of the development of the new evaluator, a critical component of making performance improvements in CUE. The Evaluator Roadmap project is gradually being filled out with issues to communicate our progress in this part of the CUE project. For more information on the approach we are following here, please see #2846. Back in December 2023 on our last community call, we updated to confirm the new evaluator was close to passing all tests except for disjunctions. The changes to support this work (i.e. everything except disjunctions) have since landed in the main CUE repo. We will be submitting the final key change in this series soon. Since December, I have been making good progress with disjunctions. We originally planned to have a simple implementation for disjunctions to speed up the roll out. That turned out to be problematic. The latest implementation (for the new evaluator), although still conceptually simpler than the previous one, took a bit more effort than anticipated. Things are going well now, though. We now plan to submit the changes to support disjunctions in the new evaluator in an incremental fashion. SequencingA bit more about next steps. #2853 is largely done with the exception of reclaiming memory (see #2887) and some esoteric bugs. We first need to land https://review.gerrithub.io/c/cue-lang/cue/+/1174341, the main change implementing the new evaluator. This is being tracked in #2884. Once the biggest bugs with disjunctions are ironed out (which we are doing via tests and code that is part of Unity), we plan to allow users to play with the new evaluator, enabling it via a Note that this initial experimental version of the new evaluator will not include memory reuse or any of the secondary optimizations, such as structure sharing (see #2854). So even though users should see an improvement in the big-O performance of the core evaluator, it might be that we initially see performance get worse in some instances. With such a significant change in implementation, this is somewhat to be expected because the implementations of the evaluator (old/current and new) have very different designs and therefore characteristics. Once support for enabling the We also need to ensure that error messages in the new evaluator are at least on par with the old evaluator. This is tracked in #2890. The new evaluator enables more structured capturing of error situations, and therefore better error reporting. Improving error messages is tracked via #2891. After this is done, the real fun starts. The new evaluator has been designed with several additional performance improvements in mind. Each of those can give substantial constant speedup, or even additional big-O improvements. These can all be rolled out incrementally. We will be creating issues as part of the Evaluator Roadmap for each of these. Updates on main performance sub-issuesThis update being the first of its kind, means it necessarily includes background and context on where we find ourselves today. Future updates (which are aiming to be at least fortnightly) will be shorter. In future updates we will provide shorter bullets under the heading of the sub-issues:
Key open changesHere is a list of the active changes related to the evaluator and performance work, referenced either directly or indirectly above.
|
Development on the new evaluator is progressing steadily. The main focus remains implementing the new evaluator as the basis for performance and other improvements (#2884) and making the new evaluator available via a At the time of our last update, 87% of all evaluator test files were run and free of P0 or P1 bugs, while 19 files were excluded from the tests entirely. Now, 94% of these tests pass, with only 3 files excluded from tests. Moreover, before many files had many errors. Now, the 23 (out of 346) remaining files have at most a few errors remaining. (As a reminder, we categorize bugs/errors according to their severity: P0 being critical, through to P3.) We also added some new tests. Here is a new benchmark that we will add: #def: {
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {c: string}
{} | {a: string}
}
x: #def
x: c: "foo" The old evaluator completed this with this many core operations:
Incrementally adding lines showed that this test displayed exponential behavior. You can collect stats for your own runs using the The new evaluator accomplishes the same result with far fewer operations:
The new evaluator displays linear growth in the number of operations when adding a repeated line. Clearly an improvement. What made this possible is the capability of the new evaluator to much more accurately detect duplicates, even when a value is only partially evaluated. That said, there is still some work to be done. This new detection algorithm needs tuning. Some of the cycle detection algorithms are also not fully functional yet. This means that for some of the benchmarks the new evaluator performs worse, for now. But all the techniques that worked for the old evaluator will work for the new one, in addition to all the improvements made possible by the new implementation. There were also some complications that lead to some tweaking to the disjunction algorithm. It turned out that the implementation would be much more straightforward if we had a partial implementation of structure sharing. We opted for this approach. This meant a slight setback in the projected time for implementing disjunctions in the new evaluator. On a positive note, however, it means we are now well on our way to implementing structure sharing. We will not need to finish this before launching the new evaluator in experimental form, but we will be able to introduce structure sharing sooner than expected. This is great, as there are some known performance issues that cannot be solved without it. We intend to launch the experiment when all known P0 and P1 bugs have been fixed. We anticipate being able to do so before the end of March 2024. And as a reminder, this is before we improve memory management. The new evaluator currently has a high constant overhead. This is known and expected. We do hope, however, that people will see a big reduction in core operations. |
Apologies, this update is arriving a week later than planned. We are getting close to releasing an experimental version of the new evaluator: we are currently at over 95% of the test files passing. This seems like only a small increase since last time, where we were at 94%. There is a lot going on though. The number of skipped files, the most severe of all, went from 13 to 4. Also, the total number of errors in the 13 remaining erroneous files has reduced as well, which is not reflected in this count. Also, the severity of the errors has diminished. Finally, we have discovered several new errors in the process that we previously forgot to count, which slowed down the reduction a bit. In some cases we have also deliberately introduced new minor errors. For instance, we realized that some of the handling of structural cycle detection edge cases, which is not yet implemented for the new evaluator, can be handled much more nicely with structure sharing. We are close to supporting structure sharing. So it seemed more prudent to have a less precise, but simpler handling for now, at the expense of having some spurious structural cycles in the meantime. We can use the saved time to roll out structure sharing sooner. We have also prepared the integration of the new evaluator into the The main focus remains completing implementation of the new evaluator (#2884) and making the new evaluator available via a Unlike what we said last time, the initial experimental version will contain some outstanding P0 and P1 bugs. The outstanding bugs are mainly regarding structural cycle detection, most notably when used for implementing recursion, let processing, and some smaller issues. We realize that those that do not use these constructs could already benefit from the new evaluator. Getting early exposure and feedback will ultimately, however, speed up the rollout. |
Today, we will enable the new evaluator as an experiment for the cue binary as part of v0.9.0-alpha.2. At a later point, we also plan on enabling it in the API. The new evaluator has various known bugs that have not yet been fixed. On the flip side, it has more performance improvements than we initially planned to release. We decided in favor of this tradeoff because, on the one hand, many users will not run into the remaining bugs and could benefit from the performance gains now, while on the other hand, some of the performance improvements will actually help address some of the remaining missing functionality that gives rise to these bugs. Also, by enabling more users to benefit from performance gains sooner, we hope to also get more feedback on the remaining issues. So in spite of the remaining bugs, we suggest you give the new evaluator a whirl. In our experience, it is already significant faster than the old evaluator due to some significant big-O wins. Using the new evaluatorTo enable it on the command line, use the CUE_EXPERIMENT=evalv3 cue export Together with the Known issuesStructural cyclesNot all of the core constructs have been ported from the old evaluator to the new one. A noteworthy one is structural cycles. We have seen that the new algorithm gives rise to a simpler and more effective structural cycle handling. Structure sharing plays a role in that. Instead of porting the rather complex old mechanism to the new evaluator, we have decided to implement an approximation instead for now. Together with structure sharing, this already covers most of the cases. But there are cases where structural cycles are either detected too late or spuriously. Let expressions in comprehensionsThere are known issues with handling of let expressions in comprehensions. Closedness scopeThere are some remaining issues in closedness detection. It is known where these come from, we just did not want to hold up the release on these issues, as we expect they will not affect most users. This issue may manifest itself by disjuncts that are not eliminated that should otherwise be eliminated because of closedness constraints. Error handlingThe new evaluator does not have all the locations relevant for an error as was available in the old evaluator. As soon as we are on feature parity, we plan to give a thorough overhaul on error reporting, fixing those features and others. Performance changesThe new evaluator is significantly faster on many fronts. In addition to a much more capable disjunction elimination algorithm, it has a partial implementaton of structural sharing, which has proven to give some significant wins. GainsBoth the re-implementation of disjunctions as well as structure sharing has shown significant performance improvements. This example is similar to the hypothetical example we showed in the last update. It reflects a real-life version of it.
Here we saw a reduction in the number of operations from
to
An example of the kind of improvements brought by structure sharing is this code:
Where we reduce the number of conjunct operations from Similarly, a reduced version of the Billion Laughs attack, shown here
reduces the number of operations from
to
Here the new evalautor is Finally, the improvements of the new disjunction algorithm can also compound with structure sharing. This is an example where we have recursion combined with disjunctions, a common way to get into serious performance issues with the old evaluator.
The old evaluator clocked in at the following number of operations:
The new evaluator, with its new disjunction algorithm, only needed
A reasonable improvement, especially since it is counts conjuncts more Adding structure sharing in the mix, however, this further reduces to
Now we're talking! Known performance issuesThere are a few cases where it may be slower than the old evaluator:
Unimplemented performance improvementsThe new evaluator does not address all performance issues yet. Note that the old evaluator also does not implement these, so there is no benefit to stick to the old evaluator here. We mainly want to point these out on what is still in store. While by no means being exhaustive, here are some examples:
FeedbackWe would love for you to give it a spin and get some feedback. The sooner we know about issues, the sooner we can fix it. The new evaluator is not only faster, but also quite a bit easier to debug, we have found. So do not hesitate to help us get you up and running faster. |
Since last release, we have fixed quite a few more bugs. Based on popular demand, however, we have started on a little diversion from fixing bugs to work on integrating the new evaluator in the API. API workHooking up the new evaluator to the API may seem like a small step, but is actually quite involved. The evaluator itself is quite low level. The API provides a layer on top that interprets the low-level evaluation results and presents it to the user in a more convenient manner. It is actually good we have it, as it allows us to reimplement an evaluator without users having to change their API usage! There are many subtle and less subtle ways in which the new evaluator changes the representation, for instance:
The API needs to take all these change into account, and now handle both. Luckily the API itself has many tests separately from the core evaluator. We are now applying the main testing framework we developed for the evaluator to track deviations to the API as well, where possible, or coming up with other solutions where this is not the case. You can track the progress at Issue #3060. We have already fixed many of the API issues, and work on this is progressing nicely. It even resulted in some minor bug fixes in the core evaluator itself. Next stepsAfter the biggest bugs have been removed from the API support, we plan to release an option to enable the new evaluator. After this is done, we will resume fixing the outstanding larger bugs in the evaluator. For instance, FeedbackAs usual, we would love to hear your feedback. Please raise issues or get in touch with us via Slack. |
The upcoming v0.9.0-alpha.5 release is another big step forward for the new evaluator. We have fixed some of the major bugs that were outstanding. Based on popular demand, however, our main focus has been to integrate the new evaluator in the Go API. This is not as trivial as it sounds. So far we have rolled out about 40 CLs to accomplish this. The main reason for this is that the underlying data representation of the new evaluator differs from the old evaluator in some important aspects. This is hidden from the user in the API, but the API had to be adjusted to accommodate this. So far we have squashed most of the couple of dozen regressions that originated from this move and have only two relevant regressions left. Some of the remaining issues include:
Things like dependency analysis, How to enable the new evaluatorThe new evaluator can be enabled in the Go API by passing options to
Using There is also an option to pass debug flags. This takes the same form as the flags supported by the environment variables. The most notable option is the one to turn off structure sharing, which is enabled by default. If you encounter a bug that you suspect may be related to structure sharing, it can be turned off using the
Note that for the Core evaluator progressWith respect to progress of the core evaluator itself, there are now no more skipped tests left, and just about a dozen of high-priority regressions remain. Now all tests are running, we can see a 83% reduction in running time of running the same tests for the old evaluator versus the new evaluator. Of course this is just a snapshot, but it gives some indication of what to expect. Of course, the new evaluator will be much faster for some cases, and will be equal in some other cases. Keep in mind that we still have a lot of optimizations in store, some quite large as well. |
Since the last update, we have been working on responding to user bugs as well as tackling the remaining bugs. There have been a couple of weeks worth of work on other things, including a new embed proposal, and also some investigations on other blockers, which, not entirely coincidentally, also impact the new evaluator. We have removed most bugs of the API support for the new evaluator. There are a few left. The biggest omission is CUE trim. The old trim algorithm was fully designed around the old algorithm and its respective data structures. Consequently, as the new evaluator often uses different data structures, trim completely stopped working for the new evaluator. Matthew Sackman is now looking into a redesign of the trim algorithm, addressing a whole slew of bugs, while also making it work for the new evaluator. Another aspect of the new evaluator that has greatly been improved is subsumption. Some of the improvements and bug fixes made it also in the old evaluator. But for the most part, these improvements are only supported for the new evaluator. These improvements will be crucial for both optimization and defining backwards compatibility for CUE schema. Finally, the number of bugs of the core evaluator has been steadily decreasing as well. Moreover, we are cleaning up the algorithm of the structural cycle detection of the new evaluator to handle more cases in a simpler fashion. This was one of the major omissions of the new implementation and appears to be a factor in many of the remaining bugs. We aim for the new evaluator to be usable for most users by the end of August. |
After returning from our summer break, we have picked up work on the evaluator again. Since our last update, most of the new evaluator work has gone in to addressing user-reported bugs. We have fixed a myriad of performance bugs, semantic bugs, and panics. @cuematthew has already been helpful and is continue to ramping up on the internals of the evaluator. Another source of Other work has been focussed on reducing the number of closeContext allocations and getting the closedness algorithm right. We have not yet enabled lifetime management, which means that Another noteworthy development is the development of an internal "TinyCUE". Matthew has been developing a parallel new implementation of CUE, a simpler reference implementation, in order to improve his overall understanding of CUE evaluation. We are investigating to keep maintaining this implementation to insure quality in the future. We were hoping to be able to switch over the default to That said, we have several users increase their use of |
It has ben a while since the last update. We have continued our focus on user-reported bugs. Also the JSON Schema work has exposed more bugs that we have addressed. Since last time, we have closed close to 50 issues related to the new evaluator. Part of this was designing a new and simplified cycle algorithm. The new algorithm is considerably simpler compared to the old. New since last time is the focus on Unity. We are working to remove discrepancies in Unity. So if you want some assurance the new evaluator will work with your code, be sure to add your repo to Unity! There are about 10 main issues left in making Unity compatible. One of them is actually a closeness bug fix that causes too much backwards incompatibility. Consider this example:
The old evaluator would allow this, even though With respect to outstanding bugs, the new evaluator is now almost on par with the old one. We are considering flipping the default as soon as Unity passes and some known remaining performance issues are fixed. With respect to Unity, @cuematthew has completed a first round on the field ordering work. The aim is to make differences between V2 and V3 easier to compare. Aside from the remaining performance issues, we also have made some headway with fine-grained structure sharing. This has potential to greatly improve the performance of |
Performance is a key goal of the CUE project. Broadly, @mpvl envisages being able to achieve performance improvements of several orders of magnitudes (predominantly by means of reductions in the time complexity).
Performance is also a key requirement of developer tooling, for example the CUE LSP.
This umbrella issue exists to summarize the main categories of current performance problems that people run into. It will link to sub-issues that capture more specific detail on well-identified categories of performance problems. We will also use this umbrella issue to post high-level updates on performance matters: it will therefore be locked to allow subscribing to updates from @mpvl.
Please subscribe to this issue to receive notifications of those updates.
The sub-issues linked below this umbrella issue will include more detail of each category of performance problem. Each sub-issue will link via tasklists to bug reports of performance issues that are known to suffer from the category of performance problem. High-level updates specific to each category of performance problem posted to those interested in subscribing to sub-issues.
The main umbrella issue and sub-issues will be part of the Evaluator Roadmap project (see our recent announcement of new approach to issue/project management). Bug reports linked from sub-issues will not be added to the Evaluator Project because they are entirely covered by the sub-issue.
The main categories of performance problems are:
For discussion of this issue and all matters related to performance please see #2857.
FAQ
My CUE is slow to evaluate - what should I do?
Please raise an issue and we will help to investigate. Part of that investigation will involve working out which of the categories of performance problems affect your code, and linking to your issue from a tasklist in the relevant sub-issue.
Where can we discuss things relating to this umbrella issue?
For discussion about performance issues, please see #2857, or create a new discussion and we will be happy to join the exchange.
Why is my performance bug not part of the Evaluator Project?
Where there are well-identified categories of performance problems, it makes sense to group these as sub-issues under the umbrella performance issue. This helps to keep the project board small, whilst not losing any detail in the original report. The linkage to the original bug reports is managed from sub-issues via tasklists.
The text was updated successfully, but these errors were encountered: