Can (non-standard) stacks be preserved with implicit PTC, and does this matter? #6
I'm curious whether other implementors and users find ShadowChicken to be an acceptable strategy. @bmeurer from V8 team expressed this concern about ShadowChicken:
> I wanted to add that Google Web products also collect stack traces from the field and that our optimizer likely produces more than the average number of PTC calls. So implicit PTC is worrisome.
I am very worried by implicit tail call elision, from a Node.js / server-side perspective. Post-mortem debugging and in-production profiling are important use cases on the server side.

It is very common practice for server-side applications to do profiling in production. This entails gathering stack traces and associating them with the events being profiled (e.g. CPU ticks, allocations, etc.). TCE would reduce the efficacy of such profiling techniques and make it harder for people to understand performance issues in their production code.

Similarly, post-mortem debugging is common practice on the server side. Upon an error you take a core dump, restart the service, and analyze the dumps offline / en masse. Having tail calls elided by default would make it hard to identify the root causes of problems.

Approaches like ShadowChicken are a no-go for me because they cost performance for a 'usually works' solution. Implicit tail call elision means that I don't get the option to make a trade-off; I just get a degraded experience in debuggability and profilability. With an explicit opt-in syntax, I can at least make that trade-off and avoid writing code (or using libraries) with tail call elision.

/cc some folks from @nodejs/post-mortem, as they might have opinions on by-default tail call elision: @yunong @davepacheco @mhdawson.
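To make the concern concrete, here is a small illustration (the function names are hypothetical) of which frames a conforming PTC implementation would be allowed to elide. Running this in an engine without PTC, such as today's V8/Node, shows all frames; under PTC, the tail-calling frames would never appear in the captured stack:

```javascript
// Illustration of the frames PTC would elide (hypothetical example).
// `middle` and `outer` each make their call in tail position, so under
// ES2015 PTC their frames are reused and never appear in leaf's stack.

function leaf() {
  // Capture the current stack for inspection, as a profiler or
  // error-reporting pipeline would.
  return new Error("sample").stack;
}

function middle() {
  return leaf(); // tail call: eligible for elision under PTC
}

function outer() {
  return middle(); // also a tail call
}

const stack = outer();
// In engines without PTC, all three frames appear in the trace:
console.log(stack.includes("middle")); // true without PTC
// Under a conforming PTC implementation, `middle` and `outer`
// would be absent, leaving only `leaf` above the top-level frame.
```

This is exactly the information a sampling profiler or `error.stack`-based error collector would lose.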
For what it's worth, the ShadowChicken strategy described above would not be very useful for users of DTrace and core-file-based post-mortem debugging for Node.js.

Node.js today (since 0.6.7) supports the DTrace ustack helper mechanism, which allows DTrace to construct JavaScript stack traces from running Node programs without any assistance from the program itself. This is necessary in order to, for example, trace stacks where a Node.js program goes to sleep, wakes up, issues I/O, or otherwise interacts with the operating system -- or just to profile it. This mechanism relies on there being a relatively straightforward way to translate stack frames on the native stack into JavaScript ones. I've written a lot more about how this mechanism works elsewhere.

There are a couple of tools for debugging Node.js programs from core files. I develop mdb_v8, but there's also llnode and IBM's IDDE. Using core files, of course, we cannot assume any assistance from the running program. mdb_v8 currently reconstructs JavaScript stack traces, most JavaScript values, and closure variables. It could potentially make use of a shadow table representing the "real" stack, but support for that could be arbitrarily complicated, depending on how it's implemented. In any case, we'd want such a feature enabled at all times, which seems to defeat the purpose of TCE.

Optimized tail calls have always been a potential source of problems for debugging and profiling tools, and the impact varies depending on the problem. Given the option, particularly for JavaScript, I would generally disable this optimization for production use.
To elaborate on the comment from @concavelenz: maintaining stack frames in error.stack is very important for Google applications. We collect errors and analyze them on the server, and it's a critical path for fixing problems found in production. A fix only for devtools would not be sufficient for our use case. For the same reason, it sounds like the ShadowChicken strategy might not be suitable either.
@erights I'd be interested to hear your thoughts on this.
I have the same discomfort with ShadowChicken. Causeway is a post-mortem distributed debugger that I worked on a long time ago (together with @cocoonfx, Tyler Close, and others). See the Related Work section of "Causeway: A message-oriented distributed debugger" for some other relevant old work on distributed debuggers. Causeway and some of these others could cope with partial information, in the sense that some sites participating in the distributed computation may elect not to also participate in the distributed accumulation of logs. But it would have been very weird for the information collected at participating sites to be non-deterministic depending on local memory pressure. Whether we do PTC and ShadowChicken or not, we should adopt STC regardless. Even with PTC and ShadowChicken, the advantages of adopting the explicit syntactic marker of STC are:
STC has these advantages whether used instead of or in addition to PTC.
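For readers unfamiliar with STC: the syntactic tail calls idea is an explicit, opt-in marker at the call site. The sketch below is illustrative pseudocode only; I believe the proposal discussed syntax along the lines of `return continue`, but the exact spelling was still under discussion:

```
// Pseudocode sketch of an explicit (opt-in) tail call marker.
// Only calls marked `return continue` would be eligible for
// tail-call elimination; plain `return f()` would keep its frame.
function loop(n, acc) {
  if (n === 0) return acc;
  return continue loop(n - 1, acc * n); // explicit tail call
}
```

The point of the marker is that a developer (or a lint rule) can forbid it in codebases where complete stack traces matter.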
One possible way to reconcile a ShadowChicken-like approach with the points that @concavelenz, @mknichel, and Causeway make is to say that, in a context where logs are accumulated and reported, the info that GC would painfully drop under ShadowChicken is first logged, and then deleted from the local heap. This relieves local memory pressure without losing the utility of this info for instrumentation. It would be a difficult bit of engineering and would make the non-determinism of GC yet more visible. At least at the present time, I am not advocating it. But it is worth pointing out as a possible way of resolving this tension.
If stack frame data is logged before being deleted through GC, it still constitutes linear rather than constant resource usage, and goes against the spirit of proper tail calls. The ES spec doesn't say anywhere that things are held in memory rather than disk :) |
I agree. Hopefully people will find this persuasive. |
There is a lot of misinformation in this thread about how ShadowChicken works and how other theoretical shadow-stack implementations could work. I'm going to describe how we implemented our shadow stack, in the hope of moving the discussion away from arguing against a theoretical shadow-stack implementation.

The main arguments I see here against ShadowChicken are that its efficacy depends on heap memory pressure and that it uses linear memory. This is not how we on the JSC team designed our shadow stack. Firstly, our stack-pruning algorithm is not tied to GC or heap memory pressure. Secondly, JSC's implementation of a shadow stack does not use linear resources. We keep the last N tail-deleted frames in our shadow stack. For example, if your program contains M naturally occurring frames, JSC will keep between M and M+N frames in its shadow stack when the inspector is open. We chose N=128 because we think it provides a good balance between the two main debugging scenarios:
Again, we chose N=128 because we think it strikes a good balance. We may change this number as we see fit. But the argument that an implementation of a shadow stack inherently requires linear memory and needs to be cleaned up by GC is provably wrong.
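The bounded-retention idea described above can be sketched in a few lines. This is an illustration with hypothetical names, not JSC's actual ShadowChicken code: keep at most N of the most recently tail-deleted frames, so memory use stays constant regardless of how many tail calls the program makes:

```javascript
// Hypothetical sketch of a bounded shadow stack: retain at most
// `maxDeletedFrames` of the most recently tail-deleted frames.
// Illustrative only -- not JSC's actual implementation.

class ShadowStack {
  constructor(maxDeletedFrames = 128) {
    this.max = maxDeletedFrames;
    this.deleted = []; // recently tail-deleted frames, oldest first
  }

  // Called when a tail call replaces `frame` on the machine stack.
  recordTailDeleted(frame) {
    this.deleted.push(frame);
    if (this.deleted.length > this.max) {
      this.deleted.shift(); // drop the oldest, bounding memory use
    }
  }

  // Reconstructed trace: retained deleted frames plus the live ones.
  trace(liveFrames) {
    return [...this.deleted, ...liveFrames];
  }
}

const shadow = new ShadowStack(3);
["a", "b", "c", "d"].forEach(f => shadow.recordTailDeleted(f));
console.log(shadow.trace(["main"])); // ["b", "c", "d", "main"]
```

With M live frames, the reconstructed trace never exceeds M+N entries, matching the bound described above.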
I was not referring to ShadowChicken as using linear memory, but rather to Mark's suggestion of logging tail-optimized frames, which is linear. Sorry for my misunderstanding; I thought ShadowChicken was based on GC because Chicken's TCO is, but this is an interesting-sounding design, using a separate mechanism to ensure limited resource usage. With this design, is the resulting stack deterministic?

Nevertheless, the underlying concern about missing stack frames in Error.stack remains. We have heard in this thread as well as in #6 that developers depend on this stack, and from this thread we know that 4-5% of stack frames on some popular websites would go missing. The lack of bug reports has been mentioned, but a missing stack frame would not make a page load fail; it would instead make something harder to debug offline later, eventually resulting in a worse user experience from other bugs not being diagnosed and fixed. Not all of the developer experience happens with devtools open.
At Netflix, we leverage JS heavily, both on the client (most of our clients use JS) and on the server (our website and data-access platform run Node.js). It's critical to our business that we have access to tools that let us easily observe and debug our services and clients in production.

For our server deployments, CPU profiling via perf, which allows us to sample JS stack frames on production instances without modifications to the Node binary, is vital. Profiling allows us to quickly identify performance problems and resolve them -- which is paramount when we're serving 75+ million customers. We also use @davepacheco's post-mortem debugging toolchain mdb_v8 in production. The ability to capture a core dump of a live process and then inspect it offline is quite valuable. This technique lets us examine the complete stack of the process, as well as all of the function arguments for each frame.

Production is a tricky environment, since we rarely have the luxury of switching toolchains or restarting the process. Sometimes restarting the process means we've lost the state which caused the issue in the first place. Part of the reason we've chosen Node is that it comes with an excellent suite of observability and debugging tools. We rely on them every day. Losing the ability to sample complete stack frames or see them in a core dump would mean losing essential telemetry -- something we're reluctant to compromise.
One concern raised at the March 2016 TC39 meeting was how/whether it's bad if stack frames were lost as part of implicit PTC. Although this had been considered in the past and TC39 consensus still came down on the side of implicit PTC, it may warrant new consideration now, both because the DevTools teams of several browsers are being brought into the discussion in a way which they hadn't been before, and because @erights is raising the possibility of standardizing error stacks.
The problem: In a naive PTC implementation, the caller stack frame is completely eliminated for all tail calls, even those which are not part of a loop and which may be present in legacy code. This is visible in error.stack (which is sometimes used for remote debugging).

Possible solutions
Size of the impact
It's unclear how important it is to developers to not have stack frames elided. @jeffmo mentioned at the March 2016 TC39 meeting that Facebook uses collected stack frames for debugging, and would prefer if these stacks maintained their current semantics. It would be great to have feedback from other web developers about how/whether this would impact them as well. If, for example, a ShadowChicken-like strategy comes with a performance regression which is acceptable for DevTools but not on-all-the-time, then this would affect use cases like @jeffmo's.