Discuss: State of llnode #202

Closed
mmarchini opened this issue Jun 6, 2018 · 7 comments

@mmarchini
Contributor

Follow up from the Diagnostics Session in the Collaborators Summit:

llnode is a project under the Node.js org, and since it is a diagnostic tool it should be of interest to us as a Working Group. Right now the project is fairly stable but has some issues:

I'm bringing this up as a discussion related to #157. llnode is a tool worth being 1st tier supported based on its importance for Core and Native Module developers as well as enterprise, but since it is a lesser-known feature with a low bus factor, it's too risky today to have llnode as a potential blocker for releases.

Would love to discuss some ideas on how to improve the current scenario of the project.

@cjihrig

cjihrig commented Jun 6, 2018

> I couldn't find any content about llnode on nodejs.org, Node.js Medium or Node.js Youtube Channel

I wrote this a few months back, but it's outside of any official channels.

> llnode is a tool worth being 1st tier supported

I want to agree with you here, but it's extremely impractical without some type of commitment or collaboration with V8. (Also, what about ChakraCore?)

> we only have a handful of people working on the project

An unscientific Twitter poll I ran a while back said that most people thought postmortem debugging was too hard, didn't think it was worth it, or didn't know how. I can quote at least one TSC member as saying that postmortem debugging should just be a thing of the past. As promises are used more and more, without a great solution for them, I think fewer people will care. Without many users, there will be even fewer people willing to maintain a tool with such a steep learning curve (in some cases you need to reverse engineer what V8 is doing).

I'm not trying to be negative here - I'm a big fan of llnode, but those are the challenges that I see.

@mmarchini
Contributor Author

mmarchini commented Jun 6, 2018

@cjihrig I agree with everything you said.

> people thought postmortem debugging was too hard

I think we'll be able to improve llnode's usability once the JS API lands. For example, we could have a GUI or our own REPL.

> I can quote at least one TSC member as saying that postmortem debugging should just be a thing of the past

Have they suggested any alternatives? V8 Snapshots are too expensive to be used in production, and they don't provide all the information we can gather with core dumps (the complete call stack, inspecting JS contexts and scripts, etc.).

> As promises are used more and more, without a great solution for them, I think fewer people will care

Agreed, but to move forward on how to handle promises we need more people contributing (which is hard).

> I want to agree with you here, but it's extremely impractical without some type of commitment or collaboration with V8. (Also, what about ChakraCore?)

For some time now I've wanted to try using V8's API (or even N-API) to write something similar to llnode, which wouldn't require so many hacks and wouldn't rely on postmortem metadata to inspect a core dump, but I haven't had the time to work on it yet.

mmarchini reopened this Jun 6, 2018

@joyeecheung
Member

joyeecheung commented Jun 6, 2018

> I want to agree with you here, but it's extremely impractical without some type of commitment or collaboration with V8. (Also, what about ChakraCore?)

I have to agree with @cjihrig on this for now, if llnode being in tier 1 means breakage of llnode would block V8 updates in core.

> I think we'll be able to improve llnode's usability once the JS API lands. For example, we could have a GUI or our own REPL.

On one hand I am optimistic about this. On the other hand, based on a GUI prototype that I developed before (I cannot release it because I worked on it at my last job) and on observations from using llnode with a lot of in-production core dumps (frequent crashes caused by lldb itself, lldb failing on a lot of core dumps that gdb handles fine, etc.), I don't think the tool is ready, especially after seeing this post from a year ago on the lldb mailing list. For example, sometimes I have to use this hack in order to load core dumps with lldb, but it does not always work, because there are deeper issues: lldb just crashes when it encounters things that it does not support, or just doesn't try hard enough.

It's nice to have this tool usable when I need to debug C++ and JavaScript together, and it's kind of painful when I can't, but given the level of support in lldb itself I think that's where we will be for some time. The JS API does not solve the lldb issues, and a crash in the background process of a GUI tool is very confusing, even if I know it's not unfixable. You just get lazy when you think about the llvm development process (svn, mailing list, giant repo, etc.).

@mike-kaufman
Contributor

mike-kaufman commented Jun 7, 2018

I also agree w/ Colin here. Not sure what the solution is - any debugger extension working over a core dump is going to require detailed understanding of the host + VM internals, and those internals will change.

I also agree that the right approach here is to push all knowledge of VM internals down onto the VM, and land on a stable API that debugger extensions can take a dep on. Perhaps this API becomes part of n-api. Need to get the VMs to sign up here to support this. Not sure what V8 already has in place here. Chakra has a windbg extension that understands externals, but unless someone volunteers, I don't see that being ported to any other platforms.

Also, I agree w/ above that this is a scenario/tool w/ narrow appeal. For most JS devs, they only care about the JS stack + JS heap. The question is, is a JS stack + heap sufficient to debug a core-dump in a meaningful way? Or, is it the common case that by the time you have a core dump, the failure is pretty deep & the JS stack + heap is useless?

That said, in terms of design, it would be nice if there was a system like this:

--------------------------
|        UI Tool         |  ----> let people have a variety of tools/UX here.
--------------------------        UI tools can work cross-plat, cross-node-version, & cross-VM
            /\
            |   ------------> some crdp-like protocol
            |
--------------------------
| Node Core Dump Adapter |  ----> this is going to be platform-dependent,
--------------------------        VM-dependent & node-version-dependent.
            /\                    May leverage host-specific tools like lldb to read /
            |                     interact w/ dump.
            |
--------------------------
|       core dump        |
--------------------------
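
As a rough illustration of the layering in the diagram, here is a minimal TypeScript sketch. Every name in it (`DumpRequest`, `DumpResponse`, `CoreDumpAdapter`, `FakeAdapter`, `handleRequest`, and the two method names) is hypothetical and does not come from llnode, V8, or any existing protocol. The adapter returns canned data so the sketch runs without a real core dump; a real adapter would be the platform-, VM-, and Node-version-dependent piece.

```typescript
// Hypothetical sketch of the three layers; no name here is a real llnode or V8 API.

// A request that a UI tool would send over the CRDP-like protocol.
interface DumpRequest {
  id: number;
  method: string;
}

// The reply from the adapter: either a result or an error message.
interface DumpResponse {
  id: number;
  result?: unknown;
  error?: string;
}

interface StackFrame {
  functionName: string;
  kind: "js" | "native";
}

// The platform/VM/Node-version-dependent layer.
// A real implementation might drive a host debugger such as lldb.
interface CoreDumpAdapter {
  getStackTrace(): StackFrame[];
}

// Stand-in adapter with canned data (assumption: for illustration only).
class FakeAdapter implements CoreDumpAdapter {
  getStackTrace(): StackFrame[] {
    return [
      { functionName: "writeLog", kind: "js" },
      { functionName: "uv_run", kind: "native" },
    ];
  }
}

// Protocol dispatcher: the only surface a UI tool needs to know about.
function handleRequest(adapter: CoreDumpAdapter, req: DumpRequest): DumpResponse {
  switch (req.method) {
    case "getStackTrace":
      return { id: req.id, result: adapter.getStackTrace() };
    default:
      return { id: req.id, error: "unknown method: " + req.method };
  }
}

// Example: a UI tool asking for the stack of the dumped process.
console.log(JSON.stringify(handleRequest(new FakeAdapter(), { id: 1, method: "getStackTrace" })));
```

The point of the split is that a UI tool written against `handleRequest` would not need to change when the adapter behind it is swapped for a different platform, VM, or Node version.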

@joyeecheung
Member

joyeecheung commented Jun 7, 2018

> The question is, is a JS stack + heap sufficient to debug a core-dump in a meaningful way?

Having used llnode to debug many in-production core dumps, my answer is yes. If someone's app is stuck in an infinite loop (which is not that uncommon, I am afraid, especially in buggy loggers), it's possible to use gcore to trigger a core dump and use that to see the call stack. Sometimes it helps with OOM if the objects causing it are obvious from the object count/total size (you'll be able to trace the references with llnode), or if the app happens to be running the code that causes the OOM, such as code that tries to join a huge array, in which case you'll be able to see where that array is coming from. Sometimes it helps with segfaults if you need the JS stack for reproduction.

But the core dump is obviously not a silver bullet, and lldb itself does not work as well as gdb with a lot of core dumps.
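
To make the "object count/total size" triage concrete, here is a minimal TypeScript sketch of the kind of per-type summary that makes an OOM culprit stand out. The `HeapObject` record shape and the `summarizeByType` function are hypothetical; the sketch assumes some tool has already extracted a flat list of objects with a type name and a size from the dump, and it is not how llnode itself is implemented.

```typescript
// Hypothetical record for one heap object recovered from a dump.
interface HeapObject {
  typeName: string;
  size: number; // bytes
}

interface TypeStats {
  typeName: string;
  count: number;
  totalSize: number;
}

// Group objects by type name, then sort so the largest total size comes first.
function summarizeByType(objects: HeapObject[]): TypeStats[] {
  const stats = new Map<string, TypeStats>();
  for (const obj of objects) {
    let entry = stats.get(obj.typeName);
    if (entry === undefined) {
      entry = { typeName: obj.typeName, count: 0, totalSize: 0 };
      stats.set(obj.typeName, entry);
    }
    entry.count += 1;
    entry.totalSize += obj.size;
  }
  return Array.from(stats.values()).sort((a, b) => b.totalSize - a.totalSize);
}

// Example with made-up data: the type at the top of the list is the first suspect.
const summary = summarizeByType([
  { typeName: "Buffer", size: 100 },
  { typeName: "String", size: 10 },
  { typeName: "String", size: 20 },
  { typeName: "Buffer", size: 50 },
]);
console.log(summary);
```

Sorting by total size rather than by count is a design choice of this sketch: a few huge objects and many small ones can both exhaust memory, so in practice it helps to look at both columns.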

> Or, is it the common case that by the time you have a core dump, the failure is pretty deep & the JS stack + heap is useless?

It's very rare that the JS stack or heap is completely useless. It's just hard to know where to look if you are not familiar with the code base and there are a lot of node_modules functions on the stack and a lot of objects created by code that you are not familiar with. But I would say it's the same situation with other VM-aware tools like heap snapshots.

@github-actions

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

github-actions bot added the stale label Jul 17, 2020

@mmarchini
Contributor Author

This spun into other issues so I believe it can be closed. Feel free to reopen if more discussion is needed.


4 participants