Skip to content

multi: add a Status server #541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Sep 25, 2023
Merged

Conversation

ellemouton
Copy link
Member

@ellemouton ellemouton commented May 3, 2023

This PR introduces a Status server to Lit. This server can be queried at any time to determine the
status of any of the following servers: Lit, LND, Faraday, Loop, Pool. With this PR, Lit will also not shutdown
unless stopped purposefully. The reason being so that if any parts of Litd fail to start, the status server will
still be available for a front end to query to see what is up.

Let's say for example, that Lit is started with Loop disabled (loop-mode=disabled) & then some error causes Pool
not to start up properly, then you can query the status server to see the following:

$ litcli status 

{
        "sub_servers": {
                "faraday": {
                        "disabled": false,
                        "running": true,
                        "error": ""
                },
                "lit": {
                        "disabled": false,
                        "running": true,
                        "error": ""
                },
                "lnd": {
                        "disabled": false,
                        "running": true,
                        "error": ""
                },
                "loop": {
                        "disabled": true,
                        "running": false,
                        "error": ""
                },
                "pool": {
                        "disabled": false,
                        "running": false,
                        "error": "pool not working"
                },
                "taproot-assets": {
                        "disabled": false,
                        "running": true,
                        "error": ""
                }
        }
}


Depends on: #616

Replaces #442

Fixes #421
Fixes #420
Fixes #290

@ellemouton ellemouton force-pushed the statusServer2 branch 4 times, most recently from 7e6f1c1 to 7afe6a6 Compare May 4, 2023 09:45
@ellemouton ellemouton force-pushed the statusServer2 branch 3 times, most recently from 2c9f90b to 5cf5976 Compare June 20, 2023 07:18
@ellemouton ellemouton force-pushed the statusServer2 branch 3 times, most recently from e6e7450 to e152308 Compare July 18, 2023 15:55
Copy link
Contributor

@itsrachelfish itsrachelfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get this integrated into terminal but while testing I noticed that when a subserver is disabled it completely disappears from the status list. We might want to change this because it means we won't be able to use the status server for feature detection.

For example when taproot assets are added to litd, it could be useful to query the status server to see if tapd is available. Currently there is no way to differentiate between "litd hasn't been upgraded, so tapd does not exist" and "litd has been upgraded, and tapd exists, but it is disabled"

@ellemouton ellemouton force-pushed the statusServer2 branch 3 times, most recently from b169be3 to 650cb0a Compare July 19, 2023 12:28
@ellemouton
Copy link
Member Author

thanks for the suggestion @itsrachelfish. Updated accordingly

Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR, great work!
I did a first pass to load all the context. I think the main thing we might want to change is merging of the two subserver managers (see inline comments).
I'll do a round of manual testing in the next round.

Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR 🚀 🔥!!! My only comments are for minor things, so really great work! I've tested the functionality manually as well, and will also conduct thorough manual testing next review round 🚀!

Copy link
Member Author

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @guggero & @ViktorTigerstrom 🚀 updated!

@ellemouton ellemouton force-pushed the statusServer2 branch 3 times, most recently from 6ae8a00 to 0053ffa Compare July 31, 2023 10:31
@ViktorTigerstrom
Copy link
Contributor

(2). Even more strangely, for some reason which I can't figure out why when looking a

Ok so this is cause currently for remote sub-servers, we dont block on the connection being successful. Instead we give it:

grpc.WithConnectParams(grpc.ConnectParams{
	Backoff:           backoff.DefaultConfig,
	MinConnectTimeout: defaultConnectTimeout,
})

and return the unconnected connection so that LiT continues to maintain the connection if the remote server gets restarted or something. I dont think we want to change this behaviour right now. And I think in most cases, the pool/faraday/loop etc servers are run in integrated mode in anycase.

I think in a follow up PR, we can add a heartbeat mechanism to the status server to give more live updates regarding sub-server statuses.

Ok makes sense, thanks for checking! Sound like a good plan

Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work. tACK, LGTM 🎉

Just like for the Taproot Assets subserver, we add the option to disable
the Loop, Pool and Faraday subservers.
Update the `testDisablingSubServers` test to also test the suite when
the loop, pool and faraday subservers are disabled.
If an error is returned from the errQueue channel, we want that error to
be returned so that LiT is shutdown with a non-0 exit code.
We eventually want a few Lit grpc servers to continue functioning even
if LND did not start up properly. This means that we need to register
these servers with LiT's grpc server instead of using LND's.
Because of this change, we also need to update the itest a bit so that
calls to the proxy server are never expected to succeed if the call is
made via the LND port.
These logs were initially added for debugging LNC issues but now they
just spam the itest logs so let's remove them for now.
Add the protos for a new Status server that can be used to query the
status of a number of LiT's subservers.
Initialise an instance of the status manager in LightningTerminal.
Register its gRPC and REST endpoints and also add its method to the set
of LiT whitelisted permissions.
In preparation for an upcoming commit where litcli commands for the
Status server are added, the `getClientConn` method is updated with a
`noMac` parameter. If true, them the connection will be made without a
macaroon.
Note that the client connection for these commands does not require a
macaroon.
Pass the status manager to the sub-server manager so that the sub-server
manager can inform the status manager of any sub-servers that have
started or failed to start.
Add a Handles method to the subserver Manager which takes a URI string
and returns true if one of the sub-servers it manages owns the given
URI.
@guggero
Copy link
Member

guggero commented Sep 22, 2023

I assume we'll want to wait for #620 before merging this?

@ellemouton
Copy link
Member Author

@guggero - yes :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

Show LND/LiT version mismatch error in Web UI Show dependency issues in the UX litd exits with code 0 when it stops due to non-0 lnd exit
6 participants