
BeaconNode <--> ValidatorClient API - Protocol #1012

Closed
spble opened this issue Apr 30, 2019 · 28 comments

Comments

@spble
Contributor

spble commented Apr 30, 2019

ETH2.0 Beacon Node & Validator Client Protocol Discussion

Further background, and actual protocol, is described in issue #1011

It would be useful to choose a standard protocol for the BeaconNode and ValidatorClient API.

It was discussed during the Client Architecture session at the Sydney Implementers meeting that the main decision is between gRPC and JSON-RPC.
This discussion was a follow-on from the Client Architecture Roundtable in Prague.

gRPC

Advantages

  • Fast
  • Highly specified: the schema is a compiled, machine-checked specification
  • Has explicit types and support for binary data

Disadvantages

  • Non human-readable transport
  • Requires a specific framework to integrate with other software
  • Requires special software to interface with directly

JSON-RPC

Advantages

  • Human-readable transport
  • Simple enough to be implemented by any language
  • Well understood by many developers
  • Can easily be called with curl
  • Consistency with Eth1.0 clients

Disadvantages

  • Slower
  • Some ambiguity around data representations
  • Specification of the API is not as rigorous; it must be specified out-of-band

In conclusion, most people had a preference towards JSON-RPC, mainly due to its human readability and ease of implementation.
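As an illustration of the human-readable transport discussed above, here is a minimal sketch of a JSON-RPC 2.0 request envelope. The method name and params are hypothetical; the actual API surface is the subject of #1011.

```python
import json

def make_jsonrpc_request(method, params, request_id):
    """Build a JSON-RPC 2.0 request object (plain dict, serializable to JSON)."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}

# Hypothetical method name, for illustration only.
req = make_jsonrpc_request("beacon_getDuties", {"validator_index": 42}, 1)
print(json.dumps(req))
```

The entire request is plain JSON, which is why it can be typed by hand or sent with curl.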

@prestonvanloon
Contributor

For Prysm, we will be continuing to use protocol buffers for our beacon chain and validator implementation.
The discussion within the team is that the API enforcement from generated code and the performance gains outweigh the marginal benefit of using curl or other pre-installed tools rather than tools created for the ecosystem.

Client interop may be achieved through a gRPC proxy gateway, but the bidirectional streaming would not work so we may not support JSON-RPC unless there is a very compelling reason to do so.

@spble
Contributor Author

spble commented May 1, 2019

Thanks for the input @prestonvanloon - I definitely see the performance improvements with using gRPC, but I imagine the interface between the BeaconNode and ValidatorClient will be a fairly low-bandwidth interface. As such, doing a call in 10ms instead of 100ms would not bring any substantial benefit in my opinion.

We have also implemented protocol buffers in Lighthouse, but we are considering refactoring this if most other clients are in favour of JSON-RPC. Interoperability is our most compelling reason for this refactor.

While being able to use curl is helpful, I think the human-readable and widely understood nature of the protocol is the biggest benefit. JSON-RPC is very widely used and understood by web developers, whereas gRPC is generally a lot more niche.

@spble
Contributor Author

spble commented May 1, 2019

Also, a quick google around reveals: https://github.com/plutov/benchmark-grpc-protobuf-vs-http-json
It turns out that speeds and resource usages are fairly comparable... within one order of magnitude anyway.

@prestonvanloon
Contributor

prestonvanloon commented May 1, 2019

@spble Interesting link!

They are almost the same; HTTP+JSON is a bit faster and has fewer allocs/op.

This is quite surprising actually. 😄

Going forward, we would still advocate for protobuf usage, even if solely for its generative schema approach. If the general consensus is to support only JSON-RPC, then we would likely provide some wrapper or use jsonpb while we continue to have success with protobuf. We're even using protobuf in the browser with a Typescript application! And with tools like prototool, we maintain productivity for the rare need for ad-hoc queries.

In short, we support interop even if we are the minority.

@pipermerriam
Member

Lacking a compelling reason in gRPC's performance gains (which, based on the comments in this thread, don't seem to be present), JSON-RPC is my preference.

Potentially compelling reason for JSON-RPC: It is already well supported across the existing web3 tooling which makes integration with existing web3 client libraries much simpler.

@karalabe
Member

karalabe commented May 6, 2019

Hey all,

Just wanted to do a small braindump. Full disclosure, I'm not familiar with the ETH 2.0 spec at all, nor with the communication requirements between beacon chain nodes and validators. That said, I can speak from ETH 1.0 experience + general API experience.

Generally, the dumber and more boring a protocol is, the simpler it is to interface. At the end of the day, the goal of Ethereum is to bring developers together, so we should always prefer simplicity over other advantages.

There have been two proposals made here: gRPC and JSON-RPC. I honestly don't see any advantage in gRPC if we're building an open infrastructure. "Nobody" will want to (or be able to) roll their own gRPC implementation, so you are immediately limited by what you can implement on top of Ethereum purely, because you can't talk to it. This alone should be enough to rule out gRPC (this is why you don't see protobuf, cap'n proto and others on APIs). These frameworks are very nice for internal calls in proprietary systems, but not in public APIs that you want to maximize interoperability with.

That said, JSON-RPC is also a horrible choice. It's better than gRPC in that you can at least interface it easily, but the issue is that it is a stateful protocol, which makes it a non-composable protocol. Ethereum 1.0 made the huge mistake of permitting RPC calls that span requests (e.g. req1: create a filter; req2: query logs until block N; req3: query until block M, etc). This is a huge problem in internet infrastructure as it completely breaks horizontal scaling. All the requests must go to the same backend, because they are stateful. The backend cannot be restarted, the backend cannot be scaled, cannot be load balanced, cannot be updated, etc. JSON RPC works ok for communicating 1-1 with a dedicated node, but you cannot build public infrastructure out of it.

My proposal is to seriously consider RESTful HTTP for all public APIs that Ethereum nodes need to serve. If you are unfamiliar with it, REST is simply a "schema" that defines how you should query data ("GET /path/to/resource/"), how you should upload data ("POST /path/to/resource") and how different errors should be conveyed ("404 not found"). It is a tiny specialization of the HTTP protocol, but the enormous power is that:

  • You can query it from any tool: curl, browser, literally any programming language. The data you get back is JSON that you need to interpret of course, but the "input" is mostly just URL parameters that you can even type in manually to test something.
  • Everything speaks HTTP. You can access it through a proxy, you can put it behind Tor, you can shove it on top of AppEngine, you can put an nginx load balancer in front, you can put a memcache in front, you can put Cloudflare in front, you can have AWS or GCE auto scale it for you. You can host one backend to serve it, or 100. You can serve it from multiple geographic locations (i.e. many data centers). You can have rotating DNS in front for failovers. You can have encryption and server authentication via TLS, you can add client authentication via client certs, or JWT tokens, or OAuth. You have well defined throttling mechanisms via tokenbuckets. You can even host your service through a Mashape/RapidAPI marketplace and make money for yourself.

You see, RESTful HTTP APIs are the building blocks of the modern internet. Everything on the internet is built to produce and consume it. If we go down the JSON RPC path, we remain yet another niche. Sure, some will support it, but the big guys will always be deterred. If we embrace proper architectures, Ethereum will be trivial to integrate into existing systems, giving it a huge boost in developer appeal.

My 2c.
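The statefulness contrast above can be sketched as follows. The Eth1-style filter calls only make sense against the same backend, because the filter id lives in that server's memory; the REST-style URL carries its full context, so any replica behind a load balancer (or a cache) can answer it. The REST path and parameter names here are hypothetical, for illustration only.

```python
from urllib.parse import urlencode

# Stateful JSON-RPC sequence (Eth1 filters): request 2 is meaningless unless
# it reaches the exact server that created the filter in request 1.
stateful_sequence = [
    {"jsonrpc": "2.0", "method": "eth_newFilter", "params": [{"fromBlock": "0x1"}], "id": 1},
    {"jsonrpc": "2.0", "method": "eth_getFilterChanges", "params": ["<filter-id>"], "id": 2},
]

def logs_url(base, from_block, to_block):
    """Build a self-contained, cacheable REST query URL (hypothetical path)."""
    return f"{base}/logs?" + urlencode({"from_block": from_block, "to_block": to_block})

print(logs_url("https://node.example/api", 1, 100))
```

Because every REST request is self-describing, standard HTTP infrastructure (proxies, caches, load balancers) can handle it without understanding Ethereum at all.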

@karalabe
Member

karalabe commented May 6, 2019

Oh, just as a memo, the fact that the default reply format is JSON, is just a detail. Since the reply is just an HTTP response, you are free to send JSON, or any other format. Way back XML was also popular (e.g. "GET /path/to/res.json" vs. "GET /path/to/res.xml"), but there's nothing stopping us from also supporting a binary format (e.g. "GET /path/to/res.rlp" or "GET /path/to/res.ssz"). REST still works, it doesn't care, HTTP doesn't care, nothing cares. But we can immediately have both performance and simplicity: validators would use a binary format, and a web interface would use a json format.
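The format flexibility described above is ordinary HTTP content negotiation. Below is a toy negotiation step: the same resource served as JSON for humans or a binary encoding (e.g. SSZ) for validators. The media-type strings are illustrative assumptions, not registered types.

```python
# Map of media types this hypothetical server can produce.
SUPPORTED = {
    "application/json": "json",
    "application/ssz": "ssz",
    "application/xml": "xml",
}

def negotiate(accept_header, default="json"):
    """Return the first supported format named in an HTTP Accept header."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()  # drop q-values and whitespace
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
    return default

print(negotiate("application/ssz, application/json;q=0.5"))
```

A real server would also honour q-value ordering; the point is only that the encoding is a per-request detail, orthogonal to the REST design.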

@karalabe
Member

karalabe commented May 6, 2019

Btw, I'd gladly help spec out a REST version if you have any pointers to the requirements. I'm aware there might be limitations that might make REST unsuitable, but I'd rather redesign the limitations than go with a non-popular protocol.

@pipermerriam
Member

@karalabe I'm not sure I follow the argument for REST. I acknowledge and recognize the problems with the stateful ETH1.x APIs and am fully onboard with avoiding those mistakes in Eth2.0 APIs but I fail to see how REST solves that.

Note that I'm not arguing against REST, just trying to understand.

I do agree that REST is more expressive than JSON-RPC and that we could benefit from that. I will say that JSON-RPC's simplicity has been nice, exposing the API over a unix socket and bypassing the need for an HTTP server's complexity.

@ligi
Member

ligi commented May 6, 2019

I think the most compelling argument for gRPC/protobuf is that it leads to a well-defined API - currently with json-rpc this is a mess. As far as I can see, repeating this mess could be prevented by using gRPC/protobuf. So I would lean in this direction. I'm also having trouble understanding @karalabe 's argument against it:

I honestly don't see any advantage in gRPC if we're building an open infrastructure. "Nobody" will want to (or be able to) roll their own gRPC implementation, so you are immediately limited by what you can implement on top of Ethereum purely, because you can't talk to it.

why will nobody be able to roll their own gRPC implementation?

@karalabe
Member

karalabe commented May 6, 2019

REST mostly allows Ethereum to be a component in a modern web stack. For example, I can't run my own "Infura", because it's a PITA to interpret, load balance, and cache all those requests. It takes a team just to maintain an Ethereum gateway. But if the API was simple REST, anyone could compete with Infura. You could have Cloudflare compete with them. You could launch N k8s instances and have k8s auto load balance. The advantage is that you can combine your node with existing infrastructure in a natural and native way, without relying on custom bridges (e.g. How do I write a firewall to block personal_xyz JSON RPC calls, I dunno? How do I write a firewall to block /api/personal/xyz, well, that's easy, any web server/router/proxy can do it, or authenticate it, or throttle it).

I do agree that REST is more expressive than JSON-RPC and that we could benefit from that.

I'd actually say REST is less expressive, hence why it's more composable.

exposing the API over a unix socket and bypass the need for an HTTP server's complexity.

We can still expose REST through a unix socket. The socket is just a transport, TCP vs. IPC. Whether that transport speaks REST or JSON-RPC is not relevant from the transport's perspective.

@karalabe
Member

karalabe commented May 6, 2019

@ligi gRPC is a framework. You need libraries to speak to it, e.g. there's no Erlang lib. You'd immediately shut people like blockscout, who develop in Elixir, off the network.

@ligi
Member

ligi commented May 6, 2019

@karalabe
Member

karalabe commented May 6, 2019

0.4-alpha, build failed on CI :) Yes, you can hack it, but that doesn't mean there's reliable code.

@ligi
Member

ligi commented May 6, 2019

OK - good point ;-)
Still really compelled by the advantage of having a strong protocol spec, though - but you are right, it comes with some collateral damage...

@karalabe
Member

karalabe commented May 6, 2019

Completely agree :) https://swagger.io/specification/

@pipermerriam
Member

@karalabe 👍 makes sense now. I would be fine with REST or JSON-RPC.

Restating my 👎 on gRPC due to it having real tooling downsides, and all of its stated upsides being things we can address with tools like swagger for well-defined REST specifications, or just good due diligence if it's JSON-RPC.

My comment about expressiveness was intended towards the expressiveness of HTTP method semantics in REST (GET/POST/PUT/DELETE) and response status codes.

@holiman

holiman commented May 6, 2019

My two cents (which mainly is the same as @karalabe brought up).

Cent one

  • JSON-RPC is a protocol for a dialogue between two peers that send messages to one another. It's good for that. It means that each message is its own unique snowflake, and each message deserves its own unique response. That means it is
    • Intrinsically difficult to cache,
    • A message-based processing pipeline, which is quite resource-intensive
  • A REST API is a client/server protocol, where resources are served to a multitude of clients. Like Peter pointed out, it can be trivially scaled/cached/balanced.

That may be somewhat generalizing, but I think it's a fairly accurate description. So, also without having deep insight into 2.0, I think you should consider whether what we're building up to is going to be a dialogue or a client/server scenario.

Cent two

  • Writing a schema for JSON-RPC is, imho, very difficult. The ecosystem for json-rpc schemas is not mature. I have, as well as @cdetrio, tried to make formal definitions of the expected request and response schemas in use, in order to create validation tests. There are tickets here and there regarding this (I can't find them right now), but suffice to say it's been a PITA to validate/maintain/specify client behaviour without any rigorous schemas on expectations.
  • Writing a schema for a RESTful service is very mature, with things like swagger and similar tools which generate end-user docs, examples and validation.
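A minimal, hand-rolled sketch of the kind of response-shape validation described above; real schema tooling (JSON Schema, swagger) generalizes this. The expected shape here is a plain JSON-RPC 2.0 response, per the JSON-RPC 2.0 spec.

```python
def validate_jsonrpc_response(resp):
    """Return a list of shape violations for a JSON-RPC 2.0 response (empty means valid)."""
    errors = []
    if resp.get("jsonrpc") != "2.0":
        errors.append("missing or wrong 'jsonrpc' version")
    if "id" not in resp:
        errors.append("missing 'id'")
    # A response must carry exactly one of 'result' or 'error'.
    if ("result" in resp) == ("error" in resp):
        errors.append("exactly one of 'result' or 'error' must be present")
    return errors

print(validate_jsonrpc_response({"jsonrpc": "2.0", "id": 1, "result": "0x1"}))
```

Even this trivial check goes beyond what the Eth1 JSON-RPC ecosystem has enforced consistently, which is the point being made.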

@FrankSzendzielarz
Member

FrankSzendzielarz commented May 6, 2019

My Various Cents

  • Swagger metadata can be used to deploy Swagger UI pages for remote testing and integration
  • Swagger metadata can be used in Swagger codegen to auto-create stub clients and stub servers
  • The metadata defines message contents clearly
  • HTTP Media Formatters allow any encoding (flexibility) so the particular client can negotiate the encoding (RLP, XML, JSON etc...) with HTTP Accept / Content type headers as per usual in web dev, and typically web API servers will handle all that
  • Error messages are standardized. This also helps with capacity management. The usual 503 or 429 HTTP messages can be sent - and this could happen either at a networking (Infura) level and/or in the server implementation itself. (Some notes on capacity management https://ethereum-magicians.org/t/a-cross-protocol-cross-implementation-standard-for-server-capacity-management-and-flow-control/3123)
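To make the Swagger/OpenAPI points concrete, here is a skeletal OpenAPI-style document expressed as a plain dict. The endpoint path, summary, and schema are hypothetical illustrations, not the actual spec.

```python
import json

# Hypothetical OpenAPI 3.0 fragment describing one beacon-node endpoint.
openapi_doc = {
    "openapi": "3.0.0",
    "info": {"title": "Beacon Node API (sketch)", "version": "0.0.1"},
    "paths": {
        "/node/version": {
            "get": {
                "summary": "Get the node's version string",
                "responses": {
                    "200": {
                        "description": "Version string",
                        "content": {"application/json": {"schema": {"type": "string"}}},
                    },
                    "503": {"description": "Node is syncing or unavailable"},
                },
            }
        }
    },
}

print(json.dumps(openapi_doc["info"]))
```

This same metadata is what Swagger UI renders for remote testing and what Swagger codegen consumes to emit stub clients and servers.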

@FrankSzendzielarz
Member

FrankSzendzielarz commented May 6, 2019

Here is a rough, part-implemented (missing other objects deeper in the object graph under BeaconBlock) example of a Beacon node HTTP REST-like architecture and API

https://beaconapi20190506111547.azurewebsites.net/

Because the Swagger metadata at that URL is downloadable, this could help serve as a spec.

I can keep extending and modifying this so that it actually does validation etc., if people want. Maybe it could evolve into a test harness or an implementation. Let me know please.

You can add protobuf media formatters and RLP formatters as well as the default JSON and XML ones you see there. You can also try to auto-generate clients in the language of your choice here with the gen/clients POST method. E.g. this was auto-gen'd for golang; note the docs folder.
go-client-generated.zip
Rust, just to be fair:
rust-client-generated.zip

@spble
Contributor Author

spble commented May 7, 2019

Thanks very much for the input @karalabe - I definitely agree with your points regarding REST. I think HTTP-REST is what I had in my mind, I was just following Eth1.0 with JSON-RPC.

My vote is definitely for an HTTP REST interface, which returns JSON by default.

Thank you for the part implementation @FrankSzendzielarz - I will integrate your suggestions into my next API proposal and post it on #1011

@paulhauner
Contributor

My proposal is to seriously consider RESTful HTTP for all public APIs that Ethereum nodes need to serve.

I support this.

@arnetheduck
Contributor

My proposal is to seriously consider RESTful HTTP for all public APIs that Ethereum nodes need to serve

likewise, support this, for the advantages of working better with "standard" infrastructure. also good to work on specifying it unambiguously - the current status quo is indeed a bit of a mess to figure out, and swagger seems as good as any.

nothing prevents clients from using another, more performant or specialized protocol in their internal communication (for example when a beacon node and validator from the same client suite talk to each other), when the goal is not interop.

@gcolvin

gcolvin commented May 11, 2019

@karalabe Has been right about this for going on two decades now.
Roy Thomas Fielding, Architectural Styles and the Design of Network-based Software Architectures
CHAPTER 5: Representational State Transfer (REST)

@spble
Contributor Author

spble commented May 13, 2019

So a REST API seems to be the consensus.

I have proposed an OpenAPI spec in PR #1069, which can also be viewed on SwaggerHub

Closing this issue in favour of the PR.

@spble spble closed this as completed May 13, 2019
@BelfordZ

BelfordZ commented May 23, 2019

I propose we use OpenRPC + JSON-RPC

@zcstarr

zcstarr commented Jun 27, 2019

I don't understand; the arguments made above seem anti-useful. It seems that this change would be locking you into a transport that has high levels of inefficiency. With JSON-RPC you have a choice about how the data gets there. If this is meant to be low-level infrastructure, being as agnostic as possible with the transport seems the most beneficial.

When it comes to tooling, JSON-Schema specifications are your friend; they always have been. Additionally, has anyone used swagger, and swagger tooling, in production? Just because swagger exists doesn't actually solve your problems of testability and documentation discoverability. OpenAPI may support code generation, but there's the possibility of doing the same for JSON-RPC.

The issues are manifold.

  1. A switch like this breaks all the tooling people have made around JSON-RPC
  2. With REST there's no ability to batch requests; there are of course workarounds, but it's not particularly useful.
  3. You've just locked your infrastructure into HTTP, with zero support for potentially faster transports
  4. Using swagger tooling isn't great; the ecosystem for swagger isn't as well maintained as you'd think
  5. Why is the juice worth the squeeze: weaker performance, transport lock-in, and breaking any ecosystem/tooling that expects to communicate over JSON-RPC?

If someone could lay out the benefits outside of swagger docs, that would be amazing. Generative resources are a good thing, but there's other tooling to generate JSON-RPC clients/servers as well.

This is a really important issue, can't believe I missed this coming down the pipe.
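For reference on the batching point (2) above: JSON-RPC 2.0 batching is simply a JSON array of request objects, answered by an array of responses. The method names below are hypothetical.

```python
import json

# A JSON-RPC 2.0 batch: one HTTP body, several independent calls.
batch = [
    {"jsonrpc": "2.0", "method": "beacon_head", "id": 1},
    {"jsonrpc": "2.0", "method": "beacon_genesis_time", "id": 2},
]
wire_body = json.dumps(batch)

# The server matches responses to requests by id, so the reply array
# may arrive in any order.
decoded = json.loads(wire_body)
print(len(decoded))
```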

@holiman

holiman commented Jun 28, 2019

@zcstarr I can only reiterate, really. json-rpc is awesome for two peers having a dialogue. In the recent development of clef, the externalized signer from geth, we use bi-directional json-rpc between clef (daemon) and the ui. Both parties can send messages to the other, and get responses - have asynchronous dialogues.

However, REST assumes that the communication is a client requesting resources from a server -- they are not peers.

The two models are inherently different:

  • The latter is inherently cache-friendly,
  • The latter does not make assumptions about what particular server answers a particular request (whereas json-rpc has an individual sequential id for each message, thus assuming a certain statefulness in the request/response) -- so the latter is easily load-balanced.

I do believe that swagger is more mature than json-schemas, but regardless, I don't see that being the primary driver personally. Fwiw, there exists no json-schema for the eth 1.0 json-rpc, despite historical attempts to address this. It's been a source of bugs over many years.

As for locking into a transport, that's partially true. However, it also solves many other problems:

  • In the transport layer, one might want authorization. HTTP solves that (basic, digest, client-cert, other)
  • One might want sessions (long-lived authorization). HTTP solves that with cookies, header-based solutions like jwt, etc.
  • One might want per-resource cache-directives (http has it)
  • One might want content types, compression (and negotiation about compression)
  • One might want to utilize existing CA infrastructure and encryption,

Regarding the points raised

  1. Breaks tooling
  • There's quite a lot of tooling around HTTP already; lots of tools that we don't even have to build, because they already exist across a variety of platforms
  2. No ability to batch requests
  • HTTP does have batching, in the form of HTTP pipelining. More parallelizability is coming with HTTP/3 (see next point)
  3. Not true; HTTP/3 is in the works, based on QUIC.
  4. Maybe, can't say
  5. See reasons above
