Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to move out GraphQL API from fuel-core to fuel-indexer #588

Open
6 of 11 tasks
xgreenx opened this issue Sep 1, 2022 · 4 comments
Open
6 of 11 tasks

Proposal to move out GraphQL API from fuel-core to fuel-indexer #588

xgreenx opened this issue Sep 1, 2022 · 4 comments
Labels
architecture Something related to the architecture or the architecture description itself. breaking A breaking api change documentation Improvements or additions to documentation graphql-api Affects API of the GraphQL

Comments

@xgreenx
Copy link
Collaborator

xgreenx commented Sep 1, 2022

Overview

The proposal was discussed with the fuel-indexer team and accepted. You can find the proposal's description at the end of this description. The beginning of its description tracks the progress of the implementation.


The progress of the implementation

Unresolved questions

  • Do we need to move debugger API into fuel-indexer?

Child tickets:


The initial proposal

Summary

The fuel-core implements CRUD functionality, interaction with the user, indexation, subscription on events(not implemented yet), and implementation of some specific queries. All this API functionality is not required for the blockchain.

This issue is a place where we can discuss which parts of the API functionality we want to move out and what not. Where do we want to implement moved-out functionality, how, and who?

It may affect other teams(indexer team, dev-ops team, documentation team), so they also may, and should participate in the final decision.

Motivation

Moving out API functionality brings benefits along several different axes:

  • Performance: The client team can focus on the architecture and optimization of the blockchain itself. The API code also will have its architecture for optimal code in a separate repository.

  • Code-complexity: The code doesn't mix blockchain functionality with API. It makes the code clear and entirely focuses on the main task - smart contract execution.

  • Parallelisation of work: Changes can be done in parallel by different teams with some synchronization steps regarding breaking changes(p2p protocol, entity layout).

Performance

Right now, many supported queries are done in a non-optimal way(O(N), O(N^2), O(N^2 * log(N))). It is because we don't have indexes for some fields or pairs of fields. It is a vulnerability for the mainnet because anyone with a few requests can shut down/block/overload the node(you can create 1000 dust UTXO and send several queries to get a spendable balance).

Almost all input queries require iteration over all owner's entities in the database with a load of all fields and filtering(by asset id or spent status) them in runtime.

Example of owned_coins_by_asset_id image

We can manually add indexes into RocksDB, but it:

  • Makes the development slower and forces us to create our database with indexes(and we still will not be able to make them so performance as indexes in the RDB).
  • This means introducing the index support on the Database layer, decreasing the node's performance.
  • Each exotic query can require more work and architecture changes to work optimally.

Using the database that supports the creation of indexes solves many problems, but it will be a pain for us to manage all schemas for different databases.

Code-complexity

Integration of API into other parts of the blockchain breaks the abstraction and requires creating workarounds or writing non-optimal code.

If the API functionality is part of the blockchain, it creates the following issues:

  • The architecture should worry about information(transactions/blocks/entities) propagation to the API service from other services.
  • Database management is more complex and requires more abstractions(because we need to connect RocksDB with another database).
  • The whole API module should be configurable.

If the API is a separate node/binary/daemon, it can have its logic on how to process each block and transaction. It can, optimally, update all tables/entities in the database and use proper indexes(and manage them) to support all required queries.
The entry point may be the typical fuel-p2p service. But except for the p2p service, it also will contain Rest API to work with users.

Another good point is that API is independent of the blockchain node, and it is not a problem to make changes in the DB schema to support a new query or data format.

Parallelisation of work

The proper and actual API is essential for a good development experience and requires the introduction of many endpoints. The client team is more focused on the blockchain changes first, so unable to deliver it in time. With a separate API node/indexer, we can have another team that is entirely focused on that.

It makes the development parallelizable from the beginning. Of course, those two teams should sync periodically about breaking changes(p2p protocol, entity layout), and the API team will depend on changes in the fuel-core. It forces the development of fuel-core to be more modular and act as a crate more than the package. But the dependency will be only on several services, so we can minimize the number of conflicts.

Possible solution

Fuel already has a particular product indexer, that indexes blocks, transactions, and events, and the team grows(human resources). The indexer uses SQL database under the hood and supports indexation. Also, in the future, it can be our full history node and support timeline queries.

The indexer auto-generate DB schemas based on the entities description from handlers. But it doesn't have schemas for common data used by the blockchain. Moving API to them also means starting indexation and storing blocks/transactions/data required for the queries. As a consequence, support of additional schemes in the database with manually specified indexes based on the list of supported queries(as an example coinsToSpend).

Possible problems

  • The fuel-core should be modular enough to easily integrate with the indexer. Right now, indexers use fuel-core to query blocks and receipts. Maybe it should receive all information from the p2p service in the future as any peer node in the network. Also, traits used in the fuel-vm(one of the indexer's dependencies) should be as generic as possible to cover the work with different databases.

  • We need to move most queries to the indexer, and maybe the current architecture is not ready. It requires additional work on the indexer side(maybe someone from the client team should do that).

  • The relationship between indexer and fuel-core will be tight because data layout will affect them too. Someone from the client team should be involved in the indexer's processes to notify them/us about upcoming/required changes.

  • It makes UX for developers harder because they need to install the fuel-core node together with the indexer. Maybe we need to think about one binary that will include the functionality of both products for a simple setup.

  • It may require updating the integration tests, maybe infrastructure/deployment/CI.

Unresolved questions

  1. Which GraphQL queries do we want to process in the fuel-core, and which do we want to move out?
  2. Do we need to support some GraphQL queries on the fuel-core? Or better make it a fully p2p node without any endpoint?
  3. How better to start the transition, and who will work on it?
  4. Who is responsible for coordinating future work with the indexer team(like breaching change notifications/discussions)? Do we need a coordinator?
  5. May it somehow affect future integration with the bridge and relayer?
  6. What is the approximate scope of the work? Can we finish it before the mainnet and update all repositories/documentation?
@xgreenx xgreenx changed the title Proposal to move out CRUD functionality from fuel-core [WIP] Proposal to move out CRUD functionality from fuel-core Sep 1, 2022
@xgreenx xgreenx added breaking A breaking api change architecture Something related to the architecture or the architecture description itself. documentation Improvements or additions to documentation labels Sep 2, 2022
@xgreenx xgreenx changed the title [WIP] Proposal to move out CRUD functionality from fuel-core Proposal to move out CRUD functionality from fuel-core Sep 2, 2022
@Voxelot
Copy link
Member

Voxelot commented Sep 3, 2022

I agree with the sentiment that a node only functioning as a pure blockchain validator / block producer shouldn't have to incur extra database hits to build and maintain secondary indexes only used by the SDK or API consumers. However, we also need to balance making our lives easier with a simpler client codebase vs shunting additional complexity onto users. This decision should be informed by our philosophy of:

Efficiency & developer experience is everything

Splitting fuel-core into multiple binaries (p2p only blockchain node vs separate API layer process) has a real cost on DX (developer experience) that is not trivial. It also isn't standard behavior. For example, most blockchain clients such as Ethereum provide at least some kind of basic RPC API built-in. A graphql first approach has been a key selling point of Fuel since the beginning, so we aren't going to use REST or jsonrpc.

While the idea of a headless client is interesting, I think that should be a cli flag rather than a separate binary IMO. Internally we can structure the codebase however we like for modularity, but the final product should be a single binary that uses runtime feature flags to enable/disable functionality such as API endpoints. This gives the ability to run a node in a more efficient manner when desired, without causing major breaking changes or degradation to the DX.

Relying on the indexer for more complex or dapp-specific queries makes sense, however, there should be a core set of APIs available on any fuel-node that allow for some bare minimum usage of the network with or without operating an additional indexer node or additional database. I don't think it makes sense to delegate something like transaction submission to an external process or indexer since it's so heavily related to the transaction pool. The other issue with not exposing RocksDB data via graphql at all, and entirely relying on an API node using some other kind of datastore, is that the client essentially becomes a black box. Most of our integration tests would become useless without incorporating several services.

It is a vulnerability for the mainnet because anyone with a few requests can shut down/block/overload the node

The blast radius for bad API queries is somewhat limited by our planned use of sentry nodes for mainnet. However, that's no excuse to leave ourselves open to intentional or unintentional DOS. Unless we can actually extract these APIs to the indexer, we need to address these performance issues within the node.

The question around which queries to keep in fuel-core node should be based on some criteria:

  • Is it essential for what we'd define as bare minimum client interaction for DX?
  • If not, does the query only use data we need to store in RocksDB anyways and without requiring any complex sorting/filtering.

coinsToSpend would definitely benefit from being a built-in piece of functionality for the indexer node. However, given how core this API is for doing any sort of basic SDK actions, it would be fairly painful on the DX as well.

I'd rather find a way to modularize DB activity related to servicing the APIs (i.e. index maintenance) without needing to do a complete re-architecture or harming our DX. This way, if there is complicated logic related to making API queries faster, it is self-contained and easy to disable at runtime.

@xgreenx
Copy link
Collaborator Author

xgreenx commented Sep 3, 2022

The developer doesn't need to install/interact precisely with fuel-core. We can have a separate package that starts the indexer and fuel-core together. Indexer has a GraphQL port exposed to the user. Under the hood, it forwards all transactions to fuel-core.

In this scenario, we have three entities:

  • fuel-p2p-node(it is renamed fuel-core) - the main blockchain node that communicates only via p2p. Most nodes in the network will be fuel-p2p-node and participate in the consensus and manage the network stability. They don't need to be aware of API stuff.
  • indexer - the binary with p2p and p2p-sync services to receive information from the network. It also has logic to process each block/transaction and index the information. It connects to external or to local fuel-p2p-node.
  • fuel-api-node - the binary that runs fuel-p2p-node and indexer together. It exposes the indexer's GraphQL port, and the user interacts with this port as previously. Under the hood, indexers forward all transactions to fuel-p2p-node, which works as a common p2p local network.

The user installs/works/interacts/plays with fuel-api-node. The network uses the same fuel-api-node to provide the information. The DX is the same in that case, deployment is more configurable. The problem is that most validators will be fuel-p2p-node, and only small parts of nodes will be fuel-api-node.

But if you are fuel-api-node, you are the entry point to the network, and you can decide which transactions to share and which to ignore=)

@Voxelot
Copy link
Member

Voxelot commented Sep 7, 2022

Connecting things under the hood between fuel-indexer and fuel-core is a good idea. I think the main architectural issue we have right now is that all the APIs for fuel-core are implemented inside of graphql resolvers.

Ideally, we move all the core impl logic inside the graphql resolvers into a Rust API directly accessible on FuelService. Then, suppose we wanted to make a fuel-api-node that embeds fuel-indexer and fuel-core into a single binary together. In that case, fuel-indexer could just accept a trait for all the fuel-core API's it needs (such as fetching blocks or receipts) which could either be backed by the fuel-gql-client (for remote access) or an actual FuelService instance (for embedded).

This also would allow us to explore supporting alternatives to graphql in the future, as the API handlers would just be shims over the Rust API.

As far as extracting graphql from fuel-core altogether into a separate fuel-api-node process, I think this is a big architectural pivot that would put our current roadmap at risk. If there are APIs we are concerned about supporting due to performance concerns, we should come up with ways to rate-limit or restructure them to be more efficient on the current fuel-core architecture for now. Once the indexer is closer to production ready, and we've delivered the bridging milestones, we can revisit these big ideas.

@xgreenx
Copy link
Collaborator Author

xgreenx commented Sep 8, 2022

Ideally, we move all the core impl logic inside the graphql resolvers into a Rust API directly accessible on FuelService. Then, suppose we wanted to make a fuel-api-node that embeds fuel-indexer and fuel-core into a single binary together. In that case, fuel-indexer could just accept a trait for all the fuel-core API's it needs (such as fetching blocks or receipts) which could either be backed by the fuel-gql-client (for remote access) or an actual FuelService instance (for embedded).

Yep, it is an excellent way to segregate GraphQL and code functionality=)

This also would allow us to explore supporting alternatives to graphql in the future, as the API handlers would just be shims over the Rust API.

Yep, we need to make GraphQL independent during segregation and act as a wrapper around the core functionality. In this case, it can be replaceable like the lego detail in the constructor=D

As far as extracting graphql from fuel-core altogether into a separate fuel-api-node process, I think this is a big architectural pivot that would put our current roadmap at risk.

Yea, it is what I don't like. But in the long perspective, it will simplify our development and support of new queries(and I hope it will save a lot of time).

Once the indexer is closer to production ready, and we've delivered the bridging milestones, we can revisit these big ideas.

I agree. For now, I think we can already create several tasks related to this big idea and start slowly doing them(based on free resources). For example:

  • Segregation of the GraphQL and core functionality. Also, seems it is actual for Support indexing blocks and transactions fuel-indexer#151
  • Also, each core functionality can be split by trait per module. For example, to provide the entire work with blocks, transactions, and receipts. In this case, the indexer can use only the required traits.
  • Make GraphQL service independent and support enabling/disabling(Maybe think about this feature for each service).

These tasks(and maybe you will see some more) can be helpful for the indexer team and us.

And as you said, when the indexer is almost production ready, we can think about the full transition to them and the creation of fuel-api-node.

If there are APIs we are concerned about supporting due to performance concerns, we should come up with ways to rate-limit or restructure them to be more efficient on the current fuel-core architecture for now.

We need to have count limitations for each query based on its performance. Maybe we also need to think about the framework to measure the gas of execution(like we want to have for VM operations)=) But again, we don't have time to do that, and maybe that idea can be useful for the indexer team.

The problem is that sometimes one request(as coinsToSpend) may consume a lot of time. So I see here several essential(for the stability of the node) tasks:

  • Each query that may receive or return a lot of entities should limit it(input and output result)
  • We need to think about data pruning. For example, the blockchain doesn't need to store information about spent Coins(maybe we have other entities that are dead). So we need to clean up the storage, and it will improve the performance of GraphQL queries. It may affect the merkle tree construction, so we need to be careful with this task.
  • While we support API to request data, we need rate limits. Maybe some queries require more strict rates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
architecture Something related to the architecture or the architecture description itself. breaking A breaking api change documentation Improvements or additions to documentation graphql-api Affects API of the GraphQL
Projects
None yet
Development

No branches or pull requests

4 participants