This repository has been archived by the owner on Nov 12, 2020. It is now read-only.

Design: Lightweight node with just a cache of the graph DB #169

Closed
ashwinphatak opened this issue Mar 9, 2020 · 24 comments

@ashwinphatak
Contributor

  • For use on devices like the Pi, which may lack the computing power or disk space needed to run a full blockchain node.
  • No blockchain history
@ashwinphatak
Contributor Author

devnet disk size after ~5 days of running:

ashwinp@xbox:~/.wireline/wnsd/data$ du -h
20K     ./evidence.db
98M     ./blockstore.db
24K     ./tx_index.db
414M    ./cs.wal
19M     ./application.db
14M     ./state.db
543M    .

@ashwinphatak
Contributor Author

Other files: https://github.com/tendermint/tendermint/blob/master/docs/tendermint-core/running-in-production.md#database

blockstore.db: Keeps the entire blockchain - stores blocks, block commits, and block meta data, each indexed by height. Used to sync new peers.

state.db: Stores the current blockchain state (i.e., height, validators, consensus params). Only grows if consensus params or validators change. Also used to temporarily store intermediate results during block processing.

@ashwinphatak
Contributor Author

Current open issues for syncing state without full-replay:

Sync current state without full replay for Applications => tendermint/tendermint#828

ADR 053: State Sync Prototype

Prune blockchain history => tendermint/tendermint#3652

Node should go back to fast-syncing when lagging significantly => tendermint/tendermint#129

@ashwinphatak
Contributor Author

ashwinphatak commented Mar 9, 2020

Based on the above, looks like syncing just app state, and doing it correctly, is still an unsolved problem. Also, to make it work we'll end up building something much like a full node.

Simpler approach: build a REST API on the full node that returns the entire state (Record[], NamingRecord[]) as JSON (or binary). The lightweight node caches it, refreshes it periodically, and imports it into something like our existing in-memory WNS graphdb (currently used only for tests, and currently lacking the naming API).

This is inspired by the wnsd export command, which exports the state to a new genesis JSON file.

For efficiency reasons, we could index record entries by block height. The 'refresh'/'sync' REST API call would then take a block height and return only the records changed after that height.
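
A minimal sketch of the height-based refresh idea, using in-memory stand-ins instead of a real REST call. The names Record, FullNode, RecordsAfter and Cache are hypothetical and are not part of WNS; the record shape is an assumption made only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical, simplified stand-in for a WNS record.
type Record struct {
	ID     string
	Height int64 // block height at which the record was last written
	Data   string
}

// FullNode is a hypothetical in-memory stand-in for the full node's state.
type FullNode struct {
	records []Record
}

// RecordsAfter returns the records written strictly after the given height,
// ordered by height. This is what the proposed refresh endpoint would serve.
func (n *FullNode) RecordsAfter(height int64) []Record {
	var out []Record
	for _, r := range n.records {
		if r.Height > height {
			out = append(out, r)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Height < out[j].Height })
	return out
}

// Cache is the lightweight node's local copy of the state.
type Cache struct {
	byID       map[string]Record
	lastHeight int64
}

// NewCache returns an empty cache that has not synced any block yet.
func NewCache() *Cache {
	return &Cache{byID: make(map[string]Record)}
}

// Refresh pulls only the records newer than the last synced height and
// returns how many records were applied.
func (c *Cache) Refresh(n *FullNode) int {
	updates := n.RecordsAfter(c.lastHeight)
	for _, r := range updates {
		c.byID[r.ID] = r
		if r.Height > c.lastHeight {
			c.lastHeight = r.Height
		}
	}
	return len(updates)
}

func main() {
	node := &FullNode{records: []Record{
		{ID: "a", Height: 1, Data: "v1"},
		{ID: "b", Height: 2, Data: "v1"},
	}}
	cache := NewCache()
	fmt.Println("initial sync:", cache.Refresh(node))

	// A later block rewrites record "a"; only that record is transferred.
	node.records = append(node.records, Record{ID: "a", Height: 5, Data: "v2"})
	fmt.Println("incremental sync:", cache.Refresh(node))
	fmt.Println(cache.byID["a"].Data, cache.lastHeight)
}
```

The sketch does not handle deletions or any verification of the data returned by the full node; both would need to be designed separately.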

@ashwinphatak
Contributor Author

ashwinphatak commented Mar 9, 2020

https://docs.tendermint.com/master/rpc/#/Websocket/subscribe also exists. However, filtering is on Event and the full Tx isn't delivered, so we'd have to call another API to get the Tx bytes and then decode them. That might be more work than the approach above.

@ashwinphatak
Contributor Author

https://docs.cosmos.network/master/interfaces/lite/ -> could the lightweight node expose the current GQL API, using a light client to query a peer RPC endpoint?

@ashwinphatak
Contributor Author

Looking at the cosmos-sdk code (~/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/client/context/query.go), it appears that only direct store/state queries support verification using Merkle proofs.

cosmos/cosmos-sdk#5317 also mentions that queriers don't support proofs (/Users/ashwinp/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/baseapp/baseapp.go)

cosmos/cosmos-sdk#5317 (comment)

cosmos/cosmos-sdk#3151 (comment)

Furthermore, if we do complex things like scanning and filtering, we can't prove to the client that we (the node) filtered correctly: we could return results that don't match the filter, or fail to return results that do.

Given the above, we're better off starting the cache with a trusted initial data set and then pulling new changes from a full-node with light client verification.

https://docs.tendermint.com/master/tendermint-core/light-client-protocol.html
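
The filtering concern can be illustrated with a small sketch. A client can re-run the filter over whatever the node returns, which catches non-matching results, but it has no way to notice results that were left out. The Bond type and the owner-prefix filter below are hypothetical simplifications, not the WNS types.

```go
package main

import (
	"fmt"
	"strings"
)

// Bond is a hypothetical, simplified record used only for this illustration.
type Bond struct {
	ID    string
	Owner string
}

// matches is the filter that the client and the node agree on.
func matches(b Bond, ownerPrefix string) bool {
	return strings.HasPrefix(b.Owner, ownerPrefix)
}

// checkResults is all a client can do on its own: confirm that every
// returned item satisfies the filter. It cannot tell whether items are missing.
func checkResults(results []Bond, ownerPrefix string) bool {
	for _, b := range results {
		if !matches(b, ownerPrefix) {
			return false
		}
	}
	return true
}

func main() {
	state := []Bond{
		{ID: "1", Owner: "alice"},
		{ID: "2", Owner: "alice"},
		{ID: "3", Owner: "bob"},
	}

	honest := []Bond{state[0], state[1]}   // both of alice's bonds
	dropped := []Bond{state[0]}            // one matching bond silently omitted
	polluted := []Bond{state[0], state[2]} // a non-matching bond added

	fmt.Println(checkResults(honest, "alice"))   // true
	fmt.Println(checkResults(dropped, "alice"))  // true: the omission goes unnoticed
	fmt.Println(checkResults(polluted, "alice")) // false
}
```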

@ashwinphatak
Contributor Author

https://pkg.go.dev/github.com/tendermint/tendermint/lite2?tab=doc

As per option 3, might be best to run a tendermint RPC proxy to a full-node, to perform all the proof checking.

https://docs.tendermint.com/master/tendermint-core/light-client-protocol.html#http-proxy

Need to init the RPC proxy with a trusted height and hash.

https://docs.tendermint.com/master/tendermint-core/light-client-protocol.html#where-to-obtain-trusted-height-hash

@ashwinphatak
Contributor Author

tendermint lite works for custom paths, but throws an error for the store:

ERROR: ABCIQuery: response error: RPC error -32603 - Internal error: unrecognized proof type iavl:v

This looks like the reason: https://github.com/tendermint/tendermint/blob/master/crypto/merkle/proof.go#L130

// DefaultProofRuntime only knows about Simple value
// proofs.
// To use e.g. IAVL proofs, register op-decoders as
// defined in the IAVL package.
func DefaultProofRuntime() (prt *ProofRuntime) {
	prt = NewProofRuntime()
	prt.RegisterOpDecoder(ProofOpSimpleValue, SimpleValueOpDecoder)
	return
}

@ashwinphatak
Contributor Author

$ tendermint lite --chain-id wireline
I[2020-03-11|16:37:03.288] Connecting to source HTTP client...          module=main
I[2020-03-11|16:37:03.288] Constructing Verifier...                     module=main
I[2020-03-11|16:37:03.328] Starting proxy...                            module=main
I[2020-03-11|16:37:03.330] Starting RPC HTTP server on 127.0.0.1:8888   module=main
I[2020-03-11|16:37:21.684] HTTPRestRPC                                  module=main method=/abci_query args="[<*rpctypes.Context Value> /store/bond/key <[]uint8 Value>]" returns="[<*core_types.ResultABCIQuery Value> <error Value>]"
I[2020-03-11|16:37:21.684] Served RPC HTTP response                     module=main method=GET url="/abci_query?path=%22/store/bond/key%22&data=0x20cd5d8f6dd7e4e2904836b341f0b5b6de89be8b6b9247a95f9b26944b8fba9a" status=200 duration=3039 remoteAddr=127.0.0.1:62118
^CI[2020-03-11|16:39:21.742] captured interrupt, exiting...               module=main

@ashwinphatak
Contributor Author

Could create a verifier ourselves using the approach in ~/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/client/context/context.go. Then, we don't need to run tendermint lite.

Note: Needs folder to store trust db.

@ashwinphatak
Contributor Author

Querying by key returns a merkle proof.

// Key layout: a 0x00 prefix byte followed by the bond ID.
bondKey := append([]byte{0x00}, []byte(id)...)

// Ask the node to include a Merkle proof in the response.
opts := rpcclient.ABCIQueryOptions{
	Prove: true,
}

rpc := rpcclient.NewHTTP("tcp://localhost:26657", "/websocket")
res, err := rpc.ABCIQueryWithOptions("/store/bond/key", bondKey, opts)
if err != nil {
	return err
}

// TODO(ashwin): Verify proof.

// Decode the raw store value into a Bond and print it as JSON.
var bond types.Bond
cdc.MustUnmarshalBinaryBare(res.Response.Value, &bond)

fmt.Println(string(cdc.MustMarshalJSON(bond)))
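
As a rough illustration of what the proof verification step involves, the sketch below checks an inclusion proof against a trusted root in a plain binary SHA-256 Merkle tree. This is a deliberately simplified, hypothetical scheme: it is not the IAVL proof format and does not use the tendermint or cosmos-sdk APIs. In the real flow the trusted root would presumably come from a header that the light client has already verified.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// ProofStep is one sibling hash on the path from a leaf to the root.
type ProofStep struct {
	Sibling []byte
	Left    bool // true if the sibling is the left child at this level
}

// hashLeaf hashes leaf data with a 0x00 domain-separation prefix.
func hashLeaf(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

// hashInner hashes two child hashes with a 0x01 domain-separation prefix.
func hashInner(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// VerifyInclusion recomputes the root from the leaf and the proof path and
// compares it with the trusted root.
func VerifyInclusion(leaf []byte, proof []ProofStep, trustedRoot []byte) bool {
	cur := hashLeaf(leaf)
	for _, step := range proof {
		if step.Left {
			cur = hashInner(step.Sibling, cur)
		} else {
			cur = hashInner(cur, step.Sibling)
		}
	}
	return bytes.Equal(cur, trustedRoot)
}

func main() {
	// Build a four-leaf tree by hand.
	l0 := hashLeaf([]byte("bond-a"))
	l1 := hashLeaf([]byte("bond-b"))
	l2 := hashLeaf([]byte("bond-c"))
	l3 := hashLeaf([]byte("bond-d"))
	n01 := hashInner(l0, l1)
	n23 := hashInner(l2, l3)
	root := hashInner(n01, n23)

	// Proof for "bond-c" (third leaf): l3 is its right sibling, then n01 is
	// the left sibling one level up.
	proof := []ProofStep{{Sibling: l3, Left: false}, {Sibling: n01, Left: true}}

	fmt.Println(VerifyInclusion([]byte("bond-c"), proof, root)) // true
	fmt.Println(VerifyInclusion([]byte("bond-x"), proof, root)) // false
}
```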

Listing a subspace doesn't yet return a Merkle proof in cosmos-sdk, so this will be a limitation for now.

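// All bond keys share the 0x00 prefix, so it doubles as the subspace.
bondSpace := []byte{0x00}

rpc := rpcclient.NewHTTP("tcp://localhost:26657", "/websocket")
res, err := rpc.ABCIQuery("/store/bond/subspace", bondSpace)
if err != nil {
	return err
}

// The response value is a length-prefixed list of key/value pairs (no proof).
var KVs []storeTypes.KVPair
cdc.MustUnmarshalBinaryLengthPrefixed(res.Response.Value, &KVs)

// Decode each value into a Bond and print it as JSON.
for _, kv := range KVs {
	var bond types.Bond
	cdc.MustUnmarshalBinaryBare(kv.Value, &bond)
	fmt.Println(string(cdc.MustMarshalJSON(bond)))
}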

~/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/store/iavl/store.go

// Query implements ABCI interface, allows queries
//
// by default we will return from (latest height -1),
// as we will have merkle proofs immediately (header height = data height + 1)
// If latest-1 is not present, use latest (which must be present)
// if you care to have the latest data to see a tx results, you must
// explicitly set the height you want to see
func (st *Store) Query(req abci.RequestQuery) (res abci.ResponseQuery) {
	if len(req.Data) == 0 {
		msg := "Query cannot be zero length"
		return serrors.ErrTxDecode(msg).QueryResult()
	}

	tree := st.tree

	// store the height we chose in the response, with 0 being changed to the
	// latest height
	res.Height = getHeight(tree, req)

	switch req.Path {
	case "/key": // get by key
		key := req.Data // data holds the key bytes

		res.Key = key
		if !st.VersionExists(res.Height) {
			res.Log = cmn.ErrorWrap(iavl.ErrVersionDoesNotExist, "").Error()
			break
		}

		if req.Prove {
			value, proof, err := tree.GetVersionedWithProof(key, res.Height)
			if err != nil {
				res.Log = err.Error()
				break
			}
			if proof == nil {
				// Proof == nil implies that the store is empty.
				if value != nil {
					panic("unexpected value for an empty proof")
				}
			}
			if value != nil {
				// value was found
				res.Value = value
				res.Proof = &merkle.Proof{Ops: []merkle.ProofOp{iavl.NewIAVLValueOp(key, proof).ProofOp()}}
			} else {
				// value wasn't found
				res.Value = nil
				res.Proof = &merkle.Proof{Ops: []merkle.ProofOp{iavl.NewIAVLAbsenceOp(key, proof).ProofOp()}}
			}
		} else {
			_, res.Value = tree.GetVersioned(key, res.Height)
		}

	case "/subspace":
		var KVs []types.KVPair

		subspace := req.Data
		res.Key = subspace

		iterator := types.KVStorePrefixIterator(st, subspace)
		for ; iterator.Valid(); iterator.Next() {
			KVs = append(KVs, types.KVPair{Key: iterator.Key(), Value: iterator.Value()})
		}

		iterator.Close()
		res.Value = cdc.MustMarshalBinaryLengthPrefixed(KVs)

	default:
		msg := fmt.Sprintf("Unexpected Query path: %v", req.Path)
		return serrors.ErrUnknownRequest(msg).QueryResult()
	}

	return
}

@ashwinphatak
Contributor Author

The underlying IAVL tree supports proofs for ranges, so range proofs might make their way into cosmos-sdk at some point.

https://github.com/tendermint/iavl/blob/1dc8bb7e1204900f086c5ab323ab0ab0c79a6c36/proof_range.go#L467

// GetRangeWithProof gets key/value pairs within the specified range and limit.
func (t *ImmutableTree) GetRangeWithProof(startKey []byte, endKey []byte, limit int) (keys, values [][]byte, proof *RangeProof, err error) {
	proof, keys, values, err = t.getRangeProof(startKey, endKey, limit)
	return
}

@ashwinphatak
Contributor Author

If we create an index from block num => [] changed graphdb keys, we should be able to solve two important problems.

  1. As the index itself will be in the store, we can get a merkle proof for the complete set of changes, making it impossible for anyone to fake/drop items from the list of bonds, for example. Kind of like a range proof for each block.

  2. Indexing by block number will allow the light nodes to perform incremental sync, block by block, until they catch up.
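
A minimal sketch of the block-number index and block-by-block catch-up, using in-memory maps. FullNode, LightNode, Commit and CatchUp are hypothetical names; storing the index in the real store and fetching each entry with a Merkle proof is not modelled here.

```go
package main

import (
	"fmt"
	"sort"
)

// FullNode is a hypothetical in-memory stand-in for a full node's store.
type FullNode struct {
	store  map[string]string  // graph DB key -> current value
	index  map[int64][]string // block height -> keys changed in that block
	height int64
}

// NewFullNode returns an empty node at height 0.
func NewFullNode() *FullNode {
	return &FullNode{store: map[string]string{}, index: map[int64][]string{}}
}

// Commit applies one block of changes and records the changed keys
// under the new height.
func (n *FullNode) Commit(changes map[string]string) {
	n.height++
	keys := make([]string, 0, len(changes))
	for k, v := range changes {
		n.store[k] = v
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the index entry
	n.index[n.height] = keys
}

// LightNode holds only the cached state and the last height it has applied.
type LightNode struct {
	cache  map[string]string
	height int64
}

// CatchUp walks the index block by block from the last applied height and
// returns the number of key updates applied. It reads each key's current
// value; a real client would query at the block's height and check a proof.
func (l *LightNode) CatchUp(n *FullNode) int {
	applied := 0
	for h := l.height + 1; h <= n.height; h++ {
		for _, key := range n.index[h] {
			l.cache[key] = n.store[key]
			applied++
		}
		l.height = h
	}
	return applied
}

func main() {
	node := NewFullNode()
	node.Commit(map[string]string{"bond/1": "balance=10"})
	node.Commit(map[string]string{"bond/1": "balance=7", "record/a": "v1"})

	light := &LightNode{cache: map[string]string{}}
	fmt.Println(light.CatchUp(node), light.height)

	node.Commit(map[string]string{"record/a": "v2"})
	fmt.Println(light.CatchUp(node), light.height, light.cache["record/a"])
}
```

Because the light node only needs the index entries after its own height, a node that has been offline can resume from where it stopped instead of re-importing the full state.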

@ashwinphatak
Contributor Author

wnsd export with --height doesn't seem to work. Neither does the halt-height argument to wnsd start. Needs investigation.

@ashwinphatak
Contributor Author

abci_query with height < current height also seems to return no results. Pruning strategy also seems to be the default, which is syncable.

@ashwinphatak
Contributor Author

Turns out the sdk-tutorials code on which wns is based doesn't actually set the pruning flag on the base app. Easy to fix.

@ashwinphatak
Contributor Author

wnsd export --height 5
panic: app.baseKey expected to be nil; duplicate init?

goroutine 1 [running]:
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).initFromMainStore(0xc000160ea0, 0xc0000df800, 0x0, 0x0)
	/Users/ashwinp/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/baseapp/baseapp.go:236 +0x2a0
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).LoadVersion(0xc000160ea0, 0x5, 0xc0000df800, 0x4, 0xc0009ef678)
	/Users/ashwinp/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/baseapp/baseapp.go:214 +0x7d
github.com/wirelineio/wns.(*nameServiceApp).LoadHeight(...)
	/Users/ashwinp/projects/wireline/wns/app.go:347
main.exportAppStateAndTMValidators(0x22b2fe0, 0xc000c5a7a0, 0x22c63e0, 0xc0000c00c0, 0x0, 0x0, 0x5, 0xc000c5a700, 0x2b99ab0, 0x0, ...)
	/Users/ashwinp/projects/wireline/wns/cmd/wnsd/main.go:91 +0xee
github.com/cosmos/cosmos-sdk/server.ExportCmd.func1(0xc0009b9b80, 0xc000112bc0, 0x0, 0x2, 0x0, 0x0)
	/Users/ashwinp/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/server/export.go:65 +0x344
github.com/spf13/cobra.(*Command).execute(0xc0009b9b80, 0xc000112b80, 0x2, 0x2, 0xc0009b9b80, 0xc000112b80)
	/Users/ashwinp/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:826 +0x460
github.com/spf13/cobra.(*Command).ExecuteC(0xc000133400, 0x2, 0xc000112b00, 0x1ee449d)
	/Users/ashwinp/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
	/Users/ashwinp/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/tendermint/tendermint/libs/cli.Executor.Execute(0xc000133400, 0x20bf4a8, 0x2, 0xc0009c7080)
	/Users/ashwinp/go/pkg/mod/github.com/tendermint/tendermint@v0.32.2/libs/cli/setup.go:89 +0x3c
main.main()
	/Users/ashwinp/projects/wireline/wns/cmd/wnsd/main.go:75 +0x762

@ashwinphatak
Contributor Author

Seems to be an existing issue: cosmos/cosmos-sdk#3147

Cherry pick fix from here: cosmos/cosmos-sdk#2791

@ashwinphatak ashwinphatak changed the title Lightweight node with just a cache of the graph DB Design: Lightweight node with just a cache of the graph DB Mar 12, 2020
@AFDudley
Contributor

AFDudley commented Mar 12, 2020

Looking at the cosmos-sdk code (~/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/client/context/query.go), it appears that only direct store/state queries support verification using Merkle proofs.

cosmos/cosmos-sdk#5317 also mentions that queriers don't support proofs (/Users/ashwinp/go/pkg/mod/github.com/cosmos/cosmos-sdk@v0.37.0/baseapp/baseapp.go)

cosmos/cosmos-sdk#5317 (comment)

cosmos/cosmos-sdk#3151 (comment)

Furthermore, if we do complex things like scanning and filtering, we can't prove to the client that we (the node) filtered correctly: we could return results that don't match the filter, or fail to return results that do.

It depends on how the filter and the app are constructed. This, while technically true, makes it sound worse than it is.

Given the above, we're better off starting the cache with a trusted initial data set and then pulling new changes from a full-node with light client verification.

https://docs.tendermint.com/master/tendermint-core/light-client-protocol.html

Depending on exactly what's happening we can augment this with checkpoints.

EDIT: you figured out the checkpointing thing 😁

@ashwinphatak ashwinphatak removed their assignment Apr 9, 2021