Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R4R]txindex/kv: Fsync data to disk immediately after receiving it #4104

Merged
merged 4 commits into from
Nov 4, 2019

Conversation

unclezoro
Copy link
Contributor

@unclezoro unclezoro commented Nov 2, 2019

Why this pr:
When restarting chain node, sometimes we lost tx index about recent(around 80)blocks, and some client complains that they can't find the tx by RPC call(tx_search) when the tx do exist in the block.

I try to partially fix this issue in a simple way by writing the index data in a sync way.
There is no performance difference under 1K TPS according to our test.

It is still possible that lost index data after restarting the node, but only 2 block data will lost at most.
I try to totally fix this in https://github.com/tendermint/tendermint/pull/3847/files, but this one is simple and can solve most part of the issue. Please review first, thks.

@unclezoro

This comment has been minimized.

@@ -30,3 +30,4 @@ program](https://hackerone.com/tendermint).
### BUG FIXES:

- [tools] [\#4023](https://github.com/tendermint/tendermint/issues/4023) Refresh `tm-monitor` health when validator count is updated (@erikgrinaker)
- [index] [\#4104](https://github.com/tendermint/tendermint/pull/4104) Fix tx index lag too much
Copy link
Contributor

@melekes melekes Nov 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- [index] [\#4104](https://github.com/tendermint/tendermint/pull/4104) Fix tx index lag too much
- [state] [\#4104](https://github.com/tendermint/tendermint/pull/4104) txindex/kv: Fsync data to disk immediately after receiving it (@guagualvcha)

@melekes
Copy link
Contributor

melekes commented Nov 3, 2019

BEFORE:

BenchmarkTxIndex1-2               100000             12434 ns/op
BenchmarkTxIndex500-2                300           5151564 ns/op
BenchmarkTxIndex1000-2               100          15053910 ns/op
BenchmarkTxIndex2000-2               100          18238892 ns/op
BenchmarkTxIndex10000-2               20         124287930 ns/op

AFTER:

BenchmarkTxIndex1-2                 2000            795431 ns/op
BenchmarkTxIndex500-2                200           6385124 ns/op
BenchmarkTxIndex1000-2               100          11388219 ns/op
BenchmarkTxIndex2000-2               100          20514873 ns/op
BenchmarkTxIndex10000-2               20         107456004 ns/op

Performance drop is pretty steep, but I think it's the right thing to do UNTIL we have a WAL.

@unclezoro unclezoro changed the title [R4R]fix tx index lag too much [R4R]txindex/kv: Fsync data to disk immediately after receiving it Nov 4, 2019
@melekes melekes merged commit 76deaa9 into tendermint:master Nov 4, 2019
@ebuchman
Copy link
Contributor

@melekes what is that benchmark on ? Individual Index calls or AddBatch calls? The former we don't seem to actually use other than in tests, the later I would expect much lesser performance hit from.

Regardless this is an important fix - thanks @guagualvcha . Seems also independent from #3211 which we'll need to address as well.

@melekes
Copy link
Contributor

melekes commented Nov 20, 2019

@melekes what is that benchmark on ? Individual Index calls or AddBatch calls?

AddBatch calls

func benchmarkTxIndex(txsCount int64, b *testing.B) {
dir, err := ioutil.TempDir("", "tx_index_db")
if err != nil {
b.Fatal(err)
}
defer os.RemoveAll(dir) // nolint: errcheck
store := db.NewDB("tx_index", "goleveldb", dir)
indexer := NewTxIndex(store)
batch := txindex.NewBatch(txsCount)
txIndex := uint32(0)
for i := int64(0); i < txsCount; i++ {
tx := cmn.RandBytes(250)
txResult := &types.TxResult{
Height: 1,
Index: txIndex,
Tx: tx,
Result: abci.ResponseDeliverTx{
Data: []byte{0},
Code: abci.CodeTypeOK,
Log: "",
Events: []abci.Event{},
},
}
if err := batch.Add(txResult); err != nil {
b.Fatal(err)
}
txIndex++
}
b.ResetTimer()
for n := 0; n < b.N; n++ {
err = indexer.AddBatch(batch)
}
if err != nil {
b.Fatal(err)
}
}

@ebuchman
Copy link
Contributor

Hmm. Is the first number in BenchmarkTxIndex2000-2 the number of txs? The cost is pretty high for the smaller nums but seems it's amortized for larger ones so maybe it's ok. Crazy that the cost is like 70x for the first one and otherwise only around ~1.2x or so for the others

@melekes
Copy link
Contributor

melekes commented Nov 20, 2019

Is the first number in BenchmarkTxIndex2000-2 the number of txs?

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants