Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset to tendermint master #534

Merged
merged 732 commits into from
Sep 28, 2021
Merged

Reset to tendermint master #534

merged 732 commits into from
Sep 28, 2021

Conversation

evan-forbes
Copy link
Member

@evan-forbes evan-forbes commented Sep 20, 2021

Description

This PR uses git merge -s ours to essentially reset this repo to tendermint master. This serves the same purpose as #527 but in a way less hacky way, and doesn't mess up the git blame.

Keeping as a draft for now, as we will be cherry-picking the parts of celestia-core that we want to keep into this PR, almost as a temporary master branch

closes: #538, #512, #505, #492
completes the majority of #528

alexanderbez and others added 30 commits July 6, 2021 14:26
Closes: #6661 

Note: see another error during the events indexing, guess the raw tx size exceeds the limitation?
```
3:17PM ERR failed to index block txs err="pq: index row size 2768 exceeds btree version 4 maximum 2704 for index \"tx_results_tx_result_key\"" height=5205112 module=txindex
## Description

remove pkg/errors since we use the provided fmt.Errorf
…t (#6637)

This commit extends the fix in #6518, so all other goroutine which run
concurrently with processBlockchainCh can safely send data to blockchain
out channel via a bridge channel. This helps eliminating all possible
data race with sending and closing blockchainCh.Out channel at the same
time.

Fixes #6516
…679)

Bumps [gaurav-nelson/github-action-markdown-link-check](https://github.com/gaurav-nelson/github-action-markdown-link-check) from 1.0.12 to 1.0.13.
- [Release notes](https://github.com/gaurav-nelson/github-action-markdown-link-check/releases)
- [Commits](gaurav-nelson/github-action-markdown-link-check@1.0.12...1.0.13)

---
updated-dependencies:
- dependency-name: gaurav-nelson/github-action-markdown-link-check
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
## Description

Run go mod tidy
Bumps [github.com/golangci/golangci-lint](https://github.com/golangci/golangci-lint) from 1.38.0 to 1.41.1.
- [Release notes](https://github.com/golangci/golangci-lint/releases)
- [Changelog](https://github.com/golangci/golangci-lint/blob/master/CHANGELOG.md)
- [Commits](golangci/golangci-lint@v1.38.0...v1.41.1)

---
updated-dependencies:
- dependency-name: github.com/golangci/golangci-lint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
I put this error log in here because I thought it might be a helpful indicator to see when a reactor sends a message to a peer that doesn't have that channel open but it turns out this is happening all the time and it's kind of annoying
## Description

Move tools to subdir to fix `go get`
When a peer responds with no lightblock for the height we queried, we call the [removePeer method](https://github.com/tendermint/tendermint/blob/master/internal/statesync/reactor.go#L339). This removes the peer from the [dispatcher's list of called peer's](https://github.com/tendermint/tendermint/blob/ad6588315256162a9ef799a5a77ab7237cd46f38/internal/statesync/dispatcher.go#L159). When the dispatcher then receives responses from the removed peer, it [drops their responses](https://github.com/tendermint/tendermint/blob/ad6588315256162a9ef799a5a77ab7237cd46f38/internal/statesync/dispatcher.go#L130). These responses may be meaningful or contain a block or data that will help statesync proceed.

[The logs](https://gist.github.com/tychoish/34a1f61eaae3c36c23efc7d0001e805c), when this change is applied, show an additional 3 networking testnets passing. 

addresses:  #6691
* remove counter app

* remove unneeeded ci

* lint fix

* modify tx sizes

* cleanup docs

* Update abci/cmd/abci-cli/abci-cli.go

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

* Update docs/app-dev/getting-started.md

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

* Update docs/app-dev/getting-started.md

Co-authored-by: Callum Waters <cmwaters19@gmail.com>

* bring back comment

* migrate to kvstore and not persistent

* remove unused func

* test persistent

Co-authored-by: Callum Waters <cmwaters19@gmail.com>
…#6696)

* Revert "statesync: keep peer despite lightblock query fail (#6692)"

This reverts commit 50b00df.
* statesync: remove outgoing calls race condition
Bumps [github.com/google/uuid](https://github.com/google/uuid) from 1.2.0 to 1.3.0.
- [Release notes](https://github.com/google/uuid/releases)
- [Commits](google/uuid@v1.2.0...v1.3.0)

---
updated-dependencies:
- dependency-name: github.com/google/uuid
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@liamsi
Copy link
Member

liamsi commented Sep 28, 2021

So, this looks good to me. the only question I have is why does 1bb8ddc show up instead of: cf7537e / the changes of tendermint/tendermint#6975?

tychoish and others added 10 commits September 28, 2021 13:31
I believe this assertion is likely redundant given that we're checking the block apphash.
In the last run, there were two problems at the RPC layer returned
from light nodes' RPC end points. I think exercising the light client
proxy RPC system is something that can/should be done via unit
testing, and that likely these errors are (in production) transient
and (in CI) very likely to fail for test environment issues.
I've been noticing that there are a number of situations where the
statesync reactor blocks waiting for peers (or similar,) I've moved
things around to improve outcomes in local tests.
The main effect of this change is to flush the socket client and server message
encoding buffers immediately once the message is fully and correctly encoded.
This allows us to remove the timer and some other special cases, without
changing the observed behaviour of the system.

-- Background

The socket protocol client and server each use a buffered writer to encode
request and response messages onto the underlying connection. This reduces the
possibility of a single message being split across multiple writes, but has the
side-effect that a request may remain buffered for some time.

The implementation worked around this by keeping a ticker that occasionally
triggers a flush, and by flushing the writer in response to an explicit request
baked into the client/server protocol (see also #6994).

These workarounds are both unnecessary: Once a message has been dequeued for
sending and fully encoded in wire format, there is no real use keeping all or
part of it buffered locally.  Moreover, using an asynchronous process to flush
the buffer makes the round-trip performance of the request unpredictable.

-- Benchmarks

Code: https://play.golang.org/p/0ChUOxJOiHt

I found no pre-existing performance benchmarks to justify the flush pattern,
but a natural question is whether this will significantly harm client/server
performance.  To test this, I implemented a simple benchmark that transfers
randomly-sized byte buffers from a no-op "client" to a no-op "server" over a
Unix-domain socket, using a buffered writer, both with and without explicit
flushes after each write.

As the following data show, flushing every time (FLUSH=true) does reduce raw
throughput, but not by a significant amount except for very small request
sizes, where the transfer time is already trivial (1.9μs).  Given that the
client is calibrated for 1MiB transactions, the overhead is not meaningful.

The percentage in each section is the speedup for flushing only when the buffer
is full, relative to flushing every block.  The benchmark uses the default
buffer size (4096 bytes), which is the same value used by the socket client and
server implementation:

  FLUSH  NBLOCKS  MAX      AVG     TOTAL       ELAPSED       TIME/BLOCK
  false  3957471  512      255     1011165416  2.00018873s   505ns
  true   1068568  512      255     273064368   2.000217051s  1.871µs
                                                             (73%)

  false  536096   4096     2048    1098066401  2.000229108s  3.731µs
  true   477911   4096     2047    978746731   2.000177825s  4.185µs
                                                             (10.8%)

  false  124595   16384    8181    1019340160  2.000235086s  16.053µs
  true   120995   16384    8179    989703064   2.000329349s  16.532µs
                                                             (2.9%)

  false  2114     1048576  525693  1111316541  2.000479928s  946.3µs
  true   2083     1048576  526379  1096449173  2.001817137s  961.025µs
                                                             (1.5%)

Note also that the FLUSH=false baseline is actually faster than the production
code, which flushes more often than is required by the buffer filling up.

Moreover, the timer slows down the overall transaction rate of the client and
server, indepenedent of how fast the socket transfer is, so the loss on a real
workload is probably much less.
I observed a couple of problems with the generator in some recent tests:

- there were a couple of hybrid test cases which did not have any
  legacy nodes (randomness and all.) I change the probability to
  produce more reliable results.

- added options to the generation to be able to add a max (to
  compliment the earlier min) number of nodes for local testing.

- added an option to support reversing the sort order so "more
  complex" networks were first, as well as tweaked some of the point
  values.

- this refactored the generators cli parsing to be a bit more clear.
This reduces this situation where a node will get stuck block syncing,
which seemed to happen a lot in last nights run.
@evan-forbes evan-forbes force-pushed the evan/merge-theirs branch 2 times, most recently from 38c0c29 to 9fac0c3 Compare September 28, 2021 19:06
@lgtm-com
Copy link

lgtm-com bot commented Sep 28, 2021

This pull request fixes 1 alert when merging 4fe6fce into 488ac31 - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

@evan-forbes
Copy link
Member Author

evan-forbes commented Sep 28, 2021

So, this looks good to me. the only question I have is why does 1bb8ddc show up instead of: cf7537e / the changes of tendermint/tendermint#6975?

I ran go mod tidy after pulling changes from upstream, and then altered the title of the merge commit to reflect that. I think that messed github up, or made git think it was me introducing all of the changes instead of preserving the git history like a normal merge.

I have redone the most recent pull from upstream and not altered the title of the commit, which seems to fix it.

thanks again for catching that!

@lgtm-com
Copy link

lgtm-com bot commented Sep 28, 2021

This pull request fixes 1 alert when merging be3f371 into 488ac31 - view on LGTM.com

fixed alerts:

  • 1 for Use of insufficient randomness as the key of a cryptographic algorithm

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's merge this beast of a PR 💪

@evan-forbes evan-forbes merged commit cb696bb into master Sep 28, 2021
@liamsi liamsi deleted the evan/merge-theirs branch December 3, 2021 02:15
evan-forbes pushed a commit that referenced this pull request Jun 9, 2023
1. `InitialHeight` is specified twice
2. The contents of `Execute` refer to the parameter `state` not `s`

(cherry picked from commit 9c5158a)

Co-authored-by: Rootul P <rootulp@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor DAH generation to give the user more control over what happens to the EDS