feat(network): improve efficiency of known peers handling #2074
Conversation
@onur-ozkan @smk762 Have you already confirmed that #2073 is really fixed in this PR branch? |
It would be good to add a test for the stats collection RPC (if possible), as a test is required for any hotfix to avoid future regressions. |
|
As I remember, I modified this behaviour a bit in the original PR (ref. #1026 (comment)) by adding a reserved peer store. |
Signed-off-by: onur-ozkan <work@onurozkan.dev>
Force-pushed from a6c0b9d to c16a9eb
AddReservedPeer invoked
They don't need the update. Errors should disappear after a couple of seconds and not occur again. Are they continuously logging errors? |
The list populates early on, then previously connected nodes lose connection just after the 2 minute mark (peers.mp4). The stats collection interval is 10 seconds, as is the display of results on the right, using a query to return the latest result for each registered node, sorted by name. The stats collection node in the video is not using docker. All registered nodes are present in MM2.json.
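For reference, a minimal sketch of how a 10 second collection interval can be started via the legacy start_version_stat_collection RPC (the userpass value and the RPC port below are placeholders, not taken from this thread):
USERPASS="${USERPASS:-changeme}"   # placeholder, use your rpc_password
curl -s http://127.0.0.1:7783 -d '{
  "userpass": "'"$USERPASS"'",
  "method": "start_version_stat_collection",
  "interval": 10
}'
|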
Are you using this (c16a9eb) version? I tested the following nodes multiple times (copied from https://github.com/smk762/kmd_ntx_stats_docker/blob/master/code/scripts/collect_seednode_stats.py#L48):
declare -A nodes=(
['chmex_EU']='{"IP":"1.eu.seed.adex.dexstats.info","PeerID":"12D3KooWGP4ryfJHXjfnbXUWP6FJeDLiif8jMT8obQvCKMSPUB8X"}'
['chmex_NA']='{"IP":"1.na.seed.adex.dexstats.info","PeerID":"12D3KooWDNUgDwAAuJbyoS5DiRbhvMSwrUh1yepKsJH8URcFwPp3"}'
['chmex_SH']='{"IP":"1.sh.seed.adex.dexstats.info","PeerID":"12D3KooWE8Ju9SZyZrfkUgi25gFKv1Yc6zcQZ5GXtEged8rmLW3t"}'
['cipi_AR']='{"IP":"cipi_ar.cipig.net","PeerID":"12D3KooWMsfmq3bNNPZTr7HdhTQvxovuR1jo5qvM362VQZorTk3F"}'
['cipi_EU']='{"IP":"cipi_eu.cipig.net","PeerID":"12D3KooWBhGrTVfaK9v12eA3Et84Y8Bc6ixfZVVGShsad2GBWzm3"}'
['cipi_NA']='{"IP":"cipi_na.cipig.net","PeerID":"12D3KooWBoQYTPf4q2bnsw8fUA2LKoknccVLrAcF1caCa48ev8QU"}'
['caglarkaya_EU']='{"IP":"eu.caglarkaya.net","PeerID":"12D3KooWEg7MBp1P9k9rYVBcW5pa8tsHhyE5UuGAAerCARLzZBPn"}'
['computergenie_EU']='{"IP":"cgeu.computergenie.gay","PeerID":"12D3KooWGkPFi43Nq6cAcc3gib1iECZijnKZLgEf1q1MBRKLczJF"}'
['computergenie_NA']='{"IP":"cg.computergenie.gay","PeerID":"12D3KooWCJWT5PAG1jdYHyMnoDcxBKMpPrUVi9gwSvVLjLUGmtQg"}'
['dragonhound_AR']='{"IP":"ar.smk.dog","PeerID":"12D3KooWSUABQ2beSQW2nXLiqn4DtfXyqbJQDd2SvmgoVwXjrd9c"}'
['dragonhound_DEV']='{"IP":"dev.smk.dog","PeerID":"12D3KooWEnrvbqvtTowYMR8FnBeKtryTj9RcXGx8EPpFZHou2ruP"}'
['dragonhound_EU']='{"IP":"s7eu.smk.dog","PeerID":"12D3KooWDgFfyAzbuYNLMzMaZT9zBJX9EHd38XLQDRbNDYAYqMzd"}'
['dragonhound_NA']='{"IP":"s7na.smk.dog","PeerID":"12D3KooWSmizY35qrfwX8qsuo8H8qrrvDjXBTMRBfeYsRQoybHaA"}'
['fediakash_AR']='{"IP":"fediakash.mooo.com","PeerID":"12D3KooWCSidNncnbDXrX5G6uWdFdCBrMpaCAqtNxSyfUcZgwF7t"}'
['gcharang_DEV']='{"IP":"mm-dev.lordofthechains.com","PeerID":"12D3KooWMEwnQMPUHcGw65xMmhs1Aoc8WSEfCqTa9fFx2Y3PM9xg"}'
['gcharang_SH']='{"IP":"mm-sh.lordofthechains.com","PeerID":"12D3KooWHAk9eJ78pwbopZMeHMhCEhXbph3CJ8Hbz5L1KWTmPf8C"}'
['gcharang_AR']='{"IP":"mm-ar.lordofthechains.com","PeerID":"12D3KooWDsFMoRoL5A4ii3UonuQZ9Ti2hrc7PpytRrct2Fg8GRq9"}'
['mcrypt_SH']='{"IP":"mcrypt2.v6.rocks","PeerID":"12D3KooWCDAPYXtNzC3x9kYuZySSf1WtxjGgasxapHEdFWs8Bep3"}'
['nodeone_NA']='{"IP":"nodeone.computergenie.gay","PeerID":"12D3KooWBTNDr6ih5efzVSxXtDv9wcVxHNj8RCvUnpKfKb6eUYet"}'
['sheeba_SH']='{"IP":"sheeba.computergenie.gay","PeerID":"12D3KooWC1P69a5TwpNisZYBXRgkrJDjGfn4QZ2L4nHZDGjcdR2N"}'
['smdmitry_AR']='{"IP":"mm2-smdmitry-ar.smdmitry.com","PeerID":"12D3KooWJ3dEWK7ym1uwc5SmwbmfFSRmELrA9aPJYxFRrQCCNdwF"}'
['smdmitry2_AR']='{"IP":"mm2-smdmitry2-ar.smdmitry.com","PeerID":"12D3KooWEpiMuCc47cYUXiLY5LcEEesREUNpZXF6KZA8jmFgxAeE"}'
['smdmitry_EU']='{"IP":"mm2-smdmitry-eu.smdmitry.com","PeerID":"12D3KooWJTYiU9CqVyycpMnGC96WyP1GE62Ng5g93AUe9wRx5g7W"}'
['smdmitry_SH']='{"IP":"mm2-smdmitry-sh.smdmitry.com","PeerID":"12D3KooWQP7PNNX5DSyhPX5igPQKQhet4KX7YaDqiGuNnarr4vRX"}'
['strob_SH']='{"IP":"sh.strobfx.com","PeerID":"12D3KooWFY5TmKpusUJ3jJBYK4va8xQchnJ6yyxCD7wZ2pWVK23p"}'
['tonyl_AR']='{"IP":"ar.farting.pro","PeerID":"12D3KooWEMTeavnNtPPYr1u4aPFB6U39kdMD32SU1EpHGWqMpUJk"}'
['tonyl_DEV']='{"IP":"dev.farting.pro","PeerID":"12D3KooWDubAUWDP2PgUXHjEdN3SGnkszcyUgahALFvaxgp9Jcyt"}'
['van_EU']='{"IP":"van.computergenie.gay","PeerID":"12D3KooWMX4hEznkanh4bTShzCZNx8JJkvGLETYtdVw8CWSaTUfQ"}'
['webworker01_EU']='{"IP":"eu2.webworker.sh","PeerID":"12D3KooWGF5siktvWLtXoRKgbzPYHn4rib9Fu8HHJEECRcNbNoAs"}'
['webworker01_NA']='{"IP":"na2.webworker.sh","PeerID":"12D3KooWRiv4gFUUSy2772YTagkZYdVkjLwiXkdcrtDQQuEqQaJ9"}'
['who-biz_NA']='{"IP":"adex.blur.cash","PeerID":"12D3KooWQp97gsRE5LbcUPjZcP7N6qqk2YbxJmPRUDeKVM5tbcQH"}'
)
DNS resolution doesn't work on some of them (an entirely different issue, not related to us). For the other nodes, I can add them to mm2 and start collecting version stats. These are the ones consistently failing with every request:
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node smdmitry2_AR responded to version request with error: Error on request the peer PeerId("12D3KooWEpiMuCc47cYUXiLY5LcEEesREUNpZXF6KZA8jmFgxAeE"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node tonyl_AR responded to version request with error: Error on request the peer PeerId("12D3KooWEMTeavnNtPPYr1u4aPFB6U39kdMD32SU1EpHGWqMpUJk"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node tonyl_DEV responded to version request with error: Error on request the peer PeerId("12D3KooWDubAUWDP2PgUXHjEdN3SGnkszcyUgahALFvaxgp9Jcyt"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node gcharang_AR responded to version request with error: Error on request the peer PeerId("12D3KooWDsFMoRoL5A4ii3UonuQZ9Ti2hrc7PpytRrct2Fg8GRq9"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node gcharang_DEV responded to version request with error: Error on request the peer PeerId("12D3KooWMEwnQMPUHcGw65xMmhs1Aoc8WSEfCqTa9fFx2Y3PM9xg"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node who-biz_NA responded to version request with error: Error on request the peer PeerId("12D3KooWQp97gsRE5LbcUPjZcP7N6qqk2YbxJmPRUDeKVM5tbcQH"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node chmex_EU responded to version request with error: Error on request the peer PeerId("12D3KooWGP4ryfJHXjfnbXUWP6FJeDLiif8jMT8obQvCKMSPUB8X"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node gcharang_SH responded to version request with error: Error on request the peer PeerId("12D3KooWHAk9eJ78pwbopZMeHMhCEhXbph3CJ8Hbz5L1KWTmPf8C"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node caglarkaya_EU responded to version request with error: Error on request the peer PeerId("12D3KooWEg7MBp1P9k9rYVBcW5pa8tsHhyE5UuGAAerCARLzZBPn"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node chmex_SH responded to version request with error: Error on request the peer PeerId("12D3KooWE8Ju9SZyZrfkUgi25gFKv1Yc6zcQZ5GXtEged8rmLW3t"): "Canceled". Request next peer
28 06:43:22, mm2_main::mm2::lp_stats:343] ERROR Node chmex_NA responded to version request with error: Error on request the peer PeerId("12D3KooWDNUgDwAAuJbyoS5DiRbhvMSwrUh1yepKsJH8URcFwPp3"): "Canceled". Request next peer
It's highly likely that there is an issue with their deployment. I even excluded all other nodes and tried these failing nodes alone a couple of times:
['gcharang_AR']='{"IP":"mm-ar.lordofthechains.com","PeerID":"12D3KooWDsFMoRoL5A4ii3UonuQZ9Ti2hrc7PpytRrct2Fg8GRq9"}'
['chmex_NA']='{"IP":"1.na.seed.adex.dexstats.info","PeerID":"12D3KooWDNUgDwAAuJbyoS5DiRbhvMSwrUh1yepKsJH8URcFwPp3"}'
['tonyl_DEV']='{"IP":"dev.farting.pro","PeerID":"12D3KooWDubAUWDP2PgUXHjEdN3SGnkszcyUgahALFvaxgp9Jcyt"}'
['who-biz_NA']='{"IP":"adex.blur.cash","PeerID":"12D3KooWQp97gsRE5LbcUPjZcP7N6qqk2YbxJmPRUDeKVM5tbcQH"}'
['chmex_EU']='{"IP":"1.eu.seed.adex.dexstats.info","PeerID":"12D3KooWGP4ryfJHXjfnbXUWP6FJeDLiif8jMT8obQvCKMSPUB8X"}'
['gcharang_SH']='{"IP":"mm-sh.lordofthechains.com","PeerID":"12D3KooWHAk9eJ78pwbopZMeHMhCEhXbph3CJ8Hbz5L1KWTmPf8C"}'
['smdmitry2_AR']='{"IP":"mm2-smdmitry2-ar.smdmitry.com","PeerID":"12D3KooWEpiMuCc47cYUXiLY5LcEEesREUNpZXF6KZA8jmFgxAeE"}'
['tonyl_AR']='{"IP":"ar.farting.pro","PeerID":"12D3KooWEMTeavnNtPPYr1u4aPFB6U39kdMD32SU1EpHGWqMpUJk"}'
['gcharang_DEV']='{"IP":"mm-dev.lordofthechains.com","PeerID":"12D3KooWMEwnQMPUHcGw65xMmhs1Aoc8WSEfCqTa9fFx2Y3PM9xg"}'
['caglarkaya_EU']='{"IP":"eu.caglarkaya.net","PeerID":"12D3KooWEg7MBp1P9k9rYVBcW5pa8tsHhyE5UuGAAerCARLzZBPn"}'
['chmex_SH']='{"IP":"1.sh.seed.adex.dexstats.info","PeerID":"12D3KooWE8Ju9SZyZrfkUgi25gFKv1Yc6zcQZ5GXtEged8rmLW3t"}'
But we can't connect to them; therefore, the requests fail. For the rest of the nodes, it works flawlessly all the time. Please note that during the initial request there is a chance that some requests fail for a few seconds, as connection dials are just starting up. However, this is temporary and only lasts for a very short time frame (around 3-5 seconds).
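As a rough way to separate DNS failures from an unreachable p2p port, a check like the sketch below can help (the hostname is taken from the list above; P2P_PORT is left as a variable because it depends on the netid the nodes run on):
HOST="1.sh.seed.adex.dexstats.info"
P2P_PORT="${P2P_PORT:?set to the p2p port for your netid}"
if ! getent hosts "$HOST" >/dev/null; then
  echo "$HOST: DNS resolution failed"
elif ! timeout 5 bash -c ">/dev/tcp/$HOST/$P2P_PORT" 2>/dev/null; then
  echo "$HOST: resolves, but p2p port $P2P_PORT is unreachable"
else
  echo "$HOST: reachable"
fi
|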
This isn't needed btw. |
Confirmed, recent video below (seednodes.mp4). The seednodes param was removed from MM2.json. Running on a server in the CLI to eliminate any "bad connection" or docker effects. Nodes respond at first, then just after the 3 min mark most of them drop. I'll extend the query interval to a minute (was 10 sec) and retry. |
I did a test with a 1 second query interval and ran it for an hour; I didn't see any problems such as dropping connections. |
Same drop after 3 min at 1 min intervals (3mins.mp4). Perhaps I'm getting issues due to a larger set of registered nodes? I tried 1 sec intervals and it seemed to last longer, perhaps acting like a "keepalive" at that frequency - though it would result in a significantly larger MM2.db over the course of a notary season. Under "real world" conditions I'd be running on a 15 min loop, checking 4 times an hour to confirm a notary is eligible for a score for that hour by returning the correct version. Is there a time period after which connected peers will update / change? Here's my list of nodes for easier import:
and the import shell script:
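roughly, something like the sketch below, assuming jq, the legacy add_node_to_version_stat RPC, and the associative array from the earlier comment (userpass and RPC port are placeholders):
USERPASS="${USERPASS:-changeme}"   # placeholder, use your rpc_password
declare -A nodes=(
  ['chmex_EU']='{"IP":"1.eu.seed.adex.dexstats.info","PeerID":"12D3KooWGP4ryfJHXjfnbXUWP6FJeDLiif8jMT8obQvCKMSPUB8X"}'
  # ... remaining entries as listed above ...
)
for name in "${!nodes[@]}"; do
  address=$(jq -r '.IP' <<< "${nodes[$name]}")
  peer_id=$(jq -r '.PeerID' <<< "${nodes[$name]}")
  # register each node for version stat collection
  curl -s http://127.0.0.1:7783 -d '{
    "userpass": "'"$USERPASS"'",
    "method": "add_node_to_version_stat",
    "name": "'"$name"'",
    "address": "'"$address"'",
    "peer_id": "'"$peer_id"'"
  }'
done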
|
I don't think so. I also did a 10 second interval test with
and none of them failed. Maybe try only testing these nodes with a 10 second interval and see if you get any trouble? I'm starting to think that you have some connection issues on the server. If you can, please also remove the database before the test.
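A sketch of that cleanup (the DB path is an assumption - adjust it to wherever your DB directory actually lives):
# stop mm2 first, then remove the stats database before re-running the test
find "${MM2_DB_DIR:-./DB}" -name 'MM2.db' -print -delete
|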
Most errors look like they are due to |
I set up everything on a different server for a comparison test. Initially, with only the 9 hardcoded seednodes registered, everything was responding. I registered the 30 notary seeds (in the same session), and strangely, none of the new ones would respond, and only 2/9 hardcoded seeds were still responding. I restarted with a reduced set of 26 nodes I know have responded at least once during testing, as below:
After initially populating the fresh DB, only 2 nodes which would be expected to be responsive were returning an error. A couple of minutes later, 2 more that were responding initially were returning an error (4 total). After a couple more minutes, only 2 nodes were still returning a successful response. After 5-10 minutes this remained unchanged - the same 2 nodes were still responding, the rest were still not. I still believe the number of registered nodes could be a factor, and I don't think exclusively testing the hardcoded seeds is an ideal test case in relation to the intended use case of this method. |
Signed-off-by: onur-ozkan <work@onurozkan.dev>
AddReservedPeer invoked
064ee80 should greatly improve peer connection handling, to the point that you shouldn't even notice the connection drops (as they will reconnect immediately), even if the server has a slow connection. |
Thanks! Confirmed that with the latest commit, all 26 properly configured registered nodes return a successful response within 1 minute of the stats loop starting, and the connections remain persistent, still returning successful responses without issue after an hour.
* dev:
  feat(indexeddb): advanced cursor filtering impl (KomodoPlatform#2066)
  update dockerhub destination repository (KomodoPlatform#2082)
  feat(event streaming): configurable worker path, use SharedWorker (KomodoPlatform#2080)
  fix(hd_tests): fix test_hd_utxo_tx_history unit test (KomodoPlatform#2078)
  feat(network): improve efficiency of known peers handling (KomodoPlatform#2074)
  feat(nft): enable eth with non fungible tokens (KomodoPlatform#2049)
  feat(ETH transport & heartbeats): various enhancements/features (KomodoPlatform#2058)
Fixes #2073
cc @KomodoPlatform/qa