Poll all nodes, remove outliers, ddos protection & amend RPC response with endpoint #2521

Merged: 15 commits, Feb 10, 2020

@wezrule wezrule commented Jan 29, 2020

This provides a set of enhancements for node telemetry:
1 - Remove outliers from consolidated data. The lowest 10% and highest 10% of values are removed
2 - Get metrics for all nodes instead of a random selection.
3 - Cache for 60 seconds on live, 15 seconds on beta. Nodes will reject telemetry_req messages if they are received from the same peer within this timeframe (+ the alarm cutoff tolerance).
4 - "cached" removed from RPC command
5 - "address" & "port" added to the RPC response when using "raw" to be able to match the metrics.
6 - "timestamp" added when using "raw" or single requests so that the data can be used accurately by services.
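
The outlier removal in item 1 can be illustrated with a trimmed mean. This is a minimal sketch, not the node's implementation: the helper name `trimmed_mean` and its `trim_percent` parameter are hypothetical, and it assumes the 10% cut is applied to each end of the sorted samples of a single metric.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: averages `samples` after discarding the lowest and
// highest `trim_percent` percent of the sorted values.
double trimmed_mean (std::vector<double> samples, std::size_t trim_percent = 10)
{
	assert (!samples.empty ());
	assert (trim_percent < 50); // otherwise nothing would be left to average
	std::sort (samples.begin (), samples.end ());
	// Number of values dropped from each end, rounded down.
	auto const cut = samples.size () * trim_percent / 100;
	auto const first = samples.begin () + static_cast<std::ptrdiff_t> (cut);
	auto const last = samples.end () - static_cast<std::ptrdiff_t> (cut);
	return std::accumulate (first, last, 0.0) / static_cast<double> (last - first);
}
```

With ten samples, one value is dropped from each end, so a single extreme report from one peer no longer shifts the consolidated figure. With fewer than ten samples the integer division yields zero and nothing is trimmed in this sketch.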

Other PRs will follow for some other requests

@wezrule added the quality improvements, beta testing wanted, and telemetry labels Jan 29, 2020
@wezrule added this to the Research for Future Release milestone Jan 29, 2020
@wezrule self-assigned this Jan 29, 2020
@zhyatt added the documentation label Feb 3, 2020
@wezrule force-pushed the node_telemetry_improvements branch from d20c855 to 6e95e1d February 4, 2020 10:44

@guilhermelawless left a comment

LGTM

SergiySW previously approved these changes Feb 7, 2020
@wezrule force-pushed the node_telemetry_improvements branch from 755d9a4 to 68c739c February 10, 2020 09:48
@wezrule merged commit bd93581 into nanocurrency:develop Feb 10, 2020
@wezrule deleted the node_telemetry_improvements branch February 10, 2020 15:39
argakiig pushed a commit that referenced this pull request Jun 16, 2020
* Update beta network bootstrap weights for v21, cutoff 7M (#2537)

* Cache hash for multiple block->hash () calls (#2536)

* Remove representatives with closed channels (#2530)

* ASAN error with database transaction tracker json serialization (#2538)

* Lower beta network work threshold to 1/64x base (#2540)

* Bounded memory and redesign in the confirmation height processor (#2531)

* Bounded memory and redesign in the confirmation height processor

* Disable frontiers confirmation for test, block doesn't exist during callback in fork resolution

* Fix rpc.confirmation_height_currently_processing test

* Cement blocks below receives not above

* Fixes gcc build (hopefully)

* Store start and end hash in pending write to remove extraneous IO

* Optimise for the case where the top hash is 2 above cemented frontier

* Fix TSAN issues

* Serg comments

* Use cached genesis_hash in CLI --confirmation_height_clear option

* Set accounts_confirmed_info_size to 0 when clearing

* Remove const for prepare_iterated_blocks_for_cementing as confusing

* State blocks sideband upgrade adding is_send/receive/epoch (#2545)

This changes the epoch byte in the sideband to also store these flags in the 3 most significant bits. These can be used to avoid grabbing the previous block to check its balance in some situations after ledger processing.

The upgrade is done in-place, but takes a long time due to having to do a random read to retrieve the previous block. A vacuum is done at the end. During the upgrade the ledger grows to almost 2x its original size, and up to 3x is required to vacuum as well.

Due to using the most significant bits, the upgrade can safely be stopped and restarted (from the beginning), and no additional versioning was needed.
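
Storing the flags in the 3 most significant bits of the epoch byte can be sketched as follows. The struct name, function names and exact bit positions are assumptions made for illustration; the commit message only states that the three flags share the byte with the epoch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative layout only: names and bit positions are assumptions.
struct block_details
{
	std::uint8_t epoch; // must fit in the 5 low bits (0..31)
	bool is_send;
	bool is_receive;
	bool is_epoch;
};

constexpr std::uint8_t epoch_mask = 0x1f; // low 5 bits

// Packs the epoch and the three flags into one byte.
std::uint8_t pack (block_details const & details)
{
	assert (details.epoch <= epoch_mask);
	unsigned packed = details.epoch;
	packed |= details.is_send ? 0x80u : 0u;
	packed |= details.is_receive ? 0x40u : 0u;
	packed |= details.is_epoch ? 0x20u : 0u;
	return static_cast<std::uint8_t> (packed);
}

// Splits a packed byte back into the epoch and the flags.
block_details unpack (std::uint8_t packed)
{
	return block_details{ static_cast<std::uint8_t> (packed & epoch_mask), (packed & 0x80) != 0, (packed & 0x40) != 0, (packed & 0x20) != 0 };
}
```

In this sketch a byte written before the upgrade (all flag bits zero) and an upgraded byte yield the same epoch when masked, and packing an already packed value changes nothing.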

* Handle legacy confirm_req using the aggregator (#2541)

* Handle legacy confirm_req using the aggregator

* Fix test node.local_votes_cache

* Adjust test to make sure the cache is used

* Use system polls instead

* Poll all nodes, remove outliers, ddos protection & amend RPC response with endpoint (#2521)

* Poll all nodes and remove some metrics from bounds when consolidating

* Update out of date comment

* Formatting

* Fix clang build on actions with long std::tuple

* Allow square brackets in ipv6 address in RPC

* Merge with develop

* Fix ASAN issue

* Gui review comments

* Make parse_address accept v4 and v6 ip addresses

* Incorrect order of arguments

* Use new cached genesis hash

* Move last_telemetry_req to bootstrap_server
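
The per-peer rejection of repeated telemetry_req messages described for this PR can be sketched as a small limiter. The class `telemetry_req_limiter`, the string peer key and the explicit `now` parameter are hypothetical choices made for illustration, and the alarm cutoff tolerance mentioned in the description is left out.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>

// Hypothetical sketch: remembers when each peer was last served and rejects
// requests that arrive again before `cooldown` has passed.
class telemetry_req_limiter
{
public:
	using clock = std::chrono::steady_clock;

	explicit telemetry_req_limiter (std::chrono::seconds cooldown_a) :
	cooldown (cooldown_a)
	{
		assert (cooldown_a.count () >= 0);
	}

	// Returns true if the request should be served, false if it came too soon.
	bool accept (std::string const & peer, clock::time_point now)
	{
		auto const existing = last_request.find (peer);
		if (existing != last_request.end () && now - existing->second < cooldown)
		{
			return false;
		}
		last_request[peer] = now;
		return true;
	}

private:
	std::chrono::seconds cooldown;
	std::unordered_map<std::string, clock::time_point> last_request;
};
```

A rejected request does not refresh the stored time here, so a peer that keeps retrying is served again once the original window ends. Whether the node behaves the same way is not stated in the PR.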

* BOOST_1.69 (#2547)

* Probabilistic network packet filter (#2543)

* Extract stream.hpp from blocks.hpp

* Add a probabilistic filter based on direct mapped caches, keyed by 128-bit SipHash.

Co-Authored-By: Colin LeMahieu <clemahieu@gmail.com>

The filter receives an array of bytes representing a network packet and checks whether the packet is a duplicate. The probability of a false non-duplicate (a repeated packet reported as new because its entry was overwritten) is marginal, but not zero, and decreases with the size of the filter. The probability of a false duplicate is the infinitesimal probability of a 128-bit SipHash collision.

Items are not normally erased. Instead, if a new item is different from the one at the insertion index (digest % capacity), the old item is replaced. There is also a function to erase an element from the filter, if the digest matches it.

The filter state is protected by a mutex, whereas hashing is performed while not holding it.

Uses 1MB of memory for every 64k elements.

* Remove explicit instantiations

* Remove alias

* Optionally set digest in ::apply and add ::clear method directly from a digest

* Encapsulating election::confirmed so its implementation can vary.
Updating relevant tests.
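
The direct-mapped filter described in #2543 can be sketched as below. This is a simplified illustration under stated assumptions: `std::hash` stands in for the keyed 128-bit SipHash, digests are 64 bits wide, and a digest of 0 marks an empty slot. The class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Sketch of a direct-mapped duplicate filter (not the node's implementation).
class duplicate_filter
{
public:
	explicit duplicate_filter (std::size_t capacity) :
	slots (capacity, 0)
	{
		assert (capacity > 0);
	}

	// Returns true if `packet` was already present. Otherwise stores its
	// digest, replacing whatever occupied the slot at digest % capacity.
	bool apply (std::string const & packet)
	{
		auto const digest = hash (packet); // hashed without holding the mutex
		std::lock_guard<std::mutex> guard (mutex);
		auto & slot = slots[digest % slots.size ()];
		bool const existed = (slot == digest);
		slot = digest;
		return existed;
	}

	// Erases the entry only if the slot still holds this packet's digest.
	void clear (std::string const & packet)
	{
		auto const digest = hash (packet);
		std::lock_guard<std::mutex> guard (mutex);
		auto & slot = slots[digest % slots.size ()];
		if (slot == digest)
		{
			slot = 0;
		}
	}

private:
	static std::uint64_t hash (std::string const & packet)
	{
		auto const digest = static_cast<std::uint64_t> (std::hash<std::string>{}(packet));
		return digest == 0 ? 1 : digest; // keep 0 reserved for "empty"
	}

	std::vector<std::uint64_t> slots;
	std::mutex mutex;
};
```

In this sketch a repeated packet is reported as new only after another packet has overwritten its slot, which becomes less likely as the capacity grows. The memory figure quoted in the commit is consistent with 16-byte (128-bit) digests: 65,536 × 16 bytes = 1 MiB. The sketch uses 8 bytes per slot instead.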

* Add wallet-processed block to work watcher via the block processor (#2548)

* Add wallet-processed node to work watcher via the block processor

* Remove active size check as the request loop may drop one before the check

* guard policy setting by version checks (#2550)

* guard policy setting by version checks greater or equal
short explanations of policies being set

* Restore max block processor signature verification batch size (#2546)

* Restore max block processor signature verification batch size

Removed in https://github.com/nanocurrency/nano-node/pull/2279

* Use assert to validate if else condition

* [ASAN] Access node through a weak_ptr on distributed_work dtor (#2554)

* [ASAN] Access node through a weak_ptr on distributed_work dtor

Since there is nothing to force a distributed_work to get destroyed before the node stops, this is a safer way to access the node on the destructor. Fixes some ASAN issues.

* Add a comment explaining the use of node_w

* [ASAN] fix issue in rpc.work_peer_many
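
The weak_ptr approach from #2554 can be sketched as follows. The `node` and `work_job` types are hypothetical stand-ins, and only the `node_w` name is taken from the commit message. The point is that the destructor touches the node only if it is still alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the node.
struct node
{
	std::vector<std::string> log;
};

// Hypothetical stand-in for an object that may outlive the node.
class work_job
{
public:
	explicit work_job (std::shared_ptr<node> const & node_a) :
	node_w (node_a)
	{
		assert (node_a != nullptr);
	}

	~work_job ()
	{
		// The node may already be gone; lock () returns an empty pointer then.
		if (auto node_l = node_w.lock ())
		{
			node_l->log.push_back ("work job destroyed");
		}
	}

private:
	std::weak_ptr<node> node_w; // non-owning, does not keep the node alive
};
```

If the job is destroyed while the node exists, the destructor records an entry. If the node was released first, the destructor does nothing instead of dereferencing freed memory.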

* Return created election when adding to active_transactions (#2551)

* This changes the signature of active_transactions::start to match STL container ::insert signatures by returning a pointer to the newly created election and a bool indicating whether the election creation took place.
This allows the call site to retrieve the election that was inserted and determine if it was the initiator of the election creation.
Also updated tests that were taking extra steps to retrieve the newly created election.

* Renaming active_transactions::start/add to insert/insert_impl to more closely match the behavior that's expected.
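
The STL-style return value described in #2551 can be sketched with a small container. The `election` and `active_set` types are hypothetical, and returning the already existing election when nothing is created is an assumption modelled on `std::unordered_map::insert`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an election.
struct election
{
	std::string root;
};

// Hypothetical container with an insert () shaped like STL ::insert.
class active_set
{
public:
	// Returns the election for `root` and whether this call created it.
	std::pair<std::shared_ptr<election>, bool> insert (std::string const & root)
	{
		assert (!root.empty ());
		auto & existing = elections[root];
		bool const inserted = (existing == nullptr);
		if (inserted)
		{
			existing = std::make_shared<election> (election{ root });
		}
		return { existing, inserted };
	}

private:
	std::unordered_map<std::string, std::shared_ptr<election>> elections;
};
```

The call site gets the election without a second lookup, and the bool tells it whether it was the initiator.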

* Separate inactive votes cache from gap cache (#2542)

* Separate inactive votes cache from gap cache
* Move functional changes to new PR

* Inactive votes cache confirmation status (#2553)

* Adding separate confirmation status / bootstrap status for inactive votes cache (it can be different with disabled lazy bootstrap using config bootstrap_fraction_numerator)
* Apply reviews
* Inactive vote cache item can be confirmed with single vote in tests
* Add test to prevent confirmation without quorum

* Add launch flag --disable_block_processor_unchecked_deletion (#2557)

* Disabled test on actions (#2561)

* Add a fake websocket client to test the node websocket server (#2562)

This client keeps connections alive and runs in a separate thread

* Incremental options for ws confirmation subscription (#2566)

Possibility to add or remove accounts in an existing subscription. This is useful for external wallets that can't use the all_local_accounts flag.

* Fix election calling confirm_if_quorum after destruction (#2563)

When insert_inactive_votes_cache confirms the election it deletes itself from the active roots, causing the next confirm_if_quorum call to access freed memory.

* Update --debug_profile_process CLI test (#2564)

* Update --debug_profile_process CLI test
* fix blocks signatures
* support daemon launch options for profiling
* use block count cache

* Apply reviews changes

* Request telemetry data for local node (#2560)

* Disabled UDP by default (#2555)

* Use only the necessary protocols in UPnP (#2571)

* Do not request UPnP for UDP when disabled

* (unrelated) print endline on flag errors

* Buffer drop policies (#2559)

* Buffer drop policies

* Clarify enum comment

* Add GSL-style narrow_cast (#2567)

* Move back timer comments to header (#2572)

* Work version concept (#2569)

* Work version concept

Adds nano::work_version which is passed to all work_generate and work_validate methods (except for tests, ensured with an assert).

The functional changes are quite minimal, as every caller defaults to the only work version available (work_1):

- RPC work_generate, work_validate and block_create can receive an optional "work_version"
- The work websocket outputs the work version

These changes constitute the pipeline necessary for conditional work validation.

* Use nano::to_string to match the work version text

* Use a different confirmation height algorithm when ledger is almost fully cemented (#2544)

* Confirmation height algorithm for almost fully cemented ledgers

* Rename variable to make intent clearer

* Count on no confirmation height changes in database outside processor

* Fix timing issue with existing test

* Unnecessary to list confirmation height as a non-locked table in the blockprocessor

* Stein review comments

* Update member order in confirmation_height_processor

* Serg review - TSAN error on initialization, add latch

* Make sure writes are completed before trying to change to another processor

* Add bounded/unbounded counts to stats and check in tests. Add new test too

* Formatting

* Readd conf height as a non-locked table in the blockprocessor needed with RocksDB

* Fix intermittent rpc.confirmation_height_currently_processing test failure

* Update --debug_profile_bootstrap CLI test (#2556)

* Update debug_profile_bootstrap CLI test
* Fix block count check
* Improve logging
* Use cached block counts
* Apply reviews changes

* Add tests for vote_processor (#2574)

* Fix intermittent node telemetry test failures (#2576)

* IPC 2.0 (#2487)

* IPC 2.0

* Use nano::locked for confirmation subscriber vector

* Remove unused local

* Update toml tests, fix some const issues

* Some access permission improvements

* Comments and nano::locked improvements

* Guilherme review feedback: formatting, advanced cmake flags, disallow deny for roles

* Wesley feedback

* Try to please win build on CI

* Add generated file to git ignore

* install api/flatbuffers/* depending on platform

* correct path for api in MacOS app

* add api to docker image

Co-authored-by: Russel Waters <vaelstrom@gmail.com>

* Log stable filename (#2534)

* Support stable log filename on rotation

* Add to toml test

* warnings: Remove some unused locals and captures (#2583)

* tsan: race in telemetry::ongoing_single_request_cleanup (#2584)

* Check against op aborted on secure rpc acceptor shutdown (#2582)

* [TSAN] confirmation_height.cemented_gap_below_receive test (#2586)

* [TSAN] vote_processor.flush test (#2587)

* friendly backtraces in actions (#2591)

* symlink to backtrace.h in dockerfile
Add defines for BOOST_STACKTRACE_BACKTRACE_INCLUDE_FILE
update testing scripts to handle clang linking to gcc backtrace.h
symlinked location
Continue on Error in ps1 scripts

* use boost 1.70 for clang to allow for `BOOST_STACKTRACE_BACKTRACE_INCLUDE_FILE` definition
revert win actions guards in assert_internal

* Output stacktrace with custom debug assert (#2568)

* Output stacktrace with debug assert

* Error on CI if assert is used

* Remove <cassert> and add debug_assert to platform specific dirs

* Fix CI regular expression

* Missed updating assert for newly added files

* Remove vote cache & generate new vote if election winner is changed (#2585)

* Improve confirmation_solicitor.batches test (#2580)

* prevent self connection
* enable UDP
* add missing deadline

* Add timestamp to telemetry responses (#2573)

* [RocksDB] Limit write locks to necessary tables (#2592)

* Reworking confirmation_height.dependent_election test so it's less prone to race conditions. (#2589)

* Remove dropped_election_cache in preparation for election refactor. (#2590)

* Remove dropped_election_cache in preparation for election refactor.
* Removing unused parameter.

* Fix rpc telemetry test timestamps (#2595)

* Configurable inactive votes cache size (#2579)

* Use C++17 locally, C++14 on CI (#2597)

Co-authored-by: Russel Waters <russel@nano.org>

Co-authored-by: Russel Waters <vaelstrom@gmail.com>

* Block work version (#2599)

As of this change, work version refers to different work algorithms. Each block type can unequivocally return its work version. A new algorithm will likely need a new state block definition, to avoid ambiguity, although it could be inferred by trying to validate with both algorithms and then setting the version to the one that was valid (if any).

This change allowed removing work_version as a parameter for some work_validate and work_generate overloads. The work watcher can also infer the version from the block it is watching.

* Minimize work validation calls (#2577)

With this PR there should only be one work_validate call per block until it is processed (more are then done to get the difficulty; a future PR will also remove the need for those).

* Start vote generator for changed winner only if voting is enabled (#2593)

* Reduce active mutex locking with election winner details (#2600)

* Reduce active mutex locking with election winner details

* Make sure root is removed before accessing election

* Serg comment on existing code

* Add flag for vote_processor capacity, and tests (#2575)

Slightly limited in testing due to being unable to lock the vote processor loop (on the other hand, the block processor can be paused by starting a write transaction).

- Add node flag vote_processor_capacity, similar to the block processor flags
- Remove check to always process for the test network, the capacity is more than enough
- The representative levels stay the same but are now calculated on the fly from the given capacity
- ::vote() now returns a boolean representing whether the vote was processed, both for testing purposes and for use by the upcoming network filter

* Don't use active transactions mutex lock during confirmation solicitor.prepare () (#2602)

* Aggressive flooding for local blocks (#2549)

* Aggressive flooding for local blocks

Floods locally produced blocks (going through the work watcher, or process_local - wallet, RPC process) such that they are sent to all PRs and a random subset of other peers.

This allows more protection against Sybil attacks, and more streamlined elections as there are fewer occasions of votes arriving before blocks.

As a consequence, the average amount of echoes will increase by 1 for PRs. However, with this change in place and used by most of the network, a future enhancement would be to reduce the republishing fanout.

This change is based on a proposal by @Srayman. For more information see the original proposal at https://medium.com/nanocurrency/proposal-for-nano-node-network-optimizations-21003e79cdba

* Add flag to disable republishing in block processor and a test for aggressive flooding

* Adjust timings for sanitizer builds

* Wrap in a lambda to simplify test, and adjust it

* Block difficulty and work validation cleanup (#2601)

Blocks now have a method to calculate and return the difficulty based on their work value, work version, and root. Work validation has been cleaned up, removing optional output difficulty from all methods, and with new work_difficulty and work_threshold methods. These are now used where they make sense, such as when previously work_validate was done even though we only wanted to compare the resulting difficulty. Functionally the same, but much less verbose.

Without changing the interface, we can decide to cache the difficulty in the block in the future, if it ever shows up in profiling.

Further changes will be required when a new work version is added, especially if the proof is larger than 8 bytes, but these changes are a step forward in that direction.

* Bootstrap attempts and connections/pulls separation (#2499)

- New class `nano::bootstrap_connections` to manage client connections & bulk pulls
- Separate source file for connections
- Restored bootstrap clients list (to close connections with `stop` command)
- Parent class `nano::bootstrap_attempt` & child classes for legacy, lazy & wallet bootstraps
- Separate source files for bootstrap attempts & lazy attempts
- Allowing several concurrent bootstrap attempts (currently for different bootstrap modes, in the future can be easily modified to allow same concurrent modes)
- Separate `nano::bootstrap_attempts` class for fast attempt search with incremental ID
- Bulk pull info is modified to include bootstrap attempt incremental ID
- `force` option in bootstrap RPCs & RPC "bootstrap" are currently designed to close all concurrent attempts
- Config field `bootstrap_initiator_threads` to manage bootstrap concurrency. Default 2 for multithreaded systems, 1 for single-threaded & test network
- RPC "bootstrap_status" is modified to show all bootstrap attempts & new connections class
- Reduced lazy bootstrap attempts memory consumption: processed blocks unordered map is modified to store only 64 bit hash of block instead of full 256 bit union
- Fixed `websocket.bootstrap_exited` test TSAN warnings

* Utility: nano::optional_ptr (#2605)

* optional_ptr

* Implement feedback from Wesley/Guilherme

* CMakeLists backwards compatibility (#2607)

use version checks supported on older cmakes
use at least 3.11 FindBoost.cmake as import targets are not properly
exposed on previous versions

* Lock before stopping when it is necessary to notify other threads (#2608)

This prevents having all threads waiting and the system freezing. Mostly important for tests, especially with sanitizers, but could happen on stopping the node normally.

Portmapping requires an atomic bool, and the database queue doesn't after this change.

Other small related bootstrap changes included in this PR per Serg's suggestion.

* Simplify telemetry data processing (#2598)

* Simplify telemetry data processing

* Stein review comments

* Fix network.replace_port

* Launch flag --allow_bootstrap_peers_duplicates (#2606)

Allowing multiple connections to the same bootstrap server (useful in tests or for the fastest beta node sync)

* Fix intermittent send_node_id_handshake unit test failures (#2612)

* Update preconfigured_peers comment regarding port (#2616)

* update url for sourceforge (#2617)

* Election state refactor (#2535)

Converting the election class into a state machine instead of working on rebroadcast iterations.

* Allow CLI --config values for inactive node tests (#2594)

* Attach sideband to block (#2596)

* Move epoch from secure to nano_lib

* Attach sideband to block and always (de)serialize along with it

Using the sideband is only valid for blocks that were processed with code `progress`, otherwise they may not be set (important examples: old, fork).

* Make sideband optional to ensure usage correctness

* Interim

* Use new nano::optional_ptr to hold the sideband

* Adjust some tests to ensure two nodes don't simultaneously process the same block object

* Update comment

* Unchecked deletion (#2609)

* Host qt assets for windows on s3 (#2622)

* [RocksDB] Tests not reading account count from store correctly (#2623)

* Fix confirmation_height.gap_live intermittent test failure (#2621)

* Simplify request aggregator mutex lock behavior (#2614)

* Use the sideband when available in ledger.is_send (#2620)

* confirmation_height.modified_chain test fails on a non-debug build (#2624)

* tsan fix: rpc.wallet_destroy (#2615)

* Improve --debug_profile_bootstrap performance (#2626)

block sideband is not required for some CLI tests

* Election refactor follow up (#2619)

* active_transactions include cleanup

* Allocate a new solicitor on every confirmation request loop

* Increment election confirmation_request_count

This was accidentally erased in the refactor and there was no test to ensure it. One test removed as it was not testing the intended functionality. Other tests were updated to ensure confirmation_request_count is incremented. It was necessary to disable the rep crawler to properly test the confirmation loop.

* Ensuring confirmed elections are not returned in RPC confirmation_info

* Adding information on confirmed blocks for RPC confirmation_active

* Moving election status definitions to secure/common

* Re-lock after activating dependencies (bug found by @cryptocode)

Otherwise, could call state_change to `expired_unconfirmed` without owning the mutex

* Enhance node.activate_dependencies test by ensuring full confirmation (Serg review)

* LMDB sync options and new config settings (#2588)

* LMDB sync options and new config settings

* Force sync always for wallet store

* Update adjusted difficulty in batches (#2604)

* Update adjusted difficulty in batches each request loop because ordered roots are used only in this loop. Also prevent extra item modification if adjusted difficulty remains the same (i.e. single block election without dependencies in roots container)
* Use new sorting only if mutex lock was removed for frontiers search
* Adjust dependent blocks difficulty

* Make confirmation_solicitor.batches more robust under tsan (#2628)

* Make network.replace_port more robust under tsan (#2630)

* Add ASSERT_TIMELY (#2633)

* Add ASSERT_TIMELY and apply to request_aggregator.one

* Add lambda version
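
The idea behind a timed assertion can be sketched with a plain polling helper. This is not the `ASSERT_TIMELY` macro itself, whose definition is not shown here. The function `wait_until_true` and its parameters are hypothetical, and it sleeps between checks where the real tests would poll the test system.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: re-checks `condition` until it holds or `timeout`
// elapses. Returns true if the condition became true in time.
bool wait_until_true (std::chrono::milliseconds timeout, std::function<bool ()> const & condition, std::chrono::milliseconds interval = std::chrono::milliseconds (1))
{
	assert (condition); // an empty std::function cannot be polled
	auto const deadline = std::chrono::steady_clock::now () + timeout;
	while (!condition ())
	{
		if (std::chrono::steady_clock::now () >= deadline)
		{
			return false;
		}
		std::this_thread::sleep_for (interval);
	}
	return true;
}
```

A test would then assert on the return value, for example `assert (wait_until_true (std::chrono::seconds (3), [&] { return counter == 1; }));`, instead of checking the condition once and failing on a slow machine.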

* split docker artifacts to a separate job (#2636)

* Read transaction scope in active (#2640)

* Fix request_aggregator unit tests (#2632)

* Fix request_aggregator unit tests

With #2614 cached votes are sent after removing the request and unlocking the mutex, making it necessary to wait for confirm_ack count to be updated.

* Using ASSERT_TIMELY

* gather sha256 hashes of artifacts and upload to s3 with artifacts (#2647)

* gather sha256 hashes of artifacts and upload to s3 with artifacts

extra whitespace changes in powershell scripts

* typo

* Read config file for CLI commands (#2637)

* Use attempts list for TCP channels (#2581)

* Use attempts list for TCP channels to prevent multiple concurrent connections start to same peer
* Erase from attempts list after realtime TCP connection failure (udp fallback function) or success (insert function)
* Explicitly close sockets after realtime TCP connection start failure
* Use tags for TCP & UDP attempts lists
* New test for max attempts
* Limit max peers per IP for live & beta networks to 5
* Debug assert if there is limit overflow in tests
* And special flag to allow using more connections

* Fix wrong number of representatives in confirmation solicitor (#2648)

* Bandwidth considerations following election refactor (#2646)

* Bandwidth considerations with election refactor

- Block broadcasting is used as a backup mechanism, now only done after the first 20 seconds and every 20 seconds thereafter
- Reduced send_confirm_req period from 15 to 5 seconds, as it is the primary mechanism and only targets representatives that haven't voted yet
- Increased time to activate dependencies ensuring at least one block broadcast is performed

* Initial flood for elections created via node::block_confirm

* Revert "Initial flood for elections created via node::block_confirm"

Following Colin review, this reverts commit bdb200ab09ad3e2c0cb7b709045599c374c8695f.

* Telemetry results not correctly using cache timeout (#2650)

* Increase active elections capacity with periodic full checks (#2641)

* Confirmed to expiry fixed at 5 seconds

* Only increase the election counter for unconfirmed elections

* Increase default active elections size to 50k but only perform a full check periodically

* Add timing logging for large number of elections (every 5 seconds max)

* Adjust timing logging for block processor

* Tune search_frontiers such that there are no delays in aggressive mode but also not too many added

* Add erroneously removed mutex lock in https://github.com/nanocurrency/nano-node/pull/2619

* Feedback on variable name

* Empty commit to fix actions

* Tweak blockprocessor logging to resemble previous behavior with disabled timing logging

* Network duplicate filter for publish messages (#2643)

* Filter duplicate publish messages before deserializing

When a message is unique, the digest is saved and passed around to network processing, which may drop it if the block processor is full.

Cleaning up a long unchecked block erases its digest from the publish filter.

The blocks_filter has been removed due to redundancy. The size of this filter is 256k, which uses about 4MB.

* Batch erase in unchecked_cleanup due to a potentially large list

* Erase representatives with full queues when adding to confirmation solicitor (#2649)

With this change less processing happens for elections that won't fit in the solicitor.

* Include requesting telemetry metrics from temporary channels (#2653)

* Include requesting telemetry metrics from temporary channels

* Read transaction no longer needed (Gui comment)

* Parallelize state block signature verification with block processor (#2570)

* Parallelize state block signature verification with batch block processing

* Make sure all state blocks are flushed too

* Serg review comment about notification cleanup

* Use half the amount of threads for the signature checker

* Remove multithreaded cutoff

* Use n/2 extra threads. Handles odd number of CPU threads too.

* Formatting

* Simplify expression (Gui comment)

* Update tick count in timer::update (#2655)

* Update tick count in timer::update

* Return ticks from restart

* Delay inactive/gap cache bootstrap start for 30 seconds (#2631)

instead of 5 seconds, to allow more frequent arrival of new blocks via the realtime network & less usage of expensive bootstrapping

* Signature checker blocking fix (#2659)

* Signature checker blocking fix

* Change variable name to remove shadowing

* Fix node.aggressive_flooding (#2656)

* Fix node.aggressive_flooding

This test was resorting to bootstrapping as backup when it should fail instead. Now the test has bootstrap disabled, and the fix was to locally process the genesis chain in each node, ensuring all representatives will be able to connect the last blocks.

With a sanitizer, fewer nodes are used because the repcrawler timings can remove one or more representatives during the test, making it fail.

Disabling aggressive flooding in blockprocessor::process_live makes the test fail, as expected. Also did some cleanup using ASSERT_TIMELY.

* Also decrease number of nodes and change timings under valgrind

* Improve confirmation consistency (#2625)

* Test for confirmation consistency

* Remove unnecessary transaction (Gui review comment)

* Protect access to root container (Gui found during TSAN run)

* Gui review comments

* Remove checking for inserted in add_recently_confirmed

* typo on windows sha256 Out-File (#2660)

write out sum on nix to directory we have write permission on

* Increase minimum time to log block processing (#2661)

* Increase minimum time to log block processing

* Revert change to vote processor timings

* Output current function in assert diagnostics (#2665)

* Sign telemetry messages (#2618)

* Sign telemetry messages

* Stein review comment about reusing ed25519 wrappers

* Remove unused header

* Add IGNORE_GTEST_INCL define to node.cpp

* Definitions for work thresholds with epoch_2 (#2638)

* Definitions for work thresholds with epoch_2

This article outlines the decision process for these thresholds: [Development Update: V21 PoW Difficulty Increases](https://medium.com/nanocurrency/development-update-v21-pow-difficulty-increases-362b5d052c8e).

Only adds definitions, `nano::work_threshold (version, details)` is not yet used anywhere besides tests.

* Add temporary work validation in blockprocessor::add until ledger validation is in

* Formatting

* Check if a vote is for a recently confirmed block (#2663)

* Check if a vote is for a recently confirmed block

* Formatting

* Use count() as the iterator is not necessary (Wes comment)

* clang 8 included in actions env (#2670)


* ensure we use clang-format-8

* Move all request aggregator operations out of the mutex hold scope (#2662)

* Move all request aggregator operations out of the mutex hold scope

* Accidentally removed test assert (Wes review)

* Remove telemetry message versions (#2610)

* Add telemetry response websocket callbacks (#2634)

* Add telemetry websocket callbacks

* Websocket additions

* Remove TODOs

* Directed block broadcasting for long elections (#2505)

* Split election block broadcasting into directed and random floods

* Formatting

* Assert no error on system.poll_until_true (cc comment)

* As one line

* Simplify and explain the two copies (Serg comment)

* Remove peers with different genesis block or invalid telemetry signature (#2603)

* Remove peers with different genesis block after tcp node handshake

* Formatting

* Handle case of valid signature but not matching node_id

* Add UDP channel removal too

* Check node_id for mismatch with channel to save sig check (Serg review)

* Update from merge

* Move erase code to network class

* Optimize mutex access when adding blocks to block processor (#2676)

* Delay voting for non-priority elections under saturation (#2666)

* Delay voting for non-priority elections under saturation

This is part of https://github.com/nanocurrency/nano-node/pull/2440 . Using a similar strategy of saving the last prioritized difficulty during `update_active_difficulty()`, the result of inserting an election now includes another boolean, hinting that the election may not be a priority due to a large number of active elections and low difficulty of the inserted block.

Active difficulty is now calculated only from these prioritized elections. The cutoff is defined at 10% of the `active_elections_size` config which is now 50k. During `request_confirm`, the top elections are prioritized.

When an election is prioritized, a vote is generated for the current winner.

* Make sure to have at least one element when updating last_prioritized_difficulty

* Prevent getting stuck in block processor flush (#2675)

* Prevent getting stuck in block processor flush

I noticed `node.block_processor_reject_state` was often freezing on Windows. This is due to the verification callback being called and notifying the block processor before transitioning into an inactive state, so `block_processor::flush` ends up waiting for the condition forever. I've added a second callback to solve this.

* Lock before notifying to prevent a race with condition.wait; check if flushing first

* Change the test to launch async and wait for future, to prevent freezing in the future but still fail

* Comment formatting

* Add comment on why lock before notifying (all Wesley review)
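
The race described above, where a notification can be sent before the waiter observes the state change, can be sketched as follows. This is a minimal illustration assuming a hypothetical `flush_gate` class; it is not the node's `block_processor`, and all member names are invented for the example. The state change and the notification both happen while holding the mutex, so a thread in `wait_inactive ()` either sees the updated flag when it checks the predicate or is already waiting when the notification arrives.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a processor's active/inactive state (not the node's class)
class flush_gate
{
public:
	// Called by the worker once it has nothing left to process
	void mark_inactive ()
	{
		// Lock before changing state and notifying, so the change cannot slip in
		// between a waiter's predicate check and its call to wait
		std::lock_guard<std::mutex> guard (mutex);
		active = false;
		condition.notify_all ();
	}

	// Blocks until the worker has become inactive
	void wait_inactive ()
	{
		std::unique_lock<std::mutex> lock (mutex);
		condition.wait (lock, [this] { return !active; });
	}

	bool is_active ()
	{
		std::lock_guard<std::mutex> guard (mutex);
		return active;
	}

private:
	std::mutex mutex;
	std::condition_variable condition;
	bool active{ true };
};
```

Using the predicate overload of `wait` also covers spurious wakeups and the case where the state changed before the waiter started waiting.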

* Refactor work thresholds as nano::work_thresholds (#2672)

Once we move to c++17, these can be constexpr-defined in lib/config.hpp.

* Validate work difficulty during ledger processing (#2667)

Block work is now validated during ledger processing according to their epoch version and some block details. The following article details the reasoning for this change: [Development Update: V21 PoW Difficulty Increases](https://medium.com/nanocurrency/development-update-v21-pow-difficulty-increases-362b5d052c8e).

When an account is upgraded with epoch 2, blocks must meet the new thresholds or they are discarded during ledger processing. Before ledger processing, an initial work validation is performed on node entry.

* Modify inactive cache conditions to prevent multiple insertions for confirmed entries (#2674)

* Fix intermittent wallet.work_watcher_update failure (#2677)

This was due to the blocks sometimes having a difficulty larger than `limited_active_difficulty` (min between trended_active_difficulty and `config.max_work_generate_difficulty`). Forcing the config value to a high value and the trend to immediately above the highest difficulty between the blocks is sufficient.

* Handle epoch_2 work thresholds in the wallet and most RPCs (#2671)

* Handle epoch_2 work thresholds in most RPCs

This PR is quite large with work generation and validation being a central piece of the node.

In lib/config, work thresholds have been added for the beta and test network. They are now placed under a `nano::work_thresholds`. Unlike the main network, the thresholds are 2x and 1/2x from the base threshold. For tests, the previous base threshold was lowered such that the new highest difficulty (2x) is the same as before. The reason for this is that work generation targets the highest difficulty.

`nano::work_validate` was renamed to `work_validate_entry`, further validation must be done by using `work_threshold_entry` and `work_threshold_full`.

Work generation was changed to always require `difficulty`, except for tests. The reason is that any caller should not assume a base difficulty anymore.

These RPCs now fully support the new thresholds and are ready for epoch_2 (will add some tests):
- `process` validates at the minimum, entry difficulty, and handles the new `insufficient_work` result from ledger processing (used in https://github.com/nanocurrency/nano-node/pull/2667)
- `send` and `account_representative_set` get the account epoch version and target the correct difficulty
- `receive` additionally checks if the `source` epoch is higher, allowing new epoch propagation
- `epoch_upgrader`

These RPCs have changes to be compatible but are not yet optimal:
- `block_create` now accepts an optional `difficulty`, and will generate at the highest difficulty otherwise (8x on mainnet).
- `work_generate` also generates at the highest difficulty if `difficulty` is not specified, and multipliers are now off the highest difficulty.
- `work_validate` validates at the highest difficulty unless `difficulty` is specified, meaning this is a breaking change and it can return "not valid" for valid blocks. Multipliers are now off the highest difficulty.

Support is also limited for active difficulty. It is currently calculated off the `epoch_1` difficulty to avoid any changes.

All these cases will be handled in separate PRs, as this one is large enough as-is.

* Define in source file

* Fix websocket active_difficulty

* Extract system::work_generate_limited

* Fix debug_asserts in active_transactions::update_active_difficulty and add a test to ensure it

* Add disabled test for RPC process, to be enabled in https://github.com/nanocurrency/nano-node/pull/2667

* Add auxiliary system::upgrade_genesis_epoch_2

* Set sideband block details for legacy blocks, unused for now

* Higher amplitude between thresholds for the test network for easier testing

* Handle work thresholds in the node wallet

* Tests validating the node wallet handles thresholds

* Final tests and adjustments

* Fix ASSERT_NE

* Add test ensuring the reduced work is also used when opening accounts previously upgraded

* Ensure blocks from wallet meet the minimum difficulty

* No need to change sideband for legacy, since the wallet never uses them

* Enable test rpc.process_ledger_insufficient_work

* Fix a couple of RPC tests being slow due to testing unrelated functionality

* Fix intermittent failure in active_transactions.confirmation_consistency (#2682)

* Fix intermittent failure in active_transactions.confirmation_consistency

`recently_cemented` is only changed in the conf height observer callbacks. This is the intended behavior, but the test was intermittently failing under TSAN.

* Empty commit

* Consistently add conflicting block to election (#2652)

* Add conflicting block after creating the election

* Add a test ensuring correct behavior

* Optimize vote post-processing operations (#2627)

* Add return flag for rep_crawler::response

* Optimize vote post-processing operations

This can be done now that confirmed elections linger for 1 to 10 seconds in active roots.

The rep crawler is always checked, but online weight only for live votes or for rep crawler queries. Gap cache only for indeterminate votes.

* Epoch 2 started flag in ledger cache (#2684)

* Epoch 2 started flag in ledger cache

This flag is flipped when the first epoch 2 block is successfully processed.

Differences in behavior after the flag is flipped:
- RPC `work_generate` uses the new epoch 2 threshold as default
- RPC `work_validate` validates for the new threshold as default (breaking behavior in previous PR)
- RPC `block_create` uses the new threshold as default
- Node wallet pre-caches work at the new threshold

* Adjust tests for node default difficulty

* Also affect work websocket multiplier, and change default test work generate difficulty

* Fix rpc.work_validate test

* Ensure state is kept on ledger initialization (node restart)

* Increase minimum supported protocol version to 17 (#2683)

Node version 19

* Improve automatic frontiers confirmation (#2686)

* Improve automatic frontiers confirmation

* Refactor code into function and move up a level (Gui comment)

* Typo semi-colon

* Conditions need reversing

* set timeout to 1hr, tests historically complete before then (#2687)

move clang-format to separate workflow for additional static analyzers

* Delay wallet work caching to allow using lower difficulty on demand (#2680)

* Delay wallet work caching to allow using lower difficulty on demand

* Use node.default_difficulty for the threshold

* Work version parameter in default_difficulty and use it in more places (#2690)

* Receive work version in default_difficulty and use it in more places

* Empty commit to trigger actions

* Update test

* Move excluded_peers to network (#2693)

* Moves peer exclusion code into its own headers, moves the object from bootstrap initiator to network, some general cleanup and adds a validation test

* Get limited size from method

* Compositing peer_exclusion for container info collection (Wes review)

* Fix container info collection and include in test

* Wrapper for RPC worker tasks (#2692)

* Wrapper for RPC worker tasks

* Pass the rpc ptr by const ref (Wes review)

* Remove assert if a delayed work cache request is not found (#2696)

In https://github.com/nanocurrency/nano-node/pull/2680 I erased from `delayed_work` in `wallet::action_complete` to prevent queueing an outdated request, but this can make it assert when the request is not found later. Removing the assert is enough as the default action is to not do anything about it, which is correct.

* Fix qt test wallet.seed_work_generation (#2695)

* Epoch open blocks should have corresponding pending entries (#2673)

* Epoch open blocks should have corresponding pending entries
* Simplify pending check condition
* Add block store function pending_any ()

* Move vote generator calls into election code (#2688)

* Generate votes for:
    - started election (if prioritized)
    - changed winner
    - election prioritization
* Move votes generator from block processor to active transactions

* updates for fuzzer (#2697)

* Fix qt tests failing to click radio buttons (#2699)

* Rate limiting using token buckets (#2645)

* Rate limiting with token bucket

* Add burst ratio setting (3x default) and unlimited test

* Better handling of largest_burst with unlimited config

* Use steady_clock everywhere

* Count tokens even for undroppables, use unsigned data type and clarify refill expression

* Use debug_assert

* Guilherme review feedback: use floating point ratio, and log it on node startup

* Bandwidth drop stats and toml test updates
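
For readers unfamiliar with the technique, a token bucket with a burst ratio can be sketched as below. This is an illustrative sketch only: the class name `token_bucket_sketch`, its interface, and the millisecond refill granularity are assumptions made for the example, not the node's implementation. The bucket holds at most `rate * burst_ratio` tokens, refills at `rate` tokens per second, and the current time is passed in explicitly (as a `steady_clock` time point) so the behavior is easy to reason about.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Hypothetical sketch of a token bucket; names and layout are illustrative
class token_bucket_sketch
{
public:
	using clock = std::chrono::steady_clock;

	// rate_a: tokens refilled per second; burst_ratio_a: capacity as a multiple of the rate
	token_bucket_sketch (std::size_t rate_a, double burst_ratio_a, clock::time_point now_a) :
	rate (rate_a),
	capacity (static_cast<std::size_t> (static_cast<double> (rate_a) * burst_ratio_a)),
	tokens (capacity),
	last_refill (now_a)
	{
	}

	// Returns true and removes the tokens if enough are available, false otherwise
	bool try_consume (std::size_t amount_a, clock::time_point now_a)
	{
		refill (now_a);
		if (amount_a > tokens)
		{
			return false;
		}
		tokens -= amount_a;
		return true;
	}

	std::size_t available () const
	{
		return tokens;
	}

private:
	// Adds tokens for the time elapsed since the last refill, capped at capacity
	void refill (clock::time_point now_a)
	{
		if (now_a <= last_refill)
		{
			return;
		}
		auto elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds> (now_a - last_refill).count ();
		auto refill_amount = static_cast<std::size_t> (elapsed_ms) * rate / 1000;
		if (refill_amount > 0)
		{
			tokens = std::min (capacity, tokens + refill_amount);
			last_refill = now_a;
		}
	}

	std::size_t rate;
	std::size_t capacity;
	std::size_t tokens;
	clock::time_point last_refill;
};
```

With a rate of 1000 and a burst ratio of 3, the bucket starts with 3000 tokens; once drained, half a second of elapsed time makes 500 tokens available again. The sketch does not model the unlimited configuration or undroppable messages mentioned in the commits above.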

* Using relaxed atomics for counts not involved in control flow in conf height processor (#2651)

* Using relaxed atomics for counters in conf height processor

* Set memory order for pending_writes_size (Serg review)

* Missed modifying confirmed_iterated_pairs_size

* Use relaxed atomic wrapper

* SFINAE out most invalid types

* Using incorrect compare_exchange

* Add missing type helper alias
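
A relaxed atomic wrapper of the kind mentioned above might look roughly like this. The sketch assumes a hypothetical `relaxed_counter` template restricted to integral types through SFINAE; the name and the exact set of operations are illustrative and not taken from the node's source. Relaxed ordering only guarantees atomicity of each operation, which is why it suits counters that do not take part in control flow.

```cpp
#include <atomic>
#include <cstdint>
#include <type_traits>

// Hypothetical wrapper: every operation uses std::memory_order_relaxed.
// Only integral types are accepted; other types are rejected via SFINAE.
template <typename T, typename = std::enable_if_t<std::is_integral<T>::value>>
class relaxed_counter
{
public:
	relaxed_counter (T value_a = 0) :
	value (value_a)
	{
	}

	// Pre-increment, returns the new value
	T operator++ ()
	{
		return value.fetch_add (1, std::memory_order_relaxed) + 1;
	}

	// Pre-decrement, returns the new value
	T operator-- ()
	{
		return value.fetch_sub (1, std::memory_order_relaxed) - 1;
	}

	T load () const
	{
		return value.load (std::memory_order_relaxed);
	}

private:
	std::atomic<T> value;
};
```

For example, a `relaxed_counter<std::uint64_t>` incremented twice and decremented once reads back as 1.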

* Prevent more rare deadlocks due to races for condition variables (#2706)

* Prevent more rare deadlocks due to races for condition variables

* Use void

* Revert unnecessary condition check

* Revert unnecessary lock to stop

* Prevent reconnecting to excluded peers with sufficient score. (#2694)

* Incorrect cemented count during conf height algo transition (#2664)

* Incorrect cemented count during transition in some circumstances

* Better test name

* Update test to use new construct after merge

* Active difficulty normalization (#2691)

* use multiplier instead of difficulty in conflict_info as base
* normalize multiplier for different epochs & blocks types based on sideband information
`normalized = (multiplier + (ratio - 1)) / ratio;`
* use adjusted & active multipliers instead of difficulties
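
As a worked illustration of the normalization formula above, the function below is a direct transcription of it. The function name and the sample values are assumptions chosen for the example; `ratio` is treated as a plain parameter here, and how the node derives it from epoch and block type is not shown.

```cpp
// Direct transcription of: normalized = (multiplier + (ratio - 1)) / ratio
double normalize_multiplier (double multiplier, double ratio)
{
	return (multiplier + (ratio - 1.0)) / ratio;
}
```

Two properties follow from the formula: a multiplier of 1 stays at 1 for any ratio, and a ratio of 1 leaves the multiplier unchanged. With an arbitrary ratio of 8, a multiplier of 9 normalizes to (9 + 7) / 8 = 2.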

* Difficulty calculation for RPC block_create (#2703)

* Difficulty calculation for RPC block_create
* Use count to find out if difficulty is defined

* Ensure max_work_generate_difficulty is updated when changing the default difficulty (#2705)

* Ensure max_work_generate_difficulty is updated when changing the default difficulty

* Fix broken RPC tests

* [TSAN] lock order inversion in active transactions / wallet (#2711)

in core_test websocket.confirmation_options

* Add node sequence for tests (#2712)

* Remove "valid" from RPC work_validate if difficulty is not explicit (#2689)

This is a semantics change for RPC `work_validate`, due to the transition to new work levels with epoch 2.

Two new fields are added:
- `valid_all`, true if the work is valid at the **current default difficulty** (updated to epoch_2 levels after the first epoch_2 block is processed)
- `valid_receive`, true if the work is valid at the lower epoch_2 receive difficulty

If difficulty is not explicitly given, then the `valid` field is not included in the response, to break integrations loudly.
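
An integration consuming this response could handle the optional `valid` field along the following lines. This is a hedged sketch: the response is modeled as a plain string map, the helper name `work_accepted` is hypothetical, and the encoding of booleans as `"1"` / `"0"` strings is an assumption rather than something stated in this PR description.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// The parsed RPC response, modeled as a flat key/value map for illustration
using rpc_response = std::map<std::string, std::string>;

// Hypothetical helper deciding whether an integration should accept the work.
// Assumption: boolean fields are encoded as the strings "1" and "0".
bool work_accepted (rpc_response const & response, bool is_receive)
{
	auto valid = response.find ("valid");
	if (valid != response.end ())
	{
		// Only present when difficulty was given explicitly in the request
		return valid->second == "1";
	}
	// Without an explicit difficulty, choose the field matching the block type
	auto field = is_receive ? "valid_receive" : "valid_all";
	auto existing = response.find (field);
	if (existing == response.end ())
	{
		throw std::runtime_error ("unexpected work_validate response");
	}
	return existing->second == "1";
}
```

Failing loudly when neither field is present mirrors the intent of the change: an integration that silently assumed `valid` would otherwise keep running with the wrong semantics.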

* Fix intermittent failure in ledger.work_validation due to random work being above threshold (#2708)

* Improve telemetry request/response under load (#2669)

* Improve telemetry request/response under load

* Fix broken merge

* Extend response time to 10s on beta/live

* Some minor cleanup and extra comments

* Formatting

* Add missed break statement in stats

* Add Security Policy file (#2700)

* Initial draft of security document

* Minor formatting issues

* Add links to security fix PR and release

* Minor typo fix

* Add guilhermelawless - guilherme@nano.org GPG key

* Add link (only works once merged) to guilherme.asc

* Update GPG key links and email addresses

* Adjust gpg key files and links

* Colin LeMahieu signing key.

* Wesley signing key

* Sergey Kroshnin signing key

Co-authored-by: Guilherme Lawless <guilherme@nano.org>
Co-authored-by: Guilherme Lawless <guilherme.lawless@gmail.com>
Co-authored-by: clemahieu <clemahieu@gmail.com>
Co-authored-by: Wesley Shillingford <wezrule@hotmail.com>
Co-authored-by: Sergey Kroshnin <sergiysw@gmail.com>

* Asynchronous epoch upgrade RPC (#2704)

* Asynchronous epoch upgrade RPC
* Add slow_tests version with more accounts
* Prevent rare deadlocks due to races for condition variables
Co-authored-by: Guilherme Lawless <guilherme@nano.org>

* Vote generator session for batch insertions (#2702)

* Vote generator session for batch insertions
Restoring removed in PR https://github.com/nanocurrency/nano-node/pull/2688
* Make vote_generator_session single threaded
* Vote generator session test

* Dont peer with v20 and earlier after epoch 2 block is seen (#2701)

* Don't actively peer with v20 nodes when an epoch 2 block is seen

* Drop peers and other improvements

* Formatting

* Update changes due to merge

* Use exchange (Gui comment)

* Ignore in-progress messages and purge any channels during periodic cleanup in case any are missed (Gui test)

* Use variable names for the constants (Serg review)

* [TSAN] start_time data race in bootstrap_client (#2698)

* Ensure propagation and removal for the work watcher (#2709)

* Ensure propagation and removal for the work watcher

- Allow removing a block by its root, even if the hash does not match
- Simplify (2 less indent levels) `work_watcher::watching`
- Test suite ensuring removal and propagation in different conditions

* Remove duplicate test

* Simplify work_watcher.generation_disabled test and reduce test time

* Check still watched after work generation, before flooding (Serg comment)

* Difficulty updates for elections with multiple blocks (#2710)

* Difficulty updates for elections with multiple blocks

`active_transactions::publish` now performs a difficulty update for a new block, ensuring elections with forks have equivalent priority across the network as long as they see the same blocks. `election->publish` is no longer called from `update_difficulty(_impl)` as it has no effect.

Difficulty normalization (https://github.com/nanocurrency/nano-node/pull/2691) made this more difficult as forks don't have a loaded sideband and we wish to avoid extra disk reads. https://github.com/clemahieu suggested storing the previous block's epoch and balance so that we can infer block details, implemented in this PR.

In elections for live blocks, the only case where it is not possible to correctly infer the threshold is during an epoch upgrade, for blocks performing an upgrade (epoch will mismatch from root, and the block itself might be an epoch block). This only affects prioritization and the window is short. In the future, this can be disambiguated by including some flags in the block itself.

New tests added ensuring correct difficulty updates for old blocks and forks.

* Check block previous zero first to avoid tx_begin_read (Serg review)

* Assign previous_balance for all blocks (Serg review)

* Use const ref (Wes review)

* Improve batching of writes in unbounded conf height processor (#2714)

* Epoch upgrader as an async task (#2718)

* Epoch upgrade on background instead of worker

Also fixes the RPC test not being multithreaded

* Epoch upgrade as an async task; fix limited count in multithreaded mode; improve tests

* Use node.epoch_upgrader directly in the slow test

* std min fix

* No need for std::bind (Wes review)

* Move TCP messages processing to network threads (#2613)

* Move TCP realtime messages processing to network threads
* Add timeout for run_next ()
* Always log network threads processing exceptions
* Remove unnecessary statements (Wesley)
* Comment for not removing requests.front ()

* Add difficulty and multiplier to CLI work generation commands (#2707)

* Preparation for building with shared boost (#2611)

* Preparation for building with shared boost
Can be tested currently by ensuring `-DBOOST_ROOT=<path_to_boost_root> -DNANO_SHARED_BOOST=ON` are set for cmake
This will also require you have locally built static boost 1.67+
added switch for shared/static linkage to bootstrap_boost.sh default to static

* tabs vs spaces

* Verbiage for clarity

* '-j n' switch for bootstrap_boost.sh

allows for multiple jobs to be built at once with b2

* Fix logic in active_transactions.prioritize_chains test (#2717)

* Add Flatbuffers schema evolution rules to IDL (#2644)

* changelog_generator refactored as changelog.py (#2722)

* changelog_generator refactored as changelog.py
mapping between sections and labels handled in SECTIONS
Other added as a catchall if there are no matching labels

* formatting errors

* Websockets Section added
reorder as PRs will only appear in one
Update header, Highlight breaking changes.
Sort Breaking changes to the beginning of the sections

* Full Changelog link

link for github compare between start and end references

* Swap --- before header

* Separate election state for the broadcasting block fallback (#2720)

Instead of basing only on time, a hard limit on confirmation requests helps avoid resorting to fallback mechanisms (block broadcasting, escalating to dependents) too soon.

This is implemented with a new state, broadcasting, and before moving to this state at least two confirmation requests must be done. This indirectly extends the time before moving to backtracking by 10 seconds.

Time between broadcasts reduced to 15 from 20 seconds.

* Remove confirmation requests for a new representative (#2721)

Because it's done in the request loop quite frequently

* Clarify nano_pow_server configs are not in use (#2724)

* Clarify nano_pow_server configs are not in use
* Additionally disable possible nano_pow_server launch in daemon (not used)

* Release write_guard lock when no longer required (#2716)

* Release write_guard lock when no longer required

* Use move constructor for write_guard

* Increase maximum values for various settings

* Formatting

* Remove unnecessary extra space in comment

* Update slow_tests

* Fix slow_test check

* Dynamically set batch_write_size based on previous write performance. Change is gradual to account for random spikes/slowdowns.

* Add a tolerance in case amount to cement is just above to save waiting on block processor for small amount of blocks

* Reduce batch_write_size in slow_tests now that it's configurable so that it takes less time

* Don't call release if there's no blocks which were cemented at the end

* Typo in comment (thanks Gui)

* Prevent yoyoing as much

* (Unrelated) Fix prioritize_frontiers_overwrite test

* Increase amount of time spent searching for frontiers when there is a low amount of active transactions

* Have a force_write in unbounded to be consistent with bounded which is based on blocks

* Typo in comment

* Modify heuristics for updating active multiplier (Gui comment)

* Give magic number a variable (gui)

* Fix incorrect comparison (Gui)

* Add public function to determine if write_guard is owned and use that (Gui)

* Allow restarting elections with higher work (#2715)

* Allow restarting elections with higher work

* Use debug_assert

* Simpler return variable (Wes review)

* Add missing transition_active, move insertion in recently_confirmed to election cleanup

* Test condition fix

* Some confirmed block observer callbacks being missed (#2723)

* Observer callbacks being missed

* Small optimization to potentially save read disk io

* Also reset variables after cementing all

* (Unrelated) Remove old comment

* Add check for election_winner_details in case of duplicate elections for same block

* Only check for inconsistency between blocks_confirmed and observer stats on beta/live

* Revert test changes which can go back to using confirmation_height_processor::add directly

* Update observer (Gui comment)

* Add active difficulty to node telemetry (#2728)

* Add active difficulty to node telemetry

* Variable name typo

* Use hex string to be consistent with the active_difficulty RPC

* Use a multi-index container to allow fifo queue for pending confirmations (#2730)

* Use a multi-index container to allow fifo queue for pending confirmations

* Use mi::identity (thanks Gui!)

* Missed variable update

* Can use count instead of iterators

* Safely read override values when no config file is present (#2727)

One of the toml read methods was not reading within a try/catch. Re-arranged the methods slightly to avoid duplicating code.

There was also this check before setting the error when reading from a stream:
```
auto pos (stream.tellg ());
if (pos != std::streampos (0))
```

Removing this check has all tests pass nonetheless, and now only the lowest level method is responsible for setting the internal tree and error.

* New stats for elections (#2731)

* New stats for elections

election_non_priority, election_priority, election_block_conflict, election_difficulty_update,
election_drop, election_restart

* Unnecessary explicit ctor delete (Wes comment)

* record_rep_weights to py3 (#2732)

* Allow starting more than max_peers_per_ip test nodes (#2735)

* Websocket new_unconfirmed_block (#2729)

* Obtain state subtype string from block details and use in two RPC methods

* (Unrelated) fix some compilation warnings

* New websocket subscription "new_unconfirmed_block" for newly arrived blocks.

* Websocket notification for RPC work_generate without peers (#2734)

* CLI compare_rep_weights to compare ledger and hardcoded weights (#2719)

* CLI compare_rep_weights to compare ledger and hardcoded rep weights

* Clarify the basis of comparison are the hardcoded weights not the other way around (Serg comment)

* Add const-qualifier to uint128_union::format_balance

* Add standard deviation (sigma) to the output

* Wrap sum in a lambda (Wes review)

* Log each individual mismatch sample

* Refactor and output outliers

* Eat one line (Wes)

* Wes review

* Change threshold to 1-sigma, also present new representatives (not present in hardcoded)

* Filter ledger weights to 99% cumulative weight (same as hardcoded)

* Reserve known and use alias (Wes comments)

* Remove invalid uses of epoch_1 work threshold (#2733)

Was affecting websocket active_difficulty. Tests for it and RPC active_difficulty now pre-upgrade the node to epoch2 to ensure the output is correct.

* Fix intermittently failing rpc.confirmation_height_currently_processing (#2737)

* Fix intermittent node_telemetry.remove_peer_different_genesis test (#2738)

* Fix rpc.wallet_history failures (#2739)

This was simply due to the first block getting auto received on confirmation so the `receive_action` would fail. Disabling voting solves it.

Removed the thread sleeps, they don't seem to be required for this test.

* CLI command for a frontier confirmation speed test (#2725)

* CLI command for a frontier confirmation speed test
* Use separate boost::asio::io_context for CLI test

* Fix system.generate_send_new intermittent failures (#2742)

This is likely due to wallet rep counts not updating quickly enough on CI due to online weight fluctuating very heavily in this test, causing votes to not be generated. Waiting for the online weight to stabilize by waiting on a voting rep should fix it.

Ran CI twice and didn't trigger whereas it would trigger often without this change.

* Tally votes on conflicting block with no inactive votes (#2744)

Mostly applicable to tests, but consider this sequence of events:
- An election gets created for a processed block
- A vote arrives for a conflicting block; gets added to election, not inactive
- The conflicting block gets processed

Currently, `election::publish` calls `insert_inactive_votes_cache` but votes are not tallied since there was no inactive vote.

This fixes the above situation by calling `confirm_if_quorum` if no votes were cached when a new conflicting block is inserted.

* Fix intermittently failing conflicts.adjusted_multiplier_test (#2745)

* Fix node.fork_invalid_block_signature intermittent failures, re-enable on windows CI (#2743)

* Fix minor test-specific intermittent failures (#2748)

* Fix intermittent failures in websocket.active_difficulty test

* Fix intermittent failures in websocket.bootstrap_exited test due to attempt finishing immediately

* Fix work_watcher.propagate by allowing higher max work generation difficulty

* Apply similar fix to node_telemetry.remove_peer_different_genesis_udp as the tcp version in https://github.com/nanocurrency/nano-node/pull/2738

* Move node_telemetry.all_peers_use_single_request_cache to slow_tests as it takes too long

* Fix intermittent failure in active_transactions.activate_dependencies due to the election being removed before full cementing (still in conf height processor)

* Don't check network size, peers are added for exclusion but they can still reconnect via UDP; explicitly request telemetry to make the test faster (UDP does not request telemetry on connection)

* Fix network.tcp_no_connect_excluded_peers test failure on MacOS (#2750)

* Result difficulty in RPC block_create (#2752)

* Fix Xcode IDE warnings (#2746)

* Serialize telemetry as big endian (#2751)

* Flood difficulty updates from RPC process (#2753)

* Flood difficulty updates from RPC process

This is currently done from the work watcher which doesn't go through ledger processing.

Now, there is a new `process_old` which floods the block if it was locally produced and there was a work update.

* (unrelated) assert no error on system.poll_until_true in a telemetry test

* [TSAN] race for system in test websocket.bootstrap_exited (#2757)

* Perform wallet representative action without holding any mutex (#2759)

* Perform wallet representative action without holding any mutex

 This relaxes restrictions on the action and avoids potential deadlocks through lock-order-inversion

* Relax debug_assert when adding to votes_cache as the representative counts can change even during a foreach_representative action

* Windows requires explicitly unlocking the mutex once locked

* Multithreaded --validate_blocks (#2749)

* Multithreaded --debug_validate_blocks

Block count: 33136176 (November live network snapshot)

develop branch: 8523 seconds validation
2 threads: 4587 seconds validation time
4 threads / 4 cores: 2673 seconds validation time
8 threads (Hyperthreading) / 4 cores: 1814 seconds validation time

Additionally, silence command execution to show only the validation status and error count

* Alias --validate_blocks
* Apply Wesley reviews
* Apply Guilherme reviews
* Apply Wesley review (emplace_back)

* Fix intermittent failure in test wallet.work_cache_delayed (#2760)

By random chance, the pre-set work could pass even for the blocks it's not intended for (due to test difficulty being low). The two removed lines are, in the end, out of scope for this test.

* Add cemented block log timings (#2762)

* Add cemented block log timings

* Reduce timer count () calls (Gui review)

* Votes from local representatives should not be flooded on processing (#2766)

* Add representatives cache in wallets, use in websocket and ipc

* Don't broadcast a processed vote from a local representative

* Revert changes to wallets::exists, add a test to ensure it can't be broken in the future (Serg review)

* Fix previous balance in active_transaction::insert () (#2767)

preventing possible segfault

* Optional "block" given to RPC "work_generate" to infer difficulty (#2754)

* Simplify block_impl () json/text retrieval
* Optional block for work_generate ()
* Test for "work_generate" with block
* Apply Guilherme review
* Difficulty from previous block function for "block_create" & "work_generate"

* Republishing a vote to principal representatives (#2772)

This behavior was recently changed as part of https://github.com/nanocurrency/nano-node/pull/2468 . This specific change has the least impact to bandwidth usage out of all changes in that commit.

The list of peers to which a new vote is republished now includes PRs. Fanout is unchanged at `.5 * sqrt(peers)`. The effective fanout change on the main network is from ~14 to ~17.

This change allows PRs that do not directly connect to other PRs to still see their votes. Ensuring PR connectivity is another approach that will be explored in the future.
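
The fanout expression quoted above, `.5 * sqrt(peers)`, can be written out as a small function. This is a sketch under stated assumptions: the function name `fanout` is hypothetical and rounding up is a choice made for the example, since the rounding used by the node is not specified here. The peer counts in the note below are arbitrary and are not figures from the main network.

```cpp
#include <cmath>
#include <cstddef>

// Number of peers to republish to, given the size of the candidate list.
// Assumption: the result is rounded up to a whole number of peers.
std::size_t fanout (std::size_t peers, double scale = 0.5)
{
	return static_cast<std::size_t> (std::ceil (scale * std::sqrt (static_cast<double> (peers))));
}
```

Because of the square root, the fanout grows slowly with the size of the list: a list of 100 candidates gives a fanout of 5, and adding one more candidate gives 6 under this rounding.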

* Deprecate --batch_size/debug_mass_activity CLI options (#2769)

* Deprecate batch_size CLI option

* Also deprecate --generate_mass_activity CLI command

* Force node exit if ledger inconsistency in the conf height processor is found (#2768)

* Improve conf height processor resilience during fork testing

* Add extra checks during write transaction

* release_assert in conf height processor if a ledger inconsistency is found

* Remove unnecessary header file change

* Removed wrong member variable + typo

* Redesign tests to use bounded/unbounded processor directly and death asserts

* Output to console also (Serg review)

* Use convention from other death test (Gui comment)

* Fix death test failures on release builds

* Fix scope in test (Gui comment)

* Enable WebSocket server by default in Docker image (#2774)

* Enable WebSocket server by default in Docker image

The rpc server is enabled by default and port mappings are provided for both the rpc server and the websocket server. Keeping the configuration and documentation consistent between the two will be clearer.

* Consistently document default config

* Set boost min to 1.69 (#2779)

Minimum boost moved to 1.69. 
promote nano-pow-server sub module to a commit with 1.69 boost requirements as well
Documentation update in https://github.com/nanocurrency/nano-docs/pull/297

* enable shared boost for tests (#2783)

use static boost for windows tests due to strange linking error
enable multi-core build support for b2

* Push front blocks from unchecked (#2565)

* Push front blocks from unchecked

Push blocks from unchecked to front of processing deque to keep more operations with unchecked inside of single write transaction.
It's designed to help with realtime blocks traffic if block processor is not performing large task like bootstrap.
If deque is a quarter full then push back to allow other blocks processing.

* Apply Wesley review
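
The queueing rule described for blocks coming from unchecked (front of the deque, unless the deque is already a quarter full) can be sketched as follows. The class `processing_queue_sketch` and its method names are hypothetical and only model the ordering decision; the real block processor also deals with locking, write transactions, and forced blocks, none of which are shown.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical model of the processing deque; only the ordering rule is shown
template <typename Block>
class processing_queue_sketch
{
public:
	explicit processing_queue_sketch (std::size_t max_size_a) :
	max_size (max_size_a)
	{
	}

	// Blocks arriving from the network always go to the back
	void add_live (Block block)
	{
		blocks.push_back (std::move (block));
	}

	// Blocks released from unchecked jump the queue unless it is already a quarter full
	void add_from_unchecked (Block block)
	{
		if (blocks.size () < max_size / 4)
		{
			blocks.push_front (std::move (block));
		}
		else
		{
			blocks.push_back (std::move (block));
		}
	}

	// Precondition: the queue is not empty
	Block next ()
	{
		Block result (std::move (blocks.front ()));
		blocks.pop_front ();
		return result;
	}

	bool empty () const
	{
		return blocks.empty ();
	}

private:
	std::size_t max_size;
	std::deque<Block> blocks;
};
```

With a maximum size of 8, the quarter mark is 2: a block from unchecked added while one block is queued goes to the front, while the next one, added when two blocks are queued, goes to the back.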

* Union std::hash coverage (#2781)

* Add failing test ensuring full-coverage of std::hash for unions

* Fix std::hash specialization of uint256_union and uint512_union

* Add slow test live processing a variety of blocks

* Use qwords (Serg)

* Compiler warnings

* Change static_assert to test asserts as changing the active member of a union is not legal at compile time
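
One way to read "full coverage" here is that every part of the value should influence the hash. The sketch below illustrates that idea by folding all four 64-bit words of a 256-bit value into the result. The type `uint256_sketch`, the function `hash_all_qwords`, and the hash-combine step are assumptions made for the example; they are not the node's `uint256_union` or its actual mixing function. A plain struct is used instead of a union so that the example does not read an inactive union member.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical 256-bit value viewed as four 64-bit words
struct uint256_sketch
{
	std::array<std::uint64_t, 4> qwords;
};

// Folds every qword into the result so that no part of the value is ignored
std::size_t hash_all_qwords (uint256_sketch const & value)
{
	std::size_t result = 0;
	for (auto qword : value.qwords)
	{
		// Generic hash-combine step (an assumption, not the node's mixing function)
		result ^= std::hash<std::uint64_t>{}(qword) + 0x9e3779b9 + (result << 6) + (result >> 2);
	}
	return result;
}
```

A hash that only looked at the first word would map values differing in the last word to the same bucket, which is the kind of gap a coverage test over all words is meant to catch.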

* Confirmation requests and broadcasts if available vote is for a conflicting block (#2784)

* Stuck uncemented blocks after heavy load (#2782)

* Stuck uncemented blocks

* online_reps unrelated change

* Increase timeouts in slow_test for Debug build

* Cleanup variables before sleeping to keep stats->objects in sync.

* [TSAN] Fix off-by-one in socket.drop_policy test (#2786)

Causing access of the counted_completion mutex after it goes out of
scope.

* Double bandwidth limit (#2787)

* use full cache for PR's requesting modules not currently specified in minimal (#2790)

fix gcc dockerfile

* Fix insufficient work logging (#2791)

- Incorrectly formed boost format
- Work and difficulty not logged as hex

* Bisected election backtracking (#2778)

* Add ledger::backtrack limited to 128 jumps

* Bisect election dependencies when activating

* Addressing special case where the winner may not have a loaded sideband

* Simplify ledger::backtrack by specifying desired number of jumps instead. Limit is placed on the caller

* Perform DB transactions without holding the active mutex by batch processing

* Add a debug_assert on getting the previous block since it happens in the same tx

* Add a pessimistic fallback by also starting an election for the first unconfirmed block

* Set empty sideband, required by recent change (thanks Serg!)

* Check if first unconfirmed block is not being processed by conf height

* Already in active_transactions

* Use std::min<uint64_t> to avoid overflow (Wes review)

* Use owns_lock from mutex (Wes)
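
A minimal sketch of the backtracking idea in the list above, under stated assumptions: the chain is modelled as a hypothetical map from block hash to previous hash (not the real ledger store), `backtrack` takes the desired number of jumps, and the caller applies the limit with `std::min<uint64_t>`. The `clamp_jumps` helper and its default of 128 are illustrative only.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical chain: block hash -> previous block hash (empty for the first block)
using chain_t = std::unordered_map<std::string, std::string>;

// Follow "previous" links up to jumps_a times, stopping early at the start of the chain
std::string backtrack (chain_t const & chain_a, std::string const & start_a, std::uint64_t jumps_a)
{
	std::string current (start_a);
	for (std::uint64_t i = 0; i < jumps_a; ++i)
	{
		auto existing (chain_a.find (current));
		if (existing == chain_a.end () || existing->second.empty ())
		{
			break;
		}
		current = existing->second;
	}
	return current;
}

// The limit lives on the caller; std::min<uint64_t> avoids mixing integer widths
std::uint64_t clamp_jumps (std::uint64_t requested_a, std::uint64_t limit_a = 128)
{
	return std::min<std::uint64_t> (requested_a, limit_a);
}
```

A caller would then write something like `backtrack (chain, hash, clamp_jumps (distance))`, so a very large distance never walks more than the limit.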

* Retrieve block when activating dependencies (#2796)

@wezrule noticed a failure during a test related to a block not being found in the ledger when activating dependencies.

There are in fact two issues:
1. In between adding the dependency and activating, the block could be rolled back, so we need to check if it exists
2. Due to potential simultaneous sideband changes in ledger processing, we need to retrieve a new copy of the block instead of simply checking if it exists

Added `store::block_account_calculated` that is used when we already have a sideband-loaded block in memory.

* Check executable paths in load_test (#2797)

To output a descriptive error message rather than:
```
terminate called after throwing an instance of
'boost::process::process_error'
  what():  execve failed: No such file or directory
```
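
One way such a check could look is sketched below. It uses `std::filesystem` from C++17 as an assumption (the actual load_test may use a different filesystem library), and the `executable_path_ok` helper and its message text are hypothetical.

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>

// Hypothetical helper: report a readable error instead of letting process creation throw
bool executable_path_ok (std::filesystem::path const & path_a, std::string const & name_a)
{
	std::error_code ec;
	// The non-throwing overload returns false for a missing path
	bool const ok = std::filesystem::is_regular_file (path_a, ec);
	if (!ok)
	{
		std::cerr << "Could not find the " << name_a << " executable at: " << path_a.string () << std::endl;
	}
	return ok;
}
```

Calling this for each configured path before spawning any child process lets the test exit early with a message that names the missing file.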

* update bundled FindBoost.cmake (#2792)

CMake 3.13.0 added support for Boost 1.69. Borrowed its FindBoost.cmake and updated the conditional for using it.

* CLI commands incorrect ledger cache setup (#2794)

* Sequential voting (#2785)

* Some operations need to be processed after transaction commit, specifically when needing to process blocks that have just been inserted into the ledger.
Create the block_post_events class whose destructor will execute queued events.

* There are multiple cases where we want to iterate over depend…
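
The destructor-driven pattern named in the first bullet above can be sketched as follows. Only the class name `block_post_events` comes from the commit message; the `events` member, its type, and the copy restrictions are assumptions made for this illustration.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Sketch only: queued callbacks run when the object goes out of scope
class block_post_events final
{
public:
	block_post_events () = default;
	// Copying would run the same events twice, so it is disabled in this sketch
	block_post_events (block_post_events const &) = delete;
	block_post_events & operator= (block_post_events const &) = delete;

	~block_post_events ()
	{
		// Execute events in the order they were queued
		for (auto const & event : events)
		{
			event ();
		}
	}

	std::deque<std::function<void ()>> events;
};
```

Because locals are destroyed in reverse order of declaration, declaring the `block_post_events` object before the write transaction in the same scope means the transaction is released first and the queued events run afterwards.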
Labels
documentation This item indicates the need for or supplies updated or expanded documentation quality improvements This item indicates the need for or supplies changes that improve maintainability telemetry
4 participants