-
Notifications
You must be signed in to change notification settings - Fork 632
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(dynamic config): Refactor and make several config fields reloadable at runtime #8240
Conversation
This reverts commit efb319c.
This reverts commit 40d0710.
Co-authored-by: nikurt <86772482+nikurt@users.noreply.github.com>
Co-authored-by: nikurt <86772482+nikurt@users.noreply.github.com>
This is the next step in the refactoring steps for changing gas profiles to track gas by parameter (near#8033). Here we make `ExtCostsConfig` opaque and look up parameters by ```rust pub fn cost(&self, param: ExtCosts) -> Gas ``` instead of using a specific field inside. There are side-effects for this in 1. parameter definition 2. JSON RPC 3. parameter estimator 1) We no longer load the parameters through "parameter table -> JSON -> serde_deser" steps because `ExtCostsConfig` no longer has serde derives. Instead each `ExtCosts` maps to a `Parameter` that allows looking up the value directly from `ParameterTable`. This explicit mapping also replaces the `Parameter::ext_costs()` iterator previously used to find all parameters that are ext costs. We used to define `wasm_read_cached_trie_node` in `53.txt` and fill old values with serde default. Serde was removed here, so I changed it to define the parameter in the base files. This is equivalent to the old behavior, only it is less clear when we added the parameter. 2) JSON RPC must keep the old format. Thus, I added `ExtCostsConfigView` and `VMConfigView` there. It is a direct copy-paste of the old structs in the old format but without serde magic to fill in missing values. 3) The estimator generates a `ExtCostsConfig` from estimations. This is now done through a mapping from estimated costs to `ExtCosts`. # Testing The exact JSON output is checked in existing tests `test_json_unchanged`.
Added create_test_signer that creates the signer for the given account with the seed that matches the account name. Can be used in tests only.
* fix a bug in skip procesing * add test * remove comments
removed edge field from Connection to avoid redundancy with GraphSnapshot.local_edges removed sending a duplicate RoutingTableUpdate message in response to RequestUpdateNonce
They were already rejected by wasmer2 before this change, just after preparation. So this should not be a protocol change. Second attempt at near#8029, without a protocol change this time
read_memory and write_memory have to perform the same checks as fits_memory already does. So rather than panicking, change the methods to return an error. This makes the interface panic-free and in some situations allows fits_memory call to be skipped. However, separate fits_memory check may still be necessary so keep the method but document in detail why it’s needed and, to keep interfaces consistent, change it to return a Result. Finally, read_memory_u8 is never used so get rid of it. While at it, add tests for WasmerMemory. Essentially copies of tests from Wasmer2Memory.
in the mirror code, where we're starting an indexer for the target chain, the genesis records file is usually very big since we're forking mainnet or testnet state to run a mocknet test. So starting the indexer with full genesis validation takes quite a long time and we don't really need it.
With the wrapping computation of `new_used_gas` any contract is in full control of this value, including those that are actually less than either operand (due to overflows.) This is not actually a big deal by itself, if not for the fact that it gives an opportunity for the attacker to nuke the entire network out of service. The in-line comments largely explain the reasoning behind the fix and why it preserves the protocol-facing behaviour, thus not necessitating a protocol version change. To reiterate what the comments say, basically the only case that we want to resolve is for when the attacker picks a value of `new_used_gas` such that it is less than `burnt_gas`. Instead of asserting and aborting we simply set `self.promises_gas = 0`. --- The fix in this PR has been included in 1.29.3 (72d4a4d) and 1.30.0-rc.6 (4290758).
The promise_and method processes one promise index at a time. There’s no benefit in having all them in memory. Rather than copying the indices from Vec<u8> into a new Vec<u64> simply do the little endian conversion while going through the former vector. This saves an allocation.
…ts (near#8195) Bumps [certifi](https://github.com/certifi/python-certifi) from 2021.10.8 to 2022.12.7. - [Release notes](https://github.com/certifi/python-certifi/releases) - [Commits](certifi/python-certifi@2021.10.08...2022.12.07) --- updated-dependencies: - dependency-name: certifi dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Update the security policy to include hackenproof for security issue submissions.
Adds a separate page `/debug/pages/tier1_network_info` displaying information about TIER1 connections. For TIER1 connections we mostly display the same information as TIER2 connections, except that: - TIER1 messages don't include height, last block hash, or nonce info, so these columns are omitted - TIER1 nodes have configured public proxies; a column is added to display their addresses Includes a couple of minor tweaks to the TIER2 `network_info` page (show PeerId of the current node at the top of the page, rename the connection table's `AccountId` column to `PeerId`).
Deprecating AnnounceAccount in favor of AccountData, when finding routing target. AnnounceAccount will be still used as a fallback. In the next release, once we confirm that AnnounceAccount is not used any more, we will remove the fallback. We will stop broadcasting AnnounceAccount only once MIN_SUPPORTED_VERSION > 57.
None | ||
} | ||
}; | ||
let updateable_client_config = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better UX if the updateable configs are read through a separate file instead of the original config file. It has a few benefits:
- Users won't be able to modify a config field that is not actually upgradeable. Right now, if user does that, the code just silently drops the update, which could be a source of confusion to the user. On top of that, the node can be in trouble when they restart.
- Users can't accidentally change the config file to a bad state, so they won't be in trouble when they restart the node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the suggestion is to have config.json
which is assumed to be in a good state; and to duplicate the updateable fields into a separate file, e.g. dyn_config.json
.
If in the future we decide to make max_num_peers
updateable at runtime, will that mean that the same field will be configurable in config.json
as network.max_num_peers
and in dyn_config.json
as max_num_peers
?
I would find that UX confusing, because reading config.json
is no longer enough to understand the node's configuration. There is always a chance that some field is overridden in dyn_config.json
.
On the other hand, if I want to change max_num_peers
, why shouldn't I change it in both config.json
and dyn_config.json
?
Which means that to read the configuration I need to read two config files; and to update the configuration I need to update two config files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my main concern with let user update config.json directly is that in that case, the values in config.json may not actually reflect what the config the node is using. But I also see your point about two sources of truth.
What about we let the user update the dyn_config.json but the code will automatically update config.json
to reflect the updated value after dyn_config.json is applied. This way, the user only needs to rely on config.json
for the source of truth. And they don't have to worry about messing up the config.json
when the node is running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it even safer, the program can also save a copy of the old config value, so the user has something to fall back to in case they make a mistake and want to revert to the previous config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea about this concern. Could the following rules be an easier solution?
- Introduce a new config field
allow_config_change: true
(default is false) in the originalconfig.json
, and this value is not allowed changed in runtime. Just in case people do not want to change config in runtime and keep same behavior as current. - If the dyn_config is valid and applied. Could we save a config snapshot as
config.{timesampts}.json
? - If the snapshot can not store, neard will panic in case of any config changes without a clear snapshot.
- (Optional) When the neard start and
allow_config_change
is true, it also createsconfig.{timesampts}.json
such that the behavior will much more consistent.
The reason to use config.{timesampts}.json
but not config_log.txt
is
- user will easier to reuse the
config.{timesampts}.json
by renaming it toconfig.json
. - it is easier to use diff tools for two config files
- keep a single place to store the log
- once the dyn_config is applied, the snapshot is a kind of receipt, and we can just write a one-line log
config.{timesampts}.json applied
without verbose.
If these look good from your end, I am happy to implement this. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code will automatically update config.json
I find it very confusing when a program modifies its own configs.
the user has something to fall back to in case they make a mistake
Unclear how useful this is. I rely on the (persistent) undo history. Some users make and manage backups of the configs.
In general, I don't want to prescribe how the users should manage their config.json
file. There are too many exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow_config_change
Seems redundant when a SIGHUP
is required to trigger a config reload. If the user doesn't send a signal, nothing will happen.
save a config snapshot
I believe many users would like to manage config versions themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like the idea of a single 'config.json' file.
For the config{timestamp}.json - the part I don't like, is that it pollutes the directory a little - what we could do instead, is to have the /config HTTP endpoint (or /debug/config) that would always report the current config. (this will allow people to do diffs in an easy way if needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/config
is already work in progress: https://pagodaplatform.atlassian.net/browse/ND-273
Please also update the description of the PR. |
Ok(()) | ||
} | ||
// TODO: Add more config validation. | ||
// TODO: Validate `ClientConfig` instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for validating ClientConfig
instead. Ideally, the validation should happen as soon as ClientConfig is loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind if I leave this to a follow-up code change?
chain/client/src/info.rs
Outdated
} | ||
if let Some(statistics) = statistics { | ||
rocksdb_metrics::export_stats_as_metrics(statistics); | ||
} | ||
|
||
if let Some(config_updater) = &config_updater { | ||
config_updater.log_error(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about passing the error all the way from neard to client and to here just for logging. Without this, the logic for ConfigUpdater is much simpler as the struct does not contain any state. I see two other options:
- Simply let the node panic in neard after loading the new config if the config is updated to an invalid setting. This may be a little harsh, but I think it is also reasonable. When restarting the node, it panics if the config is invalid. So I think it is ok to terminate the node if the updated config is invalid.
- Log the error into a separate file such as
config_log.txt
. This way, the error message will not be lost in the other log messages, so you don't have store the message and log it every time with theinfo()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a tradeoff between code complexity and making it more user-friendly. Or rather, more difficult to shoot yourself in the foot and not notice it.
I lean strongly on the side of user-friendliness.
There is also a metric exported about the config being invalid.
let the node panic
That is user-anti-friendly. I can imagine updating the config during an incident, and accidentally killing a very valuable node. Don't want that.
And if a validator accidentally kills a node and misses blocks, that is not good either.
config_log.txt
That is non-trivial to use properly 🤔
I would like to use it like
killall -SIGHUP neard ; cat ~/.near/config_log.txt || echo 'Config valid'
But it may take longer than a moment to apply that config change, parse the config and validate it.
To be sure it's applied a delay is needed.
killall -SIGHUP neard ; sleep 10; cat ~/.near/config_log.txt || echo 'Config valid'
But I also cannot guarantee that 10
seconds is a sufficient delay.
And anyway, if I can remember to check config_log.txt
, I can remember to check the Prometheus metric.
…ble at runtime (near#8240) Introduce mutable fields in ClientConfig. Intoruce the infrastructure to reload `config.json` and notify `Client` that certain fields were updated. Refactoring to inline `dyn_config.json` into `config.json`.
Introduce mutable fields in ClientConfig.
Intoruce the infrastructure to reload
config.json
and notifyClient
that certain fields were updated.Refactoring to inline
dyn_config.json
intoconfig.json
.