Proc macro for tedge config #1936
mod parse;
mod validate;
I can't help but remember Parse, don’t validate ;-)
Ironically, that's pretty much what validate is doing. It could do with clearer naming to distinguish between "things that can be parsed with syn and darling" and "things that are meaningful for the rest of the application, after e.g. converting boolean flags into enum discriminants" (which is currently in the validate module, even though it's following the idea of "Parse, don't validate").
@didier-wenzek you had concerns with implementing With the fn all_or_nothing<T, U>(_: OptionalConfig<T>, _: OptionalConfig<U>) -> Result<Option<(T, U)>, SomeError> (in "normal" Rust code) where |
I have not yet dived into the macro implementation, but the outcome is really appealing.
I fail to understand how TEdgeConfigDefault is used and what's the point of this dummy TEdgeConfigLocation.
TEdgeConfigLocation is a dummy implementation that will be replaced by the real implementation in tedge_config because there wasn't any value in copying the code for it, and I didn't want the macro crate to depend on tedge_config to keep compile times short while I was still developing. Because the location has to be passed into the from_dto method, it has to exist in order to call define_tedge_config!.
TEdgeConfigDefault is an abstraction over the possible functions we can use to generate a default value if one isn't populated. For instance, a NonZeroU16 has to be created via a function call, but for a default port, this doesn't depend on any input, so it can be something equivalent to || NonZeroU16::try_from(1883).unwrap(). However, some default (or read-only) values may depend on other configurations, or the tedge config dir (such as the device certificate path, which is {config_root}/device-certs/tedge-certificate.pem by default). For this, we need a function like:
fn default_device_cert(location: &TEdgeConfigLocation) -> Utf8PathBuf {
    location
        .tedge_config_root_path()
        .join("device-certs")
        .join("tedge-certificate.pem")
}
The trait TEdgeConfigDefault<Output = T> is implemented by both fn() -> T and fn(&TEdgeConfigLocation) -> T, so this allows both of these functions to be called by the exact same (generated) code. This means the default functions just depend on whatever they need, keeping the config definition simple.
The idea (and implementation details) were borrowed from axum, there's a really good talk on it at https://youtu.be/7DOYtnCXucw.
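For readers unfamiliar with the axum-style trick, here is a minimal sketch of how one trait can cover both function shapes. It is not the code from this PR: the marker types NoArgs and WithLocation, the method name call_default, the helper resolve, and the stand-in TEdgeConfigLocation are all assumptions made for illustration, and std's PathBuf replaces Utf8PathBuf so the example needs no external crates.

```rust
use std::num::NonZeroU16;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real `TEdgeConfigLocation`.
pub struct TEdgeConfigLocation {
    pub root: PathBuf,
}

impl TEdgeConfigLocation {
    pub fn tedge_config_root_path(&self) -> &Path {
        &self.root
    }
}

/// Marker types (assumed names) that keep the two blanket impls from overlapping.
pub struct NoArgs;
pub struct WithLocation;

pub trait TEdgeConfigDefault<Args> {
    type Output;
    fn call_default(self, location: &TEdgeConfigLocation) -> Self::Output;
}

/// Default functions that need no input.
impl<F, T> TEdgeConfigDefault<NoArgs> for F
where
    F: FnOnce() -> T,
{
    type Output = T;
    fn call_default(self, _location: &TEdgeConfigLocation) -> T {
        self()
    }
}

/// Default functions that depend on the config location.
impl<F, T> TEdgeConfigDefault<WithLocation> for F
where
    F: FnOnce(&TEdgeConfigLocation) -> T,
{
    type Output = T;
    fn call_default(self, location: &TEdgeConfigLocation) -> T {
        self(location)
    }
}

/// What generated code could call: the same call works for either function shape.
pub fn resolve<F, Args>(
    default: F,
    location: &TEdgeConfigLocation,
) -> <F as TEdgeConfigDefault<Args>>::Output
where
    F: TEdgeConfigDefault<Args>,
{
    <F as TEdgeConfigDefault<Args>>::call_default(default, location)
}

fn default_port() -> NonZeroU16 {
    NonZeroU16::new(1883).expect("1883 is non-zero")
}

fn default_device_cert(location: &TEdgeConfigLocation) -> PathBuf {
    location
        .tedge_config_root_path()
        .join("device-certs")
        .join("tedge-certificate.pem")
}

fn main() {
    let location = TEdgeConfigLocation {
        root: PathBuf::from("/etc/tedge"),
    };
    println!("port = {}", resolve(default_port, &location));
    println!("cert = {}", resolve(default_device_cert, &location).display());
}
```

The compiler picks the marker type from the function's signature, so the call site never has to say which kind of default function it was given.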
TEdgeConfigLocation is a dummy implementation that will be replaced by the real implementation in tedge_config because there wasn't any value in copying the code for it.
It makes sense.
TEdgeConfigDefault is an abstraction over the possible functions we can use to generate a default value if one isn't populated ... so these default functions just depend on whatever they need, keeping the config definition simple.
Thank you. It's all clear now and that's indeed a great design.
The idea (and implementation details) were borrowed from axum, there's a really good talk on it at https://youtu.be/7DOYtnCXucw.
Thank you for the link.
static DEFAULT_ROOT_CERT_PATH: &str = "/etc/ssl/certs";

define_tedge_config! {
Just to confirm what I understand:
- The define_tedge_config! macro defines a TEdgeConfigDto struct whose hierarchy reflects the macro call structure.
- This DTO manages read / write / types / docs / examples as well as default values.
- The DTO structure is to be used to update the config, but one needs to call TEdgeConfigReader::from_dto(&dto, &TEdgeConfigLocation) to get a TEdgeConfig where all the defaults have been resolved (or the errors raised).
- The TEdgeConfig struct has the same hierarchy and can be used as a regular struct, with no method invocations needed to get specific settings however deep they are in the hierarchy.
Is this correct? If so this is really appealing!
That is essentially correct, and it's encouraging that it could be understood (even if only following discussions on the other PR)!
I think at the moment I'm generating the documentation from the reader, so read-only configurations are treated in the same way there as read-write configurations, but that doesn't make a huge difference to generating them from the DTO (and we may just want to get rid of read-only configurations entirely).
The intention with the reader is to only expose an immutable borrow to other crates so they can only read the underlying configuration. I intend to enable writing to the configuration, e.g. to set c8y.url in tedge connect c8y, in a similar way to the current setup, where we give the caller &mut TEdgeConfigDto and they just apply the necessary updates.
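A minimal hand-written sketch of the DTO/reader split described above may help. The struct shapes are assumptions standing in for what the macro would generate, from_dto omits the location argument for brevity, and set_c8y_url and mqtt_port are hypothetical helpers. Fields with a default become plain values in the reader; fields without one stay optional.

```rust
/// Hypothetical DTO: every setting is optional, mirroring what is in tedge.toml.
#[derive(Debug, Default)]
pub struct MqttDto {
    pub port: Option<u16>,
}

#[derive(Debug, Default)]
pub struct C8yDto {
    pub url: Option<String>,
}

#[derive(Debug, Default)]
pub struct TEdgeConfigDto {
    pub mqtt: MqttDto,
    pub c8y: C8yDto,
}

/// Hypothetical reader: defaults are already resolved.
#[derive(Debug)]
pub struct MqttReader {
    pub port: u16, // has a default, so not optional
}

#[derive(Debug)]
pub struct C8yReader {
    pub url: Option<String>, // no sensible default, so still optional
}

#[derive(Debug)]
pub struct TEdgeConfigReader {
    pub mqtt: MqttReader,
    pub c8y: C8yReader,
}

impl TEdgeConfigReader {
    /// Simplified: the version discussed in this PR also takes a location.
    pub fn from_dto(dto: &TEdgeConfigDto) -> Self {
        Self {
            mqtt: MqttReader {
                port: dto.mqtt.port.unwrap_or(1883),
            },
            c8y: C8yReader {
                url: dto.c8y.url.clone(),
            },
        }
    }
}

/// Writers get `&mut TEdgeConfigDto` and apply their updates directly.
pub fn set_c8y_url(dto: &mut TEdgeConfigDto, url: &str) {
    dto.c8y.url = Some(url.to_owned());
}

/// Readers only ever see an immutable borrow of the reader.
pub fn mqtt_port(config: &TEdgeConfigReader) -> u16 {
    config.mqtt.port
}

fn main() {
    let mut dto = TEdgeConfigDto::default();
    set_c8y_url(&mut dto, "example.cumulocity.com");
    let config = TEdgeConfigReader::from_dto(&dto);
    println!("mqtt port = {}, c8y url = {:?}", mqtt_port(&config), config.c8y.url);
}
```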
That is essentially correct, and it's encouraging that it could be understood (even if only following discussions on the other PR)!
Following the other PR definitely helps. But the tests too.
I think at the moment I'm generating the documentation from the reader, so read-only configurations are treated in the same way there as read-write configurations, but that doesn't make a huge difference to generating them from the DTO (and we may just want to get rid of read-only configurations entirely).
Just do whatever is simpler to maintain.
The intention with the reader is to only expose an immutable borrow to other crates so they can only read the underlying configuration.
That's perfect.
This will allow the "all or nothing" logic to be moved out of the macro.
pub fn all_or_nothing<T, U>(
    t: OptionalConfig<T>,
    u: OptionalConfig<U>,
) -> Result<Option<(T, U)>, String> {
Definitely simpler and better to have this defined as a regular function instead of a macro.
Beyond this specific case, I see this simplification as a better separation of responsibilities. Adding constraints on what makes sense should be done at the application level, while the config should focus on a more syntactic level (what is set, what is missing, what is ill-formed).
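As a rough sketch of what such a regular function could look like: the OptionalConfig enum below is a simplified assumption (a value, or the name of the missing key), not the type from this PR, and the error wording is made up.

```rust
/// Hypothetical, simplified stand-in for the PR's `OptionalConfig`.
#[derive(Debug, Clone, PartialEq)]
pub enum OptionalConfig<T> {
    /// The setting has a value.
    Present(T),
    /// The setting is unset; the payload is the key name, for error messages.
    Empty(&'static str),
}

/// Returns both values if both are set, `None` if neither is set,
/// and an error naming the missing key if only one is set.
pub fn all_or_nothing<T, U>(
    t: OptionalConfig<T>,
    u: OptionalConfig<U>,
) -> Result<Option<(T, U)>, String> {
    use OptionalConfig::{Empty, Present};

    match (t, u) {
        (Present(t), Present(u)) => Ok(Some((t, u))),
        (Empty(_), Empty(_)) => Ok(None),
        (Present(_), Empty(missing)) | (Empty(missing), Present(_)) => Err(format!(
            "expected both settings or neither, but {missing} is not set"
        )),
    }
}

fn main() {
    let cert = OptionalConfig::Present("cert.pem");
    let key: OptionalConfig<&str> = OptionalConfig::Empty("mqtt.client.auth.keyfile");
    println!("{:?}", all_or_nothing(cert, key));
}
```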
let dto = TEdgeConfigDto::default();
let config = TEdgeConfigReader::from_dto(&dto, &TEdgeConfigLocation);
I would add this to the example, as it is this straightforward use of the generated structs that makes me prefer this proposal.
let dto = TEdgeConfigDto::default();
let config = TEdgeConfigReader::from_dto(&dto, &TEdgeConfigLocation);
let mut dto = TEdgeConfigDto::default();
dto.mqtt.bind.address = Some(IpAddr::V4(Ipv4Addr::new(1, 2, 3, 4)));
let config = TEdgeConfigReader::from_dto(&dto, &TEdgeConfigLocation);
let host: IpAddress = config.mqtt.bind.address;
let port: NonZeroU16 = config.mqtt.bind.port;
println!("mqtt = {}:{}", host, port);
Wouldn't lines 172 and 173 move values out of config? Is this intended?
Wouldn't lines 172 and 173 move values out of config? Is this intended?
No, because IpAddress and NonZeroU16 both implement Copy.
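To illustrate the point about Copy: the sketch below uses std's IpAddr in place of the project's IpAddress type (an assumption, to keep the example self-contained), with a hypothetical Bind struct and endpoint helper. Reading a Copy field through a shared reference copies the value and leaves the struct intact.

```rust
use std::net::{IpAddr, Ipv4Addr};
use std::num::NonZeroU16;

/// Hypothetical stand-in for the generated `mqtt.bind` reader group.
pub struct Bind {
    pub address: IpAddr,
    pub port: NonZeroU16,
}

/// Reads both fields through a shared reference. This compiles only
/// because `IpAddr` and `NonZeroU16` are `Copy`; a `String` field
/// could not be moved out of `&Bind` like this.
pub fn endpoint(bind: &Bind) -> String {
    let host: IpAddr = bind.address;
    let port: NonZeroU16 = bind.port;
    format!("{host}:{port}")
}

fn main() {
    let bind = Bind {
        address: IpAddr::V4(Ipv4Addr::new(1, 2, 3, 4)),
        port: NonZeroU16::new(1883).expect("1883 is non-zero"),
    };
    println!("mqtt = {}", endpoint(&bind));
    // `bind` is still fully usable after the reads above.
    println!("mqtt = {}:{}", bind.address, bind.port);
}
```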
I haven't updated the examples to make this obvious yet, partially because I'm still playing around with the implementation, but the read-only configurations are slightly cumbersome to use. In order to depend on other configuration values, the input for generating these configurations is In an ideal world, the fields for these configurations would just hold The solution I've come up with for this (which doesn't quite match what's currently pushed):
let config = TEdgeConfigReader::from_dto(...);
// We can read "normal" configurations simply by doing
assert_eq!(config.mqtt.port, 1883);
// But read-only configurations need `config` passing in as an argument
assert_eq!(config.device.id.try_read(&config).unwrap(), "device-id");
This will have a tiny impact as very few configurations are read-only, but it is slightly annoying. It does make it much easier, however, to deal with the fact the device id computation is fallible, as it means the error is an owned value.
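One way the try_read shape could work is sketched below. This is a guess at the mechanism, not the PR's implementation: the ReadOnly wrapper, its stored function pointer, the cert_common_name field and the example_config helper are all hypothetical. The point it illustrates is that the value is computed on demand from the whole reader, and a failure comes back as an owned error.

```rust
/// Hypothetical wrapper for a read-only setting computed from the reader.
pub struct ReadOnly<T> {
    compute: fn(&TEdgeConfigReader) -> Result<T, String>,
}

impl<T> ReadOnly<T> {
    pub fn new(compute: fn(&TEdgeConfigReader) -> Result<T, String>) -> Self {
        Self { compute }
    }

    /// The caller passes the reader back in, so the computation can use
    /// any other setting. The error is an owned `String`.
    pub fn try_read(&self, reader: &TEdgeConfigReader) -> Result<T, String> {
        (self.compute)(reader)
    }
}

pub struct MqttReader {
    pub port: u16,
}

pub struct DeviceReader {
    /// Stand-in for "whatever the device id is derived from".
    pub cert_common_name: Option<String>,
    pub id: ReadOnly<String>,
}

pub struct TEdgeConfigReader {
    pub mqtt: MqttReader,
    pub device: DeviceReader,
}

fn device_id(reader: &TEdgeConfigReader) -> Result<String, String> {
    reader
        .device
        .cert_common_name
        .clone()
        .ok_or_else(|| "no device certificate available".to_owned())
}

pub fn example_config(common_name: Option<&str>) -> TEdgeConfigReader {
    TEdgeConfigReader {
        mqtt: MqttReader { port: 1883 },
        device: DeviceReader {
            cert_common_name: common_name.map(str::to_owned),
            id: ReadOnly::new(device_id),
        },
    }
}

fn main() {
    let config = example_config(Some("device-id"));
    // "Normal" settings are plain fields.
    println!("port = {}", config.mqtt.port);
    // Read-only settings need the reader passed back in.
    println!("id = {:?}", config.device.id.try_read(&config));
}
```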
That's indeed unusual. But not a big deal as this will be used in really few places. This can even be abstracted with a |
I also don't have many comments regarding the macro implementation, because TBH it's a bit hard to wrap my head around it all, and because I didn't really learn macros, especially procedural ones. But I do get the general gist of a macro generating a DTO, in turn generating a Reader, that will be used by most of the code that just reads from the configuration. So I have just a few questions and perhaps nitpicks regarding the usage of the macro.
/// Path to the CA certificate used by MQTT clients to use when authenticating the MQTT broker
#[tedge_config(example = "/etc/mosquitto/ca_certificates/ca.crt")]
#[doku(as = "PathBuf")]
#[serde(alias = "cafile")]
Not really related to the macro stuff, but I'd rather see #[serde(rename = ...)] used here and for other options with _ in the Rust name but no _ in the actual TOML.
I'm using alias here to cope with the existing keys (i.e. to accept them in tedge config), but I was trying to make the keys more consistent than they were previously (mainly putting _ anywhere we have words that should be delimited, but not by .). If we want all of cafile, certfile etc. to remain as is, we can just modify the field name directly to match tedge config.
#[doku(as = "u16")]
port: NonZeroU16,

auth: {
I'm afraid I don't completely understand how optional settings work in this proposal. Can only final settings (mqtt.client.auth.cafile, mqtt.client.auth.certfile, etc.) be made optional, and entire groups like mqtt.client.auth cannot? How are settings marked as optional? I understand that when #[tedge_config(default = ...)] is used, a default can be used in case a setting is not set, but what if there is no reasonable default?
That's a good point. My understanding is that this will have to be managed at the application layer, using functions like all_or_nothing to interpret (None, None) as None.
If there is no default, the field in the reader is optional. It possibly makes the code slightly less clear here than if you provide an explicit Option<_> for each field that needs it, but I originally tried to implement it that way, and it's frankly just a pain to deal with in the macro, compared to deciding whether a field is optional based on the presence of a default or not.
Groups are never optional (but they are only serialized if they contain values for one or more fields) because this drastically increases the complexity involved in reading and updating the configuration. all_or_nothing is a tool to try and deal with this problem. My intention is that where we need custom logic, like mqtt_config() at the moment, we should just add that as methods on the reader as, like with much of this stuff, most of the configurations don't need especially complex handling.

/// The default device type
#[tedge_config(example = "thin-edge.io")]
#[tedge_config(rename = "type")]
Does this just proxy to #[serde(rename)]? Because e.g. on line 84 we use #[serde(alias = ...)], so when should we use one or the other?
This does proxy to #[serde(rename)]. Long story short, I couldn't find a nice way of parsing a (strict) subset of serde attributes that we do something special with and allowing the rest to just pass through to the underlying structs.
For rename specifically, this not only affects the TOML deserialisation, but it also affects the key used in tedge config commands, and this is why tedge_config cares about renaming. This generally is only needed for fields like type, where we want to avoid conflicting with Rust keywords, as raw identifiers are difficult to read. The macro invocation will fail to compile if you try and use #[serde(rename)], to prevent confusion. Looking back on how much effort goes into establishing if this exists, we could just as easily parse #[serde(rename)] instead.
Aliasing fields, on the other hand, is used to migrate a key from one name to another, e.g. to add or remove underscores. This is used, via doku, to work out if we have renamed the field in the past to allow tedge config to parse the key successfully (with a warning that the key is deprecated). Because the define_tedge_config macro doesn't interact with this attribute directly, it doesn't need to parse it, so we can leave it as #[serde(alias = ...)].
I think this question points to a larger possibly-faulty design decision I took. I assumed that it would be simple to be transparent and just use serde attributes where possible, but it might be clearer to replace the common cases (i.e. these two) with tedge_config specific attributes, and give them names more relevant to our specific use case (e.g. #[tedge_config(deprecated_name = "cafile")] instead of #[serde(alias = "cafile")]). Alternatively, the existing solution might be the way, but with "which attribute do I use where" (more clearly) documented.
@Bravo555 @didier-wenzek, what are your opinions on those possible solutions?
@jarhodes314 The need for a specific #[tedge_config(rename)] vs #[serde(rename)] makes sense.
About alias, I think the confusion is beyond just a lack of documentation. It's not obvious that #[serde(alias = ...)] has a specific meaning for tedge_config and that this meaning is added thanks to doku.
So if things were free I would vote for #[tedge_config(deprecated_name = "cafile")].
I've now modified this to match your suggestion. I've also added #[tedge_config(deprecated_key = "mqtt.port")]. The difference between deprecated_name and deprecated_key is that deprecated_name is for renaming a field/group, but deprecated_key is for mapping one key to a different structure (e.g. mqtt.port -> mqtt.bind.port), which cannot just translate to #[serde(alias)]. The macro will emit compiler errors if the user uses the wrong one (by checking if the input contains a "."; if it does, it can only be a key, and if it doesn't, it can only be a (field/group) name, as all fields are nested in at least one group).
I think having different attributes and a compiler error is easier to grasp than one attribute that changes its behaviour dynamically depending on whether the input contains a ".", but I'm happy to change the behaviour if you disagree with this.
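The "contains a dot" rule can be sketched as two plain functions. These are hypothetical helpers written for illustration, returning a String where the real macro would emit a compile error at the offending attribute; the function names and messages are assumptions.

```rust
/// A deprecated *name* renames a single field or group, so it must not contain `.`.
pub fn check_deprecated_name(value: &str) -> Result<(), String> {
    if value.contains('.') {
        Err(format!(
            "`{value}` contains `.`, so it is a key: use deprecated_key instead"
        ))
    } else {
        Ok(())
    }
}

/// A deprecated *key* is a full path such as `mqtt.port`. Every field is
/// nested in at least one group, so a valid key always contains `.`.
pub fn check_deprecated_key(value: &str) -> Result<(), String> {
    if value.contains('.') {
        Ok(())
    } else {
        Err(format!(
            "`{value}` contains no `.`, so it is a name: use deprecated_name instead"
        ))
    }
}

fn main() {
    println!("{:?}", check_deprecated_name("cafile"));
    println!("{:?}", check_deprecated_key("mqtt.port"));
    println!("{:?}", check_deprecated_name("mqtt.port"));
}
```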
use syn::parse_macro_input;

#[proc_macro]
/// Defines the necessary structures to create a tedge config struct
IMHO if we do end up basically moving all the complexity into this proc macro, then this necessitates some really really comprehensive documentation of the macro. I'm talking some summary, enumeration of all possible attributes, ideally with examples, etc.
Oh for sure, I wanted to focus on writing the macro itself first to ensure there weren't major unforeseen issues that will severely affect the usability of the solution.
@@ -0,0 +1,87 @@
use crate::OptionalConfig;

pub fn all_or_nothing<T, U>(
If I'm understanding correctly, the function takes only two arguments. What would it look like if we wanted to use it with 3 or 4 arguments? Do we duplicate the function however many times we need, or is there a way to solve this more generically, which we'll use when we need it?
At the moment I'm trying to keep this bit simple as this is an area where we have a lot of scope for solving problems we don't really have. I will refactor this to a single tuple argument via a trait so we at least don't need to create a different all_or_nothing function for each cardinality, but for now, I'm going to avoid coming up with any particularly clever solutions to generalising this, as I'm not convinced we really have a use case.
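A possible shape for the "single tuple argument via a trait" refactoring is sketched below. The trait name AllOrNothing is an assumption, and std's Option stands in for OptionalConfig to keep the example short, so the error message cannot name the missing key.

```rust
/// Hypothetical trait: implemented once per tuple size, called the same way for all.
pub trait AllOrNothing {
    type Output;
    fn all_or_nothing(self) -> Result<Option<Self::Output>, String>;
}

impl<T, U> AllOrNothing for (Option<T>, Option<U>) {
    type Output = (T, U);

    fn all_or_nothing(self) -> Result<Option<(T, U)>, String> {
        match self {
            (Some(t), Some(u)) => Ok(Some((t, u))),
            (None, None) => Ok(None),
            _ => Err("expected both values or neither".to_owned()),
        }
    }
}

impl<T, U, V> AllOrNothing for (Option<T>, Option<U>, Option<V>) {
    type Output = (T, U, V);

    fn all_or_nothing(self) -> Result<Option<(T, U, V)>, String> {
        match self {
            (Some(t), Some(u), Some(v)) => Ok(Some((t, u, v))),
            (None, None, None) => Ok(None),
            _ => Err("expected all three values or none".to_owned()),
        }
    }
}

/// A single free function then covers every implemented cardinality.
pub fn all_or_nothing<A: AllOrNothing>(args: A) -> Result<Option<A::Output>, String> {
    args.all_or_nothing()
}

fn main() {
    println!("{:?}", all_or_nothing((Some("cert.pem"), Some("key.pem"))));
    println!("{:?}", all_or_nothing((Some(1), None::<u8>, Some("x"))));
}
```

Larger tuples would still need one impl each, though those impls could be generated with a small declarative macro if the need ever arises.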
    struct_field_paths(None, &fields)
});

impl TEdgeTomlVersion {
So you propose to manage config versions and to have an automatic conversion. It's a good idea. We will have to double check with @reubenmiller for the proposed changes, though.
My questions are more about how this works.
- In a migration step like mv("mqtt.port", MqttBindPort), what is this MqttBindPort? I failed to find a definition in the code.
- Do we have to maintain two versions of the config? In the current state of the PR we have both, but what is the plan?
MqttBindPort is WritableKey::MqttBindPort. WritableKey is an enum generated by define_tedge_config!, and I imported WritableKey::* in this method to reduce noise.
The idea is that the configuration will be read, the version will be checked, and then migrations carried out if necessary and written back to tedge.toml. If we've applied any migrations, we re-read the configuration before creating the TEdgeConfigReader from it.
I'm not expecting there to be a great burden from maintaining support for existing configuration keys and outdated toml files in future tedge versions, but it should be pretty simple to remove the relevant code if we wish to. As far as I understand, this logic should continue to just work for people upgrading tedge from whatever version, and tedge config will continue to accept (but warn about) the deprecated keys (there are far fewer of those in this PR than the previous PR due to the original key structure being largely preserved here). It uses a toml::Value based representation to ensure that the migration doesn't corrupt any existing data if the version field somehow disappears from the config file.
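To make the migration step concrete, here is a minimal sketch of moving a key within an untyped, nested table. It does not use the toml crate: the Value enum below is a tiny stand-in for toml::Value, the destination is given as a dotted string instead of a WritableKey variant, and the helpers get, remove, insert and mv are written for this illustration only.

```rust
use std::collections::BTreeMap;

pub type Table = BTreeMap<String, Value>;

/// Tiny stand-in for `toml::Value` (assumption: only what the sketch needs).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Integer(i64),
    Table(Table),
}

/// Looks up a dotted path such as "mqtt.bind.port".
pub fn get<'a>(root: &'a Table, path: &str) -> Option<&'a Value> {
    match path.split_once('.') {
        None => root.get(path),
        Some((head, rest)) => match root.get(head)? {
            Value::Table(child) => get(child, rest),
            _ => None,
        },
    }
}

/// Removes and returns the value at a dotted path, if present.
fn remove(root: &mut Table, path: &str) -> Option<Value> {
    match path.split_once('.') {
        None => root.remove(path),
        Some((head, rest)) => match root.get_mut(head)? {
            Value::Table(child) => remove(child, rest),
            _ => None,
        },
    }
}

/// Inserts a value at a dotted path, creating intermediate tables as needed.
fn insert(root: &mut Table, path: &str, value: Value) -> Result<(), String> {
    match path.split_once('.') {
        None => {
            root.insert(path.to_owned(), value);
            Ok(())
        }
        Some((head, rest)) => {
            let child = root
                .entry(head.to_owned())
                .or_insert_with(|| Value::Table(Table::new()));
            match child {
                Value::Table(child) => insert(child, rest, value),
                _ => Err(format!("{head} is not a table")),
            }
        }
    }
}

/// Moves `from` to `to`. Works on a copy so a failed insert leaves `root` untouched;
/// a missing `from` key is a no-op, so re-running the migration is harmless.
pub fn mv(root: &mut Table, from: &str, to: &str) -> Result<(), String> {
    let mut updated = root.clone();
    if let Some(value) = remove(&mut updated, from) {
        insert(&mut updated, to, value)?;
        *root = updated;
    }
    Ok(())
}

fn main() {
    let mut mqtt = Table::new();
    mqtt.insert("port".to_owned(), Value::Integer(1883));
    let mut root = Table::new();
    root.insert("mqtt".to_owned(), Value::Table(mqtt));

    mv(&mut root, "mqtt.port", "mqtt.bind.port").expect("migration failed");
    println!("{root:?}");
}
```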
It's clear.
Approved
Proposed changes
PoC of alternative solution to #1916. This version uses a procedural macro to reduce boilerplate code in tedge config.
The syntax is currently (you can see the same example in crates/common/tedge_config_macros/src/lib.rs):
Types of changes
Paste Link to the issue
Checklist
- cargo fmt as mentioned in CODING_GUIDELINES
- cargo clippy as mentioned in CODING_GUIDELINES
Further comments