Formalize Vector's configuration schema #9115
Comments
IIRC cue can't directly export JSON Schema today - would we be looking to upstream support, or do some intermediary transformations with other tools?
Very much agree with the problem statements here. What's not addressed in the proposal is how to ensure the actual Rust code remains in sync with the specification. Given that we want a single source of truth, that leaves our options as either (1) generating Rust code from an external source of truth (e.g. JSON Schema, cue, etc), or (2) generating those external representations from the Rust code. There's prior art for going in either direction (e.g. quicktype and the
My biased opinion is to lean towards generating JSON schemas from the Rust code. The real source of truth is how Vector behaves, which is governed by the code. Making that code generated would obscure it, making it harder to debug, onboard contributors, etc. And if we really want to ensure that the behavior is always in line with an external definition, it'd be hard to avoid that compile step anyway.
Probably the nicest option for generating JSON schemas from our config structs would be to follow the pattern of procedural macros like
#[derive(VectorConfig)]
pub struct S3SinkConfig {
    pub bucket: String,
    pub key_prefix: Option<Template>,
    pub options: S3Options,
    pub region: RegionOrEndpoint,
    pub encoding: EncodingConfig<Encoding>,
    #[default("gzip")]
    pub compression: Compression,
    pub batch: BatchConfig,
    pub request: TowerRequestConfig,
    #[deprecated(since = "0.15.0", replacement = "auth")]
    pub assume_role: Option<String>,
    pub auth: AwsAuthentication,
}
This would let us have a library of common behaviors in terms of nesting, notification for deprecated fields, parsing human-friendly units for options like counts and byte sizes, etc. It could also get rid of some of our existing boilerplate like
And most importantly, we could either delegate to something like
I'm not entirely set on the idea given the complexity and downsides, but I do think it has enough potential to be worth some discussion.
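As a rough illustration of the derive approach described above, here is a minimal std-only Rust sketch of what such a macro might expand to. The FieldSchema struct, the VectorConfig trait and its methods, and the trimmed-down S3SinkConfig with plain String fields are all hypothetical stand-ins; none of this reflects Vector's actual code, and the impl is written by hand where a macro would generate it.

```rust
/// Hypothetical per-field metadata. A derive macro could collect this from the
/// field type and from attributes such as `#[default(..)]` or `#[deprecated(..)]`.
pub struct FieldSchema {
    pub name: &'static str,
    pub json_type: &'static str,
    pub required: bool,
    /// Only string defaults are handled in this sketch.
    pub default: Option<&'static str>,
    pub deprecated: bool,
}

/// Hypothetical trait that `#[derive(VectorConfig)]` might implement.
pub trait VectorConfig {
    fn fields() -> Vec<FieldSchema>;

    /// Render a JSON Schema object from the field metadata.
    fn json_schema() -> String {
        let fields = Self::fields();
        let props: Vec<String> = fields
            .iter()
            .map(|f| {
                let mut parts = vec![format!("\"type\":\"{}\"", f.json_type)];
                if let Some(d) = f.default {
                    parts.push(format!("\"default\":\"{}\"", d));
                }
                if f.deprecated {
                    parts.push("\"deprecated\":true".to_string());
                }
                format!("\"{}\":{{{}}}", f.name, parts.join(","))
            })
            .collect();
        let required: Vec<String> = fields
            .iter()
            .filter(|f| f.required)
            .map(|f| format!("\"{}\"", f.name))
            .collect();
        format!(
            "{{\"type\":\"object\",\"properties\":{{{}}},\"required\":[{}]}}",
            props.join(","),
            required.join(",")
        )
    }
}

/// Trimmed-down stand-in for the S3 sink config shown in the comment.
pub struct S3SinkConfig {
    pub bucket: String,
    pub compression: String,
    pub assume_role: Option<String>,
}

// Written by hand here; the idea is that the derive macro would generate it.
impl VectorConfig for S3SinkConfig {
    fn fields() -> Vec<FieldSchema> {
        vec![
            FieldSchema { name: "bucket", json_type: "string", required: true, default: None, deprecated: false },
            FieldSchema { name: "compression", json_type: "string", required: false, default: Some("gzip"), deprecated: false },
            FieldSchema { name: "assume_role", json_type: "string", required: false, default: None, deprecated: true },
        ]
    }
}
```

Calling S3SinkConfig::json_schema() returns a single JSON Schema string built from the same metadata the code is compiled with, so the exported schema cannot drift from the struct definition without someone changing that struct.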
Closing this in lieu of #12141. This is more of a problem statement.
There are a number of problems with Vector's configuration that can be solved with a single source of truth that drives documentation, validation, and translation:
There are .cue files for documentation purposes, but the real schema is defined within the code via serde macros. This creates misalignment that results in bug reports and surprising configuration errors.
Related Issues
humio_metrics sink ignores host config #4903
buffer.max_events requires setting buffer.type even though type defaults to memory #7326
proxy settings appear not to apply #8864
I tried my best to reference all of the relevant issues, but I am certain there are many more.
Cross cutting concerns
Proposal
To solve this we should converge on a common schema specification for Vector's configuration. JSON schema jumps out as the winner since it is easily understood by humans, parseable, extendable, and supported by many different languages and tools. We can achieve this a couple of ways:
1. cue definitions.
2. cue definitions.
I prefer 1 since cue is much more flexible. It allows us to reduce boilerplate, incorporate stricter validation, etc. I think we should consider decoupling our reference cue definitions from the website and defining a separate library. This library's single purpose is to expose Vector's internal configuration schema in a purpose-agnostic format. Then our cue data for documentation can include this library and augment it as necessary.
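To make the "single source of truth" idea concrete, the following std-only Rust sketch shows one schema definition driving both a documentation rendering and validation with defaults. It is only an illustration of the principle, not the cue-based library proposed here: OptionSpec, buffer_schema, render_docs, and resolve are hypothetical names, and the descriptions are placeholders. The only default taken from this issue is that buffer.type defaults to memory.

```rust
use std::collections::BTreeMap;

/// One option in a hypothetical, purpose-agnostic schema definition.
pub struct OptionSpec {
    pub path: &'static str,
    pub description: &'static str,
    pub default: Option<&'static str>,
}

/// Hypothetical schema for buffer settings (descriptions are placeholders).
pub fn buffer_schema() -> Vec<OptionSpec> {
    vec![
        OptionSpec { path: "buffer.type", description: "Buffer implementation.", default: Some("memory") },
        OptionSpec { path: "buffer.max_events", description: "Maximum number of buffered events.", default: None },
    ]
}

/// Consumer 1: documentation rendered from the schema.
pub fn render_docs(schema: &[OptionSpec]) -> String {
    schema
        .iter()
        .map(|o| match o.default {
            Some(d) => format!("{} (default: {}): {}", o.path, d, o.description),
            None => format!("{}: {}", o.path, o.description),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Consumer 2: validation that rejects unknown keys and applies the same
/// defaults the documentation advertises.
pub fn resolve(
    schema: &[OptionSpec],
    user: &BTreeMap<String, String>,
) -> Result<BTreeMap<String, String>, String> {
    for key in user.keys() {
        if !schema.iter().any(|o| o.path == key.as_str()) {
            return Err(format!("unknown option: {}", key));
        }
    }
    // Start from the user's values, then fill in schema defaults.
    let mut resolved = user.clone();
    for o in schema {
        if let Some(d) = o.default {
            resolved
                .entry(o.path.to_string())
                .or_insert_with(|| d.to_string());
        }
    }
    Ok(resolved)
}
```

With this shape, a user config that sets only buffer.max_events resolves with buffer.type filled in as memory, because the documented default and the applied default come from the same OptionSpec entry. That is the kind of mismatch reported in #7326.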