Flatten schema pkg #4538
Conversation
Codecov Report

Additional details and impacted files:

@@ Coverage Diff @@
##            main   #4538   +/-  ##
=====================================
Coverage     82.3%   82.3%
=====================================
Files          226     227     +1
Lines        18557   18523    -34
=====================================
- Hits       15286   15258    -28
+ Misses      2983    2979     -4
+ Partials     288     286     -2
Force-pushed dfd49dd to 76eae4e.
I did my best to make a high-level design review first.
// If r contains an invalid schema URL an error will be returned.
func Parse(r io.Reader) (*Schema, error) {
	d := yaml.NewDecoder(r)
	d.KnownFields(true)
I am putting my remarks and thoughts into one long comment, as I find them closely related to each other.
I have some problems with the parsing logic and interpreting the specification.
Extensibility?
From stable spec:
the schema file format is extensible.
What does it mean? Can we add custom fields that may be added in future versions, for forward compatibility? Should we consider removing this line? On the other hand, the author may simply have meant that the file_format may evolve in future versions (in which case the term "extensibility" is not used correctly/precisely). Extensibility in file formats usually means that custom things can be added through defined extension points, e.g. https://www.w3schools.com/xml/el_extension.asp. As far as I can see, our current implementation uses d.KnownFields(true). It errors when it encounters unknown fields for the currently supported version, so our implementation does not offer forward compatibility. In my opinion, our implementation is OK and the problem is that the specification is not clear. Since it lacks normative wording, we may leave it as is.
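The strict-field behavior can be illustrated with Go's standard library: `json.Decoder`'s `DisallowUnknownFields` is the stdlib analogue of yaml.v3's `d.KnownFields(true)` used here. A minimal sketch (the `schemaHeader` type and `parseStrict` helper are hypothetical, not the package's actual types):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// schemaHeader mirrors only the fields the current file format defines.
// Hypothetical type for illustration; not the real schema struct.
type schemaHeader struct {
	FileFormat string `json:"file_format"`
	SchemaURL  string `json:"schema_url"`
}

// parseStrict rejects any field not declared on schemaHeader, mirroring
// the KnownFields(true) behavior of the YAML decoder.
func parseStrict(data []byte) (*schemaHeader, error) {
	d := json.NewDecoder(bytes.NewReader(data))
	d.DisallowUnknownFields()
	var h schemaHeader
	if err := d.Decode(&h); err != nil {
		return nil, err
	}
	return &h, nil
}

func main() {
	// Known fields only: parses fine.
	_, err := parseStrict([]byte(`{"file_format":"1.1.0","schema_url":"https://example.com/1.1.0"}`))
	fmt.Println("known fields ok:", err == nil)

	// An extra field from a hypothetical future version is rejected,
	// i.e. no forward compatibility.
	_, err = parseStrict([]byte(`{"file_format":"1.2.0","future_field":true}`))
	fmt.Println("unknown field rejected:", err != nil)
}
```

This is why a file written for a newer minor version fails to parse even when the rest of its content is understood.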
Parsing logic vs file_format version
I noticed that our implementation will NOT throw an error when the file has file_format: 1.0.0 but uses a field introduced in version 1.1.0. For example, if you change file_format to 1.0.0 in valid-example.yaml, ParseFile will successfully parse it without any problem. Take notice that the experimental (sic!) file format says:
Defines the file format. MUST be set to 1.1.0.
Therefore, I am not sure if the current implementation complies with the specification.
If I understand correctly, if we wanted to be that strict, we would need dedicated structs for each minor version. I find that impractical (overkill). I would NOT like to do it, and I would rather change the specification (if needed) than our implementation.
For reference, let me describe the parsing strategy that browsers and Docker Compose use for HTML/YAML files:
- The parser interprets only the "major" version. The minor version specified by the user is ignored by the parser; it is only informative for humans (what is more, the latest Docker Compose version even ignores the version field entirely and always parses against the latest schema version).
- The parser emits warnings (errors that can be ignored) for unknown fields that it does not understand, but still returns the parsed object (forward compatible). The parser could also silently ignore unknown fields.
In my opinion, the current implementation is OK. Right now we simply parse against the latest supported version. We can decide later what we will do when v2 of the file format is defined.
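The browser/Compose-style "major-only" strategy described above can be sketched as follows. This is an illustrative sketch, not the package's implementation; `majorOf`, `checkFileFormat`, and the `supportedMajor` parameter are assumed names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf extracts the MAJOR component of a "MAJOR.MINOR.PATCH" string.
func majorOf(v string) (int, error) {
	parts := strings.SplitN(v, ".", 2)
	return strconv.Atoi(parts[0])
}

// checkFileFormat implements the major-only strategy: the parser errors
// only when the MAJOR version differs from what it supports; MINOR and
// PATCH are treated as informative.
func checkFileFormat(fileFormat string, supportedMajor int) error {
	maj, err := majorOf(fileFormat)
	if err != nil {
		return fmt.Errorf("invalid file_format %q: %w", fileFormat, err)
	}
	if maj != supportedMajor {
		return fmt.Errorf("unsupported file_format major version %d (parser supports %d)", maj, supportedMajor)
	}
	return nil
}

func main() {
	fmt.Println(checkFileFormat("1.0.0", 1) == nil) // accepted
	fmt.Println(checkFileFormat("1.9.0", 1) == nil) // accepted: minor ignored
	fmt.Println(checkFileFormat("2.0.0", 1) == nil) // rejected: major mismatch
}
```

Under this policy, a `1.0.0` file using `1.1.0` syntax parses without complaint, which matches the current behavior.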
Possible safe forward compatibility
If r contains a Schema with a file format version higher than FileFormat, an error will be returned.
We could still parse a schema if it does NOT use any new fields (a kind of forward compatibility that works as long as the file does NOT use unknown features). I think we could offer this later, as it would not be a breaking change; I would view it as an enhancement. However, it is YAGNI: we will have to bump the supported file format version as soon as a new OTel semconv schema uses the newer file_format anyway.
Summary
I like the current design and implementation. However, it would be good to ensure that the parsing implementation complies with the specification, especially the part where we are able to parse files which have file_format: 1.0.0 while also using syntax from a newer format, e.g. 1.1.0. In my opinion, the current implementation is OK; I believe it would be more practical to clarify or modify the specification instead of altering the proposed implementation.
It is also worth noting that any validation logic changes will affect https://github.com/open-telemetry/build-tools/tree/main/schemas. As far as I reviewed, the current PR does NOT introduce any changes to the parsing and validation logic.
I had considered restricting parsing to only allow a specific version of the file format, but I don't think that's the user experience we want to provide. If a user has a schema that incorrectly states it is 1.0.0 and uses 1.1.0 fields, I would rather produce a valid schema for the user. I doubt they care that the schema was incorrect; they just want the data.
I agree. Should we add tests for this behavior and should it be documented?
> I agree. Should we add tests for this behavior and should it be documented?
Yeah, that sounds good to me. I'll update in a bit.
There is another section in the file format that specifies that this is the intended behavior:
Correspondingly:
> Consumers of the schema file SHOULD NOT attempt to interpret the schema file if the MAJOR version number is different (higher or lower) than what the consumer supports.
> Consumers of the schema file SHOULD NOT attempt to interpret the schema file if the MINOR version number is higher than what the consumer supports.
> Consumers MAY ignore the PATCH number.
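The quoted consumer rules can be encoded directly. This is an illustrative sketch of the policy, not the package's code; `version`, `parseVersion`, and `canInterpret` are assumed names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a minimal MAJOR.MINOR.PATCH triple for illustration.
type version struct{ major, minor, patch int }

// parseVersion splits a "MAJOR.MINOR.PATCH" string into its components.
func parseVersion(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("invalid version %q", s)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		nums[i] = n
	}
	return version{nums[0], nums[1], nums[2]}, nil
}

// canInterpret encodes the spec's consumer rules: SHOULD NOT interpret
// if MAJOR differs (higher or lower), SHOULD NOT interpret if MINOR is
// higher than supported, MAY ignore PATCH.
func canInterpret(file, supported version) bool {
	if file.major != supported.major {
		return false // MAJOR differs: do not interpret
	}
	if file.minor > supported.minor {
		return false // MINOR higher than supported: do not interpret
	}
	return true // PATCH is ignored entirely
}

func main() {
	supported := version{1, 1, 0}
	for _, s := range []string{"1.0.0", "1.1.7", "1.2.0", "2.0.0"} {
		v, _ := parseVersion(s)
		fmt.Printf("%s interpretable: %v\n", s, canInterpret(v, supported))
	}
}
```

Note that a `1.0.0` file is interpretable under these rules even when it happens to contain `1.1.0` syntax, which is consistent with the behavior discussed above.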
schema/schema.go (outdated)
	return fmt.Errorf("invalid file format version: %q: %w", s.FileFormat, err)
}

if FileFormat.LessThan(ffVer) {
How about adding a validation that ensures that file_format is greater than or equal to 1.0.0? This would protect us against 0.x.y, and the same pattern could be used when (if) we add support for 2.x.y.
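A sketch of the suggested bounds check, using a minimal hand-rolled `version` type standing in for the semver library type that provides `LessThan` (all names here are illustrative, not the actual package's API):

```go
package main

import "fmt"

// version is a minimal MAJOR.MINOR.PATCH triple for illustration.
type version struct{ major, minor, patch int }

// lessThan compares two versions lexicographically by component.
func (v version) lessThan(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// validateFileFormat rejects anything below 1.0.0 as well as anything
// above the highest supported file format version.
func validateFileFormat(ffVer, supported version) error {
	lower := version{1, 0, 0}
	if ffVer.lessThan(lower) {
		return fmt.Errorf("file_format below 1.0.0: %v", ffVer)
	}
	if supported.lessThan(ffVer) {
		return fmt.Errorf("file_format newer than supported: %v", ffVer)
	}
	return nil
}

func main() {
	supported := version{1, 1, 0}
	fmt.Println(validateFileFormat(version{0, 9, 0}, supported) != nil) // rejected: pre-1.0
	fmt.Println(validateFileFormat(version{1, 0, 0}, supported) == nil) // accepted
	fmt.Println(validateFileFormat(version{1, 2, 0}, supported) != nil) // rejected: too new
}
```

Bumping the upper bound (or raising the lower bound to 2.0.0 for a future v2 parser) would reuse the same pattern.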
A README.md would be useful.
Force-pushed 5de3aa9 to 6ef6e67.
Closing. Not needed: #4503 (comment)
Fixes #4534

- Merge the `schema/v1.0` and `schema/v1.1` packages into `schema`.
- Have the `schema` package parse the latest file format and all versions less than that into the same struct, rather than adding a `v2` of `schema`.
- Add `All` and `Resource` types to replace the generic `Attributes` type that was used by both.
- Rename `VersionDef` to `Changeset` to better match its purpose as the differences between two successive versions.