Add an optional extended parser subclass (YAMLAnchorReplayingFactory) able to inline anchors #502
Ok, this is interesting but I need to digest it for a bit to know what I think. :) As to unit tests, they are under |
Ok, no big objections, but I think we should figure out a better name: "Ext" would be ok for a general-purpose extension, but here it is quite specific. I don't have an immediate good name suggestion either, but let's think of something.
Here are some ideas for a better name:
I personally prefer the wording "expand"/"follow", since the anchored nodes are not inlined in the sense of being recreated as duplicate objects, which would change any line number information. Using "Refs" instead of "Anchors"/"Aliases" leaves room for future changes, such as supporting whatever replaces Merge Keys (which exist only in the deprecated YAML 1.1) in future YAML versions.
I think this also needs some basic recursion depth / event list size limits, similar to the core decoders. While this feature is opt-in, any CVE reports will still have knock-on effects on other yaml users, and possibly other dataformats-text users. |
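To make the concern concrete, here is a minimal sketch of such a guard. It is not the code from this PR: `ReplayBudget`, `ReplayLimitDemo`, and the limit values used below are hypothetical names and numbers chosen for illustration. The demo models a document in which every anchor aliases the previous anchor twice, so the number of replayed events doubles with each level.

```java
/** Hypothetical guard; the class name and the limits are assumptions, not part of the PR. */
final class ReplayBudget {
    private final int maxDepth;
    private final long maxEvents;
    private int depth;
    private long events;

    ReplayBudget(int maxDepth, long maxEvents) {
        this.maxDepth = maxDepth;
        this.maxEvents = maxEvents;
    }

    /** Call when the replay of an alias starts. */
    void enterAlias() {
        if (++depth > maxDepth) {
            throw new IllegalStateException("alias nesting deeper than " + maxDepth);
        }
    }

    /** Call when the replay of an alias ends. */
    void exitAlias() {
        depth--;
    }

    /** Call for every batch of events handed to the consumer. */
    void emitted(long count) {
        events += count;
        if (events > maxEvents) {
            throw new IllegalStateException("more than " + maxEvents + " events replayed");
        }
    }

    long events() {
        return events;
    }
}

final class ReplayLimitDemo {
    /** Expanded size of an anchor that aliases the previous level twice. */
    static long expandedSize(int level, ReplayBudget budget) {
        if (level == 0) {
            budget.emitted(1); // a single scalar event
            return 1;
        }
        budget.enterAlias();
        try {
            return expandedSize(level - 1, budget) + expandedSize(level - 1, budget);
        } finally {
            budget.exitAlias();
        }
    }
}
```

With a budget of 10 levels and 1000 events, five levels of doubling pass; thirty levels abort as soon as the event limit is crossed, instead of attempting to replay about a billion events.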
How about YAMLAnchorReplayingParser?
@yawkat Do you use any coding style checker or specific IDE in the project that helps with staying compliant with the coding style?
Cleanup: renamed `global_depth` to `globalDepth`
Cleanup: removed accidentally added `;`
@yawkat I added all the suggested changes and renamed. Do you prefer semi-linear history? If so I will rebase the branch. |
@yawkat I added the upper limits as suggested, but I wonder whether they would be better made configurable via some property set on the factory.
Looks like code doesn't quite compile (refactoring?). |
d6686c9 to ffdafed
@cowtowncoder Fixed it and added a unit test similar to the one existing for the YAMLParser.
Thank you @HeikoBoettger-KarlStorz. One last thing (aside from my needing to re-review this) is CLA. Unless we've asked for it earlier (it only needs to be sent once & it's good for all other contributions), it's here: https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf and is usually easiest to do by printing, filling & signing, scanning/taking photo, emailing to Once that is received I can proceed with merging, assuming review goes well. |
@cowtowncoder Since there were two contributor agreement files, I used the corporate one and sent it to clas at fasterxml dot com on 03.10.2024, as written in https://github.com/FasterXML/jackson/blob/master/contributor-agreement-corporate.txt. Is that the wrong mail address? If I need to get the other PDF signed by somebody in the company, this will probably take some time. I can resend the document to the mail address you have mentioned.
The corporate CLA is correct. You do not need to get the individual contributor CLA signed by your company. |
@HeikoBoettger-KarlStorz as per @yawkat's comment sending either one is enough. However, need to send |
CLA received. Also, ref to Will need to review soon, bit overloaded but hoping to get back to this soon. |
This looks good to me, but since I am leaving for an extended weekend, will not merge yet (partly since merge to |
Merged in 2.19 for 2.19.0. NOTE: did NOT include in Porting this over later on should not be much more work, just noting for now. If anyone wants to I'd of course be happy to help with a PR. |
This pull request proposes a workaround for the inability to validate YAML files that use anchors. The solution is simply to inline all anchors.
What's the matter with the upstream YAMLParser?
For an alias, the YAMLParser currently produces a JSON event representing a String and sets an additional status that a consumer needs to retrieve by calling an additional method. However, the current approach to YAML schema validation is to reuse the existing JSON Schema validation code, which doesn't know about this extra method. As a result, a YAML file that uses aliases and anchors fails to validate due to the lack of anchor support.
The backlog already contains some issues discussing the need for a bigger refactoring to fully support anchors and aliases.
What is the workaround?
This pull request adds a YAMLParserExt class (I didn't have a good idea for how to name it) and a factory to make use of it. When events are requested, the implementation remembers the events produced by YAML content that has an anchor. Later, when an alias is found and the anchor exists, it simply returns the same events that were part of the anchored content. As a result, the schema validator sees the document as if the anchored content had been inlined.
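The record-and-replay idea can be sketched independently of Jackson as follows. This is only an illustration under simplifying assumptions, not the code in this PR: `Event` and `AnchorReplaySketch` are hypothetical, the event model has just four types, and an alias to an anchor that has not been fully read yet is treated as an error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified parse event; this is not Jackson's token API. */
final class Event {
    final String type;   // "START", "END", "SCALAR" or "ALIAS"
    final String value;  // scalar text, or the referenced anchor name for ALIAS
    final String anchor; // anchor declared on this node, or null

    Event(String type, String value, String anchor) {
        this.type = type;
        this.value = value;
        this.anchor = anchor;
    }

    @Override
    public String toString() {
        return value == null ? type : type + "(" + value + ")";
    }
}

final class AnchorReplaySketch {
    /** Remembers where an anchored container began in the output. */
    private static final class OpenAnchor {
        final String name;
        final int startIndex;
        final int depth;

        OpenAnchor(String name, int startIndex, int depth) {
            this.name = name;
            this.startIndex = startIndex;
            this.depth = depth;
        }
    }

    /** Returns the event stream with every alias replaced by the recorded events. */
    static List<Event> expand(List<Event> input) {
        Map<String, List<Event>> recorded = new HashMap<>();
        Deque<OpenAnchor> open = new ArrayDeque<>();
        List<Event> out = new ArrayList<>();
        int depth = 0;
        for (Event e : input) {
            switch (e.type) {
                case "START":
                    if (e.anchor != null) {
                        open.push(new OpenAnchor(e.anchor, out.size(), depth));
                    }
                    out.add(e);
                    depth++;
                    break;
                case "END":
                    out.add(e);
                    depth--;
                    // The anchored container is complete once depth returns to its start level.
                    if (!open.isEmpty() && open.peek().depth == depth) {
                        OpenAnchor done = open.pop();
                        recorded.put(done.name,
                                new ArrayList<>(out.subList(done.startIndex, out.size())));
                    }
                    break;
                case "ALIAS":
                    List<Event> replay = recorded.get(e.value);
                    if (replay == null) {
                        throw new IllegalStateException("unknown anchor: " + e.value);
                    }
                    out.addAll(replay); // the consumer sees the anchored content again
                    break;
                default: // SCALAR
                    out.add(e);
                    if (e.anchor != null) {
                        recorded.put(e.anchor, List.of(e));
                    }
            }
        }
        return out;
    }
}
```

Because the recording is sliced from the already expanded output, aliases nested inside anchored content are replayed in expanded form as well. For a document such as `base: &b {x: 1}` followed by `copy: *b`, the consumer would receive the events of `{x: 1}` twice and never see an alias.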
What's the risk of pulling this in?
Repeating the events might have some unknown side effects: a document will appear larger in terms of produced events than it actually is, and code that expects the events to be unique might not work. I also haven't looked into whether there might be issues determining the position (line and column number) of whatever produced the event in the original document. However, it works for us to validate our YAML files, and using a separate class to implement this option doesn't break any existing code.
Some thoughts:
$ref from JSON, but I can imagine that YAML aliases can be positioned more flexibly.
In case you consider pulling this in, please let me know where I can find an example of a unit test; I am more than happy to provide one. The approach I took in our internal project was simply to run the validation across some YAML documents containing anchors and check the validation results.