[pkg/ottl] RFC - Direct XML manipulation functions #35281
Comments
A function that can manipulate the xml string in place seems useful. That feels simpler than doing:
I am not experienced enough with XML to propose what kind of functions we'd need for that. Some OTTL guidelines that may be helpful when brainstorming ideas:
|
Thanks for your thoughts on this @TylerHelmuth. I'm thinking we could mostly rely on Converters here. They would take a Starting from the same xml document (and assuming this is the body):
<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
- <Three note="again">3</Three>
+ <Three>3</Three>
</Two>
</Data>
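As a rough illustration of the first change above (dropping the `note` attribute so both `Three` elements have the same shape), here is a minimal Python sketch. It is not the OTTL converter itself: `remove_attribute` is a hypothetical string-in/string-out helper, and it uses ElementTree's limited path syntax rather than full XPath.

```python
import xml.etree.ElementTree as ET

DOC = """<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three note="again">3</Three>
</Two>
</Data>"""


def remove_attribute(xml_text, path, attr):
    """Hypothetical helper: drop `attr` from every element matching `path`."""
    root = ET.fromstring(xml_text)
    for elem in root.findall(path):
        # pop() with a default is a no-op for elements lacking the attribute.
        elem.attrib.pop(attr, None)
    # Serialize back to a string, mirroring a string-in/string-out converter.
    return ET.tostring(root, encoding="unicode")


result = remove_attribute(DOC, "./Two/Three", "note")
print(result)
```

The attributes on `Data` are left untouched here; only the elements selected by the path are modified.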
- <Data foo="bar" hello="world">
+ <Data>
+ <foo>bar</foo>
+ <hello>world</hello>
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>
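The second change above (turning attributes into child elements) can be sketched the same way. This is an assumption-laden illustration, not the actual converter: `attributes_to_elements` is a hypothetical name, the new elements are placed before any existing text and children to match the diff, and the original whitespace is not preserved exactly.

```python
import xml.etree.ElementTree as ET

DOC = """<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>"""


def attributes_to_elements(xml_text):
    """Hypothetical helper: replace each attribute with a child element."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so newly created children are not revisited.
    for elem in list(root.iter()):
        if not elem.attrib:
            continue
        new_children = []
        for name, value in elem.attrib.items():
            child = ET.Element(name)
            child.text = value
            new_children.append(child)
        elem.attrib.clear()
        # Put the new elements first, in attribute order.
        for index, child in enumerate(new_children):
            elem.insert(index, child)
        # Text that preceded the first child now follows the last new element.
        new_children[-1].tail = elem.text
        elem.text = None
    return ET.tostring(root, encoding="unicode")


result = attributes_to_elements(DOC)
print(result)
```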
<Data>
<foo>bar</foo>
<hello>world</hello>
- Some text
+ <value>Some text</value>
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>
Then finally
If I'm not mistaken, the user could compose these inline, but it's not clear to me if there's much benefit to this. Personally I would just use separate statements:
Either way, I'm not necessarily proposing the exact Converters in this example, but I think these are pretty close to what we'd need in the short term. Just wanted to articulate better how I imagine the user would incrementally convert their xml into a JSON-equivalent format, and ultimately to a clean attributes map. |
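To make the last two steps concrete, the sketch below wraps stray text in a `value` element and then converts the resulting JSON-equivalent XML into a nested map. Both function names are hypothetical, and the output shape (root tag as the top-level key, repeated tags collected into a list) is an assumption for illustration rather than a statement of what the eventual OTTL parser produces.

```python
import xml.etree.ElementTree as ET

DOC = """<Data>
<foo>bar</foo>
<hello>world</hello>
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three>3</Three>
</Two>
</Data>"""


def text_to_elements(xml_text, name="value"):
    """Hypothetical helper: wrap text that sits beside child elements."""
    root = ET.fromstring(xml_text)
    for elem in list(root.iter()):
        children = list(elem)
        if not children:
            continue  # Leaf element: its text is already its value.
        rebuilt = []
        if elem.text and elem.text.strip():
            wrapper = ET.Element(name)
            wrapper.text = elem.text.strip()
            rebuilt.append(wrapper)
        elem.text = None
        for child in children:
            rebuilt.append(child)
            if child.tail and child.tail.strip():
                wrapper = ET.Element(name)
                wrapper.text = child.tail.strip()
                rebuilt.append(wrapper)
            child.tail = None
        elem[:] = rebuilt  # Replace the child list, keeping document order.
    return ET.tostring(root, encoding="unicode")


def simplified_xml_to_map(xml_text):
    """Hypothetical helper: convert JSON-equivalent XML to a nested dict."""
    def convert(elem):
        children = list(elem)
        if not children:
            return elem.text or ""
        out = {}
        for child in children:
            value = convert(child)
            if child.tag in out:
                # Repeated tags are collected into a list.
                if not isinstance(out[child.tag], list):
                    out[child.tag] = [out[child.tag]]
                out[child.tag].append(value)
            else:
                out[child.tag] = value
        return out

    root = ET.fromstring(xml_text)
    return {root.tag: convert(root)}


simplified = text_to_elements(DOC)
attributes = simplified_xml_to_map(simplified)
print(attributes)
```

Keeping each step as its own string-to-string function mirrors the "separate statements" style: every intermediate document can be inspected before the final parse.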
I've opened #35301 with the first concrete direct-xml manipulation converter as described above. If this looks good, I'll add a few more in the coming days and start work on the JSON-equivalent XML parser. |
@djaglowski would we gain efficiencies if we could pass around a parsed xml doc between functions? As I look at the proposed converters I am a little worried about having to parse and then convert back to a string each time. |
In theory, yes, but there are two requirements:
If we pass around
Another way we could try to go is to define a general "XML Converter" which takes a list of XML-specific statements (which follow a different contract than OTTL Converters). Maybe something like:
Maybe it's worthwhile but unless I'm missing something it seems like any of the above would require a decent amount of new machinery. |
Ya I was thinking of passing around xmlquery's parsed doc, but that feels leaky. Sticking with strings for now is ok. Can we add some xml "e2e" benchmarks? Something that benchmarks a reasonable use of the xml functions so we can see what users will experience? |
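The trade-off discussed here (re-parsing and re-serializing on every step versus parsing once) can be measured with a small end-to-end benchmark. The sketch below is in Python purely for illustration; the collector's real benchmarks would be written in Go, and the two steps (`drop_note`, `drop_four`) are hypothetical stand-ins for converters.

```python
import timeit
import xml.etree.ElementTree as ET

DOC = '<Data foo="bar"><Two><Three note="again">3</Three><Four>4</Four></Two></Data>'


def drop_note(root):
    """Tree-level step: remove the `note` attribute from every Three."""
    for elem in root.iter("Three"):
        elem.attrib.pop("note", None)


def drop_four(root):
    """Tree-level step: remove every Four child of a Two element."""
    for parent in list(root.iter("Two")):
        for child in parent.findall("Four"):
            parent.remove(child)


STEPS = [drop_note, drop_four]


def chained_strings(xml_text):
    """Each step parses the string and serializes it again."""
    for step in STEPS:
        root = ET.fromstring(xml_text)
        step(root)
        xml_text = ET.tostring(root, encoding="unicode")
    return xml_text


def parse_once(xml_text):
    """Parse once, apply every step to the tree, serialize once."""
    root = ET.fromstring(xml_text)
    for step in STEPS:
        step(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    runs = 10_000
    print("chained strings:", timeit.timeit(lambda: chained_strings(DOC), number=runs))
    print("parse once:     ", timeit.timeit(lambda: parse_once(DOC), number=runs))
```

Both functions return the same document, so any difference in timing reflects only the extra parse and serialize work per step.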
That's a good idea. I created #35471 to track this and I'll take care of it once we have some more converters to chain together. |
This adds a converter called `ParseSimplifiedXML`. This serves as the final step described in #35281, which will allow users to parse any arbitrary XML document into a user-friendly result, by first transforming the document in place with other functions (e.g. #35328 and #35364) and then calling this function.
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
I think this issue can be closed now that the linked PRs have been merged. It's possible that there are other functions we will need, but now that we've established the basic pattern, we should consider them as needed. |
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
XML is frequently used in traditional logging frameworks, but within the collector and downstream tools it is often difficult to manipulate.
Before going further, I believe it would be helpful to define a term: "JSON-equivalent". Basically, a `plog.LogRecord`'s body or attributes can be losslessly converted to or from JSON (or YAML, or some other formats). Notably, XML is not JSON-equivalent, at least not generally. However, it is possible to define a subset of XML which is JSON-equivalent, which we could call "JSON-equivalent XML". (More on this below.)
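A small Python sketch may help pin down the term. The map below is a made-up stand-in for a log record body, not data from a real collector; the point is only that a nested map survives a JSON round trip unchanged, while a general XML element carries attributes, text, and children as three separate kinds of content with no single obvious key in a map.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical body: nested maps, lists, and strings only.
body = {"foo": "bar", "Two": {"Three": ["3", "3"], "Four": "4"}}

# Lossless round trip: encoding to JSON and decoding gives the same map back.
decoded = json.loads(json.dumps(body))
print(decoded == body)

# A general XML element holds three different kinds of content at once.
elem = ET.fromstring('<Data foo="bar">Some text<One>With text</One></Data>')
parts = (elem.attrib, elem.text, [child.tag for child in elem])
print(parts)
```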
We currently have a `ParseXML` function, but in order to deal with the fact that XML is not generally JSON-equivalent, we are producing an encoding of XML. The encoding is necessarily JSON-equivalent, but ultimately it is an overly verbose representation that OTTL is not well suited to manipulate in ways that respect the encoding. That means that our current strategy for parsing XML has very limited value because users find it difficult to work with in OTTL and at least in some backends.
Describe the solution you'd like
In order to better support XML, I believe we should provide the following:
Example
Suppose we have the following XML document:
In order to make this JSON-equivalent, we can't have both attributes and child elements. We also can't have raw values at the same level as child elements. A JSON-equivalent version might look something like this:
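The two constraints can also be read as a mechanical check. The Python sketch below is one possible interpretation, not a definition from the project: `json_equivalence_problems` is a hypothetical name, and it flags any attribute at all, which is slightly stricter than the wording above, on the assumption that an attribute next to plain text is just as ambiguous as one next to child elements.

```python
import xml.etree.ElementTree as ET

DOC = """<Data foo="bar" hello="world">
Some text
<One>With text</One>
<Two>
<Three>3</Three>
<Four>4</Four>
<Three note="again">3</Three>
</Two>
</Data>"""

CLEAN = "<Data><foo>bar</foo><value>Some text</value><Two><Three>3</Three></Two></Data>"


def json_equivalence_problems(xml_text):
    """Hypothetical check: list reasons the document is not JSON-equivalent."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        children = list(elem)
        if elem.attrib:
            # Assumption: any attribute is flagged, not only those beside children.
            problems.append(f"<{elem.tag}> has attributes: {sorted(elem.attrib)}")
        if children:
            # Raw text can appear before the first child or after any child.
            texts = [elem.text] + [child.tail for child in children]
            if any(text and text.strip() for text in texts):
                problems.append(f"<{elem.tag}> mixes raw text with child elements")
    return problems


problems = json_equivalence_problems(DOC)
for problem in problems:
    print(problem)
```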
This can then be converted directly into a useful object:
In order to accomplish this migration, we need some functionality:
Notably, there is a reasonable amount of subjectivity here. In the example there are two instances of the `Three` tag, but they end up in different formats because of the presence of an attribute on one of them. This may be problematic for the user and there are likely many similar situations. I believe a general solution will require offering a set of composable functions that allow the user to make their own decisions about how to manipulate the representation into a JSON-equivalent format that meets their needs.
Describe alternatives you've considered
No response
Additional context
No response