Add new yamlfilecontent test to independent schema #91

Closed
evgenyz opened this issue Dec 4, 2019 · 73 comments · Fixed by #90
Labels: Add to Existing Schema (a proposal for the addition of a new Test/Object/State to an existing OVAL schema), Platform Independent (issue related to the Platform Independent schema)


@evgenyz
Contributor

evgenyz commented Dec 4, 2019

Abstract
Nowadays many applications store their configuration in YAML or JSON. It would help security content authors to have a straightforward test for elements of such configurations, like xmlfilecontent for XML-formatted configuration files. Because YAML documents are essentially small hierarchical databases, parts of the proposed behavior are also similar to the ldap57 and sql57 tests.

Link to Proposal
#90

Additional Context
The proposal concentrates on the YAML format (JSON is a subset of YAML) and uses a subset of YAML Path/JSONPath as the addressing mechanism.

Reference Implementation
YAML Path/JSONPath subset (for libyaml): https://github.com/OpenSCAP/yaml-filter.
Probe (yamlfilecontent): https://github.com/OpenSCAP/openscap (branch: maint-1.3); an updated version will be available in v1.3.4.
Examples: https://github.com/OpenSCAP/openscap/tree/maint-1.3/tests/probes/yamlfilecontent

Example of YAML configuration:

other:
  theThing: 
    - True
    - False
storage:
  files:
  - contents:
      source: data:...%0A
      verification: {}
    filesystem: root
    mode: 420
    path: /etc/tmpfiles.d/cleanup-cni.conf
  - contents:
      source: data:...%0A
      verification: {}
    filesystem: root
    mode: 422
    path: /etc/systemd/system.conf.d/kubelet-cgroups.conf

and test:

<ind:yamlfilecontent_test check="all" check_existence="all_exist" comment="Check file access modes" id="test_file_access" version="1">
    <ind:object object_ref="object_file_access_mode" />
    <ind:state state_ref="state_file_access_mode" />
</ind:yamlfilecontent_test>
<ind:yamlfilecontent_object id="object_file_access_mode" version="1">
    <ind:path>/etc</ind:path>
    <ind:filename>some.yaml</ind:filename>
    <ind:yamlpath>$.storage.files[:]['mode','path']</ind:yamlpath>
</ind:yamlfilecontent_object>
<ind:yamlfilecontent_state id="state_file_access_mode" version="1">
    <ind:value datatype="record">
        <ind:field name="mode" datatype="int" operation="greater than or equal">422</ind:field>
        <ind:field name="path" operation="pattern match">^/etc/</ind:field>
    </ind:value>
</ind:yamlfilecontent_state>
...
<ind:yamlfilecontent_item id="item_file_access_mode" status="exists">
    <ind:value datatype="record">
        <ind:field name="mode" datatype="int">420</ind:field>
        <ind:field name="path">/etc/tmpfiles.d/cleanup-cni.conf</ind:field>
    </ind:value>
    <ind:value datatype="record">
        <ind:field name="mode" datatype="int">422</ind:field>
        <ind:field name="path">/etc/systemd/system.conf.d/kubelet-cgroups.conf</ind:field>
    </ind:value>
</ind:yamlfilecontent_item>
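A rough illustration of how a path like $.storage.files[:]['mode','path'] picks the two records above: the Python sketch below works on the already-parsed document (nested dicts and lists, so only the standard library is needed) and supports just .key, [start:stop:step], [index] and ['key1','key2'] segments. The select function and its grammar are hypothetical simplifications for illustration, not the yaml-filter implementation or its API.

```python
import re

# One path segment: .key | [start:stop:step] | [index] | ['key1','key2'] (hypothetical grammar)
_SEGMENT = re.compile(r"\.([A-Za-z_][\w-]*)|\[([^\]]*)\]")


def select(doc, path):
    """Evaluate a tiny JSONPath-like expression against already-parsed YAML."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes, pos = [doc], 1
    while pos < len(path):
        m = _SEGMENT.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported segment: {path[pos:]!r}")
        pos = m.end()
        key, bracket = m.group(1), m.group(2)
        selected = []
        for node in nodes:
            if key is not None:  # .key: map lookup
                if isinstance(node, dict) and key in node:
                    selected.append(node[key])
            elif ":" in bracket:  # [start:stop:step]: slice of a sequence
                if isinstance(node, list):
                    bounds = [int(b) if b.strip() else None for b in bracket.split(":")]
                    selected.extend(node[slice(*bounds)])
            elif bracket.startswith("'"):  # ['k1','k2']: one record built from the listed keys
                if isinstance(node, dict):
                    keys = [k.strip().strip("'") for k in bracket.split(",")]
                    selected.append({k: node[k] for k in keys if k in node})
            else:  # [n]: a single sequence element
                index = int(bracket)
                if isinstance(node, list) and -len(node) <= index < len(node):
                    selected.append(node[index])
        nodes = selected
    return nodes


# The YAML example above, as a parser would hand it over (file contents omitted).
DOC = {
    "other": {"theThing": [True, False]},
    "storage": {"files": [
        {"filesystem": "root", "mode": 420, "path": "/etc/tmpfiles.d/cleanup-cni.conf"},
        {"filesystem": "root", "mode": 422,
         "path": "/etc/systemd/system.conf.d/kubelet-cgroups.conf"},
    ]},
}

print(select(DOC, "$.storage.files[:]['mode','path']"))
```

Each selected record then corresponds to one value element of type record in the yamlfilecontent_item.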

Due to the limitations imposed on the record entity and its child field entities (the name attribute must not contain A-Z and must not be empty), the test uses some workarounds, which could be dropped in a newer version of the core schema (and of the test):

<ind:yamlfilecontent_test check="all" check_existence="all_exist" comment="Check the thing" id="test_the_thing_1" version="1">
    <ind:object object_ref="object_the_thing_1" />
    <ind:state state_ref="state_the_thing_1" />
</ind:yamlfilecontent_test>
<ind:yamlfilecontent_object id="object_the_thing_1" version="1">
    <ind:path>/etc</ind:path>
    <ind:filename>some.yaml</ind:filename>
    <ind:yamlpath>$.other</ind:yamlpath>
</ind:yamlfilecontent_object>
<ind:yamlfilecontent_state id="state_the_thing_1" version="1">
    <ind:value datatype="record">
        <!-- if the target is a map, and key has any of [A-Z^], it would be lower-cased and escaped with '^' symbol -->
        <ind:field name="the^thing" datatype="boolean">True</ind:field>
    </ind:value>
</ind:yamlfilecontent_state>

<ind:yamlfilecontent_test check="all" check_existence="all_exist" comment="Check the thing" id="test_the_thing_2" version="1">
    <ind:object object_ref="object_the_thing_2" />
    <ind:state state_ref="state_the_thing_2" />
</ind:yamlfilecontent_test>
<ind:yamlfilecontent_object id="object_the_thing_2" version="1">
    <ind:path>/etc</ind:path>
    <ind:filename>some.yaml</ind:filename>
    <ind:yamlpath>$.other.theThing[:]</ind:yamlpath>
</ind:yamlfilecontent_object>
<ind:yamlfilecontent_state id="state_the_thing_2" version="1">
    <ind:value datatype="record">
        <!-- if the target is a scalar or a sequence of scalars, the key(s) would be collected as '#' -->
        <ind:field name="#" datatype="boolean">True</ind:field>
    </ind:value>
</ind:yamlfilecontent_state>
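The two naming workarounds above could be expressed as in the following Python sketch. The function names are hypothetical, and two details are assumptions read from the XML comments rather than confirmed probe behavior: a literal '^' in a key is escaped as '^^', and a selected sequence of scalars becomes one record with repeated '#' fields.

```python
import re


def escape_field_name(key):
    """Map a YAML map key to a lower-case OVAL record field name.

    Each character from [A-Z^] becomes '^' followed by its lower-case form,
    so 'theThing' -> 'the^thing' and a literal '^' -> '^^' (assumed).
    """
    return re.sub(r"[A-Z^]", lambda m: "^" + m.group(0).lower(), key)


def to_record(node):
    """Turn one selected YAML node into a record: a list of (field name, value).

    Maps yield one field per key; a scalar, or each element of a sequence of
    scalars, is collected under the field name '#'.
    """
    if isinstance(node, dict):
        return [(escape_field_name(str(key)), value) for key, value in node.items()]
    if isinstance(node, list):
        if any(isinstance(item, (dict, list)) for item in node):
            raise ValueError("sequences of collections are outside this sketch")
        return [("#", item) for item in node]
    return [("#", node)]
```

Escaping instead of plain lower-casing keeps the mapping reversible, so two keys such as theThing and thething cannot collide in one record.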

Inline or Var-Referenced YAML Documents

YAML documents captured from other objects on the system, or hard-coded into the definition, can be used as well:

<ind:yamlfilecontent_test>
    <ind:object object_ref="object_2" />
    <ind:state state_ref="state_2" />
</ind:yamlfilecontent_test>

<ind:yamlfilecontent_object id="object_1">
    <ind:file>example.yaml</ind:file> <!-- yaml: "{foo: bar}" -->
    <ind:yamlpath>$.yaml</ind:yamlpath>
</ind:yamlfilecontent_object>

<ind:yamlfilecontent_object id="object_2">
    <ind:content var_ref="var_1"/>
    <ind:yamlpath>$.foo</ind:yamlpath>
</ind:yamlfilecontent_object>

<ind:yamlfilecontent_state id="state_2">
    <ind:value datatype="record">
        <ind:field name="#" datatype="string" operation="equal">bar</ind:field>
    </ind:value>
</ind:yamlfilecontent_state>

<local_variable id="var_1" datatype="string">
    <object_component object_ref="object_1" item_field="value" record_field="#"/>
</local_variable>

This could help when dealing with "Russian doll" style YAML files, where other YAML documents are stored as strings.

Type Handling
As already mentioned, this test is very similar to xmlfilecontent except for one thing: type handling. XML files have no types; everything is a string. In contrast, YAML (and JSON) documents come with explicitly and implicitly typed data.

The reference implementation uses the YAML 1.2 Core Schema to infer the types of scalars in the document. It then compares the type of each scalar with the OVAL datatype given in yamlfilecontent_state and, only if the types are equal, compares the values according to that datatype's comparison rules. If the types differ, the value is considered not matching; no type coercion of any kind is performed.
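A minimal Python sketch of this type-strict comparison, assuming plain (unquoted) scalars: resolve_scalar follows the YAML 1.2 Core Schema resolution rules for null, bool, int and float, and equals compares values only once the resolved type equals the state's datatype. The function names, the "null" placeholder type and the restriction to the 'equals' operation are illustrative assumptions; a real probe would also have to honor explicit tags and quoted scalars, which are always strings.

```python
import math
import re

# YAML 1.2 Core Schema resolution rules for plain scalars.
_NULL = re.compile(r"null|Null|NULL|~|")
_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")
_INT_DEC = re.compile(r"[-+]?[0-9]+")
_INT_OCT = re.compile(r"0o[0-7]+")
_INT_HEX = re.compile(r"0x[0-9a-fA-F]+")
_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")
_INF = re.compile(r"[-+]?(\.inf|\.Inf|\.INF)")
_NAN = re.compile(r"\.nan|\.NaN|\.NAN")


def resolve_scalar(text):
    """Return (datatype, value) for a plain YAML scalar."""
    if _NULL.fullmatch(text):
        return ("null", None)  # placeholder: OVAL has no null datatype
    if _BOOL.fullmatch(text):
        return ("boolean", text.lower() == "true")
    if _INT_DEC.fullmatch(text):
        return ("int", int(text, 10))
    if _INT_OCT.fullmatch(text):
        return ("int", int(text[2:], 8))
    if _INT_HEX.fullmatch(text):
        return ("int", int(text[2:], 16))
    if _FLOAT.fullmatch(text):
        return ("float", float(text))
    if _INF.fullmatch(text):
        return ("float", -math.inf if text.startswith("-") else math.inf)
    if _NAN.fullmatch(text):
        return ("float", math.nan)
    return ("string", text)  # everything else, e.g. 'yes' or 'on', stays a string


def _state_value(datatype, text):
    """Parse the text of an OVAL state entity according to its datatype."""
    if datatype == "int":
        return int(text)
    if datatype == "float":
        return float(text)
    if datatype == "boolean":
        return text.strip().lower() in ("true", "1")
    return text


def equals(scalar, datatype, state_text):
    """Type-strict 'equals': a type mismatch never matches; nothing is coerced."""
    actual_type, actual = resolve_scalar(scalar)
    if actual_type != datatype:
        return False
    return actual == _state_value(datatype, state_text)
```

With these rules, mode: 420 matches a state with datatype="int" but not the same digits declared as datatype="string".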

Reasoning: #91 (comment)

@Memnarch

Memnarch commented Dec 9, 2019

Is there a native library? I know a good number of admins who are not willing to install Python on a (server) machine.

@evgenyz
Contributor Author

evgenyz commented Dec 9, 2019

is there a native library?

You are asking for a library that is native relative to what?

The link is about the YAML Path definition. All OVAL-enabled tools would be free to implement it in any language and framework their authors see fit.

@wmunyan
Contributor

wmunyan commented Dec 9, 2019

First off, I think this would be a great addition to OVAL. My first recommendation would be to restrict the yamlpath, similar to how textfilecontent and xmlfilecontent do. This applies specifically to your example that included the "array" notation (the [0:-1] bit).

The textfilecontent54 construct added the instance element, an Entity(Object|State)IntValue to determine which of multiple results are returned.

The modified content example would look like this:

<ind:yamlfilecontent_object id="object_file_access_mode" version="1">
    <ind:path>/etc</ind:path>
    <ind:filename>some.yaml</ind:filename>
    <ind:yamlpath>/storage/files/mode</ind:yamlpath>
    <ind:instance datatype="int" operation="greater than or equal">1</ind:instance>
</ind:yamlfilecontent_object>

The above object would then yield 2 items: the first with an instance of 1 and a value_of of 420, and the second with an instance of 2 and a value_of of 422.

Make sense? Feedback most certainly welcome.
Cheers,
-Bill M. (CIS)
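Bill's instance-based alternative could behave roughly like the Python sketch below, assuming that sequences are traversed transparently when a slash-separated key is looked up and that instances are numbered from 1 in document order, as in textfilecontent54. The function names are hypothetical.

```python
def collect(node, segments):
    """Yield every value reached by slash-separated map keys; sequences are
    traversed transparently, so each list element is tried against the same key."""
    if not segments:
        yield node
        return
    if isinstance(node, list):
        for element in node:
            yield from collect(element, segments)
    elif isinstance(node, dict) and segments[0] in node:
        yield from collect(node[segments[0]], segments[1:])


def items_with_instance(doc, path, min_instance=1):
    """Number matches from 1 in document order and keep instance >= min_instance."""
    segments = [s for s in path.split("/") if s]
    matches = enumerate(collect(doc, segments), start=1)
    return [(instance, value) for instance, value in matches if instance >= min_instance]


# The storage part of the YAML example, as parsed data.
STORAGE = {"storage": {"files": [
    {"mode": 420, "path": "/etc/tmpfiles.d/cleanup-cni.conf"},
    {"mode": 422, "path": "/etc/systemd/system.conf.d/kubelet-cgroups.conf"},
]}}
```

Here the "greater than or equal 1" instance filter keeps both items, (1, 420) and (2, 422).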

@evgenyz
Contributor Author

evgenyz commented Dec 10, 2019

The textfilecontent54 construct added the instance element, an Entity(Object|State)IntValue to determine which of multiple results are returned.

There is no instance entity in xmlfilecontent (on which I modelled the YAML one), and it is able to return a set of values. Do you see that as a drawback of xmlfilecontent? Should it be improved as well?

@solind

solind commented Dec 10, 2019

Hi @wmunyan and @evgenyz,

The xmlfilecontent_object doesn't have an instance entity because it uses XPath, which is an existing W3C-specified language, and XPath can already be used to traverse or specify virtually any set of XML paths. So there isn't really any reason to add an instance to the xmlfilecontent_object.

YAMLPath, however, seems to be very lightly-specified in comparison to XPath. That raises an interoperability issue.

I would point out that over in the macos schema, we have the relatively new plist511_object. A plist file can be specified in multiple different ways: as a binary file, an XML file, or an ASCII file (a format that is actually quite similar to YAML). If there is a standard XML conversion process for YAML files, it would be possible to perform that conversion and then use an XPath to retrieve one or more nodes. Then, instead of a YAMLPath, the yamlfilecontent_object could specify an XPath for node selection.

Is that a good idea, or a horrible idea?

@Memnarch

It sounds like you want to convert YAML to XML and then use XPath on it?
I'd rather not, as it is more resource-demanding. I'd prefer something that accesses the YAML (a superset of JSON) directly.

@evgenyz
Contributor Author

evgenyz commented Dec 17, 2019

There is no official "mapping" of YAML to XML and, moreover, as XML does not have some concepts of YAML (maps, anchors, scalar types), the conversion process will most likely cause interoperability issues as well. Also XPath for such an XML would look very ugly because it would have to take into account all the extra information encoded in attributes.

In theory we could use JSONPath as the addressing mechanism, but it is also not well specified and not feature-complete from the point of view of YAML.

Do you have any proposals on how we could make YAML Path more rigorously specified?

@solind

solind commented Dec 18, 2019

Hi @evgenyz, the XPath 1.0 specification was the product of a W3C working group and can be found here: https://www.w3.org/TR/1999/REC-xpath-19991116/

From the reference in your OP, it seems like it might be difficult to divorce yamlpath from its implementation in Python.

At least there is a specification document for YAML:
https://yaml.org/spec/1.2/spec.html

There also seems to be some effort at achieving a YAML/XML binding standard -- with an XSL available for converting from XML to YAML:
https://yaml.org/xml

It would probably be simpler to describe a generic XML binding for YAML-serialized data, then leverage XPath, than to fully describe yamlpath in a specification, but I'm open to pursuing either avenue.

Without a mature specification, or a mature set of implementations (I have only found the single implementation of yamlpath, in Python), I think we will find ourselves spending a lot of time struggling with content interoperability.

@DavidRies
Member

What about using JSONPath? It looks like there are a variety of mature implementations, and for this purpose isn't YAML identical to JSON? YAML is technically a superset of JSON, but I'm not sure the extra YAML features are relevant (the most used one is comments, which don't exist in JSON).

@evgenyz
Contributor Author

evgenyz commented Dec 18, 2019

@DavidRies It is possible in theory, but I have a couple of concerns about it:

  • the link you gave is actually "the specification", which is pretty loose (just like YAML Path);
  • it lacks YAML-specific features (e.g. anchors), JSON is a subset and JSONPath is a subset as well;
  • implementations are usually tied to JSON parsing libraries, which are of no use for YAML;
  • a yamlfilecontent test (and probe) with JSONPath would look ugly and half-baked to my taste.

@evgenyz
Contributor Author

evgenyz commented Dec 18, 2019

Hey, @solind!
Yes, the XML to YAML conversion is pretty straightforward, but it is not easy to reverse: YAML documents could be very complex, and types and anchors could matter.

I'm working on a formal specification of a subset of YAML Path right now, trying to make it simple but still powerful in order to contain the possible complexity of the OVAL-related implementations.

@solind

solind commented Dec 18, 2019

Hi @evgenyz, I look forward to seeing it!

Some of the conventions in yamlpath are ambiguous, e.g., foo.0 is the first element under foo, but, what if there is an element named 0?

---
  foo:
    bar: baz
    0: aha!

Also, for the regular expressions that may be included in yamlpath, you'll potentially want to refer to the regex limitations in other OVAL checks, e.g., the textfilecontent54_object.

These are the kinds of issues that will need to be dealt with rigorously.

@evgenyz
Contributor Author

evgenyz commented Dec 18, 2019

@solind foo.0 (as well as /foo/0 and foo[0]) will bring aha! because this path segment is interpreted as an index for underlying sequences (arrays) or as a numeric key for underlying dictionaries (maps). It probably should also apply to a string key '0' in a map (when defined as a path segment, like /foo/0 or foo.0), but I have yet to figure out this particular case.

I'm still deciding whether a regex-based filter should be available in this subset. Rationale: if you have to traverse your structured configuration file with regular expressions, then there is something wrong with your configuration file.
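The segment rule described above can be sketched in Python as follows. resolve_segment is a hypothetical helper; the fallback from a numeric key to the string key '0' is exactly the case still left open above, so treat that branch as an assumption.

```python
import re

_INTEGER = re.compile(r"-?[0-9]+")


def resolve_segment(node, segment):
    """Resolve one path segment, e.g. the '0' in foo.0, /foo/0 or foo[0].

    Sequences: the segment is an index. Maps: a numeric segment first matches a
    numeric key, then (assumed) a string key. Returns None when nothing matches.
    """
    numeric = _INTEGER.fullmatch(segment) is not None
    if isinstance(node, list):
        if numeric and -len(node) <= int(segment) < len(node):
            return node[int(segment)]
        return None
    if isinstance(node, dict):
        if numeric and int(segment) in node:
            return node[int(segment)]
        return node.get(segment)
    return None


# foo: {bar: baz, 0: aha!} -- the map from the question above, as parsed data
FOO = {"bar": "baz", 0: "aha!"}
```

Because the node type decides the interpretation, foo.0 is never ambiguous for a given document: it is an index only when foo is a sequence.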

@solind

solind commented Dec 18, 2019

@evgenyz In the first two examples of anchored values from the link in your OP, it says these are equivalent:

aliases[0] (explicit array element number)
aliases.0 (implicit array element number in dot-notation)

But in my example YAML, they are not necessarily equivalent. Edge-cases like this have to be explored in a thorough specification document. It was just an example of under-specification that popped into my head. I am confident there will be many similar issues.

I don't necessarily disagree about your argument concerning structured configurations and the suitability of regular expressions for exploring them. I have seen my share of garbage xmlfilecontent_objects, though. I could see some joker wanting to search a YAML file for values containing regular expressions like (md5|sha1)...

@solind

solind commented Dec 18, 2019

@evgenyz Oops, never mind about my edge case, as you said it's a map reference not an array value reference, that makes sense. Nevertheless, that wasn't very explicit (I missed it anyhow) in that document.

@evgenyz
Contributor Author

evgenyz commented Jan 10, 2020

So here is our subset proposal: https://github.com/evgenyz/libyaml-yamlpath-filter/wiki/YAML-Path-(OVAL-subset).

What is different from original YAML Path specification:

  • no Array-of-Hashes Pass-Through Selection. Rationale: this behaviour can easily cause surprising results in complex documents, letting one shoot oneself in the foot;
  • no Collectors. It might seem like a good idea to have them, but OVAL itself already has a decent mechanism for result aggregation and filtering;
  • no Regular Expression matches operator. Rationale: there is a way to match a value against a regex in OVAL, and searching for a key with regexes is not a good idea;
  • added the possibility to omit either of the indexes in a slice filter (behaves like in Python). Rationale: otherwise there is no way to select up to and including the last element with the slice syntax.

The spec is work in progress. I'm going to add more examples (including cases someone else might bring here) and highlight the differences in the document after I finish checking the details of the Python implementation.

@isimluk

isimluk commented Jan 23, 2020

Good discussion here. And good job @evgenyz on the spec and initial implementation!

Specifying yaml-path is a huge achievement. This effort exceeds the other OVAL proposals here in scope. On top of that, I believe that yaml-path may grow in usefulness outside OVAL in the years to come.

That being said, yaml-path, while very valuable, is not yet mature enough to be widely accepted. Governance-wise, this may pose some challenges for us (the OVAL community).

The easiest way to approach these challenges may be to donate the code to the OVAL-Community GitHub account under a permissive license, so that the review process is carried out with the same level of diligence that is applied to the other OVAL proposals. WDYT?

@matejak
Contributor

matejak commented Feb 24, 2020

The easiest way to approach these challenges may be to donate the code to the OVAL-Community GitHub account under a permissive license, so that the review process is carried out with the same level of diligence that is applied to the other OVAL proposals. WDYT?

It is not totally clear whether @isimluk meant the actual C code or the YAMLPath specification (which would serve as part of the test specification) when he referred to "code".
In any case, we are developing both in the yaml-filter repository: the code is there, and the specification is in the wiki. As of now, it is published under the MIT license; let us know whether that is good enough for you.

@isimluk

isimluk commented Mar 2, 2020

In any case, we are developing both in the yaml-filter repository: the code is there, and the specification is in the wiki. As of now, it is published under the MIT license; let us know whether that is good enough for you.

Good enough for me. Thank You!

@evgenyz
Contributor Author

evgenyz commented Mar 3, 2020

I've updated the issue and this PR with actual links to the reference implementation and changes to schemas we made during the implementation. Feedback is very welcome.

@solind

solind commented Mar 4, 2020

Thanks for putting this information together. I have been in back-to-back conferences but I am hoping to get a chance to review this early next week and post feedback.

@solind

solind commented Mar 9, 2020

Two questions about the specification link you provided, @evgenyz:

  1. With slices, default values (when they are not specified) would appear to be: [0, <length>, 1], is that correct?
  2. The JSON sample doesn't seem to contain an anchor. So how does the example magically associate .foo[1][&bar] with the value, True? Or, is that example not applicable to the JSON?

Is there a declarative specification for anchors anywhere (as opposed to the by-example documentation, which is all I have seen)? I ask because, nothing tells me the following is legal or illegal:

foo:
  - bar: True
    arr: [1, &fizz 2, 3, 4]
  - baz: False
    other_arr: *fizz

Implying: .foo.baz.other_arr == .foo.bar.arr[1::] == [2, 3, 4]

I don't personally work with YAML very much, so, I have no intuition for these conventions.

Thanks!

@evgenyz
Contributor Author

evgenyz commented Mar 16, 2020

Sorry for the late response, I was on vacation and missed the notification from GH.

To answer your questions:

  1. technically it is [0:MAX_INT:1] but the idea is to return everything in case we have [:] or [::] for slice syntax;
  2. anchors are only sensible for YAML documents, path containing anchor segment would not match anything in a JSON document.
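The slice defaults from the answer above ([0:MAX_INT:1], with any index omittable) can be sketched in Python like this. MAX_INT is an assumed stand-in value, and negative indexes are assumed to count from the end as in Python. With these fixed defaults a negative step would not behave like Python's own slicing, so the sketch does not try to cover it.

```python
MAX_INT = 2**31 - 1  # assumed stand-in for the implementation's MAX_INT


def parse_slice(text):
    """Parse '[start:stop:step]' where every part may be omitted.

    Omitted parts default to start=0, stop=MAX_INT, step=1, so '[:]' and
    '[::]' both select the whole sequence.
    """
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a slice: {text!r}")
    parts = text[1:-1].split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"not a slice: {text!r}")
    defaults = (0, MAX_INT, 1)
    values = [int(p) if p.strip() else defaults[i] for i, p in enumerate(parts)]
    values += defaults[len(values):]  # a missing step part defaults to 1
    return tuple(values)


def apply_slice(sequence, text):
    """Apply a parsed slice; negative indexes count from the end, as in Python."""
    start, stop, step = parse_slice(text)
    return sequence[start:stop:step]
```

Omitting the stop index is what makes the last element reachable: [-1:] selects it, while [0:-1] stops just before it.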

Node anchors in the spec: https://yaml.org/spec/1.2/spec.html#id2785586. It is a legal example: an anchor & can precede pretty much anything in a YAML document, and an alias * can then be used anywhere the referred element itself is eligible.

Validator/parser/converter: https://yaml-online-parser.appspot.com/ (handy tool for when you are in doubt about YAML document).

@matejak
Contributor

matejak commented May 4, 2020

@solind David, is there something that we can do to move this proposal forward?

@evgenyz
Contributor Author

evgenyz commented May 4, 2020

BTW, the reference implementation has landed in OpenSCAP v1.3.3 (soon to be available in Fedora, other distros would pick it up at their own pace).

@solind

solind commented May 4, 2020

My anchor example (from my question) doesn't actually seem to work in the yaml-online-parser. This highlights my only concern, which is that we need to be able to say: "here is a specification you can reference" so that implementors can be assured to have the ability to develop interoperating implementations.

We could say that OpenSCAP's implementation is the reference implementation. But libyaml + yaml_filter is on the order of 20k lines of code. That's a lot of code to have to re-implement in the absence of a decent specification. It may delay adoption.

@evgenyz
Contributor Author

evgenyz commented May 4, 2020

My anchor example (from my question) doesn't actually seem to work in the yaml-online-parser.

How exactly does it not work there?
Here is what I get as JSON if I put your YAML there:

{
  "foo": [
    {
      "arr": [
        1, 
        2, 
        3, 
        4
      ], 
      "bar": true
    }, 
    {
      "other_arr": 2, 
      "baz": false
    }
  ]
}

The "other_arr" in the second element of "foo" has the integer value 2, exactly what you were referring to with the &fizz anchor. It just so happens that your implication is not entirely correct. For your implication to hold, you would have to write your YAML like this:

foo:
  - bar: True
    arr: &fizz [1, 2, 3, 4]
  - baz: False
    other_arr: *fizz

But both examples are absolutely correct!
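One way to see why the two documents differ is to model what a YAML composer does with anchors and aliases. In the Python sketch below, Anchored and Alias are hypothetical stand-ins for parser output, and compose replaces each alias with the very node its anchor was attached to: the anchor on the scalar 2 yields other_arr == 2, while the anchor on the whole sequence makes other_arr the same list as arr.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Anchored:
    """A node carrying an anchor, e.g. '&fizz 2' (hypothetical parser output)."""
    name: str
    node: Any


@dataclass
class Alias:
    """A reference to an earlier anchor, e.g. '*fizz'."""
    name: str


def compose(node, anchors=None):
    """Resolve anchors and aliases in document order. An alias becomes the very
    node its anchor was attached to, not a copy and not a slice of a sequence."""
    if anchors is None:
        anchors = {}
    if isinstance(node, Anchored):
        anchors[node.name] = compose(node.node, anchors)
        return anchors[node.name]
    if isinstance(node, Alias):
        return anchors[node.name]  # KeyError if the alias precedes its anchor
    if isinstance(node, list):
        return [compose(item, anchors) for item in node]
    if isinstance(node, dict):
        return {key: compose(value, anchors) for key, value in node.items()}
    return node


# arr: [1, &fizz 2, 3, 4] ... other_arr: *fizz   (anchor on the scalar 2)
EXAMPLE_1 = {"foo": [{"bar": True, "arr": [1, Anchored("fizz", 2), 3, 4]},
                     {"baz": False, "other_arr": Alias("fizz")}]}
# arr: &fizz [1, 2, 3, 4] ... other_arr: *fizz   (anchor on the whole sequence)
EXAMPLE_2 = {"foo": [{"bar": True, "arr": Anchored("fizz", [1, 2, 3, 4])},
                     {"baz": False, "other_arr": Alias("fizz")}]}
```

Recursive structures (an alias inside its own anchored node) are outside this sketch.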

This highlights my only concern, which is that we need to be able to say: "here is a specification you can reference" so that implementors can be assured to have the ability to develop interoperating implementations.

We could say that OpenSCAP's implementation is the reference implementation. But libyaml + yaml_filter is on the order of 20k lines of code. That's a lot of code to have to re-implement in the absence of a decent specification. It may delay adoption.

The YAML specification is stable, mature and widely recognized. There is a plethora (the list is under Projects) of YAML parser implementations in many languages. Implementers are free to use any of them, including the MIT-licensed, cross-platform libyaml library written in C.

The only novelty here is the YAML addressing mechanism (yaml-filter, as we call it), which is a subset of a cross between YAML Path and JSONPath with intentionally limited (but, in our opinion, sufficient) capabilities, in fewer than 700 lines of C (sic!) code whose only dependency is libc. Implementing it in Java or Python would take maybe 200 LOC or less; it is really, really simple.

@evgenyz
Contributor Author

evgenyz commented May 4, 2020

To be perfectly clear, your implication that the reference .foo.baz.other_arr would be equal to the array element .foo.bar.arr[1::] and, at the same time, would be an alias of [2, 3, 4] is possible with this YAML document:

foo:
  - bar: True
    arr: [1, &fizz [2, 3, 4], 3, 4]
  - baz: False
    other_arr: *fizz

Which is probably not what you initially had in mind. But, anyway, this is also a valid YAML example.

@solind

solind commented May 4, 2020

With respect, @evgenyz, if you look closer at my example/question, you'll see I was really asking whether an anchor could point to a list element, and if so, whether that element would become the head of a list. When you answered that it was a legal example, I (incorrectly) assumed you meant that in answer to my (follow-on) question.

Surely the YAML path mechanism and C implementation in OpenSCAP is simple enough, provided that in whatever language an implementor chooses, there happens to be a YAML parser with an event model similar enough to libyaml's to make the re-implementation simple.

I am at this very moment evaluating snakeyaml in Java for this purpose, but implementors in Go or C# may not be so fortunate.

I would like to know whether anyone else in our community thinks referencing the C implementation with the libyaml dependency will suffice in place of a specification. I have already agreed that we could.

@solind

solind commented Aug 19, 2020

I'm not sure where it is heading, can you please give an example?

Okay, let's take my proposal from a few weeks ago (which you didn't like), in which our YAML-path is used to select a whole node, which we simply return as a text block:

    <ind-sc:yamlfilecontent_item id="1">
      <ind-sc:path>/tmp</ind-sc:path>
      <ind-sc:filename>some.yaml</ind-sc:filename>
      <ind-sc:yamlpath>$.foo[:]['bar','baz']</ind-sc:yamlpath>
      <ind-sc:value_of datatype="string">bar: True
baz: True</ind-sc:value_of>
    </ind-sc:yamlfilecontent_item>

We could also add a new textvariablecontent_test/object/state. This would be exactly like the textfilecontent54_test, except that instead of referencing a file, it references a variable, e.g.:

<ind:textvariablecontent_test id="oval:something:tst:1" state_operator="AND" ...>
    <ind:object object_ref="oval:something:obj:1" />
    <ind:state state_ref="oval:something:ste:1" />
    <ind:state state_ref="oval:something:ste:2" />
</ind:textvariablecontent_test>

<ind:textvariablecontent_object id="oval:something:obj:1" version="1">
    <ind:behaviors ignore_case="false" multiline="true" singleline="false"/>
    <ind:var_ref>oval:something:var:1</ind:var_ref>
    <ind:pattern>^(bar|baz):\s.+$</ind:pattern>
    <ind:instance datatype="int" operation="greater than or equal">1</ind:instance>
</ind:textvariablecontent_object>

<ind:textvariablecontent_state id="oval:something:ste:1" version="1">
    <ind:value_of datatype="string">bar: True</ind:value_of>
</ind:textvariablecontent_state>

<ind:textvariablecontent_state id="oval:something:ste:2" version="1">
    <ind:value_of datatype="string">baz: True</ind:value_of>
</ind:textvariablecontent_state>

<oval-def:local_variable id="oval:something:var:1" datatype="string" version="1">
    <oval-def:object_component object_ref="oval:0:obj:5" item_field="value_of"/>
</oval-def:local_variable>

This would be a potentially useful test anyway. But, if the two are then used together, it allows us to leverage the simplicity of the textfilecontent_test, and narrow down what we're doing with YAML-path to make it a pure node selection mechanism. Wouldn't this combination make it possible to implement virtually any needed test against a YAML configuration file?

It is not necessarily an elegant solution, but I think it is a relatively simple one for us to specify.
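The proposed textvariablecontent_object could be approximated in Python as below, mapping the OVAL behaviors to re flags (ignore_case to re.IGNORECASE, multiline to re.MULTILINE, singleline to re.DOTALL) and numbering matches from 1 as instances. The function name and return shape are hypothetical; this is a sketch of the idea, not a schema definition.

```python
import re


def textvariablecontent_items(text, pattern, min_instance=1,
                              ignore_case=False, multiline=True, singleline=False):
    """Apply a regex to a variable's value the way textfilecontent54 applies it
    to a file. Returns (instance, matched text, subexpressions) for each match
    whose 1-based instance number is >= min_instance."""
    flags = ((re.IGNORECASE if ignore_case else 0)
             | (re.MULTILINE if multiline else 0)
             | (re.DOTALL if singleline else 0))
    matches = enumerate(re.finditer(pattern, text, flags), start=1)
    return [(i, m.group(0), m.groups()) for i, m in matches if i >= min_instance]


# value_of of the yamlfilecontent_item above, as delivered through oval:something:var:1
VALUE_OF = "bar: True\nbaz: True"
```

With multiline="true", the pattern ^(bar|baz):\s.+$ yields one instance per line of the selected node, which is what the two states above check.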

Alternatively, is it possible to add EntityStateMapType and co. to the independent schema along with the yamlfilecontent_test and move it later to the core?

Unfortunately I don't believe it's possible, as it breaks the isolation between the platform schemas.

@solind

solind commented Aug 19, 2020

Alternatively, is it possible to add EntityStateMapType and co. to the independent schema along with the yamlfilecontent_test and move it later to the core?

Let me revise this ... actually I am not sure. Perhaps it could work... I'll have to think about it.

@evgenyz
Contributor Author

evgenyz commented Sep 2, 2020

Alternatively, is it possible to add EntityStateMapType and co. to the independent schema along with the yamlfilecontent_test and move it later to the core?

Let me revise this ... actually I am not sure. Perhaps it could work... I'll have to think about it.

So, did you come to any conclusion? As far as I can see, there is nothing that would prevent adding the types we need to the independent schema (keeping in mind that the test would be added there as well, and nothing else would use these types for now), except maybe for some ugliness of the approach. But we are making trade-offs here, and I would prefer an ugly schema over ugly content.

@solind

solind commented Sep 2, 2020

Hi @evgenyz , I am sorry I did not, but thank you for reminding me about this!

On the one hand, there are certainly instances of simple datatypes that exist in the platform schemas only. But in those cases, restrictions are imposed on "string" datatypes. In this case, we will want to remove a restriction from the record datatype. Is it possible for a subclass to remove a restriction? I do not know!

There is also the problem that we will be restricted from modifying the ComplexDatatypeEnumeration (as it's in a common schema), so technically these must still be records. The benefit is that we can leverage the language's existing tooling around the record datatype.

If you put together a proposed schema modification, I will test it in our product and see if it can work, or if there's something I don't know that will get in the way! If it's possible in XML/XSD, it should also be possible in Java/JAXB.

@matejak
Contributor

matejak commented Sep 3, 2020

I see that you are exploring possibilities of referencing the core schema. But wouldn't it be possible to copy the datatype definition from the core schema (minus the lowercase-only limitation that we want to get rid of) and simply redefine it in the independent platform schema?
I think that this kind of controlled code duplication in the schema could be the lesser evil: everybody would know what's going on, and it could be deprecated in the future, i.e. in SCAP 1.4, when the core schema is updated.

@evgenyz
Contributor Author

evgenyz commented Sep 4, 2020

So, here is what we are dealing with (oval-definitions-schema.xsd):

    <xsd:complexType name="EntityStateRecordType">
        <xsd:annotation>
            <xsd:documentation>The EntityStateRecordType defines an entity that consists of a number of uniquely named fields. This structure is used for representing a record from a database query and other similar structures where multiple related fields must be collected at once. Note that for all entities of this type, the only allowed datatype is 'record' and the only allowed operation is 'equals'. During analysis of a system characteristics item, each field is analyzed and then the overall result for elements of this type is computed by logically anding the results for each field and then applying the entity_check attribute.</xsd:documentation>
            <xsd:documentation>Note the datatype attribute must be set to 'record'.</xsd:documentation>
            <!-- 
                NOTE: The restriction that the only allowed datatype is 'record' is enforced by schematron rules placed on each entity that uses this type. 
                This is due to the fact that this type is developed as an extension of the oval-def:EntityStateComplexBaseType. This base type declares a datatype attribute. To restrict the 
                datatype attribute to only allow 'record' would need a restriction. We cannot do both an xsd:extension and an xsd:restriction at the same time.
            -->
            <xsd:documentation>Note the operation attribute must be set to 'equals'.</xsd:documentation>
            <xsd:documentation>Note the var_ref attribute is not permitted and the var_check attribute does not apply.</xsd:documentation>
            <xsd:documentation>Note that when the mask attribute is set to 'true', all child field elements must be masked regardless of the child field's mask attribute value.</xsd:documentation>
        </xsd:annotation>
        <xsd:complexContent>
            <xsd:extension base="oval-def:EntityStateComplexBaseType">
                <xsd:sequence>
                    <xsd:element name="field" type="oval-def:EntityStateFieldType" minOccurs="0"
                        maxOccurs="unbounded"/>
                </xsd:sequence>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>
    <xsd:complexType name="EntityStateFieldType">
        <xsd:annotation>
            <xsd:documentation>The EntityStateFieldType defines an element with simple content that represents a named field in a record that may contain any number of named fields. The EntityStateFieldType is much like all other entities with one significant difference, the EntityStateFieldType has a name attribute</xsd:documentation>
            <xsd:documentation>The required name attribute specifies a unique name for the field. Field names are lowercase and must be unique within a given parent record element. When analyzing system characteristics an error should be reported for the result of a field that is present in the OVAL State, but not found in the system characteristics Item.</xsd:documentation>
            <xsd:documentation>The optional entity_check attribute specifies how to handle multiple record fields with the same name in the OVAL Systems Characteristics file. For example, while collecting group information where one field represents the users that are members of the group, it is very likely that there will be multiple fields with a name of 'user' associated with the group. If the OVAL State defines the value of the field with name equal 'user' to equal 'Fred', then the entity_check attribute determines if all values for field entities must be equal to 'Fred', or at least one value must be equal to 'Fred', etc.</xsd:documentation>
            <xsd:documentation>Note that when the mask attribute is set to 'true' on a field's parent element the field must be masked regardless of the field's mask attribute value.</xsd:documentation>
        </xsd:annotation>
        <xsd:simpleContent>
            <xsd:extension base="xsd:anySimpleType">
                <xsd:attribute name="name" use="required">
                    <xsd:annotation>
                        <xsd:documentation>A string.</xsd:documentation>
                    </xsd:annotation>
                    <xsd:simpleType>
                        <xsd:restriction base="xsd:string">
                            <xsd:pattern value="[^A-Z]+"/>
                        </xsd:restriction>
                    </xsd:simpleType>
                </xsd:attribute>
                <xsd:attributeGroup ref="oval-def:EntityAttributeGroup"/>
                <xsd:attribute name="entity_check" type="oval:CheckEnumeration" use="optional"
                    default="all"/>
            </xsd:extension>
        </xsd:simpleContent>
    </xsd:complexType>

The EntityStateRecordType is enumerated in (oval-common-schema.xsd):

     <xsd:simpleType name="ComplexDatatypeEnumeration">
          <xsd:annotation>
               <xsd:documentation>The ComplexDatatypeEnumeration simple type defines the complex legal datatypes that are supported in OVAL. These datatypes describe the values of individual entities where the entity has some complex structure beyond simple string like content.</xsd:documentation>
          </xsd:annotation>
          <xsd:restriction base="xsd:string">
          <xsd:enumeration value="record">
               <xsd:annotation>
                    <xsd:documentation>The record datatype describes an entity with structured set of named fields and values as its content. The only allowed operation within OVAL for record values is 'equals'. Note that the record datatype is not currently allowed when using variables.</xsd:documentation>
               </xsd:annotation>
          </xsd:enumeration>
          </xsd:restriction>
     </xsd:simpleType>
     <xsd:simpleType name="DatatypeEnumeration">
          <xsd:annotation>
               <xsd:documentation>The DatatypeEnumeration simple type defines the legal datatypes that are used to describe the values of individual entities. A value should be interpreted according to the specified type. This is most important during comparisons. For example, is '21' less than '123'? will evaluate to true if the datatypes are 'int', but will evaluate to 'false' if the datatypes are 'string'. Another example is applying the 'equal' operation to '1.0.0.0' and '1.0'. With datatype 'string' they are not equal, with datatype 'version' they are.</xsd:documentation>
          </xsd:annotation>
          <xsd:union memberTypes="oval:SimpleDatatypeEnumeration oval:ComplexDatatypeEnumeration"/>
     </xsd:simpleType>

Subclassing EntityStateRecordType is impossible because we would have to add the subclass back to oval-common-schema.xsd. Subclassing EntityStateFieldType is futile because, while we could add another attribute like 'key' without any restrictions, we can't get rid of the 'name' attribute of the base class. The state would look like:

    <state>
       <value datatype="record">
          <case_sensitive_field type="ind-def:EntityStateCaseSensitiveFieldType" name="" key="SomeKey" datatype="boolean">True</case_sensitive_field>
       </value>
    </state>

And I'm not even sure if this is a 100% legit construct.

IMHO this is less awkward:

    <state>
       <value datatype="record">
          <field name="^some^key" datatype="boolean">True</field>
       </value>
    </state>
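The evaluation rule spelled out in the EntityStateRecordType and EntityStateFieldType annotations quoted above (check each field, AND the per-field results, then apply entity_check, and report an error for a state field missing from the item) can be sketched in Python as follows. This is a minimal, hypothetical model, not OpenSCAP's or any other tool's implementation: the tuple/dict shapes and function names are made up for illustration, only string equality is compared, and masking, datatypes and the remaining OVAL result values are left out.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified shapes (not an existing OVAL library API):
#   a state field is (name, expected_value, entity_check)
#   a collected record maps each field name to every value seen for it
StateField = Tuple[str, str, str]
Record = Dict[str, List[str]]


def apply_entity_check(results: List[bool], entity_check: str) -> bool:
    """Combine per-value comparison results like OVAL's CheckEnumeration."""
    hits = sum(results)
    if entity_check == "all":
        return hits == len(results)
    if entity_check == "at least one":
        return hits >= 1
    if entity_check == "none satisfy":
        return hits == 0
    if entity_check == "only one":
        return hits == 1
    raise ValueError(f"unsupported entity_check: {entity_check!r}")


def evaluate_record(state_fields: List[StateField], record: Record) -> str:
    """Return 'true', 'false' or 'error' for one collected record."""
    field_results = []
    for name, expected, entity_check in state_fields:
        values = record.get(name)
        if not values:
            # A field present in the state but missing from the item is an error.
            field_results.append("error")
            continue
        # Only string equality is modeled; datatypes and operations are ignored.
        matched = apply_entity_check([v == expected for v in values], entity_check)
        field_results.append("true" if matched else "false")
    # OVAL-style AND: any 'false' wins, then any 'error', otherwise 'true'.
    if "false" in field_results:
        return "false"
    if "error" in field_results:
        return "error"
    return "true"


def evaluate_records(state_fields: List[StateField], records: List[Record],
                     entity_check: str = "all") -> str:
    """Apply the record entity's own entity_check across several records."""
    results = [evaluate_record(state_fields, r) for r in records]
    if "error" in results:
        return "error"  # simplification: the full OVAL rules are more nuanced
    matched = apply_entity_check([r == "true" for r in results], entity_check)
    return "true" if matched else "false"


group = {"user": ["alice", "Fred"]}
print(evaluate_record([("user", "Fred", "at least one")], group))
print(evaluate_record([("user", "Fred", "all")], group))
print(evaluate_record([("gid", "10", "all")], group))
```

The three calls mirror the 'Fred' example from the entity_check documentation: with "at least one" a group containing Fred passes, with "all" it fails, and a field that was never collected yields an error.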

@evgenyz
Contributor Author

evgenyz commented Feb 1, 2021

We have another improvement for this test, but it may not be entirely in the spirit of OVAL. We are thinking about adding a <content> entity as an alternative to <filepath> and <file>, so that a YAML document contained in a variable can be checked. This would help with Russian-doll-style YAML files, where other YAML documents are stored as strings. What do you think about it?

Example:
YAML

    yaml: "{foo: bar}"

OVAL

    <ind:yamlfilecontent_test>
        <ind:object object_ref="object_2" />
        <ind:state state_ref="state_2" />
    </ind:yamlfilecontent_test>

    <ind:yamlfilecontent_object id="object_1">
        <ind:file>example.yaml</ind:file> <!-- yaml: "{foo: bar}" -->
        <ind:yamlpath>$.yaml</ind:yamlpath>
    </ind:yamlfilecontent_object>

    <ind:yamlfilecontent_object id="object_2">
        <ind:content var_ref="var_1"/>
        <ind:yamlpath>$.foo</ind:yamlpath>
    </ind:yamlfilecontent_object>

    <ind:yamlfilecontent_state id="state_2">
        <ind:value datatype="record">
            <ind:field name="#" datatype="string" operation="equals">bar</ind:field>
        </ind:value>
    </ind:yamlfilecontent_state>

    <local_variable id="var_1" datatype="string">
        <object_component object_ref="object_1" item_field="value" record_field="#"/>
    </local_variable>
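To illustrate the two-stage lookup in this example (object_1 extracts the embedded document as a string into var_1, object_2 parses that string via <content> and queries it), here is a small Python sketch. Assumptions: Python's standard library has no YAML parser, so the documents are written as JSON (which YAML accepts), i.e. the inner document is {"foo": "bar"} rather than {foo: bar}; and the hypothetical select() helper implements only a toy subset of the path syntax ($, .key, [n], [:]), far narrower than the YAML Path definition implemented by OpenSCAP/yaml-filter. It also parses whole documents in memory, which a real probe need not do.

```python
import json
import re
from typing import Any, List

# Hypothetical path subset for illustration only: $ followed by .key, [n] or [:].
# The real YAML Path definition implemented by yaml-filter is richer than this.
_STEP = re.compile(r"\.([^.\[\]]+)|\[(\d+)\]|\[(:)\]")


def select(doc: Any, path: str) -> List[Any]:
    """Return all nodes of an already-parsed document matched by `path`."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    nodes, pos = [doc], 1
    while pos < len(path):
        step = _STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unsupported syntax at offset {pos} in {path!r}")
        key, index, wildcard = step.groups()
        matched = []
        for node in nodes:
            if key is not None and isinstance(node, dict) and key in node:
                matched.append(node[key])
            elif index is not None and isinstance(node, list) and int(index) < len(node):
                matched.append(node[int(index)])
            elif wildcard is not None and isinstance(node, list):
                matched.extend(node)
        nodes, pos = matched, step.end()
    return nodes


def select_nested(outer_text: str, outer_path: str, inner_path: str) -> List[Any]:
    """object_1 pulls embedded documents out as strings; object_2 queries them."""
    embedded = select(json.loads(outer_text), outer_path)  # what var_1 would hold
    results = []
    for text in embedded:
        # The <content> entity: parse each captured string as its own document.
        results.extend(select(json.loads(text), inner_path))
    return results


OUTER = '{"yaml": "{\\"foo\\": \\"bar\\"}"}'
print(select_nested(OUTER, "$.yaml", "$.foo"))  # ['bar']
```

The same select() helper also handles paths such as $.storage.files[:].mode against a parsed document, returning one node per matching list element.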

@evgenyz
Contributor Author

evgenyz commented Feb 1, 2021

@solind Also, since we think that we have (apart from the improvement above) an implementation more or less sufficient for general use, I would like to ask you to summarize what we are missing for this proposal to move toward standardization.

The test has even had its baptism by fire in the Compliance as Code project (in the OCP4 data streams, where it is used pretty intensively: https://github.com/ComplianceAsCode/content/tree/master/ocp4).

@solind

solind commented Feb 1, 2021

Hi @evgenyz - I actually like the idea of variable content, and I think it could be similarly useful if we added an analogous construct to the textfilecontent54_test. However, should we standardize on any safeguards to ensure that the YAML content doesn't become very large? Or is that not really a concern in your experience?

I still have implementing YAMLPath in Java on my "to do" list, but I don't know when I'll get around to it. Hopefully I will have time to do it soon after our upcoming release. Two qualifying implementations are needed for a proposal to move from "develop" into "stable" in OVAL. However, given the scope, I don't know if I can identify offhand what might be missing until I actually attempt implementing it, and so we may sort of need a second implementation to finish hashing it out.

If I were to make our implementation an open source BSD-licensed or Apache-licensed project, would you be willing to help test it, to compare with the OSCAP implementation?

@evgenyz
Contributor Author

evgenyz commented Feb 1, 2021

Hey!

I actually like the idea of variable content, and I think it could be similarly useful if we added an analogous construct to the textfilecontent54_test. However, should we standardize on any safeguards to ensure that the YAML content doesn't become very large? Or is that not really a concern in your experience?

Well, so far our implementation of https://github.com/OpenSCAP/yaml-filter/wiki/YAML-Path-Definition does not require loading the whole document into memory. The probe can work with very large documents, provided that the object does not contain a path like $[:][:][:]...5000_times...[:]. And even if it does, such a path would cause more problems on the OVAL side, much like a text file with a veeeeeery long line, I guess.

If I were to make our implementation an open source BSD-licensed or Apache-licensed project, would you be willing to help test it, to compare with the OSCAP implementation?

Yeah, sure.

@solind

solind commented Feb 1, 2021

I meant, specifically, to keep the OVAL variables from becoming too large. File size isn't the issue, but we don't want megabytes of text encapsulated in variable values.

@evgenyz
Contributor Author

evgenyz commented Feb 1, 2021

As far as I understand, nothing in OVAL currently forbids capturing a one-line file of, say, 1 GB into a variable using object_component. I reckon this is already a problem, whether or not the variable is used anywhere.

@evgenyz
Contributor Author

evgenyz commented Feb 1, 2021

So, if the <local_variable> specification said that it can hold at most 100 MB of data, I would not object :)

@solind

solind commented Feb 1, 2021

You're right, I suppose it's already a possible problem!

Perhaps instead of having a fixed value defined in the specification, we could add a "maxsize" attribute to the local_variable, and define how to handle situations where the maxsize is exceeded (e.g., generate an error? truncate?)
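A rough sketch of the two policies floated for the proposed "maxsize" attribute. Everything here is hypothetical: no such attribute exists in OVAL today, and measuring the limit in UTF-8 bytes, the policy names, and the exception class are assumptions made for illustration.

```python
from typing import Optional


class VariableTooLarge(Exception):
    """Raised under the hypothetical 'error' policy when maxsize is exceeded."""


def enforce_maxsize(value: str, maxsize: Optional[int], policy: str = "error") -> str:
    """Apply a hypothetical local_variable maxsize, measured in UTF-8 bytes."""
    if maxsize is None:  # attribute absent: no limit, as today
        return value
    encoded = value.encode("utf-8")
    if len(encoded) <= maxsize:
        return value
    if policy == "error":
        raise VariableTooLarge(f"value is {len(encoded)} bytes, maxsize is {maxsize}")
    if policy == "truncate":
        # Cut at the byte limit; errors="ignore" drops a split multi-byte character.
        return encoded[:maxsize].decode("utf-8", errors="ignore")
    raise ValueError(f"unknown policy: {policy!r}")


print(enforce_maxsize("héllo", 2, policy="truncate"))  # h
```

Truncating at a byte limit can split a multi-byte character; the sketch drops that partial character rather than emitting invalid text. An "error" policy is arguably the safer default, since a silently truncated value could change comparison results.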

@solind

solind commented Feb 1, 2021

Of course... that doesn't solve the problem of the OVAL item having a 1GB sub-element.

@evgenyz
Contributor Author

evgenyz commented Mar 18, 2021

OpenSCAP's implementation of the latest addition (<content>): OpenSCAP/openscap#1711

@wmunyan
Contributor

wmunyan commented Mar 18, 2021

Just to add my 2 cents: I am in favor of the content element in this test as well, and I agree with the contributors here that it would also be useful in textfilecontent and xmlfilecontent checks.

@solind if you're working on something that would have an open-source license to it for evaluating YAMLPath, I can certainly try to contribute as well as test things out. Let me know how I might help.

@evgenyz
Contributor Author

evgenyz commented Mar 18, 2021

@solind if you're working on something that would have an open-source license to it for evaluating YAMLPath, I can certainly try to contribute as well as test things out. Let me know how I might help.

Meanwhile you can play with OpenSCAP's implementation of it and maybe give us all some feedback or improvement proposals: https://github.com/OpenSCAP/yaml-filter. It even has a binary similar to jq to play with documents and paths.

@wmunyan
Contributor

wmunyan commented Apr 26, 2021

From Area Supervisors meeting: Perhaps add a note in the schema documentation that the content element is meant either to (a) hold YAML content snippets inline, or (b) refer to YAML content using the @var_ref attribute.

@wmunyan
Contributor

wmunyan commented Apr 26, 2021

Hi gang, sorry, I had one other question in my notes that I forgot to ask. One of the stipulations in this test is that record field element names must be lowercase. Do we know why that restriction was put in place back in the day? I was curious to know whether removing that restriction is a viable option. I know that is a big change to a core schema, so I don't want to make rash decisions...

@evgenyz
Contributor Author

evgenyz commented Apr 28, 2021

I was curious to know whether removing that restriction is a viable option. I know that is a big change to a core schema, so I don't want to make rash decisions...

The change would be backward-compatible. And the restriction does not make much sense anyway (there are multiple ways to break LDAP even without A...Z). Still, it would be a change in the spec, so the decision here is administrative in nature.
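For reference, the lowercase rule comes from the name attribute of EntityStateFieldType, which is restricted by <xsd:pattern value="[^A-Z]+"/>. The Python sketch below (the helper name is hypothetical) reproduces that check; XSD patterns always match the entire value, hence re.fullmatch. It shows that a mixed-case YAML key such as theThing cannot be used verbatim as a field name, whereas names like # or ^some^key are accepted. Note that the character class only excludes ASCII A-Z.

```python
import re

# EntityStateFieldType restricts the name attribute with <xsd:pattern value="[^A-Z]+"/>.
# XSD patterns are implicitly anchored to the whole value, so the Python
# equivalent is re.fullmatch rather than re.match or re.search.
FIELD_NAME = re.compile(r"[^A-Z]+")


def is_valid_field_name(name: str) -> bool:
    """True if `name` has at least one character and no ASCII capitals."""
    return FIELD_NAME.fullmatch(name) is not None


for candidate in ["#", "^some^key", "thething", "theThing", ""]:
    print(f"{candidate!r}: {is_valid_field_name(candidate)}")
```

Dropping the pattern entirely would accept every name that is valid today, which is why the relaxation is backward-compatible.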

@evgenyz
Contributor Author

evgenyz commented Apr 28, 2021

From Area Supervisors meeting: Perhaps add a note in the schema documentation that the content element is meant either to (a) hold YAML content snippets inline, or (b) refer to YAML content using the @var_ref attribute.

Well, filename, yamlpath, and other entities can also refer to a variable using @var_ref. That is a feature of the base type, nothing specific to yamlfilecontent.

Anyhow, "The content element specifies the YAML document body. It also could reference a variable containing the document using var_ref attribute. Note that 'equals' is the only valid operator for the content entity." is the up-to-date description. How should I reword it?

@wmunyan
Contributor

wmunyan commented Apr 28, 2021

@evgenyz thanks for pointing out the current description of the content element. In my opinion, what you have is good.

@wmunyan wmunyan linked a pull request May 10, 2021 that will close this issue
@wmunyan
Contributor

wmunyan commented May 10, 2021

PR has been merged; resolving this issue

@wmunyan wmunyan closed this as completed May 10, 2021
@vanderpol vanderpol changed the title Proposal to add new YAML content test Add yamlfilecontent test to independent schema Sep 27, 2024
@vanderpol vanderpol changed the title Add yamlfilecontent test to independent schema Add new yamlfilecontent test to independent schema Sep 27, 2024
7 participants