Add new yamlfilecontent test to independent schema #91
Comments
Is there a native library? I know a good number of admins who are not willing to install Python on a (server) machine.
You are asking for a library that is native relative to what? The link is about the YAML Path definition. All OVAL-enabled tools would be free to implement it in any language and framework their authors see fit.
First off, I think this would be a great addition to OVAL. My first recommendation would be to restrict the The The modified content example would look like this:
The above object would then yield 2 items, the first with Make sense? Feedback most certainly welcome.
There is no
The xmlfilecontent_object doesn't have an instance entity because it uses XPath, which is an existing W3C-specified language, and XPath can already be used to traverse or specify virtually any set of XML paths. So there isn't really any reason to add an instance to the xmlfilecontent_object. YAMLPath, however, seems to be very lightly specified in comparison to XPath. That raises an interoperability issue. I would point out that over in the macos schema, we have the relatively new plist511_object. A plist file can be specified in multiple different ways: as a binary file, an XML file, or an ASCII file (a format that is actually quite similar to YAML). If there is a standard XML conversion process for YAML files, it would be possible to perform that conversion and then use an XPath to retrieve one or more nodes. Then, instead of a YAMLPath, the yamlfilecontent_object could specify an XPath for node selection. Is that a good idea, or a horrible idea?
It sounds like you want to convert YAML to XML and then use XPath on it?
There is no official "mapping" of YAML to XML and, moreover, as XML lacks some YAML concepts (maps, anchors, scalar types), the conversion process will most likely cause interoperability issues as well. Also, XPath for such an XML document would look very ugly, because it would have to take into account all the extra information encoded in attributes. In theory we could use JSONPath as the addressing mechanism, but it is also not well specified and not quite feature-complete from the point of view of YAML. Do you have any proposals on how we could make YAML Path more rigorously specified?
Hi @evgenyz, the XPath 1.0 specification was the product of a W3C working group and can be found here: https://www.w3.org/TR/1999/REC-xpath-19991116/
From the reference in your OP, it seems like it might be difficult to divorce yamlpath from its implementation in Python. At least there is a specification document for YAML: There also seems to be some effort at achieving a YAML/XML binding standard, with an XSL available for converting from XML to YAML: It would probably be simpler to describe a generic XML binding for YAML-serialized data and then leverage XPath than to fully describe yamlpath in a specification, but I'm open to pursuing either avenue. Without a mature specification, or a mature set of implementations (I have only found the single implementation of yamlpath, in Python), I think we will find ourselves spending a lot of time struggling with content interoperability.
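To make the XML-binding idea concrete, here is a minimal Python sketch of one naive, hypothetical binding (it is not a published YAML/XML mapping): map keys become element names, sequence items become `<item>` elements, and scalars become text. Because Python's standard library has no YAML parser, it loads the JSON form of some sample data and queries the result with ElementTree's limited XPath support. The function name and the data are illustrative only.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, tag="root"):
    """Naively bind a parsed YAML/JSON value to an XML element (hypothetical).

    Map keys become element names, sequence items become <item> elements,
    and scalars become element text.
    """
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, str(key)))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml(child, "item"))
    elif value is not None:
        # Scalar types (bool, int, float) are lost: everything becomes text.
        elem.text = str(value)
    return elem


doc = json.loads('{"foo": [{"arr": [1, 2, 3, 4], "bar": true},'
                 ' {"other_arr": 2, "baz": false}]}')
root = to_xml(doc)
# ElementTree understands simple child paths, enough for this illustration.
print([e.text for e in root.findall("./foo/item/bar")])        # ['True']
print([e.text for e in root.findall("./foo/item/arr/item")])   # ['1', '2', '3', '4']
```

Even this toy binding shows the concerns raised in the thread: the YAML boolean `true` comes back as the string `'True'`, and a key such as `0` or `my key` would yield an element name that is not valid XML.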
What about using JSONPath? It looks like there are a variety of mature implementations, and for this purpose isn't YAML identical to JSON? YAML is technically a superset of JSON, but I'm not sure the extra YAML features are relevant (the most commonly used one is comments, which don't exist in JSON).
@DavidRies It is possible in theory, but I have a couple of concerns about it:
Hey, @solind! I'm working on a formal specification of a subset of YAML Path right now, trying to make it simple but still powerful, in order to contain the possible complexity of OVAL-related implementations.
Hi @evgenyz, I look forward to seeing it! Some of the conventions in yamlpath are ambiguous, e.g., foo.0 is the first element under foo, but what if there is an element named 0?
Also, for the regular expressions that may be included in yamlpath, you'll potentially want to refer to the regex limitations in other OVAL checks, e.g., the textfilecontent54_object. These are the kinds of issues that will need to be dealt with rigorously.
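The `foo.0` ambiguity can be illustrated with a deliberately naive resolver in Python. Its rules (a dotted segment is treated as a map key when the current node is a map and as an index when it is a list) are an assumption for illustration, not yamlpath's actual behaviour; the function name is hypothetical.

```python
def resolve_dotted(doc, path):
    """Resolve a dotted path such as 'foo.0' under simplified, hypothetical rules.

    A segment is looked up as a string key when the current node is a dict,
    and converted to an index when the current node is a list.
    """
    node = doc
    for segment in path.split("."):
        if isinstance(node, dict):
            node = node[segment]          # '0' is looked up as the string key '0'
        elif isinstance(node, list):
            node = node[int(segment)]     # '0' is treated as a list index
        else:
            raise KeyError(f"cannot descend into a scalar at {segment!r}")
    return node


as_list = {"foo": ["first", "second"]}
as_map = {"foo": {"0": "key named zero", "1": "key named one"}}
print(resolve_dotted(as_list, "foo.0"))  # first
print(resolve_dotted(as_map, "foo.0"))   # key named zero
```

The same path text selects different things depending on the shape of the data, and a map whose key is the integer `0` (which YAML allows) would not be found at all, because the segment is looked up as the string `'0'`. A distinct syntax for indices, such as `foo[0]`, is one way to remove the ambiguity.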
@solind I'm still deciding whether a regex-based filter should be available in this subset. Rationale: if you have to traverse your structured configuration file with regular expressions, then there is something wrong with your configuration file.
@evgenyz In the first two examples of anchored values from the link in your OP, it says these are equivalent:
But in my example YAML, they are not necessarily equivalent. Edge cases like this have to be explored in a thorough specification document. It was just an example of under-specification that popped into my head; I am confident there will be many similar issues. I don't necessarily disagree with your argument concerning structured configurations and the suitability of regular expressions for exploring them. I have seen my share of garbage xmlfilecontent_objects, though. I could see some joker wanting to search a YAML file for values containing regular expressions like
@evgenyz Oops, never mind about my edge case: as you said, it's a map reference, not an array value reference, so that makes sense. Nevertheless, that wasn't very explicit in that document (I missed it, anyhow).
So here is our subset proposal: https://github.com/evgenyz/libyaml-yamlpath-filter/wiki/YAML-Path-(OVAL-subset). What is different from the original YAML Path specification:
The spec is a work in progress. I'm going to add more examples (including cases someone else might bring up here) and highlight the differences in the document after I finish checking the details of the Python implementation.
Good discussion here. And good job @evgenyz on the spec and initial implementation! Specifying yaml-path is a huge achievement. This effort outgrows the other OVAL proposals here in scope. On top of that, I believe that yaml-path may grow in usefulness outside OVAL in the years to come. That being said, yaml-path is very valuable, but not yet mature enough to be widely accepted. Governance-wise, this may pose some challenges for us (the OVAL community). The easiest way to approach these challenges may be to donate the code to the OVAL-Community GitHub account under a permissive license, so that the review process is done with the same level of diligence that is applied to the other OVAL proposals. WDYT?
It is not totally clear whether @isimluk meant the actual C code or the YAMLPath specification (which would serve as part of the test specification) when he referred to "code".
Good enough for me. Thank you!
I've updated the issue and this PR with actual links to the reference implementation and to the changes to the schemas we made during the implementation. Feedback is very welcome.
Thanks for putting this information together. I have been in back-to-back conferences, but I am hoping to get a chance to review this early next week and post feedback.
Two questions about the specification link you provided, @evgenyz:
Is there a declarative specification for anchors anywhere (as opposed to the by-example documentation, which is all I have seen)? I ask because nothing tells me whether the following is legal or illegal:
Implying: I don't personally work with YAML very much, so I have no intuition for these conventions. Thanks!
Sorry for the late response; I was on vacation and missed the notification from GH. To answer your questions:
Node Anchors in specs: https://yaml.org/spec/1.2/spec.html#id2785586. It is a legal example, anchor
Validator/parser/converter: https://yaml-online-parser.appspot.com/ (a handy tool for when you are in doubt about a YAML document).
@solind David, is there something that we can do to move this proposal forward?
BTW, the reference implementation has landed in OpenSCAP v1.3.3 (soon to be available in Fedora; other distros will pick it up at their own pace).
My anchor example (from my question) doesn't actually seem to work in the yaml-online-parser. This highlights my only concern, which is that we need to be able to say "here is a specification you can reference," so that implementors can be assured of the ability to develop interoperating implementations. We could say that OpenSCAP's implementation is the reference implementation. But libyaml + yaml_filter is on the order of 20k lines of code. That's a lot of code to have to re-implement in the absence of a decent specification. It may delay adoption.
How exactly does it not work there?
{
"foo": [
{
"arr": [
1,
2,
3,
4
],
"bar": true
},
{
"other_arr": 2,
"baz": false
}
]
}
The "other_arr" in the second element of "foo" has the integer value 2, exactly what you were referring to by the
foo:
- bar: True
arr: &fizz [1, 2, 3, 4]
- baz: False
  other_arr: *fizz
But both examples are absolutely correct!
The YAML specification is stable, mature and widely recognized. There is a plethora (the list is under Projects) of YAML parser implementations in many languages. An implementer is free to use any of them, including the MIT-licensed, cross-platform libyaml library, written in C. The only novelty here is the YAML addressing mechanism (yaml-filter, as we call it), which is a subset of a cross between YAML Path and JSONPath with intentionally very limited (but, in our opinion, sufficient) capabilities, in fewer than 700 lines of C (sic!) code whose only dependency is libc. Implementing it in Java or Python would take maybe 200 LOC or less; it is really, really simple.
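To give a feel for how small such an addressing mechanism can be, here is a Python sketch of a tiny JSONPath-like subset (`$`, `.key`, `[index]`, `[*]`). The grammar is an assumption chosen for illustration and is not the yaml-filter grammar; the function and pattern names are hypothetical. It runs over the JSON form of the example document above, using only the standard library because Python has no built-in YAML parser.

```python
import json
import re

# One step of a tiny, hypothetical path subset: ".key", "[index]" or "[*]".
STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+|\*)\]")


def select(doc, path):
    """Return all nodes matched by `path`; a sketch, not yaml-filter's grammar."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes, pos = [doc], 1
    while pos < len(path):
        step = STEP.match(path, pos)
        if step is None:
            raise ValueError(f"unexpected syntax at offset {pos}: {path[pos:]!r}")
        key, index = step.group(1), step.group(2)
        matched = []
        for node in nodes:
            if key is not None and isinstance(node, dict) and key in node:
                matched.append(node[key])
            elif index is not None and isinstance(node, list):
                if index == "*":
                    matched.extend(node)          # wildcard: every element
                elif int(index) < len(node):
                    matched.append(node[int(index)])
        nodes, pos = matched, step.end()
    return nodes


doc = json.loads("""
{"foo": [{"arr": [1, 2, 3, 4], "bar": true},
         {"other_arr": 2, "baz": false}]}
""")
print(select(doc, "$.foo[1].other_arr"))  # [2]
print(select(doc, "$.foo[*].bar"))        # [True]
print(select(doc, "$.foo[0].arr[2]"))     # [3]
```

Returning a list of matches, rather than a single node, keeps wildcard steps and missing keys uniform: a path that matches nothing simply yields an empty list.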
To be perfectly clear, your implication that the reference
foo:
- bar: True
arr: [1, &fizz [1, 2, 3, 4], 3, 4]
- baz: False
  other_arr: *fizz
Which is probably not what you initially had in mind. But, anyway, this is also a valid YAML example.
With respect, @evgenyz, if you look closer at my example/question, you'll see I was really asking whether an anchor could point to a list element, and if so, whether that element would become the head of a list. When you answered that it was a legal example, I (incorrectly) assumed you meant that as an answer to my (follow-on) question. Surely the YAML path mechanism and the C implementation in OpenSCAP are simple enough, provided that, in whatever language an implementor chooses, there happens to be a YAML parser with an event model similar enough to libyaml's to make the re-implementation simple. I am at this very moment evaluating snakeyaml in Java for this purpose, but implementors in Go or C# may not be so fortunate. I would like to know whether anyone else in our community thinks referencing the C implementation with the libyaml dependency will suffice in place of a specification. I have already agreed that we could.
Okay, let's take my proposal from a few weeks ago (which you didn't like), in which our YAML-path is used to select a whole node, which we simply select as a text block:
We could also add a new textvariablecontent_test/object/state. This would be exactly like the textfilecontent54_test, except that instead of referencing a file, it references a variable, e.g.:
This would be a potentially useful test anyway. But, if the two are then used together, it allows us to leverage the simplicity of the textfilecontent_test, and narrow down what we're doing with YAML-path to make it a pure node selection mechanism. Wouldn't this combination make it possible to implement virtually any needed test against a YAML configuration file? It is not necessarily an elegant solution, but I think it is a relatively simple one for us to specify.
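The proposed textvariablecontent object could behave roughly like the following Python sketch: the same pattern-and-instance idea as textfilecontent54, applied to each value of a variable instead of to the contents of a file. The `TextItem` shape, the function name, and the sample YAML text are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical item shape, loosely modelled on textfilecontent54 items."""
    instance: int
    text: str
    subexpressions: list


def text_variable_content(values, pattern, multiline=True):
    """Apply a regular expression to each string value of a variable.

    Every match becomes one item; instances are numbered from 1 across
    all values, in order.
    """
    flags = re.MULTILINE if multiline else 0
    items = []
    instance = 1
    for value in values:
        for match in re.finditer(pattern, value, flags):
            items.append(TextItem(instance, match.group(0), list(match.groups())))
            instance += 1
    return items


# e.g. a whole YAML node selected as a text block by a yamlfilecontent_object
node_text = "tls:\n  enabled: true\n  min_version: 1.2\n"
for item in text_variable_content([node_text], r"^\s*min_version:\s*(\S+)$"):
    print(item.instance, item.subexpressions)  # 1 ['1.2']
```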
Unfortunately, I don't believe it's possible, as it breaks the isolation between the platform schemas.
Let me revise this... actually, I am not sure. Perhaps it could work... I'll have to think about it.
So, did you come to any conclusion? As far as I can see, there is nothing that would prevent adding the types we need to the independent schema (keeping in mind that the test would be added there as well, and nothing else would use these types for now), except perhaps for some ugliness in the approach. But we are making trade-offs here; I would prefer an ugly schema over ugly content.
Hi @evgenyz, I am sorry, I did not, but thank you for reminding me about this! On the one hand, there are certainly instances of simple datatypes that exist in the platform schemas only. But in those cases, restrictions are imposed on "string" datatypes. In this case, we will want to remove a restriction from the record datatype. Is it possible for a subclass to remove a restriction? I do not know! There is also the problem that we will be restricted from modifying the ComplexDatatypeEnumeration (as it's in a common schema), so technically these must still be records. The benefit is that we can leverage the language's existing tooling around the record datatype. If you put together a proposed schema modification, I will test it in our product and see if it can work, or if there's something I don't know that will get in the way! If it's possible in XML/XSD, it should also be possible in Java/JAXB.
I see that you are exploring the possibility of referencing the core schema. But wouldn't it be possible to copy-paste the datatype definition from the core schema (except for the case limitation that we want to get rid of) and simply redefine it in the independent platform schema?
So, here is what we are dealing with (oval-definitions-schema.xsd):
<xsd:complexType name="EntityStateRecordType">
<xsd:annotation>
<xsd:documentation>The EntityStateRecordType defines an entity that consists of a number of uniquely named fields. This structure is used for representing a record from a database query and other similar structures where multiple related fields must be collected at once. Note that for all entities of this type, the only allowed datatype is 'record' and the only allowed operation is 'equals'. During analysis of a system characteristics item, each field is analyzed and then the overall result for elements of this type is computed by logically anding the results for each field and then applying the entity_check attribute.</xsd:documentation>
<xsd:documentation>Note the datatype attribute must be set to 'record'.</xsd:documentation>
<!--
NOTE: The restriction that the only allowed datatype is 'record' is enforced by scheamtron rules placed on each entity that uses this type.
This is due to the fact that this type is developed as an extension of the oval-def:EntityStateComplexBaseType. This base type declares a datatype attribute. to restrict the
datatype attribute to only allow 'record' would need a restriction. We cannot do both and xsd:extension and an xsd:restriction at the same time.
-->
<xsd:documentation>Note the operation attribute must be set to 'equals'.</xsd:documentation>
<xsd:documentation>Note the var_ref attribute is not permitted and the var_check attribute does not apply.</xsd:documentation>
<xsd:documentation>Note that when the mask attribute is set to 'true', all child field elements must be masked regardless of the child field's mask attribute value.</xsd:documentation>
</xsd:annotation>
<xsd:complexContent>
<xsd:extension base="oval-def:EntityStateComplexBaseType">
<xsd:sequence>
<xsd:element name="field" type="oval-def:EntityStateFieldType" minOccurs="0"
maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="EntityStateFieldType">
<xsd:annotation>
<xsd:documentation>The EntityStateFieldType defines an element with simple content that represents a named field in a record that may contain any number of named fields. The EntityStateFieldType is much like all other entities with one significant difference, the EntityStateFieldType has a name attribute</xsd:documentation>
<xsd:documentation>The required name attribute specifies a unique name for the field. Field names are lowercase and must be unique within a given parent record element. When analyzing system characteristics an error should be reported for the result of a field that is present in the OVAL State, but not found in the system characteristics Item.</xsd:documentation>
<xsd:documentation>The optional entity_check attribute specifies how to handle multiple record fields with the same name in the OVAL Systems Characteristics file. For example, while collecting group information where one field is the represents the users that are members of the group. It is very likely that there will be multiple fields with a name of 'user' associated with the group. If the OVAL State defines the value of the field with name equal 'user' to equal 'Fred', then the entity_check attribute determines if all values for field entities must be equal to 'Fred', or at least one value must be equal to 'Fred', etc.</xsd:documentation>
<xsd:documentation>Note that when the mask attribute is set to 'true' on a field's parent element the field must be masked regardless of the field's mask attribute value.</xsd:documentation>
</xsd:annotation>
<xsd:simpleContent>
<xsd:extension base="xsd:anySimpleType">
<xsd:attribute name="name" use="required">
<xsd:annotation>
<xsd:documentation>A string.</xsd:documentation>
</xsd:annotation>
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:pattern value="[^A-Z]+"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attributeGroup ref="oval-def:EntityAttributeGroup"/>
<xsd:attribute name="entity_check" type="oval:CheckEnumeration" use="optional"
default="all"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
The
<xsd:simpleType name="ComplexDatatypeEnumeration">
<xsd:annotation>
<xsd:documentation>The ComplexDatatypeEnumeration simple type defines the complex legal datatypes that are supported in OVAL. These datatype describe the values of individual entities where the entity has some complex structure beyond simple string like content.</xsd:documentation>
</xsd:annotation>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="record">
<xsd:annotation>
<xsd:documentation>The record datatype describes an entity with structured set of named fields and values as its content. The only allowed operation within OVAL for record values is 'equals'. Note that the record datatype is not currently allowed when using variables.</xsd:documentation>
</xsd:annotation>
</xsd:enumeration>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="DatatypeEnumeration">
<xsd:annotation>
<xsd:documentation>The DatatypeEnumeration simple type defines the legal datatypes that are used to describe the values of individual entities. A value should be interpreted according to the specified type. This is most important during comparisons. For example, is '21' less than '123'? will evaluate to true if the datatypes are 'int', but will evaluate to 'false' if the datatypes are 'string'. Another example is applying the 'equal' operation to '1.0.0.0' and '1.0'. With datatype 'string' they are not equal, with datatype 'version' they are.</xsd:documentation>
</xsd:annotation>
<xsd:union memberTypes="oval:SimpleDatatypeEnumeration oval:ComplexDatatypeEnumeration"/>
</xsd:simpleType>
Subclassing
<state>
<value datatype="record">
<case_sensitive_field type="ind-def:EntityStateCaseSensitiveFieldType" name="" key="SomeKey" datatype="boolean">True</case_sensitive_field>
</value>
</state>
And I'm not even sure if this is a 100% legit construct. IMHO this is less awkward:
<state>
<value datatype="record">
<field name="^some^key" datatype="boolean">True</field>
</value>
</state>
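The evaluation rules spelled out in the `EntityStateRecordType` and `EntityStateFieldType` documentation quoted above can be sketched as follows: item fields are grouped by name, each state field is checked against every item field with that name under its `entity_check`, and the per-field results are AND-ed. This is a simplified Python sketch with hypothetical function names, booleans, and plain string equality; real OVAL evaluation also produces `error` and `unknown` results and applies the record's own `entity_check`, which are not modelled here.

```python
from collections import defaultdict


def check(results, entity_check):
    """Combine per-value results according to an OVAL CheckEnumeration value."""
    hits = sum(results)
    if entity_check == "all":
        return hits == len(results)
    if entity_check == "at least one":
        return hits >= 1
    if entity_check == "only one":
        return hits == 1
    if entity_check == "none satisfy":
        return hits == 0
    raise ValueError(f"unsupported entity_check: {entity_check}")


def evaluate_record(state_fields, item_fields):
    """Simplified evaluation of a record state against a record item.

    state_fields: list of (name, expected_value, entity_check)
    item_fields:  list of (name, value); names may repeat
    """
    collected = defaultdict(list)
    for name, value in item_fields:
        collected[name].append(value)
    for name, expected, entity_check in state_fields:
        if name not in collected:
            # The schema says an error should be reported in this case.
            raise LookupError(f"field {name!r} not found in item")
        results = [value == expected for value in collected[name]]
        if not check(results, entity_check):
            return False   # fields are logically AND-ed
    return True


item = [("user", "Fred"), ("user", "Alice"), ("gid", "100")]
print(evaluate_record([("user", "Fred", "at least one")], item))  # True
print(evaluate_record([("user", "Fred", "all")], item))           # False
```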
We have another improvement for this test, but it does not seem to be very much in the spirit of OVAL. We are thinking about adding
Example:
yaml: "{foo: bar}"
OVALs:
<ind:yamlfilecontent_test>
<ind:object object_ref="object_2" />
<ind:state state_ref="state_2" />
</ind:yamlfilecontent_test>
<ind:yamlfilecontent_object id="object_1">
<ind:file>example.yaml</ind:file> <!-- yaml: "{foo: bar}" -->
<ind:yamlpath>$.yaml</ind:yamlpath>
</ind:yamlfilecontent_object>
<ind:yamlfilecontent_object id="object_2">
  <ind:content var_ref="var_1"/>
<ind:yamlpath>$.foo</ind:yamlpath>
</ind:yamlfilecontent_object>
<ind:yamlfilecontent_state id="state_2">
<ind:value datatype="record">
    <ind:field name="#" datatype="string" operation="equals">bar</ind:field>
</ind:value>
</ind:yamlfilecontent_state>
<local_variable id="var_1" datatype="string">
<object_component object_ref="object_1" item_field="value" record_field="#"/>
</local_variable>
@solind Also, since we think that we have (apart from the improvement above) an implementation more or less sufficient for general use, I would like to ask you to summarize what we are missing for this proposal to gain some traction toward standardization. The test even had some baptism by fire in the Compliance as Code project (in OCP4 data streams, where it is used pretty
Hi @evgenyz, I actually like the idea of variable content, and I think it could be similarly useful if we added an analogous construct to the textfilecontent54_test. However, should we standardize on any safeguards to ensure that the YAML content doesn't become very large? Or is that not really a concern in your experience? I still have implementing YAMLPath in Java on my "to do" list, but I don't know when I'll get around to it. Hopefully I will have time to do it soon after our upcoming release. Two qualifying implementations are needed for a proposal to move from "develop" into "stable" in OVAL. However, given the scope, I don't know if I can identify offhand what might be missing until I actually attempt to implement it, so we may need a second implementation to finish hashing it out. If I were to make our implementation an open-source BSD-licensed or Apache-licensed project, would you be willing to help test it and compare it with the OSCAP implementation?
Hey!
Well, so far, implementing https://github.com/OpenSCAP/yaml-filter/wiki/YAML-Path-Definition does not require loading the whole document into memory. The probe could work with very large documents, given that there is no path like
Yeah, sure.
I meant, specifically, keeping the OVAL variables from becoming too large. File size isn't the issue, but we don't want megabytes of text encapsulated in variable values.
As far as I understand, right now nothing in OVAL forbids one from capturing a one-line file of, say, 1 GB into a variable using
So, if
You're right, I suppose it's already a possible problem! Perhaps instead of having a fixed value defined in the specification, we could add a "maxsize" attribute to the local_variable, and define how to handle situations where the maxsize is exceeded (e.g., generate an error? truncate?).
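One way the proposed "maxsize" attribute could behave, sketched in Python. The attribute itself, the `on_exceed` policy names, and measuring size in characters rather than bytes are all hypothetical; the sketch simply encodes the two options floated above (error or truncate).

```python
def apply_maxsize(value, maxsize, on_exceed="error"):
    """Enforce a hypothetical local_variable 'maxsize' (measured in characters).

    on_exceed='error' raises, standing in for an OVAL error result;
    on_exceed='truncate' keeps only the first `maxsize` characters.
    """
    if maxsize is None or len(value) <= maxsize:
        return value
    if on_exceed == "truncate":
        return value[:maxsize]
    if on_exceed == "error":
        raise ValueError(
            f"variable value has {len(value)} characters, exceeds maxsize={maxsize}")
    raise ValueError(f"unknown on_exceed policy: {on_exceed!r}")


print(apply_maxsize("short", 10))                        # short
print(apply_maxsize("x" * 20, 8, on_exceed="truncate"))  # xxxxxxxx
```

Truncation keeps evaluation going but silently changes the compared value, whereas an error makes the limit visible in the results; that trade-off is what the specification would have to settle.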
Of course... that doesn't solve the problem of the OVAL item having a 1GB sub-element.
OpenSCAP's implementation of the latest addition (
Just to add my 2 cents, I am in favor of the @solind if you're working on something that would have an open-source license to it for evaluating YAMLPath, I can certainly try to contribute as well as test things out. Let me know how I might help.
Meanwhile, you can play with OpenSCAP's implementation of it and maybe give us all some feedback or improvement proposals: https://github.com/OpenSCAP/yaml-filter. It even has a binary similar to
From Area Supervisors meeting: Perhaps make a note in the schema documentation that the
Hi gang, sorry, I had one other question in my notes that I forgot to ask about. One of the stipulations in this test indicated the fact that record
The change would be backward-compatible. And it's not as if the restriction makes a lot of sense (there are multiple ways to break LDAP even without A...Z). But it would still be a change in the spec, so the decision here is administrative in nature.
Well, Anyhow,
@evgenyz thanks for pointing out the current description of the
PR has been merged; resolving this issue.
Abstract
Nowadays a lot of applications store configuration in YAML or JSON formats. It would be helpful for security content authors to have a straightforward test for elements of these configurations, like the one for XML-formatted configuration files, xmlfilecontent. Because of the nature of YAML documents (they are a kind of database), certain parts of the proposed behavior are also similar to the ldap57 and sql57 tests.
Link to Proposal
#90
Additional Context
The proposal concentrates on the YAML format (as JSON is a subset of YAML), featuring a YAML Path/JSONPath subset as the addressing mechanism.
Reference Implementation
YAML Path/JSONPath subset (for libyaml): https://github.com/OpenSCAP/yaml-filter.
Probe (yamlfilecontent): https://github.com/OpenSCAP/openscap (branch: maint-1.3); an updated version will be available in v1.3.4.
Examples: https://github.com/OpenSCAP/openscap/tree/maint-1.3/tests/probes/yamlfilecontent
Example of YAML configuration:
and test:
Due to the limitations imposed on the record entity and its child field entities (the name attribute must not contain A-Z and should not be empty), the test has some workarounds, which could be dropped in a newer version of the core schema (and the test):
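The field-name conventions visible in the OVAL examples in the discussion (`name="#"` for a value addressed without a key, and `^some^key` for the key `SomeKey`) can be sketched in Python as below. This is an inference from those examples, not OpenSCAP's code; the helper name is hypothetical.

```python
import re

# XSD patterns are implicitly anchored, so "[^A-Z]+" must cover the whole
# name: at least one character and no uppercase ASCII letter.
FIELD_NAME = re.compile(r"[^A-Z]+")


def encode_field_name(key):
    """Turn a YAML map key into a record field name (hypothetical helper).

    Inferred convention: a value without a key gets the name '#', and each
    uppercase letter becomes '^' followed by its lowercase form.
    """
    if key is None or key == "":
        return "#"
    return "".join("^" + ch.lower() if "A" <= ch <= "Z" else ch for ch in key)


for key in ["SomeKey", "port", None]:
    name = encode_field_name(key)
    assert FIELD_NAME.fullmatch(name), name   # satisfies the schema's pattern
    print(repr(key), "->", repr(name))
```

Keys that already contain `^`, or that are literally `#`, would collide under such a scheme, which is one reason relaxing the restriction in the core schema would be cleaner.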
Inline or Var-Referenced YAML Documents
There is a way to use YAML documents captured from other objects in the system or just hard-coded to the test:
This could help with "Russian doll" style YAML files, where other YAML documents are stored as strings.
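A Python sketch of the two-step flow for such nested documents: first select the string that holds the embedded document, then parse that string as a document in its own right and query it. In OVAL terms, the first step would correspond to one yamlfilecontent_object whose result is passed through a local_variable, and the second to an object whose content is var-referenced. JSON (which YAML 1.2 parsers essentially accept) stands in for YAML because Python's standard library has no YAML parser; the helper name and the sample document are hypothetical.

```python
import json


def select_key_path(doc, *keys):
    """Walk nested maps by key (a simplified stand-in for a YAML Path query)."""
    node = doc
    for key in keys:
        node = node[key]
    return node


# Outer document: one value is itself a serialized document ("Russian doll").
outer_text = '{"name": "app", "config": "{\\"tls\\": {\\"enabled\\": true}}"}'

outer = json.loads(outer_text)
inner_text = select_key_path(outer, "config")     # step 1: capture the string
assert isinstance(inner_text, str)
inner = json.loads(inner_text)                    # step 2: parse it as a document
print(select_key_path(inner, "tls", "enabled"))   # True
```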
Type Handling
As already mentioned, this test is pretty similar to xmlfilecontent except for one thing: type handling. There are no types in XML files; everything is a string. In contrast, YAML (and JSON) documents come with the concept of explicitly and implicitly typed data.
The reference implementation uses the YAML 1.2 Core Schema to infer the types of scalars in the document. It then matches the type of the scalar against the OVAL type provided in yamlfilecontent_state and, if the types are equal, compares the values according to that type's comparison rules. If the types are different, the value is considered not matching; no type coercion of any kind is performed.
Reasoning: #91 (comment)
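A minimal Python sketch of the type-handling rule described above, assuming the plain-scalar resolution patterns of the YAML 1.2 Core Schema and OVAL's `boolean`, `int`, `float`, and `string` datatypes. It handles only untagged, unquoted scalars (quoted scalars are always strings). How `null` should map to an OVAL datatype is not stated here, so the sketch gives it its own `null` type that never matches; function names are hypothetical and this is not the reference implementation's code.

```python
import re

# Plain-scalar patterns from the YAML 1.2 Core Schema, checked in this order.
CORE_SCHEMA = [
    ("null", re.compile(r"null|Null|NULL|~|")),
    ("boolean", re.compile(r"true|True|TRUE|false|False|FALSE")),
    ("int", re.compile(r"[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+")),
    ("float", re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"
                         r"|[-+]?(\.inf|\.Inf|\.INF)|\.nan|\.NaN|\.NAN")),
]


def resolve(plain):
    """Infer (type, value) for an untagged, unquoted scalar."""
    for type_name, pattern in CORE_SCHEMA:
        if pattern.fullmatch(plain):
            break
    else:
        return "string", plain          # no rule matched: it is a string
    if type_name == "null":
        return "null", None
    if type_name == "boolean":
        return "boolean", plain.lower() == "true"
    if type_name == "int":
        if plain.startswith("0o"):
            return "int", int(plain[2:], 8)
        if plain.startswith("0x"):
            return "int", int(plain[2:], 16)
        return "int", int(plain)
    lowered = plain.lower()
    if lowered.lstrip("+-") in (".inf", ".nan"):
        lowered = lowered.replace(".", "", 1)   # '.inf' -> 'inf' for float()
    return "float", float(lowered)


def scalar_matches(plain, oval_datatype, expected):
    """Compare only if the inferred type equals the OVAL datatype; never coerce."""
    type_name, value = resolve(plain)
    if type_name != oval_datatype:
        return False
    if oval_datatype == "boolean":
        return value == (expected.lower() in ("true", "1"))
    if oval_datatype == "int":
        return value == int(expected)
    if oval_datatype == "float":
        return value == float(expected)
    return value == expected


print(resolve("0x1F"))                            # ('int', 31)
print(scalar_matches("True", "boolean", "true"))  # True
print(scalar_matches("8080", "string", "8080"))   # False: the scalar is an int
print(scalar_matches("yes", "boolean", "true"))   # False: 'yes' is a string in YAML 1.2
```

The last two lines show the practical effect of refusing coercion: a state that declares `datatype="string"` will not match a port number written unquoted, and YAML 1.1 spellings such as `yes` are not booleans under the Core Schema.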