Discussion of the new XML processing feature #3178
Comments
XML parser
Does ModSecurity use the libxml2 SAX parser? When I read
Wild cards
One very frustrating thing at the moment is that JSON targets can't be matched using wild cards. For example, if I wanted to match / exclude
JSON
While the argument names in libmodsecurity3 are much better than in v2, the numbering of list elements is a potential problem when you want to match any element in a potentially large list. A rule would need to exclude the complete array or enumerate every entry in the list, which isn't possible. This will always give attackers a way to bypass such a rule by simply placing the payload at the first index that isn't being inspected.
General
I suggest focusing on one feature at a time. For me, that would currently be giving the user the full ability to inspect XML contents and nodes. Other features can be added later on. IMO, one really good feature is often much more valuable than many of mediocre quality. |
mod_security2 uses https://gitlab.gnome.org/GNOME/libxml2/ |
In the long term, it seems logical to have the same behaviour in v2 & v3, and also with XML & JSON. |
Remark about v2 JSON parsing: you also have arrays appearing in the ARGS names, but without index, like "json.b.array.a1". |
About limiting the parsing: this could be a gain in the case of huge payloads. |
About Wildcards in JSON: regex in targets is definitely a must |
Another feature asked by many people is the possibility to parse a JSON from a custom variable (like the value of a cookie, maybe after base64-decode - yes, everybody thinks about JWT). |
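To make the JWT case concrete, here is a minimal sketch of what "parse a JSON from a custom variable after base64-decode" would have to do. It is plain Python, not ModSecurity code; the function name `jwt_payload_args` and the `json.` key prefix are assumptions for illustration, and no signature verification is performed.

```python
import base64
import json


def jwt_payload_args(token: str) -> dict:
    """Decode the payload segment of a JWT-like token into flat key:value pairs.

    Hypothetical helper: it only decodes, it does not verify the signature.
    """
    # A JWT has three dot-separated segments: header.payload.signature
    payload_b64 = token.split(".")[1]
    # base64url encoding usually omits padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    # Prefix the top-level keys the way JSON body arguments are prefixed
    return {f"json.{key}": value for key, value in payload.items()}
```

Only top-level keys are handled here; nested objects would need the same flattening that the JSON body processor applies.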
|
Yes. Both versions use libxml2.
Sorry, but I don't understand this. Where is the
Yes - the question here is what ModSecurity might want to inspect. E.g., are the tags' arguments necessary?
Why? This works for me:
the request:
then I see in the log:
but I don't see the args with name Maybe this is something you need.
Ah, I think I see what you mean. We need to figure out the correct handling.
By content, do you mean the whole raw XML content? The current demand is for the XML's key:value pairs, especially to generate exclusions. (Regarding nodes.)
Sorry, which "other features" do you have in mind? |
I don't know how many users expect it - it is a feature request received in private, and it is quite urgent. They use the mainline mod_security2, so we can't provide it as a "custom feature", because that would lead to conflicts later. But yes, your question is legitimate - this is not the top priority, but given the circumstances, we can't avoid it. |
I meant to write |
It works with |
Okay, pro: security reason, con: can't reference an exact item. This is why I started this discussion, to collect pros and cons for each functionality. |
Ah, I see - thanks. |
Not raw necessarily. I just think that being able to inspect values and nodes properly has to be the most important thing. Raw sounds expensive, maybe that's not necessary. |
You wrote additional validation, for example. |
Thanks. But why does reading this keyword make you think that ModSecurity does not use libxml? |
SAX, not libxml. |
libxml2 has a SAX parser, so XML can be streamed. For streams, a depth limit doesn't make much sense (I think). |
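Whether a depth limit is useful for a streaming parser is the open question here. Mechanically, though, a SAX handler can track nesting with a single counter. The sketch below uses Python's `xml.sax` (expat-based) as a stand-in for libxml2's SAX interface; the names `DepthLimitHandler` and `within_depth_limit` are hypothetical.

```python
import xml.sax


class DepthLimitHandler(xml.sax.ContentHandler):
    """Abort parsing as soon as element nesting exceeds max_depth."""

    def __init__(self, max_depth: int):
        super().__init__()
        self.max_depth = max_depth
        self.depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth > self.max_depth:
            # Raising from a callback stops the stream immediately
            raise xml.sax.SAXException(
                f"depth {self.depth} exceeds limit {self.max_depth}"
            )

    def endElement(self, name):
        self.depth -= 1


def within_depth_limit(xml_bytes: bytes, max_depth: int) -> bool:
    """Return False if the document is too deep (or not well-formed)."""
    try:
        xml.sax.parseString(xml_bytes, DepthLimitHandler(max_depth))
    except xml.sax.SAXException:
        return False
    return True
```

Because the handler raises on the first element that is too deep, the rest of the stream is never processed.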
Raw appears in But for your idea: could you write an example? |
This seems to me like a new transformation - maybe we should open a new discussion for that. |
Tree based XML parser:
Stream based XML parser (SAX):
Examples of what access could look like
Access any node below a given level with a certain name:
Access any attribute below a given level with a certain name:
Access to prelude:
Access to trailer:
In short, raw access is probably a good idea, but there should be other options, as applying a regular expression to a huge string is never a good idea. Usually, the scope can be reduced significantly, provided that it's possible to access everything. |
Why would you want to check the second item only, knowing that I could evade your check by switching both items? |
I don't want to check only the second item - I would expect the engine to check all items in the list. |
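The bypass concern can be illustrated with a small sketch. It is plain Python rather than SecLang, and the indexed argument names (`array_0`, `array_1`, ...) are an assumed naming scheme used only for illustration. A target that names one index misses a payload at another index, while a pattern over the names covers every element.

```python
import re

# Hypothetical flattened arguments with numbered list elements
args = {
    "json.b.array_0.a1": "a1val",
    "json.b.array_1.a1": "a2val",
    "json.b.array_2.a1": "<script>",
}


def select(arguments: dict, name_pattern: str) -> dict:
    """Return the arguments whose full name matches the given regex."""
    rx = re.compile(name_pattern)
    return {k: v for k, v in arguments.items() if rx.fullmatch(k)}


# Exact target: only the first element is inspected
exact = select(args, r"json\.b\.array_0\.a1")
# Pattern target: every element of the list is inspected
wild = select(args, r"json\.b\.array_\d+\.a1")
```

Here `exact` never sees the value stored at index 2, whereas `wild` contains all three entries.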
In libxml2 these are called the DOM parser and the SAX parser. Maybe we should compare the efficiency and performance of both parsers.
Ah, I see, so with this prefixing we can separate the nodes' key:value list, and we can also access attributes. Thanks for the idea, we should take a look.
Right. But first, as you wrote, we should start with one function, then we can continue with the others. |
I made a small example which could be the basis of a future XML parser. It uses the newest version of libxml2's SAX parser (v2 - the old methods and structures are deprecated). Please take a look at it: https://github.com/airween/saxparser_example You should try it with other XML documents, and investigate it with Valgrind or other memory checkers. Any feedback is welcome. |
Unfortunately there wasn't any comment, so here is my plan:
Possible feature:
|
So the syntax without index is better. In v2, json.b.array.a1 will match all keys named "a1" in the array. |
I can't decide if it is better or not, but the fact is that we can't solve indexing with SAX. |
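For readers following along, here is a rough sketch of index-free flattening with a stream parser. It uses Python's `xml.sax` instead of libxml2, and the `xml.` prefix, the dotted path format and the names `FlattenHandler` / `flatten_xml` are assumptions, not the engine's behaviour. Repeated nodes simply collect under the same key.

```python
import xml.sax
from collections import defaultdict


class FlattenHandler(xml.sax.ContentHandler):
    """Collect leaf text under dotted element paths, without list indexes."""

    def __init__(self):
        super().__init__()
        self.path = ["xml"]            # assumed prefix for generated names
        self.text = []                 # character data of the current element
        self.args = defaultdict(list)  # path -> list of values

    def startElement(self, name, attrs):
        self.path.append(name)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if value:
            self.args[".".join(self.path)].append(value)
        self.text = []
        self.path.pop()


def flatten_xml(xml_bytes: bytes) -> dict:
    handler = FlattenHandler()
    xml.sax.parseString(xml_bytes, handler)
    return dict(handler.args)
```

This is a simplification: attributes are ignored, and in mixed content only the text that follows the last child element is kept.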
Don't you think it's possible to abuse Could SecParseXMLintoArgs take a regular expression to somehow filter what would end up in the |
The default
It could, but I think that would be a risk: if the rule writer controls which arguments are processed, a bypass becomes more likely. Anyway, if there is demand later, we can add this feature. |
I propose to have this discussion in a separate thread and to extend it to all ARGS. There are very good arguments about white-listing/black-listing ARGS parsing, even for other encodings. This should probably be discussed with other SecLang projects. |
A new idea came up during development, which does not modify the original plan but extends it. Consider you want to parse the XML nodes into With the original concept you have to create an exclusion to turn off It would be easy to add a third option to
Please share your ideas. |
The Assuming that a user has CRS (or some other rule set) deployed then if they're using the With
|
I like this proposal too. Here is another thing, I am not sure we thought about. With XML you can have args in elements, but you can also have them in attributes. See https://www.w3schools.com/xml/xml_attributes.asp I think CRS covers both with the |
@theseion already hinted at it here.
So using |
Describe the bug
It's not a bug but a discussion about a new feature: how we can extend the XML processing.
There is a feature request from a customer that we should extend the engines' XML parsing capability. Of course, we should add this feature to both engines with the same behavior.
Current behavior
Consider this payload:
In their current state, the engines show this payload as:
(mod_security2)
(libmodsecurity3)
(lines from debug.logs)
Problem
The problem is that exclusions for sub-parts and specific nodes do not work. See the example:
because the XML variable holds the concatenated node values, not key:value pairs like JSON. Therefore it's impossible to create any exclusion against any rules.
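The difference can be shown outside the engine with a short sketch. The payload below is hypothetical (it is not the one from this issue), and the `xml.` key prefix is an assumption. Once the node values are concatenated, the node names are gone, so nothing is left to name in an exclusion.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload, for illustration only
payload = "<root><name>alice</name><comment>select</comment></root>"
root = ET.fromstring(payload)

# Concatenated node values: one string, node names are lost
concatenated = "".join(root.itertext())

# Key:value pairs: each value can be addressed (and excluded) by name
pairs = {f"xml.{root.tag}.{child.tag}": child.text for child in root}
```

With `pairs`, a rule exclusion could target `xml.root.comment` alone; with `concatenated`, the only choice is to exclude everything.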
Possible solution
Consider this converted structure (XML to JSON):
This payload will be expanded like this:
(mod_security2)
(libmodsecurity3)
The idea is to transform the XML structure in a similar way.
Example:
(libmodsecurity3)
Possible risks
How can we avoid/handle the risks?
We can put the decision in the hands of the user, whether he wants to see the new collection under the ARGS or not - so introduce a new configuration keyword, e.g. SecParseXMLintoArgs (consider the optional runtime config, e.g. ctl:parseXMLintoArgs)
As in case of JSON, introduce a new configuration keyword which controls the maximum number of XML levels that can be analyzed, e.g. SecRequestBodyXMLDepthLimit (see SecRequestBodyJSONDepthLimit)
More todo's
We have to:
For the last item: the behavior of JSON parsing differs between the two versions. Consider the payload
{"a":1,"b":[{"a1":"a1val"},{"a1":"a2val"}]}
(note that there is a list!) which is equivalent to this XML:
which produces these results:
(mod_security2)
(libmodsecurity3)
Note the list items with the same keys! I think we should follow libmodsecurity3's behavior - but then the XML and JSON won't be compatible. (Which implies the next question: do we want to align mod_security2's behavior?)
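To make the two naming styles easier to compare, here is a small sketch that flattens the JSON payload above both ways. The un-indexed form follows the `json.b.array.a1` name quoted in the comments for v2. The indexed form (`array_0`, `array_1`) is an assumed naming scheme for illustration only, not a statement about what libmodsecurity3 emits.

```python
import json


def flatten(value, prefix: str, indexed: bool, out: list) -> list:
    """Flatten nested JSON into (name, value) pairs."""
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}", indexed, out)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            # Either number each list element or give all of them the same name
            name = f"{prefix}.array_{i}" if indexed else f"{prefix}.array"
            flatten(child, name, indexed, out)
    else:
        out.append((prefix, value))
    return out


doc = json.loads('{"a":1,"b":[{"a1":"a1val"},{"a1":"a2val"}]}')
without_index = flatten(doc, "json", False, [])
with_index = flatten(doc, "json", True, [])
```

Without indexes both list items share one name, so a single target covers the whole list. With indexes each item can be addressed exactly, at the cost of needing a pattern to cover them all.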
Any feedback is welcome!