Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple security issues in ezxml #2119

Closed
sebastic opened this issue Oct 3, 2021 · 19 comments · Fixed by #2170
Closed

Multiple security issues in ezxml #2119

sebastic opened this issue Oct 3, 2021 · 19 comments · Fixed by #2170

Comments

@sebastic
Copy link
Contributor

sebastic commented Oct 3, 2021

As reported by Moritz Muehlenhoff in Debian Bug #989360:

Multiple security issues were found in ezxml, which netcdf bundles:

CVE-2021-31598:
https://sourceforge.net/p/ezxml/bugs/28/

CVE-2021-31348 / CVE-2021-31347:
https://sourceforge.net/p/ezxml/bugs/27/

CVE-2021-31229:
https://sourceforge.net/p/ezxml/bugs/26/

CVE-2021-30485:
https://sourceforge.net/p/ezxml/bugs/25/

CVE-2021-26222:
https://sourceforge.net/p/ezxml/bugs/22/

CVE-2021-26221:
https://sourceforge.net/p/ezxml/bugs/21/

CVE-2021-26220:
https://sourceforge.net/p/ezxml/bugs/23/

CVE-2019-20202:
https://sourceforge.net/p/ezxml/bugs/17

CVE-2019-20201
https://sourceforge.net/p/ezxml/bugs/16

CVE-2019-20200:
https://sourceforge.net/p/ezxml/bugs/19

CVE-2019-20199:
https://sourceforge.net/p/ezxml/bugs/18

CVE-2019-20198:
https://sourceforge.net/p/ezxml/bugs/20

CVE-2019-20007:
https://sourceforge.net/p/ezxml/bugs/13

CVE-2019-20006:
https://sourceforge.net/p/ezxml/bugs/15

CVE-2019-20005:
https://sourceforge.net/p/ezxml/bugs/14

@opoplawski
Copy link
Contributor

It does seem unfortunate that netcdf is using an xml parser that hasn't been updated in over 15 years.

@DennisHeimbigner
Copy link
Collaborator

I found nothing that came close in size -- 1 .c and 1 .h files -- and with a usable license.

@DennisHeimbigner
Copy link
Collaborator

I went thru this list again to see what was new: https://awesomeopensource.com/projects/xml-parser
Possible replacements: xtmlractor, xml

@sebastic
Copy link
Contributor Author

Why not just link to libxml2? Don't include an XML library in your source tree.

@DennisHeimbigner
Copy link
Collaborator

And and yet another burden to our users?
There is a burden when depending on other libraries.
Some are positive since we do not have to maintain them,
Some are negative -- our users have to install it and we have to keep up with API changes.
For something as trivial as an XML parser, I personally do not think the burden of libxml
is worth it; it is overkill for our limited use.

@opoplawski
Copy link
Contributor

opoplawski commented Oct 12, 2021

I would argue that there is nothing particularly simple about parsing XML, particularly in a safe manner. Also, packaging/building libraries is what distributions and sysadmins are meant to do and is not rocket science. It's the 21st century, let's take advantage of it.

@DennisHeimbigner
Copy link
Collaborator

Sorry, and I do not mean to be rude. But my experience is that despite what we wish were true,
the reality is different. Maintaining a system across multiple platforms: at least Linux, BSD, OSX,
Windows, Cygwin, Mingw and various supercomputers is a difficult task. It is unfortunate but true
that not very many libraries work across that many platforms. Libxml might be one of those, I do not know.

My recent experience trying to support the aws-sdk-cpp library across just Linux and Windows is a good example. It is overkill, but I am forced to use it. After much pain, I finally got it to build on a specific version of Ubuntu, but not on earlier versions of Ubuntu. I still do not have it working on Windows. God knows what will happen with other versions of Linux, or OSX.

It would be nice if the C eco-system were as clean as Python or Java, but it is not.
With respect to your comment: "...It's the 21st century, let's take advantage of it."
I wish the C eco-system was in the 21st century but it is not.
Unidata netcdf-c needs to be very careful about external dependencies.

@sebastic
Copy link
Contributor Author

libxml2 is everywhere where GNOME is, this includes Linux, BSD, OSX, and Windows:

http://xmlsoft.org/downloads.html

It's a good example of a cross-platform library.

Your experience says more about the (lack of) portability of aws-sdk-cpp and means nothing with respect to libxml2.

@sebastic
Copy link
Contributor Author

With respect to dependencies, hdf5 is a very problematic one. It does not support building both parallel and serial versions at the same time, and required far too much customization in its Debian package. There is also no public VCS nor bugtracker, bitbucket.hdfgroup.org & jira.hdfgroup.org are recent improvements but are still not good enough.

@WardF
Copy link
Member

WardF commented Oct 12, 2021

<thinking out loud>

The primary issue here is that the bundled XML interface has a number of CVE's, and is very out of date.

The ideal solution would be to link against an external interface, libxml2. On *nix systems with modern package management, installing dependencies when installing a package becomes essentially a non-issue; [package manager] install libnetcdf would ensure libxml2 were installed along with everything else.

The confounding factor, as usual, is Windows. The link provided above, thanks @sebastic, leads me through a couple levels of links and ultimately takes me to 32-bit Windows versions from 2011, and 64-bit versions from 2015. Not the solution we are looking for.

On Windows, we provide binary installers for netCDF which bundle libcurl, libz, libhdf5, etc, because otherwise the burden of installing the appropriate dependencies becomes non-trivial for the end user. And while we are often able to walk non-developers through the fairly rote process of installing from source on *nix systems, I really shudder to imagine trying to do the same on Windows.

So the situation, in summary, is as follows:

  1. There are CVEs in the existing bundled parser; this needs to be addressed.
  2. Can we easily compile libxml2 as part of our process on Windows, similar to how we bundle libhdf5, etc? I don't have an answer to this yet, but I will investigate. If we can, great, we can continue the discussion about what that would take from our side. If we can't, it becomes a much tricker issue to link against libxml2.
  3. In the short term, can we switch to one of the more modern parsers that @DennisHeimbigner suggested? This may be the path of least resistance to solve point 1, while evaluating point 2. @DennisHeimbigner, did either xtmlractor or xml leap out at you?

I think we are all on the same page when it comes to the urgency to address these CVEs. The issue, as always, is how do we solve this in a way that doesn't leave our Windows/Visual Studio users in a tough spot.

<\thinking out loud>

@DennisHeimbigner
Copy link
Collaborator

  1. I already fixed the ezxml bugs as part of an upcoming PR.
  2. For our purposes, libxml is overkill; we only need very simple xml parsing.
  3. xmltractor and xml both are still pretty old (2017?); no idea what bugs they contain.

@DennisHeimbigner
Copy link
Collaborator

Sorry to get my back up over this. I will switch to libxml if you want. But I just
cannot see that it is worth it for the limited use we make of a read-only XML parser.

@WardF
Copy link
Member

WardF commented Oct 12, 2021

@DennisHeimbigner No issue, I was just outlining my thoughts above while I was looking at the issue. The burden of dependencies on our users is a big one, and the lack of any sort of universal package manager for binary installation makes it a particularly sticky issue for Windows. Thanks for getting the issues as outlined resolved!

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Oct 12, 2021
## Examine and fix ezxml errors

re: Issue Unidata#2119

Multiple security issues were found in ezxml (see above Issue).

* CVE-2021-31598
* CVE-2021-31348 / CVE-2021-31347
* CVE-2021-31229
* CVE-2021-30485
* CVE-2021-26222
* CVE-2021-26221
* CVE-2021-26220
* CVE-2019-20202
* CVE-2019-20201
* CVE-2019-20200
* CVE-2019-20199
* CVE-2019-20198
* CVE-2019-20007
* CVE-2019-20006
* CVE-2019-20005

In addition, moved ezxml to libdispatch.

## Examine and fix selected  oss-fuzz detected errors

Note that most of these errors are in the libsrc .m4 generated
code so fixing them is difficult. It would nice if we could tell
oss-fuzz to skip those files. They are old and crufty and
probably need a complete refactor.

Issue|Status
-----|------
35382|Fixed; old bug
35398|Closed by OSS-Fuzz
35442|Guarantee alloc > 0 or error; Old bug
35721|Assert failure; ok
35992|Fixed; old bug
36038|Fixed; old bug
36129|Unfixed; old bug
36229|Fixed by adding assert; old bug
37476|Unfixed; old bug
37824|Assert Failure; ok
38300|Closed by OSS-Fuzz
38537|Unfixed; old bug
38658|Unfixed; old bug
38699|Fixed maybe; old bug
38772|Nature of error is unclear, suspect that it results from using too large a type.
39248|Need more information
39394|Unfixed; old bug
@DennisHeimbigner
Copy link
Collaborator

THis issue can be closed

@WardF WardF closed this as completed Oct 25, 2021
@e4t
Copy link
Contributor

e4t commented Oct 29, 2021

I'm not sure about closing this. I found several issues:

  1. The upstream patches only cover three of the reported issues - namely
  2. The fix for CVE-2021-31598 is bogus, it causes most of the well-formed and valid XML data I fed to it to to fail, moreover it calls exit(-1) when the test succeeds. This is most likely not desired when run from within netcdf()
  3. The remaining CVEs are not addressed at all: To fix them, I went ahead and performed a deeper analysis of the ezXML code:
    • The EzXML code is in a PoC state. It assumes that:
      • XML data passed to it is always valid - from this it makes assumptions of the data to follow often without checking. This will lead to out-of-bound memory accesses.
      • memory allocation never fails.
    • Moreover, it seems the code was written with the goal to minimize LOCs. To achieve this, it frequently uses nested assignments without checking the return values of the innermost function calls. This nesting also makes the code harder to read and understand.
    • Most inner functions do not have a way to report back failure and pass this information all the way up to the caller.
    • CVE-2019-20201 bug#16, CVE-2019-20198 bug#20, CVE-2021-31229 bug#26 all have the same root cause, however, the affected code is not used in netcdf-c. These are therefore irrelevant.
    • CVE-2019-20006 bug#15, CVE-2019-20202 bug#17, CVE-2021-31598 bug#28 all share the same root cause.
    • CVE-2021-31229 bug#26 points to an entire class of problems arising from not checking whether strspn() and strcspn() are returning the desired offset or the offset to the end of the string. In many cases it is just assumed that the string will continue if more data is expected.
    • CVE-2021-26221 bug#21, CVE-2021-26222 bug#22 and CVE-2021-26220 bug#23 represent the missing tests for failed memory (re)allocations mentioned above.
    • CVE-2019-20005 bug#14 cannot be reproduced with the reproducer provided.
    • I've fixed the issues not already addressed (see my comments in the sourceforge tickets above as well ass pull request CVEs - Issue #2119 #2133) and ported these fixes to netcdf-c. So far, for the classes of problems listed above these fixes cover the reproducers supplied in the bugs, they do not yet address all other occurrences.

With all this said, how much do these issues matter in the context of netcdf-c? So far, this code is only used in the context of DAP4 to transmit information from a dap4 server to a client using netcdf-c.
It can be expected that the XML data passed down from a DAP4 server is valid when talking to a legitimate server. In this scenario most of the reported issues seem to be irrelevant. As long as the DAP4 server is confined to a secured private network, no issues are to be expected - however, there are DAP4 servers available on the 'public' internet. If these networks provide authentication (https) and the user has authentication enabled (libcurl default), the risk of a spoofed address or man-in-the-middle may be low therefore, most of the issues won't be relevant. However, some servers may not offer https, or may use a self-signed certificate which may not be installed on the client system, thus, users may be tempted to disable authentication for convenience.
The missing result checks on [re/m]alloc() are different: the missing checks will cause the program to fail in some random way without signalling an out-of-memory condition back to the user. This is not a problem that is limited to large data provided thru an insecure connection, this problem can be caused by improperly set application limits as well. How much this is an issue depends on how much effort is done in the rest of the code to handle this situation.
One may ask how relevant is this and how much damage may occur. I don't feel competent to answer this question and therefore would rather error on the safe side.
With all this in mind, it should be considered if the use of a well-curated XML library should not be offered as a build time option at least.

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Oct 30, 2021
re: Unidata#2117
re: Unidata#2119

* Modify libsrc to allow byte-range reading of netcdf-3 files in private S3 buckets; this required using the aws sdk. Also add a test case.
* The aws sdk can sometimes cause problems if the Awd::ShutdownAPI function is not called. So at optional atexit() support to ensure it is called. This is disabled for Windows.
* Add documentation to nczarr.md on how to build and use the aws sdk under windows. Currently it builds, but testing fails.
* Switch testing from stratus to the Unidata bucket on S3.
* Improve support for the s3: url protocol.
* Add a s3 specific utility code file: ds3util.c
* Modify NC_infermodel to attempt to read the magic number of byte-ranged files in S3.

## Misc.

* Move and rename the core S3 SDK wrapper code (libnczarr/zs3sdk.cpp) to libdispatch since it now used in libsrc as well as libnczarr.
* Add calls to nc_finalize in the utilities in case atexit is disabled.
* Add header only json parser to the distribution rather than as a built source.
@DennisHeimbigner
Copy link
Collaborator

Thanks, I should have thought of that. Just because I do not want to depend on libxml2
does mean I should disallow its use if available.

@e4t
Copy link
Contributor

e4t commented Nov 1, 2021

Thanks, I should have thought of that. Just because I do not want to depend on libxml2 does mean I should disallow its use if available.

I've looked into this a bit over the weekend. It seems to be doable and libxml2 looks like a good option: it is a DOM parser like ezxml. Other parsers, like xpat, are streaming parsers using callbacks which would make an integration extremely difficult.
I cannot promise when it will be done as I'm only able to work on it on the side.

@DennisHeimbigner
Copy link
Collaborator

Don't bother. I already have start on this.

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Nov 2, 2021
re: Unidata#2119

H/T to [Egbert Eich](https://github.com/e4t) and [Bas Couwenberg](https://github.com/sebastic) for this PR.

It is undesirable to make netcdf be dependent on the availability
of libxml2, but it is desirable to allow its use if available.

In order to do this, a wrapper API (include/ncxml.h) was constructed
that supports either ezxml or libxml2 as the implementation.
Additionally, the xml support code was moved to a new directory
netcdf-c/libncxml.

Primary changes:
* Create a new sub-directory named netcdf-c/libncxml to hold all the xml implementation code.
* Move ezxml.c and ezxml.h to libncxml
* Create a wrapper API -- include/ncxml.h
* Create an implementation, ncxml_ezxml.c to support use of ezxml.
* Create an implementation, ncxml_xml2.c to support use of libxml2.
* Add a check for libxml2 in configure.ac and CMakeLists.txt
* Modify libdap to use the wrapper API instead of ezxml directly.

Misc. Other Changes:
* Change include/netcdf_json.h from built source to be part of the distribution.
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Dec 23, 2021
re: PR Unidata#2139
re: PR Unidata#2169
re: PR Unidata#2146
re: Issue Unidata#2119

Found the product tinyxml2 at https://github.com/leethomason/tinyxml2.git
and replaced ezxml with it. Tinyxml2 is about twice the LOC of ezxml,
but at least is it still being maintained, and I can use it out of the box.
It is C++ rather than C, but we seem to have reached the point that we can
include C++ code with only minor compile flag changes. Untested on Mac OS.
Added instructions to the end of libncxml/Makefile.am on how to upgrade
to a later version of tinyxml2.

This PR obsoletes the use of ezxml (re PRs Unidata#2146 and https://github.com/Unidata/netcdf-c/issue/2119).
@Dave-Allured
Copy link
Contributor

It seems that libxml2 is falling away from current maintenance.
https://gitlab.gnome.org/GNOME/libxml2/-/issues/319

I started #2169 to consider the choice for the default XML parser for netcdf. Currently this is tinyxml2 versus libxml2. Commenters, if you have any insights related to this choice, please add to #2169. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants