Providing Input to XMLUnit
All core parts of XMLUnit use a single abstraction for "pieces of XML"
they are supposed to work on. For Java this is
javax.xml.transform.Source and for .NET we've created
Org.XmlUnit.ISource which basically adds a wrapper around an
XmlReader.
For Java many implementations of said interface are part of the Java class library, for .NET we've added the corresponding classes:
- ReaderSource - just wraps an existing XmlReader
- DOMSource - creates a Source from an XmlNode
- StreamSource - creates a Source from a TextReader, Stream or a string holding a URI
- LinqSource - creates a Source from an XNode
At the time of this writing there is no XML-Serialization based
equivalent of JAXBSource for .NET.
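On the Java side, the Source implementations mentioned above ship with the JDK itself. The following is a minimal sketch that uses only JDK classes; the class name JdkSources and its two methods are hypothetical helpers made up for illustration and are not part of XMLUnit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
class JdkSources {

    // A Source backed by a character stream; parsing is left to the consumer.
    static Source fromString(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    // A Source backed by a DOM tree that has already been parsed.
    static DOMSource fromParsedDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return new DOMSource(doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Either object can be used wherever a javax.xml.transform.Source is expected.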
In order to make it easier to create instances of Source or
ISource there is
a builder that provides a fluent API.
CommentLessSource is a decorator of a different source and provides
XML that consists of the original source's content with all comments
removed.
Use this wrapper if you want XMLUnit to ignore comments.
This class is used under the covers if you tell DiffBuilder to
ignore comments.
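To illustrate what "all comments removed" means for a DOM tree, here is a JDK-only sketch. It is not XMLUnit's implementation, which may work differently; CommentStripper and its methods are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
class CommentStripper {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Recursively removes every Comment node below the given node.
    static void removeComments(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling before the child is possibly detached.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.COMMENT_NODE) {
                node.removeChild(child);
            } else {
                removeComments(child);
            }
            child = next;
        }
    }
}
```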
When using XMLUnit.NET of version 2.10.0 or later, you may want to use
XmlWhitespaceStrippedSource instead - see below.
WhitespaceStrippedSource is a decorator of a different source that
removes all empty text nodes and trims the remaining text nodes.
If you only want to remove all "element content whitespace", i.e. text
content between XML elements that is just an artifact of "pretty
printing" XML then you should use
ElementContentWhitespaceStrippedSource instead.
Empty text nodes are removed:
<element>
</element>
becomes
<element></element>
Text nodes are stripped:
<element>
foo
</element>
becomes
<element>foo</element>
If the XML content has been created in memory rather than deserialized from an external source, it could contain adjacent Text nodes so that
<element>
foo
bar
</element>
could become
<element>foobar</element>
or
<element>
foo
bar
</element>
depending on how the document has been structured. In order to get
more control, normalize the input (using
Document.normalize() or XmlDocument.Normalize()) before wrapping
it in a WhitespaceStrippedSource - or use an additional
NormalizedSource wrapper.
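The effect of adjacent Text nodes described above can be reproduced with plain DOM calls. This sketch only approximates the stripping behavior (trim every Text child, drop the empty ones) and is not XMLUnit's code; TextNodeStripping and its methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of XMLUnit.
class TextNodeStripping {

    // Builds <element> holding two adjacent Text nodes: "\nfoo\n" and "bar\n".
    static Element buildInMemory() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element e = doc.createElement("element");
            doc.appendChild(e);
            e.appendChild(doc.createTextNode("\nfoo\n"));
            e.appendChild(doc.createTextNode("bar\n"));
            return e;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Trims each Text child individually and removes those that end up empty.
    static void stripTextChildren(Element e) {
        Node child = e.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    e.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            }
            child = next;
        }
    }
}
```

Stripping the element as built leaves the text foobar, because each Text node is trimmed on its own. Calling normalize() on the element first merges the two nodes, so only the outer whitespace is trimmed and the line break between foo and bar survives.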
When using XMLUnit.NET of version 2.10.0 or later, you may want to use
XmlWhitespaceNormalizedSource instead - see below.
WhitespaceNormalizedSource is a decorator of a different source that
replaces all whitespace characters found in Text nodes with Space
characters and collapses consecutive whitespace characters into a
single Space.
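The replace-and-collapse rule can be sketched with a regular expression. This example assumes XML's definition of whitespace (space, tab, carriage return, line feed); the exact character classification used by XMLUnit may differ, and WhitespaceCollapser is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of XMLUnit.
class WhitespaceCollapser {

    // Assumption: only the four XML whitespace characters are considered.
    private static final Pattern XML_WHITESPACE = Pattern.compile("[ \\t\\r\\n]+");

    // Replaces every run of XML whitespace with a single space; does not trim.
    static String collapse(String text) {
        return XML_WHITESPACE.matcher(text).replaceAll(" ");
    }
}
```

Note that a trailing run of whitespace is collapsed to one space rather than removed, which matches the trailing space kept in the result.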
<element>a
b
</element>
becomes
<element>a b </element>
NormalizedSource performs XML normalization on the wrapped document.
This means adjacent text nodes are merged into single nodes and empty
Text nodes are removed (recursively). For Java, when wrapping a Document
rather than a Node, additional normalizations may be performed - see
XmlNode.Normalize for .NET and Node#normalize as well as
Document#normalizeDocument for Java.
When reading documents a parser usually puts the document into normalized form anyway. You will only need to perform XML normalization on DOM trees you have created programmatically.
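A programmatically built tree that needs normalization might look like the following JDK-only sketch. NormalizeDemo and its methods are hypothetical; the only thing being demonstrated is the standard Node#normalize call.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of XMLUnit.
class NormalizeDemo {

    // Builds <root><inner/></root> where inner holds the Text nodes "foo", "" and "bar".
    static Element buildFragmented() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            Element inner = doc.createElement("inner");
            doc.appendChild(root);
            root.appendChild(inner);
            inner.appendChild(doc.createTextNode("foo"));
            inner.appendChild(doc.createTextNode(""));
            inner.appendChild(doc.createTextNode("bar"));
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of child nodes of the nested inner element.
    static int innerChildCount(Element root) {
        return root.getFirstChild().getChildNodes().getLength();
    }
}
```

Before normalization the inner element has three Text children; after root.normalize() it has a single Text node containing foobar, even though normalize() was called on the parent.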
When using XMLUnit.NET of version 2.10.0 or later, you may want to use
XmlElementContentWhitespaceStrippedSource instead - see below.
ElementContentWhitespaceStrippedSource is a decorator of a different
source that removes all text nodes solely consisting of whitespace.
The main use of this decorator is to remove all "element content whitespace", i.e. text content between XML elements that is just an artifact of "pretty printing" XML.
This class has been added with XMLUnit 2.6.0.
Empty text nodes are removed:
<element>
</element>
becomes
<element></element>
Text nodes are not stripped:
<element>
foo
</element>
remains
<element>
foo
</element>
With the helper class Input you can obtain an Input.Builder to create Source instances.
Source source = Input.fromFile("file:/..../test.xml").build();
or with XSL transformations:
Source source = Input.byTransforming(Input.fromFile("file:/..../test.xml"))
    .withStylesheet(Input.fromFile("file:/..../test.xsl"))
    .build();
In .NET the code examples are very similar, see the API documentation:
Java: http://www.xmlunit.org/api/java/master/org/xmlunit/builder/Input.html
.NET: http://www.xmlunit.org/api/net/master/Org.XmlUnit.Builder/Input.html
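Conceptually, a transformed input is the result of applying a stylesheet to a source before it is handed on. The sketch below shows that idea with the JDK's own transformation API only; it is not how XMLUnit implements its builder, and TransformedInput is a hypothetical name.

```java
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;

// Hypothetical helper, not part of XMLUnit.
class TransformedInput {

    // Applies the stylesheet to the input and wraps the resulting DOM tree as a Source.
    static DOMSource transform(Source input, Source stylesheet) {
        try {
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer(stylesheet);
            DOMResult result = new DOMResult();
            transformer.transform(input, result);
            return new DOMSource(result.getNode());
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```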
A special case is the helper method Input.from(Object).
This generic method creates a Builder instance depending on the type of the given Object:
| Java type | .NET type | Description |
|---|---|---|
| org.xmlunit.builder.Input.Builder | Org.XmlUnit.Builder.Input.IBuilder | Builder to create an XML-Source. |
| javax.xml.transform.Source | Org.XmlUnit.ISource | XML-Source |
| org.w3c.dom.Document | System.Xml.XmlDocument | DOM Document |
| org.w3c.dom.Node | System.Xml.XmlNode | DOM Node |
| - | System.Xml.Linq.XDocument | LINQ Document |
| - | System.Xml.Linq.XNode | LINQ Node |
| byte[] | byte[] | byte[] holding XML content |
| String | string | String holding XML content |
| java.io.File | - | File containing XML |
| java.net.URL | - | URL of an XML document |
| java.net.URI | System.Uri | URI of an XML document |
| java.io.InputStream | System.IO.Stream | Stream of XML content |
| java.nio.channels.ReadableByteChannel | System.IO.TextReader | ReadableByteChannel or TextReader providing XML content |
| A JAXB Object | - | Object which can be transformed to XML by javax.xml.bind.JAXB.marshal(...) |
This method simplifies the API of DiffBuilder and CompareMatcher
which can accept nearly any Object as input to generate a valid Source.
Whenever you parse XML there is the danger of being vulnerable to XML External Entity Processing - XXE for short.
When passing input to XMLUnit the input is transformed to a DOM
document with the help of a DocumentBuilder most of the time. Prior
to XMLUnit for Java 2.6.0 the DocumentBuilder used by default was
not configured to prevent XXE as Java's defaults are
vulnerable. Starting with XMLUnit 2.6.0 the default DocumentBuilder
is configured according to OWASP's XXE Prevention Cheat Sheet.
This means if you want to protect yourself against XXE and you use a
version of XMLUnit prior to 2.6.0 you have to explicitly set a
DocumentBuilderFactory that is configured properly. Likewise if you
rely on DTD loading or expansion of external entities you must provide
an explicit DocumentBuilderFactory when using XMLUnit 2.6.0 or
later.
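An explicitly hardened DocumentBuilderFactory could look like the sketch below. It follows the general advice of rejecting DOCTYPE declarations and is one possible configuration, not necessarily the exact set of options XMLUnit applies by default. SafeFactories is a hypothetical name, and the disallow-doctype-decl feature is assumed to be supported by the parser in use (it is by the parser bundled with the JDK).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of XMLUnit.
class SafeFactories {

    static DocumentBuilderFactory xxeSafeFactory() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Rejecting any DOCTYPE rules out external entities and DTD loading.
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            factory.setExpandEntityReferences(false);
            return factory;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser does not support the feature", e);
        }
    }

    // Returns true if a builder created by the factory parses the XML without error.
    static boolean accepts(DocumentBuilderFactory factory, String xml) {
        try {
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }
}
```

With this configuration a document that carries a DOCTYPE is rejected outright, so tests that rely on DTDs or entities need a more permissive factory.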
If you use the legacy module, XXE prevention is disabled by
default. Starting with XMLUnit 2.6.0 the XMLUnit class has a new
setEnableXXEPrevention method that can be used to enable it.
When using .NET 4.5.2 or newer the default settings used by
XMLUnit.NET have always been safe according to OWASP's XXE Prevention
Cheat Sheet. Prior to XMLUnit.NET 2.6.0 there were a few places where
XmlDocument was used without explicitly disabling the XmlResolver,
which means these places were vulnerable.
If you rely on XmlDocument loading external entities you will need
to provide an XmlResolver of your own starting with XMLUnit.NET
2.6.0.
The XML specification has a very limited set of characters it considers whitespace while Unicode knows a lot more whitespace characters.
Some of the sources provided by XMLUnit are used to ignore whitespace differences - they use the trim/Trim methods of the
String class respectively. For Java, trim's idea of whitespace is compatible with the XML definition (it also removes some control characters which would be illegal inside an XML document). For .NET things are different, though: Trim uses Unicode's definition of
whitespace and thus may hide differences in non-XML whitespace.
Starting with XMLUnit.NET 2.10.0 new sources XmlWhitespaceStrippedSource, XmlWhitespaceNormalizedSource,
and XmlElementContentWhitespaceStrippedSource have been added that only act on whitespace by XML's definition.
This means Java's WhitespaceStrippedSource acts more like .NET's XmlWhitespaceStrippedSource than WhitespaceStrippedSource -
and the same is true for the other sources. "Fixing" the original .NET sources would have broken too many existing
tests, so new types have been added.
Java 11 introduces a new strip method to the String class that acts like .NET's Trim and could be used to implement Source
types that act like .NET's WhitespaceStrippedSource, WhitespaceNormalizedSource, and ElementContentWhitespaceStrippedSource
respectively.
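The difference between the two Java methods can be checked directly. This small sketch requires Java 11 or later; TrimVersusStrip and its methods are hypothetical names used only for this illustration.

```java
// Hypothetical helper, not part of XMLUnit; requires Java 11+ for String.strip().
class TrimVersusStrip {

    // True if trim() leaves the given padding around "foo" in place.
    static boolean keptByTrim(String padding) {
        return !(padding + "foo" + padding).trim().equals("foo");
    }

    // True if strip() leaves the given padding around "foo" in place.
    static boolean keptByStrip(String padding) {
        return !(padding + "foo" + padding).strip().equals("foo");
    }
}
```

An EM SPACE (U+2003) is kept by trim but removed by strip, while a control character such as U+0001 is removed by trim but kept by strip. The match with .NET is not exact: strip relies on Character.isWhitespace, which does not treat the no-break space U+00A0 as whitespace, whereas .NET's Trim does.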