Skip to content

Fast XML minifier / compressor / uglifier with a command-line

License

Notifications You must be signed in to change notification settings

kristian/minify-xml

Repository files navigation

minify-xml

minify-xml is a lightweight and fast XML minifier for NodeJS with a command line.

Existing XML minifiers, such as pretty-data often do a pretty (phun intended) bad job minifying XML in usually only removing comments and whitespace between tags. minify-xml on the other hand also includes minification of tags, e.g. by collapsing the whitespace between multiple attributes and further minifications, such as the removal of unused namespace declarations. minify-xml is based on regular expressions and thus executes blazingly fast.

Online

Use this package online to minify XMLs in your browser, visit:

Minify-X.ML (https://minify-x.ml/)

Installation

npm install minify-xml -g

Usage

import minifyXML from "minify-xml";

const xml = `<Tag xmlns:used = "used_ns" xmlns:unused = "unused_ns">
    <!--
        With the default options all comments will be removed, whitespace in
        tags, like spaces between attributes, will be collapsed / removed and
        elements without any content will be collapsed to empty tag elements
    -->
    <AnotherTag  attributeA  =  "..."  attributeB  =  "..."  >  </AnotherTag  >

    <!--
        Also any unused namespaces declarations will be removed by default,
        used namespaces however will be shortened to a minimum length possible
    -->
    <used:NamespaceTag  used:attribute  =  "..."  >
        any valid element content is left unaffected (strangely enough = " ... "
        and even > are valid characters in XML, only &lt; must always be encoded)
    </used:NamespaceTag  >

    <![CDATA[<FakeTag attr = "content in CDATA tags is not minified"></FakeTag>]]>
</Tag>`;

console.log(minifyXML(xml));

This outputs the minified XML:

<Tag xmlns:u="used_ns"><AnotherTag attributeA="..." attributeB="..."/><u:NamespaceTag u:attribute="...">
        any valid element content is left unaffected (strangely enough = " ... "
        and even > are valid characters in XML, only &lt; must always be encoded)
    </u:NamespaceTag><![CDATA[<FakeTag attr = "content in CDATA tags is not minified"></FakeTag>]]></Tag>

Alternatively a Node.js Transform stream can be provided to minify XML streams, which is especially helpful for very large files (> 2 GiB, which is the maximum Buffer size in Node.js on 64-bit machines):

import { minifyStream as minifyXMLStream } from "minify-xml";

fs.createReadStream("sitemap.xml", "utf8")
    .pipe(minifyXMLStream())
    .pipe(process.stdout);

Similar to streams, Node.js 15 introduced an asynchronous stream.pipeline API that with stream/promises utilizes promises. This way you can utilize the advantages of the streaming API (namely no file size limit) in conjunction with the convenience of using a modern promise based API:

import { minifyPipeline as minifyXMLPipeline } from "minify-xml";

await minifyXMLPipeline(fs.createReadStream("catalogue.xml", "utf8"), process.stdout, { end: false });

Options

You may pass in the following options when calling minify:

import { minify as minifyXML, minifyStream as minifyXMLStream } from "minify-xml";
minifyXML(`<tag/>`, { ... });
minifyXMLStream({ ... });
  • removeComments (default: true): Remove comments like <!-- ... -->.

  • removeWhitespaceBetweenTags (default: true): Remove whitespace between tags like <anyTag /> <anyOtherTag />. Can be limited to tags only by passing the string "strict", otherwise by default other XML constructs as the prolog <?xml ... ?>, processing instructions <?pi ... ?>, the document type declaration <!DOCTYPE ... >, CDATA sections <![CDATA[ ... ]]> and comments <!-- ... --> will be also considered as tags.

  • considerPreserveWhitespace (default: true): Do consider the xml:space="preserve" attribute or <pre> tags in any namespace when removeWhitespaceBetweenTags. If set to true and xml:space="preserve" is specified, whitespace between tags like <anyTag xml:space="preserve"> </anyTag> will not be removed.

  • collapseWhitespaceInTags (default: true): Collapse whitespace in tags like <anyTag attributeA = "..." attributeB = "..." />.

  • collapseEmptyElements (default: true): Collapse empty elements like <anyTag anyAttribute = "..."></anyTag>.

  • trimWhitespaceFromTexts (default: false): Remove leading and tailing whitespace in elements containing text only or a mixture of text and other elements like <anyTag> Hello <anyOtherTag/> World </anyTag>.

  • collapseWhitespaceInTexts (default: false): Collapse whitespace in elements containing text or a mixture of text and other elements (useful for (X)HTML) like <anyTag>Hello World</anyTag>.

  • collapseWhitespaceInProlog (default: true): Collapse and remove whitespace in the xml prolog <?xml version = "1.0" ?>.

  • collapseWhitespaceInDocType (default: true): Collapse and remove whitespace in the xml document type declaration <!DOCTYPE DocType >

  • removeSchemaLocationAttributes (default: false): Remove any xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes <anyTag xsi:schemaLocation = "..." />

  • removeUnnecessaryStandaloneDeclaration (default: true): Remove an unnecessary standalone declaration in the xml prolog <?xml version = "1.0" standalone = 'yes' ?>. Note that according to the W3C standalone has "no meaning" and thus is removed, in case there are no external markup declarations.

  • removeUnusedNamespaces (default: true): Remove any namespaces from tags, which are not used anywhere in the document, like <tag xmlns:unused="any_uri" />. Notice the word anywhere here, the minifier not does consider the structure of the XML document, thus namespaces which might be only used in a certain sub-tree of elements might not be removed, even though they are not used in that sub-tree.

  • removeUnusedDefaultNamespace(default: true): Remove default namespace declaration like <tag xmlns="any_uri"/> in case there is no tag without a namespace in the whole document.

  • shortenNamespaces (default: true): Shorten namespaces, like <tag xmlns:namespace="any_namespace"> to a minimal length, e.g. <tag xmlns:n="any_namespace">. First an attempt is made to shorten the existing namespace to one letter only (e.g. namespace is shortened to n), in case that letter is already taken, the shortest possible other namespace is used.

  • ignoreCData (default: true): Ignore any content inside of CData tags <![CDATA[ any content ]]>.

For stream processing following additional options can be supplied:

  • streamMaxMatchLength (default: 262144, 256 KiB): The maximum size of matches between chunks. See replacestream for a detailed explanation.

Stream Limitations

Note that the default streamMaxMatchLength was deliberately chosen as high as a multiple of the Node.js default stream buffer size (the default buffer size for readable streams is 16 KiB, for file system streams it is 64 KiB), as the stream option is specifically meant to be used with very large files / read streams and a larger streamMaxMatchLength will result in a more accurate minification, because some very large tags might require to be read into the buffer all at once to be minified.

On 32-bit machines the maximum buffer size in Node.js is 1 GiB and 2 GiB on 64-bit machines (see this issue). Minify XML can handle strings up to that size and using the minify function should be preferred over the minifyStream option. For larger files / streams the streaming API has to be used, which comes with certain limitations, because no prior knowledge can be obtained for the minification (mainly because we assume we can read the stream only once, an option to obtain the required information by e.g. first parsing a file and then minifying it might be added some time in future). For now the options removeUnusedNamespaces, removeUnusedDefaultNamespace, shortenNamespaces and ignoreCData cannot be used with the streaming API and calling the minifyStream function with these options enabled, will result in an error.

Further multiple buffers of the set size, will be created for each minification option enabled (sometimes a minification requires even multiple buffers / replacements). Thus enabling more options will also allocate more memory depending on the streamMaxMatchLength option and in case the file / read stream is generally larger than the buffer size set. As the input will be pumped through all minification as a stream, roughly 1.5 * n * buffer size will get allocated. E.g. the default buffer size of 256 KiB with all default options enabled for streaming, will for instance result in 11 buffers / replacements to be made, so 11 * 256 KiB = 2.75 MiB is to be allocated if the input stream is 256 KiB or larger.

CLI

You can run minify-xml from the command line to minify XML files:

minify-xml sitemap.xml
minify-xml blog.atom --in-place
minify-xml view.xml --output view.min.xml
minify-xml db.xml --stream > out.xml

Use any of the options above like:

minify-xml index.html --collapse-whitespace-in-texts --ignore-cdata false

Author

XML minifier by Kristian Kraljić. Original package and CLI by Mathias Bynens.

Bugs

Please file any issues on Github.

License

This library is dual licensed under the MIT and Apache 2.0 licenses.

About

Fast XML minifier / compressor / uglifier with a command-line

Resources

License

Stars

Watchers

Forks

Packages

No packages published