Skeleton archive unarchive steps #55

Merged
merged 3 commits into from
Mar 24, 2019
4 changes: 3 additions & 1 deletion src/main/xml/bibliography.xml
@@ -329,5 +329,7 @@ Internet Engineering Task Force. July, 2005.</bibliomixed>
459</citetitle>. <biblioid class="doi">10.1109/DSN.2002.1028931</biblioid>.
P. Koopman. June 2002.
</bibliomixed>


<bibliomixed xml:id="zip"><abbrev>ZIP</abbrev>
<citetitle xlink:href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">.ZIP File Format Specification</citetitle>.</bibliomixed>
</bibliography>
1 change: 1 addition & 0 deletions steps/src/main/xml/references.xml
@@ -26,6 +26,7 @@
<bibliomixed xml:id="tagsoup"/>
<bibliomixed xml:id="bib.uuid"/>
<bibliomixed xml:id="bib.sha"/>
<bibliomixed xml:id="zip"/>
</bibliolist>
</section>
<section xml:id="informative-references">
2 changes: 2 additions & 0 deletions steps/src/main/xml/specification.xml
@@ -162,6 +162,7 @@ linkend="rfc2119"/>.</para>

<xi:include href="steps/add-attribute.xml"/>
<xi:include href="steps/add-xml-base.xml"/>
<xi:include href="steps/archive.xml"/>
<xi:include href="steps/cast-content-type.xml"/>
<xi:include href="steps/compare.xml"/>
<xi:include href="steps/count.xml"/>
@@ -195,6 +196,7 @@ linkend="rfc2119"/>.</para>
<xi:include href="steps/text-replace.xml"/>
<xi:include href="steps/text-sort.xml"/>
<xi:include href="steps/text-tail.xml"/>
<xi:include href="steps/unarchive.xml"/>
<xi:include href="steps/unescape-markup.xml"/>
<xi:include href="steps/unwrap.xml"/>
<xi:include href="steps/uuid.xml"/>
118 changes: 118 additions & 0 deletions steps/src/main/xml/steps/archive.xml
@@ -0,0 +1,118 @@
<section xmlns="http://docbook.org/ns/docbook" xmlns:p="http://www.w3.org/ns/xproc"
xmlns:e="http://www.w3.org/1999/XSL/Spec/ElementSyntax" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="c.archive">

<title>p:archive</title>

<para>The <code>p:archive</code> step outputs an archive document (usually binary), for instance a ZIP file, on its
<port>result</port> port. The contents of the archive must be described in a manifest XML document supplied
on the <port>manifest</port> port. The contents of the archive itself can come from documents provided on the
<port>source</port> port, from a URI, from documents specified inline in the manifest, or any combination of these.
The step produces a report on the <port>report</port> port, which contains the manifest, amended with additional
information about the archiving.</para>

<p:declare-step type="p:archive">
<p:input port="source" primary="true" content-types="*/*" sequence="true"/>
<p:input port="manifest" content-types="application/xml" sequence="false"/>
<p:output port="result" primary="true" content-types="application/*" sequence="false"/>
<p:output port="report" content-types="application/xml" sequence="false"/>
<p:option name="format" as="xs:QName" required="false" select="'zip'"/>
<p:option name="parameters" as="map(xs:QName, item()*)" required="false"/>
</p:declare-step>

<para>The <code>p:archive</code> step takes the document appearing on its <port>manifest</port> port as a
specification for an archive file. It outputs this archive on its <port>result</port> port.</para>

<para>The format of the archive can be specified using the <option>format</option> option. Implementations
<rfc2119>must</rfc2119> support the <biblioref linkend="zip"/> format, specified with the value <code>zip</code>.
<impl>It is <glossterm>implementation-defined</glossterm> what other formats are supported.</impl></para>

<para>The <option>parameters</option> option can be used to supply parameters to control the archiving. <impl>The semantics
of the keys and the allowed values for these keys are <glossterm>implementation-defined</glossterm>.</impl>
<error code="C0079">It is a <glossterm>dynamic error</glossterm> if the map <option>parameters</option> contains an
entry whose key is defined by the implementation and whose value is not valid for that key.</error></para>

<para>The <port>report</port> port outputs a copy of the manifest, optionally amended with additional attributes
and/or elements. <impl>The semantics of any additional attributes, elements and their values are
<glossterm>implementation-defined</glossterm>.</impl>
</para>
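<para>As a rough, non-normative sketch of how the ports and options above fit together, the following pipeline passes
its own <port>source</port> documents to <code>p:archive</code> and supplies the manifest inline. The
<code>c</code> namespace binding, the <code>version</code> attribute and the file name <code>readme.xml</code> are
assumptions made for illustration only; the manifest vocabulary follows the draft proposal in this section and may
change.</para>
<programlisting><![CDATA[<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">
  <!-- Documents that may be picked up by manifest entries via their base URI -->
  <p:input port="source" sequence="true"/>
  <!-- The (binary) archive produced by p:archive -->
  <p:output port="result"/>

  <p:archive format="zip">
    <!-- The primary source port is connected implicitly to the pipeline input -->
    <p:with-input port="manifest">
      <p:inline>
        <c:archive>
          <!-- Hypothetical entry: stored as docs/readme.xml in the archive -->
          <c:file name="docs/readme.xml" href="readme.xml"/>
        </c:archive>
      </p:inline>
    </p:with-input>
  </p:archive>
</p:declare-step>]]></programlisting>
<para>In this sketch the <port>report</port> port is left unconnected, so the amended manifest is discarded.</para>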

<section xml:id="cv.request">
<title>Specifying an archive manifest</title>

<para>An archive manifest is represented by a <tag>c:archive</tag> root element.</para>

<note role="editorial">
<para>TBD: Specify <tag>c:archive</tag> root element using schemas. Proposal:</para>
<programlisting><![CDATA[<c:archive> <c:file>* </c:archive>]]></programlisting>
</note>
<!--<e:rng-pattern name="..."/>-->

<para>The <code>c:archive</code> root element may contain additional <glossterm>implementation-defined</glossterm>
attributes.</para>

<para>All entries in the archive must be present as <tag>c:file</tag> child elements:</para>

<note role="editorial">
<para>TBD: Specify <tag>c:file</tag> elements using schemas. Proposal:</para>
<programlisting><![CDATA[<c:file name="..." href?="..." compression-method?="..."> ...optional contents... </c:file>]]></programlisting>
</note>
<!--<e:rng-pattern name="..."/>-->

<para>The <code>name</code> attribute specifies the name of the entry in the archive. It <rfc2119>must</rfc2119> be
specified as a relative path.</para>
<para>The optional <code>href</code> attribute is interpreted as follows:</para>
<itemizedlist>
<listitem>
<para><error code="D0064">It is a <glossterm>dynamic error</glossterm> if the <code>href</code> attribute is
present and its value is not a valid <type>xs:anyURI</type>.</error></para>
</listitem>
<listitem>
<para>When the <tag>c:file</tag> element has any child nodes, the <code>href</code> attribute is ignored.</para>
</listitem>
<listitem>
<para>The <code>p:archive</code> step checks the documents appearing on its <port>source</port> port for any
documents whose base URI is exactly the same as the value of the <code>href</code> attribute. If any such
documents are found, the <emphasis>first</emphasis> of these is used as the entry in the archive.</para>
</listitem>
<listitem>
<para>If no such document is found, the value of the <code>href</code> attribute is interpreted as a URI and the
document is loaded from that URI.</para>

<para><error code="D0011">It is a <glossterm>dynamic error</glossterm> if the resource referenced by the
<code>href</code> attribute does not exist, cannot be accessed or is not a file.</error></para>
<para>If the value of the <code>href</code> attribute is relative, it is made absolute against the base URI of the
manifest.</para>
</listitem>
<listitem>
<para><error code="TBDTBD">It is a <glossterm>dynamic error</glossterm> if the <code>href</code> attribute is
not specified and the <tag>c:file</tag> element has no child nodes.</error></para>
</listitem>
</itemizedlist>

<para>The <code>compression-method</code> attribute specifies how the entry should be compressed. <impl>The default
compression method is <glossterm>implementation-defined</glossterm>. </impl>Implementations
<rfc2119>must</rfc2119> support no compression, specified with the value <code>none</code>. <impl>It is
<glossterm>implementation-defined</glossterm> what other compression methods are supported.</impl></para>

<para>When the <code>c:file</code> element has any child nodes, these are taken as the contents of the archive's entry.
The <code>href</code> attribute is ignored in this case.</para>
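<para>The rules above might be illustrated by a manifest such as the following. This is only a sketch based on the
draft proposal: the namespace binding for the <code>c</code> prefix, the entry names and the <code>href</code> values
are hypothetical.</para>
<programlisting><![CDATA[<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- href is made absolute against the manifest's base URI. A document on the
       source port with exactly that base URI is used if there is one;
       otherwise the document is loaded from the URI. -->
  <c:file name="docs/readme.xml" href="input/readme.xml"/>

  <!-- Child nodes are present, so they become the contents of the entry.
       No href is needed (and it would be ignored if given). -->
  <c:file name="meta/info.xml" compression-method="none">
    <info>Generated by a pipeline</info>
  </c:file>
</c:archive>]]></programlisting>
<para>A <tag>c:file</tag> element with neither an <code>href</code> attribute nor child nodes would raise the dynamic
error described in the list above.</para>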

<para>The <code>p:archive</code> step should strive to retain the order of the <tag>c:file</tag> elements when
constructing the archive. For instance, an e-book in EPUB format has a non-compressed entry (the
<code>mimetype</code> file) that must be first in the archive. It should be possible to construct such an archive
using <code>p:archive</code>.</para>
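<para>A manifest for the EPUB case could look roughly like this, assuming that the implementation keeps the entry
order and that inline text is written to the entry unchanged (serialization of inline contents is still an open
question in this draft). The <code>href</code> values are hypothetical.</para>
<programlisting><![CDATA[<c:archive xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- Must be the first entry and must not be compressed -->
  <c:file name="mimetype" compression-method="none">application/epub+zip</c:file>
  <!-- Remaining entries use the implementation's default compression method -->
  <c:file name="META-INF/container.xml" href="book/META-INF/container.xml"/>
  <c:file name="OEBPS/content.opf" href="book/OEBPS/content.opf"/>
</c:archive>]]></programlisting>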

<para>The <code>c:file</code> elements may contain additional <glossterm>implementation-defined</glossterm>
attributes.</para>
<note role="editorial">
<para>Do we need to say anything about serialization options for XML contents?</para>
<para>Not sure whether JSON needs more specifications.</para>
</note>

</section>

<simplesect>
<title>Document properties</title>
<para feature="archive-preserves-none">No document properties are preserved.</para>
</simplesect>
</section>
78 changes: 78 additions & 0 deletions steps/src/main/xml/steps/unarchive.xml
@@ -0,0 +1,78 @@
<section xmlns="http://docbook.org/ns/docbook" xmlns:p="http://www.w3.org/ns/xproc"
xmlns:e="http://www.w3.org/1999/XSL/Spec/ElementSyntax" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="c.unarchive">

<title>p:unarchive</title>

<para>The <code>p:unarchive</code> step outputs on its <port>result</port> port either a manifest file describing the
contents of an archive (for instance entries in a ZIP file) or specific entries in an archive.</para>

<p:declare-step type="p:unarchive">
<p:input port="source" primary="true" content-types="*/*" sequence="false"/>
<p:output port="result" primary="true" content-types="*/*" sequence="true"/>
<p:option name="include-filter" as="xs:string" e:type="RegularExpression" required="false"/>
<p:option name="exclude-filter" as="xs:string" e:type="RegularExpression" required="false"/>
<p:option name="format" as="xs:QName" required="false" select="'zip'"/>
<p:option name="parameters" as="map(xs:QName, item()*)" required="false"/>
</p:declare-step>

<para>The <code>p:unarchive</code> step treats the document appearing on its <port>source</port> port as an archive
(for instance a ZIP file). Depending on which options are set, it outputs either a description of the contents of the
archive as an XML document or specific entries (files) from the archive.</para>

<para>The format of the archive can be specified using the <option>format</option> option. Implementations
<rfc2119>must</rfc2119> support the <biblioref linkend="zip"/> format, specified with the value <code>zip</code>.
<impl>It is <glossterm>implementation-defined</glossterm> what other formats are supported.</impl></para>

<para>The <option>parameters</option> option can be used to supply parameters to control the unarchiving. <impl>The semantics
of the keys and the allowed values for these keys are <glossterm>implementation-defined</glossterm>.</impl>
<error code="C0079">It is a <glossterm>dynamic error</glossterm> if the map <option>parameters</option> contains an
entry whose key is defined by the implementation and whose value is not valid for that key.</error></para>

<para>If present, the value of the <option>include-filter</option> or <option>exclude-filter</option> option
<rfc2119>must</rfc2119> be a whitespace-separated list of regular expressions as specified in <biblioref
linkend="xpath31-functions"/>, section 5.6.1 “<literal>Regular Expression Syntax</literal>”.</para>

<para>If neither the <option>include-filter</option> option nor the <option>exclude-filter</option> option is
specified, the <code>p:unarchive</code> step outputs on its <port>result</port> port a description of the contents of the
archive, as specified below.</para>
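<para>For illustration, a pipeline that only lists the contents of an archive could be sketched as follows. The
<code>version</code> attribute and the surrounding pipeline are assumptions; the format of the description itself is
not yet defined in this draft.</para>
<programlisting><![CDATA[<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- The archive to inspect, for instance a ZIP file -->
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <!-- Neither include-filter nor exclude-filter is set, so the result port
       carries a description of the archive rather than its entries -->
  <p:unarchive format="zip"/>
</p:declare-step>]]></programlisting>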

<para>If the <option>include-filter</option> option or the <option>exclude-filter</option> option is specified, the
<code>p:unarchive</code> step outputs on the <port>result</port> port the entries from the archive that conform to the
following rules:</para>
<itemizedlist>
<listitem>
<para>If any <option>include-filter</option> pattern matches an archive entry's name, the entry is included in the
output.</para>
</listitem>
<listitem>
<para>If any <option>exclude-filter</option> pattern matches an archive entry's name, the entry is excluded from
the output.</para>
</listitem>
<listitem>
<para>If both options are provided, the include filter is processed first, then the exclude filter. </para>
</listitem>
<listitem>
<para>Names of entries in archives are always relative names. For instance, the full name of a file
<code>xyz.xml</code> in a <code>specs</code> subdirectory of an archive is
<code>specs/xyz.xml</code> (and not <code>/specs/xyz.xml</code>).</para>
</listitem>
</itemizedlist>
<para>As a result, an entry is included if it matches at least one of the <option>include-filter</option> values and
none of the <option>exclude-filter</option> values.</para>
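<para>The filter rules can be sketched with the following hypothetical invocation. The entry names and patterns are
invented for illustration, and the patterns are anchored explicitly so that they do not depend on whether an
implementation matches against the whole name or part of it.</para>
<programlisting><![CDATA[<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <!-- Two whitespace-separated include patterns and one exclude pattern.
       specs/xyz.xml      matches an include pattern, no exclude pattern: output
       specs/draft-1.xml  matches an include pattern, but also the exclude pattern: not output
       notes/todo.txt     matches no include pattern: not output -->
  <p:unarchive include-filter="^specs/.*\.xml$ ^images/"
               exclude-filter="^specs/draft-"/>
</p:declare-step>]]></programlisting>
<para>Because entry names are relative, the patterns start with <code>specs/</code> and <code>images/</code> rather
than with a leading slash.</para>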
<note role="editorial">
<para>What about the base URIs of these documents?</para>
</note>

<section>
<title>Archive content specification</title>
<note role="editorial">
<para>TBD. Like the manifest of <code>p:archive</code> but no <code>@href</code>?</para>
</note>
</section>

<simplesect>
<title>Document properties</title>
<para feature="archive-preserves-none">No document properties are preserved.</para>
</simplesect>
</section>