p:archive #3
Sounds like a good proposal to me. I have a few comments/additions:
The |
Ah ok. I think the matching between manifest entries and input documents is currently done by base URI matching. I think if there’s an input document with a URI that is also referred to in the manifest, the step will not try to read it from disk or network, but use the document with the same base URI that was specified on the input port. Can’t we keep it like that? It needs to be specified more explicitly though. |
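To make the base URI matching described above concrete, here is a minimal sketch of a manifest in the style of Calabash's pxp:zip. The file paths are illustrative assumptions, and the comments describe the matching as it is understood in this thread, not as specified anywhere.

```xml
<!-- Sketch only: assumes a document with base URI
     file:///C:/home/joe/foo/index.html arrives on the source port
     and that no document on that port has the base URI of the image. -->
<c:zip-manifest xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- href equals the base URI of a source-port document:
       the step would use that document instead of reading the file. -->
  <c:entry name="index.html"
           href="file:///C:/home/joe/foo/index.html"/>
  <!-- No source-port document has this base URI:
       the step would fall back to reading from disk (or network). -->
  <c:entry name="img/image.png"
           href="file:///C:/home/joe/foo/img/image.png"/>
</c:zip-manifest>
```

With this reading, the manifest alone decides what goes into the archive, and the source port only overrides where the bytes for a given href come from.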
Sorry, that's exactly what I don't like nor use in the current setup. I would like to have a manifest that exactly spells out where all the content comes from (URI, pipe, inline document). The base-uri feature thing can stay, no problem. But I would like some more enhanced capabilities in the manifest. I will try to come up with a more complete proposal somewhere in the coming weeks (if time permits). Base it on this one so we can talk details. Ok? |
I’ll wait for some more people to weigh in. The current approach works for me, it just needs better documentation/specification. I’m against introducing a new protocol. |
But it doesn't work for me. So let's see if we can make something that works for both (add stuff without changing the current workings). No new protocol. Ok. But a |
There is now a skeleton archive step with PR #55. This needs to be discussed and enhanced. |
The proposal above says that the Using Calabash's With Calabash's |
Given that a manifest is always necessary and the documents to zip are not (they could be inlined in the manifest or on disk), we could change the behavior to:
For how I use it (the current p:zip) this would be very appropriate and simplify my code. I never supply document-to-zip through a port, they always come from disk. So I usually produce a manifest only as input for the zip/archive step... |
@eriksiegel I think that is the way to go (with two ports). And I would like to see two ports on p:uncompress also, so the manifest is clearly separated from the documents going in or out. With regard to the port names, I have no special preferences, but a hint: it is easy to remember that one port is named "manifest", so maybe documents on port "source"/"result" and manifest on "manifest". Just saying. |
Sure. But what is the primary port? I'm advocating that the primary should be the manifest port (on p:archive at least). And if we call the primary port |
I remember that someone recently wrote:
I sometimes wish that Are the reasons for making |
LOL |
If you want the manifest port to be primary, you are right: we need another port name. I never thought of this preference for XProc 3.0. In 1.0 it would seem natural to me, but in XProc 3.0 I do not expect the content always to come from disk. But ok: this is just a guess. So I agree: manifest on source, content on another port. |
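The port arrangement agreed on in this comment (manifest on the primary source port, documents on a second port) could be sketched as the signature fragment below. This is not agreed syntax: the port name "content" is a placeholder invented for illustration, and the output side is reduced to a single result port.

```xml
<!-- Hypothetical signature fragment, for discussion only. -->
<p:declare-step type="p:archive" version="3.0"
                xmlns:p="http://www.w3.org/ns/xproc">
  <!-- Primary input: the manifest that lists every archive entry. -->
  <p:input port="source" primary="true"/>
  <!-- "content" is a placeholder name: documents that should be taken
       from a pipe instead of being read from disk. -->
  <p:input port="content" primary="false" sequence="true"/>
  <p:output port="result"/>
</p:declare-step>
```

The alternative discussed earlier in the thread would swap the roles: documents on source and a non-primary port named manifest.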
Ah, now I see what you mean. I have a slight preference for keeping the non-primary manifest input port. But yes, if source documents continue to be read from disk if they appear in the manifest but not on the source port, the source port’s “optionality” is higher than the manifest port’s. Another thought: Can we try to make the manifest port optional? If, on the source port, we have documents with the following base URIs:
and an option
<c:manifest xmlns:c="http://www.w3.org/ns/xproc-step">
<c:entry name="img/image.png"
href="file:///C:/home/joe/foo/img/image.png"/>
<c:entry name="js/script.js"
href="file:///C:/home/joe/foo/js/script.js"/>
<c:entry name="img/index.html"
href="file:///C:/home/joe/foo/index.html"/>
<c:entry name="css/styles.css"
href="file:///C:/home/joe/foo/css/styles.css"/>
</c:manifest> If |
We could do that. But it wouldn't be a use case for me. In all my many XProc pipelines I always create a manifest and couldn't do without one, so I would probably never use this. Having said that, it's just me, so if other people think this is useful, I will not stand in the way. I would prefer the source port=manifest option. Does anybody else have an opinion about this? |
You couldn’t do this in 1.0 (for non-XML documents), therefore you always had to create a manifest. I think that 3.0 can change the way that (some) people create zip files. |
@xml-project: “would object” → “wouldn’t object”? In any case, something for the next conference call. |
Of course: "would not object". Sorry. |
The custom
Effectively the manifest is defined already by the base URIs of the source document (the other complexity it encapsulates is finding a location to save the temporary zip file, since this step is intended to run in the context of a web server).
<p:declare-step type="z:zip-sequence" name="zip-sequence">
<p:input port="source" sequence="true"/>
<p:output port="result"/>
<p:input port="parameters" kind="parameter"/>
<!-- create a zip manifest -->
<!-- convert each document in the sequence into a c:entry of a c:zip-manifest -->
<p:for-each>
<p:template>
<p:input port="template">
<p:inline>
<c:entry name="{substring-after(base-uri(), 'file:/')}" href="{base-uri()}"/>
</p:inline>
</p:input>
</p:template>
</p:for-each>
<!-- wrap entries into a manifest -->
<p:wrap-sequence wrapper="c:zip-manifest" name="manifest"/>
<!-- get global parameters to find a safe place to write a temp file -->
<p:parameters name="global-parameters">
<p:input port="parameters">
<p:pipe step="zip-sequence" port="parameters"/>
</p:input>
</p:parameters>
<p:group>
<!-- We need an absolute URI for the temporary zip file, based on the "realPath" parameter -->
<p:variable name="zip-file-name" select="
concat(
'file:',
/c:param-set/c:param[@name='realPath'][@namespace='tag:conaltuohy.com,2015:servlet-context']/@value,
'/zip-sequence.zip'
)
">
<p:pipe step="global-parameters" port="result"/>
</p:variable>
<!-- zip up the sequence of documents according to the manifest and stash it in the temporary file -->
<zip name="zip" xmlns="http://exproc.org/proposed/steps" command="create">
<p:with-option name="href" select="$zip-file-name"/>
<p:input port="source">
<p:pipe step="zip-sequence" port="source"/>
</p:input>
<p:input port="manifest">
<p:pipe step="manifest" port="result"/>
</p:input>
</zip>
<!-- create a request document to read the temporary file back in -->
<p:identity>
<p:input port="source">
<p:inline>
<c:request method="get"/>
</p:inline>
</p:input>
</p:identity>
<p:add-attribute match="/c:request" attribute-name="href">
<p:with-option name="attribute-value" select="$zip-file-name"/>
</p:add-attribute>
<!-- Read ZIP file back in. NB explicit dependency on preceding step -->
<p:http-request cx:depends-on="zip" xmlns:cx="http://xmlcalabash.com/ns/extensions"/>
</p:group>
</p:declare-step> |
I would be interested to see an example of @eriksiegel's use case, to better understand why it "always requires a manifest". |
My 2 cents on the port naming question: If the step has a |
My take for question 1: I remember us deciding p:archive is a standard, not optional step... |
Ah ok, apparently I missed that. |
Gerrit has provided new prose; we're assuming that it integrates everything from this issue. We're opening a new issue to track comments on the (now current) proposal. |
Here is a proposal for a p:archive step as discussed at the XProc workshop after XML London. The step is based on Calabash's pxp:zip but with some changes:

The files to be zipped can be declared as an XML manifest. The files themselves may arrive as a sequence on the source port. If the manifest entries don't match the documents by their base URIs, the step tries to load them from disk or over another supported protocol, such as http.

As a shorthand it should be possible to provide a list of paths to directories and files to be zipped with the paths option. Using the paths option while a zip manifest arrives at the input port should result in a dynamic error. The result port provides a c:archive document containing a list of the zipped files. In case of errors, the report output should provide a c:errors document. The default format is zip, but implementations may provide other archive formats, which can be selected with the format option.

This is the expected input of the source port to create a zip file which has a valid OCF structure (think of EPUB files). Please note that you can set particular compression methods and levels for each file.

As a shorthand, it should also be possible to leave the source port empty and just pass a list of files/directories to p:zip. Then files and directories (including their subdirectories/files) should be added. In this case we expect that the user has already stored their files in the appropriate file structure.

You can add serialization options, for example to just add a directory and direct the XProc processor to create an EPUB-conformant zip structure. The options should be provided by the XProc implementation. On the other hand, it might be helpful to standardize global keys such as compression-method and compression-level.

The file-list output should look like this.

The result port provides the Zip document represented by c:document-properties.
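The proposal mentions per-file compression settings for an OCF-conformant archive. A minimal sketch of what such a manifest might look like follows. It assumes the pxp:zip manifest vocabulary (c:zip-manifest, c:entry) and uses the key names compression-method and compression-level from the proposal as entry attributes; the attribute values and all file paths are illustrative assumptions, not agreed syntax.

```xml
<!-- Sketch only: element names follow pxp:zip; attribute values
     and paths are assumptions made for illustration. -->
<c:zip-manifest xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- OCF requires the mimetype file to be the first entry
       and to be stored without compression. -->
  <c:entry name="mimetype"
           href="file:///C:/home/joe/epub/mimetype"
           compression-method="stored"/>
  <c:entry name="META-INF/container.xml"
           href="file:///C:/home/joe/epub/META-INF/container.xml"
           compression-method="deflated"
           compression-level="default"/>
  <c:entry name="OEBPS/content.opf"
           href="file:///C:/home/joe/epub/OEBPS/content.opf"
           compression-method="deflated"
           compression-level="default"/>
</c:zip-manifest>
```

The ordering of entries matters here as much as the compression settings, which is one reason an explicit manifest is hard to replace with a bare list of paths for EPUB output.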