Minutes

Minutes of the 2017-06-12 F2F meeting

Hangout URL: https://hangouts.google.com/call/uktgwhx5yzbmddm5miplexymluy

Participants

Andy Bunce (ABu)
Achim Berndzen (ABe)
Geert Bormans (GB)
Hans Hübner (HH)
David Maus (DM)
Ari Nordström (AN)
Martin Kraetke (MK)
Gerrit Imsieke (GI)
Nic Gibson (NG)
Kourosh Mojar (KM)
Charles Foster (CF)

Agenda

Progress Report
Arbitrary Document Support
Step Library
Diagnostics (time permitting)
W3C Community Group

Progress Report

Spec

ABe reports:

XPath 3.1

New repo

Issues are now grouped by topical labels (e.g., diagnostics)

Organising the work, sorting out irrelevant issues

Not much progress since ~April since Norm was quite busy with other things

Test Suite

DM reports:

Wrote some tests

Found some issues with it (how to test UUID?)

Will look into making XSpec usable for this.

ABe: Sandro Cirulli told me that the Norwegians have some XSpec/XProc stuff. ⇒ Ask Jostein

Steps

GB reports:

No new steps have been included (Martin’s generic report step has been proposed but isn’t included yet)

Infrastructure: File step proposals as issues in the steps repo

Transfer any proposals for steps from the 3.0-specification repo to said steps repo.

Documentation

Christophe and Matthieu working on this. No report today since they’re not present.

ABe: They collected some documentation pieces. We should link to them from xproc.org (and update existing links / remove inaccurate/outdated/duplicated info)

KM: It would be helpful if you could comment on these resources.

NG: Consider using github pages

DM: Just maintain a list with links on github.

⇒ ABe to send a reminder to Christophe tomorrow

Procedure

GB: Revive the monthly report

ABe: 2nd Tuesday of each month

(This document serves as this report for tomorrow; documentation group might send an individual report)

GI: File reports in the 3.0-specification wiki and point to it on xproc-dev

Arbitrary Document Support

ABe reports:

Get rid of XML-centric approach

Document as 1. representation of data and 2. properties (MIME type, base uri)

Adopt XQuery document model (may contain more than one top-level element, text node)

(Have to be regular XML docs for p:store and validation though)

Property lookup: Current proposal: by p:document-properties()

Question: What is the argument of this function?

Non-XML documents are represented as an empty document
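The document model described above can be sketched in Python. This is only an illustration under stated assumptions: the class name `Document`, the helper `document_properties()` and the choice to pass the document itself as the argument are hypothetical (the argument of p:document-properties() was still an open question), and the property keys `content-type` and `base-uri` are taken from the list above.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Document:
    """A document as (1) a representation of data and (2) a map of properties."""
    # Top-level items in XQuery-document-model style: several elements
    # and text nodes are allowed. Empty for non-XML documents.
    representation: list
    properties: dict = field(default_factory=dict)


def document_properties(doc):
    """Hypothetical analogue of p:document-properties(): return the property map."""
    return dict(doc.properties)  # a copy, so callers cannot change the document


# An XML document with more than one top-level element and a text node.
xml_doc = Document(
    representation=[ET.fromstring("<a/>"), "some text", ET.fromstring("<b/>")],
    properties={"content-type": "application/xml", "base-uri": "file:///tmp/doc.xml"},
)

# A non-XML document: empty representation, everything else lives in the properties.
png_doc = Document(
    representation=[],
    properties={"content-type": "image/png", "base-uri": "file:///tmp/logo.png"},
)

print(document_properties(png_doc)["content-type"])
```

In this sketch a step such as p:store would have to reject `xml_doc`, because its representation is not a regular XML document with a single top-level element.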

The question whether to let arbitrary XDM items flow between steps arose again in an issue

ABe: What is the content type of a non-doc XDM item (xs:base64binary, map, …)?

GI: What is the distinction between input ports and options if any XDM value may arrive on a port? Will we treat them differently or just call them differently?

ABe: We’d have to map XDM items to content types (content type / MIME type / media type used interchangeably here) (Norm’s proposal: content-type=“x.vnd/xdm-item”)

CSV, JSON, Markdown as special text files that need either transparent or explicit transformations (from document with text node to map or XML). Not every representation qualifies as a document.

NG: Are there use cases for treating arbitrary XDM items as documents?

ABe: Should the content of non-XML/non-text docs be accessible through XPath?

Should there be p:… functions to access the data and convert it to hexBinary, base64binary, etc.? Henry’s objection was: If the doc is a large video, the variable that holds it might use up all main memory.

GI: Maybe represent arbitrary documents not merely as empty documents, but as /c:document-properties documents with the map that p:document-properties() would return as attributes (map key = attribute name). This document is read-only (you can’t apply p:add-attribute to it), but the properties can be accessed in XPath expressions.
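GI's stub proposal can be illustrated with a short Python sketch that turns a property map into an element whose attributes mirror the map keys. The function name `make_stub` is hypothetical, and binding the c: prefix to the XProc 1.0 step namespace is an assumption; the read-only behaviour is not modelled here.

```python
import xml.etree.ElementTree as ET

# Assumption: the c: prefix is bound to the XProc 1.0 step namespace.
C_NS = "http://www.w3.org/ns/xproc-step"


def make_stub(properties):
    """Build a c:document-properties stub: map key = attribute name."""
    stub = ET.Element(f"{{{C_NS}}}document-properties")
    for key, value in properties.items():
        stub.set(key, str(value))
    return stub


stub = make_stub({"content-type": "video/mp4", "base-uri": "file:///tmp/clip.mp4"})

# The properties are now reachable like any other attribute,
# comparable to /c:document-properties/@content-type in XPath.
print(stub.get("content-type"))
```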

Example for reading a JSON file and converting it to a map (with p:parse-json() as an XProc function on the stub document; it calls fn:parse-json()):

<p:load href="some.json" document-properties="map{'content-type':'application/json'}"/>
<p:group>
  <p:variable name="json-as-map" select="p:parse-json(/)" as="item()*"/>
  …
</p:group>

(Note on the document-properties attribute: it is included here only for demonstration purposes. The processor will most likely determine the content type of a local file by extension or by content sniffing, see TIKA below.)

GI: Postpone streaming to XProc 4.0

CF: You can have streaming now with fn:fold-left() and a function that splits base64binary into chunks (not regarding implementation details for the moment)
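CF's idea can be sketched in Python, with `functools.reduce` standing in for fn:fold-left(). The helper names and the chunk size are hypothetical. The sketch folds a SHA-256 hasher over base64 chunks so that only one decoded chunk is needed at a time; for simplicity the encoded string itself is held in memory here, whereas a processor would presumably produce the chunks lazily.

```python
import base64
import hashlib
from functools import reduce


def split_base64(data, chunk_chars=1024):
    """Yield slices of base64 text that can each be decoded on their own."""
    if chunk_chars % 4 != 0:
        # Base64 encodes 3 bytes as 4 characters, so chunks must align to 4.
        raise ValueError("chunk_chars must be a multiple of 4")
    for start in range(0, len(data), chunk_chars):
        yield data[start:start + chunk_chars]


def fold_step(hasher, chunk):
    """Accumulator function for the fold: decode one chunk and feed the hasher."""
    hasher.update(base64.b64decode(chunk))
    return hasher


payload = bytes(range(256)) * 100
encoded = base64.b64encode(payload).decode("ascii")

# Left fold over the chunks, starting from an empty hasher.
digest = reduce(fold_step, split_base64(encoded), hashlib.sha256()).hexdigest()
print(digest == hashlib.sha256(payload).hexdigest())
```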

DM et al.: How do we know when an XML document arrives on an input port whether it’s a c:document-properties stub or an actual XML document with the same top-level element name?

In order to be safe, call p:document-properties() on the document.

HH, DM: Can you introduce something like p:is-stub(/) as xs:boolean? Proposals for better names for this shortcut function are welcome.
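One way to read the p:is-stub(/) suggestion is as a shortcut that looks only at the document properties and never at the root element name. The Python sketch below assumes that a stub is any document whose content type is not an XML media type; that definition, the function names and the handling of a missing content type are assumptions, not something the meeting decided.

```python
def is_xml_content_type(content_type):
    """True for application/xml, text/xml and any +xml media type."""
    base = content_type.split(";")[0].strip().lower()  # drop parameters like charset
    return base in {"application/xml", "text/xml"} or base.endswith("+xml")


def is_stub(properties):
    """Hypothetical analogue of p:is-stub(/), decided from the properties alone."""
    content_type = properties.get("content-type")
    if content_type is None:
        return False  # assumption: without a content type we cannot call it a stub
    return not is_xml_content_type(content_type)


# A real XML document whose root happens to be c:document-properties
# still reports an XML content type, so it is not mistaken for a stub.
print(is_stub({"content-type": "application/xml"}))
print(is_stub({"content-type": "video/mp4"}))
```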

Step Library

In principle, standardize EXProc steps

Align with what EXPath file module offers

Augment test suite accordingly (currently no EXProc steps covered)

Add full and partial (content type based, wildcard based) archive extraction (resulting in sequences of documents with appropriate document properties)
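Partial extraction by wildcard or content type could behave roughly like the following Python sketch, which uses the standard `zipfile`, `fnmatch` and `mimetypes` modules. The function name `extract`, its parameters and the shape of the result (a list of content/properties pairs) are hypothetical, and guessing the content type from the file extension is an assumption.

```python
import fnmatch
import io
import mimetypes
import zipfile


def extract(archive_bytes, name_pattern="*", content_type=None):
    """Return (content, properties) pairs for the matching archive entries."""
    documents = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Assumption: the content type is guessed from the file extension.
            guessed = mimetypes.guess_type(info.filename)[0] or "application/octet-stream"
            if not fnmatch.fnmatchcase(info.filename, name_pattern):
                continue
            if content_type is not None and guessed != content_type:
                continue
            properties = {"content-type": guessed, "base-uri": info.filename}
            documents.append((archive.read(info), properties))
    return documents


# Build a small archive in memory to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("images/a.png", b"fake png 1")
    archive.writestr("images/b.png", b"fake png 2")
    archive.writestr("notes/readme.txt", b"hello")
sample = buffer.getvalue()

by_wildcard = extract(sample, name_pattern="images/*")
by_type = extract(sample, content_type="text/plain")
print(len(by_wildcard), len(by_type))
```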

CF: Consider supporting tar and other archive formats.

ABe: Consider writing an extension step for this.

KM: Look at how the BaseX archive module does it.

ABe: Maybe not create a step for each of the BaseX functions.

GI: BaseX doesn’t rely on Zip manifests. NG: Yes, I want to keep manifests.

Introduce a compatibility option (EPUB, …). Or use (some of) the serialization options (media-type, version, …)

In the absence of such an option, look at the mimetype file

Explain what “EPUB compatibility” (or media-type="application/epub+zip") means: Not compressing the mimetype file and putting it first.

Or don’t specify behaviour for any target format peculiarities at all and leave it to the implementations.

Goal is to save users work, not to make sure the output files are valid EPUBs etc.

Call the option 'hint' or 'pragma'.

Can’t we manipulate that already with the manifest?

Almost – we need to stipulate that the files must be stored in manifest document order.
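The behaviour discussed above (entries written in manifest order, with the mimetype entry stored uncompressed) can be sketched with Python's standard `zipfile` module. The function name and the representation of the manifest as a list of name/content pairs are hypothetical; the sketch saves the user work but does not check that the result is a valid EPUB.

```python
import io
import zipfile


def write_archive(manifest):
    """Write entries in manifest order; store 'mimetype' without compression."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in manifest:
            if name == "mimetype":
                method = zipfile.ZIP_STORED    # must not be compressed
            else:
                method = zipfile.ZIP_DEFLATED
            archive.writestr(name, content, compress_type=method)
    return buffer.getvalue()


# The manifest order decides the order in the archive, so mimetype goes first.
manifest = [
    ("mimetype", b"application/epub+zip"),
    ("META-INF/container.xml", b"<container/>"),
    ("OEBPS/content.opf", b"<package/>"),
]
epub_bytes = write_archive(manifest)

with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
    first = archive.infolist()[0]
    print(first.filename, first.compress_type == zipfile.ZIP_STORED)
```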

ABe: General suggestion: Move all serialization options to maps. No-one disagrees.

CF: Support XQuery 3.1 and XSLT 3.0 serialization spec

NG: Writing an archive isn’t in there

⇒ MK will submit augmented zip/unzip proposals

CF: In order to use external processors (written in Java), how do you specify an XQJ driver location, for example?

GB: Steps that talk to relational DBs.

ABe: There should be processor configuration options.

NG, GB: Sometimes you need to select connection pools, drivers, etc. dynamically.

(at this point, we decided to deal with creating a new community group first, see below)

Calabash extension steps that should be standardized as optional steps unless noted otherwise:

structured text format (Markdown, AsciiDoc, Textile, …) as a generic replacement for cx:asciidoctor
cx:collection-manager: Update after the 2017-06-15 conf call: We could standardize it, but we could also let the processor determine collections based on common document properties.
cx:css-formatter
cx:eval will probably become p:run
cx:get-cookies, cx:set-cookies should probably be part of HTTP responses/requests (as maps)
cx:java-properties: Maybe use XSLT’s fn:system-property, or extend the standardized pos:info
cx:mathml-to-svg: Leave it out because of JEuclid bugs. Maybe revive it if there’s a more reliable library
cx:message. There are superseding proposals already
cx:metadata-extractor. If it’s only for images, maybe use another local name? Or extend it to support other (ultimately, any) file types, using for ex. TIKA? Do not augment the c:document-properties stub with these metadata by default, only upon explicit request. Maybe add an option to make these metadata explicit when loading the images.
cx:namespace-delete
cx:pegdown – merge with cx:asciidoctor to something generic
Discussion about cx:pretty-print. Consensus not to standardize it
Discussion about cx:rdfa, cx:rdf-load, cx:rdf-store, cx:sparql. Consensus not to standardize them or to postpone standardization
cx:report-errors: Maybe not standardize it and use standard XProc means? Ask Norm. Also consider synchronizing it with the new message step.
cx:send-mail: ABe suggests that SMTP and IMAP should be added to p:http-request (but not mandatory). Postpone both the outbound and inbound part, have vendor-specific extension steps for the time being
cx:until-unchanged: Ask Norm about the original purpose. Maybe generalize it?
cx:uri-info: Ask Norm about the purpose. Maybe already covered by p:http-request?
cx:wait-for-update
cxu:compare: Merge it with p:compare. Supply options as maps, thereby allowing different diff tools with different options to be supported.

CF proposes interfacing with message queue systems through AMQP

HH: It’s complicated

Maybe as vendor-specific extensions

Community Group

Consensus:

Trash the “data pipelining use cases community group”
Create a new one: XProc Next, modeled after XPath Next

AN: Add “write specifications” to the goals of XProc Next

Operate under W3C patent policy (good thing™)

⇒ AN will propose this group and be chair once it’s established, GI will be deputy chair.

Any other business

Schedule

Meeting before or after XML Prague 2018, during which at least the language spec should be finalized

But we need another F2F before (with Norm). Maybe around the XML Summer School in September, or in Frankfurt, which is more convenient for the French colleagues.

Update after the 2017-06-15 conf call: Maybe Sept. 9/10 or the following week? Norm will be in the UK, but can also travel to Frankfurt (Cologne would also be convenient for people from France, Benelux, or the UK)
