Minutes
Hangout URL: https://hangouts.google.com/call/uktgwhx5yzbmddm5miplexymluy
- Andy Bunce (ABu)
- Achim Berndzen (ABe)
- Geert Bormans (GB)
- Hans Hübner (HH)
- David Maus (DM)
- Ari Nordström (AN)
- Martin Kraetke (MK)
- Gerrit Imsieke (GI)
- Nic Gibson (NG)
- Kourosh Mojar (KM)
- Charles Foster (CF)
- Progress Report
- Arbitrary Document Support
- Step Library
- Diagnostics (time permitting)
- W3C Community Group
ABe reports:
XPath 3.1 New repo
Issues are now grouped by topical labels (for ex., diagnostics)
Organising the work, sorting out irrelevant issues
Not much progress since ~April since Norm was quite busy with other things
DM reports:
Wrote some tests
Found some issues with it (how to test UUID?)
Will look into making XSpec usable for this.
ABe: Sandro Cirulli told me that the Norwegians have some XSpec/XProc stuff. ⇒ Ask Jostein
GB reports:
No new steps have been included (except Martin’s generic report step that isn’t included yet)
Infrastructure: File step proposals as issues in the steps repo
Transfer any proposals for steps from the 3.0-specification repo to said steps repo.
Christophe and Matthieu working on this. No report today since they’re not present.
ABe: They collected some documentation pieces. We should link to them from xproc.org (and update existing links / remove inaccurate/outdated/duplicated info)
KM: It would be helpful if you could comment on these resources.
NG: Consider using github pages
DM: Just maintain a list with links on github.
⇒ ABe to send a reminder to Christophe tomorrow
GB: Revive the monthly report
ABe: 2nd Tuesday of each month
(This document serves as this report for tomorrow; documentation group might send an individual report)
GI: File reports in the 3.0-specification wiki and point to it on xproc-dev
ABe reports:
Get rid of XML-centric approach
Document as 1. representation of data and 2. properties (MIME type, base uri)
Adopt XQuery document model (may contain more than one top-level element, text node)
(Have to be regular XML docs for p:store and validation though)
Property lookup: Current proposal: by p:document-properties()
Question: What is the argument of this function?
Non-XML documents are represented as an empty document
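The proposed document model (a representation of the data plus a set of properties) can be pictured with a small Python sketch. It is only an illustration of the idea, not XProc syntax: the class name `Document`, the helper `document_properties`, and the property keys are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Hypothetical model: a representation of the data plus its properties."""
    representation: Optional[Any]  # e.g. XML content, or None for non-XML data
    properties: Dict[str, str] = field(default_factory=dict)


def document_properties(doc: Document) -> Dict[str, str]:
    """Stand-in for a property lookup such as p:document-properties().

    Returns a copy so callers cannot modify the document's own map.
    """
    return dict(doc.properties)


# An XML document, and a binary document whose representation is empty.
xml_doc = Document("<doc/>", {"content-type": "application/xml",
                              "base-uri": "file:/tmp/a.xml"})
png_doc = Document(None, {"content-type": "image/png",
                          "base-uri": "file:/tmp/a.png"})
```

In this picture the open question about the function's argument corresponds to what `document_properties` should accept: the document itself, a node in it, or something else.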
The question whether to let arbitrary XDM items flow between steps arose again in an issue
ABe: What is the content type of a non-doc XDM item (xs:base64binary, map, …)?
GI: What is the distinction between input ports and options if any XDM value may arrive on a port? Will we treat them differently or just call them differently?
ABe: We’d have to map XDM items to content types (content type / MIME type / media type used interchangeably here) (Norm’s proposal: content-type=“x.vnd/xdm-item”)
CSV, JSON, Markdown as special text files that need either transparent or explicit transformations (from document with text node to map or XML). Not every representation qualifies as a document.
NG: Are there use cases for treating arbitrary XDM items as documents?
ABe: Should the content of non-XML/non-text docs be accessible through XPath?
Should there be p:… functions to access the data and convert it to hexBinary, base64binary, etc.? Henry’s objection was: If the doc is a large video, the variable that holds it might use up all main memory.
GI: Maybe represent arbitrary documents not merely as empty documents, but as /c:document-properties documents with the map that p:document-properties() would return as attributes (map key = attribute name). This document is read-only (you can’t apply p:add-attribute to it), but the properties can be accessed in XPath expressions.
Example for reading a JSON file and converting it to a map (with p:parse-json() as an XProc function on the stub document; it calls fn:parse-json()):
<p:load href="some.json" document-properties="map{'content-type':'application/json'}"/>
<p:group>
<p:variable name="json-as-map" select="p:parse-json(/)" as="item()*"/>
…
</p:group>
(Background for the document-properties attribute: it is included here only for demonstration purposes. The processor will most likely determine the content type of a local file by extension or by content sniffing, see TIKA below.)
GI: Postpone streaming to XProc 4.0
CF: You can have streaming now with fn:fold-left() and a function that splits base64binary into chunks (not regarding implementation details for the moment)
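The shape of CF's suggestion can be sketched in Python, with `functools.reduce` standing in for fn:fold-left(). This is a rough illustration under stated assumptions: the names `chunks`, `fold_left` and `sha256_of` are made up for the example, the input is assumed to be standard base64 without line breaks, and the encoded string is held in memory here, so only the fold pattern is shown, not real streaming.

```python
import base64
import hashlib
from functools import reduce
from typing import Iterator


def chunks(data_b64: str, size: int = 4096) -> Iterator[bytes]:
    """Split base64 text into decoded chunks.

    `size` must be a multiple of 4 so that every slice is valid base64
    on its own (padding can only occur in the last slice).
    """
    if size % 4 != 0:
        raise ValueError("size must be a multiple of 4")
    for start in range(0, len(data_b64), size):
        yield base64.b64decode(data_b64[start:start + size])


def fold_left(items, zero, f):
    """Left fold, analogous to fn:fold-left($seq, $zero, $f)."""
    return reduce(f, items, zero)


def _update(digest, chunk):
    # Accumulator step: feed one chunk into the running hash.
    digest.update(chunk)
    return digest


def sha256_of(data_b64: str, size: int = 4096) -> str:
    """Hash the decoded data chunk by chunk instead of decoding it at once."""
    return fold_left(chunks(data_b64, size), hashlib.sha256(), _update).hexdigest()
```

Only one decoded chunk is alive at a time, which is the property that matters for Henry's large-video objection.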
DM et al.: How do we know when an XML document arrives on an input port whether it’s a c:document-properties stub or an actual XML document with the same top-level element name?
In order to be safe, call p:document-properties() on the document.
HH, DM: Can you introduce something like p:is-stub(/) as xs:boolean? Proposals for better names for this shortcut function are welcome.
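A minimal Python sketch of the stub idea and of a shortcut such as the suggested p:is-stub(): the properties map becomes attributes on a c:document-properties element, and the stub check looks at the properties rather than at the root element name. The function names `make_stub` and `is_stub` and the rule "anything that is not an XML content type is a stub" are assumptions for illustration; the namespace URI is the one XProc binds to the `c:` prefix.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace conventionally bound to the c: prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"
XML_TYPES = ("application/xml", "text/xml")


def make_stub(properties: Dict[str, str]) -> ET.Element:
    """Build a c:document-properties element; each map key becomes an attribute."""
    return ET.Element(f"{{{C_NS}}}document-properties", dict(properties))


def is_stub(properties: Dict[str, str]) -> bool:
    """Decide from the document properties, not from the root element name.

    Assumption: a document is a stub when its content type is not XML.
    """
    content_type = properties.get("content-type", "")
    return not (content_type in XML_TYPES or content_type.endswith("+xml"))
```

A real XML document whose root element happens to be c:document-properties still carries an XML content type in its properties, so `is_stub` returns `False` for it.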
In principle, standardize EXProc steps
Align with what EXPath file module offers
Augment test suite accordingly (currently no EXProc steps covered)
Add full and partial (content type based, wildcard based) archive extraction (resulting in sequences of documents with appropriate document properties)
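Full and partial archive extraction could look roughly like the following Python sketch, which filters Zip entries by a name wildcard and by a content-type wildcard and returns each entry together with a small property map. The function name `extract`, its parameters, and the property keys are hypothetical; the content type is guessed from the file extension, which is only one possible strategy.

```python
import fnmatch
import io
import mimetypes
import zipfile
from typing import Dict, List, Optional, Tuple

# One extracted "document": its bytes plus a property map.
Doc = Tuple[bytes, Dict[str, str]]


def extract(archive: bytes,
            name_glob: str = "*",
            content_type_glob: Optional[str] = None) -> List[Doc]:
    """Return the matching entries of a Zip archive in archive order."""
    docs: List[Doc] = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            guessed, _ = mimetypes.guess_type(info.filename)
            content_type = guessed or "application/octet-stream"
            if not fnmatch.fnmatchcase(info.filename, name_glob):
                continue
            if content_type_glob and not fnmatch.fnmatchcase(content_type, content_type_glob):
                continue
            # The entry name stands in for a base URI in this sketch.
            docs.append((zf.read(info),
                         {"content-type": content_type, "base-uri": info.filename}))
    return docs
```

Calling `extract(data)` gives the full extraction, while `extract(data, content_type_glob="image/*")` or `extract(data, name_glob="*.txt")` gives a partial one.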
CF: Consider supporting tar and other archive formats.
ABe: Consider writing an extension step for this.
KM: Look at how the BaseX archive module does it.
ABe: Maybe not create a step for each of the BaseX functions.
GI: BaseX doesn’t rely on Zip manifests. NG: Yes, I want to keep manifests.
Introduce a compatibility option (EPUB, …). Or use (some of) the serialization options (media-type, version, …)
In the absence of such an option, look at the mimetype file
Explain what “EPUB compatibility” (or media-type="application/epub+zip") means: Not compressing the mimetype file and putting it first.
Or don’t specify behaviour for any target format peculiarities at all and leave it to the implementations.
Goal is to save users work, not to make sure the output files are valid EPUBs etc.
Call the option 'hint' or 'pragma'.
Can’t we manipulate that already with the manifest?
Almost – we need to stipulate that the files must be stored in manifest document order.
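To make the "EPUB compatibility" behaviour concrete, here is a small Python sketch using the standard `zipfile` module: entries are written in manifest order, and with the compatibility switch on, the mimetype file is moved to the front and stored uncompressed. The function name `write_archive` and the `epub_compat` flag are hypothetical, and the sketch does not claim to produce valid EPUBs.

```python
import io
import zipfile
from typing import List, Tuple


def write_archive(manifest: List[Tuple[str, bytes]], epub_compat: bool = False) -> bytes:
    """Write (name, data) pairs to a Zip archive in manifest order."""
    entries = list(manifest)
    if epub_compat:
        # Stable sort: "mimetype" moves first, all other entries keep manifest order.
        entries.sort(key=lambda entry: entry[0] != "mimetype")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in entries:
            stored = epub_compat and name == "mimetype"
            method = zipfile.ZIP_STORED if stored else zipfile.ZIP_DEFLATED
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()
```

Without the flag, the same result can be obtained by listing mimetype first in the manifest, provided the processor guarantees manifest document order; only the "do not compress" part then needs a separate hint.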
ABe: General suggestion: Move all serialization options to maps. No-one disagrees.
CF: Support XQuery 3.1 and XSLT 3.0 serialization spec
NG: Writing an archive isn’t in there
⇒ MK will submit augmented zip/unzip proposals
CF: In order to use external processors (written in Java), how do you specify an XQJ driver location, for example?
GB: Steps that talk to relational DBs.
ABe: There should be processor configuration options.
NG, GB: Sometimes you need to select connection pools, drivers, etc. dynamically.
(at this point, we decided to deal with creating a new community group first, see below)
Calabash extension steps that should be standardized as optional steps unless noted otherwise:
- structured text format (Markdown, AsciiDoc, Textile, …) as a generic replacement for cx:asciidoctor
- cx:collection-manager: Update after the 2017-06-15 conf call: We could standardize it, but we could also let the processor determine collections based on common document properties.
- cx:css-formatter
- cx:eval will probably become p:run
- cx:get-cookies, cx:set-cookies should probably be part of HTTP responses/requests (as maps)
- cx:java-properties: Maybe use XSLT’s fn:system-property, or extend the standardized pos:info
- cx:mathml-to-svg: Leave it out because of Jeuclid bugs. Maybe revive it if there’s a more reliable library
- cx:message. There are superseding proposals already
- cx:metadata-extractor. If it’s only for images, maybe use another local name? Or extend it to support other (ultimately, any) file types, using for ex. TIKA? Do not augment the c:document-properties stub with these metadata by default, only upon explicit request. Maybe add an option to make these metadata explicit when loading the images.
- cx:namespace-delete
- cx:pegdown – merge with cx:asciidoctor to something generic
- Discussion about cx:pretty-print. Consensus not to standardize it
- Discussion about cx:rdfa, cx:rdf-load, cx:rdf-store, cx:sparql. Consensus not to standardize them or to postpone standardization
- cx:report-errors: Maybe not standardize it and use standard XProc means? Ask Norm. Also consider synchronizing it with the new message step.
- cx:send-mail: ABe suggests that SMTP and IMAP should be added to p:http-request (but not mandatory). Postpone both the outbound and inbound part, have vendor-specific extension steps for the time being
- cx:until-unchanged: Ask Norm about the original purpose. Maybe generalize it?
- cx:uri-info: Ask Norm about the purpose. Maybe already covered by p:http-request?
- cx:wait-for-update
- cxu:compare: Merge it with p:compare. Supply options as maps, thereby allowing different diff tools with different options to be supported.
CF proposes interfacing with message queue systems through AMQP
HH: It’s complicated
Maybe as vendor-specific extensions
Consensus:
- Trash the “data pipelining use cases community group”
- Create a new one: XProc Next, modeled after XPath Next
AN: Add “write specifications” to the goals of XProc Next
Operate under W3C patent policy (good thing™)
⇒ AN will propose this group and be chair once it’s established, GI will be deputy chair.
Meeting before or after XML Prague 2018, during which at least the language spec should be finalized
But we need another F2F before (with Norm). Maybe around the XML Summer School in September, or in Frankfurt, which is more convenient for the French colleagues.
Update after the 2017-06-15 conf call: Maybe Sept. 9/10 or the following week? Norm will be in the UK, but can also travel to Frankfurt (or Cologne, which would also be convenient for people from France, Benelux, or the UK)