[DBIS seminar, 4 November 2014]
SynopsX is a collective project that aims to provide a web application, built on BaseX, for easily exposing and publishing XML textual sources and data (mostly encoded in TEI, http://tei-c.org/).
Initiated by the Atelier des Humanités Numériques (Digital Humanities Workshop) at the École Normale Supérieure in Lyon, SynopsX brings together various French projects in the fields of history and literary studies.
The six of us taking part in this coworking session with BaseX are engaged in different projects, but we all share the same idea of what we call an "instrumented corpus".
By that, we mean:
- a corpus that is not merely published but exposed on the web
- one that gives researchers and visitors tools to explore and visualise the data
- and one whose texts are distributed in a form that lets the data be enriched and linked as Linked Open Data
<?xml version="1.0" encoding="UTF-8"?>
<body xmlns="http://www.tei-c.org/ns/1.0" n="spleenEtIdeal">
  <div type="longPoem">
    <head>Les Phares</head>
    <lg type="stanza">
      <l n="1">Rubens, fleuve d'oubli, jardin de la paresse,</l>
      <l n="2">Oreiller de chair fraîche où l'on ne peut aimer,</l>
      <l n="3">Mais où la vie afflue et s'agite sans cesse,</l>
      <l n="4">Comme l'air dans le ciel et la mer dans la mer ;</l>
    </lg>
    <!-- comment node -->
    <lg type="stanza">
      <l n="5">Léonard de Vinci, miroir profond et sombre,</l>
      <l n="6">Où des anges charmants, avec un doux souris</l>
      <l n="7">Tout chargé de mystère, apparaissent à l'ombre</l>
      <l n="8">Des glaciers et des pins qui ferment leur pays ;</l>
    </lg>
    <gap reason="sampling" quantity="9" unit="stanza"/>
  </div>
  <div type="longPoem">
    <head>La Muse malade</head>
    <gap reason="sampling" quantity="4" unit="stanza"/>
  </div>
</body>
TEI is a common model for all the different publication projects engaged in SynopsX.
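Because the projects share this TEI model, the same few lines of code can read any of their corpora. The Python sketch below is only an illustration of that point (SynopsX itself is built on BaseX and XQuery): the `summarise` function and the shortened `SAMPLE` string are hypothetical names introduced here, and the sample is cut down from the poem excerpt.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping for the TEI namespace used in the sample.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Shortened, well-formed excerpt of the poem sample (illustrative only).
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0" n="spleenEtIdeal">
  <div type="longPoem">
    <head>Les Phares</head>
    <lg type="stanza">
      <l n="1">Rubens, fleuve d'oubli, jardin de la paresse,</l>
      <l n="2">Oreiller de chair fraîche où l'on ne peut aimer,</l>
    </lg>
    <gap reason="sampling" quantity="9" unit="stanza"/>
  </div>
  <div type="longPoem">
    <head>La Muse malade</head>
    <gap reason="sampling" quantity="4" unit="stanza"/>
  </div>
</body>"""


def summarise(tei_xml):
    """Return one dict per poem: title, verse lines, and omitted stanza count."""
    root = ET.fromstring(tei_xml)
    poems = []
    for div in root.findall("tei:div[@type='longPoem']", TEI_NS):
        gaps = div.findall("tei:gap", TEI_NS)
        poems.append({
            "title": div.findtext("tei:head", namespaces=TEI_NS),
            "lines": [l.text for l in div.findall("tei:lg/tei:l", TEI_NS)],
            "omitted_stanzas": sum(int(g.get("quantity")) for g in gaps),
        })
    return poems


if __name__ == "__main__":
    for poem in summarise(SAMPLE):
        print(poem["title"], len(poem["lines"]), poem["omitted_stanzas"])
```

The only TEI-specific knowledge the function needs is the namespace URI and the element names `div`, `head`, `lg`, `l` and `gap`, which is what makes a shared model convenient for shared tooling.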
Some of the historical data we deal with require project-specific lemmatisation.
For that use case, the implementation of the XQuery Full Text standard in BaseX could be very useful for building this language processing ourselves.
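To make the need for project-specific lemmatisation concrete: in line 6 of the sample, "souris" is an archaic noun meaning "smile" (modern "sourire"), so a search for the lemma "sourire" should find that line. The sketch below shows the idea in plain Python as a stand-in; it is not XQuery Full Text and not BaseX code, and the `LEMMAS` table and function names are hypothetical.

```python
import re

# Hypothetical project lemma table: attested form -> lemma.
LEMMAS = {
    "souris": "sourire",  # archaic noun for "smile", line 6 of the sample
}

# Two lines from the sample, keyed by their n attribute.
LINES = {
    5: "Léonard de Vinci, miroir profond et sombre,",
    6: "Où des anges charmants, avec un doux souris",
}


def tokens(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def lemmatise(text, table=LEMMAS):
    """Replace each token by its lemma when the table knows it."""
    return [table.get(tok, tok) for tok in tokens(text)]


def matching_lines(lines, lemma, table=LEMMAS):
    """Return the numbers of the lines containing a token with this lemma."""
    return [n for n, text in lines.items() if lemma in lemmatise(text, table)]


if __name__ == "__main__":
    print(matching_lines(LINES, "sourire"))
```

With an empty table the same query finds nothing, which is the gap a project-specific lemma resource is meant to fill.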
- for distribution and re-use
- for enrichment
The rich functionality of XQuery Update is expected to be used to implement an annotation client, such as annotator.js, so that the sources can be annotated directly in XML-TEI.
But XQuery Update could also be used to manipulate collections of texts in the database, for bulk changes, etc.
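As an illustration of what a bulk change over a collection might look like, the sketch below numbers every unnumbered stanza in each document of a small in-memory collection. It is written in Python with the standard library as a stand-in, not in XQuery Update, and it does not talk to BaseX; the `COLLECTION` dictionary, the file names and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
ET.register_namespace("", TEI)  # serialise TEI as the default namespace

# Hypothetical two-document collection standing in for a database.
COLLECTION = {
    "phares.xml": '<body xmlns="%s"><div><lg type="stanza"/>'
                  '<lg type="stanza"/></div></body>' % TEI,
    "muse.xml": '<body xmlns="%s"><div><lg type="stanza"/></div></body>' % TEI,
}


def number_stanzas(root):
    """Give each stanza without an n attribute its position in its div."""
    changed = 0
    for div in root.iter("{%s}div" % TEI):
        stanzas = div.findall("tei:lg[@type='stanza']", NS)
        for position, lg in enumerate(stanzas, start=1):
            if lg.get("n") is None:
                lg.set("n", str(position))
                changed += 1
    return changed


def bulk_update(collection):
    """Apply number_stanzas to every document; return new texts and a count."""
    updated, total = {}, 0
    for name, xml_text in collection.items():
        root = ET.fromstring(xml_text)
        total += number_stanzas(root)
        updated[name] = ET.tostring(root, encoding="unicode")
    return updated, total


if __name__ == "__main__":
    docs, count = bulk_update(COLLECTION)
    print(count, "stanzas numbered")
```

The change is written to be idempotent: running it a second time on its own output modifies nothing, which is a useful property for any bulk edit applied to a whole collection.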
TGIR Huma-Num's infrastructure comprises:
- 12 TB of storage capacity for the Information Systems
- 120 TB of general secured storage
- 100 computational cores
- 18 servers
- TGIR Huma-Num (http://www.huma-num.fr) is a very large facility that aims to facilitate the digital turn in the humanities and social sciences. It coordinates France's participation in DARIAH (http://www.dariah.fr), the European digital research infrastructure for the arts and humanities.
The scope of SynopsX is twofold: on the one hand, to help an individual researcher easily publish, explore and expose their XML data; on the other, to let IT teams collaborate, pool their efforts and make them generic. The project is a candidate to join the facilities that TGIR Huma-Num offers to researchers in the arts and humanities.
- research by projects
- code quality & best practices
- not reinventing the wheel
- sustainability
- maintainability
- reusability
