
DH words vademecum

ChiaraPalladino edited this page Dec 14, 2017 · 1 revision

A short vademecum* of scary words in Digital Humanities, by Chiara Palladino

Annotation: a scholarly statement establishing a connection between a string in a text (a name, a place, a word, anything of interest) and an external authority that defines, disambiguates and specifies it (e.g. a dictionary, a lexicon, a gazetteer, a prosopography). Linked Open Data requires a specific computational format of annotation that facilitates interchange, called Open Annotation. Here, the annotation is the connector between the source (conventionally named the target) and the external reference related to the source (conventionally named the body). The body is generally identified by a URI. An annotation can be structurally internal (inline) or external (stand-off): if inline, it is contained in the same file that hosts the source (for example, TEI markup is inline); if stand-off, it is stored in a different place than the source.
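
As a rough illustration of the stand-off case, the sketch below keeps an annotation separate from the text it describes. The sample sentence and the field names `target`, `body`, `start` and `end` are assumptions made for this example: they echo the target/body terminology above, but they are not the official Open Annotation serialization.

```python
# Source text; in a stand-off setup it is never modified.
text = "The fleet sailed from Euboea to Andros."

# Hypothetical stand-off annotation: the target points at a span of the
# source by character offsets, the body is the URI of an external authority.
start = text.index("Euboea")
annotation = {
    "target": {"start": start, "end": start + len("Euboea")},
    "body": "http://pleiades.stoa.org/places/540775",
}

# Recover the annotated string from the stored offsets.
span = annotation["target"]
annotated_string = text[span["start"]:span["end"]]
print(annotated_string, "->", annotation["body"])
```

An inline annotation would instead wrap the word inside the text itself, which is why it cannot be stored apart from the source.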

API (Application Programming Interface): a set of rules and tools through which the data produced by a specific project can be accessed and reused by other projects. APIs make it possible to join online resources together, enabling mashups and additional applications that use the same data.

CSV (Comma-Separated Values): text files that store data in tabular format. Each line in the file is a record and corresponds to a row in a table; each record has several fields, delimited by commas or by another field separator (e.g. semicolons), and each separator marks the boundary between two columns. CSV is a very simple format, generally used for exporting and exchanging database files: it is a plain-text counterpart of the more commonly handled Excel files, and can be easily opened and modified with Excel or any other spreadsheet program. A CSV text file looks like this (here with semicolons as separators):

Euboea;http://pleiades.stoa.org/places/540775;Euboea Ins.;38.53;23.87;NATURAL_FEATURE;;VERIFIED;;;;;

Andros;http://pleiades.stoa.org/places/589693;Andros Ins.;37.85;24.86;NATURAL_FEATURE;;VERIFIED;;;;;

Tenos;http://pleiades.stoa.org/places/590074;Tenos Ins.;37.607;25.114;NATURAL_FEATURE;;VERIFIED;;;;;
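
To show how such a file is read by a program, here is a minimal sketch using Python's standard `csv` module. The three records are copied from the sample above into a string so that the example runs on its own; the variable names are only illustrative.

```python
import csv
import io

# The three sample records, held in a string instead of a file.
sample = """Euboea;http://pleiades.stoa.org/places/540775;Euboea Ins.;38.53;23.87;NATURAL_FEATURE;;VERIFIED;;;;;
Andros;http://pleiades.stoa.org/places/589693;Andros Ins.;37.85;24.86;NATURAL_FEATURE;;VERIFIED;;;;;
Tenos;http://pleiades.stoa.org/places/590074;Tenos Ins.;37.607;25.114;NATURAL_FEATURE;;VERIFIED;;;;;
"""

# The delimiter must match the separator used in the file (here a semicolon).
reader = csv.reader(io.StringIO(sample), delimiter=";")
rows = list(reader)

# Each row is a list of fields; empty fields come back as empty strings.
for row in rows:
    print(row[0], row[1])
```

With a real file, `io.StringIO(sample)` would be replaced by a file opened with `open(path, newline="")`.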

EpiDoc (Epigraphic Documents in TEI XML): EpiDoc is a specific subset of TEI, that is, a selection of TEI tags used and standardized specifically to allow the representation of scholarly and educational editions of ancient documents. It stems from the preliminary assumption that ancient texts, particularly text-bearing objects, need specific standards of representation that pay particular attention to the history and materiality of the text, and to the meaningful encoding of editorial observations (traditionally signaled in print editions through the use of sigla). It was originally developed as a standard for the publication of digital editions of ancient inscriptions, but its domain has expanded to include papyri and manuscripts.

Further reading: The EpiDoc Guidelines (latest version): http://www.stoa.org/epidoc/gl/latest/ ; EpiDoc website: https://sourceforge.net/p/epidoc/wiki/Home/

Gazetteer: online gazetteers are authoritative repositories of names of places and locations, identified by means of unique and stable URIs. The places can be, but are not necessarily, associated with geographical coordinates. One of the most important online gazetteers for the Ancient world is, for example, Pleiades (https://pleiades.stoa.org/).

Geo-annotation and geo-tagging: geo-tagging indicates the marking of a geographical entity (a location, place-name, toponym, anything acknowledged as a spatial entity) on the source, by means of a “tag” or a “markup”. Geo-annotation is the additional task of associating that entity with an external reference, generally a Gazetteer, which provides a unique, stable identifier for that entity, together with any additional information that the external reference holds about it. Geo-annotation ensures that the marked entity complies with a canonical reference system, and is uniquely identified and disambiguated from any other entity bearing the same name or characteristics. If the system used is LOD-compliant, the annotation will also make that information available to any other user.

GeoJSON: a format for encoding various types of geographic features by using JSON. Geographic features are encoded with properties and types, including their spatial extent (e.g. coordinates). It is a very simple and widely used format in Spatial Humanities and spatial databases. A GeoJSON document looks like this:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands"
  }
}
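
Since GeoJSON is ordinary JSON, it can be read with any JSON library. The sketch below loads the feature shown above with Python's standard `json` module; the variable names are illustrative. One detail worth knowing is that GeoJSON lists longitude first and latitude second.

```python
import json

geojson_text = """
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [125.6, 10.1]},
  "properties": {"name": "Dinagat Islands"}
}
"""

feature = json.loads(geojson_text)

# GeoJSON positions are ordered longitude first, then latitude.
longitude, latitude = feature["geometry"]["coordinates"]
name = feature["properties"]["name"]
print(name, "lat:", latitude, "lon:", longitude)
```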

Geo-reference: geo-referencing is the general task of associating geographical coordinates (longitude and latitude) with a place or name mentioned in a source. It can be based on already available databases of locations and coordinates (such as gazetteers), but it can also be completely manual, derived from the assignment of a geographical location to a given place name.
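
The gazetteer-based variant can be sketched as a simple lookup. The tiny dictionary below stands in for a real gazetteer: its values come from the CSV sample earlier on this page, and reading the two numeric columns as latitude followed by longitude is an assumption. The function name `georeference` is hypothetical.

```python
# A stand-in for a gazetteer: place name -> (latitude, longitude).
# Column interpretation (latitude, then longitude) is assumed.
gazetteer = {
    "Euboea": (38.53, 23.87),
    "Andros": (37.85, 24.86),
    "Tenos": (37.607, 25.114),
}

def georeference(place_name):
    """Return (latitude, longitude) for a name, or None if it is unknown."""
    return gazetteer.get(place_name)

# Names as they might be found in a source; the last one is not in the table.
mentions = ["Andros", "Tenos", "Atlantis"]
located = {name: georeference(name) for name in mentions}
print(located)
```

Names that come back as `None` are the ones that would need manual geo-referencing.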

GIS (Geographic Information System): a system of technologies and tools designed to represent and analyze spatial information displayed on modern cartographic maps.

Further reading: “A gentle GIS introduction” on QGIS: http://docs.qgis.org/2.14/en/docs/gentle_gis_introduction/

GitHub: an online collaborative platform that allows storage and editing of code. GitHub is particularly useful for collective projects, as it is essentially free (in its base version), allows easy sharing of code, use of embedded text editors, writing of documentation, and especially the tracking of the various versions of a document. Website: https://github.com/.

HTML (HyperText Markup Language): a markup language used for the display of online content. Like XML, it is a type of descriptive markup, but it differs from XML because it has more to do with the style and presentation of the content, while XML is geared towards preserving the structural aspects of the document. As such, HTML can be enriched with a variety of style features that make the presentation of content richer and more attractive, depending on the variety and purpose of the source represented.

JSON (JavaScript Object Notation): a data-interchange format derived from a subset of the programming language JavaScript. It provides a “universal”, generalized data structure that can be supported by any programming language. It is therefore not specific to any of them: it can be used by programmers of Python, Java, C, Perl, and so on, with few or no adaptations. Structurally, it is built on two constructs: a collection of name/value pairs, realized as an object, and an ordered list of values, realized as an array. A basic JSON document generally looks like this:

{
  "name": "John",
  "age": 30,
  "cars": {
    "car1": "Ford",
    "car2": "BMW",
    "car3": "Fiat"
  }
}
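
To illustrate the claim that JSON is not tied to one language, the sketch below reads the same data in Python with the standard `json` module. A JSON object becomes a Python dictionary and a JSON array becomes a Python list; the `brands` key is made up for the example.

```python
import json

json_text = '{"name": "John", "age": 30, "cars": {"car1": "Ford", "car2": "BMW", "car3": "Fiat"}}'

# Parsing: a JSON object becomes a Python dict.
person = json.loads(json_text)
print(person["name"], person["age"])
print(list(person["cars"].values()))

# Serializing: a Python list becomes a JSON array.
round_trip = json.dumps({"brands": ["Ford", "BMW", "Fiat"]})
print(round_trip)
```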

Further reading: “Introducing JSON”: http://www.json.org/index.html; basic JSON tutorial on W3Schools: https://www.w3schools.com/js/js_json_intro.asp.

KML (Keyhole Markup Language): a markup language specific to the expression of geographical information, particularly locations and coordinates. It is based on XML, and therefore follows the same rules and is built from nested elements and attributes: in the code, KML is declared with the appropriate namespace. KML is especially used in Google Earth: this means that if you want to display your own document in Google Earth, you will generally need to have it in KML format. A KML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Simple placemark</name>
    <description>Attached to the ground. Intelligently places itself at the height of the underlying terrain.</description>
    <Point>
      <coordinates>-122.0822035425683,37.42228990140251,0</coordinates>
    </Point>
  </Placemark>
</kml>
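
Because KML is XML, a generic XML parser can read it, as long as the namespace is taken into account. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the placemark above; the prefix `kml` in the `ns` dictionary is an arbitrary label chosen for this example. KML coordinates are written as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET

kml_text = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Simple placemark</name>
    <Point>
      <coordinates>-122.0822035425683,37.42228990140251,0</coordinates>
    </Point>
  </Placemark>
</kml>"""

# Map a local prefix to the KML namespace declared on the root element.
ns = {"kml": "http://www.opengis.net/kml/2.2"}

root = ET.fromstring(kml_text.encode("utf-8"))
placemark = root.find("kml:Placemark", ns)
placemark_name = placemark.find("kml:name", ns).text
coords = placemark.find("kml:Point/kml:coordinates", ns).text

# Split the comma-separated triple into numbers.
longitude, latitude, altitude = (float(v) for v in coords.split(","))
print(placemark_name, latitude, longitude, altitude)
```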

Further reading: KML Reference: https://developers.google.com/kml/documentation/kmlreference ; KML tutorial: https://developers.google.com/kml/documentation/kml_tut

Linked Open Data (LOD): one of the crucial components of the Semantic Web, it is an ensemble of technologies and standards that make it possible to establish vast connectivity between clusters of content related to a specific concept. It provides rules and protocols for linking together all the resources that comply with them, by means of the things that they have in common, not by a shared semantic vocabulary. When several projects conform to Linked Open Data standards, the user is able to access all their resources at once, by means of the information that they have in common.

Further reading: T. Berners-Lee, Linked Data: https://www.w3.org/DesignIssues/LinkedData.html

Map Tile: the smallest subcomponent of a web map. Map tiles are small square images (typically 256x256 pixels) placed side by side in order to create an extended web map. They allow web maps to offer zooming down to street level, and follow a specific coordinate scheme that makes them easy to use, download and display. Map tiles are generally used for further display of geographical information in browsers, desktop and mobile applications, and several other technologies.
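
The entry above does not say which coordinate scheme is meant. As an assumption, the sketch below uses the widely adopted XYZ (or "slippy map") convention on a Web Mercator projection, in which zoom level z divides the world into 2^z by 2^z tiles, numbered from the top-left corner. The function name `tile_for` is hypothetical.

```python
import math

def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the tile containing a point.

    Assumes the common XYZ / Web Mercator tiling, with x growing
    eastwards and y growing southwards from the top-left corner.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Keep indices inside the grid for points on the outer edges.
    x = max(0, min(n - 1, x))
    y = max(0, min(n - 1, y))
    return x, y

# At zoom 0 the whole world is a single tile.
print(tile_for(38.53, 23.87, 0))
# At zoom 1 the world is split into four tiles.
print(tile_for(0.0, 0.0, 1))
```

Each extra zoom level doubles the number of tiles along both axes, which is what lets a web map load only the small images needed for the area on screen.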

Further reading: How web maps work, by the MapBox Team https://www.mapbox.com/help/how-web-maps-work/.

Markup Language: The word markup was used in the past to describe typographical instructions telling a typist how a particular document should be printed. All printed texts have an “implicit” markup, which is visible on the page in the form of phenomena such as capitalization, italics, bold, punctuation marks, spaces. In Computer Science, we define a markup language or markup encoding as a set of rules that specify the structural aspects of a document, giving instructions on how to display its layout in an electronic environment. A markup language makes these “typographic instructions” explicit, and their visible effects are displayed on a webpage: such instructions may relate to the structure of the document or to more semantically meaningful aspects (such as italics, when used to indicate a special type of emphasis). Depending on which aspects are privileged, there are several types of markup languages (e.g. HTML, XML, Markdown, etc.).

Further reading: “Markup language” on Wikipedia: https://en.wikipedia.org/wiki/Markup_language.

Metadata: literally, “data about data”, it indicates a set of information that provides a context for a document in the form of a set of resource descriptions. In other words, metadata about a document provide the essential information about its author, date, editor, etc., with a system similar to that of physical archives and libraries. In Digital Humanities, metadata are essentially understood as resource descriptions belonging to a common vocabulary with a core set of elements, essential to the description of any document accessible online. This core set has been defined by the Dublin Core Metadata Initiative (DCMI).

Further reading: “Metadata basics” on the Dublin Core: http://dublincore.org/metadata-basics/.

Named Entities: not to be confused with named entities in markup languages (strings of symbols and numbers that replace special characters reserved in those languages), Named Entities in Spatial Humanities define anything that is specifically identified by a name and has a specific value: people (groups, titles, ethnics, personal names…) and places (cities, settlements, countries, etc.) are named entities, but so are chronological periods (e.g. Middle Ages, Cenozoic), dates, titles of works and citations.

NER (Named Entity Recognition): the process of identifying proper nouns, isolating them from the other strings in a text, and then disambiguating them by attaching them to specific and unique identities (e.g. Alexandria is a place in Egypt in that specific source, and not a person named Alexandria, or any Alexandria in Asia). Named Entity Recognition is generally performed through automatic and computational processes, but can also be performed manually through annotation.

Ontology: a concept that Computer Science inherited from Philosophy. In Philosophy it denotes the study of being; in Computer Science it indicates practices for representing knowledge in an organized form, expressed in machine-readable languages. Ontologies are used to model the structure of a domain of knowledge in a formal way, by means of the organization of its relevant entities and relations. In the Semantic Web, ontologies are used to specify standard conceptual vocabularies: they define semantically meaningful information according to a shared vocabulary, which associates specific labels and inherited properties with any element of a source.

Further reading: Tom Gruber, “Ontology (Computer Science)”, in Encyclopedia of Database Systems, http://tomgruber.org/writing/ontology-definition-2007.htm.

Plain Text Format (.txt): a file containing only readable characters, with no additional style specification.

Raster data: Typically used in GIS Applications, a raster in its simplest form is a matrix of cells organized as tabular data. Each cell contains a value representing information. Rasters can be aerial photographs, satellite images, maps, and can represent various kinds of real-world phenomena, such as temperatures, elevation, land-use, demographics. Because they are especially flexible, raster data can be used to provide a map layer for further spatial analysis, thematic maps, surface maps of dynamic phenomena (e.g. climate changes).
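
The idea of a raster as a matrix of cells can be sketched with a plain list of lists. The elevation values below are invented for the example, and real GIS software stores much larger grids along with information about where the grid sits on the Earth.

```python
# A hypothetical 3x4 elevation raster in metres. Rows run north to south,
# columns west to east; each number is the value stored in one cell.
elevation = [
    [120, 135, 150, 160],
    [110, 125, 140, 155],
    [100, 115, 130, 145],
]

n_rows, n_cols = len(elevation), len(elevation[0])

# Simple analyses work cell by cell over the whole matrix.
highest = max(max(row) for row in elevation)
mean = sum(sum(row) for row in elevation) / (n_rows * n_cols)

# Reading one cell: second row, third column.
cell_value = elevation[1][2]
print(n_rows, n_cols, highest, round(mean, 2), cell_value)
```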

Further reading: “What is the difference between Vector and Raster Data?”, a useful discussion on gis.stackexchange.com: https://gis.stackexchange.com/questions/7077/what-are-raster-and-vector-data-in-gis-and-when-to-use; “GIS data types: Vector and Raster”: http://gisgeography.com/spatial-data-types-vector-raster/ ; “Raster Data” in QGIS: https://docs.qgis.org/2.6/en/docs/gentle_gis_introduction/raster_data.html

RDF (Resource Description Framework): The basic relational model of Linked Open Data. It is an abstract and generic data model aimed at representing and storing any information instance in the form of a subject-predicate-object triple. It does not provide any kind of semantic information on the content of the statement, which is served by other predefined vocabularies, typically expressed as ontologies.

Further reading: W3C RDF primer: https://www.w3.org/TR/rdf11-primer/.

Schema: the set of predefined rules against which an XML document has to be validated. A schema defines the “syntax” of an XML document: it specifies which elements and attributes can be used, how they can or must be nested, how many of them can be used, etc.

TEI (Text Encoding Initiative): TEI is properly a consortium, whose task is to maintain and perfect a standard for the representation of texts in digital form, and to provide guidelines for the community that uses it. Generally, however, TEI indicates a particular vocabulary built on XML markup, that is, a collection of XML tags selected and designed with specific meaningful features in order to allow the representation of documents in the digital world. TEI is specifically designed for Humanities documents, and its purpose is to facilitate their representation by providing fixed guidelines that take into account the specificities of several types of primary sources (manuscripts, transcripts, poetry, scripts, etc.).

Further reading: the TEI Guidelines (latest version): http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html.

Triple: a statement expressed in a formal language and in three parts, namely subject, predicate and object. The subject represents the resource being talked about, the predicate the relation between subject and object, and the object the content of the statement made about the resource. It is the conceptual format for representing statements about resources published on the web, traditionally expressed in RDF.
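
The three-part structure can be sketched with ordinary Python tuples, without any RDF library. The subjects are Pleiades URIs taken from the CSV sample on this page; the predicate URIs under `http://example.org/vocab/` are hypothetical placeholders, not a real vocabulary, and the helper `objects_of` is made up for the example.

```python
# Each triple is a (subject, predicate, object) tuple.
# The predicate URIs are hypothetical placeholders for illustration.
triples = [
    ("http://pleiades.stoa.org/places/540775", "http://example.org/vocab/name", "Euboea"),
    ("http://pleiades.stoa.org/places/540775", "http://example.org/vocab/type", "island"),
    ("http://pleiades.stoa.org/places/589693", "http://example.org/vocab/name", "Andros"),
]

def objects_of(subject, predicate):
    """Return every object stated about a subject through a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

names = objects_of("http://pleiades.stoa.org/places/540775",
                   "http://example.org/vocab/name")
print(names)
```

Because the subject is a URI, statements made by different projects about the same place can be collected simply by matching on that identifier.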

URI (Uniform Resource Identifier): a string of characters that follows a specific scheme and is used to identify a resource. Assigning a URI to a resource gives it a stable and unique identification, and enables its representation on the World Wide Web. The most common form of URI is the URL (Uniform Resource Locator).

Further reading: “Uniform Resource Identifier” on Wikipedia: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier.

URL (Uniform Resource Locator): the most common form of URI, also called a web address. It identifies a web resource in the same way as a URI, but also specifies how to obtain a representation of it in the form of HTML and related code (usually by means of http, the Hypertext Transfer Protocol, which is placed at the beginning of the URL), from a network host whose domain name is contained in the URL. An example of a URL is: http://example.org.
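
The parts of a URL described above (the protocol at the beginning, the host, and the rest) can be pulled apart with Python's standard `urllib.parse` module. The URL used here is one of the Pleiades addresses from the CSV sample on this page.

```python
from urllib.parse import urlparse

parts = urlparse("http://pleiades.stoa.org/places/540775")

print(parts.scheme)  # the protocol used to obtain the resource
print(parts.netloc)  # the network host (domain name)
print(parts.path)    # where the resource lives on that host
```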

Further reading: “Uniform Resource Locator” on Wikipedia: https://en.wikipedia.org/wiki/Uniform_Resource_Locator.

URN (Uniform Resource Name): a type of URI that identifies a resource by name in a particular namespace, and does not specify how or where to obtain the resource. URNs have the advantage of identifying a resource independently of its location, since locations may change over time; however, they need to be associated with a specific set of instructions and protocols in order to be useful. An example of a URN is urn:nbn:de:bvb:19-146642 (National Bibliography Number for a document).
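
A URN has a simple internal structure: the prefix `urn`, then the namespace, then the name within that namespace. The sketch below splits the example above into those three parts; the variable names are only illustrative.

```python
urn = "urn:nbn:de:bvb:19-146642"

# Split on the first two colons only: everything after the namespace
# belongs to the name, even if it contains further colons.
scheme, namespace_id, namespace_specific = urn.split(":", 2)

print(scheme)              # the fixed prefix
print(namespace_id)        # the namespace (here: National Bibliography Number)
print(namespace_specific)  # the name inside that namespace
```

Nothing in the string says where the document can be found, which is why a separate resolving service is needed to turn a URN into a location.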

Further reading: “Uniform Resource Name” on Wikipedia: https://en.wikipedia.org/wiki/Uniform_Resource_Name.

Vector: typically used in GIS applications, vector data are another way to represent real-world information in GIS. Vector data represent features of a landscape in the form of geometries, and attach attributes to them in the form of numerical or textual information. A geometry is made up of one or more interconnected vertices, each of which describes a position in space using an X, a Y and optionally a Z axis. Depending on the number of vertices and their connections, geometries can be referred to as points, polylines and polygons. The use of these geometries to represent features of a landscape depends on various factors, especially the scale of the map.
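
The three kinds of geometry can be sketched as lists of vertices. The coordinates below are invented and treated as positions on a flat plane, which is an assumption made to keep the arithmetic simple; the function names are hypothetical.

```python
import math

# A point is one vertex, a polyline an open chain of vertices,
# a polygon a closed ring (the last vertex connects back to the first).
point = (23.87, 38.53)
polyline = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

# Attributes travel with the geometry as textual or numerical information.
feature = {"geometry": polygon, "attributes": {"name": "field", "owner_count": 2}}

def polyline_length(vertices):
    """Sum the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def polygon_area(vertices):
    """Planar area of a simple polygon, using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

length = polyline_length(polyline)
area = polygon_area(feature["geometry"])
print(length, area)
```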

Further reading: “Vector data” in QGIS: https://docs.qgis.org/2.6/en/docs/gentle_gis_introduction/vector_data.html.

XML (eXtensible Markup Language): a markup language with the specific purpose of separating the structure and the content of a text, making them stable, readable and at the same time flexible. Marking up a document in XML makes it readable for machines and human beings, because its schema keeps the structural elements of the document stable. At the same time, however, it allows for a certain flexibility, as you can define and mark the specific format of your document and its meaningful structural elements dynamically, according to your specific needs. XML is a type of descriptive markup, that is, it is used to label parts of a document in order to make its structure explicit, rather than to provide instructions on how it should be processed. An XML document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
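
To show what "readable for machines" means in practice, the sketch below reads the note above with Python's standard `xml.etree.ElementTree` module and turns its labelled parts into a dictionary. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(xml_text.encode("utf-8"))

# Each child element carries a label (its tag) and some content (its text).
fields = {child.tag: child.text for child in note}
print(note.tag)
print(fields["to"], "<-", fields["from"])
print(fields["body"])
```

The program never has to guess which part is the sender or the heading: the labels say so, which is the point of descriptive markup.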

Further reading: TEI, “A gentle introduction to XML”: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SG.html ; XML Basics online tutorial: https://www.w3schools.com/xml/

*Vademecum: va-de-me-cum, a handbook or a guide that is kept constantly at hand for consultation (in case you didn't know!).