RDFTables

This application and library converts delimited value/tabular files, e.g. CSV or TSV, into a variety of RDF serialisations. Configuration is done by modifying the header of the file to provide Class, Property and Datatype information. It is intended to be as quick to set up and as unintrusive as possible.

Features:

RDF structure is respected and reported.
A folder can be converted into a single file or multiple files.
The table does not have to be regularly formed, i.e. sparse or ragged.
Gaps and repeated columns/rows will not cause an issue.
Standard sets of prefixes and datatypes can be added to the predefined set for consistent conversion.
Range of RDF serialisations.

File Structure

Header

Header items are delimited by the reserved "|" (pipe) character. Column 0 header is treated differently, see below. Each header column can have one to three items in the order:

Property URI (predicate) between the target Object (subject) and this column (object). This relationship can be inverted (e.g. for a Foreign Key) by starting the Property URI with "^", but will be rejected if applied to a Literal/Datatype column.
Datatype or Class. If not specified then an Object with no class is assumed. This allows the Class to be inferred from the schema, asserted in another file or provided elsewhere. Class names are distinguished from Datatypes by a leading ":", e.g. ":test" uses the base URI to form a Class for an Object, while "test" uses the base URI to form a Datatype. Similarly, ":my:test" uses the prefix "my" to form a Class for an Object, while ":http://example.org/my#test" explicitly forms a Class.
[OPTIONAL] The target column can be specified as an integer value (default: 0) to allow Properties to be added to Objects within the file.
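
The header rules above can be illustrated with a minimal sketch. This is not the RDFTables implementation: the class name HeaderCellSketch, its fields and the parse method are hypothetical, and the handling of blank items is an assumption. It only shows how a header cell such as "my:hasPart|:my:Part|2" splits into its one to three items, and how an inverted Property on a Datatype column is rejected.

```java
/** Hypothetical holder for the items of one header cell; not part of the RDFTables API. */
final class HeaderCellSketch {
    final String property;   // property item with any leading "^" removed
    final boolean inverse;   // true when the property item started with "^"
    final String type;       // Datatype or Class item without the leading ":", or null if absent
    final boolean isClass;   // true when the second item started with ":"
    final int targetColumn;  // column holding the target Object, default 0

    private HeaderCellSketch(String property, boolean inverse, String type,
                             boolean isClass, int targetColumn) {
        this.property = property;
        this.inverse = inverse;
        this.type = type;
        this.isClass = isClass;
        this.targetColumn = targetColumn;
    }

    /** Splits a header cell on the reserved "|" character and interprets each item. */
    static HeaderCellSketch parse(String cell) {
        String[] items = cell.split("\\|", -1);
        if (items.length > 3 || items[0].trim().isEmpty()) {
            throw new IllegalArgumentException("Expected one to three items: " + cell);
        }

        String property = items[0].trim();
        boolean inverse = property.startsWith("^");
        if (inverse) {
            property = property.substring(1);
        }

        String type = null;
        boolean isClass = false;
        if (items.length >= 2 && !items[1].trim().isEmpty()) {
            type = items[1].trim();
            isClass = type.startsWith(":");
            if (isClass) {
                type = type.substring(1);
            }
        }

        // An inverted property needs an Object in this column, so a Datatype is rejected.
        if (inverse && type != null && !isClass) {
            throw new IllegalArgumentException("Inverse property on a Datatype column: " + cell);
        }

        // Assumption: a blank third item falls back to the default target column 0.
        int targetColumn = 0;
        if (items.length == 3 && !items[2].trim().isEmpty()) {
            targetColumn = Integer.parseInt(items[2].trim());
        }
        return new HeaderCellSketch(property, inverse, type, isClass, targetColumn);
    }

    public static void main(String[] args) {
        HeaderCellSketch cell = parse("my:hasPart|:my:Part|2");
        System.out.println(cell.property + " " + cell.type + " " + cell.targetColumn);
    }
}
```

In this sketch a cell containing only "knows" yields a null type, matching the "Object with no class" default described above.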

Column 0

The first column contains the base URI and the Class of the column. The base URI is used as the default prefix for Objects, Properties, Classes and Datatypes that are not URIs and do not have a prefix.

Data

The first column MUST be an Object. The remainder of the data can be Objects (forming ObjectProperty relationships) or Literals (forming DatatypeProperty relationships). Gaps for columns are ignored with no warnings. Multiple columns with the same or similar items do not cause any issues.

Objects:

Explicit URIs are preserved unchanged, e.g. "http://example.org/my#ClassA".
Prefixes are expanded using the loaded prefixes, e.g. "my:ClassA" becomes "http://example.org/my#ClassA".
In all other cases the base URI is applied, e.g. "ClassA" becomes "http://example.org/my#ClassA".
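
The three cases above can be sketched as a single resolution function. This is an illustrative sketch only, not the library's code: the class UriResolverSketch and its resolve method are hypothetical, the test for an explicit URI (the presence of "://") is a deliberate simplification, and throwing on an unknown prefix is an assumption about behaviour the source does not describe.

```java
import java.util.Map;

/** Hypothetical helper showing the three Object resolution cases; not part of the RDFTables API. */
final class UriResolverSketch {

    /**
     * Resolves a cell value to a full URI.
     *
     * @param value    the cell value, e.g. "my:ClassA"
     * @param baseUri  the base URI taken from column 0, e.g. "http://example.org/my#"
     * @param prefixes loaded prefix labels mapped to their namespace URIs
     */
    static String resolve(String value, String baseUri, Map<String, String> prefixes) {
        String trimmed = value.trim();

        // Case 1: explicit URIs are preserved unchanged (simplified check, an assumption).
        if (trimmed.contains("://")) {
            return trimmed;
        }

        // Case 2: prefixed names are expanded using the loaded prefixes.
        int colon = trimmed.indexOf(':');
        if (colon > 0) {
            String namespace = prefixes.get(trimmed.substring(0, colon));
            if (namespace == null) {
                // Assumption: an unknown prefix is treated as an error in this sketch.
                throw new IllegalArgumentException("Unknown prefix in: " + trimmed);
            }
            return namespace + trimmed.substring(colon + 1);
        }

        // Case 3: everything else is appended to the base URI.
        return baseUri + trimmed;
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = Map.of("my", "http://example.org/my#");
        System.out.println(resolve("my:ClassA", "http://example.org/my#", prefixes));
    }
}
```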

By default, Objects are created with an rdfs:label taken from the local name portion of the URI; this can be switched off globally. Objects can also be created globally as members of the class owl:NamedIndividual, which is switched off by default.

Literals:

The Datatype URI specified in the column header is applied with the data in the cell to form a Literal.

Getting Started

RDFTables can be used as a library, available from Maven Central for Maven and similar build tools, or run from the command line.

<dependency>
    <groupId>io.github.galbiston</groupId>
    <artifactId>rdf-tables</artifactId>
    <version>1.0.2</version>
</dependency>

API

File and Folder conversion methods are contained in the FileReader class. Arguments follow the same conventions as the command line arguments below.

Command Line Arguments

1) Input File/Folder

--input, -i

The source for the conversion process.

2) Output File/Folder

--output, -o

The destination for the conversion process. Specifying a folder re-uses the input file name(s). Combining an input folder with an output file consolidates the output into a single file.

3) Delimiter/Separator Value

--delim, -l

The column delimiter/separator in the input file. Defaults to comma but any character string can be used except for reserved characters ":", "^" and "|". Keywords TAB, SPACE and COMMA are also supported.
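
As a rough illustration of the delimiter rules, the sketch below maps the three keywords to their characters and rejects strings containing a reserved character. The class DelimiterSketch is hypothetical and not part of RDFTables; treating the keywords as case-sensitive and rejecting an empty string are assumptions.

```java
/** Hypothetical helper for interpreting a delimiter argument; not part of the RDFTables API. */
final class DelimiterSketch {

    /** Returns the delimiter string to split on for a given argument value. */
    static String resolve(String argument) {
        switch (argument) {
            case "TAB":
                return "\t";
            case "SPACE":
                return " ";
            case "COMMA":
                return ",";
            default:
                // ":", "^" and "|" are reserved for use inside header items.
                if (argument.isEmpty() || argument.contains(":")
                        || argument.contains("^") || argument.contains("|")) {
                    throw new IllegalArgumentException("Unsupported delimiter: " + argument);
                }
                return argument;
        }
    }

    public static void main(String[] args) {
        // Split a tab-separated line using the TAB keyword.
        String[] cells = "a\tb\tc".split(java.util.regex.Pattern.quote(resolve("TAB")), -1);
        System.out.println(cells.length);
    }
}
```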

4) Output Format/Serialisation

--format, -f

The file serialisation used for the RDF output.

JSON-LD: json-ld
NTriples: nt
NQUADS: nq
RDF/JSON: json-rdf
RDF/XML: xml
RDF/XML PLAIN: xml-plain
RDF/THRIFT: thrift
TRIX: trix
TRIG: trig
Turtle: ttl (Default)

5) Prefixes File

--prefixes, -p

A file of key=value pairs with no header (Java Properties format). Key is the prefix label and value is the URI for the prefix. Defaults to searching the input folder and current directory for "prefixes.prop".
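
Because the file uses the Java Properties format, its contents can be read with the standard java.util.Properties class. The sketch below is an illustration under assumptions, not the library's loading code: the class PrefixFileSketch is hypothetical, and the "my" and "foaf" entries are example values only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical reader for prefix files in Java Properties format; not part of the RDFTables API. */
final class PrefixFileSketch {

    /** Parses key=value lines into a map of prefix label to namespace URI. */
    static Map<String, String> load(String content) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(content));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }

        Map<String, String> prefixes = new TreeMap<>();
        for (String label : properties.stringPropertyNames()) {
            prefixes.put(label, properties.getProperty(label));
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Example contents of a prefixes.prop file (illustrative values).
        String content = "my=http://example.org/my#\n"
                + "foaf=http://xmlns.com/foaf/0.1/\n";
        System.out.println(load(content));
    }
}
```

Note that in the Properties format a "#" only starts a comment at the beginning of a line, so a namespace URI ending in "#" is read intact.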

Pre-loaded prefixes:

6) Datatypes File

--datatypes, -d

A file of key=value pairs with no header (Java Properties format). Key is the datatype label and value is the URI for the datatype. Defaults to searching the input folder and current directory for "datatypes.prop".

Pre-loaded XSD datatypes:

boolean
decimal
date
dateTime
double
duration
int
integer
nonNegativeInteger
nonPositiveInteger
positiveInteger
string
time

7) OWL NamedIndividual

--named, -n

Boolean value for creating OWL NamedIndividuals in the data. Default: true

8) Excluded Files

--exclude, -x

Files to exclude when reading input from a folder.

9) Properties File

Supply the above parameters as a file:

$ java Main @/tmp/parameters

TDB Compilation

Library methods are provided for compiling to a TDB graph store by named graph; see TDBBuilder.compileFolder().

Future Work

Items that can be developed based on feedback and other suggestions.

URI checking is minimal when reading the file data but errors will be thrown when adding to the RDF model.
Multiple files to separate graphs - e.g. NQUADS serialisation. Graph name would be specified as third item in column 0.
Read configuration from a property file.
SHACL integration to import a schema and validate the data structure.
