ShExStatements allows users to generate shape expressions (ShEx) from simple CSV statements, CSV files and spreadsheets. shexstatements
can be used from the command line as well as from a web interface.
Set up a virtual environment and install shexstatements.
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install shexstatements
Run the following command with an example CSV file. The file contains an example description of a language on Wikidata. This file uses a comma as the delimiter to separate values.
$ shexstatements.sh examples/language.csv
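To give an idea of what the generator does with such a file, here is a toy Python sketch. It is not the ShExStatements implementation. The row layout it assumes (prefix rows of the form `prefix,<IRI>` and statement rows of the form `@shape,property,value[,cardinality][,# comment]`) is inferred from the generated output shown later in this section, and both the helper `csv_to_shexc` and the sample rows in `EXAMPLE` are hypothetical.

```python
import csv
import io

CARDINALITIES = {"?", "*", "+"}
# Value expressions copied through unchanged (a small, assumed subset)
KEEP_AS_IS = {".", "LITERAL", "IRI", "BNODE", "NONLITERAL"}

# Hypothetical sample rows, modelled on the generated output shown below
EXAMPLE = """wd,<http://www.wikidata.org/entity/>
wdt,<http://www.wikidata.org/prop/direct/>
@language,wdt:P31,wd:Q34770,# instance of a language
@language,wdt:P1705,LITERAL,# native name
@language,wdt:P17,.,+,# spoken in country
"""


def csv_to_shexc(text, delimiter=","):
    """Toy conversion of CSV statements into ShEx compact syntax (ShExC)."""
    prefixes = []
    shapes = {}  # shape name -> triple-constraint lines, in input order
    for raw in csv.reader(io.StringIO(text), delimiter=delimiter):
        row = [cell.strip() for cell in raw if cell.strip()]
        if not row:
            continue  # skip blank lines
        if len(row) == 2 and row[1].startswith("<"):
            prefixes.append(f"PREFIX {row[0]}: {row[1]}")
            continue
        if len(row) < 3:
            raise ValueError(f"cannot parse row: {raw}")
        shape, prop, value, *rest = row
        card = next((c for c in rest if c in CARDINALITIES), "")
        comment = next((c.lstrip("#").strip() for c in rest if c.startswith("#")), "")
        if value.startswith("@"):
            expr = f"@<{value[1:]}>"  # reference to another shape
        elif value in KEEP_AS_IS or value.startswith("xsd:"):
            expr = value
        else:
            expr = f"[ {value} ]"  # a single value becomes a value set
        line = f"  {prop} {expr}{card} ;" + (f"# {comment}" if comment else "")
        shapes.setdefault(shape.lstrip("@"), []).append(line)
    out = list(prefixes)
    if shapes:
        out.append(f"start = @<{next(iter(shapes))}>")
    for name, lines in shapes.items():
        out.append(f"<{name}> {{")
        out.extend(lines)
        out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    print(csv_to_shexc(EXAMPLE))
```

Running it prints a small `<language>` shape in the same style as the generated output shown later in this section; the sketch only covers the handful of constructs visible in that output.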
Clone the ShExStatements repository.
$ git clone https://github.com/johnsamuelwrites/ShExStatements.git
Go to ShExStatements directory.
$ cd ShExStatements
Install modules required by ShExStatements (here: installing into a virtual environment).
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
Run the following command with an example CSV file. The file contains an example description of a language on Wikidata. This file uses a comma as the delimiter to separate values.
$ ./shexstatements.sh examples/language.csv
CSV files can also use other delimiters, such as a semicolon (;). For example, the following command works with a file that uses a semicolon as the delimiter.
$ ./shexstatements.sh examples/languagedelimsemicolon.csv --delim ";"
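The `--delim` option only changes how each line is split into cells. As an illustration (using Python's standard `csv` module, which is not necessarily how ShExStatements reads files), the same hypothetical statement written with commas and with semicolons yields identical cells, while reading the semicolon version with the default comma leaves the whole line in a single cell. The helper `read_rows` is hypothetical.

```python
import csv
import io

# The same hypothetical statement, once per delimiter
COMMA = "@language,wdt:P17,.,+,# spoken in country\n"
SEMICOLON = "@language;wdt:P17;.;+;# spoken in country\n"


def read_rows(text, delimiter=","):
    """Split CSV text into rows of stripped cells using the given delimiter."""
    return [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text), delimiter=delimiter)]


print(read_rows(SEMICOLON, delimiter=";"))
# [['@language', 'wdt:P17', '.', '+', '# spoken in country']]
print(read_rows(SEMICOLON))
# [['@language;wdt:P17;.;+;# spoken in country']]  (one cell: wrong delimiter)
```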
Sometimes users may want to include a header in the CSV file. In that case, they can use -s or --skipheader to tell the generator to skip the header (the first line of the CSV file).
$ ./shexstatements.sh --skipheader examples/header/languageheader.csv
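Without `--skipheader`, the first line would be read as if it were a statement. A minimal Python sketch of the difference, assuming a hypothetical header `Node,Property,Value,Cardinality,Comment` (the actual header in `examples/header/languageheader.csv` may differ); `statement_rows` is a hypothetical helper, not part of shexstatements.

```python
import csv
import io

# Hypothetical file whose first line is a header for human readers
WITH_HEADER = """Node,Property,Value,Cardinality,Comment
@language,wdt:P17,.,+,# spoken in country
@language,wdt:P282,.,+,# writing system
"""


def statement_rows(text, skip_header=False):
    """Return non-empty CSV rows, optionally dropping the first line."""
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader, None)  # discard the header row, if any
    return [row for row in reader if row]


print(statement_rows(WITH_HEADER)[0][0])                    # Node (header read as data)
print(statement_rows(WITH_HEADER, skip_header=True)[0][0])  # @language
```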
It is also possible to work with spreadsheet files such as .ods, .xls or .xlsx.
$ shexstatements.sh examples/language.ods
$ shexstatements.sh examples/language.xls
$ shexstatements.sh examples/language.xlsx
In all the above cases, the shape expression generated by ShExStatements looks like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
start = @<language>
<language> {
  wdt:P31 [ wd:Q34770 ] ;# instance of a language
  wdt:P1705 LITERAL ;# native name
  wdt:P17 .+ ;# spoken in country
  wdt:P2989 .+ ;# grammatical cases
  wdt:P282 .+ ;# writing system
  wdt:P1098 .+ ;# speakers
  wdt:P1999 .* ;# UNESCO language status
  wdt:P2341 .+ ;# indigenous to
}
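The suffix after each value expression is a ShEx cardinality: no suffix means exactly one, `?` zero or one, `*` zero or more and `+` one or more. The short Python sketch below encodes these as `(min, max)` pairs, with `-1` for unbounded as in ShExJ, and checks a few hypothetical occurrence counts against lines of the schema above. It simplifies validation to counting, and the names `CARDINALITY` and `count_ok` are illustrative only.

```python
# ShExC cardinality suffix -> (min, max); -1 means unbounded (ShExJ convention)
CARDINALITY = {"": (1, 1), "?": (0, 1), "*": (0, -1), "+": (1, -1)}


def count_ok(suffix, n):
    """Return True if n occurrences of a property satisfy the given suffix."""
    lo, hi = CARDINALITY[suffix]
    return n >= lo and (hi == -1 or n <= hi)


# wdt:P1705 LITERAL -> exactly one native name
# wdt:P17 .+        -> at least one country
# wdt:P1999 .*      -> UNESCO status is optional
for prop, suffix, n in [("wdt:P1705", "", 2), ("wdt:P17", "+", 0), ("wdt:P1999", "*", 0)]:
    print(prop, n, count_ok(suffix, n))
# wdt:P1705 2 False
# wdt:P17 0 False
# wdt:P1999 0 True
```

So, under the ShEx cardinality rules, an item with two native names (`wdt:P1705`) would not conform to this shape, because that constraint has no suffix.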
Use -j or --shexj to generate ShEx JSON syntax (ShExJ) instead of the default ShEx compact syntax (ShExC).
$ ./shexstatements.sh --shexj examples/language.csv
The output will be similar to:
{
  "type": "Schema",
  "start": "language",
  "shapes": [
    {
      "type": "Shape",
      "id": "language",
      "expression": {
      }
    }
  ]
}
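The ShExJ excerpt above does not show the contents of the shape's `expression`. For reference, the Python sketch below uses the standard `json` module to write three of the triple constraints from the ShExC schema in ShExJ, following the ShEx 2 JSON structure (`EachOf`, `TripleConstraint`, `NodeConstraint`, and `min`/`max` with `-1` for unbounded). It is a hand-written illustration, not output captured from shexstatements, whose actual JSON may differ in details such as IRIs or an `@context` entry; `triple_constraint` is a hypothetical helper.

```python
import json

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"


def triple_constraint(prop, value_expr=None, card=(1, 1)):
    """Build a ShExJ TripleConstraint; the default {1,1} cardinality stays implicit."""
    tc = {"type": "TripleConstraint", "predicate": WDT + prop}
    if value_expr is not None:  # "." (any value) has no valueExpr
        tc["valueExpr"] = value_expr
    if card != (1, 1):
        tc["min"], tc["max"] = card  # max -1 means unbounded
    return tc


schema = {
    "type": "Schema",
    "start": "language",
    "shapes": [{
        "type": "Shape",
        "id": "language",
        "expression": {
            "type": "EachOf",
            "expressions": [
                # wdt:P31 [ wd:Q34770 ]
                triple_constraint("P31", {"type": "NodeConstraint", "values": [WD + "Q34770"]}),
                # wdt:P1705 LITERAL
                triple_constraint("P1705", {"type": "NodeConstraint", "nodeKind": "literal"}),
                # wdt:P17 .+
                triple_constraint("P17", card=(1, -1)),
            ],
        },
    }],
}
print(json.dumps(schema, indent=2))
```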
It is also possible to use application profiles of the following form:
Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation
Shape expressions can then be generated from such a file as follows:
$ ./shexstatements.sh -ap --skipheader examples/languageap.csv
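The `Mand` and `Repeat` columns presumably state whether a property must occur and whether it may repeat, which together correspond to a ShEx cardinality. A minimal sketch of one plausible mapping, assuming the flags are written as yes/no (the conventions actually used in `examples/languageap.csv` may differ): mandatory and repeatable gives `+`, mandatory only gives no suffix, repeatable only gives `*`, and neither gives `?`. The two rows in `PROFILE` and the helper `cardinality` are hypothetical.

```python
import csv
import io

# Hypothetical application-profile rows using the columns listed above
PROFILE = """Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation
language,wdt:P17,spoken in country,yes,yes,,,
language,wdt:P1999,UNESCO language status,no,yes,,,
"""


def cardinality(mand, repeat):
    """Map Mand/Repeat flags (assumed to be yes/no) to a ShExC suffix."""
    mandatory = mand.strip().lower() == "yes"
    repeatable = repeat.strip().lower() == "yes"
    if mandatory:
        return "+" if repeatable else ""
    return "*" if repeatable else "?"


for record in csv.DictReader(io.StringIO(PROFILE)):
    suffix = cardinality(record["Mand"], record["Repeat"])
    print(f'{record["Property"]} .{suffix} # {record["Property_label"]}')
# wdt:P17 .+ # spoken in country
# wdt:P1999 .* # UNESCO language status
```

These two lines agree with the corresponding `wdt:P17` and `wdt:P1999` constraints in the language shape shown earlier.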
- Easily generate shape expressions (ShEx) from CSV files and spreadsheets
- Simple syntax
Detailed documentation is available here, and a number of example CSV files can be found in the examples folder.
All the test cases can be run as follows:
$ python3 -m tests.tests
A code coverage report can also be generated by running the unit tests with the coverage tool.
$ coverage run --source=shexstatements -m unittest tests.tests
$ coverage report -m
shexstatements can also be accessed from a web interface.
Clone the ShExStatements repository.
$ git clone https://github.com/johnsamuelwrites/ShExStatements.git
Go to ShExStatements directory.
$ cd ShExStatements
Install modules required by ShExStatements (here: installing into a virtual environment).
$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .
Now run the application.
$ ./shexstatements.sh -r
Then open http://127.0.0.1:5000/ in a web browser.
ShExStatements also provides an API to generate ShEx from CSV, which is described here.
Online demonstrations are also available:
- John Samuel
- Contributors
- ShExStatements: Simplifying Shape Expressions for Wikidata, John Samuel, Wiki Workshop 2021 (held at The Web Conference 2021), 14 April 2021 (PDF, Slides)
- Wikidata Community
All code is released under the GPLv3+ licence. The associated documentation and other content are released under CC-BY-SA.