govdocs1

Sample data in this directory is based on analysis of files from the govdocs1 corpus, documented here:

https://digitalcorpora.org/corpora/files

The directories are named after the identifying number of a file from the corpus. For example, 000001.jpg would have its analysis under files/000/001. Because the file extensions are only suggestions from the corpus creators ("not part of the corpus"), they are not included in the directory names.
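The naming rule can be sketched in Python as follows. This is a minimal illustration, not code from this repository. The helper name `analysis_dir` is hypothetical, and the sketch assumes that every corpus file number has six digits, split 3/3 into two directory levels, as the 000001.jpg example suggests.

```python
from pathlib import PurePosixPath


def analysis_dir(corpus_filename: str) -> str:
    """Map a govdocs1 file name such as '000001.jpg' to 'files/000/001'.

    Hypothetical helper; assumes six-digit file numbers split 3/3.
    """
    # The extension is only a suggestion from the corpus creators, so drop it.
    stem = PurePosixPath(corpus_filename).name.split(".", 1)[0]
    if len(stem) != 6 or not stem.isdigit():
        raise ValueError(f"expected a six-digit file number, got {stem!r}")
    # First three digits name the parent directory, last three the child.
    return f"files/{stem[:3]}/{stem[3:]}"


print(analysis_dir("000001.jpg"))  # files/000/001
print(analysis_dir("799987"))      # files/799/987
```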

Predicates discovered

The file predicates_discovered.txt in this directory lists the ExifTool RDF predicates discovered by analyzing the JPEG files of the govdocs1 corpus (the files extracted from files.jpeg.tar). It was produced by extracting the .tar file into a directory, changing into that directory, and running this command:

exiftool \
  -binary \
  -duplicates \
  -recurse \
  -xmlFormat \
  . \
  | sed \
    -e "s_rdf:Description rdf:about='./_rdf:Description rdf:about='http://example.org/kb/govdocs1/_" \
    > ../files.jpeg.xml
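The sed step rewrites each relative subject such as ./000001.jpg into an absolute IRI under http://example.org/kb/govdocs1/. A rough Python equivalent follows, offered only to illustrate what the substitution does. The function name `absolutize_subjects` is hypothetical, and unlike the sed pattern it escapes the `.`, so only a literal `./` matches. Like sed's `s` command without the `g` flag, it rewrites at most the first match on each line.

```python
import re

# Base IRI taken from the sed command above.
BASE_IRI = "http://example.org/kb/govdocs1/"

# Matches the relative rdf:about value that exiftool writes when run on ".".
RELATIVE_ABOUT = re.compile(r"rdf:Description rdf:about='\./")


def absolutize_subjects(line: str) -> str:
    """Rewrite the first relative rdf:about on a line into an absolute IRI."""
    return RELATIVE_ABOUT.sub(
        f"rdf:Description rdf:about='{BASE_IRI}", line, count=1
    )


print(absolutize_subjects("<rdf:Description rdf:about='./000001.jpg'"))
# <rdf:Description rdf:about='http://example.org/kb/govdocs1/000001.jpg'
```

Applied line by line to exiftool's output, this should yield the same subjects as the sed command, assuming the only `./`-prefixed rdf:about values are the ones exiftool emits.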

Running this SPARQL query against the resulting 527 MB file, files.jpeg.xml, then yielded the predicates:

SELECT DISTINCT ?p
WHERE {
  ?s ?p ?o .
}

The results are in predicates_discovered.txt.
