This is a set of Kotlin libraries for reading and importing beneficial ownership data in the BODS JSON format.
The objective is to provide JVM capabilities for processing BODS data and to let applications import BODS registers into popular storage solutions.
The Open Ownership register is the reference for most of the work in these libraries, and the tools provided here support the download and ingestion of this register out of the box.
| Module | API docs | Description |
|---|---|---|
| kbods-rdf | API Docs | RDF vocabulary and related tooling |
| kbods-read | API Docs | Download, unpack and process BODS datasets |
| kbods-elasticsearch | API Docs | Import a BODS dataset to Elasticsearch |
| kbods-mongo | API Docs | Import a BODS dataset to MongoDB |
The libraries all have kbods-read as a transitive dependency, but otherwise they can be used separately or together in the same project.
dependencies {
    implementation "io.resoluteworks:kbods-read:${kbodsVersion}"
    implementation "io.resoluteworks:kbods-elasticsearch:${kbodsVersion}"
    implementation "io.resoluteworks:kbods-mongo:${kbodsVersion}"
    implementation "io.resoluteworks:kbods-rdf:${kbodsVersion}"
}
Since the operations in these libraries are mostly stateless, the design is based on Kotlin extension functions. This hopefully makes the code more readable and enables a fluent integration.
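To illustrate the extension-function style described above, here is a minimal sketch that uses only the Kotlin standard library. The names `countJsonlRecords` and `forEachJsonlRecord` are hypothetical and are not part of kbods; they only show how stateless operations can be attached to an existing type such as `File` so that call sites read fluently.

```kotlin
import java.io.File

// Hypothetical helper (not part of kbods): counts the non-blank lines of a JSONL file.
fun File.countJsonlRecords(): Int =
    useLines { lines -> lines.count { it.isNotBlank() } }

// Hypothetical helper (not part of kbods): applies a stateless action to every
// non-blank line, reading the file lazily and closing it afterwards.
fun File.forEachJsonlRecord(action: (String) -> Unit) {
    useLines { lines -> lines.filter { it.isNotBlank() }.forEach(action) }
}
```

A call site then looks like `File("/path/to/file.jsonl").forEachJsonlRecord { line -> println(line) }`, with no wrapper object to construct or configure.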
Please refer to each module's README for complete details.
// Download, unzip and import the latest Open Ownership register to Elasticsearch
BodsDownload.latest().import(
    elasticsearchClient = esClient,
    index = "my-index",
    batchSize = 100
)
// Import a local JSONL file to Elasticsearch
val jsonlFile = File("/path/to/statements.jsonl")
esClient.importBodsStatements(jsonlFile, "myindex", 100)
// Download, unzip and import the latest Open Ownership register to MongoDB
val collection = mongoClient.getDatabase("mydb").getCollection("mycollection")
BodsDownload.latest().import(
    collection = collection,
    batchSize = 100
)
// Import a local JSONL file to MongoDB
val jsonlFile = File("/path/to/statements.jsonl")
collection.importBodsStatements(jsonlFile, 100)
kbods-read is the base module for reading and loading a BODS register; the other libraries are built on top of it.
// Download and parse the latest Open Ownership register
BodsDownload.latest().readStatements { bodsStatement: BodsStatement ->
    // Process BodsStatement
}
// Read a BODS dataset from a local file (JSONL or GZ, decompressed if required)
File("/path/to/file.jsonl.gz").useBodsStatements { statements ->
    statements.forEach { statement: BodsStatement ->
        // Process BodsStatement
    }
}
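For readers curious about what "JSONL or GZ, decompressed if required" can involve, the following is a rough sketch built only on the Kotlin and Java standard libraries. It is not the kbods implementation: `useJsonLines` is a hypothetical name, it yields raw JSON lines rather than `BodsStatement` objects, and it assumes compression can be detected from the `.gz` file extension alone.

```kotlin
import java.io.File
import java.util.zip.GZIPInputStream

// Hypothetical stand-in for the library's file reading (not part of kbods).
// Exposes the file as a lazy sequence of non-blank lines and closes the
// underlying stream once the block returns.
fun <T> File.useJsonLines(block: (Sequence<String>) -> T): T {
    val raw = inputStream()
    // Assumption: compression is detected from the file extension only.
    val stream = if (name.endsWith(".gz")) GZIPInputStream(raw) else raw
    return stream.bufferedReader().useLines { lines ->
        block(lines.filter { it.isNotBlank() })
    }
}
```

Because the sequence is only valid inside the block, results should be materialised (for example with `toList()` or `count()`) before the block returns.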
The focus of kbods-rdf is to capture relationships between entities and their relevant interests, so certain BODS schema definitions don't have an RDF equivalent (yet). Because of this, you may want to use an RDF repository exclusively for graph-based queries, and then dereference the JSON statement details from a primary database.
Below is a very crude example of how to address this: it imports the BODS register into both Elasticsearch and an RDF repository while reading the JSONL dataset only once.
val repository = repositoryManager.getRepository("bods-rdf")
repository.connection.use { connection ->
    BodsDownload.latest().useStatementSequence { sequence ->
        sequence.chunked(1000).forEach { batch ->
            esClient.writeBodsStatements(batch, "myindex")
            connection.add(batch.toRdf())
        }
    }
}
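The single-pass pattern in the example above can be isolated from Elasticsearch and RDF entirely. The sketch below uses only the Kotlin standard library; `BatchSink` and `fanOut` are hypothetical names introduced for illustration and do not exist in kbods. It shows the core idea: chunk a lazily read sequence once and hand each batch to every destination before moving on.

```kotlin
// Hypothetical sink type (not part of kbods), standing in for an
// Elasticsearch index or an RDF repository connection.
fun interface BatchSink<T> {
    fun write(batch: List<T>)
}

// Consumes the sequence exactly once, passing each batch to every sink in
// the order given. Returns the number of batches written.
fun <T> Sequence<T>.fanOut(batchSize: Int, vararg sinks: BatchSink<T>): Int {
    var batches = 0
    chunked(batchSize).forEach { batch ->
        sinks.forEach { sink -> sink.write(batch) }
        batches++
    }
    return batches
}
```

Writing each batch to all sinks before fetching the next one keeps memory usage bounded by the batch size, at the cost of the slowest sink setting the overall pace.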