This application aims to provide
- an easy to configure,
- lightweight,
- DBMS agnostic and
- memory friendly
approach to unload data to .csv files according to the RFC4180 standard from any SQL database by using JDBC.
The general use case is the ability to extract data from a database without the need of a complex and heavy framework. This data can then be used to perform operations needed by the business such as validation, analysis, anonymization, testing, filtering and more.
The application takes an opinionated view on the format in which the data is extracted. It is kept as minimalistic as possible by cherry picking components from the spring framework to be able to extract data from a database. Nothing more, nothing less.
To configure the application the user must provide an application.yml. The below example can be found here.
spring:
datasource:
url: jdbc:mysql://localhost:3306/sakila?useSSL=false
username: root
password: helloworld
driver-class-name: com.mysql.jdbc.Driver
output-dir: target/
commit-interval: 10
page-size: 10
delimiter: "|"
quote: "\""
quote-escape: "\\"
table-definitions:
- file-name: Actor.csv
select-query: "SELECT actor_id, first_name, last_name"
from-query: "FROM Actor"
sort-column: actor_id
Datasource
The user must provide datasource credentials.
Furthermore, depending on the database, it might be required to add a jdbc driver to the class path. One way to do this is by adding a driver as a dependency using maven. The current POM file contains only a mysql driver and a hsqldb driver for test purposes. It is up to the user to add a driver corresponding to the actual database used.
<!-- START:: production database drivers -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
</dependency>
<!-- END:: production database drivers -->
General application configuration
- The output directory will be created based on the relative path to which the jar is deployed. It assumes write permission.
- The commit interval can be used to indicate how much rows should be kept in memory before writing them to a file.
- The page size determines how many records are fetched per read from the database.
- The delimiter is the separation character between columns.
- The quote is the character used to surround the content of a column if a column contains the delimiter.
- The quote-escape is the character used to escape an AS-IS quote character if a column contains the quote character.
Specific application configuration
For each table the user wishes to unload, a set of four configurations are needed:
- The desired file name followed by the .csv extension.
- The select part of the prepared statement query.
- The from part of the prepared statement query.
- The column used to order by.
After setting up an application.yml and possibly adding the correct jdbc driver to the classpath, the application can be packaged using maven.
The application can be executed by running the jar. The application can be made aware of its configuration by pointing it in the right direction by using spring.config.location as an argument.
java -jar jdbc-unload-1.0.0.jar --spring.config.location=yourlocation/application.yml
- Spring Boot 1.5.3.RELEASE official documentation
- Accessing Relational Data using JDBC with Spring guide
- Spring Batch JdbcPagingItemReader