kxrtiswithak/DataMigrationProject

MySQL Workbench Icon

🔥 Data Migration Project 🔥

Data Migration Project for Sparta Global

About ❓

As part of my training with Sparta Global, I was assigned the task of reading a CSV file and writing it to a SQL database. Two CSV files of employee details were provided: one a sample of 10,000 entries, the other 65,000 rows long.

MySQL Workbench and Server were used to monitor and run the database locally, hence the use of the icon.

Dependencies 💻

MVC Architecture 🗼

I adopted an MVC (model view controller) architecture for this project, to organise my packages and program in a way that is easy for others to understand and interpret.

Take for example the controller package, which contains the CSVReader and EmployeeManager classes: both act as intermediaries between the model and view packages, processing the data and then outputting it in a presentable manner.

I also created start and util packages for classes that did not fit within the other three packages (start for classes starting the program, util for utility classes used throughout the project).

Threading 👕

One of our trainer's requirements was for the application to use multiple threads. I achieved this by reading in a portion of the CSV file, then writing it to the database. The number of threads the user specifies determines how many records are written in one go, e.g. when writing the sample file containing 10,000 rows using 5 threads, approximately 2,000 rows are written at a time (not accounting for duplicates).
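The arithmetic behind that split can be sketched as follows. This is only an illustration, not the project's actual code: the `ChunkPlanner` class and its `plan` method are hypothetical names, and it assumes the rows are divided into contiguous ranges of near-equal size, with 0 threads treated as "threading off".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: works out which rows each thread would handle.
class ChunkPlanner {

    /** Splits rowCount rows into contiguous [start, end) ranges, one per thread. */
    static List<int[]> plan(int rowCount, int threadCount) {
        // Assumption: 0 threads means "no threading", so everything is one range.
        int parts = Math.max(1, threadCount);
        int base = rowCount / parts;       // rows every range receives
        int remainder = rowCount % parts;  // leftover rows, spread over the first ranges

        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < remainder ? 1 : 0);
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10,000 rows over 5 threads gives five ranges of 2,000 rows each.
        for (int[] range : plan(10_000, 5)) {
            System.out.println("rows " + range[0] + " to " + (range[1] - 1));
        }
    }
}
```

Spreading the remainder over the first few ranges keeps the portions within one row of each other when the row count does not divide evenly.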

I also incorporated batching into my program: instead of executing only one query at a time, queries could be added to a batch and run all at once. Both batching and threading could be toggled by the user (by inputting 0 threads and a boolean for batching), something I utilised when carrying out performance tests.
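A minimal sketch of how such a batching toggle might look with plain JDBC is shown below. The class name, the table and its columns are assumptions made for illustration and are not taken from the project; `addBatch`, `executeBatch` and `executeUpdate` are the standard `java.sql.PreparedStatement` calls.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch of toggling between batched and one-by-one inserts.
class BatchInsertSketch {

    // Assumed table and column names, for illustration only.
    static final String INSERT_SQL =
            "INSERT INTO employees (employee_id, first_name, last_name) VALUES (?, ?, ?)";

    /** Inserts every row and returns how many execute calls were sent to the database. */
    static int insertAll(Connection connection, List<String[]> rows, boolean useBatching) {
        int executeCalls = 0;
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                statement.setInt(1, Integer.parseInt(row[0]));
                statement.setString(2, row[1]);
                statement.setString(3, row[2]);
                if (useBatching) {
                    statement.addBatch();       // queue the insert, nothing is sent yet
                } else {
                    statement.executeUpdate();  // one execute call per row
                    executeCalls++;
                }
            }
            if (useBatching && !rows.isEmpty()) {
                statement.executeBatch();       // send all queued inserts together
                executeCalls++;
            }
        } catch (SQLException e) {
            // Wrapped as unchecked so the method can be called from a worker thread.
            throw new IllegalStateException("Insert failed", e);
        }
        return executeCalls;
    }
}
```

With batching on, the number of execute calls drops from one per row to one per portion, which is the difference the performance tests set out to measure.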

Performance 🎭

I conducted performance testing with parameterised tests using JUnitParams. I created a CSV of test cases covering a variety of thread counts, each run both batched and not batched. There is currently a bug in the testing that causes the measured performance to be wildly inaccurate; it is related to threads and the way in which the test calls the program.

Sample

Large

DAO & DTO 📋

In order to interact with the database, I created a Data Access Object, containing methods to create a table and insert into it, as well as to select its contents and print them out.

A Data Transfer Object was used to store the data from the CSV file in a format compatible with the database, something especially important for the Date field.
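To illustrate why the Date field needs care, here is a small sketch of a DTO that converts a CSV date string into a `java.sql.Date`. The field names, the column order and the `M/d/uuuu` date pattern are all assumptions, since the actual CSV layout is not described here.

```java
import java.sql.Date;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical DTO: holds one CSV row in types the database driver accepts.
class EmployeeDTO {

    // Assumed CSV date pattern (month/day/year), for illustration only.
    private static final DateTimeFormatter CSV_DATE = DateTimeFormatter.ofPattern("M/d/uuuu");

    private final int employeeId;
    private final String firstName;
    private final Date dateOfBirth;

    /** Builds the DTO from raw CSV fields: id, first name, date of birth. */
    EmployeeDTO(String[] csvFields) {
        this.employeeId = Integer.parseInt(csvFields[0].trim());
        this.firstName = csvFields[1].trim();
        // Parse the text into a LocalDate first, then convert to java.sql.Date,
        // which PreparedStatement.setDate accepts directly.
        LocalDate parsed = LocalDate.parse(csvFields[2].trim(), CSV_DATE);
        this.dateOfBirth = Date.valueOf(parsed);
    }

    int getEmployeeId() {
        return employeeId;
    }

    String getFirstName() {
        return firstName;
    }

    Date getDateOfBirth() {
        return dateOfBirth;
    }
}
```

Doing the conversion once, in the DTO's constructor, means the DAO can pass the value straight to the driver without knowing anything about the CSV's text format.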

Future Enhancements 🛃

  • There is work to be done on fixing the performance tests so that they behave accurately and provide relevant data.
  • The program also lacks functional testing, which is essential for confirming that the system works as it should and handles all sorts of situations correctly.
  • Implementing ExecutorService for more intelligent thread management, and thus better performance, is a feature I would look to add.
  • Refactoring CSVReader is also on the cards, since I believe it has multiple responsibilities and therefore does not adhere to SOLID principles.
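As a rough idea of what the ExecutorService enhancement could look like, the sketch below hands fixed-size portions of the records to a thread pool. It is not part of the project: `PooledWriter`, `writeInChunks` and the `chunkWriter` callback are hypothetical names, with the callback standing in for whatever DAO method writes a portion to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: a fixed thread pool writes the records in portions.
class PooledWriter {

    /** Submits one task per portion and returns the total number of records handled. */
    static <T> int writeInChunks(List<T> records, int threadCount, int chunkSize,
                                 Consumer<List<T>> chunkWriter) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < records.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, records.size());
                List<T> chunk = records.subList(start, end);
                // The pool decides which thread runs each task; idle threads pick up new work.
                pending.add(pool.submit(() -> {
                    chunkWriter.accept(chunk);
                    return chunk.size();
                }));
            }
            int written = 0;
            for (Future<Integer> result : pending) {
                written += result.get(); // blocks until that portion has finished
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while writing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A portion failed to write", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Compared with creating one thread per portion by hand, a pool reuses its threads and lets the portion size be chosen independently of the thread count.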
