A file partitioner for random data that simulates HDFS (version 1) data node behaviour while storing everything locally.

Files Partitioner (HDFS Study Purposes)

This project provides functions to repartition files into chunks using randomly generated data. It is an example created to explore, locally, the main concepts of HDFS (first version): block size, metadata store, and repartitioning.
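
To make the block-size idea concrete, here is a minimal sketch of splitting a file into fixed-size chunks. The function name and the 64 MB default (HDFS v1's default block size) are illustrative, not this project's actual API.

    # Illustrative sketch: split a file into fixed-size blocks.
    BLOCK_SIZE = 64 * 1024 * 1024  # HDFS v1 default block size (64 MB)

    def split_into_blocks(path, block_size=BLOCK_SIZE):
        """Yield (block_index, chunk_bytes) pairs of at most block_size each."""
        with open(path, "rb") as f:
            index = 0
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                yield index, chunk
                index += 1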

Core approach

The general approach is fully implemented and validated by the test in:

sentiance.service.FolderTest

This approach manages the master dataset information as JSON serialized to disk.

It is important to note that this JSON does not contain the file bytes; it contains a dictionary with the folder and partition paths and their metadata.
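
As a rough illustration, the serialized metadata might look like the structure below. The keys and layout are an assumption made for this sketch, not the project's actual schema.

    import json

    # Hypothetical layout: only paths and metadata, never file bytes.
    master_dataset = {
        "folders": {
            "folder_a": {
                "partitions": ["folder_a/part-0000", "folder_a/part-0001"],
                "block_size": 64 * 1024 * 1024,
                "total_bytes": 134217728,
            }
        }
    }

    with open("master_dataset.json", "w") as f:
        json.dump(master_dataset, f, indent=2)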

In the Partitioner class you will find MAX_DISK_ALLOCATION, the maximum amount of space that may be allocated on disk across all stored data.

If the data grows beyond MAX_DISK_ALLOCATION, the program switches to a parallel strategy (not implemented yet).
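
A minimal sketch of how such a guard could behave is shown below; the store function and the 10 GB cap are assumptions for illustration, not the actual Partitioner code.

    MAX_DISK_ALLOCATION = 10 * 1024 * 1024 * 1024  # assumed cap: 10 GB

    def store(total_allocated_bytes, chunk):
        """Persist a chunk locally, or defer once the disk cap is exceeded."""
        if total_allocated_bytes + len(chunk) > MAX_DISK_ALLOCATION:
            # This is where the parallel strategy would take over.
            raise NotImplementedError("parallel strategy not implemented yet")
        with open("blocks.bin", "ab") as f:  # hypothetical local block store
            f.write(chunk)
        return total_allocated_bytes + len(chunk)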

How do I get set up?

  • Run the unit tests:

    py folder_test.py

  • Run the files generator (see the example invocation after this list):

    py data_generation.py {path_ds} {max_value_file} {str_folders}

  • Run the files updater:

    py data_update.py {path_ds} {str_folders}

  • Run the folder backup:

    py data_back_up.py {input_folder} {output_folder}
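
For example, a hypothetical generator run might look like this (the argument values are illustrative only, not taken from the project):

    py data_generation.py ./dataset 1000 folder_a,folder_b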
