Imports raw JSON into Elasticsearch using multiple threads
The script supports 5 modes:
- Only validating data
- Import data to Elasticsearch without validation
  - Import using single-thread
  - Import using multi-thread
- Import data to Elasticsearch after validation
  - Import using single-thread
  - Import using multi-thread
 
 
Install the elasticsearch package with pip:
    pip install elasticsearch
Read more about versions here
--data          : The data file
--check         : Validate data file
--bulk          : ElasticSearch endpoint ( http://localhost:9200 )
--index         : Index name
--type          : Index type
--import        : Import data to ES
--thread        : Number of threads, default = 1
--help          : Display help message
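
As an illustration of how the options above fit together, here is a minimal sketch of an `argparse` setup that accepts the same flags. It is not the actual code of `import.py`; the names `build_parser`, `describe_mode` and `do_import` are made up for this example, and `--help` is provided by `argparse` itself.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Import raw JSON into Elasticsearch")
    parser.add_argument("--data", required=True, help="The data file")
    parser.add_argument("--check", action="store_true", help="Validate data file")
    parser.add_argument("--bulk", help="Elasticsearch endpoint ( http://localhost:9200 )")
    parser.add_argument("--index", help="Index name")
    parser.add_argument("--type", help="Index type")
    # "import" is a Python keyword, so the flag is stored under another name.
    parser.add_argument("--import", dest="do_import", action="store_true",
                        help="Import data to ES")
    parser.add_argument("--thread", type=int, default=1, help="Threads amount, default = 1")
    return parser


def describe_mode(args):
    """Map the parsed flags onto one of the five modes (hypothetical helper)."""
    if not args.do_import:
        return "validate only" if args.check else "nothing to do"
    validation = "after validation" if args.check else "without validation"
    threading = "multi-thread" if args.thread > 1 else "single-thread"
    return f"import {validation}, {threading}"


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--data", "test_data.json", "--check"])
print(describe_mode(args))
```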
I suggest validating your data before (or during) the import process
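
To give an idea of what such a check might look like, here is a small sketch. It assumes the data file holds one JSON object per line, which this README does not state explicitly, and the function name `validate_lines` is invented for the example. The real `--check` logic may differ.

```python
import json


def validate_lines(lines):
    """Return (line_number, message) pairs for lines that are not valid JSON objects."""
    errors = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not reported
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            errors.append((number, "not a JSON object"))
    return errors


# Assumed sample data: line 2 is truncated, line 3 is an array rather than an object.
sample = ['{"id": 1}', '{"id": 2', '[1, 2, 3]']
for number, message in validate_lines(sample):
    print(f"line {number}: {message}")
```

Because the function accepts any iterable of strings, an open file object can be passed in directly, so a large file would be checked line by line.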
    python import.py --data test_data.json --check
    python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name
    python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --check
    python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --thread 16
    python import.py --data test_data.json --import --bulk http://localhost:9200 --index index_name --type type_name --check --thread 16

Using multiple threads makes the process much faster, depending on your computer/server resources. This script uses linecache to load the data into RAM, so you also need enough memory
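
The combination of linecache and threads can be sketched roughly as follows. This is an assumption about the general approach, not the script's real code: the file is split into contiguous line ranges, and each worker reads its range through `linecache.getline`. The helpers `split_ranges`, `import_range` and `threaded_import`, and the `send` callback, are hypothetical names, and the sketch again assumes one JSON object per line.

```python
import json
import linecache
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_lines, threads):
    """Split line numbers 1..total_lines into contiguous (start, end) ranges."""
    if total_lines <= 0:
        return []
    size = -(-total_lines // threads)  # ceiling division
    return [(start, min(start + size - 1, total_lines))
            for start in range(1, total_lines + 1, size)]


def import_range(path, start, end, send):
    """Read lines start..end from the linecache and pass each document to send."""
    count = 0
    for number in range(start, end + 1):
        line = linecache.getline(path, number).strip()
        if line:
            send(json.loads(line))
            count += 1
    return count


def threaded_import(path, total_lines, threads, send):
    """Process every line range in a thread pool; return the number of documents sent."""
    linecache.getline(path, 1)  # load the file into the cache once, before the workers start
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(import_range, path, start, end, send)
                   for start, end in split_ranges(total_lines, threads)]
        return sum(future.result() for future in futures)
```

Here `send` stands in for whatever delivers a document to Elasticsearch. In a real run it would probably collect documents into batches for a bulk request, for example through `elasticsearch.helpers.bulk`, rather than send them one at a time.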
- AMD Ryzen 3800X ( 8 core / 16 thread )
- 64GB RAM ( 3000MHz / CL16 )
- Windows 10
- 10GB JSON file with ~24 million objects
- Elasticsearch v7
 
The whole process took about 30 minutes, and resource usage was efficient
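
For a rough sense of scale, the figures above can be turned into an approximate throughput. The calculation below only uses the numbers quoted in this README (about 24 million objects, a 10GB file, about 30 minutes) and treats 1GB as 1000MB, so the result is an estimate, not a measurement.

```python
objects = 24_000_000   # ~24 million objects
file_size_gb = 10      # 10GB JSON file
minutes = 30           # ~30 minutes for the whole process

seconds = minutes * 60
objects_per_second = objects / seconds
mb_per_second = file_size_gb * 1000 / seconds

print(f"{objects_per_second:,.0f} objects per second")  # 13,333 objects per second
print(f"{mb_per_second:.1f} MB per second")             # 5.6 MB per second
```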
- Fork it!
- Create your feature branch:
    git checkout -b my-new-feature
- Commit your changes:
    git commit -am 'Add some feature'
- Push to the branch:
    git push origin my-new-feature
- Submit a pull request :D
 
Every project may have problems. You can contribute to the development of this project by reporting them


