Issue to Solve
To improve the getting-started UX, we want to give users an easy way to load a large amount of data quickly (to show the speeds that are attainable), without having to write their own producer/multi-consumer parallelised loader. Instead, we can build this paradigm into console.
The goal is to have a way to consume a file consisting purely of `insert` or `match-insert` queries, which we can restrict to one query per line. This file can be large (hundreds of MB, possibly a few GB), so the loader must spread the work across multiple transactions. Compare this to the `source` command that we currently have within the `transaction` inner REPL, which, by virtue of living in the transaction REPL, runs as a single transaction.

The feature will probably live at a session-level or top-level REPL and could look like this:
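For illustration only — the command name, flag names, and file extension below are hypothetical, not a settled design:

```
# sequential, batched loading: one query per line, committed in batches
> load data.gql --batch-size 50

# parallel loading; requires that the queries be mutually independent
> load data.gql --batch-size 50 --parallel
```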
Without the `--parallel` flag, we should load the queries sequentially in batches, because the queries may have inter-dependencies. In the help menu we should state that the `--parallel` flag requires each query to be independent of the others (e.g. it must not use a prior insert's results). This lets us adopt the BioGrakn-Semmed-migrator style of data loading: a file-reader thread feeds a blocking queue, which is drained by multiple writer threads that commit transaction batches to the server in parallel (sketched below).
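To make that paradigm concrete, here is a minimal plain-JDK sketch of the producer/multi-consumer pattern, assuming the queue-size, writer-count, and batch-size values shown; the `commitBatch` stub stands in for the real client call, which would open a write transaction, run each query in the batch, and commit once:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Stream;

public class ParallelLoader {
    // Sentinel telling a writer thread that the file is exhausted.
    private static final String POISON = "\0EOF\0";

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args[0]);
        int writers = 8;        // number of writer (consumer) threads
        int batchSize = 50;     // queries committed per transaction
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

        // Writer threads: drain the queue, committing one transaction per batch.
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            Thread t = new Thread(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        String query = queue.take();
                        if (query.equals(POISON)) break;
                        batch.add(query);
                        if (batch.size() == batchSize) {
                            commitBatch(batch);
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) commitBatch(batch); // trailing partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        // Producer: the file-reader thread (the main thread here) pushes one
        // query per line into the bounded queue, blocking when it is full.
        try (Stream<String> lines = Files.lines(file)) {
            for (String line : (Iterable<String>) lines::iterator) {
                if (!line.isBlank()) queue.put(line);
            }
        }
        // One sentinel per writer so every thread shuts down cleanly.
        for (int i = 0; i < writers; i++) queue.put(POISON);
        for (Thread t : threads) t.join();
    }

    // Placeholder for the real client API: open a write transaction,
    // execute every query in the batch, then commit once.
    private static void commitBatch(List<String> batch) {
        System.out.printf("[%s] committed batch of %d queries%n",
                Thread.currentThread().getName(), batch.size());
    }
}
```

The bounded queue is what makes this safe for multi-GB files: the reader blocks when writers fall behind, so memory use stays constant regardless of file size.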
This loader command should also be silent, or show a progress bar, unlike the `source` command, which prints the output of every query. That printing makes `source` slow, not only because of the extra network round trips needed to collect the printed data, but also because printing itself is slow.