Issue to Solve
To improve the getting-started UX, we want to give users an easy way to load a large amount of data quickly (to show the speeds that are attainable), without having to write their own producer/multi-consumer parallelised loader. Instead, we can build this paradigm into console.
The goal is to have a way to consume a file consisting purely of `insert` or `match-insert` queries, which we can restrict to one query per line. This file can be large (hundreds of MB, possibly a few GB), so the loader must spread the work across multiple transactions. Compare this to the `source` command that we currently have within the `transaction` inner REPL, which, by virtue of living in the transaction REPL, runs as a single transaction.

The feature will probably live at a session-level or top-level REPL and could look like this:
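For illustration only — the command name, flag names, and file extension below are hypothetical, not a settled design:

```
# sequential, batched loading: one query per line, committed in batches
> load data.gql --batch-size 50

# parallel loading; requires that the queries be mutually independent
> load data.gql --batch-size 50 --parallel
```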
Without the `--parallel` flag, we should load the queries sequentially in batches, because the queries may have inter-dependencies. In the help menu we should state that the `--parallel` flag requires each query to be independent of the others (e.g. it must not use a prior insert's results). This lets us adopt the BioGrakn-Semmed-migrator style of data loading: a file-reader thread feeds a blocking queue, which is drained by multiple writer threads that commit transaction batches to the server in parallel (sketched below).
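To make that paradigm concrete, here is a minimal plain-JDK sketch of the producer/multi-consumer pattern, assuming the queue-size, writer-count, and batch-size values shown; the `commitBatch` stub stands in for the real client call, which would open a write transaction, run each query in the batch, and commit once:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Stream;

public class ParallelLoader {
    // Sentinel telling a writer thread that the file is exhausted.
    private static final String POISON = "\0EOF\0";

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args[0]);
        int writers = 8;        // number of writer (consumer) threads
        int batchSize = 50;     // queries committed per transaction
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

        // Writer threads: drain the queue, committing one transaction per batch.
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            Thread t = new Thread(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        String query = queue.take();
                        if (query.equals(POISON)) break;
                        batch.add(query);
                        if (batch.size() == batchSize) {
                            commitBatch(batch);
                            batch.clear();
                        }
                    }
                    if (!batch.isEmpty()) commitBatch(batch); // trailing partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        // Producer: the file-reader thread (the main thread here) pushes one
        // query per line into the bounded queue, blocking when it is full.
        try (Stream<String> lines = Files.lines(file)) {
            for (String line : (Iterable<String>) lines::iterator) {
                if (!line.isBlank()) queue.put(line);
            }
        }
        // One sentinel per writer so every thread shuts down cleanly.
        for (int i = 0; i < writers; i++) queue.put(POISON);
        for (Thread t : threads) t.join();
    }

    // Placeholder for the real client API: open a write transaction,
    // execute every query in the batch, then commit once.
    private static void commitBatch(List<String> batch) {
        System.out.printf("[%s] committed batch of %d queries%n",
                Thread.currentThread().getName(), batch.size());
    }
}
```

The bounded queue is what makes this safe for multi-GB files: the reader blocks when writers fall behind, so memory use stays constant regardless of file size.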
This loader command should also be silent, or show a progress bar, unlike the `source` command, which prints the output of every query. That printing makes `source` slow, not only because of the extra network round trips needed to collect the printed data, but also because printing itself is slow.