PERFORMANCE: Make AIP runnable through Dask or another platform to parallelize the parsing #5

lfdversluis · 2020-06-24T11:01:39Z

Each sub-set of data and each data source can be processed in parallel. Dask can be used to parallelize this.

lfdversluis · 2020-09-05T15:49:41Z

joblib (https://joblib.readthedocs.io/en/latest/) seems promising.

lfdversluis · 2020-09-06T08:28:06Z

It might be interesting to investigate whether the XML file and the JSON files of Semantic Scholar / AMiner can be parallelized at the item level. With joblib (linked above), file-level parallelization becomes possible, yet the JSON files are structured such that each line in the file is one standalone JSON object. Parsing these lines in parallel may be even faster.
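A rough sketch of what item-level parsing could look like for the one-JSON-object-per-line files. It uses the standard library's `concurrent.futures` as a stand-in rather than joblib or Dask, and the names `parse_chunk`, `chunked`, and `parse_json_lines` are hypothetical, not existing AIP functions. Lines are batched into chunks so the per-task overhead does not dominate; whether this actually beats file-level parallelism would need measuring.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from itertools import islice


def parse_chunk(lines):
    """Parse a batch of JSON Lines records; blank lines are skipped."""
    return [json.loads(line) for line in lines if line.strip()]


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_json_lines(lines, workers=4, chunk_size=10_000,
                     executor_cls=ProcessPoolExecutor):
    """Parse an iterable of JSON lines in parallel, preserving input order.

    `executor_cls` is an assumption of this sketch: it lets the caller swap
    in another concurrent.futures executor (e.g. ThreadPoolExecutor).
    """
    records = []
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in submission order, so the output order
        # matches the order of the input lines.
        for parsed in pool.map(parse_chunk, chunked(lines, chunk_size)):
            records.extend(parsed)
    return records
```

Usage would be along the lines of `with open(path) as fh: records = parse_json_lines(fh)`, where `path` points at one of the JSON files. On platforms that spawn worker processes (Windows, macOS), the call needs to sit under an `if __name__ == "__main__":` guard.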

lfdversluis · 2020-09-06T08:29:04Z

Setting up some benchmarks + regression tests might be a nice idea as well.
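A minimal sketch of how a benchmark and a regression check could fit together, assuming the serial line-by-line parser is taken as the reference. The helpers `parse_serial`, `benchmark`, and `check_regression` are hypothetical names for illustration, not part of AIP.

```python
import json
import time


def parse_serial(lines):
    """Reference implementation: parse one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def benchmark(func, lines, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(lines)
        best = min(best, time.perf_counter() - start)
    return best


def check_regression(candidate, lines):
    """Return True if `candidate` yields the same records as the serial parser."""
    return candidate(lines) == parse_serial(lines)
```

Any parallel parser would first have to pass `check_regression` on a fixed sample before its `benchmark` timing is compared against the serial baseline.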

lfdversluis added the enhancement and BSc project labels Jun 24, 2020

lfdversluis changed the title ~~Make AIP runnable through Dask or another platform to parallelize the parsing~~ PERFORMANCE: Make AIP runnable through Dask or another platform to parallelize the parsing Sep 6, 2020
