lfdversluis changed the title from "Make AIP runnable through Dask or another platform to parallize the parsing" to "PERFORMANCE: Make AIP runnable through Dask or another platform to parallize the parsing" on Sep 6, 2020
It might be interesting to investigate whether the XML file and the JSON files of Semantic Scholar / AMiner can be parallelized at the item level. With joblib (linked above), file-level parallelization becomes possible, yet the JSON files are structured so that each line in the file is one standalone JSON object. Parsing these lines in parallel might be even faster.
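A rough sketch of what line-level parallelization could look like, using only the standard library rather than joblib. The names `parse_chunk`, `chunked`, and `parse_lines_parallel`, as well as the default chunk size, are made up for illustration and are not part of AIP. Lines are sent to workers in batches, since one task per line would likely be dominated by inter-process overhead.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Any, Iterable, Iterator, List, Optional


def parse_chunk(lines: List[str]) -> List[Any]:
    """Parse a batch of lines; each non-empty line is one standalone JSON object."""
    return [json.loads(line) for line in lines if line.strip()]


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parse_lines_parallel(
    lines: Iterable[str],
    chunk_size: int = 10_000,  # assumed value; would need tuning on real data
    workers: Optional[int] = None,
    executor_cls=ProcessPoolExecutor,  # any concurrent.futures executor class
) -> List[Any]:
    """Parse JSON Lines input in parallel, preserving the input order."""
    results: List[Any] = []
    with executor_cls(max_workers=workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        for parsed in pool.map(parse_chunk, chunked(lines, chunk_size)):
            results.extend(parsed)
    return results
```

An open file handle can be passed directly as `lines`, since iterating over it yields one line at a time. When calling this from a script with the default process pool, the call should sit under an `if __name__ == "__main__":` guard. Whether this actually beats file-level parallelization would have to be measured, because the lines still need to be pickled and shipped to the workers.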
Each subset of data and each data source can be processed in parallel. Dask can be used to parallelize this.
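As a minimal sketch of the Dask route, assuming the line-delimited JSON sources are available as a set of files matching a glob pattern: `dask.bag.read_text` creates at least one partition per file, and the partitions are then parsed in parallel. The function name `load_records` and the file pattern in the usage note are hypothetical and not taken from AIP.

```python
import json

import dask.bag as db


def load_records(path_pattern: str) -> db.Bag:
    """Build a lazy Dask bag of parsed records from line-delimited JSON files.

    Nothing is read or parsed until .compute() (or another action) is called.
    """
    return (
        db.read_text(path_pattern)            # one string per line, lazily
        .filter(lambda line: line.strip())    # skip blank lines
        .map(json.loads)                      # each line is a standalone JSON object
    )
```

Usage could look like `records = load_records("data/part-*.json").compute()`, where the path is only a placeholder. Passing a `blocksize` to `read_text` additionally splits large files into several partitions, which would give parallelism within a single file as well.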