Hi, I am using Papa Parse with `worker: true` and `step:` so that I can parse a huge file off the main thread and with streaming, so that the parsed data doesn't need to be held in memory all at once.
I am using Dexie to store the parsed entries in IndexedDB. Right now Papa Parse is not locking up the page because it is parsing in the web worker and streaming the results, but the IPC between the main thread and the worker thread and Dexie storing the entries one-by-one is locking the page.
The proper solution here, I think, is to drop `worker: true` (the implicit worker) and instead write my own web worker that runs Papa Parse in worker-less but streaming mode, with Dexie running in the same worker. That way both parsing and storing happen off the main thread, and because it is streaming, memory doesn't just grow until the whole file is parsed.
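Whichever thread ends up doing the storing, the row-by-row writes can be avoided while still streaming by buffering rows from `step:` and flushing them in groups. Below is a minimal sketch of that idea. `createRowBatcher`, `batchSize`, and `flush` are hypothetical names, not part of Papa Parse or Dexie; the assumption is that `flush` would wrap a bulk write such as Dexie's `bulkAdd`.

```typescript
// Hypothetical helper: buffers parsed rows and hands them to `flush` in groups.
// `flush` is assumed to start a bulk write (for example a Dexie bulkAdd call).
type Flush<T> = (rows: T[]) => void;

function createRowBatcher<T>(batchSize: number, flush: Flush<T>) {
  let buffer: T[] = [];

  // Hands the current buffer to `flush` and starts a fresh one.
  const drain = (): void => {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    flush(batch);
  };

  return {
    // Intended to be called from the step callback with each parsed row.
    push(row: T): void {
      buffer.push(row);
      if (buffer.length >= batchSize) drain();
    },
    // Intended to be called from the complete callback for the leftover rows.
    end(): void {
      drain();
    },
  };
}
```

Memory stays bounded by `batchSize` rows at a time rather than the whole file, and the number of storage calls drops by roughly that factor.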
I am trying to put this solution off until I have addressed other aspects of my app, though, because making TypeScript and the Next build process aware of my web worker is a huge pain and would take a lot of time that I would rather not invest for now.
My compromise for now is to use `worker: true` without `step:`, so no streaming. I will pay the price of the whole parsed file being in memory and will marshal it between the worker and the main thread in one message via the `complete:` handler, which will receive the entire dataset since there is no `step:`. Then I can store the data with Dexie in bulk instead of row by row.
However, with this solution I lose the ability to display progress. Previously I divided `results.meta.cursor` by the `size` of the `File` instance (or, in the case of a URL, I first made a `HEAD` request to find out the `Content-Length` and then handed the URL to Papa Parse).
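For reference, the progress calculation described above amounts to something like the following sketch. `progressFraction` and `parseContentLength` are hypothetical helper names; the clamping is there on the assumption that the cursor and the byte size may not line up exactly (for example with multi-byte characters), and a missing `Content-Length` is treated as "progress unknown".

```typescript
// Hypothetical helper: turns a parser cursor and a known total size into a
// fraction between 0 and 1, or null when the total size is unknown.
function progressFraction(cursor: number, totalBytes: number | null): number | null {
  if (totalBytes === null || !Number.isFinite(totalBytes) || totalBytes <= 0) {
    return null;
  }
  // Clamp, in case the cursor and the byte count do not match exactly.
  return Math.min(1, Math.max(0, cursor / totalBytes));
}

// Hypothetical helper: reads a Content-Length header value (as returned by a
// HEAD request) into a number, or null when it is absent or malformed.
function parseContentLength(header: string | null): number | null {
  if (header === null) return null;
  const parsed = Number.parseInt(header, 10);
  return Number.isNaN(parsed) || parsed < 0 ? null : parsed;
}
```

For a local file the total would come from `file.size`; for a URL it would come from something like `parseContentLength(response.headers.get("Content-Length"))` on the `HEAD` response.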
I would like to float the idea of introducing a `progress:` callback that can only be set when `step:` is not provided. It would be called every now and then (it doesn't have to be for every row, but can be), and the main thread would then be able to use it to read `cursor` and do the progress reporting. This callback should carry no data; the `complete:` callback should still be the one to relay the data in the absence of `step:`.
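To make the proposal concrete, here is a rough sketch of the shape such a callback could take. None of this exists in Papa Parse today: `ProgressMeta`, `ProgressCallback`, and `throttleProgress` are hypothetical names, and the "every N rows" throttle is just one possible way to satisfy "every now and then".

```typescript
// Hypothetical: NOT part of Papa Parse's current API. The callback carries
// only position information, never parsed rows.
interface ProgressMeta {
  cursor: number;
}
type ProgressCallback = (meta: ProgressMeta) => void;

// Hypothetical throttle the parser side could use: invokes `onProgress` once
// every `everyRows` rows instead of for each row. Assumes everyRows >= 1.
function throttleProgress(everyRows: number, onProgress: ProgressCallback) {
  let rowsSeen = 0;
  return (cursor: number): void => {
    rowsSeen += 1;
    if (rowsSeen % everyRows === 0) {
      onProgress({ cursor });
    }
  };
}
```

With a throttle like this, the worker would post a small progress message only occasionally, so the message traffic to the main thread stays low while `complete:` still delivers the data once.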
LMK WDYT, and whether I missed a way to do this with the existing API or made a wrong assumption about how Papa Parse works that invalidates the need for this.
Unfortunately, I wasn't able to implement this without the PapaParse library exposing the progress and completion callbacks. I stuck with loading the whole CSV into memory at once and flushing all the rows as a group.