From a first look at their code, it seems they split the input file into chunks and process each chunk in a different thread, at least for plain text files. They determine the chunk boundaries by seeking a predetermined number of bytes into the file, then scanning forward until they find a newline (and dealing with quoted sections that span newlines). After this first seek-based pass through the file, they start the threads that do the actual work. I'm not sure how we could copy this idea if we want to support streaming (we can't seek).
I'm not sure I understand why they do this. Reading the bytes from disk shouldn't take much CPU time. Stuffing chunks into a queue needs extra memory, but it removes the need for complicated seeking.
How big a buffer do we need to keep N>1 consumers busy?
http://www.wise.io/tech/paratext