db.createWriteStream() does not respect 'prefix' option #53
Can you tell me more about the use case? Are you doing a bulk load of an entire db? I would be happy to merge a pull request for this.
I have a ton of data I am trying to initialize the database with, probably half a million or more small JSON inserts. The data is streamed in from files on the local machine, transformed into JSON, then inserted into the database. I am using a multilevel database which runs as a separate application. Originally I was just doing puts and waiting for the callback before proceeding with more puts, assuming the callback would do some kind of flow control, but this was not the case: eventually the multilevel server would crash. So now I pipe to a write stream on each sublevel. I thought this would fix the problem, but the db server is still crashing about halfway through the data. This is less an issue with sublevel and more me trying to figure out how to pipeline my insert process without overwhelming the database or introducing some kind of artificial slowdown.
But what is the data? Is it data you are pulling out of an SQL database, or CSVs? Is it from one file or many files? I ask because it's much easier to analyze a problem if I can visualize it. It sounds like this is really just a bulk-loading problem. I know these can be difficult, and I don't think we have a general-purpose solution yet, but I know that @maxogden knows the most about this, and also @brycebaril.
The data is coming from Excel files (xls). Each row in a file contains some time-stamped data. There are about 40k Excel files with maybe 10 or so rows each. I'm reading them in one file at a time and creating a stream of rows, each of which eventually gets inserted under a unique key based on the timestamp. Fortunately I only need to do this once, and I can restart the server and continue where I left off for the most part, so I can get this done; it's just something I have been trying to work out. What exactly is the bulk-loading problem? Are the streams that leveldb creates not full streams? Perhaps this is also due to my use of level-sec, which is creating secondary indices that are getting pushed into the database at the same time?
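One way to build the unique, sortable keys mentioned above (a hypothetical helper, not from the thread): zero-padding the timestamp keeps lexicographic key order equal to chronological order, and a sequence suffix keeps keys unique when timestamps collide:

```javascript
// Hypothetical key builder for timestamp-based leveldb keys.
function tsKey(timestampMs, seq) {
  // Pad so string comparison matches numeric comparison.
  return String(timestampMs).padStart(15, '0') + '!' + String(seq).padStart(4, '0');
}

console.log(tsKey(1404000000000, 3)); // '001404000000000!0003'
```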
Oh, now knowing that changes everything! What Excel parser are you using? I was looking into this recently and ended up just shelling out to https://www.npmjs.org/package/j, but was, to be honest, pretty horrified to read the code. I wouldn't be surprised if there was a memory leak in the xls stuff. Does it still crash if you load a plain file? You have a lot of moving parts here; can you make a simple script that reproduces the problem? What sort of crash is this? Can you post the error message? But to be honest, since you only need to do this once, the simplest fix would just be a way to recover from the last successful write before the crash: if the timestamps are in order, you could look in the db for the last timestamp and then open the next file.
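The resume-after-crash idea can be sketched like this. The arrays stand in for the db's sorted keyspace and the incoming rows; with levelup you would fetch the last written key via `db.createReadStream({ reverse: true, limit: 1 })`:

```javascript
// Sketch only: in-memory stand-ins for the db keyspace and the bulk load.
const writtenKeys = ['001', '002', '005'];            // already in the db
const incoming = ['001', '002', '005', '006', '007']; // full load, in key order

const last = writtenKeys[writtenKeys.length - 1];     // last successful write
const toWrite = incoming.filter((k) => k > last);     // resume just after it

console.log(toWrite); // [ '006', '007' ]
```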
The only Excel parser I found that works on the files I have is pyspreadsheet. It has its own issues, but I've managed to work around them. multilevel is running as a separate application from everything else, and it's this that is crashing. I have gotten a couple of different error messages when it crashes:
and this
which might be related to this: It makes me think that the write stream on leveldb is not pausing the incoming stream and is eventually overflowing. But I don't really know the details of how it all works; I am just speculating. I might try to make a test case to duplicate this problem.
Oh. Do you have the latest multilevel? It looks like one of those errors comes from json-buffer. What does `npm ls` show?
@daywiss Could you test against j directly, or against http://oss.sheetjs.com/js-xls/ or http://oss.sheetjs.com/js-xlsx/? If there is a file that can't be read, I'd definitely like to investigate further (especially if it is a memory leak).
@dominictarr I have not been able to make a test case that reproduces the problem, sadly, though I am still experiencing this range error crash in my application. multilevel version 6.0.0, json-buffer 2.0.9. @SheetJSDev When I try to drop my xls file onto your app nothing happens, and when I paste it as base64 nothing happens either. Here is the base-64
@daywiss Thanks for sharing! This is a BIFF 2.0 file (the format of Excel 2.0 from 1987). Out of curiosity, how did you generate this file? I ask because the latest versions of Excel don't even allow you to save in the format. Just opening and trying to save the file triggers an error: |
@SheetJSDev It's generated by software that is used to monitor communications from Programmable Logic Controllers (PLCs). Unfortunately I don't have a solution which can replace this software and therefore have to deal with this Excel format. Just FYI, it will open in LibreOffice Calc (an Excel clone).
I have several sublevels which I need to write to with a stream. I was hoping not to have to create a write stream for each sublevel, but a write stream on the root database does not get the data into the sublevel when using the prefix option like this:
The reason I need to do this is that I need to respect backpressure from leveldb, as I might be overflowing the buffer when doing a lot of inserts and causing the db to crash.