
Use of sqlite database on Lustre/NFS filesystems #39

Closed
IanSudbery opened this issue Jun 9, 2015 · 12 comments
@IanSudbery
Member

I recently started trying to use the pipelines in earnest on our HPC here, and have immediately run into a pretty serious problem: it would appear that one cannot use sqlite databases on a Lustre filesystem (which is the filesystem we use here for our high-performance storage). I've not yet fully determined whether this problem is caused by any access or only by attempts at concurrent access, but all my pipelines invariably fail with the following error:

Traceback (most recent call last):
  File "/home/mb1ims/devel/cgat/scripts/csv2db.py", line 72, in <module>
    sys.exit(main())
  File "/home/mb1ims/devel/cgat/scripts/csv2db.py", line 67, in main
    CSV2DB.run(infile, options)
  File "/home/mb1ims/devel/cgat/CGAT/CSV2DB.py", line 328, in run
    cc = executewait(dbhandle, statement, error, options.retry)
  File "/home/mb1ims/devel/cgat/CGAT/CSV2DB.py", line 92, in executewait
    raise e
sqlite3.OperationalError: disk I/O error 

My reading around the net suggests that sqlite won't really work with NFS filesystems either, although I've definitely managed to get pipeline_annotations to run on an NFS location at least once (although it didn't necessarily run all the way through without error). This seems to rule out the filesystems that most people would want to use for this sort of thing.

Are there alternatives to sqlite? I saw talk of making things work with MySQL, but I don't know how far along that is? (Plus I'd have to find a machine that could run a MySQL server.)

Ideas anyone?
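For anyone wanting to reproduce this outside the pipelines, a minimal probe (a hypothetical helper, not CGAT code) that exercises sqlite in a given directory and reports whether the create/insert/select cycle survives:

```python
import sqlite3
import tempfile

def probe_sqlite(directory):
    """Try a simple create/insert/select cycle in `directory`.

    Returns True if sqlite works there, False on OperationalError
    (e.g. 'disk I/O error' on filesystems without working locks).
    """
    with tempfile.NamedTemporaryFile(dir=directory, suffix=".db") as tmp:
        try:
            conn = sqlite3.connect(tmp.name)
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()
            row = conn.execute("SELECT x FROM t").fetchone()
            conn.close()
            return row == (1,)
        except sqlite3.OperationalError as e:
            print("sqlite failed in %s: %s" % (directory, e))
            return False
```

Running this once on a Lustre mount and once on an NFS mount would show whether the failure is tied to any access at all, or only to concurrency.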

@AndreasHeger
Member

No ideas yet, but I will look into this. In principle, database access is abstracted: csv2db uploads data for mysql, postgres and sqlite, while CGATReport uses sqlalchemy to connect to mysql, sqlite and postgres. However, there are many instances of within-pipeline database access that are sqlite-specific and will need refactoring.
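To illustrate what backend-agnostic access could look like, a sketch of a URL-dispatching connect helper (hypothetical; only the sqlite branch is wired up here, and the mysql/postgres branches would dispatch to their DB-API drivers in the same way):

```python
import sqlite3
from urllib.parse import urlparse

def connect(url):
    """Open a DB-API connection from a simple URL, e.g. sqlite:///csvdb.

    Only sqlite is implemented in this sketch; a mysql:// or
    postgres:// scheme would call MySQLdb.connect / psycopg2.connect.
    """
    parsed = urlparse(url)
    if parsed.scheme == "sqlite":
        # "sqlite:///csvdb" parses to path "/csvdb"; strip the leading
        # slash to get a relative path, fall back to an in-memory db.
        return sqlite3.connect(parsed.path.lstrip("/") or ":memory:")
    raise NotImplementedError("backend not wired up: %s" % parsed.scheme)
```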

Can you try setting:

[general]
jobs_limit_db=1

I expect that the problem is due to concurrent write/write or read/write access. Fortunately, our upload tasks are usually distinct from tasks that read the database. So let us try in this order:

  1. Turn off concurrent write access (see above). Note that not all pipelines may be making full use of the jobs_limit_db parameterization.
  2. Make sure our pipelines are sqlite/mysql/postgres agnostic. This might mean moving away from raw SQL statements towards an ORM such as sqlalchemy or db.py, or making sure we only use ANSI SQL statements.
  3. Look into alternative data stores such as MongoDB or others.
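The retry behaviour hinted at in the traceback (executewait with options.retry) roughly corresponds to a pattern like this (a sketch of the idea, not the actual CSV2DB code):

```python
import sqlite3
import time

def execute_with_retry(dbhandle, statement, retries=5, delay=1.0):
    """Execute `statement`, retrying on 'database is locked' errors.

    This only mitigates lock contention between jobs; it cannot fix a
    filesystem whose locking primitives sqlite cannot use at all,
    which is why 'disk I/O error' on Lustre survives any retry count.
    """
    for attempt in range(retries):
        try:
            return dbhandle.execute(statement)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == retries - 1:
                raise
            time.sleep(delay)
```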

@IanSudbery
Member Author

Hi Andreas,

So setting the jobs_limit doesn't help, at least on the Lustre system. It is difficult to confirm that the jobs_limit is doing what it says, but running a csv2db.py command directly on the command line gives the same error. However, the same command runs without error on an NFS location.

I read somewhere that sqlite confirms its writes by querying something about the low-level disk state, which can't be done on distributed filesystems; I don't know how IFS gets around this.

One solution might be to store the database file on a separate NFS filespace. This would still require restricting db jobs to 1, as concurrent NFS access attempts are unsafe in sqlite, but it might at least run. However, Pipeline.load at the moment assumes that you want to write to ./csvdb, so I'm going to try to modify this and give it a go. Not a longer-term solution, because I only get 50GB of NFS storage (versus unlimited Lustre), but the alternative would not only require refactoring the pipelines, but also buying/renting a machine to run as a db server.
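Making the location configurable rather than hard-coded could be as simple as an environment-variable override (names here are hypothetical, not the actual Pipeline.load interface):

```python
import os

def get_database_path(default="csvdb"):
    """Return the path of the pipeline database.

    CGAT_DATABASE_DIR is a hypothetical override so the database can
    live on an NFS share while the rest of the working directory
    stays on Lustre; unset, it falls back to ./csvdb as before.
    """
    dbdir = os.environ.get("CGAT_DATABASE_DIR", ".")
    return os.path.join(dbdir, default)
```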

@AndreasHeger
Member

Thanks, I see. Are there any databases installed on your systems? What do people use for data storage other than the file system?

@sebastian-luna-valero
Member

Is there any additional error/output log file from sqlite that you can look at (apart from Python's traceback)?

That would help to narrow down the problem.

Other interesting reading:
https://www.sqlite.org/faq.html#q5
https://www.sqlite.org/faq.html#q6
https://www.sqlite.org/lockingv3.html
https://www.sqlite.org/wal.html

NB: We use ifs through NFSv3.
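Regarding the WAL link above: write-ahead logging can be switched on with a PRAGMA, but per the sqlite documentation it relies on a shared-memory file, so all readers and writers must be on the same host and it will not work over a network filesystem. For completeness, the switch looks like this:

```python
import sqlite3

def enable_wal(path):
    """Switch a database file to write-ahead logging.

    Caveat from the sqlite docs: WAL uses a shared-memory (-shm)
    file, so it does not work across hosts on NFS/Lustre and is
    unlikely to help with this issue.
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.close()
    return mode
```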

@AndreasHeger
Member

It seems to be something that ruffus struggles with as well:

https://code.google.com/p/ruffus/issues/detail?id=59
http://www.ruffus.org.uk/tutorials/new_tutorial/checkpointing.html

From reading around, it seems that in principle it should not be a problem on Lustre. The locking mechanisms that sqlite requires are there, but they might not be behaving in the way that sqlite expects.
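One way to check whether the POSIX byte-range locks that sqlite's unix VFS relies on work at all on a given mount (a rough probe, hypothetical helper name):

```python
import fcntl
import os
import tempfile

def probe_posix_lock(directory):
    """Take and release an exclusive fcntl byte-range lock on a file
    in `directory`; sqlite's default unix VFS uses these locks.
    Returns True if locking succeeds, False if the filesystem
    refuses it (e.g. a mount without coherent lock support).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.remove(path)
```

If this returns False on the Lustre mount but True on NFS, it would point at the locking layer rather than at sqlite itself.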

@AndreasHeger
Member

There is this thread:
http://comments.gmane.org/gmane.comp.file-systems.lustre.user/5724
It says:

you definitely need to mount all clients with "-o flock" so the locking is coherent across all clients.

Not sure if that is an option.

@IanSudbery
Member Author

I can ask the HPC people here...

@IanSudbery
Member Author

Running the databases in a separate location seems to be working so far. I don't know how well it will work if the databases start getting too large to live on my 100Gb NFS share...

@IanSudbery
Member Author

I asked about mounting with -o flock: apparently it slows the entire system down by a large amount.

@AndreasHeger
Member

Thanks. I have implemented the fundamentals for other database access - mysql should work, but the code still needs further development.

@IanSudbery
Member Author

Hi Andreas,

At the moment, things seem to be doing okay with separating the database off into a separate location. Not ideal, but working for the moment. We may move to mysql in the future, but currently I don't have any machines I can run one on.

@AndreasHeger
Member

Ok, thanks - I will close this issue for now; it can be reopened if needed.
