
Use of sqlite database on Lustre/NFS filesystems #39

Closed
IanSudbery opened this issue Jun 9, 2015 · 12 comments
@IanSudbery
Member

I recently started trying to use the pipelines in earnest on our HPC here, and have immediately run into a pretty serious problem: it would appear that one cannot use sqlite databases on a Lustre filesystem (which is the filesystem we use here for our high-performance storage). I've not yet fully determined whether this problem is caused by any access or only by attempts at concurrent access, but all my pipelines invariably fail with the following error:

Traceback (most recent call last):
  File "/home/mb1ims/devel/cgat/scripts/csv2db.py", line 72, in <module>
    sys.exit(main())
  File "/home/mb1ims/devel/cgat/scripts/csv2db.py", line 67, in main
    CSV2DB.run(infile, options)
  File "/home/mb1ims/devel/cgat/CGAT/CSV2DB.py", line 328, in run
    cc = executewait(dbhandle, statement, error, options.retry)
  File "/home/mb1ims/devel/cgat/CGAT/CSV2DB.py", line 92, in executewait
    raise e
sqlite3.OperationalError: disk I/O error 

My reading around the net suggests that sqlite won't really work with NFS filesystems either, although I've definitely managed to get pipeline_annotations to run on an NFS location at least once (although it didn't necessarily run all the way through without error). This seems to rule out the filesystems that most people would want to use for this sort of thing.

Are there alternatives to sqlite? I saw talk of making things work with MySQL, but I don't know how far along that is? (Plus I'd have to find a machine that could run a MySQL server.)

Ideas anyone?
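For anyone wanting to reproduce this outside the pipelines, a minimal probe (a hypothetical helper, not CGAT code) that exercises sqlite in a given directory and reports whether the create/insert/select cycle survives:

```python
import sqlite3
import tempfile

def probe_sqlite(directory):
    """Try a simple create/insert/select cycle in `directory`.

    Returns True if sqlite works there, False on OperationalError
    (e.g. 'disk I/O error' on filesystems without working locks).
    """
    with tempfile.NamedTemporaryFile(dir=directory, suffix=".db") as tmp:
        try:
            conn = sqlite3.connect(tmp.name)
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()
            row = conn.execute("SELECT x FROM t").fetchone()
            conn.close()
            return row == (1,)
        except sqlite3.OperationalError as e:
            print("sqlite failed in %s: %s" % (directory, e))
            return False
```

Running this once on a Lustre mount and once on an NFS mount would show whether the failure is tied to any access at all, or only to concurrency.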

@AndreasHeger
Member

No ideas yet, but I will look into this. In principle, database access is abstracted: csv2db uploads data for mysql, postgres and sqlite, while CGATReport uses sqlalchemy to connect to mysql, sqlite and postgres. However, there are many instances of within-pipeline database access that are sqlite-specific and will need refactoring.
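To illustrate what backend-agnostic access could look like, a sketch of a URL-dispatching connect helper (hypothetical; only the sqlite branch is wired up here, and the mysql/postgres branches would dispatch to their DB-API drivers in the same way):

```python
import sqlite3
from urllib.parse import urlparse

def connect(url):
    """Open a DB-API connection from a simple URL, e.g. sqlite:///csvdb.

    Only sqlite is implemented in this sketch; a mysql:// or
    postgres:// scheme would call MySQLdb.connect / psycopg2.connect.
    """
    parsed = urlparse(url)
    if parsed.scheme == "sqlite":
        # "sqlite:///csvdb" parses to path "/csvdb"; strip the leading
        # slash to get a relative path, fall back to an in-memory db.
        return sqlite3.connect(parsed.path.lstrip("/") or ":memory:")
    raise NotImplementedError("backend not wired up: %s" % parsed.scheme)
```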

Can you try setting:

[general]
jobs_limit_db=1

I expect that the problem is due to concurrent write/write or read/write access. Fortunately, our upload tasks are usually distinct from tasks that read the database. So let us try in this order:

  1. Turn off concurrent write access (see above). Note that not all pipelines may be making full use of the jobs_limit_db parameterization.
  2. Make sure our pipelines are sqlite/mysql/postgres agnostic. This might mean moving away from raw SQL statements towards an ORM such as sqlalchemy or db.py, or making sure we only use ANSI SQL statements.
  3. Look into alternative data stores such as MongoDB or others.
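The retry behaviour hinted at in the traceback (executewait with options.retry) roughly corresponds to a pattern like this (a sketch of the idea, not the actual CSV2DB code):

```python
import sqlite3
import time

def execute_with_retry(dbhandle, statement, retries=5, delay=1.0):
    """Execute `statement`, retrying on 'database is locked' errors.

    This only mitigates lock contention between jobs; it cannot fix a
    filesystem whose locking primitives sqlite cannot use at all,
    which is why 'disk I/O error' on Lustre survives any retry count.
    """
    for attempt in range(retries):
        try:
            return dbhandle.execute(statement)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == retries - 1:
                raise
            time.sleep(delay)
```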

@IanSudbery
Member Author

Hi Andreas,

So setting the jobs_limit doesn't help, at least on the Lustre system. It is difficult to confirm that the jobs_limit is doing what it says, but running a csv2db.py command directly on the command line gives the same error. However, the same command runs without error on an NFS location.

I read somewhere that sqlite confirms its writes by querying something about the low-level disk state, which can't be done on distributed filesystems; I don't know how IFS gets around this.

One solution might be to store the database file on a separate NFS filespace. This would still require restricting db jobs to 1, as concurrent NFS access attempts are unsafe in sqlite, but it might at least run. However, Pipeline.load at the moment assumes that you want to write to ./csvdb, so I'm going to try to modify this and give it a go. Not a longer-term solution, because I only get 50GB of NFS storage (versus unlimited Lustre), but the alternative would not only require refactoring the pipelines, but also buying/renting a machine to run as a db server.
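Making the location configurable rather than hard-coded could be as simple as an environment-variable override (names here are hypothetical, not the actual Pipeline.load interface):

```python
import os

def get_database_path(default="csvdb"):
    """Return the path of the pipeline database.

    CGAT_DATABASE_DIR is a hypothetical override so the database can
    live on an NFS share while the rest of the working directory
    stays on Lustre; unset, it falls back to ./csvdb as before.
    """
    dbdir = os.environ.get("CGAT_DATABASE_DIR", ".")
    return os.path.join(dbdir, default)
```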

@AndreasHeger
Member

Thanks, I see. Are there any databases installed on your systems? What do people use for data storage other than the file system?

@sebastian-luna-valero
Member

Is there any additional error/output log file from sqlite that you can look at (apart from Python's traceback)?

That would help to narrow down the problem.

Other interesting reading:
https://www.sqlite.org/faq.html#q5
https://www.sqlite.org/faq.html#q6
https://www.sqlite.org/lockingv3.html
https://www.sqlite.org/wal.html

NB: We use ifs through NFSv3.
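Regarding the WAL link above: write-ahead logging can be switched on with a PRAGMA, but per the sqlite documentation it relies on a shared-memory file, so all readers and writers must be on the same host and it will not work over a network filesystem. For completeness, the switch looks like this:

```python
import sqlite3

def enable_wal(path):
    """Switch a database file to write-ahead logging.

    Caveat from the sqlite docs: WAL uses a shared-memory (-shm)
    file, so it does not work across hosts on NFS/Lustre and is
    unlikely to help with this issue.
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.close()
    return mode
```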

@AndreasHeger
Member

It seems to be something that ruffus struggles with as well:

https://code.google.com/p/ruffus/issues/detail?id=59
http://www.ruffus.org.uk/tutorials/new_tutorial/checkpointing.html

From reading around, it seems that in principle it should not be a problem on Lustre. The locking mechanisms that sqlite requires are there, but they might not be behaving in the way that sqlite expects.
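One way to check whether the POSIX byte-range locks that sqlite's unix VFS relies on work at all on a given mount (a rough probe, hypothetical helper name):

```python
import fcntl
import os
import tempfile

def probe_posix_lock(directory):
    """Take and release an exclusive fcntl byte-range lock on a file
    in `directory`; sqlite's default unix VFS uses these locks.
    Returns True if locking succeeds, False if the filesystem
    refuses it (e.g. a mount without coherent lock support).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.remove(path)
```

If this returns False on the Lustre mount but True on NFS, it would point at the locking layer rather than at sqlite itself.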

@AndreasHeger
Member

There is this thread:
http://comments.gmane.org/gmane.comp.file-systems.lustre.user/5724
It says:

you definitely need to mount all clients with "-o flock" so the locking is coherent across all clients.

Not sure if that is an option.

@IanSudbery
Member Author

I can ask the HPC people here...

@IanSudbery
Member Author

Running the databases in a separate location seems to be working so far. I don't know how well it will work if the databases start getting too large to live on my 100Gb NFS share...

@IanSudbery
Member Author

I asked about mounting with -o flock: apparently it slows the entire system down by a large amount.

@AndreasHeger
Member

Thanks. I have implemented the fundamentals for other database access - mysql should work, but the code still needs further development.

@IanSudbery
Member Author

Hi Andreas,

At the moment, things seem to be doing okay with separating the database off into a separate location. Not ideal, but working for the moment. We may move to mysql in the future, but currently I don't have any machines I can run one on.

@AndreasHeger
Member

Ok, thanks - I will close this issue for now; it can be reopened if needed.
