-
-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sqlite3.OperationalError: disk I/O error - coverage/sqldata.py, line 1024 #915
Comments
Happens in Ubuntu docker image with circleci too but slight different error message is thrown.. Couldn't use data file '/home/circleci/project/test_results/.coverage': database disk image is malformed |
Broke our CI as well with the same issue. We pinned converage to |
This does sound serious. I tried to reproduce it with the Apprise API repo, but running the tests locally doesn't show the problem. Does anyone have specific instructions for reproducing the error? |
@geekvikas Can you give me details of your failure? |
@joaonc Can you provide details of your failure? |
Hope this is not adding noise, but perhaps is related? I see an exception in the same file in python2.7 running coverage with the --with-coverage option from nosetest. This exception does not occur in
|
@feu-de-bois That sounds like a different error. If you could open a new issue, with specific instructions about how to reproduce it, that would be a big help. |
@nedbat The one thing i did not include in my original description (since you're not able to reproduce right now), is that it seems to happen with:
It's part of the xenial docker containers (available on Ubuntu's officially hosted docker page here). This may or may not help you. I'll add this information to my original description so it's all in one place. |
@caronc Thanks, I appreciate the extra context. I will try it out. I should warn you though: I may need more explicit instructions, like "docker run ..." then "sudo apt-get install ....", etc. |
@nedbat here's some more context:
Hope that helps. |
@joaonc Sorry, I should have been clearer. Can you give me detailed specific instructions of the exact commands I can run that will clone your code and run your test suite to reproduce the failure? |
@feu-de-bois Your error is described in #916, and I'm working on a fix for it. |
I am almost certain the error we are observing is the same issue. With
Issue does not exist with Tried to come up with a minimum reproducible setup but couldn't come up with something I could share here that is simple.. |
workaround for nedbat/coveragepy#915
@shashanksingh28 You don't have to make a minimal reproducer. I will run your entire test suite on your product if you can share it with me. |
@nedbat Have a look at the Travis CI ticket i had open. One of the community/dev reps there was kind enough to give you exact instructions you hinted towards. I'll paste them here, but I certainly don't want to take credit for his awesome effort: cid=$(docker run -dti --privileged=true --entrypoint=/sbin/init -v /sys/fs/cgroup:/sys/fs/cgroup:ro travisci/ci-sardonyx:packer-1542104228-d128723)
docker exec -it $cid /bin/bash
su - travis
git clone --depth=50 --branch=python38-ci-testing https://github.com/caronc/apprise-api.git caronc/apprise-api
cd caronc/apprise-api/
export TOXENV=py36
source ~/virtualenv/python3.6/bin/activate
python --version
pip --version
pip install codecov
pip install -r dev-requirements.txt
pip install -r requirements.txt
tox Kudos to BanzaiMan for this information. 👍 |
@caronc Thanks, this reproduces the error. The bad news is I still don't have an idea why it's happening, but this gives me something to work with. |
Perhaps it's something to do with |
@caronc No, even with simple unittest run I also get the same error.
with older version things works fine |
I have been trying to understand what is going on, and I don't get it. Using @caronc's example. It's not something between 5.0.1 and 5.0.2. It reproduces on every version that had SQLite (5.0a2 and beyond). If I remove pytest-cov, then the behavior stops. But I cannot see what is happening that makes the error. If others have reproducible scenarios, send them my way. |
@nedbat that is still really good detective work. I've updated my Travis CI ticket to see if it's something they can update on their end. I wish i did this sooner, but I was able to push a build forcing v5.0.1 which proved your observation (as it still failed with the same error). What version of SQLite are you testing with right now and passing with? At the same time, since it's an Alpha version of SQLite, maybe it would be worth opening a ticket with them before it goes final an a lot more people show up here? Are you saying that 5.0a1 works fine? |
@caronc I'm not sure what you mean by an alpha version of SQLite? We're all using whatever version is built with Python, no? On the docker image, it's Python 3.6.7 with SQLite 3.11.0. On my Mac, it's Python 3.6.0 with SQLite 3.19.3 And just to be clear, I can still easily reproduce the I/O error using the docker image, I just don't see what's causing it. I'm still working on it.... |
I understand; this line threw me off:
|
I see! I meant versions of coverage.py that used SQLite. |
Wait, it opened the file again before the successful read. Looks like the file was somehow closed even though the sqlite database connection was still open. |
@gfv Thanks. Do you mind sharing the code change you mean? |
@nedbat I figured out that the effect was mostly due to my forgotten debug statement within "close". This points strongly at a race condition somewhere close by (pun intended), but I can't yet see anything conclusive. (I've deleted my earlier comment, sorry) |
Disabling the Python GC (with |
My blog post was posted to Hacker News, and people there have been digging: https://news.ycombinator.com/item?id=22028581 It seems that a NamedTemporaryFile in apprise-api is being closed, and SQLite thinks it's its file, and tries to use the closed file descriptor. Not sure why that is... |
The bug is this code (also present in some of the other tests), which patches tempfile._TemporaryFileWrapper to inject OSErrors: This causes this line to raise an OSError: https://github.com/python/cpython/blob/3.6/Lib/tempfile.py#L556 triggering the _os.close in the exception handler. However, the FileIO object created by _io.open continues to reference the file descriptor. The os reuses the closed fd for opening the sqlite db file. When the FileIO is finalized by the CPython GC some time later, it closes that fd from under sqlite. That leads to the EBADF errors described above. |
I found that too. Here's my understanding:
So this is a subtle bug in NamedTemporaryFile. |
So is the issue related to my unit tests (in apprise-api)? I'm trying to effectively simulate a bad disk write/read in several tests in order to achieve 100% code coverage in addition to just being able to handle these edge cases in general. Should I be doing it another way? |
Mocking NamedTemporaryFile instead of _TemporaryFileWrapper would work around it. This is probably something that should be fixed in Python. |
I'm not especially familiar with Python best practices, but patching tempfile.NamedTemporaryFile instead of the internal implementation class is probably a better idea. I wouldn't consider this a Python bug, as the code is making what I consider to be the reasonable assumption that making a _TemporaryFileWrapper doesn't throw. It would probably also be correct for NamedTemporaryFile to call file.close() instead of os.close if the FileIO object was successfully created. Edit: The Python fix is still probably worth doing, as exceptions like KeyboardInterrupt can come at any time. |
Wow, thanks everyone, here and on Hacker News. I've summed up the answer in Bug #915: solved! @caronc Thanks for persisting! I'll close this as not a coverage issue. The mock should be on NamedTemporaryFile, not on _TemporaryFileWrapper, since the first is the public interface, and is what you actually call in your product code. But I'm curious: the mock is called "mock_ntf" as if it had once been for NamedTemporaryFile. Why is that? |
@nedbat: at the time I was having a lot of trouble using What's weird is all of this worked fine before (tests passed a few weeks ago) and even today, they all pass on other versions of Python. Either way, I'll re-investigate this and play with the outer call of I'm curious though: If there ever was an actual
No, Thank YOU so much and all of the amazing people you conjured from Hackernews! It was exciting to watch all of your brains in motion while mine sputtered and spewed black smoke trying to grasp how you all narrowed it down! 🙂 |
@caronc I do believe this is a genuine bug in Python, and I hope the core devs take bpo 39318 seriously and fix it, ideally with backports. I understand about the difficulty of getting mocks to work. This article might help: Why your mock doesn't work. |
The code was written with the assumption that any failure would happen in _io.open instead of _TemporaryFileWrapper. The _TemporaryFileWrapper constructor doesn't actually do anything other than construct two objects, so it's very rare for that to fail. As for why this worked on other versions of Python and other environments, that's because the problem only shows up when the GC happens to run after the same FD has been reused. If the GC runs at even a very slightly different time (such as between database opens instead of while the database is open) it won't fail in the same way. |
I found this same type of error was happening while using
The root cause in my particular case was due to running tests inside a docker container as a user which did not have file ownership of the python app's source code directory. Changing ownership to this user before running the tests worked! Original Error was:
After creating a blank
|
@trinitronx The underlying error is different in your case, with a different cause. I don't know if there is something coverage.py can do to make that situation better, but we should discuss it in a new issue. |
If you run into this or a similar SQLite / coverage issue, make sure that you aren't running |
FWIW, I have a simple test case. I'm running and pip reports and pytest reports |
@ruck94301 The combine step thinks that your Others here: did you have any files that would fit this pattern? |
@nedbat, your expectation about |
A bit of exception rewrapping in sqldata's _connect to show the actual filename on failure would help a lot. @ruck94301 you could patch that file locally if you have a reliable reproducer? |
@ruck94301 Thanks, that explains perfectly why the problem happens with the swp file: |
Describe the bug
![Screenshot from 2020-01-06 16-05-07](https://user-images.githubusercontent.com/850374/71848840-94584400-309e-11ea-8595-d087bd67ae6a.png)
Since coverage
v5.0.2
, I throw an exception during Python 3.6 testing (Python 3.4, 3.5, 3.7, and 3.8 work fine - source). Below is a screenshot of the exception thrown (source)This problem does not occur with
v5.0.1
(source).Originally I thought this was a Travis CI issue (source), but it appears specific to the last release of
coverage
instead.To Reproduce
How can we reproduce the problem? Please be specific.
coverage
(pulled from PyPi/pip)Expected behavior
Not to get an exception (or maybe it just needs to be gracefully caught since it's on an
__exit__
call anyway)?Additional context
If i re-run the Travis-CI runner on a previous Python 3.6 test runner that has passed (in the past), it fails on Python 3.6 (more details and screenshots explained here).
The text was updated successfully, but these errors were encountered: