A strange SQLite internal state corruption issue? #374

Closed
eliask opened this issue Jul 19, 2019 · 5 comments · Fixed by #376

eliask commented Jul 19, 2019

I was doing a mass conversion of coordinates from various CRSs and stumbled on a strange bug. It seems fairly low-level, and even seems to affect some global Python SQLite state (interfering with later IPython functions for instance).

The included file "wkts.txt" is over 90k lines. I haven't yet been able to make a simpler test case that reproduces this issue.

wkts.txt

Running the code, I get the following output (in an ipython session):

 97%|█████████████████████████████████▊ | 88243/91210 [00:26<00:00, 6190.94it/s]
CRSs instantiated: 507
CRSs instantiated (cache hits included): 88603
Transformers instantiated: 502
Transformers instantiated (cache hits included): 88389
---------------------------------------------------------------------------
ProjError                                 Traceback (most recent call last)
... <snip> ...
~/.local/share/virtualenvs/bug-Ew6sNC7W/lib/python3.7/site-packages/pyproj/transformer.py in from_proj(proj_from, proj_to, skip_equivalent, always_xy)

pyproj/_transformer.pyx in pyproj._transformer._Transformer.from_crs()

ProjError: Error creating CRS to CRS.: (Internal Proj Error: proj_create: no database context specified)

In [2]:
Do you really want to exit ([y]/n)?

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/elias/.local/share/virtualenvs/bug-Ew6sNC7W/lib/python3.7/site-packages/IPython/core/history.py", line 578, in end_session
sqlite3.OperationalError: unable to open database file

Code Sample, a copy-pastable example if possible

import pyproj, tqdm
USE_CACHE = True # without USE_CACHE, this is painfully slow.
wkt_cache = {}; transformer_cache = {}
# Keep track of the number of invocations:
transforms = transforms_with_cache = 0
crss_created = crss_created_with_cache = 0

def get_crs(code):
    global crss_created
    if code in wkt_cache: return wkt_cache[code]
    try:    crs = pyproj.CRS.from_authority('esri', code)
    except: crs = pyproj.CRS.from_epsg(code)
    if USE_CACHE: wkt_cache[code] = crs
    crss_created += 1
    return crs

# lines = [next(open('wkts.txt', 'rt'))] * 200_000 # This does not trigger the bug
lines = open('wkts.txt', 'rt').readlines()
proj_wgs84 = pyproj.Proj("+init=epsg:4326")

def main(lines):
    global crss_created, crss_created_with_cache, transforms_with_cache, transforms
    for line in tqdm.tqdm(lines):
        try:
            key = wkid = int(line.strip())
            crs = get_crs(wkid)
        except ValueError:
            key = wkt = line.strip()
            if wkt in wkt_cache:
                crs = wkt_cache[wkt]
            else:
                crs = wkt_cache[wkt] = pyproj.CRS.from_wkt(wkt)
                crss_created += 1

        crss_created_with_cache += 1
        try:
            if USE_CACHE and key in transformer_cache:
                t = transformer_cache[key]
            else:
                t = transformer_cache[key] = pyproj.Transformer.from_proj(crs, proj_wgs84)
                transforms += 1
            transforms_with_cache += 1
        except Exception as ex:
            if 'Input is not a transformation' not in str(ex): raise

try:
    main(lines)
finally:
    print('CRSs instantiated:', crss_created)
    print('CRSs instantiated (cache hits included):', crss_created_with_cache)
    print('Transformers instantiated:', transforms)
    print('Transformers instantiated (cache hits included):', transforms_with_cache)

Problem description

It would be ideal if the low level program state did not randomly get corrupted :-)

Expected Output

There should be no Internal pyproj errors.

Environment Information

I tried this with pipenv and conda/conda-forge versions of pyproj 2.2.1 on Ubuntu 18.04 (on two different machines as it happens).

pipenv:

System:
    python: 3.7.3 (default, Apr  3 2019, 19:16:38)  [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
executable: /home/elias/.local/share/virtualenvs/bug-Ew6sNC7W/bin/python
   machine: Linux-4.15.0-54-generic-x86_64-with-Ubuntu-18.04-bionic

PROJ:
      PROJ: 6.1.0
  data dir: /home/elias/.local/share/virtualenvs/bug-Ew6sNC7W/lib/python3.7/site-packages/pyproj/proj_dir/share/proj

Python deps:
    pyproj: 2.2.1
       pip: 19.1.1
setuptools: 41.0.1
    Cython: None
     aenum: None

conda:

System:
    python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
executable: /home/elias/miniconda3/envs/bug/bin/python
   machine: Linux-4.15.0-54-generic-x86_64-with-debian-buster-sid

PROJ:
      PROJ: 6.1.0
  data dir: /home/elias/miniconda3/envs/bug/share/proj

Python deps:
    pyproj: 2.2.1
       pip: 19.1.1
setuptools: 41.0.1
    Cython: None
     aenum: None


#### Installation method
 - conda, pip wheel, from source, etc...

#### Conda environment information (if you installed with conda):

<br/>
NB: conda environment was created with: conda install -c conda-forge 'pyproj>2.2' numpy
Environment (<code>conda list</code>):
<details>

$ conda list | grep -E "proj|aenum"
proj4 6.1.0 he751ad9_2 conda-forge
pyproj 2.2.1 py37hc44880f_0 conda-forge

</details>

<br/>
Details about  <code>conda</code> and system ( <code>conda info</code> ):
<details>

$ conda info
     active environment : bug
    active env location : /home/elias/miniconda3/envs/bug
            shell level : 1
       user config file : /home/elias/.condarc
 populated config files : /home/elias/.condarc
          conda version : 4.7.5
    conda-build version : not installed
         python version : 3.6.8.final.0
       virtual packages :
       base environment : /home/elias/miniconda3 (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/elias/miniconda3/pkgs
                          /home/elias/.conda/pkgs
       envs directories : /home/elias/miniconda3/envs
                          /home/elias/.conda/envs
               platform : linux-64
             user-agent : conda/4.7.5 requests/2.21.0 CPython/3.6.8 Linux/4.15.0-54-generic ubuntu/18.04.2 glibc/2.27
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

</details>

eliask added the bug label Jul 19, 2019
snowman2 (Member) commented

Thanks for the report and the reproducible example. I think the issue has to do with the soft limit of 1024 open files per process on Linux. I have an idea for a better way to handle this in pyproj and will work on it when I get a free moment.
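
For anyone who wants to check the open-files hypothesis on their own machine, here is a minimal sketch using only the standard library. The helper names (`fd_limits`, `count_open_fds`, `sqlite_error_at_fd_limit`) are made up for illustration and are not part of pyproj. The last helper temporarily lowers the soft limit, uses up every descriptor, and then tries to open an SQLite file, which should show whether descriptor exhaustion produces the same kind of `sqlite3.OperationalError` as in the report. For scale, the report counts 507 CRS objects plus 502 transformers (1009 in total), which is in the same range as the limit, though the thread does not confirm what each object keeps open.

```python
import os
import resource
import sqlite3
import tempfile


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds():
    """Count descriptors currently open in this process, or None if unsupported."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory uses one descriptor itself, so subtract it.
            return len(os.listdir(fd_dir)) - 1
    return None


def sqlite_error_at_fd_limit(db_path, low_limit=64):
    """Lower the soft limit, use up every descriptor, then try to open db_path.

    Returns the sqlite3 error message, or None if the open succeeded.
    The original limits are restored before returning.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    held = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (low_limit, hard))
        try:
            while True:
                held.append(os.open(os.devnull, os.O_RDONLY))
        except OSError:
            pass  # no descriptors left
        try:
            sqlite3.connect(db_path).close()
        except sqlite3.OperationalError as exc:
            return str(exc)
        return None
    finally:
        for fd in held:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    print("limits (soft, hard):", fd_limits())
    print("open descriptors:", count_open_fds())
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    print("sqlite3 at the limit:", sqlite_error_at_fd_limit(demo_path))
```

Calling `count_open_fds()` before and after the reproduction loop would show whether the number of open descriptors grows with the number of cached CRS and Transformer objects.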

jorisvandenbossche (Contributor) commented

@snowman2 an alternative option that might be worth exploring is to not keep a PJ* projobj (and thus a PJ_CONTEXT *projctx) alive on the Python classes. That would solve this issue, and would also give more options for releasing the GIL (#386). The obvious downside, of course, is that you need to recreate the PROJ object each time you do an operation (that is not cached), but it might be worth timing it to check whether this would actually add significant overhead.

(e.g. for a typical geopandas use case of transforming the geometries of a GeoDataFrame, the actual Transformer.transform call will take much longer than creating the CRS/Transformer object. But of course, this is not the only typical usage of pyproj.)
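
To make the trade-off concrete, here is a rough analogy in plain Python rather than pyproj's Cython internals. A small SQLite file stands in for proj.db, and the two classes (`PersistentLookup` and `OnDemandLookup`, both hypothetical names) either keep a handle open for the object's lifetime or open one per call. The `compare` helper times both, which is the kind of check suggested above. The numbers say nothing about PROJ itself; they only show the shape of the measurement.

```python
import os
import sqlite3
import tempfile
import timeit

QUERY = "SELECT name FROM crs WHERE code = ?"


def make_demo_db():
    """Create a throwaway database with one row; a stand-in for proj.db."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE crs (code INTEGER PRIMARY KEY, name TEXT)")
    con.execute("INSERT INTO crs VALUES (4326, 'WGS 84')")
    con.commit()
    con.close()
    return path


class PersistentLookup:
    """Keeps its database handle open for the lifetime of the object."""

    def __init__(self, path):
        self._con = sqlite3.connect(path)

    def name(self, code):
        return self._con.execute(QUERY, (code,)).fetchone()[0]

    def close(self):
        self._con.close()


class OnDemandLookup:
    """Stores only the path and opens a fresh handle for every call."""

    def __init__(self, path):
        self._path = path

    def name(self, code):
        con = sqlite3.connect(self._path)
        try:
            return con.execute(QUERY, (code,)).fetchone()[0]
        finally:
            con.close()  # nothing stays open between calls


def compare(path, calls=1000):
    """Return total seconds spent on `calls` lookups with each strategy."""
    persistent = PersistentLookup(path)
    on_demand = OnDemandLookup(path)
    try:
        return {
            "persistent": timeit.timeit(lambda: persistent.name(4326), number=calls),
            "on_demand": timeit.timeit(lambda: on_demand.name(4326), number=calls),
        }
    finally:
        persistent.close()


if __name__ == "__main__":
    print(compare(make_demo_db()))
```

The on-demand variant never has more than one handle open at a time, however many lookup objects exist, at the cost of reopening the file on every call.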

snowman2 (Member) commented

@jorisvandenbossche, thanks for the thoughts there. However, the reasoning behind creating the transformer class was because re-creating the PJ* used in the transformation every time caused it to be very slow (#187). I am thinking the best solution will be able to present itself once this PR is merged in: OSGeo/PROJ#1566

jorisvandenbossche (Contributor) commented

> However, the reasoning behind creating the transformer class was because re-creating the PJ* used in the transformation every time caused it to be very slow (#187)

Yes, but that is a case where you typically want to repeat something many times (transforming many points given a single crs->crs transformation). And the performance of the Transformer class would not be much impacted by needing to recreate the to/from CRS proj obj first.
But the methods and attributes on a single CRS instance (like is_projected, source_crs, ..) are not typically things you check 1000 times in a row (and they are cached anyway). So for those attributes I wouldn't care about it taking 5µs instead of 1µs (of course, there will be other people with other use cases).
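
As a sketch of why the recreation cost matters little for cached attributes: if an attribute is computed once and then stored on the instance, the cost of rebuilding the underlying object is paid at most once per attribute. The class `LazyCrsInfo` and the `fake_describe` stand-in below are hypothetical and only illustrate the pattern; pyproj's actual CRS class is implemented differently. Requires Python 3.8+ for `functools.cached_property`.

```python
from functools import cached_property

# Stand-in data: EPSG:4326 is geographic, EPSG:3857 is projected.
_FAKE_DB = {"EPSG:4326": False, "EPSG:3857": True}


def fake_describe(definition):
    """Stand-in for building a temporary PROJ object and inspecting it."""
    return {"is_projected": _FAKE_DB[definition]}


class LazyCrsInfo:
    """Holds only a definition string; derived attributes are computed once."""

    def __init__(self, definition, describe):
        self.definition = definition
        self._describe = describe  # hypothetical callable, not a pyproj API
        self.describe_calls = 0    # counts how often the expensive step runs

    @cached_property
    def is_projected(self):
        self.describe_calls += 1
        return self._describe(self.definition)["is_projected"]


if __name__ == "__main__":
    info = LazyCrsInfo("EPSG:3857", fake_describe)
    print(info.is_projected, info.is_projected, info.describe_calls)
```

Repeated reads of `is_projected` hit the value stored on the instance, so `describe_calls` stays at 1 after the first access.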

> I am thinking the best solution will be able to present itself once this PR is merged in: OSGeo/PROJ#1566

I didn't read the full issue there, but that would allow to go back from a single global PROJ_CONTEXT to a context per PJ object?

snowman2 (Member) commented

> I didn't read the full issue there, but that would allow to go back from a single global PROJ_CONTEXT to a context per PJ object?

I believe so.
