DataFrame.to_json(orient='table') emits data:str instead of data:[dict,] after a number of requests under mod-wsgi #20728
Comments
Unfortunately this is tough to work with... do you have any way of isolating this from mod_wsgi? Otherwise, how do you know it's a pandas issue and not an issue with that package?
@WillAyd yeah, I understand :( I've been trying to create a reproducer without much luck :( As a comment, if I remove |
Maybe try another serving option like gunicorn? Whether or not that works could give a clue about where the issue lies. I haven't used pandas extensively with any kind of deployed server before. You may also want to join the Gitter channel to see if anyone out there has expertise to offer.
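For anyone wanting to try that comparison, here is a minimal sketch of a server-agnostic WSGI entry point. The names `build_payload` and `application`, and the placeholder payload, are assumptions for illustration; the real handler would call `df.to_json(orient="table")` instead. The callable is exercised in-process with the standard library's `wsgiref` helpers, so no server is needed to check it.

```python
import json
from wsgiref.util import setup_testing_defaults


def build_payload():
    # Hypothetical stand-in for the real df.to_json(orient="table") call.
    return json.dumps({"schema": {"fields": []}, "data": []})


def application(environ, start_response):
    """Standard WSGI entry point; any WSGI server can host it."""
    body = build_payload().encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Exercise the callable in-process, without any server.
environ = {}
setup_testing_defaults(environ)
captured = {}


def fake_start_response(status, headers):
    # Record the status line the application reports.
    captured["status"] = status


response = b"".join(application(environ, fake_start_response))
```

Assuming the file were saved as `app.py`, the same callable could be started with `gunicorn app:application`, and mod_wsgi looks for a callable named `application` by default, so the handler code stays identical while only the server changes.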
Can you try adding this to your Apache config? If it fixes the problem, that would confirm that's the issue.
http://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIApplicationGroup.html
For reference, I ran into something similar in Arrow.
Thank you @chris-b1 for the suggestion. I think I have implemented your suggestion and the watched pot has yet to boil for this issue. My Apache config now looks like
WSGIDaemonProcess iemwsgi_iws processes=15 threads=1 display-name=%{GROUP} maximum-requests=10000
<VirtualHost...>
<Directory...>
SetHandler wsgi-script
WSGIApplicationGroup %{GLOBAL}
WSGIProcessGroup iemwsgi_iws
</Directory>
</VirtualHost>
I will update after running this for a few days without issue; I typically see it happen once or twice per day.
Well, so far so good with the |
Zero issues noted since the change to |
Cool. If you're feeling brave, I would welcome any debugging/investigation to see what the underlying issue is. I'm not entirely sure what the best way to do that is; there may be some helpful pointers here
Closing as not reproducible outside of mod_wsgi.
Sadly, I don't have a SSCE for this, but the setup seems to reproduce the bug easily for me in production. I am currently using the current conda-forge pandas (0.22.0) on Python 2.7 within a single-threaded mod-wsgi daemon process. My general code is
This will work for some number of sequential requests underneath mod-wsgi. By work, I mean the emitted JSON object has a data attribute with an array of dict objects, one for each row.
After some number of requests though, the emitted JSON looks like so
Restarting Apache/restarting mod-wsgi will return to_json to properly emitting the same data frame with the proper "data":[dict]
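Until the root cause is known, a defensive check on the emitted JSON can at least flag the bad responses. The sketch below is only an illustration: the helper name `data_is_row_records` is hypothetical, and both payloads are hand-written rather than captured server output (in particular, what the string in the bad case contains is an assumption; the report only says that "data" arrives as a string).

```python
import json


def data_is_row_records(payload):
    """Return True if the "data" member of a table-orient payload is a list of dicts."""
    doc = json.loads(payload)
    data = doc.get("data")
    return isinstance(data, list) and all(isinstance(row, dict) for row in data)


# Hand-written payloads for illustration only; they are not captured server output.
good = json.dumps({
    "schema": {"fields": [{"name": "index", "type": "integer"}]},
    "data": [{"index": 0, "a": 1}],
})
# Assumed shape of the failure: "data" holds a string instead of an array.
bad = json.dumps({
    "schema": {"fields": [{"name": "index", "type": "integer"}]},
    "data": json.dumps([{"index": 0, "a": 1}]),
})

print(data_is_row_records(good))  # True
print(data_is_row_records(bad))   # False
```

A handler could log or retry whenever the check returns False, which would also record how many requests a process had served before the output changed.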
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: None.None
pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 5.6.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.2
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
I have been fighting mod-wsgi for many moons now with various libs like numpy, matplotlib and pandas, so I suspect perhaps this just isn't a good idea. If you have a suggestion for a good long-running web process to run pandas within, I would be grateful to know as well. Thank you for your time!