folium crashes Jupyter Notebook when using large geojson file #787

Closed
krinsman opened this issue Dec 5, 2017 · 6 comments

@krinsman

krinsman commented Dec 5, 2017

Note: This issue is mostly copy/pasted from the below link:

https://stackoverflow.com/questions/47660163/folium-crashes-jupyter-notebook-when-using-large-geojson-file


Versions (according to my conda environment YAML):

python=3.6
folium=0.5.0
branca=0.2.0
numpy=1.13.3
pandas=0.21.0
geopandas=0.3.0

(Although I don't think I even use all of the above packages)


I am trying to follow the example of adding a geo_json overlay found here. The geo_json file I am using is this map of the zip codes in Germany, found here. It is large, at 85.8 MB.

Here is my MWE; I downloaded the 'postleitzahlen.geojson' file to (pwd being my "present working directory") 'pwd/data/postleitzahlen.geojson'. Note that the same problem happens even when I change the extension from the non-standard .geojson to .json.

import folium
# Not sure if I need all of the packages below:
import json
import os

# Center map in middle of Berlin, zoom out enough so all of Germany is visible,
# and use mapbox bright instead of default OpenStreetMap so as to hopefully make
# images easier to render. But it still crashes the notebook anyway.
m=folium.Map(location=[52.5194, 13.4067], tiles='Mapbox Bright', zoom_start=5)

# This is the analog of the example on the Folium website, but I don't really understand it.
# Wouldn't we need to load the file into memory somehow, maybe using geopandas or something?
zipcode_regions = os.path.join('data', 'postleitzahlen.geojson')

# Add the geoJSON layer to map ostensibly:
folium.GeoJson(zipcode_regions, name='geo_json').add_to(m)

# Still crashes regardless of whether I include following line - I think it just adds control in top-right of map on example website, which I don't need.
folium.LayerControl().add_to(m)

m

In Jupyter Notebook (on my computer at least), the output of the cell containing the last line is just a large empty white space. Moreover, 'Autosave Failed!' appears at the top. When I click the save button, the notebook freezes momentarily and then nothing happens (i.e. there's no indication that the file was saved).

I can still run new cells (e.g. performing basic arithmetic), but nothing saves. (EDIT: Nope, doing this too often causes the notebook to crash outright in Chrome.)

This might be a bug in either Jupyter, IPython, or Folium, in which case asking here might not help much. But I figured that I would try asking here at least.

Looking at the documentation for this function (scroll down a lot), should I try (1) setting the overlay parameter to True, since the default is False, or (2) setting smooth_factor to a float greater than the default of 1.0? (I will try both and update this post with any results.)

I read these questions, but did not understand how to use their answers to solve my problem. If someone can explain how to apply those answers here, I would greatly appreciate it. (1)(2)(3)(4)(5)

EDIT: I tried doing what this person did, namely increasing the data limit (specifically I ran jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10 to launch Jupyter Notebook) but the same error occurred as before.
EDIT (2): Still occurs setting overlay=True and both using and not using data limit increase.
EDIT (3): Setting smooth_factor to 10, 100, or 1000 didn't fix it, although it did make the penultimate cell run faster. So this seems more likely a problem with Jupyter than folium.
The terminal output each time for Jupyter Notebook contains multiple errors of the form:

Saving file at /map.ipynb
[I 12:00:00.000 NotebookApp] Malformed HTTP message from ::1: Content-Length too long

Watching the terminal more closely as the notebook runs, it is clear that this error occurs exactly as the notebook tries to load the map. So next I will try the solutions proposed here.

EDIT (4): It still doesn't work with the jupyter notebook --NotebookApp.tornado_settings="{'max_body_size': 104857600, 'max_buffer_size': 104857600}" suggested here, nor when adding three additional zeroes to max_body_size and max_buffer_size, nor when adding six additional zeroes to both. (I.e. approx. 0.1 petabytes, a million times the defaults.)

Since the GeoJSON loads on GitHub (albeit slowly), it seems that it is possible for the GeoJSON to be loaded with a map. And 0.1 petabytes is probably not a reasonable limit for HTTP requests generated by folium to exceed, which is why I am posting this as an issue here, although it might (I don't know) be an issue with Jupyter Notebook instead. It could also conceivably be an issue with Leaflet.
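For reference, the arithmetic behind these limits, assuming (as the command above implies) that 104857600 bytes = 100 × 1024 × 1024 is Tornado's default for both settings:

```python
# Assumption: 104857600 bytes (100 MiB) is Tornado's default for both
# max_body_size and max_buffer_size, the value used in the command above.
DEFAULT_LIMIT = 100 * 1024 * 1024

limits = {
    "default": DEFAULT_LIMIT,
    "three more zeroes": DEFAULT_LIMIT * 10**3,
    "six more zeroes": DEFAULT_LIMIT * 10**6,
}

for label, size in limits.items():
    # 1 PB = 10**15 bytes (decimal petabyte)
    print(f"{label:>17}: {size:,} bytes = {size / 10**15:.3g} PB")
```

So six extra zeroes gives roughly 1.05 × 10^14 bytes, on the order of 0.1 PB.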

EDIT: Full code (but not minimal):

import pandas as pd
import numpy as np
import folium
import branca
import geopandas as gpd

import json
import shelve
import os

m = folium.Map(location=[52.5194, 13.4067], tiles='Mapbox Bright', zoom_start=5)
# Test that folium works without any geoJSON layer -- it does, and very quickly too.
m
zipcode_regions = os.path.join('data', 'postleitzahlen.json')
folium.GeoJson(zipcode_regions, name='geo_json', overlay=True, smooth_factor=1000).add_to(m)
# The cell above this isn't immediate, but fairly quick. When trying to run the following cell, the notebook crashes:
m
@Conengmo
Member

Conengmo commented Dec 6, 2017

I tried to replicate your problem. I stored the GeoJSON locally and passed its path to GeoJson (zipcode_regions = {path_to_file}). You don't need to load the file yourself, because Folium detects whether the input is a URL, a path, or data and then loads the data in the right form.
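A simplified sketch of that kind of input detection, written only to illustrate the idea: `classify_geojson_input` and `load_geojson` are hypothetical helpers, not folium's actual code, and folium's real checks may differ in detail.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def classify_geojson_input(data):
    """Return 'data', 'url', or 'path' for a GeoJson-style argument (hypothetical helper)."""
    if isinstance(data, dict):
        return "data"  # already-parsed GeoJSON object
    if isinstance(data, str):
        stripped = data.lstrip()
        if stripped.startswith(("{", "[")):
            return "data"  # a raw JSON string
        if urlparse(stripped).scheme in ("http", "https", "ftp"):
            return "url"
        return "path"  # anything else is treated as a local file path
    raise TypeError(f"unsupported GeoJSON input: {type(data).__name__}")


def load_geojson(data):
    """Turn any of the three input kinds into a Python dict (hypothetical helper)."""
    kind = classify_geojson_input(data)
    if kind == "data":
        return data if isinstance(data, dict) else json.loads(data)
    if kind == "url":
        with urlopen(data) as resp:
            return json.loads(resp.read().decode("utf-8"))
    with open(data, encoding="utf-8") as f:
        return json.load(f)
```

Either way the whole file ends up in memory as one object, so loading it yourself (e.g. with `json.load`) should not make it any smaller or faster to render.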

It took a while but the map showed up. It's really slow though and it's consuming over 2 GB of RAM. In the ipython output I notice the same errors as you: Malformed HTTP message from ::1: Content-Length too long.

When I create the map in regular Python and open the html file, the map shows but it's again sluggish. It now consumes 1 GB of RAM.

So I think your problem is the size of your dataset. Leaflet doesn't seem to be able to handle it efficiently, and opening it in Jupyter seems to make things worse. I don't think the problem is with Folium; I think the HTML it creates is efficient.

I can think of two approaches, which can be combined: reduce the size of the dataset before passing it to Folium, and/or create the map with Folium in regular Python, open the HTML file, and accept a very sluggish map.
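One way to do the first approach without extra dependencies is a Ramer-Douglas-Peucker pass over every polygon ring, sketched below in plain Python. This is a swapped-in technique, not something folium does for you; the helper names are hypothetical, the tolerance is in degrees because the coordinates are longitude/latitude, and each polygon is simplified independently, so shared borders between neighbouring zip-code areas can develop small gaps or overlaps. If geopandas is installed, `GeoSeries.simplify(tolerance)` does a comparable job.

```python
import math


def _perpendicular_distance(pt, start, end):
    """Planar distance from pt to the line through start and end."""
    x, y = pt[0], pt[1]
    x1, y1 = start[0], start[1]
    x2, y2 = end[0], end[1]
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # closed ring: first and last vertex coincide
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Iterative Ramer-Douglas-Peucker: keep only vertices deviating more than
    `tolerance` from the simplified line (a stack avoids deep recursion)."""
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[n - 1] = True
    stack = [(0, n - 1)]
    while stack:
        first, last = stack.pop()
        max_dist, index = tolerance, None
        for i in range(first + 1, last):
            d = _perpendicular_distance(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, index = d, i
        if index is not None:
            keep[index] = True
            stack.append((first, index))
            stack.append((index, last))
    return [p for p, k in zip(points, keep) if k]


def simplify_ring(ring, tolerance):
    simplified = douglas_peucker(ring, tolerance)
    # A GeoJSON linear ring needs at least 4 positions; keep the original otherwise.
    return simplified if len(simplified) >= 4 else ring


def simplify_feature_collection(fc, tolerance):
    """Copy of a FeatureCollection with Polygon/MultiPolygon rings simplified."""
    features = []
    for feature in fc["features"]:
        geom = feature.get("geometry")
        if geom is None or geom["type"] not in ("Polygon", "MultiPolygon"):
            features.append(feature)  # other geometry types pass through unchanged
            continue
        if geom["type"] == "Polygon":
            coords = [simplify_ring(r, tolerance) for r in geom["coordinates"]]
        else:
            coords = [[simplify_ring(r, tolerance) for r in poly] for poly in geom["coordinates"]]
        features.append({**feature, "geometry": {"type": geom["type"], "coordinates": coords}})
    return {**fc, "features": features}


def count_vertices(fc):
    """Total number of polygon ring positions, a rough proxy for rendering cost."""
    total = 0
    for feature in fc["features"]:
        geom = feature.get("geometry") or {}
        if geom.get("type") == "Polygon":
            total += sum(len(r) for r in geom["coordinates"])
        elif geom.get("type") == "MultiPolygon":
            total += sum(len(r) for poly in geom["coordinates"] for r in poly)
    return total
```

With the real file one would `json.load` it, call `simplify_feature_collection(data, 0.001)` (0.001° of latitude is roughly 110 m; the value is a guess to tune), compare `count_vertices` before and after, and pass the resulting dict to `folium.GeoJson`.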

@ocefpaf
Member

ocefpaf commented Dec 6, 2017

@Conengmo nailed it. There is not much we can do there.

I researched https://github.com/mapbox/geojson-vt in the past but never got around to actually trying it.
I believe it can help you with large datasets.
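geojson-vt slices GeoJSON into vector tiles on the fly in the browser, so a map only has to draw the tiles currently in view, each simplified for its zoom level. Its internals are more involved (and it is a JavaScript library); the Python sketch below only illustrates the standard XYZ ("slippy map") tile-addressing arithmetic that such tiling relies on. The function names are hypothetical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """XYZ tile indices of a point in Web Mercator at a given zoom level."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so the east/south edges (and latitudes beyond ~85.05°) stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """Every (x, y) tile that a lon/lat bounding box touches at one zoom level."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```

Because a feature only needs to be drawn in the tiles its bounding box touches, each tile at a high zoom level carries just a few zip-code areas instead of the whole 85.8 MB file.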

Closing b/c this is not a folium issue we can act on.

@ocefpaf ocefpaf closed this as completed Dec 6, 2017
@krinsman
Author

krinsman commented Dec 6, 2017

That makes sense. Do you think it would make sense to open this as a performance issue with the Leaflet team, since it's not really a bug in the strict sense?

Anyway, the recommendation looks very good, assuming the demo on their page is accurate, since an American postal code GeoJSON would definitely be a much bigger performance challenge than even a German one. I appreciate the recommendation, since I had not heard of it before and it looks like it could fix my problem exactly.

@ocefpaf
Member

ocefpaf commented Dec 6, 2017

Do you think it would make sense to open this as a performance issue with the Leaflet team, since it's not really a bug in the strict sense?

I would guess that such issue already exists there but it is worth taking a look to see what they recommend.

@krinsman
Author

krinsman commented Dec 6, 2017

Also, stupid question: how does Folium access/call Leaflet, since it's a JavaScript (i.e. not Python) library? Does Folium have Leaflet as a dependency? And since the rendering mechanism is different between raw Python and the IPython kernel/Jupyter Notebook, if geojson-vt were to be installed automatically, would that be a feature request to Leaflet, Jupyter Notebook, or someone else?

Leaflet is supposed to be a small library, so I doubt such a feature request would make sense over there. I also don't understand at all how Jupyter Notebook is even able to use/access any JavaScript features in the first place. (Does it do so through the user's web browser? Does that mean loading the geojson-vt library by default would be a feature request for the web browser?)

@ocefpaf
Member

ocefpaf commented Dec 6, 2017

Also, stupid question:

Not stupid at all, and with the rise of ipyleaflet we should probably document that somewhere.

how does Folium access/call Leaflet, since it's a JavaScript (i.e. not Python) library? Does Folium have Leaflet as a dependency?

Not really. TL;DR: folium is a hack that builds the HTML for you using Jinja templates and a CDN for the JS part.
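To illustrate the "builds the HTML for you" idea, here is a minimal sketch that uses the standard library's `string.Template` in place of Jinja2 (a swap, to stay dependency-free). The template, the `render_map` helper, and the Leaflet version/CDN URLs are assumptions for illustration, not folium's actual output. It also shows where the JavaScript runs: the browser loads Leaflet from the CDN and executes it; in the notebook, folium hands similar HTML to the frontend to display in the output cell.

```python
import json
from string import Template

# Hypothetical page template; folium's real Jinja2 templates are more complete.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <style>#map { height: 100vh; }</style>
</head>
<body>
  <div id="map"></div>
  <script>
    var map = L.map("map").setView([$lat, $lon], $zoom);
    L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
      attribution: "&copy; OpenStreetMap contributors"
    }).addTo(map);
    L.geoJSON($geojson).addTo(map);
  </script>
</body>
</html>
""")


def render_map(lat, lon, zoom, geojson_obj):
    """Return a standalone HTML page; the browser, not Python, executes Leaflet."""
    # Inline the whole GeoJSON; escape "</" so a string value cannot close the <script> tag.
    geojson_js = json.dumps(geojson_obj).replace("</", "<\\/")
    return PAGE.substitute(lat=lat, lon=lon, zoom=zoom, geojson=geojson_js)
```

Writing `render_map(...)` to a file and opening it in a browser is the stand-alone analogue of saving a folium map with `m.save('map.html')`. Because the GeoJSON is inlined, an 85.8 MB input produces a page at least that large for the browser to parse.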

And since the rendering mechanism is different between raw Python and the IPython kernel/Jupyter Notebook, if geojson-vt were to be installed automatically, would that be a feature request to Leaflet, Jupyter Notebook, or someone else?

Using the folium approach one could use geojson-vt as a plugin (take a look at our plugins for more info).

Leaflet is supposed to be a small library, so I doubt such a feature request would make sense over there. I also don't understand at all how Jupyter Notebook is even able to use/access any JavaScript features in the first place. (Does it do so through the user's web browser? Does that mean loading the geojson-vt library by default would be a feature request for the web browser?)

There are two approaches, folium stand-alone HTML and whatever ipyleaflet does, which is more sophisticated (and I am unfamiliar with the details).
