Documentation for parsing GeoJSON from a file #198

Open
nk9 opened this issue Feb 26, 2024 · 6 comments

Comments

@nk9
Contributor

nk9 commented Feb 26, 2024

I am trying to read GeoJSON from a file and iterate through the features in the FeatureCollection with their geom and feature properties. I would have thought this would be a very common use case, but I don't see any documentation giving example code for how to do this. All the examples seem to assume you've got the string of a single geometry in memory already, or that you're using FlatGeobuf (*.fgb) files.

I've found two examples on GitHub, but:

  • geoq: Doesn't use either GeoJsonReader or FeatureProcessor, which seems like it should be the right way…
  • geoarrow: Uses this GeoTableBuilder thing which I think is part of Arrow, and thus not something someone who isn't using Arrow would use.

Have I missed the sample code for this? I'd expect a simple example in the overview on the GeoJsonReader page, and probably in the main README as well.

@nk9 nk9 changed the title Documentation for parsing GeoJson from a file Documentation for parsing GeoJSON from a file Feb 26, 2024
@kylebarron
Member

The GeoJsonReader accepts any input that implements Read. So you can pass in a File or, preferably, a BufReader<File>.
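To illustrate the point about `Read`, here is a minimal sketch that uses only the standard library. `count_bytes` is a hypothetical stand-in for a parser and is not part of geozero; it only shows that one generic function can take a `File`, a `BufReader<File>`, or an in-memory byte slice. The file name is likewise made up for the example.

```rust
use std::fs::{self, File};
use std::io::{self, BufReader, Read, Write};

/// Hypothetical stand-in for a parser: generic over any `Read` source.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    // A deliberately small chunk: with a bare `File`, each `read` call
    // goes to the operating system, while `BufReader` serves most of
    // them from its in-memory buffer.
    let mut chunk = [0u8; 16];
    let mut total = 0;
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            return Ok(total);
        }
        total += n;
    }
}

fn main() -> io::Result<()> {
    // Assumed scratch file, created only for this demonstration.
    let path = std::env::temp_dir().join("read_sketch.geojson");
    File::create(&path)?.write_all(br#"{"type":"FeatureCollection","features":[]}"#)?;

    let direct = count_bytes(File::open(&path)?)?;
    let buffered = count_bytes(BufReader::new(File::open(&path)?))?;
    let in_memory = count_bytes(&b"{}"[..])?;

    println!("direct={direct} buffered={buffered} in_memory={in_memory}");
    fs::remove_file(&path)
}
```

All three calls go through the same code path; only the cost of each `read` differs, which is why wrapping the `File` in a `BufReader` tends to help when the consumer reads in small pieces.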

@nk9
Contributor Author

nk9 commented Feb 27, 2024

Thanks for the quick reply. I found this issue and adapted his code to do what I needed, along with this bit of their docs. I'm thinking that the answer to my question is "geojson is a higher-level library, and is probably what you want to use for simply iterating through features in a geojson file." Is that safe to say?

Update: Thanks for the note about BufReader. That is an order of magnitude faster than just passing a File directly!

Update 2: I've achieved another order of magnitude speed-up by using rayon and placing the entire load_geojson function inside par_iter(). Down to 2.2 seconds to read 280 files containing 175k features.

For posterity, here's what I ended up with:

use geo::{MultiLineString, MultiPolygon};
use geojson::{Feature, GeoJson, Value};
use std::fs::File;
use std::io::BufReader;
use std::path::Path;

fn load_geojson(path: &Path) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);
    let geojson = GeoJson::from_reader(reader)?;

    match geojson {
        GeoJson::FeatureCollection(collection) => {
            for feature in collection.features {
                process_feature(&feature)?;
            }
        }
        _ => println!("Unsupported GeoJSON type"),
    }

    Ok(())
}

fn process_feature(feature: &Feature) -> Result<(), Box<dyn std::error::Error>> {
    // String value of the "name" property, or an empty string
    let name = feature
        .property("name")
        .and_then(|v| v.as_str())
        .map_or(String::from(""), |s| s.to_string());

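    // A GeoJSON feature may have a null geometry; report it instead of panicking
    let geom = feature.geometry.as_ref().ok_or("feature has no geometry")?;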

    match &geom.value {
        Value::MultiPolygon(_) => {
            let p: MultiPolygon<f64> = geom.value.clone().try_into().unwrap();
            println!("{p:?}");
        }
        Value::MultiLineString(_) => {
            let p: MultiLineString<f64> = geom.value.clone().try_into().unwrap();
            println!("{p:?}");
        }
        _ => return Err("not a recognized feature type".into()),
    };

    Ok(())
}
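
The speed-up from "Update 2" came from rayon's `par_iter()`. As a rough sketch of the same idea with only the standard library, scoped threads can split a list of paths into chunks and process each chunk on its own thread. This is a swapped-in technique, not what was actually used above: `parallel_map`, the worker count, and the file names are all hypothetical, and the closure stands in for a call to `load_geojson`.

```rust
use std::thread;

/// Hypothetical helper: apply `f` to every item using up to `workers`
/// threads, returning the results in the original order.
fn parallel_map<T, R, F>(items: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = workers.max(1);
    // Ceiling division, and at least 1 because `chunks(0)` would panic.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let f = &f;

    thread::scope(|scope| {
        // One thread per chunk; each returns the results for its chunk.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();

        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Made-up file names; a real closure would open and parse each file.
    let paths = ["a.geojson", "b.geojson", "c.geojson"];
    let name_lengths = parallel_map(&paths, 2, |p| p.len());
    println!("{name_lengths:?}");
}
```

Unlike rayon, this sketch does no work stealing, so one chunk of unusually large files can leave the other threads idle.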

@kylebarron
Member

> I'm thinking that the answer to my question is "geojson is a higher-level library, and is probably what you want to use for simply iterating through features in a geojson file." Is that safe to say?

I'd say the opposite. geojson is lower-level in the sense that you have to deal with GeoJSON specifics to handle input data. geozero is higher-level in the sense that GeoJSON is just one type of input, and that input can be exported to any consumer. For example, in geoarrow-rs GeoJsonReader::process just works even though geozero has no knowledge of the GeoArrow output format.

I'd say the real issue is that there's no "default" library in georust for handling geometries with attributes. You can parse to geo structs but then you lose the associated attributes. This is a main feature of geoarrow though: it is heavily optimized for both the geometries and their attributes.

@nk9
Contributor Author

nk9 commented Feb 29, 2024

OK, that's interesting to know. If this is the higher-level library, I think it's even more important to have some sample code of loading a GeoJSON file (ideally from disk, but at least from a string) and iterating its features. I never got that to work with geozero.

As for geometries and attributes, the geojson::Feature struct seems to handle that pretty well with feat.property("prop_key") and feat.geometry upon pulling them out of the file. But you're right, I had to create my own struct to store the converted geometry along with the properties I wanted. Providing a ready-made struct for that purpose would be a nice QoL improvement.

@kylebarron
Member

kylebarron commented Feb 29, 2024

> I never got that to work with geozero.

You'd need to impl your own GeozeroDatasource. It's a bit of work, which is why it's not done often; people usually convert to an existing representation instead of creating their own.

> As for geometries and attributes, the geojson::Feature struct seems to handle that pretty well with feat.property("prop_key") and feat.geometry upon pulling them out of the file

Sure, but that's storing attributes in the GeoJSON model, which is quite restrictive. For example, you can't store a date time in GeoJSON; you can only store a string.
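
To make the date example concrete: JSON has no date type, so a date in a GeoJSON property arrives as a string and the consumer has to turn it into a typed value. The sketch below is std only and purely illustrative. The `Date` struct and `parse_date` function are hypothetical, they handle only a plain `YYYY-MM-DD` string with loose range checks, and real code would more likely use a date library.

```rust
/// Hypothetical typed representation of a calendar date.
#[derive(Debug, PartialEq)]
struct Date {
    year: i32,
    month: u32,
    day: u32,
}

/// Parse a `YYYY-MM-DD` string, as it might appear in a GeoJSON property.
/// Returns `None` for anything else. The range checks are loose: they do
/// not know how many days each month has.
fn parse_date(s: &str) -> Option<Date> {
    let mut parts = s.splitn(3, '-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;

    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(Date { year, month, day })
}

fn main() {
    // The string stands in for the value of a property such as "updated".
    match parse_date("2024-02-29") {
        Some(date) => println!("typed value: {date:?}"),
        None => println!("property is not a date"),
    }
}
```

A representation with typed columns can store the parsed value once, rather than every consumer re-parsing the string.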

> Providing a ready-made struct for that purpose would be a nice QoL improvement.

It's not geozero's concern to provide those structs. Geozero focuses only on conversions between representations. I'm building a representation around Arrow, which enables storing properties quite efficiently, but does incur some large dependencies.

@nk9
Contributor Author

nk9 commented Mar 1, 2024

OK, well I filed this bug to document that I wanted to do something I thought was very common, and yet could find no sample code on how to do it. I've solved my problem at this point. If you want people to actually use this library for parsing features and properties out of geojson files, especially given that it's nonobvious how to do that, then having some sample code would really help and I'd suggest this bug should stay open.

But if people looking for a simple way to iterate through features/properties in a GeoJSON file should just use geojson instead, which already has sample code for this, then that's fine too. I'd still suggest that it would be friendly to new devs to put a pointer about this in the docs, but that's up to you.

Thanks for engaging with me on this, just trying to make things a little easier for the next guy or gal. :-)
