CSV Handling Issues #44

Closed
brandonreavis opened this issue Nov 14, 2014 · 7 comments

@brandonreavis

With mapnik removed (#43), we will need to figure out how to read and validate CSVs the same way mapnik does.

JSON geometry column
Does mapnik-omnivore need to support JSON geometry within a CSV like mapnik does? Do some CSVs actually do this?

GDAL doesn't really have a great way to go from JSON straight to a geometry object yet, so it would have to be a purely JS conversion, which seems nasty and slow. The only "fast" alternative I've thought of is dumping the JSON column into a Buffer object, then opening a VSI memory file with the GeoJSON driver: naturalatlas/node-gdal#80.

I suppose if only the extent is needed from the spatial column, you could just flatten the coordinates array in each JSON object and then manually come up with the extent... but, again, this seems pretty slow for just getting an extent / center.
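
A minimal sketch of that flatten-and-scan idea in plain JavaScript, with no GDAL dependency. The function name `extentFromGeoJSON` and the plain-object return shape are hypothetical, and it assumes a GeoJSON geometry with a `coordinates` member (so not a GeometryCollection):

```javascript
// Hypothetical helper: walk a GeoJSON `coordinates` array of any nesting
// depth and accumulate the 2D extent without building geometry objects.
function extentFromGeoJSON(json) {
    var geom = typeof json === 'string' ? JSON.parse(json) : json;
    var ext = { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity };

    (function walk(coords) {
        if (typeof coords[0] === 'number') {
            // A position: [x, y] or [x, y, z]; z is ignored for the extent.
            ext.minX = Math.min(ext.minX, coords[0]);
            ext.minY = Math.min(ext.minY, coords[1]);
            ext.maxX = Math.max(ext.maxX, coords[0]);
            ext.maxY = Math.max(ext.maxY, coords[1]);
            return;
        }
        // Otherwise this is an array of positions, rings, or polygons.
        for (var i = 0; i < coords.length; i++) walk(coords[i]);
    })(geom.coordinates);

    return ext;
}
```

This still pays for a full `JSON.parse` per row, which is the cost being worried about here; it only avoids constructing geometry objects.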

cc: @springmeyer

@springmeyer
Contributor

Thanks for laying out the problem.

JSON in CSV was pioneered by @jlord @maxogden and is used as the output format from http://filebakery.com. I'd like to keep supporting it. It looks like it is under discussion by other CSV folks too: https://issues.apache.org/jira/browse/CSV-116

/cc @maxogden - do you have a geocsv parser/validator in pure js we could leverage? If this exists and we see any major differences to the Mapnik one I would be inclined to change Mapnik's behavior to conform.

@springmeyer
Contributor

@brandonreavis
Author

Ahh I see. The best solution would be just to make a legit geocsv OGR driver that we could use consistently across mapnik-omnivore, node-gdal, mapnik, etc. It would also open the door for getting data out of this format easily with ogr2ogr.

I'm not sure if I have the time to devote to this though... I'd imagine it would be a substantial undertaking.

@brandonreavis
Author

For now I think using a normal node CSV parser will work fine. Here's a hackish way to deal with the JSON geometry if all mapnik-omnivore needs is the extent:

var gdal = require('gdal'); // node-gdal bindings, used for Envelope and WKT parsing

function getJSONExtent(str){
    // example inputs: 
    // "{\"type\":\"Point\",\"coordinates\":[30.0,10.0]}" // escaped "
    // '{"type":"Point","coordinates":[30.0,10.0]}' // single quotes no need for escaping "
    // "{""type"":""Point"",""coordinates"":[30.0,10.0]}" // filebakery.com style ""

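    // Assumption: the coordinate array is the only array in a JSON geometry object
    // 1. Read the dimension (2 or 3) from the first innermost position array
    // 2. Strip away all characters but array text
    // 3. Remove any brackets/whitespace and repeated commas (may be unnecessary if no empty subarrays)
    // 4. Parse as a flat JSON array

    // Counting values (len % 3) is ambiguous: three 2D points also give 6 numbers
    var first = /\[([^\[\]]+)\]/.exec(str);
    var dim = first && first[1].split(',').length === 3 ? 3 : 2;

    str = str.substring(str.indexOf('['), str.lastIndexOf(']'));
    str = str.replace(/[\[\]\s]/g, '').replace(/,{2,}/g, ',');
    var arr = JSON.parse('['+str+']');

    var len = arr.length;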

    var minX = Number.POSITIVE_INFINITY;
    var minY = Number.POSITIVE_INFINITY;
    var maxX = Number.NEGATIVE_INFINITY;
    var maxY = Number.NEGATIVE_INFINITY;

    for(var i=0; i<len; i+=dim){
        var x = arr[i];
        var y = arr[i+1];
        minX = Math.min(minX, x);
        minY = Math.min(minY, y);
        maxX = Math.max(maxX, x);
        maxY = Math.max(maxY, y);
    }

    return new gdal.Envelope({minX:minX, minY:minY, maxX:maxX, maxY:maxY});
}

function getWKTExtent(str){
    return gdal.Geometry.fromWKT(str).getEnvelope();
}
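
The filebakery.com-style input in the comments above (`""` for each quote) is what standard CSV quoting produces, so a normal CSV parser should hand back plain JSON for that column. A minimal sketch of the unquoting step, assuming RFC 4180-style quoting and a single record with no embedded newlines; `splitCSVLine` is a hypothetical helper, not part of any library mentioned here:

```javascript
// Hypothetical helper: split one CSV record into fields, treating "" inside
// a quoted field as an escaped double quote.
function splitCSVLine(line) {
    var fields = [];
    var field = '';
    var inQuotes = false;
    for (var i = 0; i < line.length; i++) {
        var c = line[i];
        if (inQuotes) {
            if (c === '"' && line[i + 1] === '"') {
                field += '"'; // escaped quote
                i++;
            } else if (c === '"') {
                inQuotes = false; // closing quote
            } else {
                field += c;
            }
        } else if (c === '"') {
            inQuotes = true; // opening quote
        } else if (c === ',') {
            fields.push(field);
            field = '';
        } else {
            field += c;
        }
    }
    fields.push(field);
    return fields;
}

var row = splitCSVLine('name,"{""type"":""Point"",""coordinates"":[30.0,10.0]}"');
var geom = JSON.parse(row[1]); // the unquoted field is valid JSON
```

The backslash-escaped style in the first example input is not covered by this sketch; a real parser would need to be configured for it.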

Speed comparison
Finding the extent of 1000 linestrings, each with 100 points:

  • getWKTExtent: 118ms
  • getJSONExtent: 80ms
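
The benchmark script itself isn't shown in this thread. A rough sketch of how a comparison like this could be set up in Node, where `makeLineStrings` and `timeMs` are hypothetical helpers and the random input data is an assumption:

```javascript
// Assumed benchmark setup (not the original script): build `count` GeoJSON
// LineString strings with `pts` random lon/lat points each.
function makeLineStrings(count, pts) {
    var out = [];
    for (var i = 0; i < count; i++) {
        var coords = [];
        for (var j = 0; j < pts; j++) {
            coords.push([Math.random() * 360 - 180, Math.random() * 180 - 90]);
        }
        out.push(JSON.stringify({ type: 'LineString', coordinates: coords }));
    }
    return out;
}

// Run `fn` once per input and return the elapsed wall-clock time in ms.
function timeMs(fn, inputs) {
    var start = process.hrtime();
    for (var i = 0; i < inputs.length; i++) fn(inputs[i]);
    var diff = process.hrtime(start);
    return diff[0] * 1e3 + diff[1] / 1e6;
}

var inputs = makeLineStrings(1000, 100);
console.log('JSON.parse baseline: ' + timeMs(JSON.parse, inputs).toFixed(1) + 'ms');
```

Passing `getJSONExtent` instead of `JSON.parse` would time the JSON path; the WKT path would need the same geometries serialized as WKT strings.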

@max-mapper

@springmeyer I don't have any geocsv specific parsers in JS, sorry! But that's not a bad idea...

@springmeyer
Contributor

> @springmeyer I don't have any geocsv specific parsers in JS, sorry! But that's not a bad idea...

Ah, okay! Seems like we should formalize a quick spec for geocsv that spells out that it is valid CSV, plus:

  • x/y, GeoJSON, and WKT columns are looked for
  • the CRS is assumed to be WGS84

And then the implementations would follow. What do you think?
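
To make the proposal concrete, here is a sketch of what the column lookup might look like. The exact header spellings, the case-insensitive matching, and the precedence order (WKT, then GeoJSON, then x/y) are all assumptions, since no spec exists yet; `detectGeometryColumns` is a hypothetical name:

```javascript
// Hypothetical header detection for a geocsv file. The thread only says that
// x/y, geojson and wkt columns are looked for; the details are assumptions.
function detectGeometryColumns(headers) {
    var index = {};
    headers.forEach(function (h, i) {
        index[h.trim().toLowerCase()] = i;
    });

    if ('wkt' in index) return { type: 'wkt', column: index.wkt };
    if ('geojson' in index) return { type: 'geojson', column: index.geojson };
    if ('x' in index && 'y' in index) return { type: 'xy', x: index.x, y: index.y };
    return null; // no recognized geometry column
}

// Coordinates would then be interpreted as WGS84, per the second bullet.
console.log(detectGeometryColumns(['name', 'X', 'Y']));
```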

> For now I think using a normal node CSV parser will work fine.

@brandonreavis - I agree, a quick js CSV parser I think is a good workaround for now.

> The best solution would be just to make a legit geocsv OGR driver

Good idea. Another idea I had would be to use emscripten to generate a pure JS geocsv parser from Mapnik's implementation, which would be a bit roundabout but a decent medium-term solution for node.js apps wanting to stay light.

@brandonreavis
Author

Alright, the quick and lightweight JS geoCSV parser for finding basic information is over here: https://github.com/naturalatlas/geocsv-info

I'll close this and we can bring up more specific issues at:
https://github.com/naturalatlas/geocsv-info/issues
