Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use node-gdal more #43

Closed
springmeyer opened this issue Nov 13, 2014 · 22 comments
Closed

Use node-gdal more #43

springmeyer opened this issue Nov 13, 2014 · 22 comments

Comments

@springmeyer
Copy link
Contributor

We need to replace temporary code like this to use node-gdal, and why not actually just try to use node-gdal everywhere instead of needing node-mapnik around? The vision behind omnivore is to figure out the right metadata in order to pass along to Mapnik for fast reading of the data. Mapnik itself does not need to be involved in figuring out this metadata.

/cc @brandonreavis who I've been chatting with about this.

@springmeyer
Copy link
Contributor Author

in concert with reducing dependencies, would be great to hit mapbox/node-srs#49 at some point too.

@brandonreavis
Copy link

I've started refactoring process and have come up with a couple of really minor questions:

Multilayer vector datasets
Right now it looks like first layer that has features is used for computing the extent / center. Is it best to keep this approach? Or union together the extent from each valid layer? Imagine a user inputs a .gpx file with a waypoints layer containing one marker, and a routes layer with a bunch of long paths. If the waypoints layer gets picked, the extent / center will be wrong. I could be totally missing something though.

Output formatting
For now I will make everything work identically to before, but at some point maybe a reorganizing the output would make some sense.

  • Why is the 'vector_layers' property in a 'json' object?
  • Why is the minzoom / maxzoom computed for the overall datasource metadata but set to 0 and 22 for each layer in vector_layers?

To me, it would make the most sense to me to make mapnik-omnivore output an array of mml-like Layer objects. This could lessen the number intermediate steps when going from mapnik-omnivore to carto. Really not a big deal at the moment. Would be nice to see this issue mapbox/carto#274 addressed for that to happen though.

@springmeyer
Copy link
Contributor Author

great questions, I'll defer to @yhahn and @GretaCB who will best answering.

@yhahn
Copy link
Member

yhahn commented Nov 14, 2014

Hey @brandonreavis,

Multilayer vector datasets

I think @GretaCB can comment better on this one but I don't see any reason not to calculate extent/center from all the layers in multilayer datasources.

Output formatting

Yep, this could use another pass but will probably be something we take on in a breaking API change in a future version maybe.

OTOH combining the layers and vector_layers keys and leaving out the minzoom/maxzoom properties are things that come to mind -- vector_layers is a key specific to creating vector tiles from a datasource and that's not necessarily something everyone using mapnik-omnivore will do.

@brandonreavis
Copy link

Alright I think it's getting pretty close. While removing mapnik I tried to distill things a bit and make it a bit more modular. The output should be pretty much identical to earlier though!

The changes can be seen here:
https://github.com/naturalatlas/mapnik-omnivore/tree/gdal-refactor

Changes

  • Broke out the datasourceProcessor into separate "digestor" methods. If there any formats in the future that need to be parsed separately (like geocsv needed) a new method can be easily added.
  • It runs through each digestor until one works.
  • Extents of multilayer datasets are now the union of the extent from each layer.
  • Removed datasourceProcessor-specific tests. Some of these should probably be added back in, but they should just be incorporated into the digest method tests.
  • metadata.raster.bands[i].overviews now returns an array of overview sizes instead of an empty object. ex. [{x: 256, y: 256}, ...]

Breaking changes

Needed changes

  • Update tests to be less rigid
    • A number of the tests fail because of slightly different floating point values, or there is more more metadata now than there used to be
    • Switching the tests over to mocha would be nice for quickly seeing the difference between these objects
  • Add back node-srs? ... I forgot I chopped it out until now, but it is probably necessary for returning the @springmeyer -canonical-spherical-mercator srs instead of GDAL's default output? It would be good finish Re-implement this module as a pure JS wrapper around node-gdal node-srs#49 first though.

Additional proposed changes

  • make module.exports equal to function Digest() ...

@springmeyer
Copy link
Contributor Author

Awesome work @brandonreavis. Have my head down on mapbox/mapbox-studio-classic#1040 still but will review soon.

@GretaCB
Copy link
Contributor

GretaCB commented Nov 19, 2014

Awesome @brandonreavis .

  • I definitely support union for extent/center. I was grabbing the first valid datasource to use as the extent/center, and didn't realize I could union. 👍
  • Also, the tests are super finicky due to raster floating point differences in operating systems. I added a workaround for that in the tests by truncating the values in a couple different ways.
  • The tests can probably be spruced up. You can run tests using UPDATE=1 npm test to account for new json values, if expected.
  • The tests used to be mocha. @yhahn why did we switch to tape?
  • @brandonreavis are you validating that a file is CSV by parsing the entire file?

@brandonreavis
Copy link

Ahh that's what the fs.writeFile() stuff in the tests is about. I now see that in the datasourceProcessor tests, but it's in a try/catch in the index.test.js file. But yeah, I think it shouldn't be too hard get them all passing again!

I am hesitant to do the UPDATE=1 npm test thing until its clear what values have changed and if they are negligible. For example, the raster extents seemed like they were off by a few hundredths of a degree, and that could be because mapnik computes the extent slightly differently or I'm doing something wrong.

Using chai's assert.closeTo() to do floating point tests with node-gdal has saved some time and could help with this. But yeah, I don't really want to do much with the tests or convert them to mocha until I know why they were converted to tape from mocha earlier.

CSVs
Right now I am doing a quick file extension check and skipping the csv digestor if it doesnt match one of these: .csv, .tsv, .txt. If the filename looks like a CSV then it's going through the whole CSV and computing the extent. I added an 'estimate' option to geocsv-info so we can just look at a certain number of rows though (similar to the 'row_count' thing you were doing with mapnik). This could still use a little work though.

Is this fine? Are there any other extensions I should add? or should we always open up the file and look at the first 100 lines?

@springmeyer
Copy link
Contributor Author

Tape port happened to workaround that appveyor/node v0.10.x/mocha bug that caused missing log output. And because http://www.macwright.org/2014/03/11/tape-is-cool.html.

The CSV row_limit is a recent addition on my suggestion. I realize now it could result in incorrect extents for the dataset so we should consider reverting back to reading the whole thing. That said supporting row_limit/estimate extent in geocsv-info is still a useful feature.

Really interesting that the extents are slightly off for rasters. Mapnik is calling into gdal to get the extents so it will be interesting to see who is wrong and whether it is a bug or just floating point drift.

As far as CSV extensions yes those should be good! Thanks!

@brianreavis
Copy link
Contributor

Tap/tape/mocha: Yeah, mochajs/mocha#333 was a bummer. Thankfully mochajs/mocha#1339 fixed it (it's been working fine for node-gdal). I get the pros that @tmcw lays out there for tap (all totally valid), but the limited structure for large projects and vague failure details makes it hard for outsiders to come into a project and know what's going on.

@brandonreavis
Copy link

@springmeyer, I just tested the extent right before converting it to WGS84 and it matches the output of gdalinfo identically. The error seems to be introduced when transforming the extent polygon to WGS84 here as far as I can tell. Maybe mapnik-omnivore/mapnik uses a slightly different definition for WGS84 or its just floating point differences when doing the transform. I don't think it's anything major.

actual   = [ -110.3765574645805, 44.53327824851143, -110.27303961243526,  44.59676651822825 ]
expected = [ -110.3650933429331, 44.53327824851143, -110.28443250326441,  44.596766518228264 ]

@brandonreavis
Copy link

Ahh found the problem. Either mapnik or the mapnik-omnivore (previously) was transforming the upper left / lower right point and setting the extent based on those points.

If you imagine a rectangle rotated clockwise by a few degrees, the bottom left and top right points are what should define the extent. That's why the extent is a little wider than it was before, but the height is spot on.

@springmeyer
Copy link
Contributor Author

@brianreavis - wow, thanks for the links to mocha fixes. I did not know about these - I'm still using/happy with mocha in node-mapnik so do I just need to upgrade to mocha 2.x to get node v0.10.x+mocha+appveyor output working?

@brandonreavis - wrt to the extent differences - so do you see any action that we need to take? Do the results still differ between gdal and mapnik?

@brandonreavis
Copy link

@springmeyer - Yeah, it seems like this should be fixed in node-mapnik here:

mapnik_projection.cpp

double minx = a->Get(0)->NumberValue();
double miny = a->Get(1)->NumberValue();
double maxx = a->Get(2)->NumberValue();
double maxy = a->Get(3)->NumberValue();
p->projection_->forward(minx,miny);
p->projection_->forward(maxx,maxy);
Local<Array> arr = NanNew<Array>(4);
arr->Set(0, NanNew(minx));
arr->Set(1, NanNew(miny));
arr->Set(2, NanNew(maxx));
arr->Set(3, NanNew(maxy));
NanReturnValue(arr);

should become something like this:

double ulx, ulx, urx, ury, lrx, lry, llx, lly;
ulx = llx = a->Get(0)->NumberValue();
lry = lly = a->Get(1)->NumberValue();
lrx = urx = a->Get(2)->NumberValue();
uly = ury = a->Get(3)->NumberValue();
p->projection_->forward(ulx,uly);
p->projection_->forward(urx,ury);
p->projection_->forward(lrx,lry);
p->projection_->forward(llx,lly);
Local<Array> arr = NanNew<Array>(4);
arr->Set(0, NanNew(ulx<llx ? ulx : llx));
arr->Set(1, NanNew(lry<lly ? lry : lly));
arr->Set(2, NanNew(urx>lrx ? urx : lrx));
arr->Set(3, NanNew(ury>uly ? ury : uly));
NanReturnValue(arr);

@brandonreavis
Copy link

Turns out it also would need to be changed in mapnik too:

https://github.com/mapnik/mapnik/blob/master/src/proj_transform.cpp#L200

The backward methods would need a similar fix.

@springmeyer
Copy link
Contributor Author

@brandonreavis - Okay, thank you.

If you imagine a rectangle rotated clockwise by a few degrees, the bottom left and top right points are what should define the extent

Is there documentation you know of describing this as the ideal method for bbox calculations?

/cc @artemp

@brandonreavis
Copy link

Hmm not that I know of. Can a coordinate transformation ever mirror about the x axis? Safest thing to do would be just to recompute the extent based on all the corners like:

minx = Math.min(ulx, urx, lrx, llx)

@brianreavis
Copy link
Contributor

do I just need to upgrade to mocha 2.x to get node v0.10.x+mocha+appveyor output working?

@springmeyer Looks like I was wrong... seems like it's not in master yet. We're waiting on: mochajs/mocha#1440. For a workaround see: https://github.com/naturalatlas/node-gdal/blob/master/appveyor.yml#L37 (I won't be deleting that branch until 1440 is merged)

@artemp
Copy link

artemp commented Dec 3, 2014

@brianreavis @springmeyer - re-projecting all four corners sounds good.

@springmeyer
Copy link
Contributor Author

Update on my thinking here:

  • Solid PostGIS support in mapnik-omnivore is going to require keeping node-mapnik around (We want to use node-mapnik to query Mapnik for the metadata gleaned from Mapnik's communication with postgres which is very particular given Mapnik's support for SQL subqueries).
  • We've solved the build issues that was one of the other motivators to trying this (getting node-mapnik and node-gdal built with the same visual studio version)
  • That said, reducing the use of node-mapnik in some cases and increasing the use of node-gdal still makes good sense so integrating @brandonreavis's work will be great.

Here are the steps I'm seeing (for Jan/Feb 2015):

After those steps are done then we'd be able to move on getting mapnik-omnivore updated to use new node-gdal to be able to bring in fixes from https://github.com/naturalatlas/mapnik-omnivore/tree/gdal-refactor.

Other followups:

@springmeyer springmeyer changed the title node-mapnikless mapnik-omnivore Use node-gdal more Jan 14, 2015
@springmeyer
Copy link
Contributor Author

Also just updated the ticket name to avoid confusion: now says Use node-gdal more. We are keeping mapnik, but this ticket stands for recording the items above towards using more of node-gdal over time.

@springmeyer
Copy link
Contributor Author

After those steps are done then we'd be able to move on getting mapnik-omnivore updated to use new node-gdal to be able to bring in fixes from https://github.com/naturalatlas/mapnik-omnivore/tree/gdal-refactor.

This happened!

BBOX fixes in mapnik/node-mapnik: mapnik/mapnik#2618

This also happened. These were the key last things keeping this ticket open. Closing now. Overall we want and are going to keep using both mapnik and gdal in omnivore and this is working well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants