
RangeReader SPI #2977

Conversation

jbouffard
Contributor

Overview

This PR provides an SPI API for RangeReader for the following backends: file, HDFS, HTTP, and S3.

Checklist

  • docs/CHANGELOG.rst updated, if necessary
  • docs guides updated, if necessary
  • New user API has useful Scaladoc strings
  • Unit tests added for bug-fix or new feature

Demo

val rangeReader: FileRangeReader = RangeReader(new URI("file:///tmp/catalog/file.tif"))
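For illustration, the scheme-based provider lookup behind this call can be sketched as follows. This is a simplified, self-contained sketch, not the actual GeoTrellis implementation: the trait and provider names here are hypothetical stand-ins, and the real SPI resolves providers via java.util.ServiceLoader and META-INF registrations rather than a hardcoded list.

```scala
import java.net.URI
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, Paths}

// Simplified stand-ins for the SPI pieces; names are illustrative only.
trait RangeReader {
  def totalLength: Long
  def readRange(start: Long, length: Int): Array[Byte]
}

trait RangeReaderProvider {
  def canProcess(uri: URI): Boolean
  def rangeReader(uri: URI): RangeReader
}

// A file-backed reader: reads `length` bytes starting at offset `start`.
class FileRangeReader(path: Path) extends RangeReader {
  def totalLength: Long = Files.size(path)
  def readRange(start: Long, length: Int): Array[Byte] = {
    val channel = FileChannel.open(path)
    try {
      val buf = ByteBuffer.allocate(length)
      channel.read(buf, start)
      buf.array.take(buf.position())
    } finally channel.close()
  }
}

object FileRangeReaderProvider extends RangeReaderProvider {
  def canProcess(uri: URI): Boolean =
    uri.getScheme == null || uri.getScheme == "file"
  def rangeReader(uri: URI): RangeReader =
    new FileRangeReader(Paths.get(uri.getPath))
}

object RangeReader {
  // The real SPI discovers providers with java.util.ServiceLoader;
  // here the provider list is hardcoded to keep the sketch self-contained.
  private val providers: List[RangeReaderProvider] = List(FileRangeReaderProvider)

  def apply(uri: URI): RangeReader =
    providers
      .find(_.canProcess(uri))
      .getOrElse(throw new IllegalArgumentException(s"No RangeReader provider for $uri"))
      .rangeReader(uri)
}
```

Each backend (hdfs, http, s3) would contribute its own provider, and the apply lookup picks the first provider whose canProcess accepts the URI's scheme.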

Notes

The HdfsRangeReader has a problem where it will use a default Configuration even if another one is present. This is because there is currently no way to get a Configuration unless it is provided explicitly or found via the SparkContext. The HdfsConfig introduced in geotrellis/geotrellis-contrib#186 would resolve this issue.

Closes #2940

jbouffard and others added 5 commits June 7, 2019 16:23
…the (locationtech#2927)

spark package into it

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added the apacheIO dependency to the layers project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Made the spark package depend on the layers package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the avro package to geotrellis.layers.io and updated the imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the index package to geotrellis.layers.io and updated the imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the json package to geotrellis.layers.io and updated some of the imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved Metadata, TileLayerMetadata, LayerId, and Mergable from spark and spark.merge to layers and layers.merge, respectively. In addition, imports were updated

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the io, cog, file, and hadoop logic from geotrellis.spark.io into geotrellis.layers.io

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved buffer and mapalgebra logic from spark.buffer and spark.mapalgebra to layers.buffer and layers.mapalgebra, respectively

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the Collection mask methods from spark.mask to layer.mask

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Broke out collections API spark into the layers package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added application.conf to layers tests' resources

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Refactored the BufferTiles object

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Refactored the Mask object

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned and added tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved BufferedTile and BufferSizes to the raster package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Removed the mapalgebra tests from layers

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Removed the io package in layers

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up the hadoop formats logic

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the KeyJsonFormats to the tiling project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the spark-testkit

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated and fixed the spark tests so that they compile

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Continued to try and get the spark tests working

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Continued working on the spark tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the tests so that they all pass

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up the hadoop backend packages

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved CamelCaseConfig to geotrellis.util

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the njoin methods to parJoin and moved them into IOUtils

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved HdfsUtilsSpec back to spark tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed TileLayerRDDMetadata to CollectTileLayerMetadata

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the RDDLayerProviders to SparkLayerProviders

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the spark resources

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added a TODO

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Split the accumulo package into two new ones: accumulo and layers-accumulo. The former contains logic that uses Spark while the latter does not

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added a reference.conf to the layers-accumulo package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the accumulo tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the CHANGELOG

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-accumulo to the root project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-accumulo to the various scripts

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the geowave project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Split the cassandra package into two new ones: cassandra and layers-cassandra. The former contains logic that uses Spark while the latter does not

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the cassandra tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Removed unneeded wrapper class

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-cassandra to the root project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-cassandra to the various scripts

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Split HBase spark/non spark code

Organize hbase store project

Delete old hbase dir

Add hbase collection layer provider

Update META-INF for SPI

Rename layer provider for spark case

Split the s3 package into two new ones: s3 and s3-store. The former contains logic that uses Spark while the latter does not

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the SPIs for the s3 and s3-store packages

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up the hbase packages

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the spark-pipeline project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the doc-examples project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the cassandra-layers and cassandra packages to cassandra-store and cassandra-spark, respectively

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the imports for the cassandra-spark console

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed docker image name

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Bumped the downloaded Hbase version to 2.1.5

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the accumulo-layers and accumulo packages to accumulo-store and accumulo-spark, respectively

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the class path in the accumulo-spark package to geotrellis.spark.store.accumulo

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the imports for the accumulo-spark console

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up after rebase

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the serialization in HadoopAttributeStore

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Changed the constructors for the HadoopAttributeStore

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the imports in the geowave package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the resources in the hbase-spark project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the layerIdString method in HBaseAttributeStore

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed spark.io to spark.store

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the s3 package to s3-spark

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the dependencies

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the container name for cassandra

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed failing test

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
Update ingest documentation

Update docs; scripts; changelog

Update heading in pipeline docs

Reremove etl

Remove lingering references to deprecated ETL

Fixed bad rebase

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
@moradology
Contributor

Take a look at the provider here: hdfs+ isn't a valid scheme outside of GeoTrellis, but we use it to provide clues to the SPI.
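As a rough sketch of how such a compound scheme hint might be parsed (a hypothetical helper, not the actual provider code; the "+" form is legal in URI scheme grammar but only meaningful inside GeoTrellis):

```scala
import java.net.URI

// Hypothetical helper: split a compound scheme like "hdfs+file" into the
// SPI routing hint ("hdfs") and the underlying scheme ("file"). A plain
// scheme such as "s3" yields no hint.
def schemeHint(uri: URI): (Option[String], String) =
  uri.getScheme.split("\\+", 2) match {
    case Array(hint, base) => (Some(hint), base)
    case Array(base)       => (None, base)
  }
```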

Contributor

@moradology moradology left a comment


Looking pretty good. I've called out some spots that might be worth reflecting on.

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
@pomadchin pomadchin force-pushed the feature/range-reader-spi branch 2 times, most recently from d59a745 to 5828659 Compare June 13, 2019 01:07
Member

@pomadchin pomadchin left a comment


I fixed the tests, check out d7bacc8; I left only a single comment. Also, none of the new files have license headers. You can generate headers using the headerCreate command (and test:headerCreate for tests).

Also, the test folder structure doesn't follow the package naming (the real question is why all the tests are in /src/test/scala/geotrellis/spark/io/s3/), and there is no CHANGELOG entry.


import org.scalatest._

class S3RangeReaderProviderSpec extends FunSpec with Matchers {
Member


Why is this file in s3-spark and not in s3-store? The implementation of S3RangeReaderProvider is in s3-store.

Contributor Author


There wasn't really a great reason. All of our tests for s3-store are still in s3-spark, so I just put it there.

Member


@jbouffard can you look into the tests structure more?

Contributor Author


@pomadchin Actually, we can't move it to s3-store as it needs TestUtils which is in s3-spark.

@pomadchin pomadchin force-pushed the feature/range-reader-spi branch from 5828659 to 3ec17b7 Compare June 13, 2019 01:43
@pomadchin pomadchin force-pushed the feature/range-reader-spi branch from 3ec17b7 to d7bacc8 Compare June 13, 2019 02:26
@echeipesh echeipesh force-pushed the feature/spark-reorganization branch 5 times, most recently from d5aa869 to 5af6e80 Compare June 15, 2019 19:13
@pomadchin
Member

@jbouffard can you rebase this PR?

@jbouffard mentioned this pull request Jun 17, 2019
@jbouffard
Contributor Author

Superseded by #2998
