
RangeReader SPI #2977

Conversation

jbouffard
Contributor

Overview

This PR provides an SPI API for RangeReader for the following backends: file, HDFS, HTTP, and S3.

Checklist

  • docs/CHANGELOG.rst updated, if necessary
  • docs guides updated, if necessary
  • New user API has useful Scaladoc strings
  • Unit tests added for bug-fix or new feature

Demo

val rangeReader: FileRangeReader = RangeReader(new URI("file:///tmp/catalog/file.tif"))
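For illustration, the scheme-based provider lookup behind this call can be sketched as follows. This is a simplified, self-contained sketch, not the actual GeoTrellis implementation: the trait and provider names here are hypothetical stand-ins, and the real SPI resolves providers via java.util.ServiceLoader and META-INF registrations rather than a hardcoded list.

```scala
import java.net.URI
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, Paths}

// Simplified stand-ins for the SPI pieces; names are illustrative only.
trait RangeReader {
  def totalLength: Long
  def readRange(start: Long, length: Int): Array[Byte]
}

trait RangeReaderProvider {
  def canProcess(uri: URI): Boolean
  def rangeReader(uri: URI): RangeReader
}

// A file-backed reader: reads `length` bytes starting at offset `start`.
class FileRangeReader(path: Path) extends RangeReader {
  def totalLength: Long = Files.size(path)
  def readRange(start: Long, length: Int): Array[Byte] = {
    val channel = FileChannel.open(path)
    try {
      val buf = ByteBuffer.allocate(length)
      channel.read(buf, start)
      buf.array.take(buf.position())
    } finally channel.close()
  }
}

object FileRangeReaderProvider extends RangeReaderProvider {
  def canProcess(uri: URI): Boolean =
    uri.getScheme == null || uri.getScheme == "file"
  def rangeReader(uri: URI): RangeReader =
    new FileRangeReader(Paths.get(uri.getPath))
}

object RangeReader {
  // The real SPI discovers providers with java.util.ServiceLoader;
  // here the provider list is hardcoded to keep the sketch self-contained.
  private val providers: List[RangeReaderProvider] = List(FileRangeReaderProvider)

  def apply(uri: URI): RangeReader =
    providers
      .find(_.canProcess(uri))
      .getOrElse(throw new IllegalArgumentException(s"No RangeReader provider for $uri"))
      .rangeReader(uri)
}
```

Each backend (hdfs, http, s3) would contribute its own provider, and the apply lookup picks the first provider whose canProcess accepts the URI's scheme.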

Notes

The HdfsRangeReader has a problem where it will use a default Configuration even if another one is present. This is because there is currently no way to get a Configuration unless it is provided explicitly or found via the SparkContext. The HdfsConfig introduced in geotrellis/geotrellis-contrib#186 would resolve this issue.

Closes #2940

jbouffard and others added 5 commits June 7, 2019 16:23
…the (locationtech#2927)

spark package into it

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added the apacheIO dependency to the layers project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Made the spark package depend on the layers package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the avro package to geotrellis.layers.io and updated the imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the index package to geotrellis.layers.io and updated the imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the json package to geotrellis.layers.io and updated some of the imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved Metadata, TileLayerMetadata, LayerId, and Mergable from spark and spark.merge to layers and layers.merge, respectively. In addition, imports were updated

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the io, cog, file, and hadoop logic from geotrellis.spark.io into geotrellis.layers.io

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved buffer and mapalgebra logic from spark.buffer and spark.mapalgebra to layers.buffer and layers.mapalgebra, respectively

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the Collection mask methods from spark.mask to layer.mask

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Broke out collections API spark into the layers package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added application.conf to layers tests' resources

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Refactored the BufferTiles object

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Refactored the Mask object

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned and added tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved BufferedTile and BufferSizes to the raster package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Removed the mapalgebra tests from layers

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Removed the io package in layers

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up the hadoop formats logic

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved the KeyJsonFormats to the tiling project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the spark-testkit

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated and fixed the spark tests so that they compile

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Continued to try and get the spark tests working

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Continued working on the spark tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the tests so that they all pass

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up the hadoop backend packages

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved CamelCaseConfig to geotrellis.util

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the njoin methods to parJoin and moved them into IOUtils

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Moved HdfsUtilsSpec back to spark tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed TileLayerRDDMetadata to CollectTileLayerMetadata

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the RDDLayerProviders to SparkLayerProviders

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the spark resources

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added a TODO

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Split the accumulo package into two new ones: accumulo and layers-accumulo. The former contains logic that uses Spark while the latter does not

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added a reference.conf to the layers-accumulo package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the accumulo tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the CHANGELOG

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-accumulo to the root project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-accumulo to the various scripts

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the geowave project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Split the cassandra package into two new ones: cassandra and layers-cassandra. The former contains logic that uses Spark while the latter does not

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the cassandra tests

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Removed unneeded wrapper class

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-cassandra to the root project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Added layers-cassandra to the various scripts

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Split HBase spark/non spark code

Organize hbase store project

Delete old hbase dir

Add hbase collection layer provider

Update META-INF for SPI

Rename layer provider for spark case

Split the s3 package into two new ones: s3 and s3-store. The former contains logic that uses Spark while the latter does not

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the SPIs for the s3 and s3-store packages

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up the hbase packages

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the spark-pipeline project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the doc-examples project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed imports

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the cassandra-layers and cassandra packages to cassandra-store and cassandra-spark, respectively

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the imports for the cassandra-spark console

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed docker image name

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Bumped the downloaded Hbase version to 2.1.5

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the accumulo-layers and accumulo packages to accumulo-store and accumulo-spark, respectively

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the class path in the accumulo-spark package to geotrellis.spark.store.accumulo

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the imports for the accumulo-spark console

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Cleaned up after rebase

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the serialization in HadoopAttributeStore

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Changed the constructors for the HadoopAttributeStore

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the imports in the geowave package

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the resources in the hbase-spark project

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the layerIdString method in HBaseAttributeStore

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed spark.io to spark.store

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Renamed the s3 package to s3-spark

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Updated the dependencies

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed the container name for cassandra

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>

Fixed failing test

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
Update ingest documentation

Update docs; scripts; changelog

Update heading in pipeline docs

Reremove etl

Remove lingering references to deprecated ETL

Fixed bad rebase

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
@moradology
Contributor

Take a look at the provider here: hdfs+ isn't a valid scheme outside of GeoTrellis, but we use it to provide clues to the SPI.
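As a rough sketch of how such a compound scheme hint might be parsed (a hypothetical helper, not the actual provider code; the "+" form is legal in URI scheme grammar but only meaningful inside GeoTrellis):

```scala
import java.net.URI

// Hypothetical helper: split a compound scheme like "hdfs+file" into the
// SPI routing hint ("hdfs") and the underlying scheme ("file"). A plain
// scheme such as "s3" yields no hint.
def schemeHint(uri: URI): (Option[String], String) =
  uri.getScheme.split("\\+", 2) match {
    case Array(hint, base) => (Some(hint), base)
    case Array(base)       => (None, base)
  }
```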

Contributor

@moradology moradology left a comment


Looking pretty good. I've called out some spots that might be worth reflecting on.

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
@pomadchin pomadchin force-pushed the feature/range-reader-spi branch 2 times, most recently from d59a745 to 5828659 Compare June 13, 2019 01:07
Member

@pomadchin pomadchin left a comment


I fixed the tests, check out d7bacc8; I left only a single comment. Also, none of the new files have license headers. You can generate headers using the headerCreate command (and test:headerCreate for tests).

Also, the test folder structure doesn't follow the package naming (the real question is why all the tests are in /src/test/scala/geotrellis/spark/io/s3/), and there is no CHANGELOG entry.


import org.scalatest._

class S3RangeReaderProviderSpec extends FunSpec with Matchers {
Member


Why is this file in s3-spark and not in s3-store? The implementation of S3RangeReaderProvider is in s3-store.

Contributor Author


There wasn't really a great reason. All of our tests for s3-store are still in s3-spark, so I just put it there.

Member


@jbouffard can you look into the tests structure more?

Contributor Author


@pomadchin Actually, we can't move it to s3-store as it needs TestUtils which is in s3-spark.

@pomadchin pomadchin force-pushed the feature/range-reader-spi branch from 5828659 to 3ec17b7 Compare June 13, 2019 01:43
@pomadchin pomadchin force-pushed the feature/range-reader-spi branch from 3ec17b7 to d7bacc8 Compare June 13, 2019 02:26
@echeipesh echeipesh force-pushed the feature/spark-reorganization branch 5 times, most recently from d5aa869 to 5af6e80 Compare June 15, 2019 19:13
@pomadchin
Member

@jbouffard can you rebase this PR?

@jbouffard mentioned this pull request Jun 17, 2019
@jbouffard
Contributor Author

Superseded by #2998
