RangeReader SPI #2977
Conversation
Squashed commits:

- …the (locationtech#2927) spark package into it
- Added the apacheIO dependency to the layers project
- Made the spark package depend on the layers package
- Moved the avro package to geotrellis.layers.io and updated the imports
- Moved the index package to geotrellis.layers.io and updated the imports
- Moved the json package to geotrellis.layers.io and updated some of the imports
- Moved Metadata, TileLayerMetadata, LayerId, and Mergable from spark and spark.merge to layers and layers.merge, respectively; in addition, imports were updated
- Moved the io, cog, file, and hadoop logic from geotrellis.spark.io into geotrellis.layers.io
- Moved buffer and mapalgebra logic from spark.buffer and spark.mapalgebra to layers.buffer and layers.mapalgebra, respectively
- Moved the Collection mask methods from spark.mask to layer.mask
- Broke out the collections API from spark into the layers package
- Added application.conf to the layers tests' resources
- Refactored the BufferTiles object
- Refactored the Mask object
- Cleaned and added tests
- Moved BufferedTile and BufferSizes to the raster package
- Removed the mapalgebra tests from layers
- Removed the io package in layers
- Cleaned up the hadoop formats logic
- Moved the KeyJsonFormats to the tiling project
- Updated the spark-testkit
- Updated and fixed the spark tests so that they can compile
- Continued to try and get the spark tests working
- Continued working on the spark tests
- Updated the tests so that they all pass
- Cleaned up the hadoop backend packages
- Moved CamelCaseConfig to geotrellis.util
- Renamed the njoin methods to parJoin and moved them into IOUtils
- Moved HdfsUtilsSpec back to the spark tests
- Renamed TileLayerRDDMetadata to CollectTileLayerMetadata
- Renamed the RDDLayerProviders to SparkLayerProviders
- Renamed the spark resources
- Added a TODO
- Split the accumulo package into two new ones: accumulo and layers-accumulo. The former contains logic that uses Spark while the latter does not
- Added a reference.conf to the layers-accumulo package
- Updated the accumulo tests
- Updated the CHANGELOG
- Added layers-accumulo to the root project
- Added layers-accumulo to the various scripts
- Updated the geowave project
- Split the cassandra package into two new ones: cassandra and layers-cassandra. The former contains logic that uses Spark while the latter does not
- Updated the cassandra tests
- Removed an unneeded wrapper class
- Added layers-cassandra to the root project
- Added layers-cassandra to the various scripts
- Split HBase spark/non-spark code
- Organized the hbase store project
- Deleted the old hbase dir
- Added an hbase collection layer provider
- Updated META-INF for SPI
- Renamed the layer provider for the Spark case
- Split the s3 package into two new ones: s3 and s3-store. The former contains logic that uses Spark while the latter does not
- Updated the SPIs for the s3 and s3-store packages
- Cleaned up the hbase packages
- Updated the spark-pipeline project
- Updated the doc-examples project
- Fixed imports
- Renamed the cassandra-layers and cassandra packages to cassandra-store and cassandra-spark, respectively
- Updated the imports for the cassandra-spark console
- Fixed the docker image name
- Bumped the downloaded HBase version to 2.1.5
- Renamed the accumulo-layers and accumulo packages to accumulo-store and accumulo-spark, respectively
- Renamed the class path in the accumulo-spark package to geotrellis.spark.store.accumulo
- Updated the imports for the accumulo-spark console
- Cleaned up after rebase
- Fixed the serialization in HadoopAttributeStore
- Changed the constructors for the HadoopAttributeStore
- Updated the imports in the geowave package
- Fixed the resources in the hbase-spark project
- Fixed the layerIdString method in HBaseAttributeStore
- Renamed spark.io to spark.store
- Renamed the s3 package to s3-spark
- Updated the dependencies
- Fixed the container name for cassandra
- Fixed a failing test
- Updated the ingest documentation
- Updated docs, scripts, and changelog
- Updated a heading in the pipeline docs
- Re-removed ETL
- Removed lingering references to the deprecated ETL
- Fixed a bad rebase

Signed-off-by: Jacob Bouffard <jbouffard@azavea.com>
Take a look at the provider here.
Looking pretty good. There are some spots that might be worth reflecting on that I've called out.
layers/src/main/scala/geotrellis/layers/hadoop/HadoopCollectionLayerProvider.scala
layers/src/main/scala/geotrellis/layers/hadoop/util/HdfsRangeReaderProvider.scala
Force-pushed from d59a745 to 5828659.
I fixed the tests, check out d7bacc8; I left only a single comment. Also, none of the new files have license headers. You can generate headers using the headerCreate command, and test:headerCreate for tests.
Also, the tests folder structure doesn't follow the package naming (the real question is why all the tests are in /src/test/scala/geotrellis/spark/io/s3/?), and there is no CHANGELOG entry.
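For reference, the sbt-header plugin tasks mentioned above would be invoked like this (assuming the plugin is already enabled in the build; the exact project scoping is up to the build definition):

```
# Generate missing license headers for main and test sources
sbt headerCreate test:headerCreate
```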
```scala
import org.scalatest._

class S3RangeReaderProviderSpec extends FunSpec with Matchers {
```
Why is this file in s3-spark and not in s3-store? The implementation of S3RangeReaderProvider is in s3-store.
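For context, SPI discovery like this relies on a provider-configuration file on the classpath of the module that ships the implementation. A hedged sketch of what that registration looks like for this provider (the file path and fully qualified names below are assumptions inferred from the paths in this thread, not verified against the branch):

```
# s3-store/src/main/resources/META-INF/services/geotrellis.util.RangeReaderProvider
geotrellis.spark.io.s3.util.S3RangeReaderProvider
```

Because the registration file lives in s3-store, the ServiceLoader machinery itself is exercised from whichever module's classpath includes s3-store, which is part of why the test placement matters.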
There wasn't a really great reason. We still have all of our tests for s3-store in s3-spark, so I just put it there.
@jbouffard can you look into the tests structure more?
@pomadchin Actually, we can't move it to s3-store as it needs TestUtils, which is in s3-spark.
Force-pushed from 5828659 to 3ec17b7.
s3-spark/src/test/scala/geotrellis/spark/io/s3/util/S3RangeReaderProviderSpec.scala
Force-pushed from 3ec17b7 to d7bacc8.
Force-pushed from d5aa869 to 5af6e80.
@jbouffard can you rebase this PR?
Superseded by #2998
Overview

This PR provides an SPI API for RangeReader for the following backends: file, hdfs, http, and s3.

Checklist

- docs/CHANGELOG.rst updated, if necessary
- docs guides updated, if necessary

Demo

Notes

The HdfsRangeReader has a problem where it will use a default Configuration even if another one is present. This is because there is currently no way of getting a Configuration without it being provided explicitly or found via the SparkContext. The HdfsConfig introduced in geotrellis/geotrellis-contrib#186 would resolve this issue.

Closes #2940
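As a rough sketch of how such an SPI lookup typically works on the JVM: each backend module registers a provider under META-INF/services, and a lookup scans the registered providers for one that claims the URI's scheme. The trait and method names below are illustrative stand-ins for the real GeoTrellis types, not the exact API from this PR:

```scala
import java.net.URI
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Minimal stand-ins for the real GeoTrellis types (assumptions for illustration).
trait RangeReader {
  def totalLength: Long
  def readRange(start: Long, length: Int): Array[Byte]
}

trait RangeReaderProvider {
  // Can this provider handle the given URI scheme (file, hdfs, http, s3)?
  def canProcess(uri: URI): Boolean
  def rangeReader(uri: URI): RangeReader
}

object RangeReader {
  // Scan all providers registered under META-INF/services and pick the
  // first one that claims this URI; fail loudly if none does.
  def apply(uri: URI): RangeReader =
    ServiceLoader.load(classOf[RangeReaderProvider]).iterator().asScala
      .find(_.canProcess(uri))
      .getOrElse(throw new RuntimeException(s"No RangeReaderProvider found for $uri"))
      .rangeReader(uri)
}
```

The design means callers depend only on the lookup entry point, while backend-specific jars (s3-store, the hdfs support, etc.) contribute implementations simply by being on the classpath.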