Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python raster reader argument refactor #329

Conversation

vpipkt
Copy link
Member

@vpipkt vpipkt commented Sep 2, 2019

Initial argument for spark.read.raster is called source and can now be a str, list of str, list of list of str, pandas DF catalog or spark DF catalog.

Add docstring to this important entry point for users.

Simplifies documentation pages to remove unneeded catalog named argument.

Expands unit test in python to all claimed use cases of source.

Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
…on-raster-reader-arg-refactor

Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt vpipkt requested a review from metasim September 2, 2019 16:26
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt vpipkt added this to the 0.8.2 milestone Sep 2, 2019
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Copy link
Member

@metasim metasim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome step forward in usability. Thank you for taking this on.

I suggest going back through the Scala API and thinking about how we can synchronize the names so we can maintain parity in naming (and functionality). As much as we can get away with, I'd like there to be a more or less mechanical translation between the Python API and Scala API, which starts with consistent names.

def fromCatalog(catalog: DataFrame, bandColumnNames: String*): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader] {
val tmpName = tmpTableName()
catalog.createOrReplaceTempView(tmpName)
reader
.option(RasterSourceDataSource.CATALOG_TABLE_PARAM, tmpName)
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(",")): DataFrameReader
}
def fromCatalog(tableName: String, bandColumnNames: String*): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader](
reader.option(RasterSourceDataSource.CATALOG_TABLE_PARAM, tableName)
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(","))
)
def fromCSV(catalogCSV: String, bandColumnNames: String*): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader](
reader.option(RasterSourceDataSource.CATALOG_CSV_PARAM, catalogCSV)
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(","))
)
def from(newlineDelimPaths: String): RasterSourceDataFrameReader =
tag[RasterSourceDataFrameReaderTag][DataFrameReader](
reader.option(RasterSourceDataSource.PATHS_PARAM, newlineDelimPaths)
)
def from(paths: Seq[String]): RasterSourceDataFrameReader =
from(paths.mkString("\n"))
def from(uris: Seq[URI])(implicit d: DummyImplicit): RasterSourceDataFrameReader =

Where in Python we have to do type inspection, in Scala we can use method overloading.

Also consider revising option key names, as appropraite:

final val SHORT_NAME = "raster"
final val PATH_PARAM = "path"
final val PATHS_PARAM = "paths"
final val BAND_INDEXES_PARAM = "bandIndexes"
final val TILE_DIMS_PARAM = "tileDimensions"
final val CATALOG_TABLE_PARAM = "catalogTable"
final val CATALOG_TABLE_COLS_PARAM = "catalogColumns"
final val CATALOG_CSV_PARAM = "catalogCSV"
final val LAZY_TILES_PARAM = "lazyTiles"

pyrasterframes/src/main/python/pyrasterframes/__init__.py Outdated Show resolved Hide resolved
pyrasterframes/src/main/python/pyrasterframes/__init__.py Outdated Show resolved Hide resolved
pyrasterframes/src/main/python/pyrasterframes/__init__.py Outdated Show resolved Hide resolved
pyrasterframes/src/main/python/tests/RasterSourceTest.py Outdated Show resolved Hide resolved
class RasterSourceTest(TestEnvironment):

@staticmethod
def path(scene, band):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this should be promoted to TestEnvironment...

Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@vpipkt
Copy link
Member Author

vpipkt commented Sep 6, 2019

@metasim I made some changes to data source parameters, which hopefully simplifies things in the direction you were thinking.

I am less clear on the vision for synchronizing naming between scala and python. Happy to discuss

@metasim metasim merged commit 61d6b6e into locationtech:develop Sep 9, 2019
@vpipkt vpipkt deleted the feature/python-raster-reader-arg-refactor-and-docs branch October 1, 2019 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants