-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python raster reader argument refactor #329
Python raster reader argument refactor #329
Conversation
Signed-off-by: Jason T. Brown <jason@astraea.earth>
…reader-arg-refactor
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
…on-raster-reader-arg-refactor Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome step forward in usability. Thank you for taking this on.
I suggest going back through the Scala API and thinking about how we can synchronize the names so we can maintain parity in naming (and functionality). As much as we can get away with, I'd like there to be a more or less mechanical translation between the Python API and Scala API, which starts with consistent names.
rasterframes/datasource/src/main/scala/org/locationtech/rasterframes/datasource/raster/package.scala
Lines 61 to 90 in ba3d30d
def fromCatalog(catalog: DataFrame, bandColumnNames: String*): RasterSourceDataFrameReader = | |
tag[RasterSourceDataFrameReaderTag][DataFrameReader] { | |
val tmpName = tmpTableName() | |
catalog.createOrReplaceTempView(tmpName) | |
reader | |
.option(RasterSourceDataSource.CATALOG_TABLE_PARAM, tmpName) | |
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(",")): DataFrameReader | |
} | |
def fromCatalog(tableName: String, bandColumnNames: String*): RasterSourceDataFrameReader = | |
tag[RasterSourceDataFrameReaderTag][DataFrameReader]( | |
reader.option(RasterSourceDataSource.CATALOG_TABLE_PARAM, tableName) | |
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(",")) | |
) | |
def fromCSV(catalogCSV: String, bandColumnNames: String*): RasterSourceDataFrameReader = | |
tag[RasterSourceDataFrameReaderTag][DataFrameReader]( | |
reader.option(RasterSourceDataSource.CATALOG_CSV_PARAM, catalogCSV) | |
.option(RasterSourceDataSource.CATALOG_TABLE_COLS_PARAM, bandColumnNames.mkString(",")) | |
) | |
def from(newlineDelimPaths: String): RasterSourceDataFrameReader = | |
tag[RasterSourceDataFrameReaderTag][DataFrameReader]( | |
reader.option(RasterSourceDataSource.PATHS_PARAM, newlineDelimPaths) | |
) | |
def from(paths: Seq[String]): RasterSourceDataFrameReader = | |
from(paths.mkString("\n")) | |
def from(uris: Seq[URI])(implicit d: DummyImplicit): RasterSourceDataFrameReader = |
Where in Python we have to do type inspection, in Scala we can use method overloading.
Also consider revising option key names, as appropraite:
Lines 44 to 52 in ba3d30d
final val SHORT_NAME = "raster" | |
final val PATH_PARAM = "path" | |
final val PATHS_PARAM = "paths" | |
final val BAND_INDEXES_PARAM = "bandIndexes" | |
final val TILE_DIMS_PARAM = "tileDimensions" | |
final val CATALOG_TABLE_PARAM = "catalogTable" | |
final val CATALOG_TABLE_COLS_PARAM = "catalogColumns" | |
final val CATALOG_CSV_PARAM = "catalogCSV" | |
final val LAZY_TILES_PARAM = "lazyTiles" |
class RasterSourceTest(TestEnvironment): | ||
|
||
@staticmethod | ||
def path(scene, band): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this should be promoted to TestEnvironment
...
…on-raster-reader-arg-refactor-and-docs
…reader-arg-refactor-and-docs
Signed-off-by: Jason T. Brown <jason@astraea.earth>
Signed-off-by: Jason T. Brown <jason@astraea.earth>
@metasim I made some changes to data source parameters, which hopefully simplifies things in the direction you were thinking. I am less clear on the vision for synchronizing naming between scala and python. Happy to discuss |
Initial argument for
spark.read.raster
is calledsource
and can now be a str, list of str, list of list of str, pandas DF catalog or spark DF catalog.Add docstring to this important entry point for users.
Simplifies documentation pages to remove unneeded
catalog
named argument.Expands unit test in python to all claimed use cases of
source
.