Allow band indexing in RasterDataset #687
There is already a `self.bands` attribute for this. Right now, it's only used when `self.separate_files` is True, but we should be able to reuse it for the purpose you're using it for here. The only problem is that `self.bands` holds the string names, not the integer indices. Maybe we can translate from one to the other using `self.all_bands`?
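The name-to-index translation suggested above could look roughly like this minimal sketch. It assumes `all_bands` lists every band in the same order as the bands stored in the file, and that the reader expects 1-based band numbers (as rasterio does). The helper `band_names_to_indexes` is hypothetical and is not part of the PR.

```python
from typing import List, Optional, Sequence


def band_names_to_indexes(
    all_bands: Sequence[str], bands: Optional[Sequence[str]] = None
) -> Optional[List[int]]:
    """Translate band names into 1-based band indexes (hypothetical helper).

    Returns None when no subset is requested, meaning "read every band".
    Assumes ``all_bands`` is ordered the same way as the bands in the file.
    """
    if not bands:
        return None
    order = list(all_bands)
    unknown = [band for band in bands if band not in order]
    if unknown:
        raise ValueError(f"Unknown bands: {unknown}")
    # list.index() is 0-based, while raster band numbering starts at 1
    return [order.index(band) + 1 for band in bands]


all_bands = ["B01", "B02", "B03", "B04"]
print(band_names_to_indexes(all_bands, ["B04", "B03", "B02"]))  # [4, 3, 2]
print(band_names_to_indexes(all_bands))  # None
```

Returning None for "no subset" keeps the default path identical to reading the whole file, so only datasets that opt in pay for the lookup.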
I did notice those attributes and don't think they can be reused.
Conceptually, I see it as two different things. The current …
I think it's possible to reuse these. There's no reason that …
I think I understand. Do you mean something like this in the …
Yes, something like that would be half of it. The other half is that users need a way of specifying this when they instantiate a dataset (…
So we have a … If we go back to my previous example of creating a …
Modification to `RasterDataset`:

```diff
 class RasterDataset(GeoDataset):
     """Abstract base class for :class:`GeoDataset` stored as raster files."""

     #: Glob expression used to search for files.
     #:
     #: This expression should be specific enough that it will not pick up files from
     #: other datasets. It should not include a file extension, as the dataset may be in
     #: a different file format than what it was originally downloaded as.
     filename_glob = "*"

     #: Regular expression used to extract date from filename.
     #:
     #: The expression should use named groups. The expression may contain any number of
     #: groups. The following groups are specifically searched for by the base class:
     #:
     #: * ``date``: used to calculate ``mint`` and ``maxt`` for ``index`` insertion
     #:
     #: When :attr:`separate_files` is True, the following additional groups are
     #: searched for to find other files:
     #:
     #: * ``band``: replaced with requested band name
     #: * ``resolution``: replaced with a glob character
     filename_regex = ".*"

     #: Date format string used to parse date from filename.
     #:
     #: Not used if :attr:`filename_regex` does not contain a ``date`` group.
     date_format = "%Y%m%d"

     #: True if dataset contains imagery, False if dataset contains mask
     is_image = True

     #: True if data is stored in a separate file for each band, else False.
     separate_files = False

     #: Names of all available bands in the dataset
     all_bands: List[str] = []

     #: Names of RGB bands in the dataset, used for plotting
     rgb_bands: List[str] = []

     #: Color map for the dataset, used for plotting
     cmap: Dict[int, Tuple[int, int, int, int]] = {}

     def __init__(
         self,
         root: str,
         crs: Optional[CRS] = None,
         res: Optional[float] = None,
+        bands: Optional[List[str]] = None,
         transforms: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None,
         cache: bool = True,
     ) -> None:
         """Initialize a new Dataset instance.

         Args:
             root: root directory where dataset can be found
             crs: :term:`coordinate reference system (CRS)` to warp to
                 (defaults to the CRS of the first file found)
             res: resolution of the dataset in units of CRS
                 (defaults to the resolution of the first file found)
+            bands: list of band names to load (defaults to all bands)
             transforms: a function/transform that takes an input sample
                 and returns a transformed version
             cache: if True, cache file handle to speed up repeated sampling

         Raises:
             FileNotFoundError: if no files are found in ``root``
         """
         super().__init__(transforms)

         self.root = root
         self.cache = cache

         # Populate the dataset index
         i = 0
         pathname = os.path.join(root, "**", self.filename_glob)
         filename_regex = re.compile(self.filename_regex, re.VERBOSE)
         for filepath in glob.iglob(pathname, recursive=True):
             match = re.match(filename_regex, os.path.basename(filepath))
             if match is not None:
                 try:
                     with rasterio.open(filepath) as src:
                         # See if file has a color map
                         if len(self.cmap) == 0:
                             try:
                                 self.cmap = src.colormap(1)
                             except ValueError:
                                 pass

                         if crs is None:
                             crs = src.crs
                         if res is None:
                             res = src.res[0]

                         with WarpedVRT(src, crs=crs) as vrt:
                             minx, miny, maxx, maxy = vrt.bounds
                 except rasterio.errors.RasterioIOError:
                     # Skip files that rasterio is unable to read
                     continue
                 else:
                     mint: float = 0
                     maxt: float = sys.maxsize
                     if "date" in match.groupdict():
                         date = match.group("date")
                         mint, maxt = disambiguate_timestamp(date, self.date_format)

                     coords = (minx, maxx, miny, maxy, mint, maxt)
                     self.index.insert(i, coords, filepath)
                     i += 1

         if i == 0:
             raise FileNotFoundError(
                 f"No {self.__class__.__name__} data was found in '{root}'"
             )

+        # Store the requested bands (all bands by default); without this, the new
+        # ``bands`` argument would never reach ``self.bands``
+        self.bands = bands or self.all_bands
+
+        # Translate band names into the 1-based band indexes rasterio expects;
+        # None means "read every band"
+        band_indexes: Optional[List[int]] = None
+        if self.all_bands and bands:
+            band_indexes = [self.all_bands.index(band) + 1 for band in bands]
+            assert len(band_indexes) == len(bands)
+
+        if self.rgb_bands:
+            rgb_band_indexes = [
+                self.all_bands.index(band) + 1 for band in self.rgb_bands
+            ]
+            assert len(rgb_band_indexes) == len(self.rgb_bands)
+
+        self.band_indexes = band_indexes
         self._crs = cast(CRS, crs)
         self.res = cast(float, res)
```
Yes, that looks correct to me. You wouldn't even need to override …
Great. I'll push a PR tomorrow (once mypy is satisfied) and we can continue from there.
Apologies for taking so long to review this!
Looks like we can remove Sentinel2's …
Yep, let's remove Sentinel-2's …
Could also add a dummy `_verify()` method to the base class and get rid of the `__init__` for most methods, but let's save that for another PR. There's also a bug where classes with no `__init__` don't get any docs, but there's a Sphinx setting to fix that I need to play around with.
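The `_verify()` idea above could look roughly like the following minimal sketch: the base class runs its shared setup and then calls a no-op hook, so a subclass that only needs an integrity check overrides `_verify()` instead of `__init__`. `BaseRasterDataset` and `ExampleDataset` are hypothetical stand-ins, not torchgeo classes.

```python
import os
import tempfile
from typing import List, Optional


class BaseRasterDataset:
    """Hypothetical stand-in for a raster dataset base class."""

    all_bands: List[str] = []

    def __init__(self, root: str, bands: Optional[List[str]] = None) -> None:
        self.root = root
        self.bands = bands or self.all_bands
        # Hook for subclasses; runs after the shared setup is done
        self._verify()

    def _verify(self) -> None:
        """Do nothing by default; subclasses override this to check their files."""


class ExampleDataset(BaseRasterDataset):
    """Only overrides the hook, so it needs no __init__ of its own."""

    all_bands = ["red", "green", "blue"]

    def _verify(self) -> None:
        if not os.path.isdir(self.root):
            raise FileNotFoundError(f"Dataset not found in '{self.root}'")


with tempfile.TemporaryDirectory() as root:
    ds = ExampleDataset(root, bands=["red"])
    print(ds.bands)  # ['red']
```

Because the subclass no longer defines `__init__`, its constructor signature comes entirely from the base class, which is also why the documentation issue mentioned above matters.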
Can you add unit tests for this? We'll want to test selecting a subset of bands for at least one `separate_files = True` dataset and one `separate_files = False` dataset. The test just needs to make sure that the total number of bands returned actually changes.
Remind me to review this this weekend.
Pinging @adamjstewart
Force-pushed from 25d3e20 to 3b37dda.
Force-pushed from 77ea1f1 to 528fcc8.
This looks great! Couple minor formatting requests, but otherwise this looks ready to me.
* Allow band indexing
* Add bands attribute to RasterDataset
* Review comments#1
* Remove sentinel2 __init__ & fix landsat test
* Add tests
* Add test for coverage
* Review comments#2
* Review comments#3
* Trigger build
By default, `RasterDataset` loads all bands. This PR lets users be more selective about which bands are loaded.