Refactor gac-run #55
… (l1c_factory.py)
Thank you very much for taking the time to submit this big refactoring. Many good things in here; I especially like the simplification of the main functions.
However, I think the usage of the factory and builder classes is a bit misleading and not really necessary. Also, don't forget to add unit tests for every new function and class that is added to the code.
pygac/utils.py
Outdated
        file - path to file or file object
    """
    close = True
    if is_gzip(file):
Why not just use a try/except instead of writing a function for this?
Good point, gzip.open does not raise an exception immediately, but the following lines should do the job:
@contextmanager
def file_opener(file):
    """Open a file depending on the input.

    Args:
        file - path to file or file object
    """
    # open file if necessary
    if is_file_object(file):
        open_file = file
        close = False
    else:
        open_file = open(file, mode='rb')
        close = True
    # check if it is a gzip file
    try:
        file_object = gzip.open(open_file)
        file_object.read(1)
    except OSError:
        file_object = open_file
    finally:
        file_object.seek(0)
    # provide file_object with the context
    try:
        yield file_object
    finally:
        if close:
            file_object.close()
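For readers who want to try the idea, here is a minimal, self-contained sketch of the same approach. It is not the code that was merged: `is_file_object` is a hypothetical stand-in (anything with a `read` method), and the cleanup closes the raw file handle directly, because closing a gzip wrapper created from an existing file object does not close that underlying object.

```python
import gzip
import io
import os
import tempfile
from contextlib import contextmanager


def is_file_object(obj):
    """Hypothetical helper: treat anything with a ``read`` method as a file."""
    return hasattr(obj, "read")


@contextmanager
def file_opener(file):
    """Yield a readable binary file object for a path or an open file."""
    if is_file_object(file):
        raw, close = file, False
    else:
        raw, close = open(file, mode="rb"), True
    try:
        # gzip.open only fails on the first read, so probe one byte.
        file_object = gzip.open(raw)
        try:
            file_object.read(1)
        except OSError:
            file_object = raw  # not gzip: fall back to the raw stream
        file_object.seek(0)
        yield file_object
    finally:
        # Files handed in by the caller are passed through and stay open.
        if close:
            raw.close()


payload = b"pygac test payload"
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.bin")
    gz_path = os.path.join(tmp, "packed.gz")
    with open(plain_path, "wb") as fh:
        fh.write(payload)
    with gzip.open(gz_path, "wb") as fh:
        fh.write(payload)
    with file_opener(plain_path) as fh:
        plain_result = fh.read()
    with file_opener(gz_path) as fh:
        gz_result = fh.read()

buffer = io.BytesIO(payload)
with file_opener(buffer) as fh:
    buffer_result = fh.read()
buffer_still_open = not buffer.closed
```

The last part shows the pass-through behaviour discussed in the review: an already open file object is read but not closed by the context manager.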
pygac/klm_reader.py
Outdated
@@ -574,12 +574,11 @@ class KLMReader(Reader):

     tsm_affected_intervals = TSM_AFFECTED_INTERVALS_KLM

-    def read(self, filename):
+    def read(self, fileobj):
This breaks the current API that other libs use (e.g. Satpy). Could we use the file opener in this function, but make it transparent if the file is already open?
I expected this change to be critical, but it's an easy step to fix it. The function file_opener has no effect on open files, i.e. it does not close them.
pygac/reader.py
Outdated
    @property
    def filename(self):
        """Get the property 'filename'."""
        return self.__filename
this attribute needs to be initialized in __init__
It is already initialized in __init__. I added the filename to the argument list and therefore moved it upwards to the initialization of the other given arguments.
Since you are using the Reader.read method in satpy, I wanted to ask whether you keep a single reader instance and reuse it for many files, or whether one reader instance represents one file.
In the former case, I would suggest giving Reader.read the optional argument fileobj, as seen in file_opener, and keep setting the filename attribute in this method.
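A rough sketch of that suggestion, assuming a stripped-down `Reader` that only stores a four-byte header. The class body, the simplified `file_opener` (no gzip handling) and the file names are illustrative assumptions, not pygac's actual implementation; only the `read(filename, fileobj=None)` shape comes from the comment above.

```python
import io
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_opener(file):
    """Simplified stand-in for utils.file_opener (no gzip handling)."""
    if hasattr(file, "read"):
        yield file  # already open: pass through, caller keeps ownership
    else:
        with open(file, mode="rb") as fh:
            yield fh


class Reader:
    """Hypothetical, stripped-down reader that only keeps a 4-byte header."""

    def __init__(self):
        self.filename = None
        self.head = None

    def read(self, filename, fileobj=None):
        # The filename is still set here, so one instance can be reused.
        self.filename = os.path.basename(filename)
        source = fileobj if fileobj is not None else filename
        with file_opener(source) as fh:
            self.head = fh.read(4)


reader = Reader()

# New style: the caller supplies an open file object plus a name.
reader.read("/archive/NSS.GHRR.demo", fileobj=io.BytesIO(b"HEADbody"))
first = (reader.filename, reader.head)

# Old style: only a path is given, as existing callers would do.
with tempfile.NamedTemporaryFile(suffix=".gac", delete=False) as tmp:
    tmp.write(b"DATAbody")
reader.read(tmp.name)
second = (reader.filename, reader.head)
os.remove(tmp.name)
```

With this signature, existing callers that pass only a path keep working, while callers that already hold an open file can hand it over.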
Question: Should the filename attribute be initialized with the given filename, or would you prefer the variable Reader.head['data_set_name']?
I believe using 'data_set_name' would be more robust, right ?
Also, in the future, please try to submit smaller changes at a time. This PR is quite large but could have been split into multiple smaller ones (i.e. one feature at a time). It would have made the review process less time-consuming... ;)
Sorry for that lengthy PR, but it's hard to stop once started ;-) Thank you for taking the time.
…aced utils.is_gzip by a try except clause.
…asses raise the new ReaderError in case it is unable to read the file. Some readers implement the new method _validate_header.
Hi @mraspaud,
Open questions/issues (not to be handled in this PR :-)):
Many thanks again for your time for the code review!
Scanlines are masked based on the quality flags, e.g. line 698 in 826941f.
But I don't know what they mean...
The POD guide, Table 3.1.2.1-2 (Format of quality indicators), and the KLM guide, Table 8.3.1.3.2.1-1 and following tables, give the details about each bit. These tables should be mentioned in the doc string of …
Edit:
    def _get_corrupt_mask(self):
        """Get mask for corrupt scanlines.

        Note
            The quality indicator format is listed in
            https://www1.ncdc.noaa.gov/pub/data/satellite/publications/
            podguides/TIROS-N%20thru%20N-14/pdf/NCDCPOD3.pdf
            Table 3.1.2.1-2. Format of quality indicators.
        """
        BITS_PER_BYTE = 8
        # quality indicator bit position
        # Note the index numeration is inverted to Table 3.1.2.1-2.
        # Format of quality indicators, because we skip the big
        # endian uint conversion and directly read the bitstream
        FATAL_FLAG = 0  # Data should not be used for product generation
        CALIBRATION = 4  # Insufficient data for calibration
        NO_EARTH_LOCATION = 5  # Earth location data not available
        # need a contiguous array for memory view
        quality_indicators = np.ascontiguousarray(
            self.scans['quality_indicators'])
        shape = (
            quality_indicators.nbytes//quality_indicators.itemsize,
            BITS_PER_BYTE*quality_indicators.itemsize
        )
        bits = np.unpackbits(
            quality_indicators.view(dtype=np.ubyte)
        ).reshape(shape).astype(bool)
        subset = [FATAL_FLAG, CALIBRATION, NO_EARTH_LOCATION]
        mask = bits[:, subset].any(axis=1)
        return mask
The index locations of the quality indicator bits could also be stored in a dictionary in …
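To make the inverted bit numbering concrete, here is a small standalone sketch. It assumes 32-bit big-endian quality words (as the `>u4` dtype below states) and invented sample values; the dictionary name `FLAG_POSITIONS` and the function `corrupt_mask` are illustrative, not pygac names. Position 0 in the unpacked stream is the most significant bit, so it corresponds to bit 31 when the word is read as an integer.

```python
import numpy as np

BITS_PER_BYTE = 8
# Positions in the unpacked bit stream (0 = most significant bit).
FLAG_POSITIONS = {"fatal_flag": 0, "calibration": 4, "no_earth_location": 5}


def corrupt_mask(quality_indicators):
    """Return True for every scanline with one of the flags set."""
    qi = np.ascontiguousarray(quality_indicators)
    nbits = BITS_PER_BYTE * qi.itemsize
    bits = np.unpackbits(qi.view(np.ubyte)).reshape(qi.size, nbits)
    return bits[:, list(FLAG_POSITIONS.values())].astype(bool).any(axis=1)


# Invented sample words: clean, fatal, calibration, no location, other bit.
quality = np.array(
    [0x00000000, 0x80000000, 0x08000000, 0x04000000, 0x00000001],
    dtype=">u4",
)
mask = corrupt_mask(quality)

# Cross-check with ordinary integer bit masks (bit 31 - position).
word_bits = 32
bitmask = sum(1 << (word_bits - 1 - pos) for pos in FLAG_POSITIONS.values())
mask_from_and = (quality.astype(np.uint32) & np.uint32(bitmask)) != 0
```

Both approaches flag the second, third and fourth sample words; the integer-mask variant avoids unpacking all 32 bits when only a few flags are of interest.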
Nice, thanks for clarifying and immediately fixing it! If the …
I had a look into the interpolator and would recommend adding an optional argument for a mask to the interpolation functions. This argument should be passed to the …
    def lat_lon_interpolator(lons_subset, lats_subset, cols_subset, cols_full, mask=None):
        """Interpolate lat-lon values in the AVHRR data."""
        lines = lats_subset.shape[0]
        rows_subset = np.arange(lines)
        if mask is not None:
            rows_subset = rows_subset[mask]
        rows_full = np.arange(lines)
        along_track_order = 1
        cross_track_order = 3
        satint = gtp.SatelliteInterpolator((lons_subset, lats_subset),
                                           (rows_subset, cols_subset),
                                           (rows_full, cols_full),
                                           along_track_order,
                                           cross_track_order)
        return satint.interpolate()
What do you think? This time, I may wait before implementing...
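The idea of using only trusted scanlines as tie points can be illustrated without the geotiepoints interpolator. The sketch below swaps in plain `np.interp` along the track direction, so it shows the row-masking principle only, not the proposed function. It assumes the mask marks valid rows (True means usable), which is the opposite sense of a corrupt-line mask, and the sample latitudes are invented.

```python
import numpy as np


def fill_masked_rows(values, valid):
    """Linearly interpolate along-track values using only valid rows."""
    rows_full = np.arange(values.shape[0])
    rows_subset = rows_full[valid]  # tie-point rows, as in the proposal
    return np.interp(rows_full, rows_subset, values[valid])


# Invented example: the third scanline carries a fill value.
lats = np.array([10.0, 11.0, -999.0, 13.0, 14.0])
valid = np.array([True, True, False, True, True])
filled = fill_masked_rows(lats, valid)
```

Here the bad third line is replaced by the value interpolated from its valid neighbours, while the valid lines are returned unchanged.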
I will revert to 487660d and keep the commits for another PR.
Yes, can we take this in another PR?
I'd like to stop the addition of features on this one for now, so that we can make a final test and merge if possible.
As announced earlier, I reverted the bit unpacking commits and updated the author lists.
@carloshorn ok, thanks.
@mraspaud PR works ok for my testcase.
Testing now
So, a couple of files are longer than before. Was the consensus to use these extra scanlines? Apart from that, I see some differences in two files.
See: https://public.cmsaf.dwd.de/data/sfinkens/pygac_l1c_factory
@sfinkens aren't these just extra lines?
@mraspaud I don't think so, both orbits have the same length and if you look at …
Ah, but now I remember. This was because the extra lines are used for interpolation, right?
Yes, that's probably the case.
So it looks like everyone is good with this, I'll be merging shortly. @carloshorn thank you very much for your sustained efforts and patience! That was an epic PR :)
Yeah, big thumbs up @carloshorn!
Nice, thank you all for the reviews.
This PR should close #54 by introducing the new module l1c_factory. This factory is used by the command line script pygac-run, which is now also able to process directories and tar archives.
The critical point is the change in the Reader.read method, which now takes an open file object as input. This change could lead to backwards incompatibility if someone uses this method in a custom script. If you see any risk, I could use utils.file_opener (where open files are passed through) again in the read method.
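One reason a file-object interface fits tar archives well is that the standard library can hand out archive members as file objects without unpacking them to disk. The following sketch uses only `tarfile` and `io`; the helper names and member names are made up for illustration, and it does not reproduce how pygac-run actually walks archives.

```python
import io
import tarfile


def iter_tar_members(tar_fileobj):
    """Yield (name, file object) for each regular file in a tar archive."""
    with tarfile.open(fileobj=tar_fileobj, mode="r") as archive:
        for member in archive:
            if member.isfile():
                # extractfile returns a readable binary file object
                yield member.name, archive.extractfile(member)


def build_demo_archive():
    """Create a small in-memory tar archive with two invented members."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in [("orbit_a.gac", b"AAAA"), ("orbit_b.gac", b"BBBB")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


contents = {
    name: fileobj.read()
    for name, fileobj in iter_tar_members(build_demo_archive())
}
```

Each yielded file object could be passed to a reader that accepts open files, which is the use case the changed read method is meant to support.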