Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve readability of quality indicators bit unpacking #60

Closed
carloshorn opened this issue May 11, 2020 · 14 comments · Fixed by #72
Closed

Improve readability of quality indicators bit unpacking #60

carloshorn opened this issue May 11, 2020 · 14 comments · Fixed by #72

Comments

@carloshorn
Copy link
Collaborator

The current reader access to the quality indicators bit field, e.g.

pygac/pygac/pod_reader.py

Lines 482 to 487 in 2d7fbaf

def _get_corrupt_mask(self):
"""Get mask for corrupt scanlines."""
mask = ((self.scans["quality_indicators"] >> 31) |
((self.scans["quality_indicators"] << 4) >> 31) |
((self.scans["quality_indicators"] << 5) >> 31))
return mask.astype(bool)
should give more details in the code about the actual meaning of these bits.
See #55 (comment) for an implementation proposal.

@sfinkens
Copy link
Member

I agree that the current implementation is not well documented, but I still find it easier to read than the numpy magic involved in the proposed alternative. I'd be ok with just adding a nice docstring to the current method. Any other opinions?

@abhaydd
Copy link
Collaborator

abhaydd commented May 11, 2020

@carloshorn @sfinkens I agree with Stephan. We can just update the documentation about it. I saw what you suggested @carloshorn and I have to admit, it felt like a numpy magic trick to me as well ;-) But again, it is a matter of getting used to it, I guess.

@abhaydd
Copy link
Collaborator

abhaydd commented May 11, 2020

The thing is that when the developer sees the bitwise shifting in its current implementation, it is easy to relate it to the POD/KLM documentation. @carloshorn Do you see any reason the current implementation might break under certain circumstances? If so, then we should definitely fix it.

@sfinkens
Copy link
Member

I find the magic trick impressive, though 🙂

@carloshorn
Copy link
Collaborator Author

I prefer self explanatory code which does not require a relation to a documentation by giving all information through object names and comment lines if necessary. When I saw the selected example code for the first time, I was scared by the bit shifting and hard coded numbers. The second look revealed that it was the bit swiping for a 32 bit integer, but I would not say that this is obvious to everybody.
I agree that the numpy calls look again like magic, because the function np.unpackbits takes a contiguous uint8 array which requires some additional calls...
Another possible implementation would be a bit access function doing the shifting with an underlying dictionary that relates each bit to the corresponding name. Instead of the integer shifting, the code reader would only see a call like self.get_quality_indicator("FATAL_FLAG").
How about that?

@mraspaud
Copy link
Member

mraspaud commented May 11, 2020

Regarding unpackbits: the problem I see is that it will eat up 8 times more memory than the packed array, so I'd rather keep the bitshifting logic.
However, I agree that adding explicit values and comments for which flags we use is good.
I would even like to go as far as to allow passing arbitrary flags (with reasonable defaults) eg (not tested):

from enum import Enum
class QFlags(Enum):
    FATAL_FLAG = 0  # Data should not be used for product generation
    CALIBRATION = 4  # Insufficient data for calibration
    NO_EARTH_LOCATION = 5  # Earth location data not available

def _get_corrupt_mask(self, flags=[QFlags.FATAL_FLAG, QFlags.CALIBRATION, QFlags.NO_EARTH_LOCATION]): 
    """Get mask for corrupt scanlines.""" 
    mask = None
    for flag in flags:
        if mask is None:
            mask = (self.scans['quality_indicators'] << flag) >> 31
        else:
            mask |= (self.scans['quality_indicators'] << flag) >> 31
     return mask.astype(bool)

@mraspaud
Copy link
Member

Ah, I didn't see your last answer @carloshorn before I posted my answer. But it looks like we are thinking alike in the end :)

@mraspaud mraspaud reopened this May 11, 2020
@abhaydd
Copy link
Collaborator

abhaydd commented May 11, 2020

@carloshorn I see your good point about self-explanatory code. Makes sense. I like your last suggestion. Thanks.

@sfinkens
Copy link
Member

Sounds good!

@mraspaud
Copy link
Member

I actually though of a more efficient implementation last night:

from enum import IntFlag
class QFlag(IntFlag):
    FATAL_FLAG = 2**0  # Data should not be used for product generation
    CALIBRATION = 2**4  # Insufficient data for calibration
    NO_EARTH_LOCATION = 2**5  # Earth location data not available

def _get_corrupt_mask(self, flags=QFlag.FATAL_FLAG | QFlag.CALIBRATION | QFlag.NO_EARTH_LOCATION): 
    """Get mask for corrupt scanlines.

    *flags* is the ORed bitmask that defines corrupt values, by default it is:
        QFlag.FATAL_FLAG | QFlag.CALIBRATION | QFlag.NO_EARTH_LOCATION
    """ 
    return (self.scans['quality_indicators'] & flags.value).astype(bool)

This is more efficient because we perform only one boolean operation with the array.
But is this too magic ?

@abhaydd
Copy link
Collaborator

abhaydd commented May 13, 2020

That looks good as well. Thanks.

@carloshorn
Copy link
Collaborator Author

I like the use of IntFlag for the pod/klm specific bit mapping and I appreciate the re-usability of the method. A good candidate would be:

pygac/pygac/pod_reader.py

Lines 489 to 499 in 2d7fbaf

def get_qual_flags(self):
"""Read quality flags."""
number_of_scans = self.scans["telemetry"].shape[0]
qual_flags = np.zeros((int(number_of_scans), 7))
qual_flags[:, 0] = self.scans["scan_line_number"]
qual_flags[:, 1] = (self.scans["quality_indicators"] >> 31)
qual_flags[:, 2] = ((self.scans["quality_indicators"] << 4) >> 31)
qual_flags[:, 3] = ((self.scans["quality_indicators"] << 5) >> 31)
qual_flags[:, 4] = ((self.scans["quality_indicators"] << 13) >> 31)
qual_flags[:, 5] = ((self.scans["quality_indicators"] << 14) >> 31)
qual_flags[:, 6] = ((self.scans["quality_indicators"] << 15) >> 31)

To me it appears natural to move the generalised method to the parent class (Reader) and set QFlag as attribute in the children.

def _get_corrupt_mask(self, flags=None): 
    """Get mask for corrupt scanlines.

    *flags* is the ORed bitmask that defines corrupt values, by default it is:
        QFlag.FATAL_FLAG | QFlag.CALIBRATION | QFlag.NO_EARTH_LOCATION
    """ 
    if flags is None:
        flags = self.QFlag.FATAL_FLAG | self.QFlag.CALIBRATION | self.QFlag.NO_EARTH_LOCATION
    return (self.scans['quality_indicators'] & flags.value).astype(bool)

This is more efficient because we perform only one boolean operation with the array.

I agree, in detail, the efficiency results from not allocating "throwaway"-arrays.

@sfinkens
Copy link
Member

Excellent!

@mraspaud
Copy link
Member

Sounds good @carloshorn!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants