Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add script to apply ESA quality mask #15

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ceholden
Copy link
Collaborator

What I'm changing

This PR is intended to address this ticket, NASA-IMPACT/hls-sentinel#155

This PR replaces the original PR (#14) that I submitted from a fork because my forked branch wasn't triggering CI tests.

Specifically this is intended to replace a utility written in C to apply a binary quality mask to Sentinel-2 L1C data that masks lost or degraded MSI pixel values. The original PR was, NASA-IMPACT/hls-sentinel#152

This utility script is intended to fit into the rest of the HLS Sentinel-2 related processing. To apply the script we would need to do a few extra steps,

  1. Merge this PR and create a new release in Github to point to (e.g., hls-utilities@v1.10)
  2. Update the installation of hls-utilities in the hls-sentinel container to include this update (https://github.com/NASA-IMPACT/hls-sentinel/blob/20180e0d0cd286ffefe61bd6aeeaeff056874102/Dockerfile#L123)
  3. Update the hls-sentinel processing steps to include running this script as part of sentinel_granule.sh (e.g., around here https://github.com/NASA-IMPACT/hls-sentinel/blob/20180e0d0cd286ffefe61bd6aeeaeff056874102/scripts/sentinel_granule.sh#L44)
  4. Test
  5. Update the hls-sentinel image used in our production pipelines by tagging our new container appropriately (i.e., "latest")

How I did it

The ESA quality mask is an 8 band single bit image that encodes 0/1 for 8 quality related attributes. The bands we're interested in are bands 3 & 4 (counting from 1 in GDAL notation) which encode if the pixel retrieval was either lost or degraded.

I followed the original C code implementation with a few modifications...

  1. Finds the mask + image pairs
    • The original C code will exit 1 if not all of the images have been found, or would exit 0 if the granule doesn't have the expected image/mask pair (e.g., prior to baseline 04.00).
    • I wanted to ensure we don't hard fail if we can't apply the mask, and favored doing a "no-op" if the granule file structure isn't as we expect
  2. Reads the 3rd and 4th bands of the mask
    • In the original implementation the JPEG2000 is first converted to an ENVI file, a binary in "band sequential" (BSQ) format. The C code reads all bands for the entire image and fills the imagery band if EITHER mask band 3 or 4 have a value of 1 for a pixel row/column.
    • In our implementation we can use GDAL to only read the 3rd and 4th bands
  3. Updates the imagery in-place by assigning HLS_REFL_FILLVAL (-9999) to any pixel where the mask was TRUE for either band.
    • The -9999 value is not suitable for us because the original L1C images in JPEG2000 format use 0 as the nodata value (since -9999 is out of data type range)
    • The 0 value is used as a no data value for L1C imagery and this is noted in the metadata
    • I see in the hls-sentinel that some utilities modify in place while others create new data. If it'd be useful for a "debug mode" it would be easy to rewrite the images to new files

How you can test it

Install the test dependencies and run unit tests for this script,

pip install -e .[test]
pytest tests/test_apply_s2_quality_mask.py

The unit tests are what helped me discover that JPEG2000 driver will apply lossy compression when we update the file (r+ access mode) even though the original file had lossless compression in creation options 🙃

I ran this for one of the granules mentioned in the original C code PR and recorded a tiny demo,
2025-01-02 16 45 12
Note the "inspect tool" that shows the imagery values on the right hand side. The S2 L1C imagery don't have a No Data Value set, so QGIS shows the 0 as valid pixels. Still we can see the individual pixel values are correct (0 instead of ~1,000) and the image is stretched differently because of the different minimum values.

@ceholden ceholden changed the title Ceh/apply esa quality mask feat: add script to apply ESA quality mask Jan 14, 2025
@ceholden ceholden marked this pull request as ready for review January 14, 2025 16:42
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic!

Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying the type hints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants