Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproject before or after converting to vector #5

Closed
jordancaraballo opened this issue Feb 4, 2025 · 8 comments
Closed

Reproject before or after converting to vector #5

jordancaraballo opened this issue Feb 4, 2025 · 8 comments

Comments

@jordancaraballo
Copy link
Collaborator

The function to get coords is listed below. If we reproject the raster, then that will load the raster to memory (which for large rasters can be incredibly big running the node out of memory). However, we do have the option of reprojecting the vector. So for example, 1) read raster, 2) get bounds, and generate vector, 3) reproject that vector to 4326 then.

If these two reprojections are the same, which I think they are, I would do it in that order.

What do you all think?

def _getCoords(file):
    """
    Extract coordinates from a raster file and convert
    them to EPSG:4326 projection.

    Args:
        file (str): Path to the raster file.

    Returns:
        list: A list of coordinate pairs [longitude, latitude]
        defining the bounding box of the raster.

    Note: This function assumes the input file is a valid
    raster file readable by rioxarray.
    """
    # Open the raster file
    raster = rxr.open_rasterio(file)

    # Reproject the raster to EPSG:4326 (WGS84) coordinate system
    raster_proj = raster.rio.reproject("EPSG:4326")

    # Create a bounding box from the raster's extent
    poly = box(*raster_proj.rio.bounds())

    # Extract the coordinates of the bounding box
    # Convert each coordinate pair to [longitude, latitude] format
    coords = [[i[0], i[1]] for i in list(poly.exterior.coords)]

    return coords
@jordancaraballo
Copy link
Collaborator Author

@jli-99 @gtamkin @pahbs

@jordancaraballo
Copy link
Collaborator Author

Testing this theory with a 57GB WorldView 0.5 m raster.

PYTHONPATH=/explore/nobackup/people/jacaraba/development python view/ccdc_cli.py --gee_config /explore/nobackup/projects/ilab/gee/gee_config.json --footprint_file /explore/nobackup/projects/3sl/data/VHR/CAS/M1BS_pansharp/WV02_20211229_M1BS_10300100CB91C300-toa-sharpened.tif --output_path /explore/nobackup/projects/ilab/scratch/vhr-toolkit/ccdc-test

@jordancaraballo
Copy link
Collaborator Author

The code would change to:

from shapely.geometry import box
from shapely.ops import transform
from pyproj import Transformer, CRS

def _getCoords(file):
    """
    Extract coordinates from a raster file and convert
    them to EPSG:4326 projection.

    Args:
        file (str): Path to the raster file.

    Returns:
        list: A list of coordinate pairs [longitude, latitude]
        defining the bounding box of the raster.

    Note: This function assumes the input file is a valid
    raster file readable by rioxarray.
    """
    # Open the raster file
    raster = rxr.open_rasterio(file)

    # Create a bounding box from the raster's extent
    poly = box(*raster.rio.bounds())
    
    # Define the source and target CRS
    source_crs = CRS.from_epsg(CRS.from_epsg(raster.rio.crs.to_epsg()))  # WGS 84
    target_crs = CRS.from_epsg(4326)  # Web Mercator

    # Create a transformer
    transformer = Transformer.from_crs(source_crs, target_crs, always_xy=True)

    # Reproject the raster to EPSG:4326 (WGS84) coordinate system
    poly_reproj = transform(transformer.transform, poly)

    # Extract the coordinates of the bounding box
    # Convert each coordinate pair to [longitude, latitude] format
    coords = [[i[0], i[1]] for i in list(poly_reproj.exterior.coords)]

    return coords

@jordancaraballo
Copy link
Collaborator Author

jordancaraballo commented Feb 4, 2025

This saves an hour of compute time.

From 1.1 hour to 10 seconds.

@jordancaraballo
Copy link
Collaborator Author

After the comparison, files are identical. Going ahead and making the change for the release.

Singularity> cmp -s /explore/nobackup/projects/ilab/scratch/vhr-toolkit/ccdc-test/WV02_20211229_M1BS_10300100CB91C300-toa-sharpened_ccdc.tiff /explore/nobackup/projects/ilab/scratch/vhr-toolkit/ccdc-test/WV02_20211229_M1BS_10300100CB91C300-toa-sharpened_ccdc-reproject-raster.tiff && echo "Files are identical" || echo "Files are different"
Files are identical

@pahbs
Copy link

pahbs commented Feb 5, 2025 via email

jordancaraballo added a commit that referenced this issue Feb 5, 2025
@jordancaraballo
Copy link
Collaborator Author

Went ahead and did the hotfix.

To answer Paul questions: 1) total bounds of the strip, 2) no need to coarsen for CCDC, we just need the full strip to go from X EPSG to 4326 EPSG.

The solution was to generate a polygon of the strip coords, then reproject the polygon. Coords are the same and the process takes seconds.

@jordancaraballo
Copy link
Collaborator Author

Closing this issues since it is fixed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants