Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set default of mask_and_scale to True in open_rasterio #495

Open
gcaria opened this issue Apr 2, 2022 · 6 comments
Open

Set default of mask_and_scale to True in open_rasterio #495

gcaria opened this issue Apr 2, 2022 · 6 comments
Labels
proposal Idea for a new feature.

Comments

@gcaria
Copy link

gcaria commented Apr 2, 2022

In xarray.open_dataset, the optional arguments mask_and_scale defaults to True (doc), which is what I expect almost any user prefers.

I was surprised to find out that in rioxarray.open_rasterio the same argument defaults to False.

@gcaria gcaria added the proposal Idea for a new feature. label Apr 2, 2022
@gcaria gcaria changed the title Set default of mask_and_scale to True Set default of mask_and_scale to True in open_rasterio Apr 2, 2022
@snowman2
Copy link
Member

snowman2 commented Apr 4, 2022

It is that way because the mask_and_scale option doesn't exist in xarray.open_rasterio. Leaving it defaulted to False makes it less surprising for users who are transitioning to rioxarray.open_rasterio.

If the option is to change in the future, first a release of rioxarray with a runtime warning for users who are not changing mask_and_scale that the default behavior will change needs to be made.

@snowman2
Copy link
Member

snowman2 commented Apr 4, 2022

Related: #69 & #281

@gcaria
Copy link
Author

gcaria commented Apr 6, 2022

Thanks for the explanation! It looks like there are two contrasting needs here:

  1. ease of use for users transitioning from xarray.open_rasterio
  2. ease of use for users that are used to xarray.open_dataset

I guess the name of the function warrants priority to 1), and actually this is more of a xarray issue (the fact that theopen_rasterio and open_dataset have opposite defaults for mask_and_scale).

Probably a more logical way of working for people in the 2) category (like myself) is to use xarray.open_dataset(..., engine='rasterio'), is mask_and_scale=True by default in this case?

@snowman2
Copy link
Member

snowman2 commented Apr 6, 2022

Probably a more logical way of working for people in the 2) category (like myself) is to use xarray.open_dataset(..., engine='rasterio'), ismask_and_scale=True by default in this case?

Yes, it is.

@raphaelquast
Copy link

raphaelquast commented Apr 13, 2022

... not sure if this is the right place to mention this but since I just stumbled upon this issue while trying to avoid memory-overloads from unnecessary float-conversions while reading large GeoTIFFs, I thought I add a comment here.

It seems that at the moment using mask_and_scale=True always converts the data to float (even if add_offset=0. and scale_factor=1. )

import xarray as xar
with xar.open_dataset("path_to_GeoTIFF.tif") as raster:
    print(raster.band_data.dtype)  
    # >>> dtype('float32')
    print(raster.band_data.encoding)
    # >>> {'dtype': 'int16',
    # >>>  'scale_factor': 1.0,
    # >>>  'add_offset': 0.0,
    # >>>  '_FillValue': -9999.0,
    # >>>  'grid_mapping': 'spatial_ref',
    # >>>  'rasterio_dtype': 'int16'}

Is this intentional? If a dataset is actually not encoded, I'd prefer to avoid unnecessary float-conversion as much as possible...

@snowman2
Copy link
Member

It seems that at the moment using mask_and_scale=True always converts the data to float (even if add_offset=0. and scale_factor=1. )

Yes, this is required. It is due to the mask part. xarray uses NaN to represent nodata when masked. This requires the dtype to be a float.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal Idea for a new feature.
Projects
None yet
Development

No branches or pull requests

3 participants