
ComplexWarning: Casting complex values to real discards the imaginary part #4655

Closed
SebastienDorgan opened this issue Dec 6, 2020 · 2 comments · Fixed by #7671

Comments

@SebastienDorgan

SebastienDorgan commented Dec 6, 2020

xarray version 0.16.2 / Python 3.8.5

When reading a dataset containing complex variables with `xarray.open_zarr`, the following warning appears:
/home/.../python3.8/site-packages/xarray/coding/variables.py:218: ComplexWarning: Casting complex values to real discards the imaginary part
and the imaginary part is indeed discarded, which is not what I expected.
After a slightly more in-depth analysis, I came across this function (xarray/coding/variables.py:226):

```python
def _choose_float_dtype(dtype, has_offset):
    """Return a float dtype that can losslessly represent `dtype` values."""
    # Keep float32 as-is.  Upcast half-precision to single-precision,
    # because float16 is "intended for storage but not computation"
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    # float32 can exactly represent all integers up to 24 bits
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        # A scale factor is entirely safe (vanishing into the mantissa),
        # but a large integer offset could lead to loss of precision.
        # Sensitivity analysis can be tricky, so we just use a float64
        # if there's any offset at all - better unoptimised than wrong!
        if not has_offset:
            return np.float32
    # For all other types and circumstances, we just use float64.
    # (safe because eg. complex numbers are not supported in NetCDF)
    return np.float64
```
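To make the reported behavior concrete, here is a minimal NumPy-only reproduction. `choose_float_dtype` is a local copy of the function quoted above (not an import from xarray), and the `astype` call is assumed to be equivalent to the cast performed during decoding at variables.py:218; that equivalence is an assumption, not something confirmed in this thread.

```python
import warnings

import numpy as np


def choose_float_dtype(dtype, has_offset):
    """Local copy of the dtype selection quoted above (xarray 0.16.2)."""
    if dtype.itemsize <= 4 and np.issubdtype(dtype, np.floating):
        return np.float32
    if dtype.itemsize <= 2 and np.issubdtype(dtype, np.integer):
        if not has_offset:
            return np.float32
    # Everything else, including complex dtypes, falls through to float64.
    return np.float64


# complex64 is neither floating nor integer, so it ends up as float64.
stored = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
target = choose_float_dtype(stored.dtype, has_offset=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Assumed stand-in for the decoding cast: complex -> float64.
    decoded = stored.astype(target)

print(target.__name__)  # float64
print(decoded)  # [1. 3.]
print([w.category.__name__ for w in caught])
```

The category name is checked as a string because `ComplexWarning` lives in different NumPy namespaces depending on the NumPy version.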

This behavior seems strange to me: I would find it more natural to keep the stored dtype rather than systematically converting to a float.
To test this, I modified the `decode` method (xarray/coding/variables.py:265):

```python
def decode(self, variable, name=None):
    dims, data, attrs, encoding = unpack_for_decoding(variable)

    if "scale_factor" in attrs or "add_offset" in attrs:
        scale_factor = pop_to(attrs, encoding, "scale_factor", name=name)
        add_offset = pop_to(attrs, encoding, "add_offset", name=name)
        # my change: keep the stored dtype instead of forcing a float
        # dtype = _choose_float_dtype(data.dtype, "add_offset" in attrs)
        dtype = data.dtype
        if np.ndim(scale_factor) > 0:
            scale_factor = scale_factor.item()
        if np.ndim(add_offset) > 0:
            add_offset = add_offset.item()
        transform = partial(
            _scale_offset_decoding,
            scale_factor=scale_factor,
            add_offset=add_offset,
            dtype=dtype,
        )
        data = lazy_elemwise_func(data, transform, dtype)

    return Variable(dims, data, attrs, encoding)
```

and it works as I expected.
If there is a good reason to keep things as they are, could you explain how to handle complex data without creating a new variable?
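The patch above keeps `data.dtype` for every variable. A narrower variant, sketched below as a hypothetical standalone function (not xarray's API), keeps the stored dtype only for complex data and still converts real data to float. For simplicity it always uses float64 for real inputs rather than reproducing `_choose_float_dtype`.

```python
import numpy as np


def decode_scale_offset(data, scale_factor=None, add_offset=None):
    """Hypothetical decoder that keeps complex dtypes instead of forcing float."""
    data = np.asarray(data)
    if np.issubdtype(data.dtype, np.complexfloating):
        # Keep the stored complex dtype so the imaginary part survives.
        dtype = data.dtype
    else:
        # Simplification: real (e.g. packed integer) data always becomes float64.
        dtype = np.float64
    out = np.array(data, dtype=dtype, copy=True)
    if scale_factor is not None:
        out *= scale_factor
    if add_offset is not None:
        out += add_offset
    return out


# Identity attributes on complex data: values are preserved.
stored = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
decoded = decode_scale_offset(stored, scale_factor=1, add_offset=0)
print(decoded.dtype)  # complex64

# Packed integers still get a float result.
packed = np.array([100, 250], dtype=np.int16)
unpacked = decode_scale_offset(packed, scale_factor=0.01)
print(unpacked.dtype)  # float64
```

Restricting the change to complex dtypes matters for packed integers: if the scaling is done in place, as in this sketch, keeping `int16` and multiplying by `0.01` would fail, because NumPy refuses to write a float result back into an integer array under its default `same_kind` casting rule.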

Thank you for your great work; xarray is awesome.

@SebastienDorgan
Author

SebastienDorgan commented Dec 6, 2020

I finally found the source of my problem.
The data I read from the zarr store originally came from TIFF files that I opened with the `open_rasterio` function. It seems that `open_rasterio` systematically adds the attributes `scale_factor=1` and `add_offset=0`. After removing them, everything works as expected, so the problem appears to come from `open_rasterio`.
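Since the identity attributes were the actual trigger, one workaround is to strip them before writing to zarr. The helper below is hypothetical (it is not part of xarray or rioxarray) and operates on a plain `attrs` dict, assuming the attribute values are scalars or array-likes; the `units` entry is illustrative only.

```python
import numpy as np


def _equals(value, target):
    """True if value is present and every element equals target."""
    return value is not None and bool(np.all(np.asarray(value) == target))


def drop_identity_scale_offset(attrs):
    """Hypothetical helper: copy attrs without no-op scale/offset entries.

    Only scale_factor == 1 and add_offset == 0 are removed, so variables
    that are genuinely packed keep their encoding attributes.
    """
    cleaned = dict(attrs)
    if _equals(cleaned.get("scale_factor"), 1):
        del cleaned["scale_factor"]
    if _equals(cleaned.get("add_offset"), 0):
        del cleaned["add_offset"]
    return cleaned


attrs = {"scale_factor": 1.0, "add_offset": 0.0, "units": "dB"}
print(drop_identity_scale_offset(attrs))  # {'units': 'dB'}
```

With xarray, a helper like this could be applied to each variable's `attrs` before calling `to_zarr`. For data already written to a zarr store, `xarray.open_zarr(..., mask_and_scale=False)` skips scale/offset decoding entirely, at the cost of also disabling `_FillValue` masking.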

@dcherian
Contributor

We've deleted the internal rasterio backend in favor of rioxarray. If this issue is still relevant, please migrate the discussion to the rioxarray repo.
