Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particle IDs are converted to floats although saved as unsigned integers #517

Closed
arkordt opened this issue Sep 19, 2024 · 2 comments
Closed

Comments

@arkordt
Copy link

arkordt commented Sep 19, 2024

  • unyt version: 3.0.2
  • Python version: 3.12.6
  • Operating System: macOS

Description

I was working with an arepo snapshot and noticed that the ParticleIDs field is of type float64 although it is saved as uint64 in the hdf5 file. Originally suspecting a data type conversion in yt, it turns out that the conversion is done in unyt. Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables.

In the current yt main branch version (66ddd0eb1), the unit conversion occurs in yt/data_objects/selection_objects/data_selection_objects.py:217. It calls convert_to_unit on a YTArray and in unyt/array.py:712, there is an explicit conversion of all integer-like types to float32 or float64 (depending on the number of particles). There is an additional conversion of all values to float64 in yt (see here), too, but even commenting out the conversion part, IDs are returned as floats.

Although this bug can be traced back to unyt, I am not sure if it may be better to fix it in yt, e.g. by skipping the unit conversion for certain dimensionless quantities.

What I Did

I was using my own arepo snapshot but it is the same for the yt test data. Tracing back the location where the conversion occurs was done using pdb.

  • yt version: 4.3 or current main branch
>>> import yt
>>> ds = yt.load('yt-test-data/ArepoBullet/snapshot_150.hdf5')
>>> ds.all_data()['PartType0', 'ParticleIDs']
unyt_array([1710352., 1710346., 1710350., ..., 5721056., 5843308., 5843259.],
      dtype=float32, units='(dimensionless)')
@chrishavlin
Copy link
Contributor

Here's a simpler unyt-only illustration:

import unyt 
x = unyt.unyt_array([1, 2], "1", dtype='int')
print(x, x.dtype)
y = x.to("1")
print(y, y.dtype)
[1 2] dimensionless int64
[1. 2.] dimensionless float64

But this is actually the currently expected behavior (see https://unyt.readthedocs.io/en/stable/usage.html#dealing-with-data-types ).

Most physical quantities may be represented by floating point numbers but this is not appropriate for ID variables.

I think that this point (which I agree with) is more of a yt issue. Maybe particle IDs need to be handled with a special case since they are not physical quantities.

@neutrinoceros
Copy link
Member

Indeed I don't think there's anything to fix on unyt's side, but it should be addressed in yt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants