Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[yt-4.0] ds bbox selection bug #2612

Closed
AshKelly opened this issue May 29, 2020 · 7 comments
Closed

[yt-4.0] ds bbox selection bug #2612

AshKelly opened this issue May 29, 2020 · 7 comments
Labels

Comments

@AshKelly
Copy link
Member

AshKelly commented May 29, 2020

Bug report

Bug summary

Building particle index with bounding box and then loading without leads to strange particle selection. After re-loading the particle count in a region is different

import numpy as np
import yt

n_ref = 32
fname = "../test_data/GadgetDiskGalaxy/snapshot_200.hdf5"

left, right = (
    np.array([28007.81291869, 22181.34092477, 28461.73994773]),
    np.array([33147.21291312, 33320.7409192 , 33601.13994216]))

# Test a GadgetDiskGalaxy snapshot
bbox = [
    [-200_000, 200_000],
    [-200_000, 200_000],
    [-200_000, 200_000],
]
ds = yt.load(fname, bounding_box=bbox)
cg = ds.arbitrary_grid(left, right, [1, 1, 1])
print(np.sum(cg[("deposit", "PartType1_count")]))

ds = yt.load(fname)
cg = ds.arbitrary_grid(left, right, [1, 1, 1])
print(np.sum(cg[("deposit", "PartType1_count")]))

Actual outcome

yt : [INFO     ] 2020-05-29 14:53:15,487 Calculating time from 3.448e-01 to be 1.108e+17 seconds
yt : [INFO     ] 2020-05-29 14:53:15,489 Assuming length units are in kpc/h (comoving)
yt : [INFO     ] 2020-05-29 14:53:15,636 Parameters: current_time              = 1.1075810732534835e+17 s
yt : [INFO     ] 2020-05-29 14:53:15,637 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2020-05-29 14:53:15,637 Parameters: domain_left_edge          = [-200000. -200000. -200000.]
yt : [INFO     ] 2020-05-29 14:53:15,638 Parameters: domain_right_edge         = [200000. 200000. 200000.]
yt : [INFO     ] 2020-05-29 14:53:15,638 Parameters: cosmological_simulation   = 1
yt : [INFO     ] 2020-05-29 14:53:15,638 Parameters: current_redshift          = 1.8999965286929705
yt : [INFO     ] 2020-05-29 14:53:15,639 Parameters: omega_lambda              = 0.721
yt : [INFO     ] 2020-05-29 14:53:15,639 Parameters: omega_matter              = 0.279
yt : [INFO     ] 2020-05-29 14:53:15,639 Parameters: omega_radiation           = 0.0
yt : [INFO     ] 2020-05-29 14:53:15,639 Parameters: hubble_constant           = 0.7
yt : [INFO     ] 2020-05-29 14:53:15,688 Allocating for 1.191e+07 particles
Initializing coarse index : 100%|████████████████████████████████████████████| 19/19 [00:02<00:00,  9.24it/s]
Initializing refined index: 100%|████████████████████████████████████████████| 19/19 [00:19<00:00,  0.98it/s]
1898397.0 dimensionless
yt : [INFO     ] 2020-05-29 14:53:40,510 Calculating time from 3.448e-01 to be 1.108e+17 seconds
yt : [INFO     ] 2020-05-29 14:53:40,512 Assuming length units are in kpc/h (comoving)
yt : [INFO     ] 2020-05-29 14:53:40,607 Parameters: current_time              = 1.1075810732534835e+17 s
yt : [INFO     ] 2020-05-29 14:53:40,607 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2020-05-29 14:53:40,607 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2020-05-29 14:53:40,607 Parameters: domain_right_edge         = [64000. 64000. 64000.]
yt : [INFO     ] 2020-05-29 14:53:40,608 Parameters: cosmological_simulation   = 1
yt : [INFO     ] 2020-05-29 14:53:40,608 Parameters: current_redshift          = 1.8999965286929705
yt : [INFO     ] 2020-05-29 14:53:40,608 Parameters: omega_lambda              = 0.721
yt : [INFO     ] 2020-05-29 14:53:40,608 Parameters: omega_matter              = 0.279
yt : [INFO     ] 2020-05-29 14:53:40,608 Parameters: omega_radiation           = 0.0
yt : [INFO     ] 2020-05-29 14:53:40,608 Parameters: hubble_constant           = 0.7
yt : [INFO     ] 2020-05-29 14:53:40,641 Allocating for 1.191e+07 particles
Loading particle index: 100%|██████████████████████████████████████████████| 19/19 [00:00<00:00, 2930.06it/s]
78949.0 dimensionless

The two numbers should be equal

@triage-new-issues triage-new-issues bot added the triage Triage needed label May 29, 2020
@munkm munkm added the bug label May 29, 2020
@triage-new-issues triage-new-issues bot removed the triage Triage needed label May 29, 2020
@AshKelly
Copy link
Member Author

Prior to this commit
https://github.com/AshKelly/yt/commit/68068dd0212b3cb5bad2fde085ddc8be4cff4a33#diff-1ae235e0f5c43b8e79463c30ab481aef

These numbers are equal. But I think that commit was correct and this is a different bug!

@AshKelly
Copy link
Member Author

I think the case might be that the particle index is bounding box specific and thus the bounding box could be incorporated into the hash, so it will gen a new particle index with loaded with a new bbox.

Otherwise, maybe the loading bounding box should be ignored during index creation and applied afterwards?

The particle index is not my area of expertise

@AshKelly
Copy link
Member Author

So, after the merge of #2618 the above script works. However, I don't think #2618 has completely solved the problem.

In the following test script I test the chunk loading with a bbox and without one and I select a specific range,

import numpy as np
import yt

n_ref = 32
fname = "../test_data/GadgetDiskGalaxy/snapshot_200.hdf5"

ptypes = ['PartType1']
left, right = (
    np.array([28007.81291869, 22181.34092477, 28461.73994773]),
    np.array([33147.21291312, 33320.7409192 , 33601.13994216]))

# Test a GadgetDiskGalaxy snapshot
bbox1 = [
    [left[0], right[0]],
    [left[1], right[1]],
    [left[2], right[2]],
]
bbox2 = None


for bbox in [bbox1, bbox2]:
    print("bbox:", bbox)
    ds = yt.load(fname, bounding_box=bbox)

    center = None
    _data_source = ds.region(center, left, right)

    positions = []
    for ptype in ptypes:
        field = (ptype, 'particle_position')
        for chunk in _data_source.chunks([field], "io"):
            positions.append(
                chunk[field].to("code_length").d
            )
    positions = np.concatenate(positions)
    print("Total num particles:", 4786616)
    print("Loaded:", positions.shape)

    mask = (
        (positions[:, 0] > left[0]) &
        (positions[:, 0] < right[0]) &
        (positions[:, 1] > left[1]) &
        (positions[:, 1] < right[1]) &
        (positions[:, 2] > left[2]) &
        (positions[:, 2] < right[2])
    )
    print("In region:", np.sum(mask))

which outputs:

bbox: [[28007.81291869, 33147.21291312], [22181.34092477, 33320.7409192], [28461.73994773, 33601.13994216]]
yt : [INFO     ] 2020-06-10 19:46:06,853 Calculating time from 3.448e-01 to be 1.108e+17 seconds
yt : [INFO     ] 2020-06-10 19:46:06,854 Assuming length units are in kpc/h (comoving)
yt : [INFO     ] 2020-06-10 19:46:06,926 Parameters: current_time              = 1.1075810732534835e+17 s
yt : [INFO     ] 2020-06-10 19:46:06,926 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2020-06-10 19:46:06,926 Parameters: domain_left_edge          = [28007.81291869 22181.34092477 28461.73994773]
yt : [INFO     ] 2020-06-10 19:46:06,926 Parameters: domain_right_edge         = [33147.21291312 33320.7409192  33601.13994216]
yt : [INFO     ] 2020-06-10 19:46:06,927 Parameters: cosmological_simulation   = 1
yt : [INFO     ] 2020-06-10 19:46:06,927 Parameters: current_redshift          = 1.8999965286929705
yt : [INFO     ] 2020-06-10 19:46:06,927 Parameters: omega_lambda              = 0.721
yt : [INFO     ] 2020-06-10 19:46:06,927 Parameters: omega_matter              = 0.279
yt : [INFO     ] 2020-06-10 19:46:06,927 Parameters: omega_radiation           = 0.0
yt : [INFO     ] 2020-06-10 19:46:06,927 Parameters: hubble_constant           = 0.7
yt : [INFO     ] 2020-06-10 19:46:06,944 Allocating for 1.191e+07 particles
Initializing coarse index : 100%|███████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 19.03it/s]
Initializing refined index: 100%|███████████████████████████████████████████████████████████████████████████████████████| 19/19 [01:28<00:00,  0.70it/s]
Total num particles: 4786616
Loaded: (4786616, 3)
In region: 1898397
bbox: None
yt : [INFO     ] 2020-06-10 19:47:40,514 Calculating time from 3.448e-01 to be 1.108e+17 seconds
yt : [INFO     ] 2020-06-10 19:47:40,515 Assuming length units are in kpc/h (comoving)
yt : [INFO     ] 2020-06-10 19:47:40,576 Parameters: current_time              = 1.1075810732534835e+17 s
yt : [INFO     ] 2020-06-10 19:47:40,576 Parameters: domain_dimensions         = [1 1 1]
yt : [INFO     ] 2020-06-10 19:47:40,576 Parameters: domain_left_edge          = [0. 0. 0.]
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: domain_right_edge         = [64000. 64000. 64000.]
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: cosmological_simulation   = 1
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: current_redshift          = 1.8999965286929705
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: omega_lambda              = 0.721
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: omega_matter              = 0.279
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: omega_radiation           = 0.0
yt : [INFO     ] 2020-06-10 19:47:40,577 Parameters: hubble_constant           = 0.7
yt : [INFO     ] 2020-06-10 19:47:40,601 Allocating for 1.191e+07 particles
Initializing coarse index : 100%|███████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 27.56it/s]
Initializing refined index: 100%|███████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:31<00:00,  1.31it/s]
Total num particles: 4786616
Loaded: (1898397, 3)
In region: 1898397

The summary is: yt correctly builds a particle index and instead of selecting zero particles, it now select every particle. But, if I remove the bounding box, then it only selects particles in the range I specified.

@AshKelly
Copy link
Member Author

You can see without a bbox the yt chunking only returns 1898397 particles, which is the correct amount in that region. However, with a bbox yt just returns every particle in the simulation.

This is why the covering_grid and octree work with the patch, because they get given every particle and discard the ones they don't need.

Do you have any ideas what is happening here @matthewturk @munkm - I'm lost!

@matthewturk
Copy link
Member

So I think that what might be happening is that there's an asymmetry in loading vs caching. Without the fix in #2624 , it's possible to load a cache if we have generated one before. So with a bbox it will use the existing EWAH index on disk.

@AshKelly
Copy link
Member Author

In that case- I don't think I appreciate the difference between loading and caching...

@neutrinoceros
Copy link
Member

Looks like we forgot to close this when we merged #2634
Please reopen if I'm misinterpreted the situation !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants