Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CHUNKSIZE into an index property #2658

Merged
merged 8 commits into from
Jan 22, 2021

Conversation

matthewturk
Copy link
Member

@matthewturk matthewturk commented Jun 20, 2020

Currently, the variable CHUNKSIZE lives at the top level of the particle_geometry_handler.py file. This should instead be an index attribute.

(Note that #2626 is a bit related to this.)

This also adds the ability to set it to 0 to indicate it's unlimited, although this may cause issues with some frontends; that needs a bit more testing.

This -- or something like it -- may be required to get the OWLSSubfindDataset class more easily used.

@matthewturk matthewturk added this to the 4.0 milestone Jun 20, 2020
@munkm munkm changed the base branch from yt-4.0 to master June 22, 2020 15:08
@matthewturk
Copy link
Member Author

@yt-fido test this please

@Xarthisius
Copy link
Member

@yt-fido test this please

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor suggestion: since the attribute is meant to accessed from outside the class, maybe it should either not have a leading underscore, or be accessed through a property. This second option is best if there's a reason to forbid the end user to change this attribute.

@matthewturk
Copy link
Member Author

well, it's kind of a "friend" attribute. So I was not completely sure what to do. I suppose it's OK to make it dunder or a property, though, to make sure, since changing it mid-stream can be problematic.

@neutrinoceros
Copy link
Member

well, it's kind of a "friend" attribute. So I was not completely sure what to do. I suppose it's OK to make it dunder or a property, though, to make sure, since changing it mid-stream can be problematic.

yes I think that's the closer it gets to defining friends in Python 👍

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome ! thanks for the refactor !

@matthewturk
Copy link
Member Author

I'm going to fix these conflicts and attempt again.

@matthewturk
Copy link
Member Author

/black

@neutrinoceros
Copy link
Member

Feel free to ping me when this is ready for a final review, I feel like it has already changed enough that I should go over it again at some point :-)

@matthewturk
Copy link
Member Author

I think it's ready

@neutrinoceros
Copy link
Member

Ok I'll go over it then.

This -- or something like it -- is required to get the OWLSSubfindDataset class working again, and to get #2437 in.

This is now a bit outdated I guess since the PR in question was in fact merged before this one. Could you explain why it is so and/or edit the description ?

@matthewturk
Copy link
Member Author

I actually can't remember why I thought it was necessary. I still think it's good though.

Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question before I read everything in detail: wouldn't it be worth having the chunk_size property defined in BaseIOHandler as

@property
def chunk_size(self):
    raise NotImplementedError("something something")

so as to keep uniformity in all subclasses ?

I also don't understand why it need to be defined in ParticleIndex as well, which looks unrelated to BaseIOHandler. Is there a way to define this set property in one base class only and make the other one get the value from there ?

@neutrinoceros
Copy link
Member

Looks like you're doing just that in some places already, e.g.

ind += self.ds.index.chunksize

So I think a clean pattern would be

# in ParticleIndex
@property
def chunk_size(self):
    raise NotImplementedError("blabla")

# in BaseIOHandler
@property
def chunk_size(self):
    return self.ds.index.chunk_size

and of course have the appropriate definitions in supported ParticleIndex's child classes

@neutrinoceros
Copy link
Member

Upon further reflection maybe it's even better to make ParticleIndex an abstract class instead of throwing a NotImplementedError

Base automatically changed from master to main January 20, 2021 15:27
@Xarthisius
Copy link
Member

@yt-fido test this please

@brittonsmith
Copy link
Member

brittonsmith commented Jan 21, 2021

@yt-fido test this please

Oops, didn't see this was just done.

@neutrinoceros
Copy link
Member

I take a bet that the errors on Jenkins are not reproducible.
@yt-fido test this please.

@Xarthisius
Copy link
Member

I take a bet that the errors on Jenkins are not reproducible.
@yt-fido test this please.

Did you actually try? I had no problem reproducing them...

xarth@shakuras ~/codes/xarthisius/yt (index_chunksize) $ python3.8 doc/source/cookbook/particle_filter.py
yt : [INFO     ] 2021-01-21 14:48:52,674 Parameters: current_time              = 20.100000000000044
yt : [INFO     ] 2021-01-21 14:48:52,679 Allocating for 3.154e+05 particles
Traceback (most recent call last):
  File "doc/source/cookbook/particle_filter.py", line 47, in <module>
    ds = yt.load(filename)
  File "/home/xarth/codes/xarthisius/yt/yt/loaders.py", line 95, in load
    return candidates[0](fn, *args, **kwargs)
  File "/home/xarth/codes/xarthisius/yt/yt/frontends/tipsy/data_structures.py", line 103, in __init__
    super(TipsyDataset, self).__init__(
  File "/home/xarth/codes/xarthisius/yt/yt/frontends/sph/data_structures.py", line 34, in __init__
    super(SPHDataset, self).__init__(
  File "/home/xarth/codes/xarthisius/yt/yt/data_objects/static_output.py", line 1873, in __init__
    super(ParticleDataset, self).__init__(
  File "/home/xarth/codes/xarthisius/yt/yt/data_objects/static_output.py", line 263, in __init__
    self._set_derived_attrs()
  File "/home/xarth/codes/xarthisius/yt/yt/frontends/tipsy/data_structures.py", line 231, in _set_derived_attrs
    self.index
  File "/home/xarth/codes/xarthisius/yt/yt/data_objects/static_output.py", line 509, in index
    self._instantiated_index = self._index_class(
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 24, in __init__
    super(ParticleIndex, self).__init__(ds, dataset_type)
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/geometry_handler.py", line 44, in __init__
    self._detect_output_fields()
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 269, in _detect_output_fields
    pcounts = self._get_particle_type_counts()
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 38, in _get_particle_type_counts
    for df in self.data_files:
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 58, in data_files
    self._setup_filenames()
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 90, in _setup_filenames
    df = cls(self.dataset, self.io, template % {"num": i}, fi, (start, end))
  File "/home/xarth/codes/xarthisius/yt/yt/frontends/tipsy/data_structures.py", line 23, in __init__
    io._update_domain(self)
  File "/home/xarth/codes/xarthisius/yt/yt/frontends/tipsy/io.py", line 254, in _update_domain
    c = min(self.ds.index.chunksize, stop - ind)
  File "/home/xarth/codes/xarthisius/yt/yt/data_objects/static_output.py", line 509, in index
    self._instantiated_index = self._index_class(
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 25, in __init__
    self._initialize_index()
  File "/home/xarth/codes/xarthisius/yt/yt/frontends/sph/data_structures.py", line 89, in _initialize_index
    super(SPHParticleIndex, self)._initialize_index()
  File "/home/xarth/codes/xarthisius/yt/yt/geometry/particle_geometry_handler.py", line 155, in _initialize_index
    self.regions = ParticleBitmap(
  File "yt/geometry/particle_oct_container.pyx", line 477, in yt.geometry.particle_oct_container.ParticleBitmap.__init__
    self.left_edge[i] = left_edge[i]
TypeError: 'int' object is not subscriptable

@matthewturk
Copy link
Member Author

matthewturk commented Jan 21, 2021 via email

@matthewturk
Copy link
Member Author

Oddly enough on my branch (merged with main) this is not the specific traceback I get using load_sample! I suspect there's something with the kwargs.

@matthewturk
Copy link
Member Author

When run without load_sample, the bounds are not passed in to the constructor. The difference ends up being whether it searches for the left/right edges or not, and where the error ends up showing up as a result. In both cases, I believe, the left_edge is set somewhere as 0 (not as an array) and this propagates and causes the problem.

@matthewturk
Copy link
Member Author

OK, I tracked down the issue. Having chunksize hang off the index is problematic in some cases, because we need to call update_domain. But when that is called, the index is not ready yet.

I'm going to modify the code to use a hardcoded chunksize for the tipsy update_domain section.

@brittonsmith brittonsmith merged commit aeb4947 into yt-project:main Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants