-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AHF halo ID fix #88
AHF halo ID fix #88
Changes from 17 commits
0f26bfd
37e4626
3179077
ee51d3a
c29c405
4502ee1
134b570
5299a33
92fed69
2f678a9
317de9d
fba582b
c344e8e
ea7c175
1d55bfb
c7547fc
c62b698
1030ac3
d873acc
355e5df
8ff003a
f8eb973
aacf2ee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,6 +18,7 @@ class Halo(Base): | |
id = Column(Integer, primary_key=True) | ||
halo_number = Column(Integer) | ||
finder_id = Column(Integer) | ||
catalog_index = Column(Integer) | ||
timestep_id = Column(Integer, ForeignKey('timesteps.id')) | ||
timestep = relationship(TimeStep, backref=backref( | ||
'objects', order_by=halo_number, cascade='all', lazy='dynamic'), cascade='save-update, merge') | ||
|
@@ -72,10 +73,11 @@ def object_typetag_from_code(typecode): | |
return c.tag | ||
raise ValueError("Unknown object typecode %d",typecode) | ||
|
||
def __init__(self, timestep, halo_number, finder_id, NDM, NStar, NGas, object_typecode=0): | ||
def __init__(self, timestep, halo_number, finder_id, catalog_index, NDM, NStar, NGas, object_typecode=0): | ||
self.timestep = timestep | ||
self.halo_number = int(halo_number) | ||
self.finder_id = int(finder_id) | ||
self.catalog_index = int(catalog_index) | ||
self.NDM = int(NDM) | ||
self.NStar = int(NStar) | ||
self.NGas = int(NGas) | ||
|
@@ -120,12 +122,23 @@ def load(self, mode=None): | |
handler this can be None or 'partial' (in a normal session) and, when running inside an MPI session, | ||
'server' or 'server-partial'. See https://pynbody.github.io/tangos/mpi.html. | ||
""" | ||
if self.finder_id is None: | ||
halo_number = self.halo_number | ||
if not hasattr(self, "finder_id"): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why has this line changed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This like is basically still there, but my thought was to first check to see if the object even has finder_id and catalog_index columns to begin with for backwards compatibility. Maybe this is all overkill? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've now simplified this. The previous checks were deprecated anyway I think |
||
finder_id = self.halo_number # backward compatibility | ||
else: | ||
finder_id = self.finder_id | ||
if self.finder_id is None: | ||
finder_id = self.halo_number | ||
else: | ||
finder_id = self.finder_id | ||
if not hasattr(self, "catalog_index"): | ||
catalog_index = self.finder_id | ||
else: | ||
if self.catalog_index is None: | ||
catalog_index = finder_id | ||
else: | ||
catalog_index = self.catalog_index | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This logic is a bit baffling, is it possible to add a comment explaining what is going on? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is really just an attempt at backwards compatibility with a database that does not have these columns already. Maybe this is all too much and it would be better simply provide a tool for the user to update their database accordingly? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think simple is better here. I believe sqlalchemy takes care of adding a column of NULLs (will appear as Nones) when opening an old database. Certainly, I don't think the attribute will ever be missing |
||
|
||
return self.handler.load_object(self.timestep.extension, finder_id, object_typetag=self.tag, mode=mode) | ||
return self.handler.load_object(self.timestep.extension, finder_id, catalog_index, object_typetag=self.tag, mode=mode) | ||
|
||
def calculate(self, calculation, return_description=False): | ||
"""Use the live-calculation system to calculate a user-specified function of the stored data. | ||
|
@@ -390,7 +403,7 @@ class Tracker(Halo): | |
tag = "tracker" | ||
|
||
def __init__(self, timestep, halo_number): | ||
super(Tracker, self).__init__(timestep, halo_number, halo_number, 0,0,0, | ||
super(Tracker, self).__init__(timestep, halo_number, halo_number, halo_number, 0,0,0, | ||
self.__mapper_args__['polymorphic_identity']) | ||
|
||
@property | ||
|
@@ -449,7 +462,7 @@ class PhantomHalo(Halo): | |
tag = "phantom" | ||
|
||
def __init__(self, timestep, halo_number, finder_id): | ||
super(PhantomHalo, self).__init__(timestep, halo_number, finder_id, 0,0,0, | ||
super(PhantomHalo, self).__init__(timestep, halo_number, finder_id, finder_id, 0,0,0, | ||
self.__mapper_args__['polymorphic_identity']) | ||
|
||
TimeStep.phantoms = orm.relationship(Halo, cascade='all', lazy='dynamic', | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,8 @@ | |
|
||
class HaloStatFile(object): | ||
"""Manages and reads a halo stat file of unspecified format.""" | ||
_id_offset = 0 | ||
_catalog_index_offset = 0 | ||
# _id_from_file_pos = False | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this commented out? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should probably just be deleted. The old version had a flag whether the finder_id was derived from the position within the catalog file, but now finder_id and catalog_index are both saved separately |
||
_column_translations = {} | ||
|
||
def __new__(cls, timestep): | ||
|
@@ -56,22 +57,29 @@ def all_columns(self): | |
def iter_rows_raw(self, *args): | ||
""" | ||
Yield the halo ID and requested columns from each line of the stat file, without any emulation. | ||
|
||
If _id_from_file_pos is True, the ID is not taken from the stat file, but rather from the order in which | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clarify that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I need to update this description because this is inaccurate |
||
the halos appear. | ||
:param args: strings for the column names | ||
:return: id, arg1, arg2, arg3 where ID is the halo ID and argN is the value of the Nth named column | ||
""" | ||
with open(self.filename) as f: | ||
header = self._read_column_names(f) | ||
cnt = 0 | ||
ids = [0] | ||
for a in args: | ||
try: | ||
ids.append(header.index(a)) | ||
except ValueError: | ||
ids.append(None) | ||
|
||
for l in f: | ||
if not l.startswith("#"): | ||
yield self._get_values_for_columns(ids, l) | ||
col_data = self._get_values_for_columns(ids, l) | ||
col_data.insert(0, cnt+self._catalog_index_offset) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit confused now - shouldn't something like this be conditional on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that's because the docstring is not accurate! I will fix this. Now iter_rows always returns the catalog index (+ any offset set by the class) AND the raw finder ID number read directly from the file, followed by all of the requested quantities. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've updated the docstring for the iter_rows_raw function. Hopefully it is more clear |
||
#if self._id_from_file_pos: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's this commented-out stuff about? |
||
# col_data[0] = cnt | ||
#col_data[0] += self._id_offset | ||
yield col_data | ||
cnt += 1 | ||
|
||
def iter_rows(self, *args): | ||
""" | ||
|
@@ -90,19 +98,18 @@ def iter_rows(self, *args): | |
raw_args+=self._column_translations[arg].inputs() | ||
else: | ||
raw_args.append(arg) | ||
|
||
for raw_values in self.iter_rows_raw(*raw_args): | ||
values = [raw_values[0]] | ||
values = [raw_values[0], raw_values[1]] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little confused about what is happening here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the returned values are now always catalog_index, finder_id, then all of the asked-for values |
||
for arg in args: | ||
if arg in self._column_translations: | ||
values.append(self._column_translations[arg](raw_args, raw_values[1:])) | ||
values.append(self._column_translations[arg](raw_args, raw_values[2:])) | ||
else: | ||
values.append(raw_values[1:][raw_args.index(arg)]) | ||
values.append(raw_values[2:][raw_args.index(arg)]) | ||
yield values | ||
|
||
def read(self, *args): | ||
"""Read the halo ID and requested columns from the entire file, returning each column as a separate array""" | ||
return_values = [[] for _ in range(len(args)+1)] | ||
return_values = [[] for _ in range(len(args)+2)] | ||
for row in self.iter_rows(*args): | ||
for return_array, value in zip(return_values, row): | ||
return_array.append(value) | ||
|
@@ -128,7 +135,7 @@ def _get_values_for_columns(self, columns, line): | |
this_cast = this_str | ||
|
||
results.append(this_cast) | ||
results[0] += self._id_offset | ||
#results[0] += self._id_offset | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove commented-out code if it's definitely not needed |
||
return results | ||
|
||
def _read_column_names(self, f): | ||
|
@@ -138,14 +145,15 @@ def _read_column_names(self, f): | |
|
||
|
||
class AHFStatFile(HaloStatFile): | ||
_id_offset = 1 | ||
_catalog_index_offset = 1 | ||
# _id_from_file_pos = True | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Confused that this is commented out again. Is |
||
|
||
_column_translations = {'n_gas': translations.DefaultValue('n_gas', 0), | ||
'n_star': translations.DefaultValue('n_star', 0), | ||
'n_dm': translations.Function(lambda ngas, nstar, npart: npart - (ngas or 0) - (nstar or 0), | ||
'n_gas', 'n_star', 'npart'), | ||
'hostHalo': translations.Function( | ||
lambda id: None if id==-1 else proxy_object.IncompleteProxyObjectFromFinderId(id+AHFStatFile._id_offset, 'halo'), | ||
lambda id: None if id==-1 else proxy_object.IncompleteProxyObjectFromFinderId(id, 'halo'), | ||
'hostHalo')} | ||
|
||
def __init__(self, timestep_filename): | ||
|
@@ -169,26 +177,24 @@ def filename(cls, timestep_filename): | |
return "CannotFindAHFHaloFilename" | ||
else: | ||
return file_list[0] | ||
return file | ||
|
||
def _calculate_children(self): | ||
# use hostHalo column to calculate virtual childHalo entries | ||
self._children_map = {} | ||
for h_id, host_id_raw in self.iter_rows_raw("hostHalo"): | ||
if host_id_raw!=-1: | ||
host_id = host_id_raw + self._id_offset | ||
cmap = self._children_map.get(host_id, []) | ||
cmap.append(proxy_object.IncompleteProxyObjectFromFinderId(h_id,'halo')) | ||
self._children_map[host_id] = cmap | ||
for c_id, f_id, host_f_id in self.iter_rows_raw("hostHalo"): | ||
if host_f_id!=-1: | ||
cmap = self._children_map.get(host_f_id, []) | ||
cmap.append(proxy_object.IncompleteProxyObjectFromFinderId(f_id,'halo')) | ||
self._children_map[host_f_id] = cmap | ||
|
||
def _calculate_children_if_required(self): | ||
if not hasattr(self, "_children_map"): | ||
self._calculate_children() | ||
|
||
def _child_halo_entry(self, this_id_raw): | ||
self._calculate_children_if_required() | ||
this_id = this_id_raw + self._id_offset | ||
children = self._children_map.get(this_id, []) | ||
#this_id = this_id_raw + self._id_offset | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove code if it's not needed |
||
children = self._children_map.get(this_id_raw, []) | ||
return children | ||
|
||
class RockstarStatFile(HaloStatFile): | ||
|
@@ -208,7 +214,7 @@ def filename(cls, timestep_filename): | |
return "CannotComputeRockstarFilename" | ||
|
||
class AmigaIDLStatFile(HaloStatFile): | ||
_id_offset = 0 | ||
# _id_offset = 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove |
||
|
||
_column_translations = {'n_dm': translations.Rename('N_dark'), | ||
'n_gas': translations.Rename('N_gas'), | ||
|
@@ -220,4 +226,27 @@ class AmigaIDLStatFile(HaloStatFile): | |
def filename(cls, timestep_filename): | ||
return timestep_filename + '.amiga.stat' | ||
|
||
|
||
def iter_rows_raw(self, *args): | ||
""" | ||
Yield the halo ID and requested columns from each line of the stat file, without any emulation. | ||
If _id_from_file_pos is True, the ID is not taken from the stat file, but rather from the order in which | ||
the halos appear. | ||
:param args: strings for the column names | ||
:return: id, arg1, arg2, arg3 where ID is the halo ID and argN is the value of the Nth named column | ||
""" | ||
with open(self.filename) as f: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why does this copy-paste code appear here? Shouldn't it be inherited from the base class? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code expects from the stat file enumerator output of the form There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the docstring incorrect in that case? (I think this is also true for the base class method above) As for avoiding copy-paste, would something like the following work? (NB I didn't actually try it at all! Just speculating) def iter_rows_raw(self, *args):
for row in super(self).iter_rows_raw(*args):
row[0] = row[1] # sequential catalog index not right in this case; overwrite to match finder id
yield row There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh yeah I think that's a good solution. I can code that up and see. I'll double check the docstrings There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've implemented this and tested (no "self" in the super call, but otherwise exactly as you have it) and it seems to work. I've pushed this over just now! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also double checked and the |
||
header = self._read_column_names(f) | ||
ids = [0] | ||
for a in args: | ||
try: | ||
ids.append(header.index(a)) | ||
except ValueError: | ||
ids.append(None) | ||
for l in f: | ||
if not l.startswith("#"): | ||
col_data = self._get_values_for_columns(ids, l) | ||
col_data.insert(0, col_data[0]) | ||
#if self._id_from_file_pos: | ||
# col_data[0] = cnt | ||
#col_data[0] += self._id_offset | ||
yield col_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add comments to these definitions to explain what the three different things are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is
catalog_index
the best name for it? Wouldfinder_offset
or something like that be more descriptive?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I can put in comments. The rationale for catalog_index is that this is the value passed to the input handler's halo catalog reader to extract a halo. In other words, this is the index pointing to this halo within that catalog object. Note that it is not at all a constant offset necessarily from the finder_id. In fact, there is no reason why it is related to the finder_id at all. Rather,
h[catalog_index]
will return this halo from halo catalogh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that yes... but somehow
finder_offset
sounds right to me, it's the offset in the finder output of this halo from the start? Whereascatalog_index
draws attention tofinder
vscatalog
- is that the right thing to contrast?As another alternative, could(bad idea, breaks backward compatibility)finder_id
becomecatalog_id
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha I guess we might just have a difference in opinion here? to me your description of "offset" just sounds like an "index" (0 = start, 1 = start +1, etc). I'm ok going with whatever you think is the most descriptive. it shouldn't be very hard to change. Maybe I've been thinking about it too long and find
finder_offset
confusing.