Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul and fix Sonar beam_group variables #658

Merged
merged 9 commits into from
May 10, 2022

Conversation

emiliom
Copy link
Collaborator

@emiliom emiliom commented May 1, 2022

Working on something else, I realized there were some problems with my initial implementation of the Sonar beam_group_name and beam_group_descr variables and associated SetGroups<SENSOR>._beamgroups property. For reference, this implementation was part of PR #574 (see especially #574 (comment)).

Problems

  1. While the beam_group_ variables had a beam_group dimension, there was no corresponding coordinate variable.
  2. For AD2CP, SetGroupsAd2cp.beamgroupswas present (added in PR #617), but thebeam_group` variables were not being created.
  3. For EK80, it assumed that the two Beam groups were always present. The beam_group_ variables were being added as length 2, even when only one Beam group (Beam_group1) was actually created.

In this PR

All 3 problems are addressed in this PR. For problem 1, I renamed beam_group_name to beam_group and made it the previously missing coordinate variable.

Problem 3 (EK80 variable-number-of-beam-groups issue) was more complicated. Previously I set the SetGroupsEK80._beamgroups property in the __init__, and it was fixed. I've moved the dict of the 2 possible beam groups to a new SetGroupsEK80 class property, beamgroups_possible. The decision of whether there is 1 or 2 beam groups present apparently can only be made after set_beam is executed. Therefore, I modified convert/api.py so that set_sonar is executed after set_beam, and for EK80 added a new beam_group_count argument to set_sonar.

Everything seems to work as expected now.

TODO: Add a test.

@leewujung : I added the set_sonar beam_group_count argument to EK80 only. That keeps the other SetGroups<SENSOR> classes a bit cleaner, but with the downside that the set_sonar methods are no longer exactly identical across SetGroups<SENSOR> classes. Thoughts?

@lsetiawan : When flipping the order of execution in convert/api.py so that set_sonar happens after set_beam, a datatree error occurred. See below. I don't fully understand how it's happening, given that tree_dict assignments are just dictionary assignments. But I found that I could eliminate the error by preassigning tree_dict["Sonar"] = None before tree_dict[f"Sonar/Beam_group{idx}"] is assigned. Do you think this could be simplified a bit?

Error in convert/api.py when tree_dict["Sonar"] = None is not preassigned before tree_dict[f"Sonar/Beam_group{idx}"]:

--> 477 tree = DataTree.from_dict(tree_dict)
    478 echodata = EchoData(source_file=file_chk, xml_path=xml_chk, sonar_model=sonar_model)
    479 echodata._set_tree(tree)

File ~/miniconda3/envs/echopype_dev/lib/python3.9/site-packages/datatree/datatree.py:179, in DataTree.from_dict(cls, data_objects, name)
    177         # Create and set new node
    178         new_node = cls(name=node_name, data=data)
--> 179         obj.set_node(
    180             relative_path,
    181             new_node,
    182             allow_overwrite=False,
    183             new_nodes_along_path=True,
    184         )
    185 return obj

File ~/miniconda3/envs/echopype_dev/lib/python3.9/site-packages/datatree/treenode.py:194, in TreeNode.set_node(self, path, node, new_nodes_along_path, allow_overwrite)
    191         del child
    192     else:
    193         # TODO should this be before we walk to the new node?
--> 194         raise KeyError(
    195             f"Cannot set item at {path} whilst that path already points to a "
    196             f"{type(parent.get_node(node_name))} object"
    197         )
    199 # Place new child node at this location
    200 parent.add_child(node)

KeyError: "Cannot set item at / whilst that path already points to a <class 'datatree.datatree.DataTree'> object"

@emiliom emiliom added this to the 0.6.0 milestone May 1, 2022
@emiliom emiliom requested review from leewujung and lsetiawan May 1, 2022 20:55
@codecov-commenter
Copy link

codecov-commenter commented May 1, 2022

Codecov Report

Merging #658 (62912fa) into dev (3544fb9) will decrease coverage by 5.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #658      +/-   ##
==========================================
- Coverage   78.77%   73.66%   -5.11%     
==========================================
  Files          43       15      -28     
  Lines        3844     2358    -1486     
==========================================
- Hits         3028     1737    -1291     
+ Misses        816      621     -195     
Flag Coverage Δ
unittests 73.66% <100.00%> (-5.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
echopype/convert/api.py 84.44% <100.00%> (+0.59%) ⬆️
echopype/convert/set_groups_ad2cp.py 98.94% <100.00%> (+0.04%) ⬆️
echopype/convert/set_groups_azfp.py 97.33% <100.00%> (+0.15%) ⬆️
echopype/convert/set_groups_base.py 88.17% <100.00%> (-2.05%) ⬇️
echopype/convert/set_groups_ek60.py 92.03% <100.00%> (+0.14%) ⬆️
echopype/convert/set_groups_ek80.py 96.47% <100.00%> (+0.02%) ⬆️
echopype/convert/utils/ek_raw_parsers.py 54.90% <0.00%> (-0.17%) ⬇️
... and 29 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@lsetiawan
Copy link
Member

When flipping the order of execution in convert/api.py so that set_sonar happens after set_beam

I don't completely understand why this change has to happen? Isn't it more intuitive to set_sonar first then set_beam? I think datatree is erroring out because it's trying to create a sub-node before the upper-node. In this case Sonar. Since Sonar is a parent of Beam_x. This needs to exist first. It's not just a simple dictionary assignment. Datatree creates nodes on the fly, so your solution is the most simple that I see.

@emiliom
Copy link
Collaborator Author

emiliom commented May 3, 2022

I don't completely understand why this change has to happen? Isn't it more intuitive to set_sonar first then set_beam?

I agree that it's more intuitive. But for EK80, it turns out it's not possible. In convert/api.py, you don't know whether there'll be 1 or 2 beam groups until set_beam is executed. set_sonar needs to know how many beam groups there are.

Since Sonar is a parent of Beam_x. This needs to exist first. It's not just a simple dictionary assignment. Datatree creates nodes on the fly, so your solution is the most simple that I see.

tree_dict is initially defined as a dict, though. As far as I can see, it remains a dict until all the group set_* methods have been executed. So, that's why I don't understand the error. Anyway, like I said, my solution does work, though it seems inelegant.

@leewujung
Copy link
Member

Related to #625.

@leewujung leewujung requested review from b-reyes and removed request for leewujung May 5, 2022 16:40
@emiliom
Copy link
Collaborator Author

emiliom commented May 5, 2022

We'll go ahead and add the class property beamgroups_possible to all sensors SetGroups<SENSOR> classes, not just EK80. I'll make the change to the PR ASAP

@emiliom
Copy link
Collaborator Author

emiliom commented May 5, 2022

add the class property beamgroups_possible to all sensors SetGroups<SENSOR> classes, not just EK80.

Done. @b-reyes , this PR will need your review.

@lsetiawan : If you are good with the PR as is, can you approve it? Otherwise, comment or commit away 😺

@b-reyes
Copy link
Contributor

b-reyes commented May 6, 2022

tree_dict is initially defined as a dict, though. As far as I can see, it remains a dict until all the group set_* methods have been executed. So, that's why I don't understand the error. Anyway, like I said, my solution does work, though it seems inelegant.

I looked into this a little bit because I too was confused. From my understanding, dictionaries since Python 3.6 preserve order. Datatree follows this order using .items() as you can see from their from_dict(). That's why you get an error if the parents and children are out of order in the dictionary.

I agree that it's more intuitive. But for EK80, it turns out it's not possible. In convert/api.py, you don't know whether there'll be 1 or 2 beam groups until set_beam is executed. set_sonar needs to know how many beam groups there are.

If we want to preserve order i.e. first set Sonar and then set Sonar/Beam_groupX, we could just look at the parsed data. For example, in EK80 we would say there are two beam groups if power and complex are in the keys of self.parser_obj.ping_data_dict.

@@ -187,22 +187,24 @@ def _parse_NMEA(self):
return time1, msg_type, lat, lon

def _beam_groups_vars(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emiliom I don't think this function correctly accounts for the EK80 sensor. From my understanding, there do exist EK80 files that only have the power data. Thus, these types of files would have Beam_group1 with the description "contains complex backscatter data and other beam or channel-specific data". If you want a file that exhibits this behavior, please see echopype/test_data/ek80_new/D20211004-T234429.raw.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? I think what you're saying is that the current group description for EK80 Beam_group1 is inadequate. If that's the case, then the fix would be to that text in SetGroupsEK80 (note that @leewujung composed that text). This function is generic and just sets up the beam_groups_descr variable and beam_group coord variable, based on the beam group information that's passed to it.

Copy link
Contributor

@b-reyes b-reyes May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the description is fine, just how you are assigning it to the beam group is incorrect. This is a known possibility, in 1.0.yml we have

name: Beam_group1
    description: >-
      contains backscatter data (either complex samples or uncalibrated power samples)
      and other beam or channel-specific data, including split-beam angle data when they exist.

Note that we say either complex samples OR uncalibrated power samples

Basically, for EK80, your code should check if only one beam group exists and if only one beam group exists, then it should select the appropriate description. It seems like the issue here (at least for EK80) is that beamgroups_possible ties the description to a specific beam group i.e. Beam_group1 is always for the complex backscatter data. In practice, Beam_group1 can correspond to backscatter data that are complex samples OR power samples.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Let me chew on this. I'll get back to it mid-late afternoon, hopefully.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've looked at this more closely. For the sample file you listed (test_data/ek80_new/D20211004-T234429.raw -- thanks!), the data is written to Beam_group1 and its description becomes the EK80 Beam_group1 description text we have, "contains complex backscatter data and other beam or channel-specific data." I'm not sure if you were saying that that's what should happen but doesn't or what does happen but shouldn't, but I believe you're saying the latter. That's the intended behavior, and not something that was changed in this PR. I think @leewujung provided that EK80 beam group description text.

I still think that what you're getting at can be addressed by simply changing the description in beamgroups_possible that's associated with Beam_group1. It sounds like something like this would be more appropriate: "contains backscatter data (either complex samples or uncalibrated power samples) and other beam or channel-specific data".

Also, just to narrow down the problem, the function beam_groups_vars itself would not be the source of the potential problem, or solution. It's very generic. It just writes out what's passed to it. The issue you're getting at is what's passed to it by SetGroupsEK80.set_sonar, which occurs here:

self._beamgroups = self.beamgroups_possible[:beam_group_count]
beam_groups_vars, beam_groups_coord = self._beam_groups_vars()

The code does check -- elsewhere -- if only one beam group exists; if that's the case, that group will be Beam_group1 and the corresponding dict from beamgroups_possible is passed to it. The description text for that group doesn't need to exactly specify whether the beam group is complex or power only; that would be nice, and we could look into it in the future, but it's not necessary. That description does need to be accurate and not misleading, though. It sounds like a change to that description along the lines of what I suggest would make it more accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've looked at this more closely. For the sample file you listed (test_data/ek80_new/D20211004-T234429.raw -- thanks!), the data is written to Beam_group1 and its description becomes the EK80 Beam_group1 description text we have, "contains complex backscatter data and other beam or channel-specific data." I'm not sure if you were saying that that's what should happen but doesn't or what does happen but shouldn't, but I believe you're saying the latter. That's the intended behavior, and not something that was changed in this PR. I think @leewujung provided that EK80 beam group description text.

Yes, I was saying the latter. For the file you reference (and I provided), there is no complex backscatter data, so the description is wrong.

I still think that what you're getting at can be addressed by simply changing the description in beamgroups_possible that's associated with Beam_group1. It sounds like something like this would be more appropriate: "contains backscatter data (either complex samples or uncalibrated power samples) and other beam or channel-specific data".

Although your change to the description does fix the issue I described, it somewhat conflicts with the Beam_group2 description: "contains backscatter power (uncalibrated) and other beam or channel-specific data, including split-beam angle data when they exist." I am sure any user could infer that Beam_group1 corresponds to "complex samples" when two beam groups exist. However, I liked the idea of being very specific and stating whether the beam group has complex or power only data. I think this would be simple to implement and would benefit the user by providing further clarity. If you think this change to a more specific description conflicts with the intent of this PR, I understand. I am willing to take on this issue, if you do not have the time for it.

Also, just to narrow down the problem, the function beam_groups_vars itself would not be the source of the potential problem, or solution. It's very generic. It just writes out what's passed to it. The issue you're getting at is what's passed to it by SetGroupsEK80.set_sonar

I see where you are coming from, maybe the function beam_groups_vars is not the most appropriate place to change this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I liked the idea of being very specific and stating whether the beam group has complex or power only data. I think this would be simple to implement and would benefit the user by providing further clarity. If you think this change to a more specific description conflicts with the intent of this PR, I understand. I am willing to take on this issue, if you do not have the time for it.

If you really think it'd be very simple to implement, go ahead; otherwise my vote would be for leaving it to 0.6.1. While the impact of this addition is "minor" on the scope of this PR (and it does fit its intent), I think it should be done in a different PR, mainly so we can move on with this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "simple" was the wrong word. I would say it is doable haha. I am fine if you would like to leave this for 0.6.1. I think we have enough on our plate for 0.6.0. I will create a new issue for this. For this PR, we can just change the Beam_group1 description to "contains backscatter data (either complex samples or uncalibrated power samples) and other beam or channel-specific data", as you suggested.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine if you would like to leave this for 0.6.1. I think we have enough on our plate for 0.6.0. I will create a new issue for this.

That would be perfect, thank you. If you already had concrete ideas on how to accomplish this, I suggest you describe them at least briefly when creating the new issue, while they're fresh in your mind.

For this PR, we can just change the Beam_group1 description to "contains backscatter data (either complex samples or uncalibrated power samples) and other beam or channel-specific data", as you suggested.

I'll go ahead and make the change now.

cleaner and more consistent with the rest of the code
@emiliom
Copy link
Collaborator Author

emiliom commented May 6, 2022

tree_dict is initially defined as a dict, though. As far as I can see, it remains a dict until all the group set_* methods have been executed. So, that's why I don't understand the error. Anyway, like I said, my solution does work, though it seems inelegant.

I looked into this a little bit because I too was confused. From my understanding, dictionaries since Python 3.6 preserve order. Datatree follows this order using .items() as you can see from their from_dict(). That's why you get an error if the parents and children are out of order in the dictionary.

Thanks for looking into it! I think that does explain the problem, if Datatree is in fact relying on the order in the dict passed to it.

I don't see any real need to maintain the order of set_sonar and set_beam execution. So, what I've done should be fine for our purposes.

emiliom and others added 2 commits May 10, 2022 13:36
Expanded the description so it covers both complex and power data
@emiliom
Copy link
Collaborator Author

emiliom commented May 10, 2022

@b-reyes I've made the edit to the EK80 group1 description. The tests are running now. When they're completed, could you review and approve if you think everything is ready to merge? Thanks.

Copy link
Contributor

@b-reyes b-reyes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emiliom this looks good to me!

@emiliom emiliom merged commit e0b3cbf into OSOceanAcoustics:dev May 10, 2022
@emiliom emiliom deleted the beamgroups_sonarvars branch December 24, 2022 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants