[Discussion] Adding support for writing to nwb file. #7

h-mayorquin · 2022-08-15T18:38:41Z

Hi,
We want to help you add support for nwb. I am opening this issue to have some preliminary discussion with you so we can align on design issues (I already read your helpful contributing document).

I have spent the day looking at your code base here and in your main repo, and so far my assessment is the following. I think that the task of offering nwb writing support can be divided into two big steps:

1. Extract all the data from the labeled frames / instances and organize it in a structure like labels.numpy, which you use in your general repo to write to nwb.
2. Use the data in the aforementioned structure to write the pose estimation objects, as you do in the appropriate method of the ndx_pose module.
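
As a rough illustration of step 1, here is a minimal sketch that packs tracked instances into a dense array. `Instance`, `LabeledFrame`, and `to_dense_array` are hypothetical stand-ins written for this example, not the library's classes, and the (frames, tracks, nodes, 2) layout is an assumption about what a labels.numpy-like structure holds.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


# Hypothetical stand-ins for the data model; not the library's actual classes.
@dataclass
class Instance:
    points: np.ndarray           # shape (n_nodes, 2), xy per node
    track: Optional[str] = None  # track name, or None if untracked


@dataclass
class LabeledFrame:
    frame_idx: int
    instances: list


def to_dense_array(frames, track_names, n_nodes):
    """Pack tracked instances into a (n_frames, n_tracks, n_nodes, 2) array."""
    n_frames = max(lf.frame_idx for lf in frames) + 1
    # Missing frames and missing tracks stay NaN.
    out = np.full((n_frames, len(track_names), n_nodes, 2), np.nan)
    track_to_idx = {name: i for i, name in enumerate(track_names)}
    for lf in frames:
        for inst in lf.instances:
            if inst.track is None:
                continue  # untracked instances have no slot in the array
            out[lf.frame_idx, track_to_idx[inst.track]] = inst.points
    return out


frames = [
    LabeledFrame(0, [Instance(np.array([[1.0, 2.0], [3.0, 4.0]]), "a")]),
    LabeledFrame(2, [Instance(np.array([[5.0, 6.0], [7.0, 8.0]]), "b")]),
]
dense = to_dense_array(frames, ["a", "b"], n_nodes=2)
```

In this sketch, untracked instances are skipped, since a dense array indexed by track has nowhere to put them.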

The latter (2) is fairly straightforward for us, as we have done this many times, so I will concentrate on the former (1). I already built a prototype using a hierarchy of tidy data frames that reads all the data using the objects in your models module:

https://gist.github.com/h-mayorquin/2c20eb7c7dbb3849ce5e45bc4a8afc5d

Which produces data like this:

So, design questions that I have:

1. I think that the prototype method can be easily used to fulfill point 1 (I already tested it with some data), but I wanted to discuss with you whether you had in mind another way of providing similar functionality. The reason for this is that I would rather help you implement your design vision than use my ad-hoc method for this (I could not find anything equivalent in the library).
2. Where should the functionality to write to nwb be located? Is the idea to create another module under the io directory called ndx_pose or nwb that could be used for this, or do you have other ideas for the organization?
3. Concerning testing, I am using the example that you provide in your main repo (the centered_pair_predictions.slp file), as it is more complex. However, this file is not available in this repo. Could we add it so we can have automatic testing with more complex files? The one currently available seems to have only two tracks and one labeled frame, so some parts of the code would be harder to test.
4. I have a question about your data model. I don't understand the set-theoretical relationships between Track and Skeleton (and maybe that's why the data above in the data frame might look confused). An instance object contains both a skeleton and a track. However, is their relationship 1-to-1? Can a track contain multiple skeletons? Can a skeleton be assigned to multiple tracks (this makes less sense to me)?

If you are fine with the prototype above (question 1), I can move forward and implement it quickly so you can see it working in code. All the other questions are of less immediate importance, I think.

P.S. A pandas dependency is implied by the use of ndx_pose.


talmo · 2022-08-16T18:26:49Z

Hi @h-mayorquin,

This sounds great! Thanks for the help. Some answers below.

> So, design questions that I have:
>
> 1. I think that the prototype method can be easily used to fulfill point 1 (I already tested it with some data), but I wanted to discuss with you whether you had in mind another way of providing similar functionality. The reason for this is that I would rather help you implement your design vision than use my ad-hoc method for this (I could not find anything equivalent in the library).

The idea with this library is to not need another intermediate representation and instead work directly off of the data model.

For example, why not just build up the pose series directly instead of coercing them into a dataframe first?

Numpy arrays may be a reasonable intermediate for things that strictly need to be series, i.e., for inference data. (See rly/ndx-pose#9 regarding training data.)
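
To make the "build the series directly" idea concrete, here is a minimal sketch that walks the data model once and collects one series per (track, node) pair, with no dataframe in between. `Instance`, `LabeledFrame`, and `build_series` are hypothetical, simplified stand-ins for this example, not the library's classes, and converting frame indices to timestamps with a fixed `fps` is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified stand-ins for the data model (not the real classes).
@dataclass
class Instance:
    points: dict                 # node name -> (x, y)
    track: Optional[str] = None  # track name, or None if untracked


@dataclass
class LabeledFrame:
    frame_idx: int
    instances: list


def build_series(frames, fps):
    """Collect one (timestamps, xy) series per (track, node) pair."""
    series = defaultdict(lambda: {"timestamps": [], "data": []})
    # Visit frames in temporal order so each series is already sorted.
    for lf in sorted(frames, key=lambda f: f.frame_idx):
        for inst in lf.instances:
            if inst.track is None:
                continue  # a series needs an identity to follow over time
            for node, xy in inst.points.items():
                entry = series[(inst.track, node)]
                entry["timestamps"].append(lf.frame_idx / fps)
                entry["data"].append(xy)
    return dict(series)


frames = [
    LabeledFrame(1, [Instance({"head": (2.0, 2.0)}, track="animal1")]),
    LabeledFrame(0, [Instance({"head": (1.0, 1.0)}, track="animal1"),
                     Instance({"head": (9.0, 9.0)})]),
]
series = build_series(frames, fps=10.0)
```

Each entry could then be handed to whatever series type the writer expects, one per node of each track.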

> 2. Where should the functionality to write to nwb be located? Is the idea to create another module under the io directory called ndx_pose or nwb that could be used for this, or do you have other ideas for the organization?

sleap_io/io/nwb.py sounds great!

> 3. Concerning testing, I am using the example that you provide in your main repo (the centered_pair_predictions.slp file), as it is more complex. However, this file is not available in this repo. Could we add it so we can have automatic testing with more complex files? The one currently available seems to have only two tracks and one labeled frame, so some parts of the code would be harder to test.

Yes, we can add some more fixtures that are more complex. We've been doing it on an as-needed basis to prevent the repo from getting too bloated.

> 4. I have a question about your data model. I don't understand the set-theoretical relationships between Track and Skeleton (and maybe that's why the data above in the data frame might look confused). An instance object contains both a skeleton and a track. However, is their relationship 1-to-1? Can a track contain multiple skeletons? Can a skeleton be assigned to multiple tracks (this makes less sense to me)?

A Skeleton just defines a set of nodes corresponding to landmark types (+ connections). It does not contain any positional data.

Instances have a skeleton associated with them to map positions to unique landmark types. An Instance can only have one Skeleton, but different Instances can have different Skeletons.

A Track describes a (semi-)unique identity that associates Instances across frames. It can be co-opted to describe a unique class like "female" or "black_mouse" across multiple videos, but typically refers to the same physical animal (e.g., "animal1") within the same video.

An Instance must have a Skeleton set, but does not need a Track. This is to support the case where we go to a random frame and cannot identify each animal uniquely, but can annotate their poses. Forcing Instances to be contained within a Track is highly inflexible and intractably increases the annotation burden for users in many use cases where it is non-trivial to identify animals uniquely in a random-access fashion.

For final inference results not used in labeling, we will typically want every Instance to have a Track assignment. (Though this is not strictly necessary in some cases.)

Tracks and Skeletons have no relationship to each other.
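
The relationships above can be sketched with a few dataclasses. These are hypothetical, simplified classes written for illustration (the real ones carry more fields), and the check that points only use the skeleton's nodes is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified classes; not the library's actual definitions.
@dataclass(frozen=True)
class Skeleton:
    nodes: tuple       # landmark types, e.g. ("head", "tail")
    edges: tuple = ()  # pairs of node names; no positions stored here


@dataclass(frozen=True)
class Track:
    name: str  # identity label; knows nothing about skeletons


@dataclass
class Instance:
    skeleton: Skeleton             # required: exactly one per Instance
    points: dict                   # node name -> (x, y)
    track: Optional[Track] = None  # optional identity across frames

    def __post_init__(self):
        # Assumption for this sketch: points may only use the skeleton's nodes.
        unknown = set(self.points) - set(self.skeleton.nodes)
        if unknown:
            raise ValueError(f"points not in skeleton: {sorted(unknown)}")


mouse = Skeleton(nodes=("head", "tail"), edges=(("head", "tail"),))
fly = Skeleton(nodes=("head", "thorax"))
animal1 = Track("animal1")

# Different Instances can use different Skeletons; a Track is optional.
tracked = Instance(mouse, {"head": (1.0, 2.0)}, track=animal1)
untracked = Instance(fly, {"thorax": (3.0, 4.0)})
```

Note that neither `Skeleton` nor `Track` refers to the other; they only meet on an `Instance`.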

> If you are fine with the prototype above (question 1), I can move forward and implement it quickly so you can see it working in code. All the other questions are of less immediate importance, I think.

Maybe it's best that you just try it out and send a PR. It looks like it's almost done, so feel free to finish it off with whatever approach is easiest and we'll iterate from there.

> P.S. A pandas dependency is implied by the use of ndx_pose.

Pandas is no problem, thanks!

talmolab locked and limited conversation to collaborators Aug 16, 2022

talmo converted this issue into discussion #10 Aug 16, 2022
