Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Combiner refactor #656

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

SaOgaz
Copy link

@SaOgaz SaOgaz commented Jan 18, 2019

[Edit: Was apparently working off an old master, fixing conflicts right now]
[Edit 2: Resolved conflicts]

See discussion in #578.

This is my first pass/prototype of the new structure laid out in #578. I have some things I'd like people to look at... okay, a lot of things. I updated the tests to reflect the new structure and they are all passing now.

  • There is still the same amount of repeat code here that was there before. But the only way I can think of getting rid of it, while still retaining combine_method as an abstract class, which I think makes it more clear to users that they should not be using the base class ever, is to put some of the repeated code bits into separate non-abstract functions. But that might end up making the code more piecemeal/confusing then it needs to be...

  • We should rename combine_method, anybody got any suggestions?

  • I left the combine function override parameter with the combine_method, seemed a little more obvious to the users that way, but I'm not set on it.

  • Same could be said of scale_to. Do we think this is something users will want to change based on different combination runs? IN that case, I think it makes sense to leave it as a parameter to combine_method

  • Uncertainty still needs work/filling out, but I put in some basic structure for this. Not sure if I'm happy with what I have though. The simplest approach was to make constructing the uncertainty from the data, versus from input uncertainties a boolean flag. The default being re-calculate uncertainties from the input data values. But there's probably a more elegant solution to this.

  • With this restructure it seems like it shouldn't be too bad to move the chunking/memory handling into the BaseClass, per @crawfordsm's wish list in our tag up.

  • I think a test needs to be added for weighting with the combiner function. I didn't get any errors related to that while I was still updating combiner and it should have been broken, so seems to be a gap.

  • Obviously more doc strings / etc. to come.

Please have a look at the following list and replace the "[ ]" with a "[x]" if
the answer to this question is yes.

  • For new contributors: Did you add yourself to the "Authors.rst" file?

For documentation changes:

  • For documentation changes: Does your commit message include a "[skip ci]"?
    Note that it should not if you changed any examples!

For bugfixes:

  • Did you add an entry to the "Changes.rst" file?
  • Did you add a regression test?
  • Does the commit message include a "Fixes #issue_number" (replace "issue_number").
  • Does this PR add, rename, move or remove any existing functions or parameters?

For new functionality:

  • Did you add an entry to the "Changes.rst" file?
  • Did you include a meaningful docstring with Parameters, Returns and Examples?
  • Does the commit message include a "Fixes #issue_number" (replace "issue_number").
  • Did you include tests for the new functionality?
  • Does this PR add, rename, move or remove any existing functions or parameters?

Please note that the last point is not a requirement. It is meant as a check if
the pull request potentially breaks backwards-compatibility.


@SaOgaz SaOgaz force-pushed the combiner_refactor branch from dd6a26f to 244b2ec Compare January 18, 2019 22:17
@MSeifert04
Copy link
Contributor

Thank you @SaOgaz for your work here 👍.

I plan to have a look over the changes and the points you mentioned soon-ish (although I cannot promise anything currently).

@crawfordsm
Copy link
Member

@SaOgaz thanks for the implementation -- I think this is great and really gives a concrete example to discuss!

While there isn't a reduction in the repetition of the code, it does enable more precise handling of the uncertainty.

I think there is something very interesting in having different methods for the Uncertainty, and that would be incredible useful. I'd like to see a different name than fresh but I don't have a useful suggestion at the moment.

If this would make optimization of the memory easier, that is a very big positive to this implementation, so if you can expand on that point, that would be very helpful.

I'm okay with combine_method, though we might want to have a shortcut for each one of just average, median, and sum.

At to scale_to, probably not a common use case but it might be useful to provide it.

There are two concerns this raises:

  1. If someone wants to compare a median combine and an average combine, they need to create the class twice which would result in a large memory footprint and possible needing to delete the object. I only really saw this once I saw the implementation so that makes it really worthwhile. This may become less of a concern if the memory usage is optimized and this is also likely not a common use case.

  2. As another comment, if I search on using combine vs Combiner, that shows about a 70/30 split in the community about who is using which class. We would want to maintain the Combiner class for a while or at least provide a wrapper around the current classes for the future just to provide backward compatibility for our community. I don't think this would be too hard giving the implementation you've made.

@crawfordsm
Copy link
Member

cc @eteq

@mwcraig
Copy link
Member

mwcraig commented Jan 31, 2019

I haven't had a chance to give this a detailed look-over yet (weekend, hopefully), but looks very promising.

One change -- I think we want to keep the old Combiner class around for backwards compatibility. We can stop mentioning it in favor of the new combiner classes but I don't think we should drop it entirely.

That said, it is great to have you working on this @SaOgaz -- this is an area of the code that has needed attention for a while!

Copy link
Member

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high-level review (i.e., I didn't go line-by-line but can do so on the next iteration if desirable). I've already discussed some of this with @SaOgaz, but others are welcome to pipe up if they disagree with any of this.

  1. I realize now I wasn't clear about this in Consider revising the Combiner machinery into a class heirarchy #578, but the intent of making the combiner classes in a hierarchy like this is that the images should be passed in at call-time, rather than being attached to the objects themselves. That is, I think combiner_method should get the ccd_list argument, rather than it going into __init__. That is, the combiner classes should be "light-weight" and just have the implementation, but not carry around the data themselves.
  2. I think combiner_method isn't a good name because it's not an action (see my point 1 above) - instead something like combine_images is better. (To put the idea out there as a possibility, it could even be a __call__, although I think @SaOgaz convinced me that's going too far.)
  3. I don't think we should have use_input_uncertainties and uncertainty_func, as they overlap. Moreover, I don't even thing uncertainty_func should exist at all. The "old-style" (i.e., infer from the data) uncertainty should be at the combiner class level - if a user wants to roll their own they should sub-class and override uncertainty_fresh. Relatedly, I think it's better to let the data tell the combiner what to do: that is, if the CCDData have uncertainties, do the error propogation, but if there are no uncertainties do the "fresh" method. A middle ground (which @SaOgaz preferred and I am OK with) is to have use_input_uncertainties have three options: True, False, and 'auto' (default), where "auto" is the behavior I outlined above.
  4. Related to the above: where possible, the combiners should completely defer to the CCDData's uncertainty propogation methods. That's hard/impossible for MedianCombiner I think, but for the others it should be fine because they are basic arithmetic operations.
  5. If possible, AverageCombiner should be implemented as a subclass of SumCombiner that just does the same thing and then divides by len(ccd_list).

Oh, and I agree with @mwcraig the old one should stay in, but with a deprecation warning (probably at class-creation-time?)

@SaOgaz
Copy link
Author

SaOgaz commented Feb 25, 2019

About halfway though removing the data_arr instance attribute, I realized this is going to be a problem for the sigma_clipping, clip_extreme, and minmax_clipping methods. What are people's thoughts on turning these three methods into functions that will run on a list of CCDData objects? I'd like to avoid users (or the code) having to re-run the same sigma clipping etc. on the same dataset multiple times, if the case of the user wanting the same data set and clipping with different combination types.

@SaOgaz
Copy link
Author

SaOgaz commented Feb 27, 2019

@eteq I've actually just run across another problem when trying to use the built in CCDData arithmetic. Right now the combination math is getting done on masked numpy arrays. This means when a sum is done, for example, the masks are used (and as you know, uncertainties are calculated fresh using that masking information).

The current CCDData arthimetic methods do not use the masks. Now I could probably apply the masks to the data pre-calculation as a work around, but that wouldn't account for the masked values in the CCDData built in uncertainty calculation... so defeating our original purpose...

Since I'm posting here anyway I should add, the combination of pulling out the data from being an instance variable (something i do think is worthwhile) in combination with trying to use the native CCData type is basically resulting in a complete re-write of ALL the code. It ends up touching every single function/method. So if we do end up continuing down this route it will require a lot more time* on my part.

*lot more time meaning more than a few more hours, maybe a half week of effort including updating the tests and docs etc.

@mwcraig
Copy link
Member

mwcraig commented Mar 7, 2019

@SaOgaz -- I'll try to digest your comments by early next week. I need to review the current code base and your changes to make sure I understand the issues you are hitting.

Off the top of my head, I think CCDData's uncertainty propagation is only going to be useful for sum or average combine (at best).

Just so I understand does this

Since I'm posting here anyway I should add, the combination of pulling out the data from being an instance variable (something i do think is worthwhile) in combination with trying to use the native CCData type is basically resulting in a complete re-write of ALL the code.

mean all of the code in combine? Or in CCDData?

@SaOgaz
Copy link
Author

SaOgaz commented Mar 8, 2019

@mwcraig, all of the code in combine.

Base automatically changed from master to main March 16, 2021 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants