
Restore GalSim unit tests #534

Closed
gostevehoward opened this issue Jan 18, 2017 · 9 comments

@gostevehoward
Collaborator

Right now, three of the five cases exercised in test_galsim_benchmarks.jl are commented out because they're not working. I think I had all the benchmark cases working at some point, but they weren't covered by unit tests, so (as was inevitable) they got broken, and when cases broke they simply got commented out (broken-window theory). We should get all five cases working again, we shouldn't comment them out in the future, and maybe we should add more cases to the unit test.

This will also be helped by adding better summary output to the GalSim benchmark script (probably as a result of #510). The point is to get people to notice when their changes degrade GalSim benchmark performance; currently that just doesn't happen.
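One way to keep known-failing cases visible in test_galsim_benchmarks.jl rather than commenting them out would be to mark them with @test_broken from Julia's Test standard library, which reports them as Broken and fails loudly the moment they start passing again. A rough sketch only; the run_case stub below stands in for the real benchmark harness and isn't code from the repo:

using Base.Test  # `using Test` on newer Julia versions

# Stub standing in for the real harness: pretend it returns true when a case's
# inferred parameters match the GalSim ground truth within tolerance.
run_case(name) = name in ("simple_star", "star_with_noise")

@testset "GalSim benchmarks" begin
    @test run_case("simple_star")
    @test run_case("star_with_noise")
    # A known-broken case stays in the file and is reported as Broken; it turns
    # into a test error as soon as it unexpectedly starts passing.
    @test_broken run_case("galaxy_with_noise")
end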

@rgiordan

@gostevehoward
Collaborator Author

For what it's worth, I am seeing the simple_star benchmark working just fine on current master; are you seeing otherwise?

│ Row │ label         │ field                     │ ground_truth │ single_inferred │ joint_inferred │ error_sds  │
├─────┼───────────────┼───────────────────────────┼──────────────┼─────────────────┼────────────────┼────────────┤
│ 1   │ "simple_star" │ "X center (world coords)" │ 0.005335     │ 0.005335        │ 0.005335       │ NA         │
│ 2   │ "simple_star" │ "Y center (world coords)" │ 0.005335     │ 0.005335        │ 0.005335       │ NA         │
│ 3   │ "simple_star" │ "Brightness (nMgy)"       │ 40           │ 39.9827         │ 39.9827        │ 0.0433057  │
│ 4   │ "simple_star" │ "Color band 1-2 ratio"    │ 3.99098      │ 4.00119         │ 4.00119        │ 0.149164   │
│ 5   │ "simple_star" │ "Color band 2-3 ratio"    │ 1.88395      │ 1.88477         │ 1.88477        │ 0.0434862  │
│ 6   │ "simple_star" │ "Color band 3-4 ratio"    │ 1.3179       │ 1.31725         │ 1.31725        │ 0.0492514  │
│ 7   │ "simple_star" │ "Color band 4-5 ratio"    │ 1.16982      │ 1.16992         │ 1.16992        │ 0.00864011 │
│ 8   │ "simple_star" │ "Probability of galaxy"   │ 0            │ 0.00593759      │ 0.00528062     │ NA         │

@gostevehoward
Collaborator Author

Well, the simple_star and star_with_noise benchmarks seem to be passing for me on current master (after Jeff merged #507). galaxy_with_noise fails, but, as I now recall and as is noted in a code comment, that's a result of the bias from load_active_pixels! (issue #482). The solution is supposed to be the shiny new component-detection/initialization system in #157, which is a larger project.

My feeling is we ought to allow an override right now for the GalSim benchmarks, setting noise_fraction very low (like -0.5 or even -1, effectively disabling the filter). It seems better than commenting out test cases.
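To make the "effectively disabling" point concrete: the filter in question keeps a pixel only when it sits far enough above the sky background relative to the noise level, so a negative threshold admits essentially every pixel. Toy illustration only, not Celeste's actual load_active_pixels! code:

# A pixel is kept when its flux exceeds the sky background by at least
# noise_fraction times the expected noise level.
function active_mask(pixels::AbstractMatrix, sky::Real, noise_sd::Real;
                     noise_fraction::Real = 0.1)
    [p - sky > noise_fraction * noise_sd for p in pixels]
end

pixels = [10.0 10.5; 9.8 12.0]  # toy image
sky, noise_sd = 10.0, 1.0

active_mask(pixels, sky, noise_sd)                         # keeps only the bright pixels
active_mask(pixels, sky, noise_sd; noise_fraction = -1.0)  # keeps all of them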

@rgiordan
Contributor

Definitely better than commenting out test cases, but not as good as having test cases that actually work. Sorry if this is a question already answered elsewhere, but I've been out of the loop: if noise biases GalSim results so badly, why do we believe our results on actual catalogs?

#157 is a good idea, but it seems like it might take a while. Is there any particular reason not to go back to the way active pixels used to be selected before Jeff re-did it to use the image? (See my Dec 9th comment on #482.) Or am I misunderstanding the problem?

@rgiordan
Contributor

Yes, after rebasing I now see the correct galaxy probability estimates. That's good! I don't quite see how #507 could have fixed it, though. Do you, @gostevehoward?

@gostevehoward
Collaborator Author

gostevehoward commented Jan 18, 2017 via email

@rgiordan
Contributor

@jeff-regier, any reason these arguments about Stripe 82 don't apply equally to GalSim? If not, why do you think the GalSim results are biased? (Disclaimer: I haven't run these tests myself, so I'm just taking everyone's word that they're failing because of load_active_pixels!.)

@gostevehoward
Collaborator Author

gostevehoward commented Jan 18, 2017 via email

@jeff-regier
Owner

It's possible that load_active_pixels! introduces some bias. But I'd expect that to affect the mog version of Celeste too, not just the FFT version.

To prevent any bias, when you call get_sky_patches, pass it the argument min_radius_pix=50, if 50 is large enough to contain all the pixels that could possibly be relevant. (By default min_radius_pix=8.)
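For reference, a minimal sketch of what that call might look like from the benchmark script, assuming images and catalog are already built by the harness and get_sky_patches is in scope; only the function name and the min_radius_pix keyword come from the suggestion above, the rest is illustrative:

# Widen every source's sky patch so the active-pixel selection can't clip
# relevant pixels; the default min_radius_pix is 8, the suggestion here is 50.
function patches_with_wide_radius(images, catalog)
    get_sky_patches(images, catalog; min_radius_pix = 50)
end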

gostevehoward added a commit to gostevehoward/Celeste.jl that referenced this issue Jan 25, 2017
It's currently necessary to override this parameter to make galsim unit tests
pass (see jeff-regier#534).

This is the simplest way to code it but it's ugly in my opinion. We should
discuss alternatives in the PR thread or an issue thread.
jeff-regier pushed a commit that referenced this issue Jan 26, 2017
* permit overriding min_radius_px from top-level callers

It's currently necessary to override this parameter to make galsim unit tests
pass (see #534).

This is the simplest way to code it but it's ugly in my opinion. We should
discuss alternatives in the PR thread or an issue thread.

* enable all galsim "unit test" cases now that active radius is expanded

* updates calls to load_active_pixels! in unit tests
@gostevehoward
Collaborator Author

Fixed by #538
