Suite of validation tests on DC2 extragalactic catalogs #50
I have a general comment regarding three tests: dN/dmag, color-mag and the color distribution test. These are really testing different aspects of the same thing, namely the full distribution of magnitudes. I.e., if you had a distribution of magnitudes statistically indistinguishable from reality, you would automatically pass all three. The first test looks at 1D histograms of mags, the second test is mag vs dmag, and the third is dmag vs dmag correlations (where dmag is delta mag, e.g. u-g, i.e. color). My worry is two-fold:
So, my suggestion would be to merge them into a single test that perhaps has more than one test in it.
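As a rough illustration of how several checks can hang off one table of magnitudes, here is a minimal sketch. The function name `magnitude_summary`, the band names, and the choice of summary statistics (mean and scatter per band and per adjacent-band color) are assumptions made for this example; they are not an existing DESCQA test.

```python
from statistics import mean, pstdev


def magnitude_summary(mags):
    """Summarize one magnitude table in a single pass.

    mags: dict mapping band name -> list of magnitudes, with the same
    galaxy ordering in every band (hypothetical input layout).
    Returns a dict of (mean, scatter) for each band and each adjacent color.
    """
    bands = list(mags)
    summary = {}
    # Per-band mean and scatter: the information a dN/dmag check relies on.
    for band in bands:
        summary[band] = (mean(mags[band]), pstdev(mags[band]))
    # Adjacent-band colors (dmag): the information the color checks rely on.
    for b1, b2 in zip(bands, bands[1:]):
        color = [m1 - m2 for m1, m2 in zip(mags[b1], mags[b2])]
        summary[f"{b1}-{b2}"] = (mean(color), pstdev(color))
    return summary


# Toy numbers for illustration only, not real catalog data.
toy = {"u": [22.0, 23.0, 24.0], "g": [21.0, 22.5, 23.0]}
result = magnitude_summary(toy)
```

Each entry of the returned dict could then be compared against its own validation criterion, so a single test reports several sub-checks.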
I think these are nice overall tests that avoid the vagaries of color-color histograms, which will never look perfect from Galacticus but also don't matter very much for our fundamental science. The mean and scatter of magnitudes are directly connected to the number of detected objects, their SNR, etc., so they are very relevant. |
For photo-z and clusters, they might actually care about some of the color-color distributions though. I agree with the basic idea that we don't want to paint ourselves into a corner by devising validation tests based on different datasets that turn out to be impossible to satisfy. My feeling had been that we may indeed need to check all these different things, but our validation criteria cannot be super tight, and that's how we avoid painting ourselves into a corner. |
@yymao @evevkovacs - Just to collect some basic progress notes here: SL and SN confirmed they care more about the sprinkler. It's useful for the extragalactic catalogs to be at least vaguely reasonable but the existing validation tests are enough to ensure that. I will continue to work with the remaining analysis WGs. |
In this comment I will collect the list of working group contacts for extragalactic catalog validation. Currently several are listed as TBD, so this is not very useful, but I will edit the comment as I get more information:
|
@morriscb is the other contact for PZ, we're planning on discussing tests on Friday and updating shortly after that. |
@sschmidt23 pointed out this thread to me -- I want to concur with Rachel on this: for assessing photo-z performance we care much, much more about getting the range of galaxy SEDs correct than the overall luminosities of objects (which is what the magnitude vector is more sensitive to). The distribution of galaxy colors is our best way of assessing SEDs. I don't expect Galacticus to be perfect at this by any means; rather, the intention of our color tests is to be able to assess which simulations / parameter values improve things vs. make them worse. |
@sschmidt23 @slosar @morriscb @j-dr @erykoff - Thanks for agreeing to represent PZ, CL, and LSS on the extragalactic catalog validation. As a reminder, in the next 2 days we'd like to have the following:
If you have any questions about defining tests / validation criteria / etc., please comment on here or the relevant issue. I am happy to answer questions, as are @yymao and @evevkovacs . Also, they have tried to make it easy to implement new tests without having to learn all the ins and outs of DESCQA -- see https://github.com/LSSTDESC/descqa/blob/master/README.md . |
@jablazek @elikrause @timeifler @joezuntz - Please comment in this thread with the name / GitHub ID of the person who will work on the extragalactic catalog validation for your working group (for TJP I believe one person was asked but may not have confirmed; I did not hear a name for WL yet). See the message above this one for what we are asking those people to do, and direct them to this thread. |
@rmandelb : @patricialarsen has volunteered for TJP. @timeifler, she has been doing WL-related work as well on proto-DC2 and is interested in coordinating with @msimet. |
@rmandelb @jablazek @patricialarsen This is great to hear, Patricia has already reached out to Melanie and myself. |
@rmandelb @yymao Is there a living list of current tests, or does the list of issues with the "validation test" label simply act as such? Can you elevate my rights so that I can add "validation:required" to some tests, for example this galaxy bias/clustering test? (Or should I tell you which ones I think are required?) |
@slosar - the list of issues w/ "validation test" label is the living list of tests. I would love to elevate your rights but I'm not fancy enough to do that (I have labeling privs but not "giving other people labeling privs" privs, apparently). Perhaps @yymao or @evevkovacs can comment on the difference between the "validation test" and "validation test:required" labels; there are far more of the former than the latter, and I'm not sure how to interpret that. Are you wanting the analysis WGs to flag which ones are particularly important so they can be called "required"? I did not quite realize that so I hadn't requested that from anybody. |
Yes, validation test:required is intended to flag tests which are required by the working groups and which the catalog must pass in order to be satisfactory. Other validation tests are those which have been suggested and may be nice to have but aren't as high priority to implement. |
There is also Table 10 in the planning document, which now lives on GitHub: https://github.com/LSSTDESC/DC2_Repo/tree/master/Documents/DC2_Plan. That list provides a quick overview and has the same required/nice-to-have distinction. You can edit that table in principle. Yao seems to be the only one who has the power to help with the labels ... |
@yymao @rmandelb Ok, so Rachel, could you add "validation test:required" to: |
Done. Thanks for thinking through which ones are more important than the others for LSS. And I believe the bias/clustering test is also required for PZ to achieve its goals with the clustering redshift analysis, as well. |
@j-dr and @erykoff - can you please let us know the status of cluster-related extragalactic validation tests? See e.g. this comment: #50 (comment) earlier in this thread for info on what we are looking for. I'm about to go offline for a day, but Yao, Eve, and others on this thread may be able to answer if you have questions about the process. |
We had a discussion of validation tests within the LSS group and two main issues arose:
|
@slosar Could you please clarify exactly what test(s)/check(s) you are proposing under your first bullet? The galaxy-shape modeling in the extragalactic catalog is very simple. All we have are sizes for the disk and bulge, and we assume n=1 Sersic profiles for disks and n=4 profiles for bulges to get half-light radii. The value of the magnification is given at the galaxy location. I think you are proposing a check that is better done on the image simulation result rather than the extragalactic catalog, but I may have misunderstood. What validation data set and criterion are you proposing to use for the second bullet? Validating a 2D histogram is not as straightforward as validating a 1-pt distribution and I was wondering what you had in mind. |
@slosar - thanks for the feedback from LSS. To answer your questions:
|
@evevkovacs @rmandelb Thanks for your quick responses:
If that is OK, I will write both issues and then Rachel probably needs to close 7 and 11. I think all the work that already went into them will of course keep on being very useful. |
@slosar - For (1): the catalog has both pre- and post-lensing positions (in addition to the pre- and post-lensing magnitudes and sizes that I mentioned earlier). I assume that there is an intent to use the post-lensing quantities for everything including positions when making the sims. You are correct that we could use the statistical properties of pre- and post-lensing positions for a flux-limited sample to test these correlations. For (2): this sounds reasonable to me. HSC can give the overall normalization of the number density across all z in the mag bins, and DEEP2 can give the dN/dz within the mag bins. I agree that it would be best to combine these into a single test rather than having separate dN/dmag and dN/dz tests. @evevkovacs - since you were asking about what Anze intended as well, are you comfortable with this suggestion given his clarification? @slosar - I agree about what needs to be done to the issues, but I want to give Eve a chance to comment on the way you've framed this test before we do that. |
Patricia Larsen will comment on 1). For 2), we can change the N(z) tests to check the normalization. Can you point me to the datasets? |
Maybe I am missing something, but I thought that the observational data had redshift information and therefore selection cuts could be made to match what is in the simulated catalog if need be. |
@yymao - true, there is always a redshift cut in mocks. But the issue is that if you're going to i=25, then we expect a few % of objects above z=3, but ~40% of objects above z=1. So if we do a test with a tolerance of 20%, we don't care if the mock has zmax=3. We care very much if it has zmax=1. My concern is about whether the redshift cut is sufficiently low that we're expecting to lose of order 10% or more of the galaxies we'd see in the real survey, in which case the validation test is invalid. @evevkovacs - we don't in general have redshift information for imaging surveys. We have photo-z, but they are not sufficiently good to use in validation tests. For some of the validation tests we're using here, the validation dataset is SDSS or DEEP2, which provides spectroscopic information. That's why those tests can be defined easily in z ranges. But if our validation dataset is from an imaging survey like HSC, then we can't make the validation test in z ranges. |
@katrinheitmann - I guess I figured we want these tests to be generally useful, so we have to assume that some mocks will have strict z limitations. I'm OK with saying we should design the tests for the ideal case, but if we do that, then I would strongly advocate for the test to not be run at all if the mock catalog (like protoDC2) has some condition that makes the test invalid. For example, all tests that integrate implicitly across all z for a survey the depth of HSC or LSST should all be disabled if the mock catalog has a zmax that is too low (say below z=2). Otherwise we will have the system generating plots that people will use to draw completely wrong conclusions. Is that possible to do? |
@rmandelb thanks for the clarification! And to your technical question, yes, a test can decide not to generate any plot (or do whatever alternative things) if it finds the catalog's max z is, say, less than 2. Is it fair to say we need mocks to have max z > 2? We can probably check if that's sufficient by running the test on Buzzard (max z ~ 2.1). And looking ahead to cosmoDC2, what redshift cut do we think it'll have, @katrinheitmann @evevkovacs? |
Yes, certainly. The test writer is free to specify conditions as she/he sees fit. For example, it would be simple to set a requirement on the maximum redshift delivered by the catalog and if that requirement is not satisfied, the catalog is skipped. |
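A minimal sketch of such a skip condition is below. It is standalone Python rather than actual DESCQA code: the function `run_dndmag_test`, the dict it returns, and the catalog names and redshift limits in the usage lines are all hypothetical, with `required_zmax=2.0` taken from the "say, less than 2" figure discussed above.

```python
def run_dndmag_test(catalog_name, catalog_zmax, required_zmax=2.0):
    """Run a (placeholder) redshift-integrated test, or skip it.

    catalog_zmax is the maximum redshift the catalog delivers; how that
    number is obtained from a real catalog is outside this sketch.
    """
    if catalog_zmax < required_zmax:
        # Skip instead of producing a plot that invites wrong conclusions.
        return {
            "catalog": catalog_name,
            "status": "skipped",
            "reason": f"max z {catalog_zmax} is below required {required_zmax}",
        }
    # The real comparison against the validation data would go here.
    return {"catalog": catalog_name, "status": "run", "reason": ""}


# Hypothetical catalogs: one too shallow in redshift, one deep enough.
shallow = run_dndmag_test("shallow_mock", catalog_zmax=1.0)
deep = run_dndmag_test("deep_mock", catalog_zmax=2.1)
```

Returning an explicit "skipped" status with a reason, rather than silently doing nothing, keeps it visible in the summary that the catalog was not validated by this test.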
Sorry if I am missing something, but how can we check whether that's sufficient by running the test on buzzard? My proposal would be to take our best understanding of dN/dz for the faintest magnitude limit for which we test the dN/dmag, integrate that to find the max redshift for which we'd be missing more than, say, 5% of the galaxies, and set that as the max redshift for the dN/dmag test. |
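The proposal above (integrate dN/dz and find the redshift beyond which more than about 5% of galaxies would be missing) can be sketched as follows. The function name `zmax_for_missing_fraction` is made up for this example, and the toy dN/dz shape used in the usage lines is purely an assumption for illustration, not a fit to HSC, DEEP2, or any other dataset.

```python
import math


def zmax_for_missing_fraction(z, dndz, max_missing=0.05):
    """Redshift below which a fraction (1 - max_missing) of galaxies lie.

    z: increasing redshift grid; dndz: dN/dz tabulated on that grid.
    Uses trapezoidal integration and linear interpolation between nodes.
    """
    cumulative = [0.0]
    for i in range(1, len(z)):
        step = 0.5 * (dndz[i] + dndz[i - 1]) * (z[i] - z[i - 1])
        cumulative.append(cumulative[-1] + step)
    target = (1.0 - max_missing) * cumulative[-1]
    for i in range(1, len(z)):
        if cumulative[i] >= target:
            span = cumulative[i] - cumulative[i - 1]
            frac = (target - cumulative[i - 1]) / span if span > 0 else 0.0
            return z[i - 1] + frac * (z[i] - z[i - 1])
    return z[-1]


# Sanity check with a flat dN/dz on 0 <= z <= 4.
flat_zmax = zmax_for_missing_fraction([0, 1, 2, 3, 4], [1, 1, 1, 1, 1])

# Toy, assumed shape dN/dz ~ z^2 exp(-z / 0.5); illustrative only.
grid = [0.01 * i for i in range(601)]
toy_dndz = [zz ** 2 * math.exp(-zz / 0.5) for zz in grid]
toy_zmax = zmax_for_missing_fraction(grid, toy_dndz)
```

A mock whose redshift cut falls below the returned value would lose more than the allowed fraction of galaxies at that magnitude limit, so the dN/dmag comparison would not be meaningful for it.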
@rmandelb sorry, you're right. I was thinking that we can just check if Buzzard matches to HSC dN/dmag, but then, of course, even if it doesn't match, we still don't know whether it is due to insufficient max z or something else. |
I think we'll be missing >5% of galaxies with a z>2 cut even by i~23 or so... |
This may not be a complete list, but for the WLWG we will definitely need a power spectrum test (eg #35) and an ellipticity distribution test (#14). There was also a suggestion that #14 be done as a function of redshift--I'm not sure yet if that's required or desired, but I wanted to check if you'd consider that a separate validation test, or an implementation issue for the existing ellipticity distribution test. |
It would not be a separate test. I am working on the ellipticity distribution. z ranges can be added in a configuration file. Do you have an idea of what z bins would be of interest? That would be helpful in configuring the plots etc. |
@rmandelb : I've gotten back further from my email and seen your pushback directly :) Yes, we certainly shouldn't compare dN/dm to redshift-incomplete samples. However, I don't see that as a reason to drop dN/dm entirely but rather as a driver to disregard it where it is irrelevant. We need to keep in mind though that dN/dmag/dz will only be at all well-constrained (and not that well given small survey areas) to r |
@rmandelb @janewman-pitt-edu Ok, so it seems that HSC is deep enough but without z-s and with larger cosmic variance and on the other hand DEEP2 can give some information on redshifts, but is incomplete. So I think there are two ways to generate tests:
I have a slight preference for the first option as it has two advantages: i) it naturally grows with growing catalogs (i.e. if the catalog doesn't go beyond z=1, fine, you don't compare there) and ii) if there are internal tensions between the two datasets they become immediately obvious. My understanding is that Rachel supports this option too... However, I don't feel knowledgeable enough about this to judge if it is doable. |
HSC has much smaller cosmic variance than DEEP2... I think once you break DEEP2 into differential magnitude bins you're already getting dodgy. I believe the N(<m, z) constraints much more. |
One follow-up thought: we could implement this as N(<m, z) with a variety of limiting magnitudes rather than as N( m, z) in differential magnitude bins. I think that'd work better, and we could use the DEEP2 extrapolations (with a grain of salt) to do them. |
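To make the distinction concrete, here is a minimal sketch of counting N(<m, z), i.e. galaxies brighter than each limiting magnitude within each redshift bin, as opposed to counts in differential magnitude bins. The function name `cumulative_counts` and the toy inputs are assumptions for illustration only.

```python
def cumulative_counts(mags, redshifts, mag_limits, z_edges):
    """counts[k][j] = number of galaxies with mag < mag_limits[k]
    and z_edges[j] <= z < z_edges[j + 1]."""
    n_zbins = len(z_edges) - 1
    counts = [[0] * n_zbins for _ in mag_limits]
    for m, z in zip(mags, redshifts):
        for j in range(n_zbins):
            if z_edges[j] <= z < z_edges[j + 1]:
                # A galaxy counts toward every limit fainter than itself.
                for k, limit in enumerate(mag_limits):
                    if m < limit:
                        counts[k][j] += 1
                break
    return counts


# Toy values for illustration, not real data.
counts = cumulative_counts(
    mags=[22.0, 23.5, 24.5, 21.0],
    redshifts=[0.3, 0.7, 1.2, 1.5],
    mag_limits=[23.0, 24.0, 25.0],
    z_edges=[0.0, 1.0, 2.0],
)
```

Because each row accumulates all brighter galaxies, the per-bin counts are larger and less noisy than in thin differential magnitude slices, which is the motivation given in the comment above.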
Another pair of required tests from the WLWG: galaxy size distributions and galaxy position angle distributions (assuming our ellipticity distribution test is for |e|, not e1 and e2). We're working on validation criteria now. |
There is a size-magnitude test under development. |
@msimet does the galaxy size-magnitude test satisfy the WL WG's need in terms of validating the galaxy size distributions? If not, can you suggest a more direct test? @evevkovacs It is true that the current galaxy position angle distribution in protoDC2 is just a uniform distribution, but that doesn't mean that it satisfies the WG's need and that we don't need a validation test for it. @msimet what kind of galaxy position angle distribution would the WL WG want to see? |
For position angle, we don't need anything more complicated than a KS test comparison to a flat distribution. I'm not sure about the size-magnitude test--was the plan for a full distribution of sizes as a function of magnitude, or just something like the mean? We care about the small-end slope of the size distribution (because it determines the effect of our selection function) and also that sizes are approximately correct so they're the right resolution relative to the PSF. The size-magnitude test should satisfy the latter, but I'm not sure about the former. I'll speak to the WG about validation data sets for the size issue. In addition, I can contribute code for the KS test if you'd like, and the size test if we need to do something different than what you had planned for magnitude. |
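For reference, a one-sample KS comparison of position angles against a flat distribution could look like the sketch below. It is written in pure Python so it is self-contained; in practice `scipy.stats.kstest` would be the usual route. The function name `ks_uniform`, the assumed angle range of 0 to 180 degrees, and the use of the asymptotic Kolmogorov series with a small-sample correction for the p-value are all choices made for this example.

```python
import math


def ks_uniform(angles, lo=0.0, hi=180.0):
    """One-sample KS test of `angles` against a uniform distribution on [lo, hi].

    Returns (D, p), where D is the KS statistic and p is an approximate
    p-value from the asymptotic Kolmogorov distribution.
    """
    x = sorted((a - lo) / (hi - lo) for a in angles)
    n = len(x)
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x.
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(x))
    t = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if t < 0.3:
        # The alternating series converges poorly here; p is ~1 anyway.
        return d, 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t) for k in range(1, 101)
    )
    return d, min(max(2.0 * series, 0.0), 1.0)


# Evenly spread angles (consistent with flat) vs. strongly clustered ones.
d_flat, p_flat = ks_uniform([(i + 0.5) * 180.0 / 100 for i in range(100)])
d_bad, p_bad = ks_uniform([0.1 * i for i in range(100)])
```

A validation criterion could then be a threshold on the p-value; what threshold is appropriate is left to the working group.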
@rmandelb I don't think there are unnecessary checks, and bug tests are very important tests 🙂. This one is easy to implement, so let's just do it. We can open a new issue for checking position angle distribution. |
To answer the earlier question @evevkovacs - we probably don't need anything finer than the tomographic bins will be; I'm told a good proxy for this would be 5 (1st year) or 10 (10th year) bins in redshift with approximately equal number of objects in each bin. I think the existing size-luminosity test has some of the information we need, but not all of it; we also want a straight-up dN/dsize so we can see what's happening at the small end. I can code up a test for this, and (probably) use the same COSMOS data used in issue #13 for validation. |
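As a sketch of the binning described above (redshift bins with approximately equal numbers of objects), bin edges can be read off the sorted redshifts. The function name `equal_number_bin_edges` is hypothetical, and the integers in the usage line stand in for redshifts only to keep the example easy to follow.

```python
def equal_number_bin_edges(redshifts, n_bins):
    """Edges of n_bins redshift bins holding roughly equal numbers of objects.

    Interior edges are taken at evenly spaced ranks of the sorted redshifts;
    the outer edges are the minimum and maximum redshift.
    """
    z = sorted(redshifts)
    n = len(z)
    edges = [z[0]]
    for k in range(1, n_bins):
        edges.append(z[(k * n) // n_bins])
    edges.append(z[-1])
    return edges


# Stand-in values; with 10 objects and 5 bins, each bin holds 2 objects.
edges = equal_number_bin_edges(list(range(10)), 5)
```

The same function would cover both cases mentioned above by calling it with `n_bins=5` or `n_bins=10`.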
Excellent. If you are at hack day, I can point you to some test examples that will help with this. |
I'll be there in the morning, so I'll definitely try to find you before I have to leave, thanks! |
@chto @yymao Sorry for posting this here. I couldn't find an issue for the CLF test. I am having a problem with the test. It is crashing with an error: |
This epic issue serves as the general discussion thread for all validation tests on the extragalactic catalogs in the DC2 era.
Note: Please feel free to edit the tables in this particular comment of mine, since we will use them to keep track of the progress of validation tests.
➡️ Required tests that we have identified (for DC2):
➡️ Tests that are not currently required but good to have:
Analysis WGs are encouraged to join this discussion and to provide feedback on these validation tests. This epic issue is assigned to the Analysis Coordinator @rmandelb, and will be closed when the Coordinator deems that we have implemented a reasonable set of validation tests and corresponding criteria for DC2.
@yymao, @evevkovacs, and @katrinheitmann can provide support to the implementation of these validation tests in the DESCQA framework. In addition to GitHub issues, discussions can also take place on the #desc-qa channel on LSSTC Slack.
P.S. The corresponding issue in DC2_Repo is LSSTDESC/DC2-production#30