Spectrum.Char.SpatialAxis.Coverage.Location for GAIA DR4 Spectra #13

Open
mcdittmar opened this issue Feb 6, 2025 · 5 comments

@mcdittmar
Collaborator

email received 20240705: from Jos De Bruijne

Dear Mark,

I am contacting you as curator of the IVOA Spectrum Data Model Recommendation. We are preparing the Gaia ESA Archive for Gaia DR4 and, unfortunately, we see ourselves forced to serve mean (=stacked) and epoch spectra (from both the BP/RP and RVS instruments) that violate “your” IVOA standard by not providing the mandatory metadata field “Spectrum.Char.SpatialAxis.Coverage.Location”. I provide some more details below and my main question to you is: will this undesirable but likely unavoidable violation have any negative consequences for discoverability, usability, or interoperability of these data products?

Note: the affected spectra products are going to be served via the DataLink protocol.

Thanks for sharing your insights!

Jos

Cc Jonathan, our valued IVOA gurus in the Gaia DPAC consortium Mark and Markus, and my ESA and DPAC colleagues Héctor, David, Nigel, and Enrique

First an explanation of why we face this seemingly easy problem

The “problematic” metadata field is “Char.SpatialAxis.Coverage.Location.Value”, which reflects the (celestial) position of the aperture for the spectrum. The problem with this field is linked to how the DPAC data processing consortium works: whereas the processing of the spectra is nearing completion and these should undergo validation later this year to meet the DR4 release schedule, the final astrometry (which is the source of the metadata field) still needs consolidation, which cannot be completed before October 2025. At that time, it will be too late to update the spectra (metadata) by adding the “Location.Value” coordinates since, by then, the spectra will already have been prepared, ingested in the archive, and internally cleared for publication. I fully realise that this sounds like a silly situation and limitation, but the reality is that, with a data processing consortium with 6 data processing centres, some 400 people working in parallel, more than 100 data products with complex interdependencies (that necessitate a very complex quality filtering, integrity, and consistency framework), and some 600 TB of data for 2 billion sources, implementing seemingly simple changes in the consortium workflow cannot be done without endangering the release date.

Second our assessment of why this is not a disaster

We understand the principle that “Char.SpatialAxis.Coverage.Location.Value” is declared as a mandatory metadata field since without specifying the celestial coordinates of the telescope pointing / aperture (slit, fiber, extraction window, …), a spectrum can in general not be interpreted. In the case of all Gaia spectra, however, the aperture – by definition – is centred on the object of interest, the identifier of which is recorded in the “Target.Name” field. Note: of course, the problem of “Char.SpatialAxis.Coverage.Location.Value” not being available in time to be recorded in the spectra metadata also applies to the optional metadata field that records the target object coordinates (“Target.Pos”) so that field will also not be present.

@mcdittmar
Collaborator Author

Reply: Markus D. 20240709

Hi Jos,

Disclaimer: I'm not the curator of SDM; I'm just an adopter giving my
adopter opinion.

On Fri, Jul 05, 2024 at 10:37:34AM +0000, Jos De Bruijne wrote:

I am contacting you as curator of the IVOA Spectrum Data Model
Recommendation (https://www.ivoa.net/documents/SpectrumDM/20231215/REC-SpectrumDM-1.2-20231215.pdf).
We are preparing the Gaia ESA Archive for Gaia DR4 and,
unfortunately, we see ourselves forced to serve mean (=stacked) and
epoch spectra (from both the BP/RP and RVS instruments) that
violate “your” IVOA standard by not providing the mandatory
metadata field “Spectrum.Char.SpatialAxis.Coverage.Location”. I
provide some more details below and my main question to you is:
will this undesirable but likely unavoidable violation have any
negative consequences for discoverability, usability, or
interoperability of these data products?

Having coverage.location mandatory has been a problem in other
circumstances, too: there was a time when the overwhelming majority
of SDM-serialised spectra were theoretical and thus by definition had
no location (on the sky, at least). So, this problem has persisted
since the very first days of the SDM, and, mainly because there have
always been other priorities, nobody really worried about it, arguing
"well, if you are using a theoretical spectrum, it had better be in a
context where nobody will look at the coverage".

Now, I grant you that your case is a bit different in that the spectra
would have a position. On the other hand, you're Gaia and hence
famous for the foreseeable future (and at least until DR5). I'd hence
say that if the spectra come with an easily findable DR4 id, every
astronomer will be able to figure out the sky position. This will
not be automatic, but I don't think that will be a big problem in
practice, and it's at least automatisable.

So: since clients looking at SDM metadata (and there aren't so many
of those to begin with) have to deal with missing locations anyway,
and for consumers just having the dataset, a DR4 source_id will be
enough to make up for a missing coverage.location, I would be
relaxed.
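
As an illustration of the "automatisable" fallback described above, here is a minimal Python sketch. It assumes the spectrum metadata has already been flattened into a dict keyed by the model field names, and that the caller supplies a `lookup_by_source_id` function. That function is hypothetical (for instance, a wrapper around a query against the DR4 source catalogue); nothing here is defined by SDM or by the Gaia archive.

```python
from typing import Callable, Mapping, Optional, Tuple

Position = Tuple[float, float]  # (ra, dec) in degrees; an assumption of this sketch

# Field names as used in the discussion; the flat-dict layout is an assumption.
LOCATION_KEY = "Char.SpatialAxis.Coverage.Location.Value"
TARGET_KEY = "Target.Name"


def resolve_position(
    metadata: Mapping[str, object],
    lookup_by_source_id: Callable[[str], Optional[Position]],
) -> Optional[Position]:
    """Return the sky position for a spectrum, or None if it cannot be found.

    Prefer the coverage location when the file carries one; otherwise fall
    back to resolving the target identifier through the supplied lookup.
    """
    location = metadata.get(LOCATION_KEY)
    if location is not None:
        return location  # the file already carries a position
    source_id = metadata.get(TARGET_KEY)
    if source_id is None:
        return None  # nothing to resolve, e.g. a theoretical spectrum
    return lookup_by_source_id(str(source_id))


# Usage with a stand-in catalogue (the identifier and coordinates are made up).
fake_catalogue = {"Gaia DR4 123": (10.0, -5.0)}
position = resolve_position({TARGET_KEY: "Gaia DR4 123"}, fake_catalogue.get)
```

Passing the lookup in as a parameter keeps the sketch independent of any particular archive interface.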

@mcdittmar
Collaborator Author

Reply: Mark T.

For a bit more context, I note that in the Simple Spectral Access
protocol, steps are currently under way to accommodate this issue
(with theoretical services in mind): the proposed SSA Erratum #4
https://wiki.ivoa.net/twiki/bin/view/IVOA/SSA-1_1-Err-4
makes the Char.SpatialAxis.Coverage elements OPTional instead of
MANdatory. That change was stimulated by looking at SSA services,
but perhaps it should be extended to the DM as well as the access
protocol.

@mcdittmar
Collaborator Author

The above appears to resolve the concern about consequences, but I think there are still open questions:

  • should the model be updated to make this field optional?
  • should we recommend HOW they should identify the datasets as GAIA DR4 ... which field does this go into?
  • Markus explains that the reason they are mandatory in Spectrum has to do with distinguishing between Theoretical and Observed spectra. Is DataID.DataSource sufficient for that purpose (allowing the term to become optional)?

NOTE: the model says that a MANDATORY field must provide a value, but that the value may be UNKNOWN if it exists but is not known. Is this expressed as an empty value, or as the string "UNKNOWN"? That may affect what they should do for that field.
NOTE: DataID.DataSource already indicates "pointed", "theory", "artificial", "composite" (from the Table). The text says that the definition is in SSAP, which is rather backwards.
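
Until the empty-value versus "UNKNOWN" question is settled, a consumer could treat both the same way. The Python sketch below is one possible reading, not what the model prescribes: the set of markers and the flat-dict metadata layout are assumptions made for illustration.

```python
from typing import Mapping, Optional

# Assumption: an absent key, an empty string, and the literal "UNKNOWN"
# all mean "no usable value". The model text does not settle this.
UNKNOWN_MARKERS = {"", "UNKNOWN"}


def is_known(metadata: Mapping[str, Optional[str]], field: str) -> bool:
    """Return True only if the field is present with a usable value."""
    value = metadata.get(field)
    if value is None:
        return False
    return value.strip().upper() not in UNKNOWN_MARKERS


# Usage: all three spellings of "no position" are handled alike.
field = "Char.SpatialAxis.Coverage.Location.Value"
examples = [{}, {field: ""}, {field: "UNKNOWN"}]
results = [is_known(meta, field) for meta in examples]
```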

@mcdittmar
Collaborator Author

Input from Markus D. summarized below. See: http://mail.ivoa.net/pipermail/dm/2025-February/006570.html

The above appears to resolve the concern about consequences, but I think there are still open questions:

  • should the model be updated to make this field optional?

I'd very much support that to the extent that I'd write an Erratum,
in particular because I've handed out position-less SDM files for
more than a decade by now. Gently prod me to get me started.

  • should we recommend HOW they should identify the datasets as GAIA DR4 ... which field does this go into?

In discovery, that would be covered in Obscore's obs_collection and
its SSAP counterpart. Having it in the dataset certainly isn't
wrong. As an informal convention and until we've finally gotten rid
of more or less ad-hoc utype usage, I'd say a PARAM with a utype of
obscore:dataid.collection would be the least surprising measure.

But again: is there a usage scenario?
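
For concreteness, the informal PARAM convention could be sketched as follows in Python, using only the standard library. The utype is the one suggested above; the PARAM name `collection` and the value "Gaia DR4" are assumptions chosen for illustration, and the element would still have to be placed inside an appropriate VOTable RESOURCE or TABLE.

```python
import xml.etree.ElementTree as ET


def collection_param(collection: str) -> ET.Element:
    """Build a VOTable PARAM element carrying the data collection name."""
    return ET.Element(
        "PARAM",
        {
            "name": "collection",  # assumption: any descriptive name would do
            "datatype": "char",
            "arraysize": "*",  # variable-length string
            "utype": "obscore:dataid.collection",  # utype suggested in the thread
            "value": collection,
        },
    )


# Serialise the element to inspect the resulting XML fragment.
param = collection_param("Gaia DR4")
xml_text = ET.tostring(param, encoding="unicode")
```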

  • Markus explains that the reason they are mandatory in Spectrum has to do with distinguishing between Theoretical and Observed spectra. Is DataID.DataSource sufficient for that purpose (allowing the term to become optional)?

NOTE: the model says that a MANDATORY field must provide a value, but that the value may be UNKNOWN if it exists but is not known. Is this expressed as an empty value, or as the string "UNKNOWN"? That may affect what they should do for that field. NOTE: DataID.DataSource already indicates "pointed", "theory", "artificial", "composite" (from the Table). The text says that the definition is in SSAP, which is rather backwards.

Q: Do we need to add something to identify Theoretical?

Good question. I'd say we need a usage model for that.

Clearly, the distinction between observed and "something else" is
very important in discovery, and for that, we've had
SimpleDALRegExt's DataSource
http://docs.g-vo.org/schemadoc/schemas/SSA-v1_2_xsd/simpleTypes/DataSource.html
for a long time. Splat's use of that, for one, is fairly popular.

Do we have indications that we need this machine-readably in the
dataset itself?

@mcdittmar
Collaborator Author

  • should we recommend HOW they should identify the datasets as GAIA DR4 ... which field does this go into?

In discovery, that would be covered in Obscore's obs_collection and its SSAP counterpart. Having it in the dataset certainly isn't wrong. As an informal convention and until we've finally gotten rid of more or less ad-hoc utype usage, I'd say a PARAM with a utype of obscore:dataid.collection would be the least surprising measure.

But again: is there a usage scenario?

I think the usage scenario is basically the question submitted at the top of this issue.

"We understand the principle that “Char.SpatialAxis.Coverage.Location.Value” is declared as a mandatory metadata field since without specifying the celestial coordinates of the telescope pointing / aperture (slit, fiber, extraction window, …), a spectrum can in general not be interpreted. In the case of all Gaia spectra, however, the aperture – by definition – is centred on the object of interest, the identifier of which is recorded in the “Target.Name” field. "

The Spectrum model already has Spectrum.DataID.Collection, which would be a good place to put the "GAIA DR4" information.
That would facilitate the thread described.
