Allow archive_database_update to deal with missing PID values #1423

Closed
stscijgbot-jwql opened this issue Jan 8, 2024 · 1 comment · Fixed by #1427

Comments

@stscijgbot-jwql
Collaborator

Issue JWQL-160 was created on JIRA by Bryan Hilbert:

Inspired by the MAST Help Desk question INC0196521

archive_database_update is currently failing because a NIRCam program number is returned by our pyvo query but is not present in the results of a later MAST query. This raises a KeyError because the dictionary of program numbers is missing '4566'.

The reason why 4566 is missing from the MAST results is not yet clear. But in the meantime, we shouldn't allow archive_database_update to fail and skip its update on all the other new data just because it can't find 4566. (Especially since NIRCam is the first instrument checked.)

We should update the code so that in this case, it just skips 4566 and moves on to the next program. But then we also need it to alert us to the failure so that we can look into that particular program.
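A minimal sketch of the skip-and-alert behavior described above, not the actual JWQL code. The function name, logger name, and the example program '1234' with category 'GO' are hypothetical; only '4566' comes from this issue.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("archive_database_update_sketch")


def categories_for_programs(programs, proposals_by_category):
    """Look up each program's category, skipping programs MAST did not return.

    Returns a tuple: ({program: category}, [skipped programs]).
    """
    found = {}
    skipped = []
    for program in programs:
        try:
            found[program] = proposals_by_category[program]
        except KeyError:
            # Record the failure so someone can investigate, then move on.
            logger.warning("Program %s not in MAST results; skipping", program)
            skipped.append(program)
    return found, skipped


# '4566' is absent from the lookup dictionary, as in the failure described here.
found, skipped = categories_for_programs(["4566", "1234"], {"1234": "GO"})
```

Returning the skipped programs alongside the results lets the caller decide how to raise the alert (log entry, email, dashboard flag) without aborting the rest of the update.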

On a related note, even though archive_database_update has been failing, the Dashboard reports that every recent run completed without problems. The log files do not report SUCCESS at the end, which should be what the dashboard looks for. So there must be some loophole that keeps the dashboard from seeing and reporting this as a failure.

Here is some text I added to the MAST Help Desk ticket above that describes the problem:

I've dug into this a bit from the JWQL side. It turns out that our script that checks for new data in MAST has been failing recently. So that explains the lack of recent data in JWQL. But, it's not clear to me why our script is failing. I suspect there may be something odd about PID 4566, which is the program causing our crashes. We get a list of proposals by instrument in a couple of different ways in our code base.

The first way, using pyvo, does include program 4566 in the results (for NIRCam):
(hopefully STARS doesn't mangle the formatting of this code too much)

import pyvo as vo

# instrument holds the instrument name, e.g. "NIRCam"
tap_service = vo.dal.TAPService("https://vao.stsci.edu/caomtap/tapservice.aspx")
tap_results = tap_service.search(f"""select distinct prpID from CaomObservation where collection='JWST'
and maxLevel>0 and insName like '{instrument.lower()}%'""")
prop_table = tap_results.to_table()
proposals = prop_table['prpID'].data
# Drop masked entries and sort from newest to oldest program number
inst_proposals = sorted(proposals.compressed(), reverse=True)

But the other method we use, querying MAST's Filtered database for NIRCam, does not include 4566 in the results:

from astroquery.mast import Mast

service = "Mast.Jwst.Filtered.{}".format(instrument)
params = {"columns": "program, category",
          "filters": [{'paramName': 'instrume', 'values': [instrument]}]}
response = Mast.service_request_async(service, params)
results = response[0].json()['data']

# Get all unique dictionaries
unique_results = list(map(dict, set(tuple(sorted(sub.items())) for sub in results)))

# Make a dictionary of {program: category} to pull from
proposals_by_category = {d['program']: d['category'] for d in unique_results}

Do you have any idea why program 4566 is not included in the results from MAST? Observations 1 and 2 for 4566 executed on Dec 25.
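One way to spot this kind of discrepancy is to compare the two program lists directly. The sketch below is illustrative only: the function name is hypothetical, and it assumes program numbers may arrive as integers from one query and strings from the other, so both are converted to strings before comparing.

```python
def programs_missing_from_mast(pyvo_programs, mast_programs):
    """Return the programs present in the pyvo list but absent from the MAST list."""
    # Convert to strings so that 4566 and '4566' compare as equal.
    pyvo_set = {str(p) for p in pyvo_programs}
    mast_set = {str(p) for p in mast_programs}
    return sorted(pyvo_set - mast_set)


# Illustrative inputs: '1234' is a made-up program number.
missing = programs_missing_from_mast([4566, 1234], ["1234"])
```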

@bhilbert4 bhilbert4 linked a pull request Jan 11, 2024 that will close this issue
@stscijgbot-jwql
Collaborator Author

Comment by Bryan Hilbert on JIRA:

Bradley Sappington has made changes to make archive_database_update robust against this kind of error. Now, if a program is missing from the MAST results but is present in the pyvo results, JWQL will simply declare that program's category type to be MISSING and move on, rather than crashing.
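The fallback described above could look roughly like the following. This is a sketch under assumptions, not the code from the linked pull request: the constant and function names are hypothetical, and 'GO' for program '1234' is a made-up example value.

```python
# Placeholder category for programs that MAST did not return (per the comment above).
MISSING_CATEGORY = "MISSING"


def category_for(program, proposals_by_category):
    """Return the program's category, or MISSING_CATEGORY if it is not in the lookup."""
    return proposals_by_category.get(program, MISSING_CATEGORY)


lookup = {"1234": "GO"}
known = category_for("1234", lookup)
unknown = category_for("4566", lookup)
```

Using `dict.get` with a default avoids the KeyError entirely, at the cost of needing a later pass to find and follow up on entries marked MISSING.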

After talking to Dick Shaw and some further investigation, it looks like we are hitting the MAST_QUERY_LIMIT for our NIRCam queries in data_containers.get_proposals_by_category(). We have this limit set to 500,000. When I increase the limit, it looks like the NIRCam query returns 540,000 results. So this is the reason why program 4566 has been missing in our results. In the short term, we can increase the MAST query limit. In the long term, we need to find a better way to get a list of proposal numbers and category types for a given instrument. I'll make a separate JIRA issue for that work.
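A simple guard can flag this situation before it silently drops programs. The sketch below assumes that a truncated query returns exactly as many rows as the limit; that is an assumption about the service's behavior, not something confirmed in this thread. The function and logger names are hypothetical.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("mast_query_limit_sketch")


def results_may_be_truncated(results, query_limit):
    """Return True if the row count reached the limit, suggesting rows were cut off."""
    truncated = len(results) >= query_limit
    if truncated:
        logger.warning(
            "Query returned %d rows with a limit of %d; results may be incomplete",
            len(results), query_limit,
        )
    return truncated


# Small illustrative numbers in place of the real 500,000 limit.
hit_limit = results_may_be_truncated([{"program": "1234"}] * 5, query_limit=5)
under_limit = results_may_be_truncated([{"program": "1234"}] * 3, query_limit=5)
```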
