Bradley Sappington has made changes to make archive_database_update robust against this kind of error. Now, if a program is missing from the MAST results but is present in the pyvo results, JWQL will simply declare that program's category type to be MISSING and move on, rather than crashing.
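For readers who want to see the shape of that fallback, here is a minimal sketch. It is not the actual JWQL change: the function name `categorize_programs`, the `MISSING_CATEGORY` constant, and the assumption that program numbers are stored as string keys are all illustrative.

```python
import logging

logger = logging.getLogger(__name__)

# Placeholder category for programs absent from the MAST results.
# The comment above says "MISSING"; the constant name is hypothetical.
MISSING_CATEGORY = "MISSING"


def categorize_programs(pyvo_programs, proposals_by_category):
    """Map each program from the pyvo query to its MAST category.

    Programs that are not in ``proposals_by_category`` are logged and
    given MISSING_CATEGORY instead of raising a KeyError.
    """
    categories = {}
    for program in pyvo_programs:
        key = str(program)  # assumes the MAST dictionary uses string keys
        category = proposals_by_category.get(key)
        if category is None:
            logger.warning("Program %s not found in MAST results", key)
            category = MISSING_CATEGORY
        categories[key] = category
    return categories
```

The warning in the log is what gives us something to follow up on, so that a missing program is flagged rather than silently skipped.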
After talking to Dick Shaw and some further investigation, it looks like we are hitting the MAST_QUERY_LIMIT for our NIRCam queries in data_containers.get_proposals_by_category(). We have this limit set to 500,000. When I increase the limit, the NIRCam query returns 540,000 results, so the earlier results were being cut off at the limit. This is why program 4566 has been missing from our results. In the short term, we can increase the MAST query limit. In the long term, we need to find a better way to get a list of proposal numbers and category types for a given instrument. I'll make a separate JIRA issue for that work.
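One cheap guard against this happening silently again would be to compare the number of rows returned against the limit. The sketch below is only an illustration: `hit_query_limit` is a hypothetical helper, and it assumes that a truncated MAST response contains at least as many rows as the limit.

```python
import logging

logger = logging.getLogger(__name__)

# Limit value quoted in the comment above.
MAST_QUERY_LIMIT = 500_000


def hit_query_limit(n_rows, limit=MAST_QUERY_LIMIT):
    """Return True if a query result may have been truncated at the limit."""
    truncated = n_rows >= limit
    if truncated:
        logger.warning(
            "Query returned %d rows (limit %d); results may be incomplete",
            n_rows, limit,
        )
    return truncated
```

With the numbers from this issue, `hit_query_limit(500_000)` would flag the original NIRCam query, while a query returning a few thousand rows would pass.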
Issue JWQL-160 was created on JIRA by Bryan Hilbert:
Inspired by the MAST Help Desk question INC0196521
archive_database_update is currently failing because a NIRCam program number that is returned by our pyvo query is not present in the results of a later MAST query. This results in a KeyError because the dictionary of program numbers is missing '4566'.
The reason why 4566 is missing from the MAST results is not yet clear. But in the meantime, we shouldn't allow archive_database_update to fail and skip its update on all the other new data just because it can't find 4566. (Especially since NIRCam is the first instrument checked.)
We should update the code so that in this case, it just skips 4566 and moves on to the next program. But then we also need it to alert us to the failure so that we can look into that particular program.
On a related note, even though archive_database_update has been failing, the Dashboard reports that every recent run of the code succeeded. The log files do not report SUCCESS at the end, which should be what the dashboard is looking for. So there must be some loophole that keeps the dashboard from seeing and reporting these runs as failures.
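To make the expected behavior concrete, here is a rough sketch of a check that treats a run as failed unless the log ends with the success marker. I have not looked at how the dashboard actually decides this, so the function name, the marker string, and the assumption that the marker appears on the last non-empty line are all hypothetical.

```python
def run_status_from_log(log_text, success_marker="SUCCESS"):
    """Classify a run from its log text.

    A run counts as successful only if the last non-empty line contains
    the success marker. Anything else, including an empty log or a log
    that ends in a traceback, is reported as a failure.
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if lines and success_marker in lines[-1]:
        return "SUCCESS"
    return "FAILURE"
```

The point of the design is that the absence of the marker is itself treated as a failure, so a crash partway through a run cannot be mistaken for a clean one.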
Here is some text I added to the MAST Help Desk ticket above that describes the problem:
I've dug into this a bit from the JWQL side. It turns out that our script that checks for new data in MAST has been failing recently. So that explains the lack of recent data in JWQL. But, it's not clear to me why our script is failing. I suspect there may be something odd about PID 4566, which is the program causing our crashes. We get a list of proposals by instrument in a couple of different ways in our code base.
The first way, using pyvo, does include program 4566 in the results (for NIRCam):
(hopefully STARS doesn't mangle the formatting of this code too much)
```python
import pyvo as vo

instrument = 'NIRCam'  # example value

# Query the CAOM TAP service for all JWST program IDs for this instrument
tap_service = vo.dal.TAPService("https://vao.stsci.edu/caomtap/tapservice.aspx")
tap_results = tap_service.search(f"""select distinct prpID from CaomObservation where collection='JWST'
and maxLevel>0 and insName like '{instrument.lower()}%'""")
prop_table = tap_results.to_table()
proposals = prop_table['prpID'].data
inst_proposals = sorted(proposals.compressed(), reverse=True)
```
But the other method we use, querying MAST's Filtered database for NIRCam, does not include 4566 in the results:
```python
from astroquery.mast import Mast

# `instrument` is the instrument name string, as in the pyvo query above
service = "Mast.Jwst.Filtered.{}".format(instrument)
params = {"columns": "program, category",
          "filters": [{'paramName': 'instrume', 'values': [instrument]}]}
response = Mast.service_request_async(service, params)
results = response[0].json()['data']

# Get all unique dictionaries
unique_results = list(map(dict, set(tuple(sorted(sub.items())) for sub in results)))

# Make a dictionary of {program: category} to pull from
proposals_by_category = {d['program']: d['category'] for d in unique_results}
```
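A quick way to see which programs differ between the two methods is a set difference on the program numbers. This is only a diagnostic sketch: `find_missing_programs` is a hypothetical helper, and it converts everything to strings on the assumption that the two queries may return program numbers with different types.

```python
def find_missing_programs(pyvo_programs, proposals_by_category):
    """Return programs seen by the pyvo query but absent from the MAST results."""
    from_pyvo = {str(program) for program in pyvo_programs}
    from_mast = {str(program) for program in proposals_by_category}
    return sorted(from_pyvo - from_mast)
```

Calling `find_missing_programs(inst_proposals, proposals_by_category)` with the two results above would list every program that is affected, not just the first one that triggers the crash.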
Do you have any idea why program 4566 is not included in the results from MAST? Observations 1 and 2 for 4566 executed on Dec 25.