Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Science Tutorial Fixes #218

Merged
merged 16 commits into from
Jun 28, 2023
Merged

Broken Science Tutorial Fixes #218

merged 16 commits into from
Jun 28, 2023

Conversation

smk0033
Copy link
Contributor

@smk0033 smk0033 commented Jun 6, 2023

Updating science tutorials that are broken. Reference ticket: https://github.com/orgs/NASA-IMPACT/projects/29?pane=issue&itemId=27480247 (issues are under each individual ticket). Wanted to go through and fix errors to make sure notebooks were fully functional before going in and updating per new guidelines.

Fixes so far:

  • NISAR tutorial: updated paths
  • GEDI_L2B: updated CMR host and removed a search limit, otherwise there was an "invalid token" error when searching for the granule

Will commit other science tutorials as they get fixed, and will create a separate PR for technical tutorial fixes after.

@smk0033 smk0033 added bug Something isn't working documentation Improvements or additions to documentation labels Jun 6, 2023
@smk0033 smk0033 requested review from wildintellect and jjfrench June 6, 2023 20:20
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Collaborator

@wildintellect wildintellect left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smk0033
Copy link
Contributor Author

smk0033 commented Jun 6, 2023

Sorry, definitely meant GEDI_L2B tutorial in the commit above

@smk0033 smk0033 force-pushed the broken-science-fixes branch from a657262 to c92ddd8 Compare June 6, 2023 20:46
@smk0033
Copy link
Contributor Author

smk0033 commented Jun 7, 2023

@wildintellect I created a new data directory for the .h5 file that downloads. Should the HH files also go into that new directory? I wasn't quite sure on how to change the path for those if so, so they still just download into the NISAR file

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 7, 2023

View / edit / reply to this conversation on ReviewNB

wildintellect commented on 2023-06-07T19:55:13Z
----------------------------------------------------------------

Use the same block fro the NISAR example

# set data directory
dataDir = './data'

check if directory exists -> if directory doesn't exist, directory is created

if not os.path.exists(dataDir):
    os.mkdir(dataDir)

then the getData(./) can become getData(dataDir)



@smk0033 smk0033 mentioned this pull request Jun 13, 2023
6 tasks
@smk0033
Copy link
Contributor Author

smk0033 commented Jun 13, 2023

Tutorials updated so far: NISAR, GEDI_L2B and GEDI_L4B.
GEDI_L4A is in progress.
@wildintellect since the GEDI_02A/L2A dataset was removed from STAC, should we use another collection or convert to a CMR tutorial?

@smk0033 smk0033 requested a review from wildintellect June 13, 2023 20:51
@omshinde omshinde self-requested a review June 15, 2023 21:42
@smk0033 smk0033 marked this pull request as ready for review June 16, 2023 18:23
@smk0033
Copy link
Contributor Author

smk0033 commented Jun 16, 2023

I think this PR is fully ready for review. I'll probably put GEDI_L2A in it's own PR since I'll have to use a new collection

@omshinde
Copy link
Contributor

Thanks @smk0033
I am working on reviewing this PR.
CC: @wildintellect @jjfrench

@omshinde omshinde self-requested a review June 20, 2023 19:50
@@ -2,15 +2,15 @@
"cells": [
Copy link
Contributor

@omshinde omshinde Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #7.    maap = MAAP(maap_host="api.ops.maap-project.org")

Please check if the host should be api.ops.maap-project.org or api.maap-project.org.


Reply via ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 20, 2023

View / edit / reply to this conversation on ReviewNB

omshinde commented on 2023-06-20T20:08:51Z
----------------------------------------------------------------

ade.ops.maap-project.org is modified to ade.maap-project.org .

Is api.ops.maap-project.org going to be same or should it be api.maap-project.org ? (Line #2)

PS: I checked both and both work as of now.


smk0033 commented on 2023-06-20T20:16:37Z
----------------------------------------------------------------

I'll test it with api.maap-project.org again, it may have originally not worked due to that bug (or maybe I had gotten it confused by accident) so I'll update it if it runs using api.maap-project.org

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 20, 2023

View / edit / reply to this conversation on ReviewNB

omshinde commented on 2023-06-20T20:08:52Z
----------------------------------------------------------------

Check typo

Shenadoah -> Shenandoah national park

(https://www.nps.gov/shen/index.htm)


smk0033 commented on 2023-06-20T20:14:09Z
----------------------------------------------------------------

Good catch, I definitely didn't notice that. Thanks!!

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 20, 2023

View / edit / reply to this conversation on ReviewNB

omshinde commented on 2023-06-20T20:13:24Z
----------------------------------------------------------------

maap = MAAP(maap_host="api.ops.maap-project.org")

Should the host be api.ops.maap-project.org or api.maap-project.org?


jjfrench commented on 2023-06-20T20:33:07Z
----------------------------------------------------------------

For maap.profile.account_info()["username"]: api.maap-project.org for ade.maap-project.org, api.ops.maap-project.org for ade.ops.maap-project.org

They don't have job submissions setup for api.maap-project.org yet though, you'll only get 500's

smk0033 commented on 2023-06-20T20:40:58Z
----------------------------------------------------------------

Yeah, sometimes it works and sometimes it doesn't, at least with the .ops in it

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 20, 2023

View / edit / reply to this conversation on ReviewNB

omshinde commented on 2023-06-20T20:13:25Z
----------------------------------------------------------------

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 14
      1 inputs = dict(
      2     aoi=aoi,
      3     doi="L4A",
   (...)
     11     output="gedi_subset.gpkg"
     12 )
---> 14 result = maap.submitJob(
     15     identifier="gedi-subset",
     16     algo_id="gedi-subset_ubuntu",
     17     version="0.6.0",
     18     queue="maap-dps-worker-32gb",
     19     username=username,
     20     **inputs,
     21 )
     23 job_id = result["job_id"]
     24 job_id or result

File /maap-py/maap/maap.py:331, in MAAP.submitJob(self, **kwargs)
    329 job = DPSJob()
    330 job.set_submitted_job_result(response)
--> 331 job.retrieve_attributes()
    332 return job

File /maap-py/maap/dps/dps_job.py:116, in DPSJob.retrieve_attributes(self)
    115 def retrieve_attributes(self):
--> 116     self.retrieve_status()
    117     if self.status.lower() in ["succeeded", "failed"]:
    118         try:

File /maap-py/maap/dps/dps_job.py:76, in DPSJob.retrieve_status(self)
     70 logger.debug(headers)
     71 response = requests.get(
     72     url=url,
     73     verify=self.__not_self_signed,
     74     headers=headers
     75 )
---> 76 self.set_job_status_result(RequestsUtils.check_response(response))
     77 return self.status

File /maap-py/maap/utils/requests_utils.py:25, in RequestsUtils.check_response(dps_response)
     22 @staticmethod
     23 def check_response(dps_response):
     24     if dps_response.status_code not in [200, 201]:
---> 25         raise RuntimeError('response is not 200 or 201. code: {}. details: {}'.format(dps_response.status_code,
     26                                                                                       dps_response.content))
     27     return dps_response.content.decode('UTF-8')

RuntimeError: response is not 200 or 201. code: 500. details: b'<ows:ExceptionReport xmlns:wps="http://www.opengis.net/wps/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:schemaLocation="http://schemas.opengis.net/wps/2.0/wps.xsd" xmlns:ows="http://www.opengis.net/ows/2.0"><ows:Exception exceptionCode="FailedGetResult" locator="GetResult"><ows:ExceptionText>Failed to get job result of job with id: status. Aborting retrieving information of job because status is None. If you don\'t see expected results, please contact administrator of DPS</ows:ExceptionText></ows:Exception></ows:ExceptionReport>'

I am getting above output for this config!

However, it works after changing version: 0.5.0 instead of 0.6.0 . I got to know of the current version from the Jobs UI after following https://docs.maap-project.org/en/latest/system_reference_guide/faq/run_dps_job.html


@review-notebook-app
Copy link

review-notebook-app bot commented Jun 20, 2023

View / edit / reply to this conversation on ReviewNB

omshinde commented on 2023-06-20T20:13:27Z
----------------------------------------------------------------

Line #23.    job_id = result["job_id"]

It seems the DPS object is not subscriptable.

Issue:

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)
Cell In[23], line 23
1 inputs = dict(
2 aoi=aoi,
3 doi="L4A",
(...)
11 output="gedi_subset.gpkg"
12 )
14 result = maap.submitJob(
15 identifier="gedi-subset",
16 algo_id="gedi-subset_ubuntu",
(...)
20 **inputs,
21 )
---> 23 job_id = result["job_id"]
24 job_id or result

TypeError: 'DPSJob' object is not subscriptable

Although printing result returns a dictionary like output as follows:

`{'job_id': '32701077-4c49-4a52-aa92-ecf07ea39e38', 'status': 'Accepted', 'machine_type': None, 'architecture': None, 'machine_memory_size': None, 'directory_size': None, 'operating_system': None, 'job_start_time': None, 'job_end_time': None, 'job_duration_seconds': None, 'cpu_usage': None, 'cache_usage': None, 'mem_usage': None, 'max_mem_usage': None, 'swap_usage': None, 'read_io_stats': None, 'write_io_stats': None, 'sync_io_stats': None, 'async_io_stats': None, 'total_io_stats': None, 'error_details': None, 'response_code': 200, 'outputs': []}

Workaround:

Replace #23 and #24 by:

job_id = vars(result)['_DPSJob__id']

job_id or result

vars will convert Python object to a dictionary.



wildintellect commented on 2023-06-22T17:40:13Z
----------------------------------------------------------------

Is this officially documented in the system reference section, we should be consistent across examples.

@review-notebook-app
Copy link

review-notebook-app bot commented Jun 20, 2023

View / edit / reply to this conversation on ReviewNB

omshinde commented on 2023-06-20T20:13:28Z
----------------------------------------------------------------

Line #7.        response.raise_for_status()

Getting the following output:

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)
Cell In[55], line 15
7 @backoff.on_predicate(
8 backoff.constant,
9 lambda status: status not in ["Deleted", "Succeeded", "Failed"],
10 interval=120,
11 )
12 def wait_for_job(job_id: str) -> str:
13 return job_status_for(job_id)
---> 15 job_status = wait_for_job(job_id)
17 job_status

File /opt/conda/lib/python3.10/site-packages/backoff/_sync.py:48, in retry_predicate.<locals>.retry(*args, **kwargs)
39 elapsed = timedelta.total_seconds(datetime.datetime.now() - start)
40 details = {
41 "target": target,
42 "args": args,
(...)
45 "elapsed": elapsed,
46 }
---> 48 ret = target(*args, **kwargs)
49 if predicate(ret):
50 max_tries_exceeded = (tries == max_tries_value)

Cell In[55], line 13, in wait_for_job(job_id)
7 @backoff.on_predicate(
8 backoff.constant,
9 lambda status: status not in ["Deleted", "Succeeded", "Failed"],
10 interval=120,
11 )
12 def wait_for_job(job_id: str) -> str:
---> 13 return job_status_for(job_id)

Cell In[45], line 7, in job_status_for(job_id)
5 def job_status_for(job_id: str) -> str:
6 response = maap.getJobStatus(job_id)
----> 7 response.raise_for_status()
9 root = ET.fromstring(response.text)
10 status_element = root.find(".//{http://www.opengis.net/wps/2.0}Status")

AttributeError: 'str' object has no attribute 'raise_for_status'

It seems the return type of maap.getJobStatus is string and it is creating issue with subscripting.

response = maap.getJobStatus(job_id)

type(response)



Copy link
Contributor Author

smk0033 commented Jun 20, 2023

Good catch, I definitely didn't notice that. Thanks!!


View entire conversation on ReviewNB

Copy link
Contributor Author

smk0033 commented Jun 20, 2023

I'll test it with api.maap-project.org again, it may have originally not worked due to that bug (or maybe I had gotten it confused by accident) so I'll update it if it runs using api.maap-project.org


View entire conversation on ReviewNB

Copy link
Member

jjfrench commented Jun 20, 2023

For maap.profile.account_info()["username"]: api.maap-project.org for ade.maap-project.org, api.ops.maap-project.org for ade.ops.maap-project.org
It probably has to do with where we registered the algorithm that you'll hit 500's


View entire conversation on ReviewNB

Copy link
Contributor Author

smk0033 commented Jun 20, 2023

Yeah, sometimes it works and sometimes it doesn't, at least with the .ops in it


View entire conversation on ReviewNB

@smk0033 smk0033 requested review from omshinde and jjfrench June 21, 2023 16:42
@smk0033
Copy link
Contributor Author

smk0033 commented Jun 21, 2023

Re-requesting reviews to see if this looks good to have approved and merged

@smk0033 smk0033 requested a review from emileten June 21, 2023 18:36
Copy link
Collaborator

Is this officially documented in the system reference section, we should be consistent across examples.


View entire conversation on ReviewNB

@@ -2,15 +2,15 @@
"cells": [
Copy link
Collaborator

@wildintellect wildintellect Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code showed how to find the collection shortname. We should either drop this section (and link to an example of How to find a Collection name), or revert back.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous way didn't work for some reason, not really sure why but it wouldn't pull that collection even though the short name was correct. There was also something wrong with pprint and it wouldn't print it, so that's why we reverted to this way of pulling with the short name

Copy link
Member

@jjfrench jjfrench left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good!

@smk0033 smk0033 merged commit 6a1b343 into develop Jun 28, 2023
@smk0033 smk0033 deleted the broken-science-fixes branch June 28, 2023 15:16
@omshinde
Copy link
Contributor

Thanks for the changes @smk0033 and @jjfrench ..Although, I am late to the party, it looks good to me as well! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants