
jobs UI: make it easy to copy/paste the output file paths #656

Closed
rtapella opened this issue Jan 11, 2023 · 23 comments
Labels: DPS (Data Processing Subsystem), Usability (created based on usability test or usability issue)
@rtapella (Collaborator):

Jobs UI Usability Test

Typically someone will take the path from the "output" (== "product") section of a job result and go to that folder.

  1. make it easy to copy/paste a PATH to a file so you can cd into it in the Terminal
  2. make it easy to go to the path of an output in the Jupyter file-browser

The file itself is not as important to copy/paste as the folder that it's in (for cd).
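Since only the containing folder matters for `cd`, a copy button could simply strip the file name from the output path. A minimal sketch (the path and helper name are illustrative, not part of the Jobs UI):

```python
import posixpath

def output_folder(product_path: str) -> str:
    # Drop the file name so the remaining directory can be pasted after `cd`.
    return posixpath.dirname(product_path)

# Hypothetical output path; real paths come from the job's Outputs tab.
print(output_folder("/projects/dps_output/my_algo/2023/01/11/out.tif"))
# -> /projects/dps_output/my_algo/2023/01/11
```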

@rtapella rtapella added the Usability Created based on usability test or usability issue label Jan 11, 2023
@marjo-luc marjo-luc self-assigned this Jan 11, 2023
@rtapella (Collaborator, author):

For notebooks there is a "show in file browser" option. That would work.

@grallewellyn (Collaborator):

I know I can get the path of the file by extracting it from the product URL, but is it also possible to use job_dir? I don't see it in the job information from DPS, only job_dir_size.
[screenshot attachment not available]

@marjo-luc

@rtapella (Collaborator, author) commented Nov 2, 2023:

The jobs UI already has the path of the outputs. This is just to make it easier to either browse to them or copy/paste the path for Terminal commands.

Does that help? Just the folder is okay.

@grallewellyn (Collaborator):

@rtapella Where is the path to the outputs? I don't see it in the 'jobInfo' object.
I also don't see the path in the Jobs UI.

@rtapella (Collaborator, author) commented Nov 2, 2023:

If you select a job in the list the bottom panel should have details of that job. One of the sub-tabs is Outputs, which should list the path(s).

@grallewellyn (Collaborator):

I only see the links:
[screenshot attachment not available]

@grallewellyn (Collaborator) commented Jan 5, 2024:

To update on this ticket: we are going to add the file path to the jobs object by modifying the maap-api-nasa repository's get_mozart_job_info function (the highest level, so changes there trickle down).
This will be done by parsing the product URLs for the folder names "dps_output", "triaged-jobs", or "triaged_job".
@marjo-luc Will the output of all successfully completed jobs always be put into one of those three folders?
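A rough sketch of the parsing described above: find the first of the three known top-level folders in a product URL and keep everything from that folder onward. The URL shape and helper name are illustrative assumptions, not the actual maap-api-nasa implementation:

```python
from typing import Optional
from urllib.parse import urlparse

# The three folder names mentioned above that mark the start of the
# workspace-relative portion of a product URL.
MARKERS = ("dps_output", "triaged-jobs", "triaged_job")

def workspace_path(product_url: str) -> Optional[str]:
    """Return the path from the first known marker folder onward, if any."""
    parts = urlparse(product_url).path.strip("/").split("/")
    for marker in MARKERS:
        if marker in parts:
            return "/".join(parts[parts.index(marker):])
    return None  # no recognized folder in this URL

# Hypothetical product URL for illustration.
url = "https://bucket.s3.amazonaws.com/user/dps_output/algo/2024/01/05/out.tif"
print(workspace_path(url))  # -> dps_output/algo/2024/01/05/out.tif
```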

@marjo-luc (Member):

@grallewellyn As far as I know, that's correct.

@sujen1412 Do you recall what process creates the dps_output directory we see in a workspace? I'd like to confirm the above is correct.

@sujen1412 (Collaborator) commented Jan 8, 2024:

The dps_output dir is mounted in from the workspace bucket for the logged-in user, and the job outputs are placed in that directory by DPS. It is created by the DPS the first time a user runs a job.

@marjo-luc (Member):

> It is created the first time a user runs a job by the DPS

Do you know what creates the directory when a job is run for the first time by a user? MAAP-API?

@sujen1412 (Collaborator):

Since that path is an S3 object path, there is no concept of a directory here.
The dps_output path shows up when the DPS uploads the user's algorithm's output files to S3.

The files here are S3 objects whose keys contain dps_output. Viewing them in the workspace through S3Fuse makes them look like a directory structure.
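The flat-key point can be illustrated without AWS at all: S3 stores a flat list of object keys, and fuse-style tools synthesize "folders" by grouping keys on their "/"-separated segments. A toy sketch of that grouping (the keys are made up):

```python
from collections import defaultdict

# Flat S3 object keys; there are no real directories behind them.
keys = [
    "user/dps_output/algo/2024/01/08/out.tif",
    "user/dps_output/algo/2024/01/08/meta.json",
]

# Group each key under its apparent "parent directory": everything
# before the last "/". This is the illusion a FUSE mount presents.
tree = defaultdict(list)
for key in keys:
    parent, _, name = key.rpartition("/")
    tree[parent].append(name)

print(dict(tree))
# -> {'user/dps_output/algo/2024/01/08': ['out.tif', 'meta.json']}
```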

@marjo-luc (Member):

> DPS uploads the user's algorithm output files to S3

What in the DPS does this?

@sujen1412 (Collaborator):

The dataset_ingest script is responsible for pushing datasets to S3. https://github.com/hysds/hysds/blob/1054c0588ff7a8b9875932581010d37502662a2e/hysds/dataset_ingest.py#L595

@grallewellyn grallewellyn added this to the 3.1.5 milestone Jan 11, 2024
@grallewellyn (Collaborator) commented Jan 11, 2024:

@sujen1412 What are the cases when nothing is in "products_staged"? I have a couple of job examples with products_staged as an empty array. The jobs failed, but some other failed jobs have URLs in their "products_staged" (i.e. /triaged_job/...).

@sujen1412 (Collaborator):

This particular example is one of the cases where there was no configured dataset recognition, which means the job might not have created an output directory, and the timestamp suggests it's really old job metadata from before job triage was implemented.

Do you see this in newer jobs? Say, in the last 2 months?

@grallewellyn (Collaborator) commented Jan 11, 2024:

Sujen and I resolved this. The problem was that the job failed to even download the user's container, so Docker never started and the job never produced output or a triage. I only had a couple of jobs that were missing products_staged, so we will leave the file paths empty for those jobs since there really isn't anything to put.

In ops, I have a triaged-jobs folder and a dps_output folder (in my private bucket), but I do not have a triaged_job folder, so I don't know where the failed jobs are going, since that is the file path the outputs give. Are other people seeing the same thing? Does anyone know what might be causing this problem so I can look into it?

Also, it seems I will need to hard-code adding my-private-bucket to the dps_output file path so that my "Open in File Browser" button can accurately go to the file.
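The hard-coding idea above could look something like the following hypothetical helper; the function name, the prefix check, and the assumption that only dps_output paths need the bucket prefix are illustrative, not the actual Jobs UI code:

```python
def browser_path(relative_path: str) -> str:
    """Map a workspace-relative output path to what the (hypothetical)
    "Open in File Browser" button should open. In ops, dps_output is
    assumed to be mounted inside my-private-bucket, so that folder is
    prepended; other paths (e.g. triaged-jobs) are left untouched."""
    if relative_path.startswith("dps_output"):
        return "my-private-bucket/" + relative_path
    return relative_path

print(browser_path("dps_output/algo/2024/01/11/out.tif"))
# -> my-private-bucket/dps_output/algo/2024/01/11/out.tif
```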

@grallewellyn (Collaborator) commented Jan 18, 2024:

The update from the hackathon:
We need to change the devfiles from

- name: s3fs-volume
  mountPath: /projects/triaged-jobs
  subPath: triaged-jobs
  mountPropagation: HostToContainer

to

- name: s3fs-volume
  mountPath: /projects/triaged_job
  subPath: triaged-jobs
  mountPropagation: HostToContainer

So that triaged_job is now a directory that contains the folders with failed jobs. @bsatoriu What would making this change entail?

Also, it is okay to hard-code my-private-bucket into the file path; we want to make sure dps_output is private.

@bsatoriu (Collaborator):

@grallewellyn just make sure these changes are committed to the appropriate devfiles (e.g. https://github.com/MAAP-Project/maap-workspaces/tree/main/devfiles/vanilla/devfile). I will publish these changes manually for the v3.1.4 release.

I have a story for v3.1.5 to automate this step: #894

@grallewellyn (Collaborator) commented Jan 18, 2024:

@bsatoriu Do we want this done for the v3.1.4 release or is development supposed to be done?

Also, dps_output is outside my-private-bucket in DIT, but inside my-private-bucket for ops. Do we want to make this consistent so that both DIT and ops have dps_output inside my-private-bucket?

@marjo-luc (Member):

Can we push this to v3.1.5?

@rtapella (Collaborator, author):

Sure. I think this was a bit more complicated than we expected and if the opportunity cost on other features is too high, it's okay to push back a release.

@grallewellyn (Collaborator):

What is the use case for mozart_response["result"]["job"]["job_info"]["metrics"]["products_staged"] being multiple elements? I wrote a Jupyter notebook to check all usernames from the shared-buckets folder and get all their jobs (a total of 2839, which seems low, but that is what is returned by https://ade.maap-project.org/serverliuxxxxx-ws-jupyter/server-3100/jupyter-server-extension/listUserJobs?username=), and there were no instances of products_staged having multiple elements.
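The survey described above amounts to tallying how many products each job staged. A stand-in sketch (the `jobs` data is made up; only the nested key path mirrors the mozart response quoted above):

```python
from collections import Counter

# Stand-in for parsed listUserJobs results; real data would come from the
# API. Key nesting follows job_info -> metrics -> products_staged.
jobs = [
    {"job_info": {"metrics": {"products_staged": []}}},
    {"job_info": {"metrics": {"products_staged": [{"id": "a"}]}}},
]

# Tally jobs by how many products they staged (0, 1, 2, ...).
counts = Counter(
    len(job["job_info"]["metrics"]["products_staged"]) for job in jobs
)
print(counts)  # Counter({0: 1, 1: 1})
```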

@bsatoriu (Collaborator):

@grallewellyn this PR fixes the file url copy bug for the triaged-jobs mounted directory: MAAP-Project/maap-api-nasa#101
