Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup docker template access to EFS #302

Merged
merged 22 commits into from
Jan 8, 2024

Conversation

laspsandoval
Copy link
Contributor

@laspsandoval laspsandoval commented Jan 3, 2024

Change Summary

Overview

Sets up docker template that can be used to push images to the ECR, provides instructions of how to do that, and provides access to EFS containing SPICE kernels.

New Files

  • Dockerfile
    • Dockerfile that installs imap_processing and its dependencies
  • docker.rst
    • Provides description of how to build and run docker image. Also contains information of how to use the image in the ECR

Deleted Files

None

Updated Files

  • run_processing.py
    • I added a really brief example of how spice kernels from the ECR could be accessed.
  • index.rst
    • Added docker.rst to toctree.
  • poetry.lock
    • Updated

Testing

  • I made some minor changes to sds-data-manager, which I will describe in another PR for that repo. I then populated the EFS using the lambda Tenzin created (worked beautifully by the way). And then I executed a Step Function manually and looked at the Batch Job log. Picture is attached.
Screenshot 2024-01-03 at 11 30 07 AM

@laspsandoval laspsandoval linked an issue Jan 3, 2024 that may be closed by this pull request
Copy link
Collaborator

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be in sds-data-manager? You have AWS information here, and the processing repository should be agnostic to any AWS work and able to be run locally.

I'm not following why we need all these steps... pip install imap_processing should get us all the scripts we need to do any processing steps. So it should be a 2-3 line Dockerfile I'd hope. (You can also remove the install git if you change to install from a zip archive for now)

  • The script should be installed and accessed as imap_processing --flags.
  • If tools/ are needed during processing, then those should probably be moved up into the main package to get their dependencies automatically?

Dockerfile Outdated

# Copy over only the necessary scripts
COPY imap_processing/run_processing.py $IMAP_PROCESS_DIRECTORY/run_processing.py
COPY tools $IMAP_PROCESS_DIRECTORY/tools
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to copy tools folder? I think it's only used locally to create XTCE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the example I had of how to use the kernels, which is what I was using the tools folder for. But I removed the example.

Dockerfile Outdated
RUN pip install git+https://github.com/IMAP-Science-Operations-Center/imap_processing.git@dev

# Copy over only the necessary scripts
COPY imap_processing/run_processing.py $IMAP_PROCESS_DIRECTORY/run_processing.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should come in the pip package. If it didn't, we could find out why.

Dockerfile Outdated
RUN mkdir -p /mnt/spice

# Define the entrypoint of the container
ENTRYPOINT ["python", "/opt/imap/run_processing.py"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested it but I wonder if you could change this to the pip path. Then you won't need to copy in above line.

Suggested change
ENTRYPOINT ["python", "/opt/imap/run_processing.py"]
ENTRYPOINT ["python", "<pip path>/imap_processing/run_processing.py"]

I believe you can get by doing pip show imap_processing

To build the image run the following command from the directory containing the Dockerfile. You might add -t option to tag your image
and --rm to remove intermediate containers after the build is done.

`docker build -t <image name> --rm .`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`docker build -t <image name> --rm .`
`docker build -t <image name>:<tag name> --rm .`


Now we can run our image.

`docker run --rm -it --volume="$(pwd)/imap_processing/efs:/mnt/spice" <image name> --instrument <instrument> --level <level>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`docker run --rm -it --volume="$(pwd)/imap_processing/efs:/mnt/spice" <image name> --instrument <instrument> --level <level>`
`docker run --rm -it --volume="$(pwd)/imap_processing/efs:/mnt/spice" <image name>:<tag name> --instrument <instrument> --level <level>`

Comment on lines 29 to 37
Build the Docker image.

`docker build -t <image name> .`

Tag the image and push to the ECR.

`docker tag <tag> <ECR URI>`

`docker push <ECR URI>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the place where we would set image tag with version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can assign the tag name during the build :)

Copy link
Contributor

@tech3371 tech3371 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. Thank you!

@@ -36,11 +37,10 @@ def _parse_args():
f"The data level to process. Acceptable values are: {processing_levels}"
)

parser = argparse.ArgumentParser(description=description)
parser = argparse.ArgumentParser(prog="imap_cli", description=description)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what this was but looks like it was from poetry. cool

Copy link
Collaborator

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, since this is mostly documentation at this point I'm fine with keeping it in this repository since it is more public-facing. But, I think the Dockerfile should be moved out of our base repository level.

@@ -7,17 +7,18 @@

Use
---
python run_processing.py <instrument> <data_level>
python cli.py <instrument> <data_level>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imap_cli --instrument <instrument> --level <data_level>

@@ -162,11 +162,12 @@ def process(self):
print(f"Processing IMAP-Ultra {self.level}")


if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need the if __name__ == "__main__" call main() block for running it as python cli.py` So it is best to leave that in there for either option.

Dockerfile Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this doesn't live at the root level. Maybe move it into an examples/ folder?

Also perhaps call it some_unique_name.Dockerfile or Dockerfile.some_unique_name so it isn't just a base Dockerfile?

Copy link
Collaborator

@greglucas greglucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear explanations and descriptions, I think this is nice with the explicit example.

I would still like to see the cli_args removed if that isn't actually needed/used.


from imap_processing import instruments, processing_levels


def _parse_args():
def _parse_args(cli_args: list):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the cli_args here and in main()? I was pretty sure in my testing that there was not extra args passed into the script's main function call...

https://docs.python.org/3/library/argparse.html#parsing-arguments

In a script, parse_args() will typically be called with no arguments, and the ArgumentParser will automatically determine the command-line arguments from sys.argv.

To build the image run the following command from the directory containing the Dockerfile. You might add -t option to tag your image
and --rm to remove intermediate containers after the build is done.

`docker build -t <image name>:<tag name> --rm .`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to give an example of how to build the image you added? You'll need the -f Dockerfile.name now because it isn't the default Dockerfile (sorry for the misdirection there)!

Suggested change
`docker build -t <image name>:<tag name> --rm .`
`docker build -f examples/Dockerfile.efs -t <image name>:<tag name> --rm .`

@laspsandoval laspsandoval merged commit 62b3703 into IMAP-Science-Operations-Center:dev Jan 8, 2024
14 checks passed
@bourque
Copy link
Collaborator

bourque commented Jan 31, 2024

@all-contributors please add @laspsandoval for infrastructure and ideas

Copy link
Contributor

@bourque

I've put up a pull request to add @laspsandoval! 🎉

@laspsandoval laspsandoval self-assigned this Feb 7, 2024
laspsandoval added a commit to laspsandoval/imap_processing that referenced this pull request Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setup docker template access to EFS
4 participants