Cloudify reV #335

Closed
MRossol opened this issue Oct 20, 2021 · 3 comments
Labels
feature New feature or request

MRossol commented Oct 20, 2021

Why this feature is necessary:
Enable the reV team and general public to run reV at scale in the cloud using OEDI datasets as inputs.

A possible solution is:
Current cloud readiness of reV's modules:

  • Gen: Cloud ready using HSDS and WTK/NSRDB data in OEDI (see the HSDS read sketch after this list)
  • Econ: Cloud ready using HSDS and WTK/NSRDB data in OEDI
  • Aggregation: Needs exclusion files to be loaded into HSDS; the plan is to load the FY20 and FY21 final inclusion layers into OEDI before 10/29. Complex aggregation schemes would require all exclusion layers to be loaded into HSDS/OEDI.
  • Supply Curve: For internal runs we can use the transmission (xmission) tables we've generated. For the public we need to determine:
    a) whether they can use our xmission tables, or whether we can "anonymize" our xmission tables for public use; otherwise
    b) we need to create a lower-fidelity, open-source set of xmission tables
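
For reference, "cloud ready using HSDS" means the resource reads go through the HSDS service instead of a local POSIX file. Below is a minimal sketch with h5pyd; the OEDI domain path and scale-factor attribute follow the public NREL hsds-examples conventions and are assumptions here, not reV internals.

```python
# Minimal sketch: read NSRDB data from OEDI through HSDS with h5pyd.
# Assumes the HSDS endpoint/credentials are configured (e.g. via ~/.hscfg)
# and that the domain path and scale-factor attribute match the OEDI layout.
import h5pyd

with h5pyd.File("/nrel/nsrdb/v3/nsrdb_2013.h5", mode="r") as f:
    meta = f["meta"][0:100]           # site metadata (lat/lon, state, ...)
    ghi = f["ghi"][:, 0:100]          # GHI time series for the first 100 sites
    ghi = ghi / f["ghi"].attrs["psm_scale_factor"]  # un-scale to W/m^2
```

The WTK domains can be read the same way for Gen wind runs; only the domain path and dataset names change.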

Potential performance issues:

  • HSDS currently has very limited throughput in terms of the number of concurrent requests, which will be an issue for generation. Fortunately, we hope to ameliorate this by moving HSDS to Lambda. That should be done by the end of FY21 Q1 and should enable nearly infinite scalability in terms of parallel requests. To take full advantage of it, it could be useful to implement parallel gets for list slices in rex (rex#111); a sketch of the idea follows this list.
  • Using gen/econ .h5 files in aggregation and supply curve. Two solutions:
    1. Move the files from S3 to local storage (not needed with LFS below)
    2. Implement cloud_fs/s3fs in rex: add the ability to use cloud_fs/s3fs along with HSDS (rex#112); see the s3fs sketch after this list
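
On the throughput point, the rex#111 idea is roughly to split one large list-slice request into many smaller requests and issue them concurrently, so the Lambda-backed HSDS can fan out. A hedged sketch of that pattern is below; the domain path, dataset name, and chunk size are illustrative, and this is not rex's actual implementation.

```python
# Hedged sketch of "parallel gets for list slices" (rex#111): chunk a large
# site-index request and fetch the chunks concurrently over HSDS.
from concurrent.futures import ThreadPoolExecutor

import h5pyd
import numpy as np


def parallel_get(domain, dset, sites, chunk_size=500, max_workers=8):
    """Fetch dset[:, sites] as parallel chunked HSDS requests."""
    chunks = [sites[i:i + chunk_size] for i in range(0, len(sites), chunk_size)]

    def _get(chunk):
        # one File handle per request keeps the workers independent
        with h5pyd.File(domain, mode="r") as f:
            return f[dset][:, chunk]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        out = list(pool.map(_get, chunks))

    return np.hstack(out)


# e.g. 100m windspeed for a scattered set of WTK sites (illustrative domain):
# data = parallel_get("/nrel/wtk/conus/wtk_conus_2013.h5", "windspeed_100m",
#                     list(range(0, 50000, 10)))
```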
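
For option 2 (rex#112), the basic pattern for reading a gen/econ .h5 straight out of S3 with s3fs looks like the sketch below; the bucket, key, and dataset name are hypothetical placeholders, and this is not the cloud_fs API.

```python
# Hedged sketch: open a reV gen/econ output on S3 without copying it locally,
# by handing an s3fs file-like object to h5py.
import h5py
import s3fs

fs = s3fs.S3FileSystem(anon=False)  # uses the caller's AWS credentials
with fs.open("s3://my-rev-bucket/outputs/rev_gen_2013.h5", "rb") as s3_obj:
    with h5py.File(s3_obj, mode="r") as f:
        cf_mean = f["cf_mean"][...]  # e.g. mean capacity factor per site
```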

Cloud hardware / submission ideas:

  • The simplest solution would be to use AWS's HPC-like infrastructure: either ParallelCluster or Batch.
    • Pros:
      • Can use the SLURM infrastructure in reV/rex
      • Has a Lustre filesystem (LFS) attached that syncs to S3 to support writing outputs
    • Cons:
      • More complicated/expensive than using a single EC2 instance
  • A more flexible solution would be to dockerize reV and integrate cloud_fs to handle the transfer of output files from compute storage to S3 (see the output-sync sketch after this list).
    • Pros:
      • Runnable on nearly any hardware (EC2, ECS, Kubernetes, Lambda)
      • Likely cheaper
    • Cons:
      • Need to consider attached storage for writing outputs
      • Would need a third party solution to launch complicated (multi-node/job) runs
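
For the dockerized option, the output-transfer piece could be as small as a post-run sync from the container's scratch space to S3. The sketch below uses boto3 as a stand-in for cloud_fs; the bucket, prefix, and output directory are hypothetical placeholders.

```python
# Hedged sketch: push reV outputs from local/ephemeral storage to S3 after a
# containerized run finishes. boto3 stands in for cloud_fs here.
from pathlib import Path

import boto3


def push_outputs(out_dir, bucket, prefix):
    """Upload all .h5 outputs from a local run directory to S3."""
    s3 = boto3.client("s3")
    for path in Path(out_dir).glob("*.h5"):
        s3.upload_file(str(path), bucket, f"{prefix}/{path.name}")


# e.g. at the end of the container entrypoint, after the reV module runs:
# push_outputs("/tmp/rev_outputs", "my-rev-bucket", "runs/fy22/gen")
```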

Charge code
reV

Urgency / Timeframe
FY22 design Doc

@grantbuster commented:

Notes and stuff:
Configure AWS HPC: https://www.hpcworkshops.com/03-hpc-aws-parallelcluster-workshop/04-configure-pc.html
SSH into the login node and squeue access: https://www.hpcworkshops.com/03-hpc-aws-parallelcluster-workshop/07-logon-pc.html
sbatch commands just like Eagle: https://www.hpcworkshops.com/03-hpc-aws-parallelcluster-workshop/08-run-1stjob.html
High-performance file storage and transfer to S3 (not even really sure this is necessary if reading from S3/HSDS?): https://www.hpcworkshops.com/04-amazon-fsx-for-lustre.html

In this lab, the cluster has 0 compute nodes when starting and maximum size set to 8 instances. AWS ParallelCluster will grow and shrink between the min and max limits based on the cluster utilization and job queue backlog.

A GP2 Amazon EBS volume will be attached to the head-node then shared through NFS to be mounted by the compute nodes on /shared. It is generally a good location to store applications or scripts. Keep in mind that the /home directory is shared on NFS as well.

SLURM will be used as a job scheduler

@grantbuster commented:

PR #339

@grantbuster commented:

Implemented here: https://github.com/NREL/reV/tree/main/examples/aws_pcluster

Still needs work to upload data for exclusions / transmission costs, but there are no technical barriers to that.
