
Ecarton/cumulus 3751 from 18.5.2 #3900

Open
wants to merge 257 commits into base: release-18.5.x

Conversation

@etcart (Contributor) commented Jan 10, 2025

Summary: adds a task that takes granules and a target collection and idempotently updates those granules to belong to the target collection in both S3 and the Cumulus datastore.

Addresses CUMULUS-3751: Workflow task that updates a granule to a new collection
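
For illustration, a rough sketch of the event shape the task operates on; the field names below are illustrative only, not the task's actual schema:

```typescript
// Illustrative only: a hypothetical event shape for the new task.
// The real config/input schemas live in tasks/move-granule-collections.
interface MoveGranuleCollectionsEvent {
  config: {
    buckets: Record<string, { name: string; type: string }>; // stack bucket map
    collection: { name: string; version: string }; // target collection
  };
  input: {
    granuleIds: string[]; // granules to move into the target collection
  };
}
```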

Changes

  • adds a new task that updates S3 and the Cumulus data stores, moving granules across collections
  • adds integration tests and associated resources in the example project
  • redistributes ECS cluster resources among ECS tasks in the example project

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

paulpilone and others added 30 commits August 28, 2024 13:03
* Adds 18.3.4 to change log

* Adds 18.3.4 docs
* Release 19

* Release 19 commit after step 8
* imf retry for db/test-execution.js

* sync granule imfs

* mistake in sleep import

* some clearer debug information in syncgranules test

* sync_granules test is failed by running in a loop???

* why is this granuleId just wrong

* lint fixes

* ensure extraneous keys are bleedover from another test

* typo in debug printout

* typo in debug printout

* turn up timeout in aws-client

* json typo

* clean up granules and granuleExecutions after each serial test to prevent mis-grabs

* remove retry because there's something else going wrong

* getWorkflowNameIntersectFromGranuleIds gets unique names

* linter cleanups

* newline cleanup

* tests for get endpoint in separate file (serialized and cleaned up)

* cleanup from excising test-granules-get

* linter fixes

* put test accidentally removed

* need to check unreleased syntax

* changelog for unit tests

* bringing in changes from latest master

* pulling over tests from latest master

* bring granules tests inline with es removal changes

* remove uncertain test retry

* getWorkflowNameIntersectFromGranuleIds sorts in js to avoid timestamp batch

* testing consistency of behavior across time collisions and single/multi granule

* linter fixes

* remove onlys

* postgres returns timestamp as min

* min only on numberOfGranules === 1

* semicolon

* add missing quote update moment version

* Update changelog

* update pg version

* remove hoex
* Fix isThrottlingException function to check error name

* update changelog and add name/code check in errors

* linter fix

* changelog

* typo fix

---------

Co-authored-by: Hailiang Zhang <hailiang.zhang@nasa.gov>
Co-authored-by: etcart <amberhosen@gmail.com>
…s_table (#3805)

Co-authored-by: etcart <amberhosen@gmail.com>
* Update deployment templates for Aurora Serverless v2 (#3623)

* update CL

* update terraform templates to serverless v2

* add terraform variable validation

* remove upgrade variables

* add prevent_destroy = true

* add prevent_destroy = true

* CUMULUS-3670 Develop upgrade/migration process Aurora Serverless v1 to v2 (#3643)

* remove prevent_destroy to allow automated CI migrations

* set force_ssl = 0 (#3658)

Co-authored-by: Tim Clark <tim.clark@nasa.gov>

* [CUMULUS-3671]: Update docs for Serverless V2 (#3666)

* initial commit

* serverless v2 doc updates

* Update serverless V2 docs

* Fix lint issue

* set DISABLE_PG_SSL: true to support CI

* fix lint error

* set disableSSL = true

* remove DISABLE_PG_SSL

* set rejectUnauthorized: 'false'

* update CL for v2 changes

* fix changelog

* add migration notes to changelog, add v2 docs to sidebar

* fix changelog

---------

Co-authored-by: Tim Clark <tim.clark@nasa.gov>
Co-authored-by: Nate Pauzenga <npauzenga@gmail.com>

* Update AWS errors to use the V3 error classes

* Fix lint

* Import aws sdk directly to avoid circular dep

* Update CL

* Remove module in favor of aws imports directly

* Revert change to ThrottlingException error type

* Add comments

* Fix lint

* Remove unnecessary dependency

* add debug logging for CI

* update type and debug comment

* temporarily revert to name checks

* Remove logging and type check on conditional exception.

Instance of does not work in this case. I believe we're calling the service "dynamodbDocClient" using the non-V3 syntax.

* Fix lint

* Update tests to throw correct aws-sdk error

* Update tests with new aws-sdk error types

* Import error type correctly

* Correctly import sfn error

* Instantiate errors like I know what I'm doing

* Basic syntax 🤦

* update tests

* Remove unnecessary comment

* fixup for clarity

* Update test for clarity

* Update test fixture and logging for consistency
* Allow override of sfEventSqsLambda timeout with associated queue adjustments

* Update CHANGELOG

* Respond to PR feedback

* Update per PR feedback request
* CUMULUS-3906 - Update to ORCA v10.0.0

* Resolved CL conflict.

* Removes required wording for 3906 from CL
* Fixes merge conflict

* Adds diff link for v18.5.0
…e-granules-cmr-metadata-… (#3791)

* Added excludeFileRegex configuration to update-granules-cmr-metadata-file-links (#3790)

Updated tests to exercise new file-exclusion feature

* linter fixes

* remove explicit null for un-found regexpattern

* switch to logging when no excludable files found

* changelog broken into multiple lines

* linter fixes in changelog

* name in changelog after lambda function name

* remove TODO. non-mocked is a truer representation of function

* small refactor

* typo in passthrough of fileregex

* nyc values with new tests

* version requirement update

* fixed merge weirdness

* fix jsonpath in the other places it's flagged

* remove unneeded explicit pin in aws-client

* check like instead of deepequal on credentials return

---------

Co-authored-by: Mike Dorfman <42116953+mikedorfman@users.noreply.github.com>
* Fix cumulus versions due to bad merge

* Update aws-sdk versions to revert bad merge
* update dependencies to latest cma, cma-python, cumulus-process

* changelog

* fixed shas in locks

* whitelist jsonpath for buggy audit behavior

* remove incorrect changelog entry

---------

Co-authored-by: etcart <amberhosen@gmail.com>
* CUMULUS-3891: Add fastGet download option to sftp data file download

* add sftpFastDownload config

* fix fastDownload boolean vs string

* add unit tests

* fix aws-client services unit test

* test SFTP_DEBUG

* remove only

* add changelog entry

* remove unused code

* remove jsonpath from common

* update latest-version and add jsonpath-plus to audit-ci

* serial

* update readme remove serial

* add sftp test
…3830)

* Update Orca version

* Update orca variables for v10 release series

* Update orca var to default value

* Update Orca version to official release
…ync-granules (#3823)

* Initial commit updating sync-granule behavior

* Clean up comments

* Update schema config to match changes

* Update typings

* Fix sync-granules typing

* Update config docs to true

* Update task README

* Update CHANGELOG

* Minor fix

* Update spec tests with new default hashed granId path

* Update @cumulus/types to allow for explicit export of api/collections

* Abstract typings to separate file

* Update _ingestGranule param based on PR feedback

* Fix unit test not updated on merge

* Add method unit tests for collection(name/version)From methods

* Fix integration helper

* Add hashed path to SyncGranules
…zed (#3832)

* CUMULUS-3919:Added terraform variables disableSsl and rejectUnauthorized

* disableSsl->disableSSL
…s services. (#3838)

* Allowing force_new_deployment to be configurable for ecs services.

* Update CHANGELOG

---------

Co-authored-by: Michael Hall <mlh0079@uah.edu>
* guarantee non-numeric nonNumericString

* bringing in pg8.13 snyk suggestion

* trying to kick up the sync-granules task error

* trying to get repeatable tries to sync-granules

* focus on just the important part in syncgranule

* trying to get publish to work

* does only this need to change to import?

* move to import function import

* reverting bad code for testing reasons

* changelog

* keep string length the same, no reason to twiddle this
* Release 19.1.0 (#3816)

* version bump

* Update CL

* Update docs

* Update CL link

* Add note for clarity

* update missed deps

* Ecarton/cumulus 3928 imf work (#3831) (#3841)

* guarantee non-numeric nonNumericString

* bringing in pg8.13 snyk suggestion

* trying to kick up the sync-granules task error

* trying to get repeatable tries to sync-granules

* focus on just the important part in syncgranule

* trying to get publish to work

* does only this need to change to import?

* move to import function import

* reverting bad code for testing reasons

* changelog

* keep string length the same, no reason to twiddle this

Co-authored-by: etcart <37375117+etcart@users.noreply.github.com>

* fix no top level await (#3843) (#3844)

Co-authored-by: etcart <37375117+etcart@users.noreply.github.com>

---------

Co-authored-by: etcart <37375117+etcart@users.noreply.github.com>
* Update CHANGELOG from release 18.5.1

* Add CHANGELOG footer

* Fix CHANGELOG
.eslintignore
@@ -36,3 +36,5 @@ _book/**
/packages/object-store/*.js
/packages/lzards-api-client/*.d.ts
/packages/lzards-api-client/*.js
/tasks/move-granule-collections/*.js
/tasks/move-granule-collections/*.d.ts
Contributor

Are these necessary? I don't see other tasks written in TypeScript listed here, e.g. tasks/lzards-backup.

Contributor Author

No, I'm removing them; these were necessary before I moved my compilation products to a dist directory.

CHANGELOG.md (resolved)
Contributor

Is this a workflow definition? If so, it should contain only the definition, e.g. "event.$": "$", not any example data.
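
Roughly, a definition-only state would just wire the event through, along these lines (the state and task_config fields here are hypothetical):

```typescript
// Hypothetical sketch of a definition-only state: the event is passed
// through with "event.$": "$"; no example granule data is embedded.
const moveGranuleCollectionsState = {
  Type: 'Task',
  Resource: '${move_granule_collections_task_arn}', // substituted by Terraform
  Parameters: {
    cma: {
      'event.$': '$', // wire the whole incoming event through the CMA
      task_config: {
        'collection.$': '$.meta.collection', // hypothetical config wiring
      },
    },
  },
  End: true,
};
```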

Contributor Author

Fixed in dacde25 (that commit actually introduces a different mistake, which is fixed in d46049b).

Contributor

If we will have a new endpoint to invoke this workflow, then the workflow should be included in cumulus core tf-modules/ingest or archive, so users don't need to copy these into their cumulus-tf deployment. But if the granules applyWorkflow works, then it can be a custom workflow.

Contributor Author

As I had envisioned it, this was sort of an "example workflow", but if the intention is that this is called by the API and does a fairly standard thing, I'll try to move it to tf-modules. I think in principle it should be configurable from existing variables (e.g. prefix) without someone needing to rejigger anything custom.


const getTargetFiles = (targetUrlPrefix) => [
{
bucket: 'cumulus-test-sandbox-protected',
@jennyhliu (Contributor) commented Jan 17, 2025

The buckets are hardcoded and won't work for all stacks.
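
One way to avoid that, sketched below, would be to resolve the bucket from the deployed stack's configuration; the loadConfig() helper, its import path, and the bucket-map shape are assumed to match the other example specs:

```typescript
// Sketch only: resolve the bucket from the stack config instead of
// hardcoding 'cumulus-test-sandbox-protected'. loadConfig(), its import
// path, and config.buckets shape are assumptions based on other specs.
import { loadConfig } from '../../helpers/testUtils';

const getTargetFiles = async (targetUrlPrefix: string) => {
  const config = await loadConfig(); // per-stack configuration
  return [
    {
      bucket: config.buckets.protected.name, // resolved per stack, not hardcoded
      key: `${targetUrlPrefix}/example-granule.hdf`, // hypothetical key layout
    },
  ];
};
```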


finalFiles = getTargetFiles(targetUrlPrefix);
//upload to cumulus
try {
await setupInitialState(stackName, sourceUrlPrefix, targetUrlPrefix);
@jennyhliu (Contributor) commented Jan 17, 2025

We need a few 'real' granules which are published to CMR for integration test.

Contributor Author

I think this will need to come once we are updating CMR as part of this workflow?


const activityStep = new ActivityStep();

describe('The MoveGranuleCollection workflow using ECS', () => {
@jennyhliu (Contributor) commented Jan 17, 2025

What are the source and target collections? I can't tell from the specs.

Contributor

What is the difference between MoveGranuleCollectionWorkflowSpec and MoveGranuleCollectionsSpec?

Contributor Author

The intent is to run MoveGranuleCollectionsSpec as a test of the functionality of the lambda, while the workflow spec is meant to run the workflow as it would exist with example data, to be extended with CMR, LZARDS, etc. when those tickets are done. Right now the only functional difference is that it tests the lambda in the ECS deployment.


const getSourceCollection = (sourceUrlPrefix) => (
{
files: [
@jennyhliu (Contributor) commented Jan 17, 2025

This should be part of the collection configuration and shouldn't need to be specified here. We ingest different granules/files to S3 to avoid conflicts.

}
);

const getTargetCollection = (targetUrlPrefix) => ({
Contributor

Same comment: This should be part of the target collection configuration

packages/api-client/src/granules.ts
@@ -0,0 +1,6 @@
export interface CMRFile {
Contributor

Is this necessary?

| ---------- | ---- | ------- | ------ | -----------
| buckets | object | (required) | | Object specifying AWS S3 buckets used by this task
| collection | object | (required) | | The cumulus-api collection object
Contributor

This is the 'target' collection, right? Should this be part of the task input, since the task can be invoked to move granules to different collections?
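
For example, the target collection could be wired in per invocation rather than fixed in the deployed config; a hypothetical task_config sketch:

```typescript
// Hypothetical task_config wiring: the target collection comes from the
// workflow's meta/payload at run time, so one deployment of the task can
// move granules into any collection.
const taskConfig = {
  'buckets.$': '$.meta.buckets',
  'collection.$': '$.meta.targetCollection', // supplied per invocation
};
```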

sourceGranules: Array<ApiGranuleRecord>,
targetGranules: Array<ApiGranuleRecord>,
targetCollectionId: string
): Promise<void> {
Contributor

May need to reverse the order of updates: if the granule updates fail after the collection has been updated, the granule won't be processed again.
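
A sketch of the safer ordering, with hypothetical helper names: move the files first and flip the datastore record last, so a partial failure leaves the granule still attached to its source collection and safe to re-run.

```typescript
// Sketch of the ordering concern; the helper functions are hypothetical.
// Files move first and the datastore record is updated last, so a mid-run
// failure leaves the granule discoverable under its source collection and
// the (idempotent) task can simply be re-run.
async function moveGranule(
  source: ApiGranuleRecord, // type from @cumulus/types, as in the signature above
  target: ApiGranuleRecord
): Promise<void> {
  await copyFilesToTargetLocations(source, target); // S3 first
  await updateDatastoreRecord(target);              // PostgreSQL record last
  await cleanUpSourceFiles(source, target);         // remove old copies afterwards
}
```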

};
}

async function buildTargetGranules(
Contributor

So buildTargetGranules does not only build but also updates granule metadata in S3? Where does the metadata get removed? I'm confused about the sequence of events.

Contributor Author

This builds the granule records as they should exist once we're done, so that they can later be used as a roadmap for the updates in S3 and PostgreSQL.
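
Roughly, the idea is something like the sketch below (the helper name and exact field handling are illustrative, not the actual implementation):

```typescript
// Simplified sketch of the "roadmap": compute what each granule record
// should look like after the move, without touching S3 or PostgreSQL yet.
// remapKeyForCollection and the field handling are illustrative assumptions.
function buildTargetGranule(
  source: ApiGranuleRecord,
  targetCollectionId: string
): ApiGranuleRecord {
  return {
    ...source,
    collectionId: targetCollectionId,
    files: (source.files ?? []).map((file) => ({
      ...file,
      key: remapKeyForCollection(file.key, targetCollectionId), // hypothetical
    })),
  };
}
```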

@etcart (Contributor Author) commented Jan 20, 2025

I'm thinking of pulling the example ECS deployment here. It was put in place to allow for exceeding 15 minutes (and other potential resource constraints), but splitting into subsets of granule_ids should allow us to control for that, and the ECS example adds (unnecessary?) complexity to this PR.
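
A rough sketch of that splitting (the batch size here is an arbitrary example value):

```typescript
// Rough sketch of batching granule_ids so one Lambda invocation stays well
// under the 15-minute limit; BATCH_SIZE is an arbitrary illustrative value.
const BATCH_SIZE = 200;

const chunkGranuleIds = (granuleIds: string[], size = BATCH_SIZE): string[][] => {
  const batches: string[][] = [];
  for (let i = 0; i < granuleIds.length; i += size) {
    batches.push(granuleIds.slice(i, i + size));
  }
  return batches;
};
```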

Contributor

I don't see other tasks/* having this file.


Contributor

How are we going to avoid the start of the workflow triggering the granule updates, since the input payload has granules?
