
feat: add import bulk/resume commands #1091

Merged: 25 commits into main from cd/data-import-bulk on Oct 18, 2024

Conversation

@cristiand391 (Member) commented Oct 14, 2024

What does this PR do?

Adds two new commands:

data import bulk

Takes a CSV with fields of an object and bulk-inserts the records into the org:

happy path

Screen.Recording.2024-10-16.at.10.57.01.AM.mov

failed to import all records

Screen.Recording.2024-10-16.at.11.07.10.AM.mov

job failure

Screenshot 2024-10-16 at 11 08 37 AM

job aborted

Screen.Recording.2024-10-16.at.11.12.26.AM.mov

data import resume

Resumes an async or timed-out import:

Screen.Recording.2024-10-16.at.11.17.31.AM.mov

testing instructions

Check out the PR locally and `sf plugins link` it. You can use the following CSV files, available in plugin-data, for testing:
test/test-files/data-project/data/bulkUpsertLarge.csv (large)
test/test-files/data-project/data/bulkUpsert.csv (small)

What issues does this PR fix or reference?

@W-13656292@

isState: true,
filename: BulkImportRequestCache.getFileName(),
stateFolder: Global.SF_STATE_FOLDER,
ttl: Duration.days(7),

jobId: string;
processedRecords?: number;
successfulRecords?: number;
failedRecords?: number;
cristiand391 (Member, Author):

JSON output:

async: job ID only
sync: job ID plus the number of processed/successful/failed records
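
For illustration, the two shapes might look roughly like this (a sketch only: the type name and values are made up, the fields mirror the result type shown above):

```ts
// Illustrative only; field names come from the result type above, values are invented.
type DataImportBulkResult = {
  jobId: string;
  processedRecords?: number;
  successfulRecords?: number;
  failedRecords?: number;
};

// async run: only the job ID is known when the command returns
const asyncOutput: DataImportBulkResult = { jobId: '750XXXXXXXXXXXXXXX' };

// sync run: record counts are included once the job finishes
const syncOutput: DataImportBulkResult = {
  jobId: '750XXXXXXXXXXXXXXX',
  processedRecords: 10,
  successfulRecords: 10,
  failedRecords: 0,
};
```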

});

try {
await job.poll(5000, timeout.milliseconds);
cristiand391 (Member, Author):

5s for the polling interval; we could add a `--poll-interval` flag later if users want to customize this.
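
If we ever do add it, a rough sketch of how such a flag could be wired (hypothetical, not part of this PR; modeled on the existing `--wait` duration flag, and the summary text is a placeholder):

```ts
import { Flags } from '@salesforce/sf-plugins-core';

// Hypothetical --poll-interval flag definition, not part of this PR.
const flags = {
  'poll-interval': Flags.duration({
    unit: 'seconds',
    defaultValue: 5,
    summary: 'Time to wait between job status checks.', // placeholder text
  }),
};

// In run(), the hardcoded 5000 would then become the parsed flag value:
// await job.poll(flags['poll-interval'].milliseconds, timeout.milliseconds);
```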

jobId: jobInfo.id,
processedRecords: jobInfo.numberRecordsProcessed,
successfulRecords: jobInfo.numberRecordsProcessed - (jobInfo.numberRecordsFailed ?? 0),
failedRecords: jobInfo.numberRecordsFailed,
cristiand391 (Member, Author):

The API only gives us the total number of records processed (successes and failures combined) and the number of failed records, so we need to calculate the successful count ourselves.
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/get_job_info.htm

ms.stop('failed');
throw messages.createError(
'error.jobFailed',
[jobInfo.errorMessage, conn.getUsername(), job.id],

if (jobInfo.state === 'Aborted') {
ms.stop('failed');
// TODO: replace this msg to point to `sf data bulk results` when it's added (W-12408034)
throw messages.createError('error.jobAborted', [conn.getUsername(), job.id], [], err as Error);
cristiand391 (Member, Author):

Aborted jobs don't include an error message, so we log the `sf org open ...` message so users can view more details in the org.

char: 'w',
unit: 'minutes',
summary: messages.getMessage('flags.wait.summary'),
defaultValue: 5,
cristiand391 (Member, Author):

`data import resume` has a default wait time of 5 minutes. This is on purpose: even if you don't specify `--wait`, you still have a good chance the job finishes in time.

`data upsert resume` and `data delete resume` default `--wait` to 0, so on a first run without `--wait` you have to re-run them with a bigger timeout if the job is still running.

char: 'i',
length: 18,
startsWith: '750',
exactlyOne: ['use-most-recent'],
cristiand391 (Member, Author):

Following @VivekMChawla's comments about specificity:
https://salesforce-internal.slack.com/archives/G02K6C90RBJ/p1722277847033949

A bare `sf data import resume` is invalid; you have to pass either `--job-id` or `--use-most-recent`.
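
For reference, a sketch of how the two flags express that constraint (the summaries and the `-r` short char are placeholders; only the mutual `exactlyOne` wiring is the point):

```ts
import { Flags } from '@salesforce/sf-plugins-core';

// Sketch only: exactly one of --job-id / --use-most-recent must be passed.
const flags = {
  'job-id': Flags.salesforceId({
    char: 'i',
    length: 18,
    startsWith: '750',
    summary: 'Job ID of the bulk import.', // placeholder text
    exactlyOne: ['use-most-recent'],
  }),
  'use-most-recent': Flags.boolean({
    char: 'r', // placeholder short char
    summary: 'Use the job ID of the most recent bulk import.', // placeholder text
    exactlyOne: ['job-id'],
  }),
};
```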

flags['job-id'],
flags['use-most-recent'],
undefined,
undefined
cristiand391 (Member, Author) commented Oct 16, 2024:

The 4th arg is supposed to be `api-version`, but I didn't add that flag here because the API version used when creating the job is cached by `data import bulk`, so we don't need to pass it here.

const numberRecordsFailed = data?.numberRecordsFailed ?? 0;

if (data?.numberRecordsProcessed) {
return (data.numberRecordsProcessed - numberRecordsFailed).toString();
cristiand391 (Member, Author):

We default failed records to 0 on L84 if no API info is available yet (this state update happens while the job is being processed), but we only render the "Successful records" block if the API returns numberRecordsProcessed.
This makes oclif/MSO render a spinner instead of 0 when there's no processed data yet (the first seconds of a job run, or if the job failed).

const jobInfo = await job.check();

// send last data update so job status/num. of records processed/failed represent the last update
ms.goto('Processing the job', jobInfo);
cristiand391 (Member, Author):

If `job.poll` on L138 throws, the state rendered is from the previous poll update, so we do one last update with fresh data from the org (`job.check()`) to make sure the last state rendered is accurate (job status, record counts).
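
Roughly, the shape of that error path (a sketch only, reusing the `job`, `ms`, and `timeout` names from the snippets above; error-message mapping elided):

```ts
try {
  // poll() throws if the job times out or ends in a Failed/Aborted state
  await job.poll(5000, timeout.milliseconds);
} catch (err) {
  // fetch fresh job info so the final rendered state (status, record counts)
  // reflects the org, not the last poll update
  const jobInfo = await job.check();
  ms.goto('Processing the job', jobInfo);

  // ...then map Failed/Aborted to the corresponding user-facing error
  throw err;
}
```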

@cristiand391 cristiand391 marked this pull request as ready for review October 16, 2024 14:19
@cristiand391 cristiand391 requested a review from a team as a code owner October 16, 2024 14:19
Outdated review threads on src/commands/data/import/bulk.ts and src/commands/data/import/resume.ts were resolved.
@mdonnalley (Contributor) commented:

QA

🔴 bulk import with --wait

❯ sf data bulk import -f ~/repos/salesforcecli/plugin-data/test/test-files/data-project/data/bulkUpsert.csv --sobject account --wait 10

 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────────── Importing data ─────────────────

 ✔ Creating ingest job 4ms
 ✔ Processing the job 6.01s
   ▸ Processed records: 10
   ▸ Successful records: 10
   ▸ Failed records: ◝

 Status: JobComplete
 Job Id: 750Oy00000Dh2gcIAB
 Elapsed Time: 7.61s

It works as expected but Failed records is left with an unresolved spinner

🟡 bulk import

Question: is it supposed to be async by default?

❯ sf data bulk import -f ~/repos/salesforcecli/plugin-data/test/test-files/data-project/data/bulkUpsert.csv --sobject account
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────── Importing data (async) ─────────────

 ✔ Creating ingest job 4.05s

 Status: UploadComplete
 Job Id: 750Oy00000DgnMcIAJ
 Elapsed Time: 4.06s

Run "sf data import resume --job-id 750Oy00000DgnMcIAJ" to resume the operation.

🟢 bulk import with --async

❯ sf data bulk import -f ~/repos/salesforcecli/plugin-data/test/test-files/data-project/data/bulkUpsert.csv --sobject account --async
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────── Importing data (async) ─────────────

 ✔ Creating ingest job 1.41s

 Status: UploadComplete
 Job Id: 750Oy00000Dgu7yIAB
 Elapsed Time: 1.41s

Run "sf data import resume --job-id 750Oy00000Dgu7yIAB" to resume the operation.

🟢 resume an async import with --job-id

❯ sf data import resume --job-id 750Oy00000DgnMcIAJ
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────────── Importing data ─────────────────

 ◯ Creating ingest job - Skipped
 ✔ Processing the job 1.11s
   ▸ Processed records: 10
   ▸ Successful records: 10
   ▸ Failed records: 0

 Status: JobComplete
 Job Id: 750Oy00000DgnMcIAJ
 Elapsed Time: 1.13s

🟢 resume an async import with --use-most-recent

❯ sf data import resume --use-most-recent
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────────── Importing data ─────────────────

 ◯ Creating ingest job - Skipped
 ✔ Processing the job 1.11s
   ▸ Processed records: 10
   ▸ Successful records: 10
   ▸ Failed records: 0

 Status: JobComplete
 Job Id: 750Oy00000Dh2IPIAZ
 Elapsed Time: 1.14s

🟡 use invalid id for resume

I would have expected an error about the ID not being valid, but instead got this. Not sure if that's right or not:

❯ sf data import resume --job-id 750Oy00000Dh2IPIAA
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.
Error (CannotCreateResumeOptionsWithoutAnOrgError): Cannot create a cache entry without a valid org.

🔴 links in a terminal that doesn't support links

You need to provide your own fallback so that terminal-link doesn't insert non-visible whitespace characters, which I think is what causes this issue. See https://github.com/salesforcecli/plugin-deploy-retrieve/blob/main/src/utils/deployStages.ts#L78
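
Something along these lines (a sketch of the terminal-link `fallback` option, not the exact code in deployStages.ts; the link text and `jobUrl` value are placeholders):

```ts
import terminalLink from 'terminal-link';

// When the terminal doesn't support hyperlinks, terminal-link calls this
// fallback instead of emitting escape sequences.
const jobUrl = 'https://example.my.salesforce.com/lightning/setup/AsyncApiJobStatus/...'; // placeholder
const link = terminalLink('Job Id', jobUrl, {
  fallback: (text, url) => `${text}: ${url}`,
});
```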

Oct-16-2024 12-53-54

🟢 async insert and resume with large csv

❯ sf data bulk import -f ~/repos/salesforcecli/plugin-data/test/test-files/data-project/data/bulkUpsertLarge.csv --sobject account --async
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────── Importing data (async) ─────────────

 ✔ Creating ingest job 3.29s

 Status: UploadComplete
 Job Id: 750Oy00000DgjAyIAJ
 Elapsed Time: 3.32s

Run "sf data import resume --job-id 750Oy00000DgjAyIAJ" to resume the operation.

~/repos/trailheadapps/dreamhouse-lwc on  main via ⬢ v20.15.0 took 5.7s
❯ sf data import resume --job-id 750Oy00000DgjAyIAJ
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────────── Importing data ─────────────────

 ◯ Creating ingest job - Skipped
 ✔ Processing the job 2m 30.23s
   ▸ Processed records: 76380
   ▸ Successful records: 76380
   ▸ Failed records: 0

 Status: JobComplete
 Job Id: 750Oy00000DgjAyIAJ
 Elapsed Time: 2m 30.79s

🟢 handles failures

❯ sf data bulk import -f ~/repos/salesforcecli/plugin-data/test/test-files/data-project/data/bulkUpsert.csv --sobject account --line-ending CRLF --wait 10
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────────── Importing data ─────────────────

 ✔ Creating ingest job 3ms
 ✘ Processing the job 11.83s
   ▸ Processed records: ✘
   ▸ Successful records: ✘
   ▸ Failed records: ✘

 Status: Failed
 Job Id: 750Oy00000Dh2gdIAB
 Elapsed Time: 13.42s

Error (JobFailedError): Job failed to be processed due to:
ClientInputError : LineEnding is invalid on user data. Current LineEnding setting is CRLF

To review the details of this job, run:
sf org open --target-org test-f7zxeasp0pzx@example.com --path "/lightning/setup/AsyncApiJobStatus/page?address=%2F750Oy00000Dh2gdIAB"

@cristiand391 (Member, Author) commented:

It works as expected but Failed records is left with an unresolved spinner

Should be fixed now (`data import resume` was already handling it correctly; all the MSO stuff is merged into one class now).

🟡 bulk import
Question: is it supposed to be async by default?

Yes, same as data bulk delete/upsert; I can't find why that decision was made.
`data import resume` does have a default wait time on purpose:
#1091 (comment)

🟡 use invalid id for resume

Fixed by writing a separate cache resolver for `data import`, like we did for `data export`.

🔴 links in a terminal that doesn't support links

Added the fallback.

@cristiand391 (Member, Author) commented Oct 17, 2024:

@mdonnalley this is ready for review/QA again, thanks!

Also, please squash-merge the PR when it's good to go (there are a bunch of commits in my branch).

EDIT:
⚠️ don't merge until Juliet reviews the msgs/help text.

`job.poll` is in a try/catch; when a failure happens it throws but also emits `error`. We avoid stopping MSO on `error` because we want to send a last update in the `catch` block.
@mdonnalley (Contributor) commented:

🟢 bulk import with --wait

❯ sf data bulk import -f ~/repos/salesforcecli/plugin-data/test/test-files/data-project/data/bulkUpsert.csv --sobject account --wait 10

 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.

 ───────────────── Importing data ─────────────────

 ✔ Creating ingest job 1.49s
 ✔ Processing the job 11.32s
   ▸ Processed records: 10
   ▸ Successful records: 10
   ▸ Failed records: 0

 Status: JobComplete
 Job Id: 750Ei00000EQhxKIAT
 Elapsed Time: 12.93s

🟢 use invalid id for resume

❯ sf data import resume --job-id 750Ei00000EQjmGIAz
 ›   Warning: @salesforce/plugin-data is a linked ESM module and cannot be auto-transpiled. Existing compiled source will be used
 ›   instead.
Error (BulkRequestIdNotFoundError): Could not find a cache entry for job ID 750Ei00000EQjmGIAz.

🟢 links in terminal that doesn't support links

@mdonnalley mdonnalley merged commit b379335 into main Oct 18, 2024
11 checks passed
@mdonnalley mdonnalley deleted the cd/data-import-bulk branch October 18, 2024 17:09