
feat: data export bulk/resume #1035

Merged: 52 commits into main from cd/bulk-export on Oct 8, 2024

Conversation

cristiand391 (Member) commented on Aug 16, 2024

What does this PR do?

This PR adds 2 new commands:

data export bulk

Allows bulk-exporting records as CSV or JSON.
Unlike `data query --bulk`, it can handle millions of records by using Node streams to write to a file.

forcedotcom/cli#1995

data export resume

Resumes an export operation started by `data export bulk`.

Testing notes

Check out the PR, build, and run `sf plugins link`.
You can query the ScratchOrgInfo object in our na40 hub (it has ~1.5M records); try exporting multiple fields in different formats and monitor memory usage.

What issues does this PR fix or reference?

@W-16486240@

jsforce's bulk2 query pagination keeps batches in memory and does
multiple passes (csv from API -> parse csv -> json -> back to csv).

`data export` commands will be doing the pagination manually to improve
memory consumption (we parse/write on each batch and drop it)
@cristiand391 cristiand391 marked this pull request as ready for review October 7, 2024 16:09
@cristiand391 cristiand391 requested a review from a team as a code owner October 7, 2024 16:09
isState: true,
filename: BulkExportRequestCache.getFileName(),
stateFolder: Global.SF_STATE_FOLDER,
ttl: Duration.days(7),
body,
headers,
};
}
cristiand391 (Member, Author) commented:

Small wrapper around jsforce's HttpApi class so we can make requests and get both the response body and headers.

`conn.request` only returns the body; we need the `sforce-locator` header to fetch the next batch.

};
}

export async function exportRecords(
cristiand391 (Member, Author) commented:

We can't use jsforce's `bulk2.query` because it fetches all batches into memory and does 2-3 passes (raw CSV from API -> parse -> back to CSV), so you quickly run out of memory on medium/big queries.

This helper function fetches one batch, parses it (only if output=json; the API returns batches as CSV, so for CSV output we can write it to the file directly), and drops it.
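A rough sketch of that fetch-parse-drop loop (hypothetical `fetchBatch`/`write` signatures; the real helper awaits each Bulk API 2.0 batch, shown synchronous here for brevity):

```typescript
// Hypothetical sketch of locator-based pagination: fetch one batch,
// write it out, drop it, then follow the locator to the next batch,
// so memory stays bounded regardless of the total record count.
type Batch = { records: string[]; locator?: string };

const exportAll = (
  fetchBatch: (locator?: string) => Batch, // stand-in for the Bulk API results call
  write: (records: string[]) => void // stand-in for the file stream
): number => {
  let total = 0;
  let locator: string | undefined;
  do {
    const batch = fetchBatch(locator);
    write(batch.records); // parse/write this batch...
    total += batch.records.length; // ...then let it be garbage-collected
    locator = batch.locator;
  } while (locator);
  return total;
};
```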

Resolved review thread: src/bulkUtils.ts (outdated)

if (!locator) {
// first write, start JSON array
jsonWritable.write(`[${EOL}`);
cristiand391 (Member, Author) commented:

We output all batches as a single JSON array of records; this line starts it.
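A minimal sketch of that streaming shape (hypothetical `JsonArrayWriter`, buffering to a string here instead of a file stream):

```typescript
import { EOL } from 'node:os';

// Hypothetical sketch: open the JSON array on the first write, append
// one stringified record per row, and close the array when the last
// expected record is written, never holding all records in memory.
class JsonArrayWriter {
  private out = '';
  private written = 0;

  public constructor(private readonly totalRecords: number) {}

  public write(record: object): void {
    if (this.written === 0) this.out += `[${EOL}`; // first write, start JSON array
    const isLast = this.written === this.totalRecords - 1;
    this.out += `  ${JSON.stringify(record)}${isLast ? `${EOL}]` : `,${EOL}`}`;
    this.written += 1;
  }

  public toString(): string {
    return this.out;
  }
}
```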

// eslint-disable-next-line no-await-in-loop
await pipeline(
Readable.from(res.body),
new csvParse({ columns: true, delimiter: ColumnDelimiter[outputInfo.columnDelimiter] }),
cristiand391 (Member, Author) commented:

Same options as jsforce (`columns: true`):
https://github.com/jsforce/jsforce/blob/main/src/csv.ts

`sf data export bulk` lets you specify the delimiter, so we pass it here for CSV parsing.

// eslint-disable-next-line @typescript-eslint/explicit-function-return-type
transform(chunk, _encoding, callback) {
if (recordsWritten === totalRecords - 1) {
callback(null, ` ${JSON.stringify(chunk)}${EOL}]`);
cristiand391 (Member, Author) commented:

If we're writing the last record, close the JSON array.

await pipeline(
locator
? [
Readable.from(res.body.slice(res.body.indexOf(EOL) + 1)),
cristiand391 (Member, Author) commented:

If we have a locator, we already wrote the first batch, so we skip the first line (the CSV header) of each subsequent batch so everything merges into one CSV file.
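That header-skipping step can be sketched as (hypothetical `appendBatch` helper, operating on strings rather than streams):

```typescript
import { EOL } from 'node:os';

// Hypothetical sketch: keep the header row only from the first batch;
// for every later batch, slice past the first line before appending,
// so all batches merge into one well-formed CSV.
const appendBatch = (merged: string, batch: string): string =>
  merged === ''
    ? batch // first batch: keep the CSV header
    : merged + batch.slice(batch.indexOf(EOL) + EOL.length); // skip duplicate header
```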

{
name: 'result-format',
// eslint-disable-next-line @typescript-eslint/require-await
when: async (flags): Promise<boolean> => flags['result-format'] === 'csv',
cristiand391 (Member, Author) commented on Oct 7, 2024:

Allow `--column-delimiter` only when exporting as CSV.

{
name: 'result-format',
// eslint-disable-next-line @typescript-eslint/require-await
when: async (flags): Promise<boolean> => flags['result-format'] === 'csv',
cristiand391 (Member, Author) commented:

ditto

export type DataExportBulkResult = {
jobId?: string;
totalSize?: number;
filePath: string;
cristiand391 (Member, Author) commented:

The async export returns `filePath` and `jobId`; the sync export returns `filePath` and `totalSize`.
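One way to express that split (a sketch, not the PR's actual type) is a discriminated pair instead of one type with optional fields:

```typescript
// Hypothetical alternative shape: each export mode gets exactly the
// fields it returns, and filePath is required in both.
type AsyncExportResult = { filePath: string; jobId: string };
type SyncExportResult = { filePath: string; totalSize: number };
type ExportResult = AsyncExportResult | SyncExportResult;

// The `in` check narrows the union, so no optional-field handling is needed.
const describeResult = (r: ExportResult): string =>
  'jobId' in r ? `async export, job ${r.jobId}` : `sync export, ${r.totalSize} records`;
```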

);

expect(totalQty).to.equal(recordCount);
}
cristiand391 (Member, Author) commented:

Test helper to ensure:

  1. a valid CSV file was written (it parses it)
  2. all records processed by the job were written

const lengthRes = await exec(`jq length ${filePath}`, { shell: 'pwsh' });

expect(parseInt(lengthRes.stdout.trim(), 10)).equal(totalqty);
}
cristiand391 (Member, Author) commented:

Test helper to ensure:

  1. a valid JSON file was written (jq fails if not)
  2. all records processed by the job were written
  3. the queried fields were written on each record

Resolved review threads:
- messages/data.export.bulk.md (3 threads, outdated)
- messages/data.export.resume.md
- package.json
- src/bulkDataRequestCache.ts (3 threads, two outdated)
} catch (err) {
const error = err as Error;
ms.stop(error);
throw err;
A contributor commented:

Maybe an enhancement to MultiStageOutput.stop() would be to accept an `unknown` so this pattern wouldn't be repeated all over the commands. It promotes an anti-pattern with the `err as Error` coercion.
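The suggestion could look roughly like this (a sketch; `MultiStageOutput` here is a stand-in, not the real class):

```typescript
// Hypothetical sketch: stop() accepts unknown and normalizes it, so
// callers can pass the raw catch value without an `err as Error` cast.
const toError = (err: unknown): Error =>
  err instanceof Error ? err : new Error(typeof err === 'string' ? err : JSON.stringify(err));

class MultiStageOutput {
  public stoppedWith?: Error;

  public stop(err?: unknown): void {
    if (err !== undefined) this.stoppedWith = toError(err);
  }
}
```

With this shape, the catch block shrinks to `ms.stop(err); throw err;` and real Error instances pass through unchanged.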

Resolved review thread: test/commands/data/export/bulk.nut.ts (outdated)
pollingOptions: { pollTimeout: 0, pollInterval: 0 },
} satisfies Pick<ResumeOptions['options'], 'operation' | 'query' | 'pollingOptions'>;

if (typeof jobIdOrMostRecent === 'boolean') {
cristiand391 (Member, Author) commented:

So I was doing overloads for this, but eslint suggested it was overkill 😛
https://typescript-eslint.io/rules/unified-signatures/

@shetzel shetzel merged commit 97be039 into main Oct 8, 2024
1 check passed
@shetzel shetzel deleted the cd/bulk-export branch October 8, 2024 22:48