Enable PKChunking for sf_query_bulk #46

JordanLewis312 · 2020-01-23T15:24:41Z

I am attempting to query a large table (~30M records) from Salesforce to R using the SalesforceR package, and hope to do so without splitting up into many queries.

The following query succeeds, if I limit the records:

OrderItemsSF <- sf_query_bulk("SELECT Id,Order,Product,Quantity FROM OrderItems LIMIT 1000000",object_name = "OrderItems", api_type = "Bulk 1.0", verbose = TRUE, max_attempts = 1000)

This query fails ("Error: column name 'result' must not be duplicated") whenever it takes longer than 10 minutes, which fits with SF's documentation saying bulk queries retry after 10 minutes. I believe the answer is to enable PKChunking to automatically separate my query into smaller batches, but I am having trouble finding a working syntax for this.

I have tried the following:

OrderItemsSF <- sf_query_bulk("SELECT Id,Order,Product,Quantity FROM OrderItems LIMIT 1000000",object_name = "OrderItems", api_type = "Bulk 1.0", verbose = TRUE, max_attempts = 1000, control = sf_control(PKChunkingHeader = list(Sforce-Enable-PKChunking= TRUE)))

This results in the error "Error in catch_errors(httr_response) : ClientInputError: Sforce-Enable-PkChunking doesn't have a valid value. The same error results from using FALSE, and according to the documentation (see link 2 at bottom), FALSE is the default for this parameter, so that doesn't make sense!

Other syntax attempts, like the one below, succeed on queries under 10 minutes and fail over 10 minutes, suggesting that R is ignoring this text and not actually PKChunking:

OrderItemsSF <- sf_query_bulk("SELECT Id,Order,Product,Quantity FROM OrderItems LIMIT 1000000",object_name = "OrderItems", api_type = "Bulk 1.0", verbose = TRUE, max_attempts = 1000, control = list(Sforce-Enable-PKChunking= TRUE))

I'm at a loss for what seems like a simple syntax issue, and can't find any specific examples online of PKChunking in a SalesforceR query, despite the documentation saying this can be done. I'd greatly appreciate some guidance here.

The text was updated successfully, but these errors were encountered:

StevenMMortimer · 2020-01-26T01:17:21Z

First, the issue may not be related to PKChunking. If you are seeing "Error: column name 'result' must not be duplicated" that is a dplyr error when there are multiple columns with the same name in the resulting CSV. It is possible that your query is completing, but the results are not being parsed correctly back into R when it tries to load the recordset. If you go to https://na1.salesforce.com/750, are you able to see that the query finished successfully? (replace "na.1" in the URL with your Salesforce domain)

Next, there are two ways to pass control arguments:

# Method 1
result <- sf_query("SELECT Id, Order, Product, Quantity FROM OrderItems LIMIT 1000000",
                   object_name = "OrderItems",
                   PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE), 
                   api_type = "Bulk 1.0")

# Method 2
result <- sf_query("SELECT Id, Order, Product, Quantity FROM OrderItems LIMIT 1000000",
                   object_name = "OrderItems",
                   control = sf_control(PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE)), 
                   api_type = "Bulk 1.0")

This appears to match the first style you gave. However, I have not tested each header parameter individually so it is possible that it is being passed, just not be in the way Salesforce wants it (e.g. Sforce-Enable-PKChunking: chunkSize=50000; startRow=00130000000xEftMGH). If you turn on the verbose parameter, does the printed header look like the example above? Have you tried specifying the chunk size? Like this:

result <- sf_query("SELECT Id, Order, Product, Quantity FROM OrderItems LIMIT 1000000",
                   object_name = "OrderItems",
                   control = sf_control(PKChunkingHeader = list(chunkSize = 50000)), 
                   api_type = "Bulk 1.0")

That example above should show something like Sforce-Enable-PKChunking: chunkSize=50000 in the printed verbose message.

Lastly, the second style you tried out to pass the argument is not the correct syntax because it doesn't tell the control argument that the list you are passing is for setting the PKChunkingHeader header, so I would expect that to be either ignored or fail.

Fix so that a character version of the logical TRUE/FALSE will be properly formatted. This arose from having to handle TRUE/FALSE as valid values but also the elements in the list themselves that need to be formatted and imply TRUE. Fixes #46

Update the process by which the Bulk 1.0 query functions pull down results because it is different when the data is PKChunked #46

StevenMMortimer · 2020-07-20T00:08:08Z

@JordanLewis312 Apologize for the extremely long delay, but finally got to the heart of the issue. It had two distinct parts:

First, the value to turn on PKChunking can either be TRUE/FALSE or details like chunkSize that imply TRUE, which wasn't being handled well by the code.
Second, when users invoke PKChunking Salesforce splits up the query results into batches. This is not typical for Bulk 1.0 queries which usually return all their results in a single CSV, so I had to rewrite some functions to account for queries that appear to have been PKChunked.

This fix is already on GitHub and will be included in the new release of the package (v0.2.0) coming in the next few days. Upgrade your version of {salesforcer} package to (>= 0.1.4.9999), then you can invoke like this:

library(salesforcer)
sf_auth()
contacts <- sf_query("SELECT Id, Name FROM Contact", 
                     object_name = "Contact", 
                     api_type = "Bulk 1.0",
                     PKChunkingHeader = list(`Sforce-Enable-PKChunking` = TRUE),
                     # alternatively you can turn on using specific chunking options like this ...
                     # PKChunkingHeader = list(`chunkSize` = 500),
                     interval_seconds = 10,
                     max_attempts = 200)
contacts
#> # A tibble: 383 x 2
#>    Id                 Name                           
#>    <chr>              <chr>                                     
#> 1 0033s000012NdFhAAK John Doe
#> 2 0033s000012NkzwAAC Jane Doe
#> 3 0033s000012NkzxAAC Jane Doe
#> 4 0033s000012NkzyAAC Jane Doe
#> 5 0033s000012NkzzAAC Jane Doe
#> # … with 378 more rows

As a heads up, I'll also answer your post over on StackOverflow in case others stumble across it there (https://stackoverflow.com/questions/59863547/salesforcer-enable-pkchunking-for-salesforce-bulk-query-in-r)

Fix so that a character version of the logical TRUE/FALSE will be properly formatted. This arose from having to handle TRUE/FALSE as valid values but also the elements in the list themselves that need to be formatted and imply TRUE. Fixes #46

Update the process by which the Bulk 1.0 query functions pull down results because it is different when the data is PKChunked #46

StevenMMortimer self-assigned this Jan 26, 2020

StevenMMortimer added the question or help Clarification or help may suffice to resolve label Jan 26, 2020

StevenMMortimer closed this as completed Apr 22, 2020

StevenMMortimer added the bug Unintended behavior that should be corrected label Jul 18, 2020

StevenMMortimer added this to the 0.2.0 milestone Jul 18, 2020

StevenMMortimer added a commit that referenced this issue Jul 19, 2020

Update Bulk Query to parse PKChunking query batches

fae3c49

Update the process by which the Bulk 1.0 query functions pull down results because it is different when the data is PKChunked #46

StevenMMortimer added a commit that referenced this issue Jul 21, 2020

Update Bulk Query to parse PKChunking query batches

222bd5d

Update the process by which the Bulk 1.0 query functions pull down results because it is different when the data is PKChunked #46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable PKChunking for sf_query_bulk #46

Enable PKChunking for sf_query_bulk #46

JordanLewis312 commented Jan 23, 2020

StevenMMortimer commented Jan 26, 2020

StevenMMortimer commented Jul 20, 2020

Enable PKChunking for sf_query_bulk #46

Enable PKChunking for sf_query_bulk #46

Comments

JordanLewis312 commented Jan 23, 2020

StevenMMortimer commented Jan 26, 2020

StevenMMortimer commented Jul 20, 2020