Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested Parent-to-Child Relationship Query Hangs #35

Closed
yongks opened this issue Sep 2, 2019 · 8 comments
Closed

Nested Parent-to-Child Relationship Query Hangs #35

yongks opened this issue Sep 2, 2019 · 8 comments
Assignees
Labels
bug Unintended behavior that should be corrected could not reproduce Could not be reproduced question or help Clarification or help may suffice to resolve
Milestone

Comments

@yongks
Copy link

yongks commented Sep 2, 2019

Thanks for the great library. Had been using bulk query for a while.

I hit into maximum limit of records returned per API call (2000 rows). The workaround is to break down into multiple queries to work around the limitation.

I had used python library simple-salesforce before, and never hit into such limitation. With some research, it seems python library implemented query more feature (pagination) which enable it to retrieve up to 50,000 rows per API call. The research leads me to:
.
https://help.salesforce.com/articleView?id=000339353&type=1&mode=1

Not sure if there is a plan to implement this here. Thanks.

@StevenMMortimer StevenMMortimer self-assigned this Sep 2, 2019
@StevenMMortimer
Copy link
Owner

StevenMMortimer commented Sep 2, 2019

@yongks Thanks for making a note. The QueryMore method is already implemented within the sf_query() function, so there shouldn't be a limitation like you are suggesting. It is a recursive function (calls itself) if the query response "done" attribute is not TRUE. If not done, then the function uses the nextRecordsUrl, or queryLocator as it's referred to in the SOAP API, to paginate through to the end of the query.

I cannot replicate the issue, but if you want to test the recursive behavior and report back any findings that would be great. The source code you can run is available here: https://github.com/StevenMMortimer/salesforcer/blob/master/R/query.R

@yongks
Copy link
Author

yongks commented Sep 3, 2019

Thanks for the quick response. I tested queryall=TRUE on the below SOQL (notice that there is an aggregated result "(Select ..... From FeedComments) returning dataframe within dataframe. This is the reason why I have to use 'REST' and not 'Bulk 1.0'.

My results are:
a. Data Range 2018-01-10 to 2018-01-17 (good), 640 rows (excluding df within FeedComments)
b. Data Range 2018-01-10 to 2018-01-18 (hang), additional 90 rows (excluding df within FeedComments)

case_feed_soql = sprintf(" 
       SELECT Id, CreatedDate, CreatedById, Body, ParentId, Parent.CaseNumber,
                   (SELECT Id, FeedItemId, ParentId, CreatedDate, CreatedById, CommentBody FROM FeedComments )  
            FROM CaseFeed 
        WHERE   
            Type = 'TextPost' AND 
            CreatedDate >= 2018-01-17T00:00:00Z AND 
            CreatedDate <   2018-01-18T00:00:00Z
        ORDER BY CreatedDate ASC
  ")
  
  case_feed.df    = sf_query(case_feed_soql,queryall = TRUE)```

@StevenMMortimer
Copy link
Owner

I think you believe the queryall argument is doing something different than what it's actually doing. That argument allows you to return archived and deleted records in the query resultset. It does not literally mean "include all results of the query".

In the example you are showing the issue is not the number of records, but the structure of a few specific records that you're trying to query, which have some quirk about them that makes them difficult to parse and are causing the process to hang. Queries are tough to debug for me because I don't have access to your org or your data to see what the root cause of the issue is. There are a few routes to go, all of which depend on the level of data you are allowed to share with me or publicly:

  1. Try switching to the SOAP API by setting api_type="SOAP" to see if it can better handle the parsing
  2. You look at the code that's behind sf_query() and run it step-by-step. Take the output of the httr_response object on Line 112, save it as an .rds object with saveRDS(), and email it to me for further debugging on how to parse the JSON or XML (depending on whether you use the REST or SOAP API). Or you are welcome to work on the parsing yourself.
  3. You can invite me to your organization so I can access the data to test with it.

If you need to send something in a less public forum, you are welcome to email to me details at reportmort@gmail.com.

@StevenMMortimer StevenMMortimer added bug Unintended behavior that should be corrected could not reproduce Could not be reproduced question or help Clarification or help may suffice to resolve labels Sep 3, 2019
@StevenMMortimer StevenMMortimer changed the title Query More than 2000 records per API call Nested Query Hangs Sep 3, 2019
@StevenMMortimer
Copy link
Owner

@yongks Please let me know if you're still having issues. Otherwise, I'll close this issue since I can't reproduce or help without more information. Thanks.

Note for future reference: This issue seems related to #19

@StevenMMortimer
Copy link
Owner

It could be related to something like this since we are not checking the child query for "done" simple-salesforce/simple-salesforce#311

@StevenMMortimer StevenMMortimer changed the title Nested Query Hangs Nested Relationship Query Hangs Sep 14, 2019
@StevenMMortimer StevenMMortimer changed the title Nested Relationship Query Hangs Nested Parent-to-Child Relationship Query Hangs Sep 14, 2019
@BrendanWhittle
Copy link

Hi Guys I have just noticed this and it seems to be to be related to the amount of data you are pulling. When I include a Long Text Area field in my sf_query then my max limit of records that I can pull drops from 2000 to 250. anymore than this and the query hangs. I'm not trying to pull any child parent relationships.

@StevenMMortimer
Copy link
Owner

@BrendanWhittle Thanks so much for adding this context and clue. I'll have to check and see if there are differences in how the query status is marked or the next records URL is provided whenever pulling Long Text Area fields. Maybe the current code is looking in the wrong place for those items to continue paginating in those types of queries.

@StevenMMortimer
Copy link
Owner

@yongks @BrendanWhittle Apologize for the extremely long delay. This is well overdue, but finally got to the heart of the issue with some help from #54 and refactoring the REST and SOAP query functions. The fix is already on GitHub and will be included in the new release of the package (v0.2.0) coming in the next few days. Upgrade your version of {salesforcer} package to (>= 0.1.4.9999) and you should be good to go. Thanks for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended behavior that should be corrected could not reproduce Could not be reproduced question or help Clarification or help may suffice to resolve
Projects
None yet
Development

No branches or pull requests

3 participants