Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra if in pseudocode fetch all docs #2691

Merged
merged 1 commit into from
Sep 4, 2024
Merged

Conversation

lbercken
Copy link
Contributor

@lbercken lbercken commented Sep 3, 2024

Description

"You can repeat this process until you’ve fetched as many docs as you want, or until the nextCursorMark returned matches the cursorMark you’ve already specified — indicating that there are no more results."

The pseudo code misses the first part.

Solution

Add an extra if for checking whether the number of documents in the result is less than the specified rows. The extra if also prevents an unnecessary extra request.

It could also be shortened, by replacing the two ifs by:

$done = count($results[response][docs]) < $r || $params[cursorMark] == $results[nextCursorMark]

I don't know whether that's good for readability.

In addition this could be transformed into a do while statement.

Tests

None. This is a documentation change.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

Not all of the above is checked, because it is a documentation change.

"You can repeat this process until you’ve fetched as many docs as you want, or until the nextCursorMark returned matches the cursorMark you’ve already specified — indicating that there are no more results."

The pseudo code misses the first part. The extra if also prevents an unnecessary extra request.
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 3, 2024
Copy link
Contributor

@epugh epugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think pseudo code by it's nature can be lengthier if that helps clarity!

@epugh epugh self-assigned this Sep 4, 2024
@epugh epugh merged commit 1ce59fa into apache:main Sep 4, 2024
3 checks passed
epugh pushed a commit that referenced this pull request Sep 4, 2024
"You can repeat this process until you’ve fetched as many docs as you want, or until the nextCursorMark returned matches the cursorMark you’ve already specified — indicating that there are no more results."

The pseudo code misses the first part. The extra if also prevents an unnecessary extra request.
@epugh
Copy link
Contributor

epugh commented Sep 4, 2024

Thank you!

@hossman
Copy link
Member

hossman commented Sep 4, 2024

Hmmmm.... IMO this change does not seem like an overall improvement to the doc

First off, the premise of the PR seems to conflate two diff scenarios...

"You can repeat this process until you’ve fetched as many docs as you want, or until the nextCursorMark returned matches the cursorMark you’ve already specified — indicating that there are no more results."
The pseudo code misses the first part.

There is a difference between the "until you’ve fetched as many docs as you want" scenario (ie: "I as a client want to fetch docs from solr until i have enough to satisfy my purposes" - there is a separate example for that later in the doc) and "solr ran out of documents to return" scenario being demonstrated here.

Adding a "did solr return the full number of rows requested?" check to the code doesn't do anything to demonstrate the first scenario

Second:
The original psuedo-code here explicitly did NOT check for "did solr return the full number of rows requested?" because doing so is only a viable way to optimize away the lsat request in static indexes. In a "Tailing a Cursor" type scenario you absolutely should not check the number of docs returned -- this is a type of use case that is introduced later in the document, and the psuedocode in both sections was designed to be structuraly the same -- except for the addition of loop-forever/sleep wrapped around it

If we're going to change this pseudo code block to do an explicit rows check, then the example should probably note it's an optimization, and the other psuedo code example in the later section on "tailing a cursor" should have a note that you can't use that optimization in this situation.

Third:
This PR only changed the psuedo-code example, even though the very next line after the example says...

Using SolrJ, this pseudocode would be:

...but now the psuedo-code and SolrJ code no longer match.

Likewise the sequential "curl" example (following the SolrJ example) no longer mtaches either, because it still does a final request looking or an unchanged nextCursorMark (even though it notes the previous request request returned less docs then the rows param)

@epugh
Copy link
Contributor

epugh commented Sep 4, 2024

I knew I was going to get in trouble with pseudo code...! I'll roll this back.
One thought, does the pseudo code even help convey thigns? I sort of wondered if it was useful at all.... Thoughts?

@epugh
Copy link
Contributor

epugh commented Sep 4, 2024

and we can fix the q=:&rows=5&start=0&sort=name asc, id asc&cursorMark=* where the q param looks badly formatted..

@hossman
Copy link
Member

hossman commented Sep 4, 2024

I knew I was going to get in trouble with pseudo code...!

:) ... you just gotta remember to review the whole doc holistically.

One thought, does the pseudo code even help convey thigns? I sort of wondered if it was useful at all.... Thoughts?

I don't have a strong opinion ... The idea once upon a time was to avoid the implication to new users that they had to use SolrJ (or python, or whatever) to achieve certain goals.

If you want to rip out the psuedo-code i'm fine with that, but it means re-rwiting all the other examples to use SolrJ (and i don't think this is the only page in the ref-guide with psuedo code?)

epugh added a commit that referenced this pull request Sep 7, 2024
@epugh
Copy link
Contributor

epugh commented Sep 7, 2024

@lbercken I reverted the commit, but would love to try again if you can take @hossman feedback into account...I thought I would be able to jsut "reopen" this PR, but that doesnt' appear to be how it works!

epugh added a commit that referenced this pull request Sep 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants