Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing pagination for _cat/indices API #14718

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

gargharsh3134
Copy link
Contributor

@gargharsh3134 gargharsh3134 commented Jul 11, 2024

Description

Changes include:

  1. Introducing new _list/indices API which will display paginated results.
  2. Since, for paginated responses, next_token and what element is getting paginated will also have to be displayed, RestTable and Table classes have been extended to accommodate pagination related metadata
  3. New RestListAction (similar to RestCatAction), to output all _list APIs.
  4. Support of paginationStrategies and PageTokens. Different Pagination Strategies can implement PaginationStragety interface. The interface has been kept very open, considering the strategies can greatly vary in terms of what all parameters they might require to parse a token and generate a list of page from a data store (for e.g. ClusterState is a data store for IndexBasedPaginationStrategy, which has been added as part of this PR).

Functional Testing on a local cluster:

_list API:


gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list"       
:‑|
/_list/indices
/_list/indices/{index}


All the indices in the cluster

gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_cat/indices"                                                                                               
green open test-index  FclPNwqmQie3RnpoOYpLTg 2 3 0 0 1.6kb 416b
green open test-index2 k5FNOQcvSV679Hcs1g-OwQ 2 3 0 0 1.6kb 416b
green open index4-test s2AJ5CoYRru-8lbB871pWA 2 3 0 0 1.6kb 416b


Retrieving indices in pages of size=1, using _list API, in plain text format:

gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list/indices?size=1"         
green open index4-test s2AJ5CoYRru-8lbB871pWA 2 3 0 0 1.6kb 416b
next_token MSQxNzE5NDI3NDk3NDQ0JDE3MTk0Mjc1NTYzODEkaW5kZXg0LXRlc3Q=

gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list/indices?size=1&next_token=MSQxNzE5NDI3NDk3NDQ0JDE3MTk0Mjc1NTYzODEkaW5kZXg0LXRlc3Q="
green open test-index2 k5FNOQcvSV679Hcs1g-OwQ 2 3 0 0 1.6kb 416b
next_token MiQxNzE5NDI3NTM3MzcwJDE3MTk0Mjc1NTYzODEkdGVzdC1pbmRleDI=

gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list/indices?size=1&next_token=MiQxNzE5NDI3NTM3MzcwJDE3MTk0Mjc1NTYzODEkdGVzdC1pbmRleDI="
green open test-index FclPNwqmQie3RnpoOYpLTg 2 3 0 0 1.6kb 416b
next_token null

Retrieving indices in pages of size=1, using _list API, in JSON format:

gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list/indices?size=1&format=json&pretty"                                                 
{
  "next_token" : "MSQxNzE5NDI3NDk3NDQ0JDE3MTk0Mjc1NTYzODEkaW5kZXg0LXRlc3Q=",
  "indices" : [
    {
      "health" : "green",
      "status" : "open",
      "index" : "index4-test",
      "uuid" : "s2AJ5CoYRru-8lbB871pWA",
      "pri" : "2",
      "rep" : "3",
      "docs.count" : "0",
      "docs.deleted" : "0",
      "store.size" : "1.6kb",
      "pri.store.size" : "416b"
    }
  ]
}
gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list/indices?size=1&format=json&pretty&next_token=MSQxNzE5NDI3NDk3NDQ0JDE3MTk0Mjc1NTYzODEkaW5kZXg0LXRlc3Q="
{
  "next_token" : "MiQxNzE5NDI3NTM3MzcwJDE3MTk0Mjc1NTYzODEkdGVzdC1pbmRleDI=",
  "indices" : [
    {
      "health" : "green",
      "status" : "open",
      "index" : "test-index2",
      "uuid" : "k5FNOQcvSV679Hcs1g-OwQ",
      "pri" : "2",
      "rep" : "3",
      "docs.count" : "0",
      "docs.deleted" : "0",
      "store.size" : "1.6kb",
      "pri.store.size" : "416b"
    }
  ]
}
gkharsh@bcd07443f159 OpenSearch % curl "localhost:9200/_list/indices?size=1&format=json&pretty&next_token=MiQxNzE5NDI3NTM3MzcwJDE3MTk0Mjc1NTYzODEkdGVzdC1pbmRleDI="
{
  "next_token" : null,
  "indices" : [
    {
      "health" : "green",
      "status" : "open",
      "index" : "test-index",
      "uuid" : "FclPNwqmQie3RnpoOYpLTg",
      "pri" : "2",
      "rep" : "3",
      "docs.count" : "0",
      "docs.deleted" : "0",
      "store.size" : "1.6kb",
      "pri.store.size" : "416b"
    }
  ]
}

Related Issues

Resolves #14258

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added Cluster Manager enhancement Enhancement or improvement to existing feature or request labels Jul 11, 2024
Copy link
Contributor

❌ Gradle check result for 4924a71: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

github-actions bot commented Aug 5, 2024

❌ Gradle check result for 0cfb0e9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

❌ Gradle check result for d665722: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@dblock
Copy link
Member

dblock commented Aug 29, 2024

This looks like a great start.

I would consider (don't feel strongly about it though) wrapping additional metadata into metadata to support other fields like total count.

{
  "metadata": {
      "next_token" : "MSQxNzE5NDI3NDk3NDQ0JDE3MTk0Mjc1NTYzODEkaW5kZXg0LXRlc3Q=",
      "total_count": 200
   },
  "indices" : [
    {
      "health" : "green",
      "status" : "open",
      "index" : "index4-test",
      "uuid" : "s2AJ5CoYRru-8lbB871pWA",
      "pri" : "2",
      "rep" : "3",
      "docs.count" : "0",
      "docs.deleted" : "0",
      "store.size" : "1.6kb",
      "pri.store.size" : "416b"
    }
  ]
}

From the implementation POV, can we build RestIndicesListAction to be a lot thinner where all pagination parts are generic except data fetching? I think a goal should be that to opt into _list/* one adds a ListAction interface to some existing classes and fills out the parts that actually fetch a page of data given a token and that's it. Otherwise it's too easy to accidentally start returning a different object from a non-paginated API vs. a paginated API, which I think is a red flag.

final boolean includeUnloadedSegments = request.paramAsBoolean("include_unloaded_segments", false);
final String requestedToken = request.param("next_token");
final int pageSize = Integer.parseInt(request.param("size", DEFAULT_LIST_INDICES_PAGE_SIZE_STRING));
final String requestedSortOrder = request.param("sort", "ascending");
Copy link
Member

@dblock dblock Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that I think should be thinned out. Instead of receiving a RestRequest it should be a PaginatedRestRequest where a bunch of parameters that are not list index specific (token, etc.) were parsed outside of RestIndicesListAction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dblock for the feedback. Makes sense to declutter this and also avoid duplication of code. Let me think through a bit more on, if and how can this be achieved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have moved things to a common place now. RestIndicesList action is now directly extending RestIndicesAction and all the existing parameters are hence only getting parsed at one place. While, the pagination related params can be parsed individually by the list/paginated APIs.

// Fetch all the indices from clusterStateRequest for a paginated query.
RestIndicesAction.RestIndicesActionCommonUtils.sendClusterStateRequest(
indices,
IndicesOptions.lenientExpandHidden(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one should be from the request as the cluster-state will be source of truth to resolve the user expression.


@Override
public RestChannelConsumer doListRequest(final RestRequest request, final NodeClient client) {
final String[] indices = Strings.splitStringByCommaToArray(request.param("index"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the request parameters should be common for cat and list indices and should be parsed in a single place. The only additional request parameter the list action should parse is paginatedToken.

Harsh Garg added 2 commits September 3, 2024 09:12
Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

github-actions bot commented Sep 3, 2024

❌ Gradle check result for 2bc80f0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

*/
public class RestIndicesListAction extends RestIndicesAction {

protected static final int MAX_SUPPORTED_LIST_INDICES_PAGE_SIZE_STRING = 1000;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this as a placeholder for now. Need to decide on whether should we impose such limits, and if yes, what should be the values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can test to see what should be reasonable default in terms of latency,

indicesToBeQueried,
paginationMetadata
);
groupedListener.onResponse(clusterStateResponse);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment below needs an update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is GetSettings call will require indicesOptions from the request itself. Other calls aren't resolving the wildcards in a way GetSettings call does. The existing tests start to fail when GetSettings call is made with lenient indicesOptions. Need to check on how to call this behaviour out now

server/src/main/java/org/opensearch/common/Table.java Outdated Show resolved Hide resolved
…oken interface

Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

github-actions bot commented Sep 4, 2024

❌ Gradle check result for f1b5016: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

private final PaginatedQueryResponse paginatedQueryResponse;
private final List<String> indicesFromRequestedToken;

public IndexBasedPaginationStrategy(PaginatedQueryRequest paginatedQueryRequest, String paginatedElement, ClusterState clusterState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructor seems to have the core implementation of retrieving the page data. Should we move to a seperate init(clusterState) method and have construtor for only extracting paginatedQueryRequest and setting up member variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified the logic and moved out the implementation to private methods. Please see if the concern is now addressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It might be good to define a static method retriveIndexPageFromClusterState (clusterState, lastRetrievedPage): IndexPage given that constructor is processing the core logic
  • paginatedElement can be a static constant and need not be passed by caller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.


public String generateEncryptedToken() {
return PaginationStrategy.encryptStringToken(
posToStartPage + "$" + creationTimeOfLastRespondedIndex + "$" + queryStartTime + "$" + nameOfLastRespondedIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • could the nameOfLastRespondedIndex contain the seperator $ ?
  • can we define constants to map the position of token to fields ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the nameOfLastRespondedIndex contain the seperator $ ?

Yeah, sorry this was one of the open point. "$" is an allowed index name, and was kept as a placeholder.

Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

❌ Gradle check result for 0621975: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

sortedIndicesList.size()
);
return new PaginatedQueryResponse(
positionToStartNextPage >= sortedIndicesList.size()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this check required given that you are performing min above ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised implementation has a slightly different logic, please check.

private final String sort;
private final int size;

public PaginatedQueryRequest(String requested_token, String sort, int size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: underscore in variable name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

*/
public interface PaginationStrategy<T> {

String DESCENDING_SORT_PARAM_VALUE = "descending";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ascending and descending be enum constants ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just a static string should be sufficient. While defining the specs, we can call this out as an ENUM and provide the values inside specs itself. Please let me know if it looks incorrect.

private final PaginatedQueryResponse paginatedQueryResponse;
private final List<String> indicesFromRequestedToken;

public IndexBasedPaginationStrategy(PaginatedQueryRequest paginatedQueryRequest, String paginatedElement, ClusterState clusterState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It might be good to define a static method retriveIndexPageFromClusterState (clusterState, lastRetrievedPage): IndexPage given that constructor is processing the core logic
  • paginatedElement can be a static constant and need not be passed by caller

paginatedQueryRequest.getSort()
);
this.indicesFromRequestedToken = getIndicesFromRequestedToken(sortedIndicesList, paginatedQueryRequest);
this.paginatedQueryResponse = getPaginatedResponseFromRequestedToken(sortedIndicesList, paginatedQueryRequest, paginatedElement);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not the case that indicesFromRequestedToken is the data corresponding to the page. Why are we passing sortedIndicesList again to getPaginatedResponseFromRequestedToken ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the new logic to get the paginated response object and let me know if it's still a concern.

* Utility method to get list of indices sorted by their creation time.
*/
static List<IndexMetadata> getListOfIndicesSortedByCreateTime(final ClusterState clusterState, String sortOrder) {
return clusterState.metadata().indices().values().stream().sorted((metadata1, metadata2) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define two comparatorFn -> asc and desc and use it based on DESCENDING_SORT_PARAM_VALUE. This is to avoid branching inside the comparatorFn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have defined a new comparator, but still kept only a single method. Since it's just one comparison that changes, thought it's better to avoid code duplication. But still open to defining two separate methods, if you feel that would bring out more clarity

if (!Objects.equals(currentIndexAtLastSentPosition.getIndex().getName(), requestedToken.nameOfLastRespondedIndex)) {
// case denoting already responded index/indices has/have been deleted/added in between the paginated queries.
// find the index whose creation time is just after/before (based on sortOrder) the index which was last responded.
if (!DESCENDING_SORT_PARAM_VALUE.equals(paginatedQueryRequest.getSort())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets define two different nextPageFn based on sort order. We can link them with sort enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have defined a new predicate for filtering and comparator for sorting, please check again.

Comment on lines 99 to 103
while (requestedPageStartIndexNumber > 0
&& (sortedIndicesList.get(requestedPageStartIndexNumber - 1)
.getCreationDate() > requestedToken.creationTimeOfLastRespondedIndex)) {
requestedPageStartIndexNumber--;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Is it guranteed that the prevPage returned had all the elements less than (or) equal to creationTimeOfLastRespondedIndex ? Can we add it to comment and relevant method in comment.

Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

❌ Gradle check result for 3bedc82: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -57,9 +59,11 @@
*/
public abstract class AbstractCatAction extends BaseRestHandler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we want to change AbstractCatAction at all? we can keep the changes in AbstractListAction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added AbstractListAction which is now extending AbstractCatAction as discussed, please see if it resolves the concern now.

Comment on lines 70 to 71
if (!Objects.equals(paginatedQueryRequest.getSort(), "ascending")
&& !Objects.equals(paginatedQueryRequest.getSort(), "descending")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use asc or desc and also use constants instead of hard coded values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

return sort;
}

public String getRequestedTokenStr() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRequestedToken remove Str

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

* Class specific to paginated queries, which will contain common query params required by a paginated API.
*/
@PublicApi(since = "2.17.0")
public class PaginatedQueryRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just PaginatedRequestParams

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to PageParams as discussed.

* @return boolean denoting whether the RestAction will output paginated responses or not.
* Is kept false by default, every paginated action to override and return true.
*/
public boolean isActionPaginated() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to add AbstractListAction instead of adding in AbstractCatAction as list are separate APIs anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

this.paginatedQueryResponse = getPaginatedResponseForRequestedToken(paginatedQueryRequest.getSize(), sortedIndicesList);
}

private static Predicate<IndexMetadata> getMetadataListFilter(String requestedTokenStr, String sortOrder) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getIndicesFilter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

return indexMetadata -> {
if (indexMetadata.getIndex().getName().equals(requestedToken.nameOfLastRespondedIndex)) {
return false;
} else if (indexMetadata.getCreationDate() == requestedToken.creationTimeOfLastRespondedIndex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

creationTimeOfLastRespondedIndex - LastIndexCreationTime
nameOfLastRespondedIndex. - LastIndexName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

};
}

private static Comparator<IndexMetadata> getMetadataListComparator(String sortOrder) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getIndexComparator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

};
}

private List<IndexMetadata> getMetadataListForRequestedToken(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getIndicesForToken

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in the revision.

return sortedIndicesList.subList(0, Math.min(paginatedQueryRequest.getSize(), sortedIndicesList.size()));
}

private PaginatedQueryResponse getPaginatedResponseForRequestedToken(int pageSize, List<IndexMetadata> sortedIndicesList) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getResponseForToken

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not returning actual response?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this is just for returning the token. Renamed appropriately.

Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Signed-off-by: Harsh Garg <gkharsh@amazon.com>
Copy link
Contributor

❌ Gradle check result for 8a39c52: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cluster Manager enhancement Enhancement or improvement to existing feature or request v2.17.0
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[Paginating ClusterManager Read APIs] Paginate _cat/indices API.
4 participants