Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Option for "track_total_hits" on AppSync Searchable Template #1079

Open
1 of 2 tasks
MtthwBrwng opened this issue Dec 8, 2022 · 1 comment
Open
1 of 2 tasks

Comments

@MtthwBrwng
Copy link

MtthwBrwng commented Dec 8, 2022

Describe the feature you'd like to request

Currently total response is capped at 10,000 since that's the maximum default. This appears to be due to a breaking changed that was made in ElasticSsearch 7.x which added "track_total_hits": true as an option. When set to false (and the current default) it is capped to 10,000, mainly from what I can understand due to performance tuning since to derive the accurate count you have to hit every record that is match vs 10,000 and stop. Reading through the documentation it also appears to be able to accept an Int to manually override the number of tracked hits.

Elastic Search Track Total Hit Docs

Describe the solution you'd like

Add an additional option to the @searchable directive- for example @searchable(trackTotalHits: true) to allow end user to specify the expected behavior. This'll allow the total field to return an accurate count.

Since track_total_hits can also accept a number possibly allow for setting the maximum tracked hits - for example @searchable(trackTotalHits: 25000) would limit the total tracked hits returned to 25,000

This'll allow end users to be as granular as needed for their own performance tuning/needs and not cause any breaking changes since it's an optional param that defaults to false currently.

Changes required appear to be in both amplify-graphql-searchable-transformer and the graphql-mapping-template from a quick look into the code.

Describe alternatives you've considered

Since I'm using Amplify the current solution is to manually override the generated .vtl resolver to include "track_total_hits": true, in the request to OpenSearch, as shown below.

#set( $args = $util.defaultIfNull($ctx.stash.transformedArgs, $ctx.args) )
#set( $indexPath = "/plant/_search" )
#set( $allowedAggFields = $util.defaultIfNull($ctx.stash.allowedAggFields, []) )
#set( $aggFieldsFilterMap = $util.defaultIfNull($ctx.stash.aggFieldsFilterMap, {}) )
#set( $nonKeywordFields = ["createdAt", "updatedAt"] )
#set( $keyFields = ["id"] )
#set( $sortValues = [] )
#set( $sortFields = [] )
#set( $aggregateValues = {} )
#set( $primaryKey = "id" )
#if( !$util.isNullOrEmpty($args.sort) )
  #foreach( $sortItem in $args.sort )
    #if( $util.isNullOrEmpty($sortItem.field) )
      $util.qr($sortFields.add($primaryKey))
    #else
      $util.qr($sortFields.add($sortItem.field))
    #end
    #if( $util.isNullOrEmpty($sortItem.field) )
      #if( $nonKeywordFields.contains($primaryKey) )
        #set( $sortField = $util.toJson($primaryKey) )
      #else
        #set( $sortField = $util.toJson("${primaryKey}.keyword") )
      #end
    #else
      #if( $nonKeywordFields.contains($sortItem.field) )
        #set( $sortField = $util.toJson($sortItem.field) )
      #else
        #set( $sortField = $util.toJson("${sortItem.field}.keyword") )
      #end
    #end
    #if( $util.isNullOrEmpty($sortItem.direction) )
      #set( $sortDirection = $util.toJson({"order": "desc"}) )
    #else
      #set( $sortDirection = $util.toJson({"order": $sortItem.direction}) )
    #end
    $util.qr($sortValues.add("{$sortField: $sortDirection}"))
  #end
#end
#foreach( $keyItem in $keyFields )
  #if( !$sortFields.contains($keyItem) )
    #if( $nonKeywordFields.contains($keyItem) )
      #set( $sortField = $util.toJson($keyItem) )
    #else
      #set( $sortField = $util.toJson("${keyItem}.keyword") )
    #end
    #set( $sortDirection = $util.toJson({"order": "desc"}) )
    $util.qr($sortValues.add("{$sortField: $sortDirection}"))
  #end
#end
#foreach( $aggItem in $args.aggregates )
  #if( $allowedAggFields.contains($aggItem.field) )
    #set( $aggFilter = { "match_all": {} } )
  #elseif( $aggFieldsFilterMap.containsKey($aggItem.field) )
    #set( $aggFilter = { "bool": { "should": $aggFieldsFilterMap.get($aggItem.field) } } )
  #else
    $util.error("Unauthorized to run aggregation on field: ${aggItem.field}", "Unauthorized")
  #end
  #if( $nonKeywordFields.contains($aggItem.field) )
    $util.qr($aggregateValues.put("$aggItem.name", { "filter": $aggFilter, "aggs": { "$aggItem.name": { "$aggItem.type": { "field": "$aggItem.field" }}} }))
  #else
    $util.qr($aggregateValues.put("$aggItem.name", { "filter": $aggFilter, "aggs": { "$aggItem.name": { "$aggItem.type": { "field": "${aggItem.field}.keyword" }}} }))
  #end
#end
#if( !$util.isNullOrEmpty($ctx.stash.authFilter) )
  #set( $filter = $ctx.stash.authFilter )
  #if( !$util.isNullOrEmpty($args.filter) )
    #set( $filter = {
  "bool": {
      "must":     [$ctx.stash.authFilter, $util.parseJson($util.transform.toElasticsearchQueryDSL($args.filter))]
  }
} )
  #end
#else
  #if( !$util.isNullOrEmpty($args.filter) )
    #set( $filter = $util.parseJson($util.transform.toElasticsearchQueryDSL($args.filter)) )
  #end
#end
#if( $util.isNullOrEmpty($filter) )
  #set( $filter = {
  "match_all": {}
} )
#end
{
  "version": "2018-05-29",
  "operation": "GET",
  "path": "$indexPath",
  "params": {
      "body":     {
                #if( $context.args.nextToken )"search_after": $util.base64Decode($args.nextToken), #end
                #if( $context.args.from )"from": $args.from, #end
                "size": #if( $args.limit ) $args.limit #else 100 #end,
                "sort": $sortValues,
                "version": false,
                "query": $util.toJson($filter),
                "aggs": $util.toJson($aggregateValues),
                "track_total_hits": true,
                }
  }
}

Additional context

I understand the general concept of what will be needed to add this feature request, only uncertainties are regarding getting values from the directive and brining that into the function for generating the .vtl resolver. With a bit of guidance I could most likely figure this out and implement it.

Is this something that you'd be interested in working on?

  • 👋 I may be able to implement this feature request

Would this feature include a breaking change?

  • ⚠️ This feature might incur a breaking change
@MtthwBrwng MtthwBrwng changed the title Add Option for "track_total_hits" for AppSync Searchable Template Add Option for "track_total_hits" on AppSync Searchable Template Dec 8, 2022
@chrisbonifacio
Copy link
Member

chrisbonifacio commented Dec 9, 2022

Hey @MtthwBrwng, thanks for opening this feature request! Glad to see that manually editing the resolver to include track_total_hits did work! I'll bring this to the team's attention to discuss internally and see if this can be the default behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants