Query DSL: Terms filter to allow for terms lookup from another document #2674

@kimchy

The terms filter currently requires all the terms to be provided as part of the filter itself. This change allows them to be extracted automatically from an external document.

Here is an example:

# index the information for user with id 2, specifically, its friends
curl -XPUT localhost:9200/users/user/2 -d '{
   "friends" : ["1", "3"]
}'

# index a tweet, from user with id 2
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
   "user" : "2"
}'

# search on all the tweets that match the friends of user 2
curl -XGET localhost:9200/tweets/_search -d '{
  "query" : {
    "filtered" : {
        "filter" : {
            "terms" : {
                "user" : {
                    "index" : "users",
                    "type" : "user",
                    "id" : "2",
                    "path" : "friends"
                },
                "_cache_key" : "user_2_friends"
            }
        }
    }
  }
}'

The above is highly optimized in two ways: the list of friends is not fetched if the filter is already in the filter cache, and an internal LRU cache is used when fetching external values for the terms filter. In addition, the entry in the filter cache does not hold all the terms, which reduces the memory it requires.

Setting _cache_key is recommended, so that it is simple to clear the cache associated with it using the clear cache API. For example:

curl -XPOST 'localhost:9200/tweets/_cache/clear?filter_keys=user_2_friends'

The external terms document can also contain an array of inner objects, for example:

curl -XPUT localhost:9200/users/user/2 -d '{
   "friends" : [
     {
       "id" : "1"
     },
     {
       "id" : "2"
     }
   ]
}'

In which case, the lookup path will be friends.id.
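
As a sketch of how the search changes for the inner-object layout, the body below is the same as in the first example except for the path. The helper name lookup_query and its two arguments are hypothetical and exist only for illustration; the index, type, and field names are the ones used above.

```shell
# lookup_query USER_ID PATH
# Prints a search body that filters tweets by the terms found at PATH
# in the users/user/USER_ID document. Hypothetical helper, not part of
# Elasticsearch.
lookup_query() {
  cat <<EOF
{
  "query" : {
    "filtered" : {
      "filter" : {
        "terms" : {
          "user" : {
            "index" : "users",
            "type" : "user",
            "id" : "$1",
            "path" : "$2"
          },
          "_cache_key" : "user_$1_friends"
        }
      }
    }
  }
}
EOF
}

# Print the body for the inner-object case.
lookup_query 2 friends.id

# To send it to a node assumed to be listening on localhost:9200:
#   curl -XGET localhost:9200/tweets/_search -d "$(lookup_query 2 friends.id)"
```

Only the path value differs from the flat-array case, so the same _cache_key convention can be kept.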

There is an additional cache involved, which maps each lookup document to the terms extracted from it. By default it is an LRU cache limited to 10mb, but its size can be set explicitly using indices.cache.filter.terms.size.

Also, if the "reference" terms data is not large, consider storing it in an index with a single shard that is fully replicated across all nodes. The lookup terms filter prefers to execute the get request on a local node when possible, which reduces network traffic.
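
One possible way to get such a layout is to create the lookup index with one primary shard and let the replica count expand to every node. This is a sketch under the assumption that the index.number_of_shards and index.auto_expand_replicas settings are available in the version in use; the variable name users_index_settings is hypothetical.

```shell
# Settings body for the lookup index: a single primary shard, with replicas
# automatically expanded so that every node holds a copy.
users_index_settings='{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "auto_expand_replicas" : "0-all"
    }
  }
}'

# Print the body for inspection.
printf '%s\n' "$users_index_settings"

# To create the index on a node assumed to be listening on localhost:9200:
#   curl -XPUT localhost:9200/users -d "$users_index_settings"
```

The index has to be created with these settings before the user documents are indexed, since the number of shards cannot be changed afterwards.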
