Feature Proposal: onlyIncludeFilterMatches
#46
Replies: 4 comments 6 replies
-
I'm definitely interested in this feature! I think your choices on the edge cases make sense. The lack of pagination makes me a little nervous. We do have some nested lists that can get really big (thousands of items). I think it's probably manageable for us, but maybe not for everyone. The naming "onlyFilterMatches" is a little confusing to me, although I agree with your reasoning for it. I also wonder if we could put it closer to the rest of the filters instead of on the selection part of the query. I think of it more as a filter than a selector. Another angle to consider -- what if I want to unnest and sort the nested items? For example, let's say I want to find all albums longer than X and sort them by their length? And perhaps I also want to include the artist's name alongside each album. I don't think this would be possible with the current syntax. That's probably an entirely different feature, though. We'd probably need to be able to query nested objects directly, as if they were top-level objects. |
Beta Was this translation helpful? Give feedback.
-
Thanks for weighing in here @kevincraft-toast!
Long term, I think we should probably support pagination here (the API feels incomplete until we do so) but in the near team I didn't want to unnecessarily enlarge the work for this proposal. Today, pagination is not supported on nested fields defined as a list, and it is supported on nested fields defined with
What's confusing about it? And do you have alternate suggestions?
I'd be interested in seeing a concrete proposal for what this syntax would be like (want to whip one up?), although putting it as an argument on the field it impacts (e.g.
That's an interesting idea. Something like this perhaps? query Platinum90sAlbums {
artistAlbums(filter: {
soldUnits: {gte: 1000000}
releasedOn: {gte: "1990-01-01", lt: "2000-01-01"}
}) {
nodes {
name
releasedOn
soldUnits
artist {
name
}
}
}
} ....where it would search the I hadn't considered that to solve this problem until you brought it up. From an API design perspective, I quite like it. However, I think it's probably an order-of-magnitude more difficult to implement (maybe 2 orders of magnitude...) as it is a fundamentally different kind of query, and requires more massaging of the data from Elasticsearch/OpenSearch back into the response structure. So my overall thought is that it's a worthwhile feature to propose and implement on its own, but shouldn't preclude us from offering |
Beta Was this translation helpful? Give feedback.
-
Agree - and I think it's the most natural functionality for a client who sometimes provides a filter, and sometimes doesn't when they want to match everything.
Agree, if ES can support it
Yeah, it sometimes reads a little funny depending how I look at it. Tossing out other ideas...
|
Beta Was this translation helpful? Give feedback.
-
Also like the choices here. From a related entities perspective this looks like just the ticket. Examples that resonate: All Artists with an album that contains a track with a specific name - only return the matching tracks |
Beta Was this translation helpful? Give feedback.
-
Context
Current ElasticGraph API
ElasticGraph supports filtering on
nested
documents via theanySatisfy
predicate. Here's an example, from our docs, of what that looks like:However, as noted on that page, there's a caveat to how this works:
While this is reasonable behavior (after all, the
filter
argument is on theartists
field, not on thealbums
field!), I believe there is room to improve here. If a client's primary interest is the nested type (albums, in this case), then they often want to get back only the nested matches, and we'd like to offer a query API feature to achieve that.Elasticsearch/OpenSearch
inner_hits
FeatureElasticsearch and OpenSearch both offer an
inner_hits
feature that should meet our needs here. In the above case of artists and albums, it would allow ElasticGraph to receive the specific albums that matched the filter.Proposal
I propose that we offer a new
onlyIncludeFilterMatches: Boolean
argument onnested
fields. Here's a modified form of the example query from above showing how it would look:The idea behind
onlyIncludeFilterMatches
here is to only return thealbums
that matched the filter. Albums that sold less than1000000
units, or were released outside of the 1990s, would not be returned.As a basic feature, this is straightforward, but there are some interesting edge cases we need to consider.
Edge Case:
onlyIncludeFilterMatches
When There's no FilterConsider this query:
In this case, we're not filtering on
albums
at all, butonlyIncludeFilterMatches: true
has still been provided. In this case, I believe ElasticGraph should return all albums. While there's no explicit filter onalbums
, it's reasonable to consider every query to have an implicit default filter that matches every document. (LikeSELECT * FROM table WHERE TRUE
in SQL terms). With that idea in mind, returning allalbums
makes sense in this case. Moreover, I think the alternatives would all make for a sub-optimal client experience:filter
, noalbums
should be returned. But when a client requestsalbums
to be returned, they clearly want to get some back.onlyIncludeFilterMatches: true
and try again, but I think this is likely to be frustrating for clients.Edge Case:
onlyIncludeFilterMatches
at a Deeply Nested LevelConsider this query:
This query is searching for long album tracks (at least 30 minutes in length!).
onlyIncludeFilterMatches: true
has been provided at thetracks
level, but not at thealbums
level. There are a couple options for how this could behave:name
but an empty list of tracks, even though the album has tracks).I prefer option 2, because it's more explicit and flexible. The client can use
onlyIncludeFilterMatches
at both levels if they want option 1:If we instead made option 1 the default semantic, I don't think there's any similar way a client could opt-in to option 2.
Of course, we may be restricted here based on what Elasticsearch/OpenSearch exposes. However, the example shown at Hierarchical levels of nested object fields and inner hits makes me believe it's possible. The
_source
for the overallartists
hit will include all nested albums and tracks on the artist. Theinner_hits
on the response includesoffset
information for each level of nesting. I believe this can be used to (optionally) filter the_source
albums
down to the ones with long tracks when the client passesalbums(onlyIncludeFilterMatches: true) { ... }
.Edge Case:
onlyIncludeFilterMatches
at a Less Shallow Level Compared to the Filter CriteriaHere's another interesting edge case to consider:
This is similar to the last case, but
onlyIncludeFilterMatches
is only being applied at thealbums
level even though the filter criteria is at thetracks
level. The client may (or may not) request the tracks.To be consistent, I believe this should only return the albums that have long tracks. If
tracks
is requested on those albums withoutonlyIncludeFilterMatches: true
, then all tracks on the returned albums should be returned.Feature Name
Originally, I was planning to call this featureonlyMatches
. But I think theFilter
inonlyFilterMatches
provides a contextual clue as to what the argument does:Filter
refers back to thefilter:
argument onartists
. However, I'm realizing thatonlyFilterMatches
can be misread: if you interpret "filter" as a noun, I think it's meaning is pretty clear, but if you interpret it as a verb (e.g. "only filter the matches"), it could be confusing. Please weigh in on what name you'd like to see for this feature!We settled on the name
onlyIncludeFilterMatches
in the discussion below.Inner Hits Options
The Elasticsearch docs mention a few options that are supported here. In the future,
sort
could potentially be useful to expand on this feature, but I consider it out of scope at this time. Thefrom
andsize
options could probably be used to offer some form of pagination in the future, which I again consider to be out of scope at this time.However, we probably need to use the
size
option:Returning only the first three hits isn't ideal. We have a couple of options here:
size
(e.g.Int.MAX_VALUE
) so that we always get back all results.first:
argument alongsideonlyIncludeFilterMatches
so that the client can control it. When not provided, we would honor thegraphql.default_page_size
setting.I lean towards option 1. Option 2 introduces additional complexity. Right now, we return all nested documents; changing it to return the first N would be a semantic change that might impact clients, and is probably only useful if we're also going to expose
pageInfo
at that level so that clients can see if there are more results. I'd rather not take that on at this time.Nested Relay Connections
As it happens, ElasticGraph already supports a way to expose a nested list as a paginated relay connection field. To use this feature, you use
paginated_collection_field
instead offield
:However, this feature is not generally recommended, and is very rarely used. The implementation does not have much efficiency benefit: all elements of the collection will be retrieved when fetching this field from the datastore. The pagination implementation will just trim down the collection before returning it. In addition, the cursor just encodes a numeric index, and if the underlying data is updated between page requests, the cursor may not work as expected. And it may not work correctly when combined with
onlyIncludeFilterMatches: true
.In the future, we may want to leverage the
size
/from
options supported oninner_hits
forpaginated_collection_field
. For now, I'd like to do whatever is simplest, while avoiding bugs and keeping our future options open:onlyIncludeFilterMatches: true
, the great!onlyIncludeFilterMatches: true
argument from the schema for a nestedpaginated_collection_field
. The door will remain open to get them to work together correctly in the future.Consequences
ElasticGraph will offer a powerful new ability for clients to control which nested documents get returned. As per the discussion above, we can ensure it behaves consistently and works correctly with all existing features (with the possible exception of
paginated_collection_field
, where we may choose to omit it).Feedback Wanted
If you have interest in this feature, please review my proposal and let me know what you think!
Beta Was this translation helpful? Give feedback.
All reactions