Current API version: "2022-07-07"
(however, this behavior was seen on all other versions I tried as well)
Scenario
We have internationalization set up for documents, with ids following the pattern:
{
_id: i18n.<id>.<lang>
}
We started running into an issue when filtering on _id with the text-matching pattern:
*[_id match "*." + $lang]
This works for most documents, but it does not seem to match a document with the following _id:
// query
*[_id match "*.fr"]
// data
[
{ _id: "i18n.page-2021.fr" }, // not matching
{ _id: "i18n.page-abcd.fr" } // matches correctly
]
Testing fields outside of _id
I think this is specific to the _id field, because the same string matches fine in GROQ Arcade when I put it in a different field, e.g.:
// query
*[title match "*.fr"]
// data
[
{ title: "i18n.page-2021.fr" } // matches fine here
]
Assumed Cause
I'm guessing this is an edge case in how _id is tokenized compared to other fields, specifically when a number appears right before the matched text, e.g.:
// query
*[_id match "*.fr"]
// data
[
{ _id: "i18n.page-2021.fr" }, // not matching
{ _id: "i18n.page-abcd.fr" } // matches correctly
]
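To make the guess above concrete, here is a toy model in TypeScript. It is only a sketch under a loud assumption: that the backend splits text into tokens using word-boundary rules roughly like Unicode text segmentation, where a "." stays inside a token only when it sits between two letters or between two digits. `toyTokenize` and `toyMatch` are hypothetical helpers written for illustration, not Sanity's implementation, and the model does not explain why a non-_id field behaves differently in GROQ Arcade.

```typescript
// Toy model of the tokenization hypothesis. NOT Sanity's actual tokenizer.
const isLetter = (c: string): boolean => /[A-Za-z]/.test(c);
const isDigit = (c: string): boolean => /[0-9]/.test(c);

// Split text into tokens. A "." is kept inside a token only when it sits
// between two letters or between two digits (assumed rule).
function toyTokenize(text: string): string[] {
  const tokens: string[] = [];
  let current = "";
  for (let i = 0; i < text.length; i++) {
    const c = text.charAt(i);
    const prev = i > 0 ? text.charAt(i - 1) : "";
    const next = i + 1 < text.length ? text.charAt(i + 1) : "";
    const joinsLetters = isLetter(prev) && isLetter(next);
    const joinsDigits = isDigit(prev) && isDigit(next);
    if (isLetter(c) || isDigit(c) || (c === "." && (joinsLetters || joinsDigits))) {
      current += c;
    } else {
      if (current) tokens.push(current);
      current = "";
    }
  }
  if (current) tokens.push(current);
  return tokens;
}

// A pattern matches if any single token matches it, with "*" as a wildcard.
function toyMatch(text: string, pattern: string): boolean {
  const source = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  const regex = new RegExp(`^${source}$`);
  return toyTokenize(text).some((token) => regex.test(token));
}

console.log(toyTokenize("i18n.page-2021.fr")); // [ 'i18n.page', '2021', 'fr' ]
console.log(toyTokenize("i18n.page-abcd.fr")); // [ 'i18n.page', 'abcd.fr' ]
console.log(toyMatch("i18n.page-2021.fr", "*.fr")); // false
console.log(toyMatch("i18n.page-abcd.fr", "*.fr")); // true
```

Under this assumed rule, "2021.fr" is split into "2021" and "fr", so no token ends in ".fr", while "abcd.fr" survives as one token. That would be consistent with the behavior reported here, but it remains a guess until someone confirms how _id is actually indexed.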
Alternatives considered
I looked into path() filters, but they seem to work only when the wildcard characters are at the end of the path, e.g.:
// query
*[_id in path("**.fr")]
// data
[
{ _id: "i18n.page-2021.fr" }, // no match
]
Is there support for something like this with the path function? Is our id structure simply not compatible? Would you recommend a different approach to filter all documents whose _id ends in .$lang?
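As a stopgap while this is open, a client-side filter would sidestep tokenization entirely. The sketch below assumes the candidate documents have already been fetched by some other filter and that each has a string `_id`; `DocWithId` and `filterByLangSuffix` are hypothetical names used only for illustration.

```typescript
// Minimal shape assumed for fetched documents (hypothetical type).
interface DocWithId {
  _id: string;
}

// Keep only documents whose _id ends with ".<lang>", using a plain
// string suffix check rather than token-based matching.
function filterByLangSuffix<T extends DocWithId>(docs: T[], lang: string): T[] {
  const suffix = `.${lang}`;
  return docs.filter((doc) => doc._id.endsWith(suffix));
}

const docs: DocWithId[] = [
  { _id: "i18n.page-2021.fr" },
  { _id: "i18n.page-abcd.fr" },
  { _id: "i18n.page-2021.en" },
];

console.log(filterByLangSuffix(docs, "fr").map((doc) => doc._id));
// [ 'i18n.page-2021.fr', 'i18n.page-abcd.fr' ]
```

This is not ideal for large datasets, since it over-fetches and filters after the fact, so a query-side answer would still be preferable.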
Documentation Feedback
In general, I found it tough to find resources on Sanity's tokenization approach and how it maps to text matching. There is this example at the bottom of the Text Matching section of the Query Cheat Sheet, but it doesn't explain why the second pattern fails to match:
// Note how match operates on tokens!
"foo bar" match "fo*" // -> true
"my-pretty-pony-123.jpg" match "my*.jpg" // -> false
It might be nice to link to the Full-Text Search Operators section from that part of the cheat sheet.
References
https://www.sanity.io/answers/issue-with-filtering-documents-using-match-query-in-elasticsearch
sanity-io/sanity#1913