GROQ Text matching with _id #102

Open
maxyinger opened this issue Mar 10, 2023 · 0 comments

Issue

Current API version: "2022-07-07"
(The same behavior occurred on every other version I tried.)

Scenario
We have document internationalization set up so that IDs follow the pattern:

{
  _id: i18n.<id>.<lang>
}

We started running into an issue when filtering on _id with this text-matching pattern:

*[_id match "*." + $lang]

This works for most documents, but it does not match a document with the following _id:

// query
*[_id match "*.fr"]

// data
[
  { _id: "i18n.page-2021.fr" }, // not matching
  { _id: "i18n.page-abcd.fr" } // matches correctly
]

Testing fields outside of _id

I think this is specific to the _id field, because the same string matches fine in GROQ Arcade when I put it in a different field, i.e.:

// query
*[title match "*.fr"]

// data
[
  { title: "i18n.page-2021.fr" } // matches fine here
]

Assumed Cause

I'm guessing this is an edge case where _id is tokenized differently from other fields, specifically when a number appears right before the matched text, i.e.:

// query
*[_id match "*.fr"]

// data
[
  { _id: "i18n.page-2021.fr" }, // not matching
  { _id: "i18n.page-abcd.fr" } // matches correctly
]  
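
To make the guess above concrete, here is a toy model in TypeScript. It is only a sketch of my hypothesis, not Sanity's actual tokenizer: `toyTokenize` and `toyMatch` are hypothetical helpers, and the rule that a period stays inside a token only when it sits between two letters or between two digits is an assumption. The pattern is also treated as a single token, which is a simplification.

```typescript
// Toy model only: NOT Sanity's tokenizer. Assumed rule: letters and digits
// are token characters; a period is kept inside a token only when it sits
// between two letters or between two digits; everything else is a break.
function toyTokenize(text: string): string[] {
  const isLetter = (c: string): boolean => /^[A-Za-z]$/.test(c);
  const isDigit = (c: string): boolean => /^[0-9]$/.test(c);
  const tokens: string[] = [];
  let current = "";
  for (let i = 0; i < text.length; i++) {
    const c = text.charAt(i);
    const prev = text.charAt(i - 1); // "" when out of range
    const next = text.charAt(i + 1); // "" when out of range
    const keepsPeriod =
      c === "." &&
      ((isLetter(prev) && isLetter(next)) || (isDigit(prev) && isDigit(next)));
    if (isLetter(c) || isDigit(c) || keepsPeriod) {
      current += c;
    } else if (current !== "") {
      tokens.push(current);
      current = "";
    }
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

// Simplification: the pattern is treated as one token, with "*" as a wildcard.
function toyMatch(text: string, pattern: string): boolean {
  const source = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  const regex = new RegExp(`^${source}$`);
  return toyTokenize(text).some((token) => regex.test(token));
}

console.log(toyTokenize("i18n.page-2021.fr")); // ["i18n.page", "2021", "fr"]
console.log(toyTokenize("i18n.page-abcd.fr")); // ["i18n.page", "abcd.fr"]
console.log(toyMatch("i18n.page-2021.fr", "*.fr")); // false
console.log(toyMatch("i18n.page-abcd.fr", "*.fr")); // true
```

Under this assumed rule, the digit before the period splits "2021" and "fr" into separate tokens, so no single token ends in ".fr". That would reproduce what I'm seeing on _id, but I have not confirmed that the backend tokenizes this way.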

Alternatives considered

I looked into path() filters, but those seem to work only when the wildcard characters are at the end of the path, i.e.:

*[_id in path("**.fr")]

// data
[
  { _id: "i18n.page-2021.fr" }, // no match
]  

Is there support for something like this with the path() function? Is our ID structure simply not compatible? Would you recommend a different approach to filter all documents whose _id ends in .$lang?
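
To be explicit about the result we're after: it is a plain suffix test on _id. A minimal TypeScript sketch of that behavior, applied client-side to documents that have already been fetched (the `Doc` type and `filterByLang` helper are hypothetical names for illustration, not part of any Sanity API):

```typescript
// Hypothetical document shape: only _id matters for this illustration.
interface Doc {
  _id: string;
}

// Keep documents whose _id ends with ".<lang>", e.g. ".fr".
function filterByLang<T extends Doc>(docs: T[], lang: string): T[] {
  const suffix = `.${lang}`;
  return docs.filter((doc) => doc._id.endsWith(suffix));
}

const docs: Doc[] = [
  { _id: "i18n.page-2021.fr" },
  { _id: "i18n.page-abcd.fr" },
  { _id: "i18n.page-2021.en" },
];

const french = filterByLang(docs, "fr");
console.log(french.map((doc) => doc._id));
// ["i18n.page-2021.fr", "i18n.page-abcd.fr"]
```

Filtering after the fetch means pulling every language over the wire, so this only describes the expected output; we would much prefer to express the same condition in the query itself.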

Documentation Feedback

In general, I found it tough to find resources on Sanity's tokenization approach and how it maps to text matching. There is this example at the bottom of the Text Matching section of the Query Cheat Sheet, but it says little beyond showing that the second match fails:

// Note how match operates on tokens!
"foo bar" match "fo*"  // -> true
"my-pretty-pony-123.jpg" match "my*.jpg"  // -> false

It might be nice to link to the Full-Text Search Operators Section in the cheat sheet.

References

https://www.sanity.io/answers/issue-with-filtering-documents-using-match-query-in-elasticsearch
sanity-io/sanity#1913
