
hrodmn
Collaborator

@hrodmn hrodmn commented Sep 29, 2025

There is some ambiguity between the free-text STAC API extension and the OGC Records specification (stac-api-extensions/freetext-search#12), so I am not 100% sure this is the right thing to do. Right now, though, if you pass a string with two unquoted terms separated by spaces, pgstac throws a tsquery syntax error (https://github.com/stac-utils/pgstac/actions/runs/18101825325/job/51506541624).

From the OGC Records Spec:

For multiple search terms that are white space separated, only records that contain all the search terms specified, in the order specified and separated by any number of white spaces in one or more of the searched text fields SHALL be in the result set.

This PR replaces spaces with &, which I think is the behavior that most closely aligns with the specification, although it does not account for the order of the terms as the spec suggests. I tried using distance operators (e.g. <10>), but those check for an exact distance between terms ("distance equals 10") rather than "less than or equal to 10".
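To make the intended behavior concrete, here is a minimal Python sketch of the kind of rewrite described above. It is not pgstac's implementation (which lives in SQL); the function name `to_tsquery_text` and the handling of quoted phrases via the `<->` operator are assumptions made for illustration, and the sketch does not escape tsquery special characters or handle other operators.

```python
import re

# Matches either a double-quoted phrase (group 1) or a run of
# non-whitespace characters (group 2).
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def to_tsquery_text(q):
    """Hypothetical helper: build tsquery-style text from a free-text query.

    Unquoted, whitespace-separated terms are combined with '&' (AND).
    Quoted phrases are kept together with '<->' (assumption for this sketch).
    """
    parts = []
    for phrase, word in _TOKEN.findall(q):
        if word:
            parts.append(word)
            continue
        words = phrase.split()
        if words:  # skip empty or whitespace-only quotes
            parts.append("(" + " <-> ".join(words) + ")")
    return " & ".join(parts)


print(to_tsquery_text("word1 word2"))
print(to_tsquery_text('"word1 word2" word3'))
```

With this sketch, `word1 word2` becomes `word1 & word2`, so both terms must be present but their order is not checked, which is the trade-off noted above.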

@hrodmn hrodmn changed the title from "fix: use adjacency operator to represent non-quoted, space-separated terms" to "fix: combine space-separated terms with & operator in free-text search" Sep 29, 2025
@hrodmn hrodmn marked this pull request as ready for review September 29, 2025 18:49
@hrodmn hrodmn linked an issue Sep 29, 2025 that may be closed by this pull request
@vincentsarago
Member

re-reading the spec, word1 word2 doesn't mean word1 AND word2. Could we just quote the expression?

@hrodmn
Collaborator Author

hrodmn commented Sep 30, 2025

re-reading the spec, word1 word2 doesn't mean word1 AND word2.

It depends on whether you are looking at the OGC Records spec or the free-text search STAC API extension spec 😅. I think the collection-search extension is meant to be based on the OGC Records spec, but it does not interpret that particular case correctly (see stac-api-extensions/freetext-search#12 for discussion).

I think the OGC spec does mean space-separated words like word1 word2 to be word1 AND word2: "For multiple search terms that are white space separated, only records that contain all the search terms specified, in the order specified and separated by any number of white spaces in one or more of the searched text fields SHALL be in the result set."

Could we just quote the expression?

Yeah, I think that would probably be pretty close to using the distance operator, but it might give users false negatives if there is indeed extra whitespace between the terms in the text fields.
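The options discussed in this thread (AND, an exact-distance operator, and the "in order, within some distance" behavior that would be closer to the OGC wording) can be compared with a toy model in Python. This is only a sketch of the matching semantics on word positions, not PostgreSQL itself: all function names are hypothetical, and words are tokenized with `str.split()`, which collapses runs of whitespace. Whether extra whitespace affects matching in practice depends on how the text is parsed into a tsvector.

```python
def positions(text):
    """Map each lowercased word to its 1-based word positions."""
    pos = {}
    for i, word in enumerate(text.lower().split(), start=1):
        pos.setdefault(word, []).append(i)
    return pos


def match_and(text, a, b):
    """AND: both terms occur somewhere, in any order."""
    p = positions(text)
    return a in p and b in p


def match_exact_distance(text, a, b, n=1):
    """Exact distance: b occurs exactly n words after a (n=1 is adjacency)."""
    p = positions(text)
    return any(j - i == n for i in p.get(a, []) for j in p.get(b, []))


def match_within(text, a, b, n):
    """Ordered, at most n words apart (the 'less than or equal' behavior)."""
    p = positions(text)
    return any(0 < j - i <= n for i in p.get(a, []) for j in p.get(b, []))


doc = "landsat imagery of the sentinel   archive"
print(match_and(doc, "sentinel", "landsat"))                 # order ignored
print(match_exact_distance(doc, "landsat", "sentinel", 10))  # distance is 4, not 10
print(match_within(doc, "landsat", "sentinel", 10))          # ordered, within 10
```

In this model the AND match accepts terms in either order, the exact-distance match fails unless the gap equals the given number, and only the third function captures "in the order specified" with a tolerant gap.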

Development

Successfully merging this pull request may close these issues.

free-text search with unquoted, space-separated terms does not work
2 participants