-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Giving the tsvector and tsquery stuff a shot. #1019
Conversation
describe "match" do | ||
it "uses @@" do | ||
name.to_tsvector.match("jeb").to_sql.should eq ["SELECT #{QueryMe::COLUMN_SQL} FROM users WHERE TO_TSVECTOR(users.name) @@ TO_TSQUERY($1)", "jeb"] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're onto something with this. It's a nice syntax, and if I'm reading it right I think it means this'll allow me to specify a column type as a tsvec too, which seems necessary to get this to provide reasonably fast searching.
Postgres provides a few variations on to_tsvector that do things a little differently when it comes to operators and escaping. I ended up using websearch_to_tsquery
, and that may be what the intent here is -- it provides for correctly using boolean, unary, negation, and quotation operators as you might expect a web search to do. Maybe there's room to implement all of this function field or maybe Avram should just pick one? I'm not really sure.
The other thing that seems pretty important is the ts_rank stuff which allows you to actually sort results by relevance. ts_rank + websearch_to_tsquery got me a pretty usable search interface quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add websearch_to_tsquery
easily in the same manner here. I'll have to look at the ranking stuff, that does seem pretty important. The extra options are where I got a little concerned. I may be able to add those in, but I wasn't sure how often they're really used. Like, if it's a very small % of people that need the extra options like to_tsquery('english', 'whatever')
then maybe they just fall back to raw SQL again.
The other thing here is I'm making a huge assumption that the right side of @@
will always be cast to tsquery although postgres does allow a few variations. It's just that we don't have a way to map to a Crystal side TsQuery object, so I have to make some assumption on it. My main hope is that this is good enough for the 80% until someone comes along and ports pg_search to crystal 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the parametric index (ts_vector('english'...)
) is out of reach for this stage. In order to accomplish that I'd probably reach for something more search-domain specific than the general purpose Avram query language.
I've posted this elsewhere, but I'll include it here for posterity.
In a migration and model, I setup a ts_vector column like this:
# in the migration:
execute <<-SQL
ALTER TABLE channels
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(summary, ''))
) STORED;
SQL
# and in the model:
table :channels do
column search_vector : String?
end
Then my search query is this:
sql = <<-SQL
WITH search_results AS (
SELECT
channels.id AS channel_id,
ts_rank_cd(channels.search_vector, websearch_to_tsquery($1)) AS rank
FROM channels
WHERE channels.search_vector @@ websearch_to_tsquery($1)
ORDER BY rank DESC
LIMIT 25
)
SELECT
channels.*
FROM search_results
JOIN channels ON search_results.channel_id = channels.id
SQL
AppDatabase.query_all(sql, query_input, as: PoDB::Channel)
The ts_rank_cd stuff is "under the hood," I think. I wonder if it would be possible to have a syntax like this: column.search(query)
which:
- is smart enough to know if the column type is a ts_vector or should be wrapped in a to_tsvector call -- or perhaps issues a warning if it's not?
- wraps query in a websearch_to_tsquery
- runs the match operator
- sets a default order by key
Is that the right thing to do when someone writes that code? Is that even achievable? Is that too much magic?
As an aside, I had to construct this as a CTE because Avram doesn't know what to do with the rank
column, but I need to SELECT it in order to have postgres calculate the rank in order to sort the results. If Avram could discard the column in the response, it would be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oo nice. That helps seeing the example.
Ref #353
This adds in a new
match
method along withto_tsquery
andto_tsvector
to do a bit of full text searching. The implementation technically generates valid sql, but is this "good enough" for a start? 🤷♂️Before I merge this in, I want to get some feedback and see if this approach will be ok for built-in, or if this will require a lot more planning.