Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Registry records manager: Able to search record by keywords & records list #3480

Closed
t83714 opened this issue Aug 28, 2023 · 3 comments
Closed
Assignees
Milestone

Comments

@t83714
Copy link
Contributor

t83714 commented Aug 28, 2023

Registry Records Manager: Able to search record by keywords & records list

To make the registry record manager more useful for people, we will need offer the basic keyword based searching functionality.

It should come with a record list and quick preview feature as well.

Technical notes:

  • Needs to find a way of doing full text searching in JSONB data with reasonable performance
  • Proper index should be created based on test result of at least the following approaches:
    • pg_trgm extension with Gin index
      • drawback: search JSONB as text i.e. key would be covered
    • leveraging the new full text search feature after postgres 11
      • mainly jsonb_to_tsvector: can generate lexemes for values only (excluding keys)
    • if we use full text search feature, we should make text search config is configurable (default to english)
@t83714 t83714 added this to the v2.3.0 milestone Aug 28, 2023
@t83714 t83714 self-assigned this Aug 28, 2023
@t83714 t83714 modified the milestones: v2.3.0, v2.3.1, v2.3.2 Aug 29, 2023
@t83714 t83714 moved this to Todo in Magda dev plan Aug 29, 2023
@t83714 t83714 moved this from Todo to In Progress in Magda dev plan Sep 4, 2023
@t83714
Copy link
Contributor Author

t83714 commented Sep 4, 2023

Based on tests, full text search feature seems provide reasonable result (at the same level as GIN with pg_trgm extension) but offer:

  • more features: lexemes based true full text search & support extra query syntax (via websearch_to_tsquery)
  • more accurate result: keys are excluded
  • same level performance (compared with GIN with pg_trgm extension plus LIKE search)

Index creation:

CREATE INDEX idx_data_full_text ON recordaspects 
USING GIN ((
 to_tsvector('english', recordid) ||
 to_tsvector('english', aspectid) ||
 jsonb_to_tsvector('english', data, '["string"]')
))

Query would be similar to:

select * from recordaspects where (
 to_tsvector('english', recordid) ||
 to_tsvector('english', aspectid) ||
 jsonb_to_tsvector('english', data, '["string"]')
) @@ websearch_to_tsquery('english', 'my search keywords')

by default, we will use 'english' search config for indexing & searching but allow users to change via helm chart config.
helm chart config changes won't impact existing index created. Users are required to re-created the index with SQL above (with config name changed)

@t83714
Copy link
Contributor Author

t83714 commented Sep 9, 2023

To make sure recordId & aspectId can be "exactly" matched via the full text search query API interface, we need to alter the index creation as:

CREATE INDEX idx_data_full_text ON recordaspects 
USING GIN (jsonb_to_tsvector('english'::regconfig, data, '["string"]'))

And Search would be similar to :

select * from recordaspects where (
 jsonb_to_tsvector('english'::regconfig, data, '["string"]') @@ websearch_to_tsquery('english'::regconfig, 'my search keywords')
OR recordid = 'my search keywords'
OR aspectid = 'my search keywords'
)

@t83714
Copy link
Contributor Author

t83714 commented Sep 26, 2023

closed via #3485

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

No branches or pull requests

1 participant