Search in GlobalView #45

Open
Nolski opened this issue Sep 14, 2020 · 2 comments
@Nolski
Contributor

Nolski commented Sep 14, 2020

Overview

GlobalView refers to a single MediaWiki instance that can query every project running on a single Torque instance. A key feature of GlobalView will be the ability to run search queries and get results from every project in Torque.

Suggested Solution

I propose that we use PostgreSQL's full text search functionality instead of maintaining a separate database and series of caches as we do now. The benefits I see this approach bringing are:

  • By keeping our search functionality within Postgres we can minimize the number of places that we need to update data when updates are made.
  • Search queries can be made through our ORM which will simplify development with regards to search.
  • We will not need to maintain a search specific database.

An example of the ORM utilizing postgres full text search can be found here

Potential Difficulties

A complication is that our searchable items contain a significant amount of unstructured data, and only a portion of that data may be searchable, depending on the permissions of the user performing the search.

A workaround that I propose is that we maintain a column for each permission type. We will then sort the unstructured data into the various permission columns. When we make search queries, depending on the user's permissions, we will query the necessary columns. An example of our searchable table is below:

| proposal_name | proposal_date | ... | unstructured_data_confidential | unstructured_data_sensitive |
|---|---|---|---|---|
| Proposal 1 | 15 July 2020 | ... | { ... } | { ... } |
| Proposal 2 | 15 July 2020 | ... | { ... } | { ... } |
| Proposal 2 | 15 July 2020 | ... | { ... } | { ... } |
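
To make the per-permission query concrete, here is a rough sketch of how the column selection might work. Everything named here is an assumption for illustration: the `proposals` table name, the `PERMISSION_COLUMNS` mapping, the `build_search_query` helper, and the `%s` placeholder style (as used by psycopg2). It only builds the SQL text and parameters; it is not how Torque currently does search.

```python
# Hypothetical mapping from a permission name to the column holding the
# unstructured data visible to users with that permission.
PERMISSION_COLUMNS = {
    "confidential": "unstructured_data_confidential",
    "sensitive": "unstructured_data_sensitive",
}

TABLE = "proposals"  # assumed table name


def build_search_query(user_permissions, search_terms):
    """Return (sql, params) searching only the columns the user may see."""
    # Column names come from the fixed mapping above, never from user input,
    # so interpolating them into the SQL text does not open an injection hole.
    columns = [
        PERMISSION_COLUMNS[p]
        for p in sorted(user_permissions)
        if p in PERMISSION_COLUMNS
    ]
    if not columns:
        return None  # nothing this user is allowed to search

    # Concatenate the permitted columns into one searchable document.
    document = " || ' ' || ".join(f"coalesce({c}::text, '')" for c in columns)
    sql = (
        f"SELECT proposal_name FROM {TABLE} "
        f"WHERE to_tsvector('english', {document}) "
        f"@@ plainto_tsquery('english', %s)"
    )
    # The search terms travel as a bound parameter, not as SQL text.
    return sql, (search_terms,)
```

A user holding only the `sensitive` permission would get a query that never touches `unstructured_data_confidential`. In practice we would probably also want an index on each column's `tsvector` rather than computing it per query.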

note:
I strongly advise that we strive to remove unstructured proposal data wherever possible. Not doing so will make it increasingly difficult to make use of the features that relational databases provide.

Alternative Solution

I will leave this area sparse for others to fill in and suggest. The only alternative I currently see is to continue maintaining a Whoosh database with a series of caches for each user type. Continuing to do this will mean costly rebuilding of search indexes whenever modifications are made.

@kfogel
Member

kfogel commented Sep 15, 2020

Thanks, @Nolski. I don't completely understand what "a column for each permission type" means. A concrete example of a "permission type" in one of our actual Torque instances might clarify it for me. (Feel free to wait until our conversation tomorrow to answer this.)

@Nolski
Contributor Author

Nolski commented Sep 18, 2020

@kfogel Unfortunately I'm not exactly clear on how we split what data can and can't be seen by which users. I have been unable to find any documentation describing our permissions model, but in general, applications often have a set of permissions where each permission gives access to certain data. For each permission that we determine to exist, we will have a column. Each cell of that column will contain the data that a user with that permission should have access to. That data will be searchable if the user has that permission.
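
As a rough illustration of what I mean, here is a sketch of sorting one proposal's unstructured data into per-permission columns. The field names, the permission names, and the `split_by_permission` helper are all made up for the example, since I don't yet know our real permissions model. Fields with no known permission fall back to a default, which I'm assuming here should be the most restrictive one.

```python
# Hypothetical: the permission required to see each unstructured field.
FIELD_PERMISSION = {
    "budget_notes": "confidential",
    "reviewer_comments": "sensitive",
}


def split_by_permission(unstructured, default="confidential"):
    """Group a proposal's unstructured fields by the permission needed to read them."""
    columns = {}
    for field, value in unstructured.items():
        # Unknown fields go to the default (assumed most restrictive) column.
        permission = FIELD_PERMISSION.get(field, default)
        column = f"unstructured_data_{permission}"
        columns.setdefault(column, {})[field] = value
    return columns
```

Each key in the returned dict corresponds to one column in the searchable table, and each value is the cell contents for that proposal.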

YaxelPerez added a commit that referenced this issue Sep 22, 2020
This commit rewrites every API endpoint except /search/<sheet_name>,
which requires further consideration as discussed in issue #45. Input
validation reserved for future commits; views will not fail gracefully.
Nolski pushed a commit that referenced this issue Oct 14, 2020
This commit rewrites every API endpoint except /search/<sheet_name>,
which requires further consideration as discussed in issue #45. Input
validation reserved for future commits; views will not fail gracefully.
frankduncan pushed a commit that referenced this issue Mar 21, 2021
This commit rewrites every API endpoint except /search/<sheet_name>,
which requires further consideration as discussed in issue #45. Input
validation reserved for future commits; views will not fail gracefully.