This repository has been archived by the owner on Feb 23, 2025. It is now read-only.

pref: add fulltext index for bookmarkEntries.text #382

Open
wants to merge 10 commits into
base: main
from

Conversation

hi-ogawa
Owner

@hi-ogawa hi-ogawa commented May 11, 2023

todo

  • ngram is probably useless for Latin-script text
  • is the result ordering deterministic?
  • check whether PlanetScale supports it
  • check perf and index usage
  • migration staging
pnpm knex-staging migrate:up
pnpm cli-staging resetBookmarkEntriesTextCharacters
  • migration production
pnpm knex-production migrate:up
pnpm cli-production resetBookmarkEntriesTextCharacters

@hi-ogawa hi-ogawa changed the title Perf ngram fulltext pref: add ngram fulltext index for bookmarkEntries.text May 11, 2023
@hi-ogawa hi-ogawa changed the title pref: add ngram fulltext index for bookmarkEntries.text pref: add fulltext index for bookmarkEntries.text May 11, 2023
@hi-ogawa

This comment was marked as outdated.

@hi-ogawa
Owner Author

hi-ogawa commented May 11, 2023

It seems we cannot search for a single character unless we can change the server config ngram_token_size (which we probably can't for a PlanetScale db).

Unfortunately, for CJK text, looking up a single character is a very natural thing to want, especially for beginners, so I might need to come up with some hack.
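
To illustrate why a single character cannot be found, here is a minimal sketch of ngram tokenization. It is not MySQL's actual implementation (for one thing, it ignores whitespace handling), and `ngrams` is a hypothetical helper name. The default token size of 2 mirrors the MySQL default for ngram_token_size.

```typescript
// Hypothetical helper: split text into overlapping tokens of `tokenSize` characters.
// This only approximates what an ngram full-text parser indexes.
function ngrams(text: string, tokenSize: number = 2): string[] {
  // Array.from splits by code point, so characters outside the BMP stay intact.
  const chars = Array.from(text);
  const tokens: string[] = [];
  for (let i = 0; i + tokenSize <= chars.length; i++) {
    tokens.push(chars.slice(i, i + tokenSize).join(""));
  }
  return tokens;
}

const tokens = ngrams("안녕하세요");
// tokens: ["안녕", "녕하", "하세", "세요"]

// The single character "요" is not itself a token,
// so an exact-term lookup for it has nothing to match.
const hasSingleChar = tokens.indexOf("요") !== -1;
```

With a token size of 1 every character would be a token, which is why the server setting matters here.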

image

image

EDIT:

Okay, here is a workaround: a wildcard search of the form (character)*. It only matches when the character is followed by another character, so unfortunately we cannot search for common grammatical suffixes, e.g. ...요, ...고, etc. But that need is probably limited to absolute beginners, so it might be fine not to support it.

image
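
A rough model of the limitation, assuming the workaround is a boolean-mode prefix query such as `AGAINST ('요*' IN BOOLEAN MODE)` over a bigram index. The MySQL docs describe a wildcard prefix shorter than the token size as matching any indexed ngram that starts with it. `bigrams` and `matchesPrefixWildcard` are hypothetical names for illustration only, not part of this PR.

```typescript
// Hypothetical helper: the 2-character tokens a bigram index would hold for `text`.
function bigrams(text: string): string[] {
  const chars = Array.from(text);
  const tokens: string[] = [];
  for (let i = 0; i + 2 <= chars.length; i++) {
    tokens.push(chars[i] + chars[i + 1]);
  }
  return tokens;
}

// Assumed model of searching `char*`: true if any bigram starts with `char`.
function matchesPrefixWildcard(text: string, char: string): boolean {
  return bigrams(text).some((token) => token.indexOf(char) === 0);
}

// "요리해요" yields the token "요리", which starts with "요".
const foundAtStart = matchesPrefixWildcard("요리해요", "요");

// "좋아요" yields only "좋아" and "아요"; no token starts with "요",
// so a text where "요" appears only as the final character is missed.
const foundAsSuffix = matchesPrefixWildcard("좋아요", "요");
```

In this model `foundAtStart` is true and `foundAsSuffix` is false, which matches the suffix limitation described above.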

@@ -11,5 +11,6 @@ CREATE TABLE `bookmarkEntries` (
 PRIMARY KEY (`id`),
 KEY `bookmarkEntries_userId_createdAt_key` (`userId`,`createdAt`),
 KEY `bookmarkEntries_videoId_key` (`videoId`),
-KEY `bookmarkEntries_captionEntryId_key` (`captionEntryId`)
+KEY `bookmarkEntries_captionEntryId_key` (`captionEntryId`),
+FULLTEXT KEY `text` (`text`) /*!50100 WITH PARSER `ngram` */
Owner Author

skeema puts an extra trailing space after the magic comment.
It's a bit annoying. Not sure if there's any workaround.

@hi-ogawa
Owner Author

hi-ogawa commented May 11, 2023

It feels very redundant, but to support single-character search, maybe we can add an extra "multi-valued" index: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-multi-valued
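
A minimal sketch of what could feed such an index: the distinct characters of each text, stored as a JSON array. Everything here is an assumption for illustration. `toTextCharacters` is a hypothetical helper, and the column name `textCharacters` is only guessed from the `resetBookmarkEntriesTextCharacters` command in the todo list.

```typescript
// Hypothetical helper: distinct non-whitespace characters of `text`, in first-seen order.
// The result would be serialized as a JSON array into an assumed `textCharacters` column.
function toTextCharacters(text: string): string[] {
  const unique = new Set<string>();
  // Array.from iterates by code point, so surrogate pairs are kept whole.
  Array.from(text).forEach((char) => {
    if (char.trim() !== "") {
      unique.add(char);
    }
  });
  return Array.from(unique);
}

const characters = toTextCharacters("해요 해요");
// characters: ["해", "요"]
const json = JSON.stringify(characters);
```

A single-character lookup could then presumably be expressed with MySQL's `MEMBER OF()` operator against that column, though whether this fits the schema and PlanetScale would still need checking.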

@hi-ogawa hi-ogawa marked this pull request as ready for review May 11, 2023 08:42
@hi-ogawa hi-ogawa changed the base branch from master to main December 31, 2023 05:05
Development

Successfully merging this pull request may close these issues.

feat: improve bookmark text search
1 participant