Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite full-text search support #4823

Open
Tracked by #22950
natemcmaster opened this issue Mar 17, 2016 · 18 comments
Open
Tracked by #22950

SQLite full-text search support #4823

natemcmaster opened this issue Mar 17, 2016 · 18 comments

Comments

@natemcmaster
Copy link
Contributor

SQLite's FTS3 and FTS4 (full-text search) extension modules allow users to create special tables with a built-in full-text index. This extension is usually enabled by default. (I checked: most platforms we support have a version of SQLite compiled with ENABLE_FTS3).

To leverage this, the create table syntax and query syntax are slightly different.

CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT);     /* FTS3 table */
CREATE TABLE enrondata2(content TEXT);                        /* Ordinary table */
SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* FTS search -- fast */
SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* regular search -- slow */

To enable this in EF, we would need to add some kind of configuration option to specific which tables are FTS and make query respond accordingly.

In the meantime, users can work around this with MigrationBuilder.Sql and FromSql.

More docs: https://www.sqlite.org/fts3.html

@rowanmiller
Copy link
Contributor

combined into #1590

@divega divega changed the title Support FTS (full-text search) in SQLite SQLite full-text search support Mar 29, 2018
@divega
Copy link
Contributor

divega commented Mar 29, 2018

Un-combining.

@divega divega reopened this Mar 29, 2018
@Trolldemorted
Copy link

You are talking about fts 3 and 4, will you also support 5?

@bricelam
Copy link
Contributor

Now that SQLitePCL.raw supports FTS5, we'd probably just start with support for that.

@bricelam
Copy link
Contributor

bricelam commented Mar 30, 2018

Some notes:

  • You create an FTS-enabled table instead of an FTS index
  • You can use the match() function to search
    SELECT * FROM email WHERE match($query, body)
  • To search all columns, pass the table identifier as the second argument (super weird)
    SELECT * FROM email WHERE match($query, email)
  • The bm25() (rank), highlight() and snippet() functions also take the table identifier as an argument

@bricelam
Copy link
Contributor

bricelam commented Apr 13, 2018

Daydreaming about this some more:

LINQ like the this...

from e in db.Emails
where EF.Functions.Match(e, query)
orderby EF.Functions.Bm25(e)
select new
{
    e.Id,
    Subject = EF.Functions.Highlight(e.Subject, "<b>", "</b>"),
    Body = EF.Functions.Snippet(e.Body, "<b>", "</b>", "...", 64)
};

...would produce SQL like this.

select
    id,
    highlight(email, subject, '<b>', '</b>') as subject,
    snippet(email, body, '<b>', '</b>', '...', 64) as body
from email
where match(email, $query)
order by bm25(email);

Notice how expressions like Highlight(e.Subject, ...) are translated into highlight(email, subject, ...) and you could write either Match(e, ...) (translates to match(email, ...)) or Match(e.Body, ...) (translates to match(body, ...)).

Obviously, Match, Bm25, Highlight, and Snippet cannot be evaluated on the client and should throw.

@bricelam
Copy link
Contributor

bricelam commented Apr 13, 2018

Note, some functionality is already possible in EF Core today.

  • You can abuse the == operator instead of using match()
  • You can map the hidden rank column instead of using bm25()
  • Of course, you can always drop down to FromSql and write the queries yourself
from e in db.Emails
where e.Body == query
orderby e.Rank
select e;

@groege
Copy link

groege commented Jan 20, 2021

order by bm25(email)

would order by bm25 also work for special letters like umlauts: äöü or something like ß?

@bricelam
Copy link
Contributor

bricelam commented May 5, 2021

@groege It depends on your tokenizer.

@Kalle4242
Copy link

After realizing that all these great features and "daydreams" have not become part of the latest efcore version, is there any announcement when (or if) one can expect this will part of a future version?
Is it possible to up vote the features for upcoming versions?

@ajcvickers
Copy link
Member

@Kalle4242 Yes: 👍 on the issue. We will be publishing a plan for EF7 soon. See release planning for more info.

@Kalle4242
Copy link

Kalle4242 commented Dec 1, 2021

I have a problem calling a select/search statement with FromSqlRaw on a fts5 virtual table. I have two test projects, both using the same model, but I get different results calling the same statement:

        protected override void Up(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.Sql(@"CREATE VIRTUAL TABLE AssetSearch USING fts5(AssetCommonId UNINDEXED, AssetVersionId 
                                                    UNINDEXED, AssetPartId UNINDEXED, Title, Body, Comment, Notice)");
        }

public IQueryable<AssetSearchDbo> Search(string question)
 {
            string query =
                "SELECT " +
                "   \"AssetSearch\".RowId " +
                " , \"AssetSearch\".AssetCommonId " +
                " , \"AssetSearch\".AssetVersionId " +
                " , \"AssetSearch\".AssetPartId " +
                " , snippet( \"AssetSearch\", 3, '>>', '<<', '...', 8 ) As Title " +
                " , snippet( \"AssetSearch\", 4, '>>', '<<', '...', 8 ) As Body " +
                " , snippet( \"AssetSearch\", 5, '>>', '<<', '...', 8 ) As Comment " +
                " , snippet( \"AssetSearch\", 6, '>>', '<<', '...', 8 ) AS Notice " +
                " , bm25(\"AssetSearch\") As Rank " +
                " FROM \"AssetSearch\" " +
               $" WHERE \"AssetSearch\" MATCH ' - {{{{AssetCommonId AssetVersionId AssetPartId}}}} : {question}' " +
                " ORDER BY Rank";
       IQueryable<AssetSearchDbo> searchResults = AssetSearch.FromSqlRaw<AssetSearchDbo>(query);
      return searchResults;
 }

When inspecting the searchResults in one case an enumeration of AssetSearchDbo objects is returned and the results of snippet and rank functions are not computed. The snippet fields contain the entire field values without tagging the "match term" and rank is null.
In the context of the second test project an enumeration of Castle.Proxies.AssetSearchDboProxy objects with correct computed snippet and rank functions is returned.
Furthermore in the second case the AssetSearch DbSet consist already of those proxy objects.

Can anybody imagine which difference makes it work either this or that way?
How can I force the DbContext to use LazyLoading in first case as well (yes it is set in the OptionsBuilder)?
Or, the other way round, what is going wrong in the first case, that lazyloading is not used there?

Maybe there is someone knowing more about the combination of efcore, lazy loading and fts5.
By the way I'm working with VS2022 and Net6.0.
Thanks.

@Kalle4242
Copy link

I'm working on it myself. A difference between the two test project is, that in first on the context is kept open during the entire test. In the second one a new DbContext is created for each test method.
I changed the first one in a way it creates a new DbContext before the search method is called. And this worked!!!
Finding1: At least one prior action on the DbContext leaves it in a state, that it does not work properly in the described case. Maybe s.o. can guess, what this can be.
Finding2: For the moment, there's work-around! Or maybe it is even recommendable to work that way. Maybe s.o. has an advice here.

@radoslawkosinski
Copy link

Do you have any sample project on how to use FTS5 on SQLite EF .NET Core? I followed https://www.bricelam.net/2020/08/08/sqlite-fts-and-efcore.html
but not sure if I understand correctly, I have few problems:
there is FTSPost model which should create "normal" table, however we replace migration of this table with manual migration so the virtual table is created with rowid and Content column:
CREATE VIRTUAL TABLE FTSPost USING fts5(Content);
Now, the FTSPost table(FTSPost virtual table) doesn't have the properties needed to one to one mapping with Post table(which are defined in FTSPost model):
class FTSPost
{
public int RowId { get; set; }
public Post Post { get; set; }
public string Content { get; set; }
public string Match { get; set; }
public double? Rank { get; set; }
}

Can you share the migration and the data model created by this?

@Kalle4242
Copy link

Sorry. I didn't manage to run Brice's example. It also didn't match my requirements. The search content of my solution is not in a database table. My database is a metadata database for different assets which our solution handles. The search data is extracted from the content of those assets and only should be separately stored to fts5 tables for full-text-search purpose. It is not required, so it is superfluous in the metadata database.
As you can see I'm selecting my search hits via a FromSqlRaw call. This is closer to the example of VahidN https://github.com/VahidN/EFCoreSQLiteFTS
But my fts5 virtual table representing object contains a reference to a table of the metadata database. I added it by accident, maybe Brice's example in mind. I did not add it to the modelbuilder in the OnModelCreating method and it workedf or me.
Because it's a navigation property you should make it virtual
public virtual Post Post {get; set; }
Attention: If you add an additional automatic database migration, an index and a foreign key will be added to the up method and deleting them to the down method for this reference. You have to delete this statements. Leaving it in the migration, will overwrite the virual fts5 table with a conrete table with the same name. It took me hours to realize that is looking different in the DB Browser for SqLite, wondering why it is not working anymore.
Hope this will help you.

@radoslawkosinski
Copy link

Thanks, @Kalle4242. I looked at the Vahid's project and also your's suggestions in mind and I was able to create first working solution.
So, I have manual migration to create virtual tables and used FromSQLRaw for querying them, that seems to be working.
BTW, I'm using SQLIteStudio, it displays the letter 'V' when the table is vurtual, it seems that part is missing in DB Broeser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests