Elasticsearch / OpenSearch feature ✨💥 #9108

Sen-Gupta · 2021-04-09T13:14:50Z

Fixes #4316

How to use:

Install ElasticSearch or OpenSearch with Docker compose

OpenSearch Docker Compose file:

opensearch.txt

ElasticSearch Docker Compose file:

elasticsearch.txt

Copy these files into a folder named Docker somewhere safe.
Change the extension of these files from .txt to .yml.
Open a terminal or command shell in this folder.
Execute docker-compose -f opensearch.yml -p opensearch up to deploy the OpenSearch containers.
Wait for the containers to be fully created.
Stop the Docker containers with CTRL+C in the command shell.
Execute docker-compose -f elasticsearch.yml -p elasticsearch up to deploy the ElasticSearch containers.
Run only one of the two at a time, because both use the same external port, 9200.

Advice: keep these files in their folder if you want to remove all of their containers at once later on in Docker Desktop.

You should get this result in the Docker Desktop app:

Set up ElasticSearch or OpenSearch in Orchard Core

Add Elastic Connection in the shell configuration (OrchardCore.Cms.Web appsettings.json file)

"OrchardCore_Elastic": {
    "Url": "http://localhost:9200"
}

Start an Orchard Core instance with VS Code debugger
Go to Orchard Core features, Enable ElasticSearch.

Implementation details

The Analyzed and Stored properties are not very meaningful in the context of ElasticSearch.

Analyzed

Analyzed is the default for strings in ElasticSearch.
By default, every string field is indexed twice in Elastic: once analyzed, as a "text" field type, and once as is, as a "keyword" field type.

So we will have an analyzed field called ContentItemId (text) and another called ContentItemId.Keyword (stored as is), which is used to match exact values with TermQuery for fields like ContentItemId or emails. Elastic creates the ContentItemId.Keyword field automatically.

ElasticSearch documentation:
https://www.elastic.co/blog/strings-are-dead-long-live-strings
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
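
As an illustration of the text/keyword pairing described above, here is a minimal sketch in Python (chosen only because it needs no project setup) of the mapping that Elasticsearch's default dynamic mapping assigns to a new string field. The helper names are hypothetical and are not part of this PR. Note that in the default mapping the sub-field is named keyword in lowercase, so the exact name used by this module may differ from the .Keyword spelling in the description.

```python
import json


def default_string_mapping(ignore_above=256):
    """Mapping that default dynamic mapping assigns to a new string field."""
    return {
        "type": "text",  # analyzed copy, used for full-text matching
        "fields": {
            # exact, non-analyzed copy; the default sub-field name is lowercase
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


def index_mappings(field_names):
    """Build a create-index body that maps each given field as text + keyword."""
    properties = {name: default_string_mapping() for name in field_names}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    print(json.dumps(index_mappings(["ContentItemId"]), indent=2))
```

With such a mapping, ContentItemId is searched as analyzed text and ContentItemId.keyword holds the untouched value for exact matching.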

Stored

Stored is really an overhead and is only required if we are processing thousands of large documents.
By default, Elastic stores the entire document in a field called _source and retrieves it from the index itself when asked.

ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
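
To make the _source behaviour concrete, the following sketch builds a search body that asks Elasticsearch to return only selected fields out of _source, which is the usual alternative to marking individual fields as stored. The function name is hypothetical, and the field names are taken from examples elsewhere in this thread; this is not code from the PR.

```python
def search_with_source_filter(fields, size=10):
    """Build a match_all search body that returns only the listed _source fields."""
    return {
        "query": {"match_all": {}},
        "_source": list(fields),  # source filtering: only these fields come back
        "size": size,
    }


if __name__ == "__main__":
    body = search_with_source_filter(
        ["ContentItemId", "Content.ContentItem.ContentType"], size=2
    )
    print(body)
```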

DSL Query Syntax

It is recommended to always use MatchQuery instead of TermQuery for text fields in Elastic. Use TermQuery only against the (.Keyword) fields, where you are fully confident that you need an exact match (e.g. matching an id, or fields like email, phone number, hostname).

ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
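
The difference between the two query types can be sketched as plain request bodies. The helpers below are hypothetical and only build dictionaries; the FullText field name comes from a later comment in this thread, while ContentItemId.keyword assumes the default lowercase sub-field name and the id value is made up for illustration.

```python
def match_query(field, text):
    """Full-text query: the search text is analyzed before matching."""
    return {"query": {"match": {field: text}}}


def term_query(field, value):
    """Exact-value query: the value is compared as is, without analysis."""
    return {"query": {"term": {field: {"value": value}}}}


if __name__ == "__main__":
    # Analyzed text field: use match.
    print(match_query("Content.ContentItem.FullText", "explore"))
    # Non-analyzed keyword sub-field: use term (hypothetical id value).
    print(term_query("ContentItemId.keyword", "4abc123"))
```

Running a term query against the analyzed field itself is the case the recommendation warns about, because the indexed tokens may no longer equal the original value.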

TODO

Fix Taxonomy Field indexing. (Added "Inherited" in IgnoredFields; may need a better fix). 🐛
Refactor Search form to support both Lucene and Elastic. 🚧
Restructure project names using OrchardCore.Search.X ♻️ 🚚
Add Advanced Elastic Configuration. (Cluster connection strings) - Optional (Can be done in Kibana or OpenSearch Dashboards)
Add Advanced Support for multilingual search. (Equivalent of Lucene Analyzers) ✨
Refactor UI to hide Analyzed/Stored in case of Elastic as Indexing Options for Fields. ♻️ 💄
Refactor ContentIndexSettings as IContentIndexSettings : See Refactor ContentIndexSettings to IContentIndexSettings ♻️ #10515
Module documentation. 📝

dnfadmin · 2021-04-09T13:15:03Z

All CLA requirements met.

Sen-Gupta · 2021-04-09T13:32:49Z

Please tag as Do not merge!

Skrypt · 2021-04-09T15:13:52Z

Massive PR. Awesome work. Need to review 😉

jtkech · 2021-04-09T21:16:10Z

@Sen-Gupta

Thanks for this work, it may fix one of the remaining concerns when running in a distributed environment

src/OrchardCore.Cms.Web/appsettings.json

sebastienros · 2021-04-09T21:25:41Z

src/OrchardCore.Modules/OrchardCore.Search.Elastic/Startup.cs

+ name: "Elastic.Search",
+ areaName: "OrchardCore.Search.Elastic",
+ pattern: "Search",
+ defaults: new { controller = "Search", action = "Search" }


This is conflicting with the Lucene module, does it mean we shouldn't have both enabled at the same time?

I think we should move the Search feature out of the Lucene module instead and make it use an abstraction on the Search service.

@Skrypt @sebastienros Perhaps evaluate whether we need to follow something like this:

OrchardCore.Search (All abstractions related to Search)

OrchardCore.Indexing (Abstractions for common indexing configurations)

OrchardCore.Search.Queries (May need to do away with OrchardCore.Search??)

OrchardCore.Search.Components (All Search based components/like drop downs)

OrchardCore.Search.Lucene

OrchardCore.Search.Elastic

OrchardCore.Search.Azure

Skrypt · 2021-04-10T01:54:47Z

My first test of building an index gets me an error.

Sen-Gupta · 2021-04-10T03:39:47Z

@Skrypt, please provide me with the ContentTypes and Contents that you included for indexing. I'm able to build the index and run a MatchAll query perfectly. It seems it has something to do with the ContentTypes or the fields included!

Skrypt · 2021-04-10T03:49:45Z

I used TheBlogTheme recipe. I'm indexing Blog, BlogPost and Article content types. Seems like the issue is related with the FullText custom field.

Sen-Gupta · 2021-04-10T03:51:55Z

> I used TheBlogTheme recipe. I'm indexing Blog, BlogPost and Article content types. Seems like the issue is related with the FullText custom field.

Trying to reproduce!

A colleague of mine also complained about it; I believe he was using the same theme. I'm on Agency.

Should be a quick fix. The FullText Field is anyway not required by Elastic; it has its own way to support FullText Search.

Skrypt · 2021-04-10T03:54:45Z

Right now I'm trying to test this, but the ElasticSearch Docker container takes half of my PC's RAM (around 7 GB). So debugging this is not the best experience.

Skrypt · 2021-04-10T04:01:54Z

OK, I confirm that this works if I remove the FullText config on the Article content type and allow only this content type to be indexed in the index.

> The FullText Field is anyway not required by Elastic; it has its own way to support FullText Search.

Please elaborate. Documentation link?

src/OrchardCore.Modules/OrchardCore.Search.Elastic/OrchardCore.Search.Elastic.csproj

Skrypt · 2021-04-10T04:11:14Z

src/OrchardCore.Modules/OrchardCore.Search.Elastic/Services/ElasticIndexManager.cs

+ //else
+ //{
+ // elasticDocument.Set(entry.Name, null);
+ //}


Can you confirm that we can use an Exists query to reach the same goal if we remove these?

Let me first do an RCA and figure out the best solution!

The issue is that if we don't index a value for a Field or Part then it won't find any document when we do a wildcard query.

{ "query": { "wildcard": { "Article.TitlePart": { "value": "*" } } } }

So ElasticSearch documentation recommends using a default null value for these.

https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html
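
The two options discussed here can be sketched side by side: a mapping with a null_value placeholder, and an exists query. The helpers are hypothetical and only build request bodies. The "NULL" placeholder mirrors the value the Lucene implementation uses later in this thread. As far as I know, null_value is supported on field types such as keyword but not on analyzed text fields, so the sketch assumes a keyword field, and the placeholder only replaces an explicit null, not a field that is absent from the document.

```python
def keyword_with_null_value(placeholder="NULL"):
    """Keyword mapping that indexes the placeholder in place of an explicit null."""
    return {"type": "keyword", "null_value": placeholder}


def exists_query(field):
    """Match documents that have an indexed value for the field."""
    return {"query": {"exists": {"field": field}}}


def missing_query(field):
    """Match documents that have no indexed value for the field."""
    return {"query": {"bool": {"must_not": {"exists": {"field": field}}}}}


if __name__ == "__main__":
    print(keyword_with_null_value())
    print(exists_query("Article.TitlePart"))
    print(missing_query("Article.TitlePart"))
```

Whether an exists query can fully replace the commented-out null assignment depends on how the wildcard query is used, which this sketch does not settle.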

Skrypt · 2021-04-10T05:09:53Z

src/OrchardCore.Modules/OrchardCore.Search.Elastic/Services/ElasticIndexManager.cs

+ case DocumentIndex.Types.Text:
+ if (entry.Value != null && !String.IsNullOrEmpty(Convert.ToString(entry.Value)))
+ {
+ elasticDocument.Set(entry.Name, Convert.ToString(entry.Value));


This needs to also use the DocumentIndexOptions to define if the field is Analyzed or Stored.

@Skrypt

All text fields are analyzed by default in Elastic Search!
(https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html)

Stored is really an overhead and is only required if we are processing thousands of large documents.
By default, Elastic stores the entire document in a field called _source and retrieves it from the index itself when asked.
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html

I just kept both of these for later implementations as noted in my first comment with PR. (Advanced Scenarios) ;-)

I think we should:

First solidify the basic functionality and merge with dev (more people can provide feedback).

Define an architecture leading to abstractions for indexing, querying and search-based components. (This is mandatory so that developers can swap search providers without needing to code.)

Build advanced features per provider on the go.

OK, I looked at these documentation pages, and they strongly suggest not adding a Stored custom field because the value is already in the _source field. This makes sense. Now we need to decide whether we should do the same in the Lucene implementation so that the two are consistent. That said, the documentation says storing fields is still allowed, so if it's an option we might want to keep it.

@Skrypt, we must store with Lucene, as there is otherwise no way to build a document when it is retrieved from Lucene.
I have had to work with Lucene, Solr, Elastic and Azure Search at some point over the last 10 years!

@Skrypt for some reason, while debugging, I found that when fields were added to Lucene, everything was turned into "String" in OrchardCore.Lucene. I'm not sure whether that has changed (I remember debugging OrchardCore.Lucene around 4 months back).

I might be wrong though! ;-)

Where do you see this? If I look at LuceneIndexManager, we use Int32Field, DoubleField and StringField.

DateField, DateTimeField, TimeField are all indexed as String though.

But if you mean the doc.Add(new StringField(entry.Name, "NULL", store)); then it is fine because this is the default NULL value to actually return all results from a wildcard query.

@Skrypt , I see your point!

> But if you mean the doc.Add(new StringField(entry.Name, "NULL", store)); then it is fine because this is the default NULL value to actually return all results from a wildcard query.

Makes sense!

Sen-Gupta · 2021-04-11T12:28:58Z

> @Sen-Gupta
>
> Thanks for this work, it may fix one of the remaining concerns when running in a distributed environment

@jtkech
You guys have done some amazing work! I'm just helping myself! ;-)

Skrypt · 2021-10-15T04:52:58Z

Fixed the settings. We now have separate sets of settings per indexing provider.
Removed the "Stored" and "Analyzed" from the Elastic Search settings as we discussed.

infofromca · 2021-10-23T15:36:02Z

I used the following as body:

{
  "indexName": "search",
  "parameters": "{'term':'explore','from':0,'size':2}",
  "query": "{
    'from': {{from}},
    'size': {{size}},
    'query': {
      'bool': {
        'must': [
          {
            'match': {
              'Content.ContentItem.FullText': '{{ term }}'
            }
          },
          {
            'term': {
              'Content.ContentItem.ContentType': 'BlogPost'
            }
          }
        ]
      }
    }
  }"
}

I got an exception.
Maybe the body is not correct?

Another question: is the body correct or not? The same body works when calling the Lucene API. If the two APIs use different bodies, I think it will confuse developers.

Maybe the cause is that I did not install ElasticSearch or OpenSearch with Docker Compose (but can we install it together with OC?). I will install it and try again.

But anyway, we should add
if (var elasticSearchResult !=null) under line 68 of ElasticQuerySource.cs
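
One way to rule out a malformed request body is to build it programmatically rather than by hand. The sketch below is an assumption-heavy illustration: it reuses the three property names from the body quoted above (indexName, parameters, query), but the function name is hypothetical, it switches the inner query to double quotes, and whether the Elastic endpoint of this PR expects exactly this shape is not confirmed anywhere in this thread. Serializing with json.dumps guarantees that the multi-line query template is escaped into a valid JSON string.

```python
import json

# Liquid-style template for the inner query; the placeholders are left
# untouched here and are meant to be resolved on the server side.
QUERY_TEMPLATE = """{
  "from": {{from}},
  "size": {{size}},
  "query": {
    "bool": {
      "must": [
        { "match": { "Content.ContentItem.FullText": "{{ term }}" } },
        { "term": { "Content.ContentItem.ContentType": "BlogPost" } }
      ]
    }
  }
}"""


def build_body(index_name, term, start=0, size=2):
    """Serialize the request body so nested strings are escaped correctly."""
    parameters = {"term": term, "from": start, "size": size}
    return json.dumps(
        {
            "indexName": index_name,
            "parameters": json.dumps(parameters),  # nested JSON kept as a string
            "query": QUERY_TEMPLATE,
        }
    )


if __name__ == "__main__":
    print(build_body("search", "explore"))
```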

infofromca · 2021-10-23T16:07:12Z

I clicked the Match All query under Run Elastic Query, and it gave me a Lucene .... page title.
Then, when I returned to the same page, it gave me

Skrypt · 2021-10-23T16:39:46Z

Yes, the SearchAsync method needs to be tested more. To test Queries I suggest you use them with a saved Elastic Query in the admin.

infofromca · 2021-10-24T02:04:59Z

parameterizedQuery on line of ElasticQuerySource.cs has not been used

infofromca · 2021-10-24T02:06:40Z

After installing ElasticSearch or OpenSearch with Docker Compose, it works now.
For robust code, we still need
if (var elasticSearchResult !=null) under line 68 of ElasticQuerySource.cs

infofromca · 2021-10-24T03:01:16Z

Count is not correct.
I got 4 items, but count is 0

Skrypt · 2021-10-24T03:34:31Z

OK, but as I said in last Tuesday's meeting, this PR is not ready yet. The only thing that works for now is the Admin Elastic Queries; everything else needs to be adjusted. I'm splitting some parts of this PR out into other PRs.

See : #10515

We need to fix the core abstractions first. That's where I'm at for now. After that, I'll get back to those issues.

As for line 68, I'd prefer that it works as it is and that the underlying method returns an empty object, like we do in Lucene, so that we are consistent. But I need to take a look at this one later on. I'm pretty sure there is another issue somewhere in the query code itself.

Piedone · 2021-12-10T00:56:10Z

Could you get back to this, @Sen-Gupta?

Skrypt · 2021-12-10T01:57:44Z

@Piedone This is more like a POC than a complete functional Pull Request. So, it works on the backend but everything we use on the frontend (method helpers) needs to be refactored correctly.

This means also that we need to refactor how we implemented the search form module and move code around by creating an abstraction over it.

So basically, nothing works right now. You can't use this PR to make queries where needed: the frontend ...
This is mainly why @Sen-Gupta asked us if we should refactor the Indexing projects and also rename them which will cause breaking changes.

So, I believe this will only be ready when it's fully tested.
The only part missing for this PR to be totally independent of the OC project is moving ContentIndexSettings to IContentIndexSettings. We discussed how to migrate these settings while staying backward compatible, but we never agreed on a solution.

So, the first step would be to agree on how we should migrate old recipes to not break them. Else, we introduce this feature only for 2.0 with documentation about the breaking change and don't implement any backward compatibility.

Sen-Gupta · 2021-12-17T15:59:38Z

@Skrypt @Piedone I believe we should look holistically at refactoring our current search into a provider-based model built on interfaces, and rename the projects as suggested at the very beginning of this thread!

It may delay things a little, but it goes a long way!!

Skrypt · 2021-12-17T16:30:12Z

@Sen-Gupta This is where I'm at : #10515
To support multiple indexing provider settings in Orchard Core. That's the only part the ElasticSearch module is dependent on in Orchard Core so far. Refactoring the search module as provider-based can be done in this PR in parallel without issues.

Skrypt · 2022-01-20T20:36:30Z

...OrchardCore.Modules/OrchardCore.Search.Elastic/Views/ElasticContentIndexSettings.Edit.cshtml

@@ -0,0 +1,12 @@
+@model OrchardCore.Search.Elastic.Settings.ElasticContentIndexSettingsViewModel
+
+<h4>Elastic Search</h4>


Here also is the visual distinction.

Skrypt · 2022-01-22T00:03:16Z

Moved to Orchard Core repository owned branch skrypt/elasticsearch

Sen-Gupta added 8 commits April 8, 2021 12:08

Rollback point: QueryProviders

d47002f

Added Elastic Search Core

459123f

Renamed the Project to avoid conflicts with Lucene

5dde4e3

Added OrchardCore style configuration for connection

90d1043

Added support for Elastic

b2f8b48

Added it to Core Targets

6199605

Scoped Index to tenant

3f946a4

Updated the names and path for Areas

20fd786

Sen-Gupta mentioned this pull request Apr 9, 2021

Implement ElasticSearch module #4316

Closed

hishamco added the don't merge label Apr 9, 2021

sebastienros reviewed Apr 9, 2021

Comments the accidental push of the Elastic Connection, Keep as comme…

7a8e092

…nts for Example

Skrypt reviewed Apr 10, 2021

Skrypt reviewed Apr 10, 2021

Skrypt and others added 3 commits April 10, 2021 02:50

WIP analyzers

ffacb7e

Fixed Managed Reference Warning issue

c97ee82

Remove Serilog

19d30e6

Merge pull request #1 from OrchardCMS/dev

df68e6a

Update from the Github on 11-04-2021

Refactor usage of ContentIndexSettings as IContentIndexSettings

56e0c64

Now it is used to know which kind of index we are indexing (LuceneContentIndexSettings, ElasticContentIndexSettings) and also allows to have separate index settings per Indexing provider.

Skrypt added 3 commits October 15, 2021 01:04

Admin page titles

80df10a

Fix recipes and Unit Tests

48796c4

Fix startup

dbf07be

Skrypt added the breaking change 💥 Issues or pull requests that introduces breaking change(s) label Oct 15, 2021

Skrypt changed the title ~~Sen/elasticsearch~~ Elasticsearch / OpenSearch feature ✨ Oct 15, 2021

Skrypt changed the title ~~Elasticsearch / OpenSearch feature ✨~~ Elasticsearch / OpenSearch feature ✨ 💥 Oct 15, 2021

Skrypt changed the title ~~Elasticsearch / OpenSearch feature ✨ 💥~~ Elasticsearch / OpenSearch feature ✨💥 Oct 15, 2021

Skrypt added 4 commits October 15, 2021 10:38

Cleanup ⚰️

9b648c7

Cleanup

aef4fce

Fix settings groupId's

957faed

Fix build

466b7cf

Skrypt added the notready label Oct 24, 2021

Skrypt reviewed Jan 20, 2022

Skrypt closed this Jan 22, 2022

Skrypt modified the milestones: 1.x, 1.5 Nov 4, 2022
