Elasticsearch / OpenSearch feature ✨💥 #9108

Closed
wants to merge 51 commits into from



Sen-Gupta commented Apr 9, 2021

Fixes #4316

How to use:

Install Elasticsearch or OpenSearch with Docker Compose.

OpenSearch Docker Compose file:

opensearch.txt

Elasticsearch Docker Compose file:

elasticsearch.txt

  • Copy these files into a folder named Docker somewhere safe.
  • Change their extension from .txt to .yml.
  • Open a terminal or command shell in this folder.
  • Execute docker-compose -f opensearch.yml -p opensearch up to deploy the OpenSearch containers.
  • Wait for the containers to be fully created.
  • Stop the containers with CTRL+C in the command shell.
  • Execute docker-compose -f elasticsearch.yml -p elasticsearch up to deploy the Elasticsearch containers.
  • Only run one of the two at a time, because both use the same external port, 9200.

Tip: keep these files in their folder if you want to remove all of their containers at once later in Docker Desktop.

You should see this result in the Docker Desktop app:

[Screenshot: Docker Desktop app]

Set up ElasticSearch or OpenSearch in Orchard Core

  • Add the Elastic connection to the shell configuration (the appsettings.json file of OrchardCore.Cms.Web):
"OrchardCore_Elastic": {
    "Url": "http://localhost:9200"
}
  • Start an Orchard Core instance with the VS Code debugger.
  • Go to Orchard Core's Features and enable ElasticSearch.

Implementation details

The Analyzed and Stored properties are not very meaningful in the context of Elasticsearch.

Analyzed

Analyzed is the default for strings in Elasticsearch.
By default (with dynamic mapping), every string field is indexed twice: once analyzed, as a field of type "text", and once verbatim, as a subfield of type "keyword".

So we get a field called ContentItemId (text, analyzed) and another called ContentItemId.keyword (the exact value), which can be used to match exact values with a TermQuery for fields like ContentItemId or emails. Elasticsearch creates the ContentItemId.keyword subfield automatically.

ElasticSearch documentation:
https://www.elastic.co/blog/strings-are-dead-long-live-strings
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
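As a rough illustration of the text/keyword split, here is a minimal Python sketch. It assumes default dynamic mapping (no explicit mapping for the field), which produces a text field plus a keyword subfield with ignore_above: 256. toy_analyze is a hypothetical simplification of the standard analyzer, not the real tokenizer, and index_string_field is an illustrative helper.

```python
import re


def default_string_mapping() -> dict:
    # Mapping produced by default dynamic mapping for a JSON string value:
    # an analyzed "text" field plus a verbatim "keyword" subfield.
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


def toy_analyze(value: str) -> list:
    # Hypothetical stand-in for the standard analyzer: lowercase, then split
    # on non-word characters. The real tokenizer follows Unicode UAX #29.
    return re.findall(r"\w+", value.lower())


def index_string_field(name: str, value: str) -> dict:
    # Terms that end up in the inverted index for the two variants.
    return {name: toy_analyze(value), f"{name}.keyword": [value]}


print(index_string_field("DisplayText", "Hello World"))
# {'DisplayText': ['hello', 'world'], 'DisplayText.keyword': ['Hello World']}
```

With this model, a term query for "Hello World" matches DisplayText.keyword but not DisplayText, because the analyzed field only holds the lowercased tokens.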

Stored

Stored is mostly overhead and is only worth it in specific cases, such as very large documents from which only a few small fields need to be retrieved.
By default, Elasticsearch stores the entire original document in a field called _source and returns it from the index itself when requested.

ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
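The _source behaviour is visible in the shape of a search response. The sketch below assumes the Elasticsearch 7.x response format; sample_response and its field values are made up for illustration. A request can also limit what comes back with source filtering ("_source": [...]) instead of storing fields separately.

```python
def source_documents(response: dict) -> list:
    # Each hit carries the original document under "_source", so the
    # document can be rebuilt without per-field "store" settings.
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical response in the Elasticsearch 7.x shape.
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "search",
                "_id": "1",
                "_score": 1.0,
                "_source": {"ContentItemId": "4abc", "DisplayText": "Hello World"},
            }
        ],
    }
}

# Source filtering: ask for only some fields of _source in the request.
request_body = {"query": {"match_all": {}}, "_source": ["DisplayText"]}

print(source_documents(sample_response))
```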

DSL Query Syntax

For text fields in Elasticsearch, always prefer a MatchQuery over a TermQuery. Use a TermQuery on the .keyword subfield only when you are fully confident an exact match is needed (e.g. matching an ID, or fields like email, phone number or hostname).

ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
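To make the match/term guidance concrete, these hypothetical helpers build the two kinds of query bodies. They assume the field was dynamically mapped, so that a .keyword subfield exists; the ContentItemId value is made up.

```python
import json


def full_text_query(field: str, text: str) -> dict:
    # match: the input text is analyzed with the field's analyzer first.
    return {"query": {"match": {field: text}}}


def exact_query(field: str, value: str) -> dict:
    # term on the keyword subfield: the value must match exactly.
    return {"query": {"term": {f"{field}.keyword": {"value": value}}}}


print(json.dumps(full_text_query("Content.ContentItem.FullText", "explore")))
print(json.dumps(exact_query("ContentItemId", "4abc")))
```

With the standard analyzer, a term query for "BlogPost" against an analyzed text field looks for the exact token BlogPost while the index only holds blogpost, so it finds nothing; the same value against the .keyword subfield matches.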

TODO

  • Fix Taxonomy Field indexing. (Added "Inherited" in IgnoredFields; may need a better fix). 🐛
  • Refactor Search form to support both Lucene and Elastic. 🚧
  • Restructure project names using OrchardCore.Search.X ♻️ 🚚
  • Add advanced Elastic configuration (cluster connection strings). Optional: this can be done in Kibana or OpenSearch Dashboards.
  • Add advanced support for multilingual search (equivalent of Lucene analyzers). ✨
  • Refactor the UI to hide the Analyzed/Stored field indexing options for Elastic. ♻️ 💄
  • Refactor ContentIndexSettings as IContentIndexSettings: see Refactor ContentIndexSettings to IContentIndexSettings ♻️ #10515
  • Module documentation. 📝

@dnfadmin

dnfadmin commented Apr 9, 2021

CLA assistant check
All CLA requirements met.

@Sen-Gupta
Author

Please tag this as "Do not merge"!

@Skrypt
Contributor

Skrypt commented Apr 9, 2021

Massive PR. Awesome work. Need to review 😉

@jtkech
Member

jtkech commented Apr 9, 2021

@Sen-Gupta

Thanks for this work; it may fix one of the remaining concerns when running in a distributed environment.

src/OrchardCore.Cms.Web/appsettings.json (resolved)
name: "Elastic.Search",
areaName: "OrchardCore.Search.Elastic",
pattern: "Search",
defaults: new { controller = "Search", action = "Search" }
Member


This is conflicting with the Lucene module, does it mean we shouldn't have both enabled at the same time?

Contributor

Skrypt, Apr 10, 2021


I think we should move the Search feature out of the Lucene module instead and make it use an abstraction on the Search service.

Author


@Skrypt @sebastienros Perhaps we should evaluate whether to follow a structure like this:

  1. OrchardCore.Search (all abstractions related to search)
  2. OrchardCore.Indexing (abstractions for common indexing configurations)
  3. OrchardCore.Search.Queries (we may need to do away with OrchardCore.Search??)
  4. OrchardCore.Search.Components (all search-based components, like drop-downs)
  5. OrchardCore.Search.Lucene
  6. OrchardCore.Search.Elastic
  7. OrchardCore.Search.Azure

@Skrypt
Contributor

Skrypt commented Apr 10, 2021

My first test of building an index gets me an error.

[Screenshot of the error]

@Sen-Gupta
Author

@Skrypt, please send me the content types and content items you included for indexing. I'm able to build the index and run a MatchAll query perfectly. It seems to be related to the content types or the fields included!

@Skrypt
Contributor

Skrypt commented Apr 10, 2021

I used the TheBlogTheme recipe. I'm indexing the Blog, BlogPost and Article content types. The issue seems to be related to the FullText custom field.

@Sen-Gupta
Author

Trying to reproduce!

A colleague of mine also complained about it; I believe he was using the same theme. I'm using Agency.

It should be a quick fix. The FullText field is not required by Elastic anyway; it has its own way to support full-text search.

@Skrypt
Contributor

Skrypt commented Apr 10, 2021

Right now I'm trying to test this, but the Elasticsearch Docker container takes half of my PC's RAM (around 7 GB), so debugging this is not the best experience.

@Skrypt
Contributor

Skrypt commented Apr 10, 2021

OK, I confirm that this works if I remove the FullText configuration from the Article content type and only allow this content type in the index.

Regarding "The FullText Field is anyway not required by Elastic, it has its own way to support FullText Search": please elaborate. Documentation link?

//else
//{
// elasticDocument.Set(entry.Name, null);
//}
Contributor


Can you confirm that we can use an Exists query to reach the same goal if we remove these?

Author


Let me first do a root cause analysis (RCA) and figure out the best solution!

Contributor


The issue is that if we don't index a value for a field or part, a wildcard query won't return the documents that lack it.

{
  "query": {
    "wildcard": {
      "Article.TitlePart": {
        "value": "*"
      }
    }
  }
}

So the Elasticsearch documentation suggests using a default null value (null_value) for these.

https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html
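A minimal Python sketch of that approach follows. It assumes the field is mapped as keyword (text fields do not support null_value) and reuses "NULL" as the placeholder only because the Lucene module indexes that value; the constant, the helper names and the Article.Subtitle field are hypothetical. null_value only replaces explicit nulls, so the indexer has to send the field with a null rather than omit it; the commented-out elasticDocument.Set(entry.Name, null) branch appears intended for that, assuming the serializer writes the null out. For the Exists question, an exists query matches documents that have an indexed value, so wrapping it in bool.must_not finds the ones without one.

```python
NULL_TOKEN = "NULL"  # hypothetical placeholder, mirroring the Lucene module


def keyword_with_null_value(null_token: str = NULL_TOKEN) -> dict:
    # null_value is accepted on keyword fields (text fields do not support
    # it); an explicit JSON null is indexed as null_token, _source unchanged.
    return {"type": "keyword", "null_value": null_token}


def with_explicit_nulls(values: dict, indexed_fields: list) -> dict:
    # null_value applies to explicit nulls only, not to missing fields, so
    # every expected field is emitted, set to None when it has no value.
    return {name: values.get(name) for name in indexed_fields}


def missing_value_query(field: str) -> dict:
    # Without null_value: documents lacking an indexed value for the field.
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


doc = with_explicit_nulls({"Article.TitlePart": "Hello"}, ["Article.TitlePart", "Article.Subtitle"])
print(doc)  # None is serialized as JSON null when the document is sent
```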

case DocumentIndex.Types.Text:
if (entry.Value != null && !String.IsNullOrEmpty(Convert.ToString(entry.Value)))
{
elasticDocument.Set(entry.Name, Convert.ToString(entry.Value));
Contributor


This needs to also use the DocumentIndexOptions to define if the field is Analyzed or Stored.

Author


@Skrypt

All text fields are analyzed by default in Elasticsearch!
(https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html)

Stored is mostly overhead and is only worth it in specific cases, such as very large documents from which only a few small fields need to be retrieved.
By default, Elasticsearch stores the entire original document in a field called _source and returns it from the index itself when requested.
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html

I kept both of these for later implementations (advanced scenarios), as noted in my first comment on the PR. ;-)

I think we should:

  1. First solidify the basic functionality and merge into dev (so more people can provide feedback).
  2. Define an architecture with abstractions for indexing, querying and search-based components (this is mandatory so that developers can swap search providers without needing to write code).
  3. Build advanced features per provider as we go.
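The provider abstraction in point 2 can be sketched in a few lines. This is a hypothetical Python illustration, not the Orchard Core design (which was still being discussed): SearchProvider, InMemoryProvider and run_search are invented names, and the in-memory provider is a toy standing in for Lucene or Elasticsearch.

```python
from typing import Dict, List, Protocol


class SearchProvider(Protocol):
    # Hypothetical contract; the real abstractions were still being designed.
    def index(self, index_name: str, document: dict) -> None: ...

    def search(self, index_name: str, text: str) -> List[dict]: ...


class InMemoryProvider:
    # Toy provider standing in for Lucene or Elasticsearch: it only shows
    # that callers depend on the contract, not on the engine.
    def __init__(self) -> None:
        self._indexes: Dict[str, List[dict]] = {}

    def index(self, index_name: str, document: dict) -> None:
        self._indexes.setdefault(index_name, []).append(document)

    def search(self, index_name: str, text: str) -> List[dict]:
        needle = text.lower()
        return [
            doc
            for doc in self._indexes.get(index_name, [])
            if any(needle in str(value).lower() for value in doc.values())
        ]


def run_search(providers: Dict[str, SearchProvider], active: str,
               index_name: str, text: str) -> List[dict]:
    # Switching engines means changing "active" (configuration), not code.
    return providers[active].search(index_name, text)


providers = {"lucene": InMemoryProvider(), "elastic": InMemoryProvider()}
providers["elastic"].index("search", {"DisplayText": "Explore the blog"})
print(run_search(providers, "elastic", "search", "explore"))
```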

Contributor


OK, I looked at these documentation pages, and they strongly suggest not adding a Stored custom field because the value is already in the _source field. This makes sense. Now we need to see whether we should do the same in the Lucene implementation, for consistency. Then again, the documentation says storing fields is still allowed, so if it's an option we might want to keep it.

Author


@Skrypt, we must store with Lucene, as otherwise there is no way to build a document when it is retrieved from Lucene.
Unfortunately, I've had to work with Lucene, Solr, Elastic and Azure Search at some point over the last 10 years!

Author


@Skrypt For some reason, while debugging, I found that when fields were added to Lucene, everything was turned into "String" in OrchardCore.Lucene. I'm not sure if that has changed (I remember debugging OrchardCore.Lucene around 4 months ago).

I might be wrong though! ;-)

Contributor

Skrypt, Apr 13, 2021


Where do you see this? Looking at LuceneIndexManager, we use Int32Field, DoubleField and StringField.

DateField, DateTimeField and TimeField are all indexed as String, though.

But if you mean the doc.Add(new StringField(entry.Name, "NULL", store)); call, then it is fine, because this is the default NULL value that lets a wildcard query return all results.

Author

Sen-Gupta, Apr 15, 2021


@Skrypt, I see your point!

Makes sense!

@Sen-Gupta
Author

Sen-Gupta commented Apr 11, 2021

@jtkech
You guys have done some amazing work! I'm just helping myself! ;-)

Update from GitHub on 11-04-2021:
It is now used to know which kind of index we are populating (LuceneContentIndexSettings, ElasticContentIndexSettings), and it also allows separate index settings per indexing provider.
@Skrypt
Contributor

Skrypt commented Oct 15, 2021

Fixed the settings. We have now separate sets of settings per indexing provider.
Removed the "Stored" and "Analyzed" options from the Elasticsearch settings, as we discussed.

[Screenshot of the separate index settings per provider]
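One way to picture "separate sets of settings per indexing provider" is sketched below in Python. Only the class names LuceneContentIndexSettings and ElasticContentIndexSettings come from the thread; the lowercase property names, the storage layout keyed by type name and settings_for are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LuceneContentIndexSettings:
    # Lucene still needs Stored (to rebuild documents) and Analyzed.
    included: bool = False
    stored: bool = False
    analyzed: bool = False


@dataclass
class ElasticContentIndexSettings:
    # No Stored/Analyzed: _source keeps the document, and strings get both
    # a text and a keyword variant by default.
    included: bool = False


def settings_for(field_settings: dict, settings_type):
    # Settings are kept per provider under the type name (an assumption),
    # so one field can carry Lucene and Elastic settings side by side.
    return settings_type(**field_settings.get(settings_type.__name__, {}))


field_settings = {
    "LuceneContentIndexSettings": {"included": True, "stored": True},
    "ElasticContentIndexSettings": {"included": True},
}
print(settings_for(field_settings, ElasticContentIndexSettings))
```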

Skrypt added the breaking change 💥 label on Oct 15, 2021.
Skrypt changed the title from Sen/elasticsearch to Elasticsearch / OpenSearch feature ✨💥 on Oct 15, 2021.
@infofromca
Contributor

infofromca commented Oct 23, 2021

I used the following as the request body:

{
"indexName":"search",
"parameters": "{'term':'explore','from':0,'size':2}",
"query": "{
    'from': {{from}},
    'size':{{size}},
    'query':{
        'bool': {
            'must': [
                {
                    'match': {
                        'Content.ContentItem.FullText': '{{ term }}'
                    }
                },
                {
                    'term': {
                        'Content.ContentItem.ContentType': 'BlogPost'
                    }
                }
            ]
        }
    }
}"
}

I got an exception.
Maybe the body is not correct?
Another question: is the body correct or not? The same body works when calling the Lucene API. If the two APIs use different bodies, I think it will confuse developers.
Maybe I did not install Elasticsearch or OpenSearch with Docker Compose (can we install it together with OC?). I will install it and try again.

But anyway, we should add
if (elasticSearchResult != null) under line 68 of ElasticQuerySource.cs
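On whether such a body is correct: strict JSON does not allow raw line breaks inside a string or single-quoted strings, so a body written by hand like this is only accepted by a lenient parser (and a later comment suggests the exception here actually came from the missing Elasticsearch instance). A minimal Python sketch that builds the body programmatically so the nested strings are escaped; build_request_body is a hypothetical helper, and sending parameters as a JSON string simply mirrors the body above. The template keeps the original field names; whether the term clause needs a .keyword subfield depends on the index mapping, which the thread does not show.

```python
import json

# Query template with Liquid placeholders. It is kept as a string because
# {{from}} and {{size}} are substituted as raw text before parsing.
QUERY_TEMPLATE = """{
  "from": {{from}},
  "size": {{size}},
  "query": {
    "bool": {
      "must": [
        { "match": { "Content.ContentItem.FullText": "{{ term }}" } },
        { "term": { "Content.ContentItem.ContentType": "BlogPost" } }
      ]
    }
  }
}"""


def build_request_body(index_name: str, template: str, parameters: dict) -> str:
    # json.dumps escapes the quotes and line breaks inside the nested
    # strings, so the outer body is strict JSON.
    return json.dumps(
        {
            "indexName": index_name,
            "parameters": json.dumps(parameters),
            "query": template,
        }
    )


body = build_request_body("search", QUERY_TEMPLATE, {"term": "explore", "from": 0, "size": 2})
print(body)
```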

[Screenshots: elastic-ex-place, elastic-result]

@infofromca
Contributor

I clicked the Match All query under Run Elastic Query, and it gave me the Lucene .... page title.
Then, when I returned to the same page, it gave me:
[Screenshot]

@Skrypt
Contributor

Skrypt commented Oct 23, 2021

Yes, the SearchAsync method needs more testing. To test queries, I suggest using them with a saved Elastic query in the admin.

@infofromca
Contributor

parameterizedQuery on line of ElasticQuerySource.cs has not been used

@infofromca
Contributor

After installing Elasticsearch or OpenSearch with Docker Compose, it works now.
For more robust code, we still need
if (elasticSearchResult != null) under line 68 of ElasticQuerySource.cs

@infofromca
Contributor

The count is not correct: I got 4 items, but the count is 0.
[Screenshot]
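The thread does not identify where the count comes from. One thing worth checking: a total count should be read from hits.total rather than computed separately, and its shape differs between major versions. A minimal Python sketch; total_hits is a hypothetical helper.

```python
def total_hits(response: dict) -> int:
    # Elasticsearch 7.x returns hits.total as {"value": n, "relation": ...};
    # 6.x returned a bare integer. Accept both shapes.
    total = response.get("hits", {}).get("total", 0)
    if isinstance(total, dict):
        # With relation "gte", value is only a lower bound
        # (see track_total_hits).
        return int(total.get("value", 0))
    return int(total)


print(total_hits({"hits": {"total": {"value": 4, "relation": "eq"}, "hits": []}}))  # 4
```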

@Skrypt
Contributor

Skrypt commented Oct 24, 2021

OK, but as I said in last Tuesday's meeting, this PR is not ready yet. The only thing that works for now is the admin Elastic Queries; everything else needs to be adjusted. I'm splitting some parts of this PR into other PRs.

See: #10515

We need to fix the core abstractions first. That's where I'm at for now. After that, I'll get back to those issues.

As for line 68, I'd prefer to keep it as it is and have the underlying method return an empty object, like we do in Lucene, so that we stay consistent. But I need to take a look at this one later on. I'm pretty sure there is another issue somewhere in the query code itself.
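The "return an empty object instead of null" idea can be sketched as follows, in Python for illustration. SearchResult and to_result are hypothetical names, not the Orchard Core types; the point is that callers iterate the result without a null check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchResult:
    # Hypothetical result type, modelled on returning an empty Lucene result.
    documents: List[dict] = field(default_factory=list)
    count: int = 0


def to_result(documents: Optional[List[dict]], count: int = 0) -> SearchResult:
    # Map "no response" to an empty result instead of None, so callers such
    # as a query source can iterate without a null check.
    if documents is None:
        return SearchResult()
    return SearchResult(list(documents), count)


for doc in to_result(None).documents:
    print(doc)  # never runs: the empty result has no documents
```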

@Piedone
Member

Piedone commented Dec 10, 2021

Could you get back to this, @Sen-Gupta?

@Skrypt
Contributor

Skrypt commented Dec 10, 2021

@Piedone This is more of a POC than a complete, functional pull request. It works on the backend, but everything we use on the frontend (helper methods) needs to be refactored properly.

This also means that we need to refactor how we implemented the search form module and move code around by creating an abstraction over it.

So basically, nothing works right now. You can't use this PR to make queries where needed: the frontend ...
This is mainly why @Sen-Gupta asked us whether we should refactor the Indexing projects and also rename them, which would cause breaking changes.

So I believe this will only be ready when it's fully tested.
The only part missing for this PR to be fully independent of the OC project is moving ContentIndexSettings to IContentIndexSettings. We discussed how to migrate these settings to stay backward compatible, but we never agreed on a solution.

So the first step would be to agree on how to migrate old recipes without breaking them. Otherwise, we introduce this feature only in 2.0, document the breaking change, and don't implement any backward compatibility.

@Sen-Gupta
Author

@Skrypt @Piedone I believe we should look holistically at refactoring our current search into a provider-based model built on interfaces, and rename the projects as suggested at the very beginning of this thread!

It may delay things a little, but it will go a long way!!

@Skrypt
Contributor

Skrypt commented Dec 17, 2021

@Sen-Gupta This is where I'm at: #10515
It adds support for multiple indexing provider settings in Orchard Core, which is the only part of Orchard Core the Elasticsearch module depends on so far. Refactoring the search module to be provider-based can be done in this PR in parallel without issues.

@@ -0,0 +1,12 @@
@model OrchardCore.Search.Elastic.Settings.ElasticContentIndexSettingsViewModel

<h4>Elastic Search</h4>
Contributor


Here also is the visual distinction.

@Skrypt
Contributor

Skrypt commented Jan 22, 2022

Moved to the skrypt/elasticsearch branch owned by the Orchard Core repository.

Skrypt closed this on Jan 22, 2022
Skrypt modified the milestones: 1.x, 1.5 on Nov 4, 2022
Labels: breaking change 💥
Projects: none yet
Linked issues: Implement ElasticSearch module