-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elasticsearch / OpenSearch feature ✨💥 #9108
Conversation
Please tag as Do not merge! |
Massive PR. Awesome work. Need to review 😉 |
Thanks for this work, it may fix one of the remaining concerns when running in a distributed environment |
name: "Elastic.Search", | ||
areaName: "OrchardCore.Search.Elastic", | ||
pattern: "Search", | ||
defaults: new { controller = "Search", action = "Search" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is conflicting with the Lucene module, does it mean we shouldn't have both enabled at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move the Search feature out of the Lucene module instead and make it use an abstraction on the Search service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Skrypt @sebastienros Perhaps evaluate, if we need to follow something like this
- OrchardCore.Search (All abstractions related to Search
- OrchardCore.Indexing (Abstractions for common indexing configurations)
- OrchardCore.Search.Queries (May need to do away with OrchardCore.Search??)
- OrchardCore.Search.Components (All Search based components/like drop downs)
- OrchardCore.Search.Lucene
- OrchardCore.Search.Elastic
- OrchardCore.Search.Azure
@Skrypt , please provide me the ContentTypes and Contents that you included for indexing. I'm able to do build and run MatchAll query perfectly. It seem it has something to do with ContentTypes or the fields included! |
I used TheBlogTheme recipe. I'm indexing Blog, BlogPost and Article content types. Seems like the issue is related with the FullText custom field. |
Trying to reproduce! A colleague of mine, also complained about it, i believe he was using the same theme, I'm on agency. Should be a quick fix. The FullText Field is anyway not required by Elastic, it has it's own way to support FullText Search. |
Right now I'm trying to test this but the ElasticSearch docker container takes half of my PC ram (around 7 gb). So debugging this is not the best experience. |
Ok I confirm that this works if I'm removing the FullText config on the Article Content Type and only allow this Content Type to be indexed in the index.
Please elaborate. Documentation link? |
src/OrchardCore.Modules/OrchardCore.Search.Elastic/OrchardCore.Search.Elastic.csproj
Outdated
Show resolved
Hide resolved
//else | ||
//{ | ||
// elasticDocument.Set(entry.Name, null); | ||
//} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that we can use an Exists query to reach the same goal if we remove these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me first do a RCA and figure out the best solution!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that if we don't index a value for a Field or Part then it won't find any document when we do a wildcard query.
{
"query": {
"wildcard": {
"Article.TitlePart": {
"value": "*"
}
}
}
}
So ElasticSearch documentation recommends using a default null value for these.
https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html
case DocumentIndex.Types.Text: | ||
if (entry.Value != null && !String.IsNullOrEmpty(Convert.ToString(entry.Value))) | ||
{ | ||
elasticDocument.Set(entry.Name, Convert.ToString(entry.Value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to also use the DocumentIndexOptions to define if the field is Analyzed or Stored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All text fields are analyzed by default in Elastic Search!
(https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html)
Stored is really an overhead and only required if we are processing thousands of large documents.
By default Elastic will store the entire document into a field called _source and retrieves them when asked them from Index itself.
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
I just kept both of these for later implementations as noted in my first comment with PR. (Advanced Scenarios) ;-)
I think, we should
- first solid the basic functionalities, merge with Dev (More people can provide feedback)
- Define an architecture leading to abstraction for (Indexing, Querying, Search based components) [This is mandatory so that developers can swap search providers without needed to code)
- Build advanced features per provider on the go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I looked at these documentation pages and they strongly suggest not adding a Stored custom field because it is already in the _source field. This makes sense. Now, we need to see if we shouldn't do the same with the Lucene implementation so that it be consistent then. Though, from the documentation they say that they allow still to do it so if it's an option we might want to keep it still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Skrypt , We must store with with Lucene as there is no way, to build a document when retrieved from Lucene.
Unfortunately, I had to work with Lucene, Solr and Elastic and Azure Search last 10 years at some point of time!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Skrypt for some reason, while debugging, i found that while added fields to Lucene everything was turned to "String" in OrcharcdCore.Lucene. I'm not sure if that changed, (I remember debugging OrchardCore.Lucene around 4 months back).
I might be wrong though! ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you see this? Because If I'm looking at LuceneIndexManager we use Int32Field, DoubleField and StringField.
DateField, DateTimeField, TimeField are all indexed as String though.
But if you mean the doc.Add(new StringField(entry.Name, "NULL", store));
then it is fine because this is the default NULL
value to actually return all results from a wildcard query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Skrypt , I see your point!
But if you mean the doc.Add(new StringField(entry.Name, "NULL", store)); then it is fine because this is the default NULL value to actually return all results from a wildcard query.
Makes sense!
@jtkech |
Update from the Github on 11-04-2021
Now it is used to know which kind of index we are indexing (LuceneContentIndexSettings, ElasticContentIndexSettings) and also allows to have separate index settings per Indexing provider.
I used the following as body: {
"indexName":"search",
"parameters": "{'term':'explore','from':0,'size':2}",
"query": "{
'from': {{from}},
'size':{{size}},
'query':{
'bool': {
'must': [
{
'match': {
'Content.ContentItem.FullText': '{{ term }}'
}
},
{
'term': {
'Content.ContentItem.ContentType': 'BlogPost'
}
}
]
}
}
}"
} got exception. ----but anyway, we should add |
Yes, the SearchAsync method needs to be tested more. To test Queries I suggest you use them with a saved Elastic Query in the admin. |
parameterizedQuery on line of ElasticQuerySource.cs has not been used |
after Install ElasticSearch or OpenSearch with Docker compose, it is ok now. |
Ok, but as I said in last Tuesday's meeting. This PR is not ready yet. The only thing that works for now is the Admin Elastic Queries, everything else needs to be adjusted. I'm truncating some parts of this PR in others See : #10515 We need to fix the core abstractions first. That's where I'm at for now. After that, I'll get back to those issues. As for the line 68 I'd prefer that it works like it is and that the underlying method returns an empty object like we do in Lucene so that we be consistent. But need to take a look at this one later on. I'm pretty sure there is another issue somewhere in the query code itself. |
Could you get back to this, @Sen-Gupta? |
@Piedone This is more like a POC than a complete functional Pull Request. So, it works on the backend but everything we use on the frontend (method helpers) needs to be refactored correctly. This means also that we need to refactor how we implemented the search form module and move code around by creating an abstraction over it. So basically, nothing works right now. You can't use this PR to make queries where needed: the frontend ... So, I believe this will only be ready when it's fully tested. So, the first step would be to agree on how we should migrate old recipes to not break them. Else, we introduce this feature only for 2.0 with documentation about the breaking change and don't implement any backward compatibility. |
@Sen-Gupta This is where I'm at : #10515 |
@@ -0,0 +1,12 @@ | |||
@model OrchardCore.Search.Elastic.Settings.ElasticContentIndexSettingsViewModel | |||
|
|||
<h4>Elastic Search</h4> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also is the visual distinction.
Moved to Orchard Core repository owned branch skrypt/elasticsearch |
Fixes #4316
How to use:
Install ElasticSearch or OpenSearch with Docker compose
OpenSearch Docker Compose file :
opensearch.txt
ElasticSearch Docker Compose file :
elasticsearch.txt
docker-compose -f opensearch.yml -p opensearch up
to deploy OpenSearch containers.docker-compose -f elasticsearch.yml -p elasticsearch up
to deploy ElasticSearch containers.Advice : don't remove these files from their folder if you want to remove all their containers at once later on in Docker desktop.
You should get this result in Docker Desktop app :
Set up ElasticSearch or OpenSearch in Orchard Core
Implementation details
Analyzed and Stored Properties are not very meaningful in context of ElasticSearch.
Analyzed
Analyzed is default for strings in Elastic Search.
By default all string fields are stored twice in elastic as "analyzed" and stored as "text" field type of elastic and again stored as is as in a "keyword" field type of elastic.
So we will have a field called ContentItemId(Text) analyzed and another called ContentItemId.Keyword(as is as) to match on exact values using TermQuery for fields like ContentItemId or emails (Elastic Stores text fields in 2 fields analyzed vs not analyzed, a field ContentItemId.Keyword is created automatically)
ElasticSearch documentation:
https://www.elastic.co/blog/strings-are-dead-long-live-strings
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
Stored
Stored is really an overhead and only required if we are processing thousands of large documents.
By default Elastic will store the entire document into a field called _source and retrieves them when asked them from Index itself.
ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.12/search-fields.html
DSL Query Syntax
It is suggested to always use MatchQuery instead TermQuery for text fields in Elastic, where fully confident use (.Keyword) fields for exact match with TermQuery. (e.g. matching id, or fields like email, phone number, hostname)
ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
TODO