SearchAnalyzer is not set during field mapping. #8499

Closed
alkampfergit opened this issue Apr 17, 2025 · 5 comments
Labels
8.x Relates to a 8.x client version Category: Bug

Comments

@alkampfergit

alkampfergit commented Apr 17, 2025

Elastic.Clients.Elasticsearch version: 8.17.4

Elasticsearch version: tried on both: 8.13.0 and 8.18.0

.NET runtime version: .NET 8

Operating system version: Windows 11

Description of the problem including expected versus actual behavior:

I'm moving from NEST for Elasticsearch 7 to the new client. I'm mapping a field with this code:

        mapping.Properties["securityTokens"] = new TextProperty()
        {
            Analyzer = "not_analyzed_lowercase",
            SearchAnalyzer = "not_analyzed_lowercase",
        };

But the SearchAnalyzer setting seems to be missing. I have a unit test that reads the mapping back from the index to verify that everything is correct, and there SearchAnalyzer is null.

Expected behavior
SearchAnalyzer should be set correctly on index mapping. I've verified using the _mapping endpoint that the mapping is incorrect.


@alkampfergit alkampfergit added 8.x Relates to a 8.x client version Category: Bug labels Apr 17, 2025
@flobernd
Member

Hi @alkampfergit,

this is a weird one. Could you please post the JSON request that is made by the client?

You can inspect the response in the debugger and check the ApiCallDetails for that purpose.

@alkampfergit
Author

The mapping is created with a call to this function. (This is a unit test whose aim is to check our compatibility with the driver. We currently use NEST for Elasticsearch 2 and NEST for version 7, and we are adding version 8. Yes, we have customers on all three versions and must still support everything back to Elasticsearch 2 :) )

 await _elasticClient.Indices.CreateAsync

This is the full dump of the call. As you can see securityTokens has both analyzer and search analyzer.

Valid Elasticsearch response built from a successful (200) low level call on PUT: /test0b54a5cb9c232e2b95b5bf48784efe4121d5e64d-catalog-indexer_16?pretty=true

# Audit trail of this API call:
 - [1] HealthyResponse: Node: http://localhost:9800/ Took: 00:00:00.3328589
# Request:
{
  "mappings": {
    "dynamic_templates": [
      {
        "StringProperties": {
          "match": "s_*",
          "mapping": {
            "analyzer": "omnisearch_string_props",
            "fields": {
              "na": {
                "analyzer": "not_analyzed_lowercase",
                "type": "text"
              },
              "raw": {
                "type": "keyword"
              },
              "nan": {
                "normalizer": "lowercase",
                "type": "keyword"
              }
            },
            "type": "text"
          }
        }
      },
      {
        "NumericProperties": {
          "match": "n_*",
          "mapping": {
            "type": "double"
          }
        }
      },
      {
        "DateProperties": {
          "match": "d_*",
          "mapping": {
            "type": "date"
          }
        }
      },
      {
        "dense_vector_1536": {
          "match": "v1536_*",
          "mapping": {
            "dims": 1536,
            "element_type": "float",
            "index": true,
            "similarity": "dot_product",
            "type": "dense_vector"
          }
        }
      },
      {
        "dense_vector_3072": {
          "match": "v3072_*",
          "mapping": {
            "dims": 3072,
            "element_type": "float",
            "index": true,
            "similarity": "dot_product",
            "type": "dense_vector"
          }
        }
      }
    ],
    "properties": {
      "title": {
        "normalizer": "lowercase",
        "type": "keyword"
      },
      "type": {
        "type": "keyword"
      },
      "checkpointToken": {
        "type": "long"
      },
      "secondaryUpdateToken": {
        "type": "long"
      },
      "lastUpdated": {
        "type": "date"
      },
      "deleted": {
        "store": false,
        "type": "boolean"
      },
      "unsercured": {
        "type": "boolean"
      },
      "offline": {
        "type": "boolean"
      },
      "index": {
        "type": "keyword"
      },
      "ngrammed": {
        "analyzer": "trigram_standard",
        "type": "text"
      },
      "payload": {
        "index": false,
        "type": "keyword"
      },
      "securityTokens": {
        "analyzer": "not_analyzed_lowercase",
        "search_analyzer": "not_analyzed_lowercase",
        "type": "text"
      },
      "mainSearch": {
        "analyzer": "omnisearch_mainsearch",
        "fields": {
          "edge_n_gram": {
            "analyzer": "edge_ngram_standard_analyzer",
            "norms": false,
            "search_analyzer": "omnisearch_mainsearch",
            "type": "text"
          },
          "raw": {
            "normalizer": "lowercase",
            "type": "keyword"
          },
          "na": {
            "analyzer": "not_analyzed_lowercase",
            "type": "text"
          },
          "std": {
            "analyzer": "standard",
            "type": "text"
          }
        },
        "type": "text"
      },
      "mainsearch_it": {
        "analyzer": "italian",
        "type": "text"
      },
      "mainsearch_en": {
        "analyzer": "english",
        "type": "text"
      },
      "mainsearch_de": {
        "analyzer": "german",
        "type": "text"
      },
      "mainsearch_ru": {
        "analyzer": "russian",
        "type": "text"
      },
      "fulltext_it": {
        "analyzer": "italian",
        "type": "text"
      },
      "fulltext_en": {
        "analyzer": "english",
        "type": "text"
      },
      "fulltext_de": {
        "analyzer": "german",
        "type": "text"
      },
      "fulltext_ru": {
        "analyzer": "russian",
        "type": "text"
      },
      "nested": {
        "properties": {
          "name": {
            "analyzer": "not_analyzed_lowercase",
            "type": "text"
          },
          "depth": {
            "type": "integer"
          },
          "path": {
            "analyzer": "omnisearch_path_analyzer",
            "fields": {
              "na": {
                "analyzer": "not_analyzed_lowercase",
                "type": "text"
              }
            },
            "type": "text"
          },
          "svalue": {
            "fields": {
              "na": {
                "analyzer": "not_analyzed_lowercase",
                "type": "text"
              }
            },
            "type": "keyword"
          },
          "nvalue": {
            "type": "double"
          },
          "dvalue": {
            "type": "date"
          }
        },
        "type": "nested"
      },
      "internalData": {
        "index": false,
        "store": true,
        "type": "text"
      },
      "relatedIds": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_analyzer": {
          "type": "standard"
        },
        "omnisearch_path_analyzer": {
          "filter": "lowercase_filter",
          "tokenizer": "jarvis_path_tokenizer",
          "type": "custom"
        },
        "not_analyzed_lowercase": {
          "filter": [
            "lowercase_filter",
            "asciifolding"
          ],
          "tokenizer": "keyword_tokenizer",
          "type": "custom"
        },
        "omnisearch_mainsearch": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "tokenizer": "standard",
          "type": "custom"
        },
        "omnisearch_string_props": {
          "filter": [
            "lowercase",
            "asciifolding"
          ],
          "tokenizer": "standard",
          "type": "custom"
        },
        "edge_ngram_standard_analyzer": {
          "filter": [
            "lowercase_filter",
            "asciifolding",
            "edge_ngram_filter_standard"
          ],
          "tokenizer": "standard",
          "type": "custom"
        },
        "trigram_standard": {
          "filter": "lowercase_filter",
          "tokenizer": "trigram_tokenizer",
          "type": "custom"
        },
        "omni_property_indexTime": {
          "filter": "lowercase",
          "tokenizer": "standard",
          "type": "custom"
        }
      },
      "filter": {
        "lowercase_filter": {
          "type": "lowercase"
        },
        "edge_ngram_filter_standard": {
          "max_gram": 15,
          "min_gram": 2,
          "type": "edge_ngram"
        },
        "trim_zero_chars": {
          "max": 100,
          "min": 1,
          "type": "length"
        }
      },
      "tokenizer": {
        "jarvis_path_tokenizer": {
          "delimiter": "/",
          "type": "path_hierarchy"
        },
        "keyword_tokenizer": {
          "type": "keyword"
        },
        "edge_ngram_tokenizer": {
          "max_gram": 10,
          "min_gram": 3,
          "type": "edge_ngram"
        },
        "trigram_tokenizer": {
          "max_gram": 3,
          "min_gram": 3,
          "type": "ngram"
        },
        "non_ascii_and_space_split_lowercase_tokenizer": {
          "flags": "CASE_INSENSITIVE|MULTILINE",
          "group": -1,
          "pattern": "(?\u003C=[^\\p{ASCII}]|\\s)",
          "type": "pattern"
        }
      }
    },
    "number_of_replicas": 1,
    "number_of_shards": 1
  }
}
# Response:
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test0b54a5cb9c232e2b95b5bf48784efe4121d5e64d-catalog-indexer_16"
}

Then I issued the usual mapping request:

http://localhost:9800/test0b54a5cb9c232e2b95b5bf48784efe4121d5e64d-catalog-indexer_16/_mapping

I got this response

{
  "test0b54a5cb9c232e2b95b5bf48784efe4121d5e64d-catalog-indexer_16": {
    "mappings": {
      "dynamic_templates": [
        {
          "StringProperties": {
            "match": "s_*",
            "mapping": {
              "analyzer": "omnisearch_string_props",
              "fields": {
                "na": {
                  "analyzer": "not_analyzed_lowercase",
                  "type": "text"
                },
                "raw": {
                  "type": "keyword"
                },
                "nan": {
                  "normalizer": "lowercase",
                  "type": "keyword"
                }
              },
              "type": "text"
            }
          }
        },
        {
          "NumericProperties": {
            "match": "n_*",
            "mapping": {
              "type": "double"
            }
          }
        },
        {
          "DateProperties": {
            "match": "d_*",
            "mapping": {
              "type": "date"
            }
          }
        },
        {
          "dense_vector_1536": {
            "match": "v1536_*",
            "mapping": {
              "dims": 1536,
              "element_type": "float",
              "index": true,
              "similarity": "dot_product",
              "type": "dense_vector"
            }
          }
        },
        {
          "dense_vector_3072": {
            "match": "v3072_*",
            "mapping": {
              "dims": 3072,
              "element_type": "float",
              "index": true,
              "similarity": "dot_product",
              "type": "dense_vector"
            }
          }
        }
      ],
      "properties": {
        "checkpointToken": {
          "type": "long"
        },
        "deleted": {
          "type": "boolean"
        },
        "fulltext_de": {
          "type": "text",
          "analyzer": "german"
        },
        "fulltext_en": {
          "type": "text",
          "analyzer": "english"
        },
        "fulltext_it": {
          "type": "text",
          "analyzer": "italian"
        },
        "fulltext_ru": {
          "type": "text",
          "analyzer": "russian"
        },
        "index": {
          "type": "keyword"
        },
        "internalData": {
          "type": "text",
          "index": false,
          "store": true
        },
        "lastUpdated": {
          "type": "date"
        },
        "mainSearch": {
          "type": "text",
          "fields": {
            "edge_n_gram": {
              "type": "text",
              "norms": false,
              "analyzer": "edge_ngram_standard_analyzer",
              "search_analyzer": "omnisearch_mainsearch"
            },
            "na": {
              "type": "text",
              "analyzer": "not_analyzed_lowercase"
            },
            "raw": {
              "type": "keyword",
              "normalizer": "lowercase"
            },
            "std": {
              "type": "text",
              "analyzer": "standard"
            }
          },
          "analyzer": "omnisearch_mainsearch"
        },
        "mainsearch_de": {
          "type": "text",
          "analyzer": "german"
        },
        "mainsearch_en": {
          "type": "text",
          "analyzer": "english"
        },
        "mainsearch_it": {
          "type": "text",
          "analyzer": "italian"
        },
        "mainsearch_ru": {
          "type": "text",
          "analyzer": "russian"
        },
        "nested": {
          "type": "nested",
          "properties": {
            "depth": {
              "type": "integer"
            },
            "dvalue": {
              "type": "date"
            },
            "name": {
              "type": "text",
              "analyzer": "not_analyzed_lowercase"
            },
            "nvalue": {
              "type": "double"
            },
            "path": {
              "type": "text",
              "fields": {
                "na": {
                  "type": "text",
                  "analyzer": "not_analyzed_lowercase"
                }
              },
              "analyzer": "omnisearch_path_analyzer"
            },
            "svalue": {
              "type": "keyword",
              "fields": {
                "na": {
                  "type": "text",
                  "analyzer": "not_analyzed_lowercase"
                }
              }
            }
          }
        },
        "ngrammed": {
          "type": "text",
          "analyzer": "trigram_standard"
        },
        "offline": {
          "type": "boolean"
        },
        "payload": {
          "type": "keyword",
          "index": false
        },
        "relatedIds": {
          "type": "keyword"
        },
        "secondaryUpdateToken": {
          "type": "long"
        },
        "securityTokens": {
          "type": "text",
          "analyzer": "not_analyzed_lowercase"
        },
        "title": {
          "type": "keyword",
          "normalizer": "lowercase"
        },
        "type": {
          "type": "keyword"
        },
        "unsercured": {
          "type": "boolean"
        }
      }
    }
  }
}

If I have time, I'll try to reproduce this in a simple single-file project.

@flobernd
Member

Hi @alkampfergit , thanks for providing the JSON request/response payloads.

The request produced by Indices.CreateAsync correctly serializes the search_analyzer field, which means that this is not a client error.

Just to triple check, could you please execute the exact same request using curl or in the Kibana Dev Console? I strongly expect this to produce the same result.

Unfortunately, I don't know why the server does not seem to save the search_analyzer setting. To clarify this, you might want to contact support or ask in our discuss forums.

@alkampfergit
Author

alkampfergit commented Apr 24, 2025

Hi @flobernd, sorry for the late response, I was ill. I tried with Postman and got the very same result. I'll move to the forum.

I also solved the issue: it seems that Elasticsearch changed behaviour between the versions before 8 and version 8. If you examine the mapping, the test sets a search analyzer that is THE SAME as the analyzer. Since this is the default behaviour, it seems that Elasticsearch 8 does not set the value explicitly when the two are the same. Setting a different analyzer only for SearchAnalyzer works correctly.

I've changed the test to cover this situation, because the old test made little sense. This is a suite of more than 1000 unit tests that runs every query we send to Elasticsearch, as well as every kind of mapping we can create dynamically in code. That specific test checks the ability to explicitly set the search analyzer, but it used the same value as the analyzer.
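
For anyone whose tests read the mapping back in the same way, one option is to compare the effective search analyzer instead of the raw search_analyzer field. The following is only a minimal sketch under a few assumptions: it parses the _mapping JSON with System.Text.Json rather than the Elasticsearch client types, the helper name EffectiveSearchAnalyzer is hypothetical (it is not part of Elastic.Clients.Elasticsearch), and the fallback to analyzer relies on search_analyzer defaulting to the analyzer setting when it is absent.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical helper, not part of any Elastic library: returns the analyzer
// that applies at search time for one field definition from a _mapping response.
static string? EffectiveSearchAnalyzer(JsonElement fieldMapping)
{
    // An explicit search_analyzer wins when the server reports one.
    if (fieldMapping.TryGetProperty("search_analyzer", out var searchAnalyzer))
        return searchAnalyzer.GetString();

    // Otherwise fall back to the index-time analyzer (assumed default behaviour).
    if (fieldMapping.TryGetProperty("analyzer", out var indexAnalyzer))
        return indexAnalyzer.GetString();

    // Neither is set on the field; the caller decides what that means.
    return null;
}

// Trimmed copy of the securityTokens entry from the _mapping response in this thread.
const string mappingJson = """
    { "securityTokens": { "type": "text", "analyzer": "not_analyzed_lowercase" } }
    """;

using var mappingDoc = JsonDocument.Parse(mappingJson);
var securityTokens = mappingDoc.RootElement.GetProperty("securityTokens");

// Prints "not_analyzed_lowercase" although search_analyzer is missing from the response.
Console.WriteLine(EffectiveSearchAnalyzer(securityTokens));
```

With a helper like this, an assertion on the search analyzer should hold whether or not the server echoes a value that is identical to the analyzer. It does not replace a test that sets a different search analyzer, which is still the only way to check that an explicit value round-trips.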

I'm closing this bug because it is not a bug after all.

@flobernd
Member

Hi @alkampfergit , thanks for the update 🙂
