Make similarities dynamically updatable where possible #6727

clintongormley · 2014-07-04T10:32:22Z

The core similarities can be swapped in dynamically on an existing index, as long as discount_overlaps is the same. Currently we disallow updating similarities, because custom similarities may not be compatible.

The logic for deciding whether a similarity can be changed should be more fine-grained.


ghost · 2015-06-17T06:16:22Z

This would be a really nice addition. For example, we now have millions of documents and would like to experiment with the scoring to deliver that last 10%, but we need to reindex the whole lot each time we want to change the similarity. This means we will probably end up with as many clusters as there are scoring algorithms available, which costs time, money, effort and motivation.

ghost · 2015-06-17T06:28:36Z

Maybe you should add an interface to the Similarity implementations with a method like:

/**
 * Returns the type names that are compatible metadata wise with this Similarity.
 **/
String[] getMetadataCompatibleTypeNames();

This would allow logic to be added to determine whether the change should be allowed, solving the custom similarity compatibility problem (custom implementations would obviously need to implement it too). The downside is that you'd need a wrapper implementation for the Lucene-provided similarities, and it would break existing custom ones, though a default wrapper that declares compatibility with none would allow everything to work as it does now.
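A minimal sketch of the check this proposal describes, in Java. `MetadataCompatible`, `NoCompatibilityWrapper` and `SimilarityUpdateCheck.canSwitch` are hypothetical names made up for illustration; nothing like them exists in Lucene or Elasticsearch.

```java
import java.util.Arrays;

// Hypothetical interface from the proposal; not part of Lucene or Elasticsearch.
interface MetadataCompatible {
    /** Returns the type names that are compatible metadata wise with this Similarity. */
    String[] getMetadataCompatibleTypeNames();
}

// Hypothetical default wrapper for similarities that declare nothing:
// it reports no compatible types, so every change stays rejected, as today.
final class NoCompatibilityWrapper implements MetadataCompatible {
    @Override
    public String[] getMetadataCompatibleTypeNames() {
        return new String[0];
    }
}

final class SimilarityUpdateCheck {
    private SimilarityUpdateCheck() {}

    /** Allows a change only if the current similarity lists the new type name as compatible. */
    static boolean canSwitch(MetadataCompatible current, String newTypeName) {
        return Arrays.asList(current.getMetadataCompatibleTypeNames()).contains(newTypeName);
    }
}
```

The check here is one-directional (the current similarity must list the new type); requiring both sides to list each other would be a stricter alternative.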

I'm not familiar enough with the Lucene classes to know whether there is already a built-in way to infer that. Why isn't this solved at the Lucene level, by the way?

rmuir · 2015-06-17T07:08:15Z

Because it's not an issue with Lucene. You just call IndexWriterConfig.setSimilarity(), and that is what IndexWriter will use to encode normalization factors.

Similarity implementations can shove whatever they want in there, up to 64 bits of data encoded in whatever form they want. So ES does the right thing by preventing you from changing this (in general). It is the same as changing the index analyzer for a field: it's generally just an unsafe thing to do.

But the core similarities introduced in Lucene 4.0 have a special property: by default they all encode the index-time information (the normalization factor) in the same backwards-compatible way DefaultSimilarity historically did, as 1/sqrt(length) with a certain single-byte encoding.
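To make the "certain single-byte encoding" concrete, here is a re-implementation modeled on Lucene 4.x's `SmallFloat.floatToByte315`/`byte315ToFloat`, which `DefaultSimilarity` and `BM25Similarity` used to store `boost / sqrt(length)`. Treat it as an illustrative sketch rather than a copy guaranteed to match every Lucene release; the class name `NormEncodingSketch` and the helper `encodeNorm` are hypothetical.

```java
// Illustrative re-implementation modeled on Lucene 4.x SmallFloat (zeroExp = 15).
final class NormEncodingSketch {
    private NormEncodingSketch() {}

    // Exponent offset; the 15 corresponds to Lucene's zeroExp parameter.
    private static final int FZERO = (63 - 15) << 3;

    /** Lossy float-to-byte encoding: truncates toward zero, underflow maps to 1, overflow to -1 (0xFF). */
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= FZERO) {
            // negative numbers and zero map to 0; tiny positive values map to the smallest non-zero byte
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= FZERO + 0x100) {
            return -1; // overflow maps to the largest encodable value
        }
        return (byte) (smallfloat - FZERO);
    }

    /** Decodes a byte produced by floatToByte315 back to a (rounded-down) float. */
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    /** Hypothetical helper: index-time norm boost / sqrt(fieldLength), squeezed into one byte. */
    static byte encodeNorm(float boost, int fieldLength) {
        return floatToByte315(boost / (float) Math.sqrt(fieldLength));
    }
}
```

Because so few bits of precision survive, fields of length 3 and 4 end up with the same byte here, and the decoded value for length 2 is 0.625 rather than the exact 1/sqrt(2).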

This was done intentionally to make experimentation and "simple" testing of these ranking algorithms easier. It should not be enforced with any interface or anything like that, because subclasses and even setter methods can easily break it. It is just a way to quickly experiment with different ranking algorithms without reindexing.

I think it's nice to expose this optimization (safely) to ES users too, so they can experiment in the same way. But it does not need any additional APIs for experts or custom implementations; that is misleading and dangerous.

If you are really trying to get the last 10%, then I don't think this issue is relevant: it's just not going to hold for "tuning". If you are really tuning, you will likely break this property yourself anyway: the default encoding used here is very general purpose and must support a crazy range of documents, large and small, and various index-time boost values. If those assumptions don't hold for your data, you can often improve normalization by adjusting the encoding.

ghost · 2015-06-17T07:37:11Z

Hi, thanks for the response!

I meant that I'm trying to provide a better search experience for end users by tuning the relevance ranking, which, for me as an ES user, is the last 10%. (It does pretty well with the defaults, but there are cases where simple per-field/query boosting does not deliver, hence the need to experiment with similarities.)

I'd love to have this exposed to ES users too if possible, though I understand that I'm asking for permission to (possibly) shoot myself in the foot :)

The API idea was just a proposal to formalize the currently unofficial contract about which similarities are interchangeable, but as I said, I didn't know whether it made sense. (Well, now I know that it does not.)

rmuir · 2015-06-17T08:05:48Z

I don't think we should give users the ability to shoot themselves in the foot, ever. It's easily prevented.

A common use case for this issue would be switching from the default similarity to BM25 and then tweaking the k1 and b parameter values, all without reindexing. This is totally safe, and expert enough!
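As a sketch of why k1 and b are safe to change: in the classic BM25 formula they are only combined with per-document statistics at search time. The Java below assumes the document length has already been decoded from the stored norm; `Bm25Sketch` is a hypothetical name, and this is not Lucene's exact scoring code (newer Lucene versions, for example, drop the `(k1 + 1)` factor).

```java
// Sketch of the classic BM25 term weight; not Lucene's exact implementation.
final class Bm25Sketch {
    private Bm25Sketch() {}

    /**
     * idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLength / avgDocLength)).
     * tf, docLength and avgDocLength come from the index (docLength assumed to be
     * decoded from the stored norm); k1 and b are plain search-time inputs, so
     * changing them does not touch anything on disk.
     */
    static double score(double idf, double tf, double docLength, double avgDocLength,
                        double k1, double b) {
        double lengthNorm = k1 * (1 - b + b * docLength / avgDocLength);
        return idf * tf * (k1 + 1) / (tf + lengthNorm);
    }
}
```

With `b = 0` the document length drops out entirely, and raising `k1` lets repeated terms keep adding to the score for longer before saturating.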

Having a custom similarity (subclass) is a much more expert thing and we don't need to make things complicated for that. If you already know enough to make your own similarity class, then you already have an expert way to tune without reindexing: you can tune your parameters by changing some constant in your code and ES is none the wiser.

s1monw · 2015-06-17T09:40:02Z

"I don't think we should give users the ability to shoot themselves in the foot, ever. It's easily prevented."

👍

I agree with Rob here and don't really see a need to do much on this issue.

ghost · 2015-06-17T10:43:57Z

Did I understand correctly that you want to close this as won't fix?

s1monw · 2015-06-17T10:59:25Z

@villeapvirtanen yeah that is what I propose

rmuir · 2015-06-17T11:03:04Z

I think the simple case is nice to have for the core similarities from Lucene (see my BM25 example above). But I have no idea how tricky it is to implement.

rjernst · 2015-06-17T17:21:15Z

The similarity parameters are set outside of the mappings (in a parallel section called "similarity"). But glancing at the code, I cannot see how they could be updated (or how new ones could be added after index creation). I agree this should be fixed: as with mappings, you should be able to tweak the parameters of a similarity, but not change its type, for a given name.

rmuir · 2015-06-17T17:25:35Z

"like with mappings, you should be able to tweak the parameters of the similarity, but not change the type, for a given name."

It's not like mappings at all, though.

Changing DefaultSimilarity to BM25Similarity is ok: the on-disk encoding is the same.
Changing BM25Similarity k1/b parameters is ok: the on-disk encoding is the same.
Changing BM25Similarity.discountOverlaps is not ok, you need to reindex.
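The three rules above, together with the discount_overlaps condition from the original issue, could be expressed as a small compatibility check. This is a minimal sketch with hypothetical names (`SimilarityConfig`, `SimilarityUpdateRules`); it assumes Elasticsearch-style type names (`classic` for DefaultSimilarity, `BM25`) and deliberately treats every other type, including custom similarities, as requiring a reindex.

```java
import java.util.Set;

// Hypothetical config holder; not an Elasticsearch class.
record SimilarityConfig(String type, double k1, double b, boolean discountOverlaps) {}

final class SimilarityUpdateRules {
    private SimilarityUpdateRules() {}

    // Assumption: core similarities sharing the default 1/sqrt(length) single-byte norm.
    private static final Set<String> SHARED_NORM_TYPES = Set.of("classic", "BM25");

    /** True if switching from current to proposed leaves the on-disk norms valid. */
    static boolean isSafeWithoutReindex(SimilarityConfig current, SimilarityConfig proposed) {
        if (current.discountOverlaps() != proposed.discountOverlaps()) {
            return false; // changes the lengths baked into the norms, so reindex
        }
        // Type switches within the set and k1/b tweaks only affect search-time scoring.
        return SHARED_NORM_TYPES.contains(current.type())
                && SHARED_NORM_TYPES.contains(proposed.type());
    }
}
```

Only `classic` and `BM25` are listed because those are the two discussed here; per the earlier comment, the other core Lucene 4.0 similarities share the same default encoding and could plausibly be added.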

jpountz · 2018-03-13T18:11:50Z

cc @elastic/es-search-aggs

robinp · 2019-09-17T13:01:17Z

Hello - is this still on the plate? Changing b and k1 on the fly for BM25 would be really nice. It seems a waste to reindex if no actual on-disk data would change.

missinglink · 2020-01-10T14:10:19Z

I found my way here for the same reason: I would like to experiment with tweaking k1 for BM25, but it currently requires a full reindex.

robinp · 2020-01-10T14:17:15Z

To ease the pain: I found that if you define a custom similarity, you can later change its parameters (after closing the index, via the update index settings API). So it is safest to add a custom similarity with the stock parameters before indexing.

Once indexed, you can easily change the parameters.
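The workaround above, written out as the sequence of REST calls it implies. This sketch only builds the requests and does not send them; the index name `my_index`, the similarity name `my_bm25` and the field `title` are hypothetical, the typeless mapping assumes Elasticsearch 7.x or later, and the Java needs version 16+ for records and text blocks.

```java
import java.util.List;

// Hypothetical holder for one REST call; nothing is sent over the network.
record RestCall(String method, String path, String body) {}

final class TunableBm25Workflow {
    private TunableBm25Workflow() {}

    /** Before indexing: declare a named BM25 similarity with stock parameters and use it on a field. */
    static RestCall createIndex() {
        return new RestCall("PUT", "/my_index", """
            {
              "settings": {
                "index": {
                  "similarity": {
                    "my_bm25": { "type": "BM25", "k1": 1.2, "b": 0.75 }
                  }
                }
              },
              "mappings": {
                "properties": {
                  "title": { "type": "text", "similarity": "my_bm25" }
                }
              }
            }
            """);
    }

    /** Later: close the index, change k1/b on the named similarity, reopen. No reindex. */
    static List<RestCall> retune(double k1, double b) {
        String settings = String.format(
            "{\"index\":{\"similarity\":{\"my_bm25\":{\"type\":\"BM25\",\"k1\":%s,\"b\":%s}}}}", k1, b);
        return List.of(
            new RestCall("POST", "/my_index/_close", ""),
            new RestCall("PUT", "/my_index/_settings", settings),
            new RestCall("POST", "/my_index/_open", ""));
    }
}
```

Declaring the named similarity up front with the stock values (k1 1.2, b 0.75) means the index initially scores as default BM25 would, while leaving a name whose parameters can be changed later.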

missinglink · 2020-01-10T14:59:05Z

Excellent thank you! Sounds like that solves my issue.

I've added some examples of how to achieve this in this PR.

javanna · 2024-05-31T12:10:05Z

We have no plans on implementing this for the time being. Closing.

clintongormley added the enhancement label Jul 4, 2014

clintongormley assigned rmuir and dakrone and unassigned rmuir Jul 4, 2014

clintongormley mentioned this issue Jul 22, 2014

Merging similarities #4403

Closed

clintongormley added the :Search Foundations/Mapping label Jun 18, 2015

clintongormley added the help wanted adoptme label Nov 21, 2015

This was referenced Jun 20, 2016

One similarity per index #18971

Closed

Add the ability to dynamically update similarity options #19046

Closed

jimczi mentioned this issue Sep 6, 2016

Similarity should accept dynamic settings when possible #20339

Closed

clintongormley assigned jimczi and unassigned dakrone Nov 25, 2016

jpountz added the high hanging fruit label Mar 13, 2018

jimczi removed their assignment Aug 21, 2018

missinglink mentioned this issue Jan 10, 2020

add peliasDefaultSimilarity pelias/schema#430

Merged

rjernst added the Team:Search label May 4, 2020

javanna closed this as not planned May 31, 2024

javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024
