Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fallback synthetic source for copy_to and doc_values: false cases #112294

Merged
merged 13 commits into from
Sep 10, 2024

Conversation

lkts
Copy link
Contributor

@lkts lkts commented Aug 28, 2024

Closes #109546.
Closes #110038.
Closes #110753.

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.16.0 labels Aug 28, 2024
@lkts lkts marked this pull request as draft August 28, 2024 16:28
@lkts lkts force-pushed the synthetic_source_copy_to branch from 97a1ea5 to bda4de6 Compare September 5, 2024 22:37
@lkts lkts changed the title Support copy_to with synthetic source using fallback mechanism Use fallback synthetic source for copy_to and doc_values: false cases Sep 5, 2024
@lkts lkts added >feature :StorageEngine/Mapping The storage related side of mappings >enhancement and removed >feature labels Sep 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've created a changelog YAML for you.

@lkts lkts marked this pull request as ready for review September 5, 2024 22:51
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Sep 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

}
)
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a catch-all test to exercise copy_to in all subclasses?

Copy link
Contributor Author

@lkts lkts Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tricky because copy_to only works with text-like data. F.e. for histogram it does not work even though technically it is supported.

@kkrik-es
Copy link
Contributor

kkrik-es commented Sep 6, 2024

Oops, sorry for the merge conflicts.. I think if you just keep your version it should be good..

@kkrik-es kkrik-es closed this Sep 6, 2024
@kkrik-es kkrik-es reopened this Sep 6, 2024
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

mode: synthetic
properties:
foo:
type: match_only_text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if here we add store: true...are we going to store the value twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but it is not necessarily the same value (e.g. normalizers).

id: "1"
refresh: true
body:
foo: "Apache Lucene powers Elasticsearch"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also test with an array of values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add them in a follow-up.

}

var kwd = TextFieldMapper.SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
if (kwd != null) {
return kwd.syntheticFieldLoader(leafName());
return new SyntheticSourceSupport.Native(kwd.syntheticFieldLoader(leafName()));
Copy link
Contributor

@salvatore-campagna salvatore-campagna Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can consider naming the two synthetic source modes WithDocValues and WithStoredFields instead of Native and Fallback? Like:

  • SyntheticSourceSupport.WithDocValues
  • SyntheticSourceSupport.WithStoredFields

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not really the distinction i am trying to make. Yes, _ignored_source is a stored field that holds the value used for synthetic source but it is also possible that the field itself uses a different stored field for synthetic source. That becomes pretty blurry.

The difference between Native and Fallback is who controls synthetic source logic. In Native it's the mapper, in Fallback it's on a layer above and the mapper does not control what happens.

mode: synthetic
properties:
annotated_text:
type: annotated_text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before...if we add store: true do we store twice?

_source:
mode: synthetic
properties:
number:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test copy_to in combination with ignore_above and ignore_malformed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test it with ignore_malformed for unsigned_long f.e.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore_above is a good point, i'll follow up.

*/
protected enum SyntheticSourceMode {
NATIVE,
FALLBACK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asked this above already: should we name these something like WITH_DOC_VALUES and WITH_STORED_FIELDS?

@@ -35,7 +35,8 @@ public Set<NodeFeature> getFeatures() {
ObjectMapper.SUBOBJECTS_AUTO,
KeywordFieldMapper.KEYWORD_NORMALIZER_SYNTHETIC_SOURCE,
SourceFieldMapper.SYNTHETIC_SOURCE_STORED_FIELDS_ADVANCE_FIX,
Mapper.SYNTHETIC_SOURCE_KEEP_FEATURE
Mapper.SYNTHETIC_SOURCE_KEEP_FEATURE,
SourceFieldMapper.SYNTHETIC_SOURCE_RESTRICTIONS_REMOVED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SYNTHETIC_SOURCE_WITH_COPY_TO_AND_DOC_VALUES_FALSE_SUPPORT? I Know it is longer but much more descriptive.

}

private Supplier<Map<String, Object>> scaledFloatMapping() {
return () -> {
var scalingFactor = ESTestCase.randomFrom(10, 1000, 100000, 100.5);
return Map.of("scaling_factor", scalingFactor, "store", ESTestCase.randomBoolean(), "index", ESTestCase.randomBoolean());
return Map.of(
"scaling_factor",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation

@salvatore-campagna
Copy link
Contributor

73 files is quite a lot anyway

@salvatore-campagna
Copy link
Contributor

Don't forget the PR description

@elasticsearchmachine
Copy link
Collaborator

Hi @lkts, I've updated the changelog YAML for you.

@lkts
Copy link
Contributor Author

lkts commented Sep 10, 2024

@elasticmachine update branch

@lkts lkts merged commit 082e721 into elastic:main Sep 10, 2024
15 checks passed
@lkts lkts deleted the synthetic_source_copy_to branch September 10, 2024 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants