Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert change (#90674) #103865

Merged
merged 8 commits into from
Jan 4, 2024

Conversation

benwtrent
Copy link
Member

Reverts #90674

The revert is not perfectly clean as there are some minor adjustments to account for later changes.

This is in contrast with: #103858

closes: #103011

@benwtrent benwtrent added >bug :Search Foundations/Mapping Index mappings, including merging and defining field types v8.12.1 v8.13.0 v8.11.4 labels Jan 3, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jan 3, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this and compared line by line with the PR that it reverts. It looks good to me. I think it is wise to revert at this point, and reconsider what we want to do next. You mentioned it did not revert cleanly: did the problems only come from the float handling in postProcessDynamicArrayMapping for dynamic mapping of vector fields?

Thanks for jumping on this!

@benwtrent
Copy link
Member Author

did the problems only come from the float handling in postProcessDynamicArrayMapping for dynamic mapping of vector fields?

Correct, the logic there changed from a separate bug fix.

@benwtrent benwtrent added auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport-and-merge Automatically create backport pull requests and merge when ready labels Jan 4, 2024
@benwtrent
Copy link
Member Author

@elasticmachine update branch

@benwtrent benwtrent removed the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 4, 2024
@benwtrent benwtrent added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jan 4, 2024
@benwtrent
Copy link
Member Author

@elasticmachine update branch

new HashSet<>(),
new LinkedHashMap<>(),
new HashMap<>(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ it makes sense to go back to the original hashmap, which we had before.

@benwtrent
Copy link
Member Author

run elasticsearch-ci/part-2

@benwtrent
Copy link
Member Author

run elasticsearch-ci/part-1
run elasticsearch-ci/part-3
run elasticsearch-ci/part-4
run elasticsearch-ci/bwc-snapshots
run elasticsearch-ci/packaging-tests-unix-sample
run elasticsearch-ci/packaging-tests-windows-sample

@felixbarny
Copy link
Member

FWIW, I'm also happy about this change as it will make #102936 and #96235 much easier as we don't need separate implementations for mapperSize in both Mapper and Mapper.Builder that need to be kept in sync.

Before we re-introduce this change (in case we want to do that at some point), we should really merge the two PRs mentioned above first.

@benwtrent
Copy link
Member Author

@felixbarny

Before we re-introduce this change (in case we want to do that at some point), we should really merge the two PRs mentioned above first.

By "re-introduce" you mean adding back "keep builders" change that this PR reverts?

@benwtrent
Copy link
Member Author

@elasticmachine update branch

@felixbarny
Copy link
Member

By "re-introduce" you mean adding back "keep builders" change that this PR reverts?

Yes, that's what I meant. Aka revert the revert.

So before we're considering to re-introduce to "Store dynamic mapping updates as builders", we should first merge #102936 and #96235. We can then weigh in if the additional complexity of maintaining a mapperSize implementation in both Mapper and Mapper.Builder is worth the optimization.

@elasticsearchmachine elasticsearchmachine merged commit a74ae22 into elastic:main Jan 4, 2024
15 checks passed
@benwtrent benwtrent deleted the feature/revert-90674 branch January 4, 2024 17:19
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 4, 2024
Reverts elastic#90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
elastic#103858

closes: elastic#103011
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.12
8.11

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Jan 4, 2024
Reverts elastic#90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
elastic#103858

closes: elastic#103011
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2024
Reverts #90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
#103858

closes: #103011
elasticsearchmachine pushed a commit that referenced this pull request Jan 4, 2024
Reverts #90674

The revert is not perfectly clean as there are some minor adjustments to
account for later changes.

This is in contrast with:
#103858

closes: #103011
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Jan 5, 2024
Because of elastic#103865,
DocumentParserContext#addDynamicMapper receives a Mapper, not a Mapper.Builder again.
Therefore, we don't need a mapperSize method for the builder.

This simplifies things a lot.
felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Jan 5, 2024
Because of elastic#103865,
DocumentParserContext#addDynamicMapper receives a Mapper, not a Mapper.Builder again.
Therefore, we don't need a mapperSize method for the builder.

This simplifies things a lot.
elasticsearchmachine pushed a commit that referenced this pull request Feb 6, 2024
Today, we're counting all mappers, including mappers for subfields that
aren't explicitly added to the mapping towards the field limit.

This means that some field types, such as `search_as_you_type` or
`percolator` count as more than one field even though that's not
apparent to users as they're just defining them as a single field in the
mapping.

This change makes it so that each field mapper only counts as one. We're
still counting multi-fields.

This makes it easier to understand for users why the field limit is hit.

~In addition to that, it also simplifies
#96235 as it makes the
implementation of `Mapper.Builder#getTotalFieldsCount` much easier and
easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk
of over- or under-estimating the field count of a `Mapper.Builder` in
`DocumentParserContext#addDynamicMapper`, which in turn reduces the risk
of data loss due to the issue described here:
#96235 (comment)

*Edit: due to #103865, we
don't need an implementation of `getTotalFieldsCount` or `mapperSize` in
`Mapper.Builder`. Still, this PR more closely aligns
`Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`,
which  `DocumentParserContext#addDynamicMapper` uses to determine
whether the field limit is hit*

A potential risk of this is that we're now effectively allowing more
fields in the mapping. It may be surprising to users that more fields
can be added to a mapping. Although, I'd not expect negative
consequences from that. Generally, I'd  expect users to be happy about
any change that reduces the risk of data loss.

We could also think about whether to apply the new counting logic only
to new indices (depending on the `IndexVersion`). However, that would
add more complexity and I'm not convinced about the value. We'd then
need to maintain two different ways of counting fields and also require
passing in the `IndexVersion` to `MappingLookup` which previously didn't
require the `IndexVersion`.

This PR is meant as a conversation starter. It would also simplify
#96235 but I don't think
this blocks that PR in any way.

I'm curious about the opinion of @javanna and @jpountz on this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport-and-merge Automatically create backport pull requests and merge when ready auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.11.4 v8.12.1 v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Significant ingest performance regression while handling parsing dynamically added objects
5 participants