fix(ingest/mongodb): Fix nondeterministic output when downsampling the collection schema #9612
We noticed during recurring ingestion of MongoDB that the fields in a dataset change even though the source data has not been modified.
The root cause is that the ingestion downsamples the collection schema based on the `max_schema_size` we set in the config. The collection fields are sorted by count, but no further ordering is applied when counts are equal, so which of the tied fields survive the cut can vary between runs. We should add `delimited_name` as a secondary key in the `sorted` call so the output is consistent.
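A minimal sketch of the idea, not the actual diff: it assumes each field is a dict with `count` and `delimited_name` keys, and `downsample_fields` is a hypothetical helper name used only for illustration. Sorting on the tuple `(-count, delimited_name)` keeps the highest counts first and breaks ties alphabetically, so the truncated result no longer depends on input order.

```python
from typing import Any, Dict, List


def downsample_fields(
    fields: List[Dict[str, Any]], max_schema_size: int
) -> List[Dict[str, Any]]:
    """Keep the max_schema_size most frequent fields, deterministically.

    Hypothetical helper: the field layout (dicts with "count" and
    "delimited_name") is an assumption made for this example.
    """
    # Primary key: count, descending. Secondary key: delimited_name,
    # ascending, so fields with equal counts always come out in the same order.
    ordered = sorted(fields, key=lambda f: (-f["count"], f["delimited_name"]))
    return ordered[:max_schema_size]


# Two fields tie on count; the input order differs between the two runs.
fields_a = [
    {"delimited_name": "b", "count": 5},
    {"delimited_name": "a", "count": 5},
    {"delimited_name": "c", "count": 9},
]
fields_b = list(reversed(fields_a))

result_a = [f["delimited_name"] for f in downsample_fields(fields_a, 2)]
result_b = [f["delimited_name"] for f in downsample_fields(fields_b, 2)]
```

With only `count` as the key, Python's stable sort would preserve whatever order the tied fields arrived in, so `result_a` and `result_b` could differ. With the secondary key they are identical.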