Synthetic _source: support ignore_malformed #90007

Closed
felixbarny opened this issue Sep 12, 2022 · 13 comments

@felixbarny
Member

As part of LX, we want to make log indices more lenient. Therefore, we're planning on using ignore_malformed. See also #88777. While using synthetic source is not a requirement for LX, we should think about how we could eventually leverage synthetic source to save space.

One of the blockers is that synthetic source doesn't support ignore_malformed. Malformed values are only kept in the original _source, so if the source is synthetic, they are simply lost.

We could do something similar to #89466 and store malformed values in a hidden stored field. One challenge is that stored fields are strictly typed (IINM), but a malformed value could be of any type. We could simply always store the malformed value in a string field, in the exact form in which it was provided in the JSON document, for example "42" (note the quotes), true, or 42. When reconstructing the _source, we can take that value and append it to the JSON document as-is. Just an idea :)
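A minimal sketch of that idea, assuming stored fields can be modelled as an in-memory map from a hidden field name to strings. The class `IgnoredMalformedStore`, its methods and the `._ignore_malformed` suffix are all hypothetical and not Elasticsearch code. The point it illustrates is that the raw JSON token is kept as a string whatever its JSON type, and is pasted back verbatim when the field is rebuilt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: stored fields are modelled as an
// in-memory map from a hidden field name to the raw JSON text of each malformed value.
class IgnoredMalformedStore {
    private final Map<String, List<String>> storedFields = new LinkedHashMap<>();

    // Assumption: the hidden field is named after the original field plus a suffix;
    // the "._ignore_malformed" suffix is illustrative only.
    static String hiddenFieldName(String field) {
        return field + "._ignore_malformed";
    }

    // Index time: parsing failed, so keep the raw JSON token exactly as it appeared,
    // e.g. "\"42\"" (a JSON string), "true" or "42". It is always stored as a string.
    void storeMalformed(String field, String rawJsonToken) {
        storedFields.computeIfAbsent(hiddenFieldName(field), k -> new ArrayList<>()).add(rawJsonToken);
    }

    // Read time: well-formed values (already rendered as JSON, e.g. from doc values)
    // come first, then the stored malformed values are appended verbatim.
    String synthesizeField(String field, List<String> wellFormedJsonValues) {
        List<String> values = new ArrayList<>(wellFormedJsonValues);
        values.addAll(storedFields.getOrDefault(hiddenFieldName(field), List.of()));
        if (values.isEmpty()) {
            return ""; // field absent from this document
        }
        String json = values.size() == 1 ? values.get(0) : "[" + String.join(",", values) + "]";
        return "\"" + field + "\":" + json;
    }
}
```

For example, after `storeMalformed("count", "\"not a number\"")`, calling `synthesizeField("count", List.of("42"))` yields `"count":[42,"not a number"]`. Note that this sketch does not preserve the original relative order of well-formed and malformed values within an array.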

cc @nik9000

@felixbarny felixbarny added >feature :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics labels Sep 12, 2022
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Sep 12, 2022
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000 nik9000 mentioned this issue Sep 12, 2022
@nik9000
Member

nik9000 commented Sep 13, 2022

Began in #90038

@javanna
Member

javanna commented Apr 21, 2023

heya @nik9000 do you remember what is left to do here?

@felixbarny
Member Author

AFAICS, Nik added support for the ip field type but others are missing. I guess the first step would be to create a list of all field types and tick the one(s) off that are supported already and then work on implementing support for the rest.

We have recently made the change to use ignore_malformed by default for logs: #95329.

@felixbarny
Member Author

create a list of all field types and tick the one(s) off that are supported already and then work on implementing support for the rest.

Nvm, this already exists here: #86603

@nik9000
Member

nik9000 commented May 4, 2023

heya @nik9000 do you remember what is left to do here?

Wow I missed this comment. Sorry.

I left off trying to build ignore_malformed for geo_point, which is especially complex because it's a JSON object and not a single value. So we have to "go back" if we hit a malformed object. I think I had it working, but the implementation made me sad: #90777. I picked geo_point at the time because it was the simplest example of the "complex" fields. I think the way forward is clear for most of the other field types, just not the JSON object ones.
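To illustrate why object-valued fields need to "go back": a streaming parser consumes tokens once, so by the time a bad `lon` is seen, the start of the object is gone. A minimal sketch, assuming tokens are simplified strings (`{`, `}`, then alternating names and scalar values) rather than a real JSON parser, is to copy the whole object into a buffer first and parse from the buffer, keeping the buffered object if parsing fails. `GeoPointParseSketch`, `bufferObject` and `Result` are hypothetical names, and this is not the approach taken in #90777.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not Elasticsearch code. Tokens are plain strings standing in
// for a streaming JSON parser: "{", "}", and alternating field names and values.
class GeoPointParseSketch {
    // malformedRaw is null for a well-formed point.
    record Result(double lat, double lon, String malformedRaw) {
        boolean isMalformed() {
            return malformedRaw != null;
        }
    }

    // Step 1: copy exactly one object off the one-pass token stream so it can be re-read.
    static List<String> bufferObject(Iterator<String> tokens) {
        List<String> buffer = new ArrayList<>();
        int depth = 0;
        do {
            String token = tokens.next();
            buffer.add(token);
            if (token.equals("{")) {
                depth++;
            } else if (token.equals("}")) {
                depth--;
            }
        } while (depth > 0);
        return buffer;
    }

    // Step 2: parse from the buffer; on any failure, "go back" and keep the whole object.
    static Result parse(Iterator<String> tokens) {
        List<String> buffer = bufferObject(tokens);
        try {
            Double lat = null;
            Double lon = null;
            // Skip the leading "{" and trailing "}", then read name/value pairs.
            for (int i = 1; i < buffer.size() - 1; i += 2) {
                String name = buffer.get(i);
                double value = Double.parseDouble(buffer.get(i + 1)); // may throw NumberFormatException
                if (name.equals("lat")) {
                    lat = value;
                } else if (name.equals("lon")) {
                    lon = value;
                } else {
                    throw new IllegalArgumentException("unknown key [" + name + "]");
                }
            }
            if (lat == null || lon == null || Math.abs(lat) > 90 || Math.abs(lon) > 180) {
                throw new IllegalArgumentException("not a valid point");
            }
            return new Result(lat, lon, null);
        } catch (RuntimeException e) {
            // A real implementation would store valid JSON; joining tokens keeps this short.
            return new Result(Double.NaN, Double.NaN, String.join(" ", buffer));
        }
    }
}
```

The cost of this design is that every object is buffered, even the well-formed majority, which is part of what makes the object-valued field types harder than the scalar ones.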

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@lkts
Contributor

lkts commented Apr 30, 2024

Currently ignore_malformed is not supported with synthetic source for the following field types:

  • aggregate_metric_double
  • scaled_float
  • date, date_nanos
  • geo_point

geo_shape does not support synthetic source to begin with. We plan to build a generic synthetic source implementation that stores the values of fields that don't natively support synthetic source as-is. It should cover geo_shape.
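A rough sketch of what "store as-is" could mean, assuming the field's raw JSON is available at index time and that each field has a single value. `FallbackSyntheticSource`, its methods and the contents of `NATIVE_TYPES` are hypothetical; the sketch only shows the split between types rebuilt natively and types whose original JSON is kept verbatim.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a generic synthetic _source fallback, not Elasticsearch code.
class FallbackSyntheticSource {
    // Assumption: an illustrative set of types that can rebuild values from doc values.
    static final Set<String> NATIVE_TYPES = Set.of("keyword", "long", "ip");

    // Raw JSON of fields whose type lacks native synthetic source support.
    // One value per field keeps the sketch short.
    private final Map<String, String> storedAsIs = new HashMap<>();

    void index(String field, String type, String rawJson) {
        if (!NATIVE_TYPES.contains(type)) {
            storedAsIs.put(field, rawJson); // e.g. a geo_shape object, kept verbatim
        }
        // Natively supported types would be written to doc values here (omitted).
    }

    // Returns the verbatim value for fallback fields, or null when the field is
    // expected to be rebuilt natively instead.
    String loadAsIs(String field) {
        return storedAsIs.get(field);
    }
}
```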

@axw
Member

axw commented May 8, 2024

@lkts is that list meant to be exhaustive? It's also not supported by aggregate_metric_double, as I've just found out:

```java
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader() {
    if (ignoreMalformed) {
        throw new IllegalArgumentException(
            "field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it ignores malformed numbers"
        );
    }
    return new AggregateMetricSyntheticFieldLoader(name(), simpleName(), metrics);
}
```

axw added a commit to axw/elasticsearch that referenced this issue May 9, 2024
axw added a commit that referenced this issue May 9, 2024
* apm-data: ignore_{malformed,dynamic_beyond_limit}

Enable ignore_malformed on all non-metrics APM data streams,
and enable ignore_dynamic_beyond_limit for all APM data streams.

We can enable ignore_malformed on metrics data streams when
#90007 is fixed.

* Update docs/changelog/108444.yaml
@lkts
Contributor

lkts commented May 13, 2024

I missed that, thank you for letting me know.

@lkts
Contributor

lkts commented May 23, 2024

See #106483.

@martijnvg martijnvg added :StorageEngine/Mapping The storage related side of mappings and removed :StorageEngine/TSDB You know, for Metrics :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team labels May 31, 2024
@lkts
Contributor

lkts commented Jul 2, 2024

Closing as a duplicate of #106483.


7 participants