_transform API does not support missing_bucket #55102

gatling822 · 2020-04-12T17:12:57Z

Elasticsearch/Kibana version 7.5.2, running on an on-prem ECE (Elastic Cloud Enterprise) deployment.

While working with the _transform API, I noticed that documents were only grouped if all fields in the group_by section had a value. After some reading in the composite aggregation documentation, I see that this can be fixed by setting missing_bucket to true. The terms aggregation also has a missing parameter, but that appears not to be supported either (#48243).

See the composite aggregation documentation.
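
For comparison, the same grouping can be run directly as a composite aggregation in a search request, where `missing_bucket` is accepted. This is a minimal sketch, assuming `vipID` is an aggregatable field (e.g. `keyword`); the aggregation names `by_hour_and_vip` and `usdAmount_sum` are arbitrary. Documents without a `vipID` value should then land in buckets whose `vipID` key is `null` instead of being dropped:

```
# Composite aggregation over the same index pattern; "size": 0 skips the hits
GET visanet-transaction*/_search
{
  "size": 0,
  "aggs": {
    "by_hour_and_vip": {
      "composite": {
        "size": 100,
        "sources": [
          { "transactionTime": { "date_histogram": { "field": "transactionTime", "fixed_interval": "1h" } } },
          { "vipID": { "terms": { "field": "vipID", "missing_bucket": true } } }
        ]
      },
      "aggs": {
        "usdAmount_sum": { "sum": { "field": "usdAmount" } }
      }
    }
  }
}
```

This only helps for ad-hoc searches (paged through `after_key`); it does not change what the transform config parser accepts.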

Any idea how we can get around this?

```
POST _transform/_preview
{
  "source": {
    "index": [
      "visanet-transaction*"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest"
  },
  "pivot": {
    "group_by": {
      "transactionTime": {
        "date_histogram": {
          "field": "transactionTime",
          "fixed_interval": "1h"
        }
      },
      "vipID": {
        "terms": {
          "field": "vipID",
          "missing_bucket": true
        }
      }
    },
    "aggregations": {
      "transactionID.value_count": {
        "value_count": {
          "field": "transactionID"
        }
      },
      "usdAmount.sum": {
        "sum": {
          "field": "usdAmount"
        }
      }
    }
  }
}
```

Error:

```
{
  "error": {
    "root_cause": [
      {
        "type": "x_content_parse_exception",
        "reason": "[1:20] [data_frame_terms_group] unknown field [missing_bucket], parser not found"
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[1:167] [data_frame_transform_config] failed to parse field [pivot]",
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[1:167] [data_frame_transform_pivot] failed to parse field [group_by]",
      "caused_by": {
        "type": "x_content_parse_exception",
        "reason": "[1:20] [data_frame_terms_group] unknown field [missing_bucket], parser not found"
      }
    }
  },
  "status": 400
}
```


elasticmachine · 2020-04-13T17:09:01Z

Pinging @elastic/ml-core (:ml/Transform)

hendrikmuhs · 2020-04-14T15:08:13Z

@gatling822 Thanks for your input. I am afraid there is no workaround. As you already stated, it's a missing feature in transform, as already described in #48243.

Is there anything beyond what is already in #48243? If not, I will close this as a duplicate.

gatling822 · 2020-04-14T16:04:10Z

Is there a reason we can't use missing_bucket, which seems to be supported under Composite aggregations and does a similar task?

Could a workaround be to set a null_value in the index mapping? Do you know how this would affect indexing or storage?

hendrikmuhs · 2020-04-14T17:38:01Z

> Is there a reason we can't use missing_bucket, which seems to be supported under Composite aggregations and does a similar task?

The reason is simply that it is not implemented yet: transform is still quite new, and other things are missing, too.

> Could a workaround be to set a null_value in the index mapping? Do you know how this would affect indexing or storage?

Good idea. This sounds like a good workaround; however, it requires that you can change your data or are ingesting new data, and existing/old data requires a reindex. As long as you choose a reasonably sized value, I do not think you will notice any performance or sizing problems.
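
A sketch of that workaround, under stated assumptions: `vipID` is mapped as `keyword`, `visanet-tx-v2` is a hypothetical new index name (chosen so it does not match the `visanet-transaction*` source pattern), and `NULL` is an arbitrary placeholder. Keep in mind that `null_value` only replaces explicit `null` values at index time; documents that omit `vipID` entirely are not affected, which is why the reindex step below also fills in the placeholder with a Painless script:

```
# Hypothetical new index; copy the rest of the existing mapping here as well
PUT visanet-tx-v2
{
  "mappings": {
    "properties": {
      "vipID": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

# Copy existing data; the script sets the placeholder when vipID is absent or null
POST _reindex
{
  "source": { "index": "visanet-transaction*" },
  "dest": { "index": "visanet-tx-v2" },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.vipID == null) { ctx._source.vipID = 'NULL'; }"
  }
}
```

The transform's `source.index` would then point at the new index, and documents without a real `vipID` are grouped under the ordinary bucket key `NULL`.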

gatling822 · 2020-04-15T01:43:09Z

Thanks for the help, @hendrikmuhs.
One last question: is there a published roadmap for the _transform API, or for Elastic in general?

hendrikmuhs · 2020-04-15T07:30:59Z

There is no published roadmap, nor a private one. Why? Because that's not how we work. Elasticsearch does time-based releases, so a new feature is not strictly tied to a release number.

This does not mean that we have no plan; we actually strive to release a new feature in a certain version. However, the release train does not stop: if we miss it, we take the next one. We do not delay a release for a new feature, because there is always another one that seems important. We delay for quality, and in the worst case we even revert a feature if it does not meet the quality bar. This is one of the reasons why we do not publish release dates up front. However, if you work with the stack for some time, you will notice there is always something new within 2-3 months.

Long story short: we appreciate feedback, especially on rather new features like transform. Up-voting helps us prioritize.

The best way to follow development is on GitHub. To follow specific features, use labels like :ml/Transform; with such a label as a search filter, you get a better overview of what is being worked on. The same label is useful for pull requests: to see what is coming, switch to the closed PRs and apply the feature label plus a version label, e.g. 7.8.0.

If you are missing a certain feature, great: please open an issue, but look for an existing one first. Feel free to add to an existing one; if it turns out not to be a duplicate, we will spawn a new issue from it. Again, the simplest form of feedback is an up-vote using GitHub's built-in voting; this is welcome, too.

Unsure, unclear, or need help? That is better placed on the Discuss forum.

nik9000 added the :ml/Transform Transform label Apr 13, 2020

hendrikmuhs self-assigned this May 29, 2020

hendrikmuhs mentioned this issue Jul 15, 2020

[Transform] add support for missing bucket #59591

Merged

hendrikmuhs closed this as completed in #59591 Jul 29, 2020
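
With #59591 merged, the `terms` group source in a transform accepts `missing_bucket`, so on a release that includes this change (check the release notes for your version) the original configuration can be saved and started as a transform. A sketch, where the transform ID `vipid-hourly` and the destination index `visanet-vipid-hourly` are hypothetical:

```
# Create the transform (same group_by and aggregations as the preview request)
PUT _transform/vipid-hourly
{
  "source": { "index": ["visanet-transaction*"] },
  "dest": { "index": "visanet-vipid-hourly" },
  "pivot": {
    "group_by": {
      "transactionTime": { "date_histogram": { "field": "transactionTime", "fixed_interval": "1h" } },
      "vipID": { "terms": { "field": "vipID", "missing_bucket": true } }
    },
    "aggregations": {
      "transactionID.value_count": { "value_count": { "field": "transactionID" } },
      "usdAmount.sum": { "sum": { "field": "usdAmount" } }
    }
  }
}

# Start it; with no "sync" section this is a batch transform that runs once
POST _transform/vipid-hourly/_start
```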

hendrikmuhs pushed a commit that referenced this issue Jul 29, 2020

[Transform] add support for missing bucket (#59591)

004388f

add support for "missing_bucket" in group_by fixes #42941 fixes #55102

hendrikmuhs mentioned this issue Jul 29, 2020

[7.x][Transform] add support for missing bucket (#59591) #60390

Merged

hendrikmuhs pushed a commit that referenced this issue Jul 30, 2020

[7.x][Transform] add support for missing bucket (#59591) (#60390)

aaed6b5

add support for "missing_bucket" in group_by fixes #42941 fixes #55102 backport #59591

Mpdreamz mentioned this issue Nov 16, 2020

7.10.1 Meta Ticket elastic/elasticsearch-net#5096

Closed

stevejgordon mentioned this issue Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed
