_transform API does not support missing_bucket #55102

Closed
gatling822 opened this issue Apr 12, 2020 · 6 comments · Fixed by #59591

@gatling822

Elasticsearch/Kibana Version 7.5.2 running in on-prem ECE

While working with the _transform API, I noticed that documents were only getting grouped if all fields in the group_by section had a value. From reading the composite aggregation documentation, I see that this can be fixed by setting missing_bucket to true. The terms aggregation also has a missing parameter that can be set, but that does not appear to be supported either (#48243).

Composite Aggregation

Any idea how we can get around this?

POST _transform/_preview
{
  "source": {
    "index": [
      "visanet-transaction*"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest"
  },
  "pivot": {
    "group_by": {
      "transactionTime": {
        "date_histogram": {
          "field": "transactionTime",
          "fixed_interval": "1h"
        }
      },
      "vipID": {
        "terms": {
          "field": "vipID",
          "missing_bucket": true
        }
      }
    },
    "aggregations": {
      "transactionID.value_count": {
        "value_count": {
          "field": "transactionID"
        }
      },
      "usdAmount.sum": {
        "sum": {
          "field": "usdAmount"
        }
      }
    }
  }
}

Error:

{
  "error": {
    "root_cause": [
      {
        "type": "x_content_parse_exception",
        "reason": "[1:20] [data_frame_terms_group] unknown field [missing_bucket], parser not found"
      }
    ],
    "type": "x_content_parse_exception",
    "reason": "[1:167] [data_frame_transform_config] failed to parse field [pivot]",
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[1:167] [data_frame_transform_pivot] failed to parse field [group_by]",
      "caused_by": {
        "type": "x_content_parse_exception",
        "reason": "[1:20] [data_frame_terms_group] unknown field [missing_bucket], parser not found"
      }
    }
  },
  "status": 400
}
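
For comparison, the composite aggregation behavior referred to above can be sketched as a plain search request. This is a minimal sketch, assuming the same index pattern and field names as the preview request; the aggregation names `groups`, `transaction_count`, and `usd_sum` are made up for illustration. Sent as the body of `GET visanet-transaction*/_search`, it should return buckets whose `vipID` key is `null` for documents that have no value in that field:

```json
{
  "size": 0,
  "aggs": {
    "groups": {
      "composite": {
        "sources": [
          {
            "transactionTime": {
              "date_histogram": {
                "field": "transactionTime",
                "fixed_interval": "1h"
              }
            }
          },
          {
            "vipID": {
              "terms": {
                "field": "vipID",
                "missing_bucket": true
              }
            }
          }
        ]
      },
      "aggs": {
        "transaction_count": {
          "value_count": { "field": "transactionID" }
        },
        "usd_sum": {
          "sum": { "field": "usdAmount" }
        }
      }
    }
  }
}
```

Without `missing_bucket`, the composite aggregation skips documents that lack a value for any source, which matches the grouping behavior described in this report.
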
@nik9000 added the :ml/Transform Transform label Apr 13, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

@hendrikmuhs

@gatling822 Thanks for your input. I am afraid there is no workaround. As you already stated, it's a missing feature in transform, as already described in #48243.

Is there anything to add beyond what is already in #48243? If not, I will close this as a duplicate.

@gatling822
Author

Is there a reason we can't use missing_bucket, which seems to be supported under Composite aggregations and does a similar task?

Could a workaround be to set a null_value in the index mapping? Do you know how this would affect indexing or storage?

@hendrikmuhs

> Is there a reason we can't use missing_bucket, which seems to be supported under Composite aggregations and does a similar task?

The reason is as simple as "not implemented". Transform is still quite new, and other things are missing, too.

> Could a workaround be to set a null_value in the index mapping? Do you know how this would affect indexing or storage?

Good idea. This sounds like a good workaround. However, it requires that you can change your data or how new data is ingested; for existing data, it requires a reindex. As long as you choose a reasonably sized value, I do not think you will notice any performance or sizing problems.
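
The null_value workaround could look roughly like the mapping below. This is a sketch under assumptions not stated in this thread: that `vipID` is a `keyword` field, and that the placeholder string `NO_VIP` (a made-up value) never occurs as a real ID. It would be sent as the request body when creating the new index that the existing data is reindexed into:

```json
{
  "mappings": {
    "properties": {
      "vipID": {
        "type": "keyword",
        "null_value": "NO_VIP"
      }
    }
  }
}
```

One caveat worth checking: `null_value` replaces only explicit `null` values. Documents in which the field is absent altogether are not affected by the mapping, so in that case the placeholder would have to be written at ingest time instead.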

@gatling822
Author

Thanks for the help, @hendrikmuhs.
One last question: is there a published roadmap for the _transform API or for Elastic in general?

@hendrikmuhs

There is no published roadmap, nor a private one. Why? Because that's not how we work. Elasticsearch does time-based releases, so a new feature is not strictly tied to a release number.

This does not mean that we have no plan; we do aim to release a new feature in a certain version. However, the release train does not stop: if we miss it, we take the next one. We do not delay a release for a new feature, because there is always another one that seems important. We delay for quality, and in the worst case we even revert a feature if it does not meet the quality bar. This is one of the reasons why we do not publish release dates up front. However, if you work with the stack for some time, you will notice that there is always something new within 2-3 months.

Long story short: we appreciate feedback, especially on rather new features like transform. Up-voting helps us prioritize.

The best way to follow the development is on GitHub. If you want to follow certain features, we have labels like :ml/Transform; with such a label as a search filter, you get a better overview of what is being worked on. The same label is useful for pull requests: to see what is coming, you can switch to the closed PRs and apply the feature label and a version label, e.g. 7.8.0.

If you miss a certain feature, great, please open an issue, but please look for an existing one first. Feel free to add to the existing one; if it turns out that it's not a duplicate, we will spawn a new issue from it. Again, the simplest form of feedback is an up-vote using the built-in GitHub voting; this is welcome, too.

Unsure, unclear, request for help? This is better placed on discuss.
