Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change dataset.name to dataset.id and dataset.path #176

Merged
merged 5 commits into from
Mar 4, 2020

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Dec 5, 2019

The dataset name in ECS consists of package.subtype, for example nginx.access. To follow ECS, this is also set now accordingly in the API output of a package.

There are two fields now:

  • dataset.id: Unique identifier of a dataset, for example nginx.access. This can be configured in a dataset
  • dataset.path: This is the name of the directory, for example access. This is especially useful for EPM to find the right assets.

@ruflin ruflin self-assigned this Dec 5, 2019
@ruflin ruflin changed the title Adjust dataset name to be package.dir-name Change dataset.name to dataset.id and dataset.path Dec 17, 2019
@ruflin
Copy link
Member Author

ruflin commented Dec 17, 2019

@exekias @skh After some more thinking, I changed this PR quite a bit. There is now an id and a path field, but no name anymore.

I renamed name to id as this should be a unique identifier. And I introduced path as this is the directory to the dataset. An event block for a dataset looks like:

    {
      "id": "bar",
      "title": "Foo",
      "name": "foo",	
      "release": "",
      "type": "logs",
      "ingest_pipeline": "pipeline-entry"
      "path": "foo"
    }	

@skh The part I'm not sure about is the path, if it should be the full path like dataset/foo or even nginx-1.0.2/dataset/foo. What makes it easiest for EPM?

@ruflin ruflin marked this pull request as ready for review December 17, 2019 14:46
@ruflin
Copy link
Member Author

ruflin commented Dec 17, 2019

@skh @jfsiii I'm pretty sure this will break some things in EPM as we use the .Name probably. It also means we need to adjust the schema.

@exekias
Copy link

exekias commented Dec 17, 2019

Thanks for pinging, do you have an example of when id and path will be different?

@ruflin
Copy link
Member Author

ruflin commented Dec 17, 2019

@exekias The path (access) and id (nginx.access) will always be different. But I think the part you are getting at where it can not be generated out package name + path. An example is our state_node metricset: https://github.com/elastic/beats/blob/master/metricbeat/module/kubernetes/state_node/state_node.go#L111 The path here is state_node but the dataset is kubernetes.node. Similar it will be with the auditd dataset that is an audit package, but the dataset is just auditd.

@skh
Copy link
Contributor

skh commented Dec 17, 2019

If this is a breaking change, do we need to do it now?

As for the path, I assume we need the property because it will not always be dataset/$NAME any more? Then I would vote for dataset/foo.

@ruflin
Copy link
Member Author

ruflin commented Dec 17, 2019

@skh It is a breaking change and we don't need to merge it now. Just did it now as things "cleared up" in my mind. Will wait with merging.

Correct with your dataset/$NAME part above. Will change it then to dataset/foo as it is in the end where we read the files from, relative path to the package root.

@ruflin
Copy link
Member Author

ruflin commented Jan 15, 2020

@jfsiii I think the time has come for this PR. Could you help me on the side of EPM to get this in?

@ruflin ruflin requested a review from jfsiii January 15, 2020 07:59
util/package.go Outdated
}
// go-ucfg automatically calls the `Validate` method on the Dataset object here
err = manifest.Unpack(d)
if err != nil {
return errors.Wrapf(err, "error building dataset in package: %s", p.Name)
}

// if id is not set, {package}.{datasetName} is the default
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be datasetPath?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed

@ruflin
Copy link
Member Author

ruflin commented Feb 12, 2020

@jfsiii Rebased again on the most recent changes. Could you check what needs to happen on the KB side?

The dataset name in ECS consists of package.subtype, for example nginx.access. To follow ECS, this is also set now accordingly in the API output of a package.

There are two fields now:

* dataset.id: Unique identifier of a dataset, for example `nginx.access`. This can be configured in a dataset
* dataset.path: This is the name of the directory, for example `access`. This is especially useful for EPM to find the right assets.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants