Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset to data stream #24

Merged
merged 6 commits into from
Sep 30, 2020

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Aug 13, 2020

This PR updates the spec to rename dataset references in packages to data_stream.

Concretely, the following elements of packages are changed:

  • In the package's root manifest.yml file, the datasets property must now be called data_streams instead.
  • In the package's root folder, the dataset folder must now be called data_stream instead.

Resolves #18.

@ycombinator ycombinator requested review from ruflin and mtojek August 13, 2020 17:13
@elasticmachine
Copy link

elasticmachine commented Aug 13, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #24 updated]

  • Start Time: 2020-09-28T22:41:29.517+0000

  • Duration: 2 min 9 sec

versions/1/spec.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One additional conversation we can have here is about our recommendation on the names of these data stream template directories. So far what we used is the second part of the dataset name that combined with the package it results in the dataset part. This applies in most cases but not all of them, that is why we allow to overwrite it.

I personally still like the naming we have as it makes browsing easy but I thought worth bringing up. In the end I don't expect this to change the spec, only our recommendation on naming.

versions/1/spec.yml Outdated Show resolved Hide resolved
versions/1/data_stream/manifest.spec.yml Outdated Show resolved Hide resolved
@@ -1,10 +1,10 @@
spec:
- description: Folder containing a single dataset definition
- description: Folder containing a single data stream definition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stumbled over this as it is not fully current. This a template for all data streams created for a specific type + dataset combination. All logs-nginx.access-* will use this template, independent of the namespace.

In my head, these are data stream templates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I'm lost here. @ruflin, could you please elaborate more about this? data streams, templates, namespaces

Copy link
Contributor Author

@ycombinator ycombinator Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm lost too. I think it might be useful to add a definitions.md or just a README.md file in the versions/1 folder defining all these terms and concepts and how they relate to each other. The spec (so far) describes the syntactic relationships but I also have to admit that I don't fully understand all the semantic relationships between these concepts.

@ruflin can you please make a new PR to add such a file? I think you are the only(?) one who understands all these terms (old and new), the corresponding concepts, and the relationships between them.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtojek @ycombinator Before we get this change in, I would like to answer all the questions around dataset / data_stream to make sure we don't introduce breaking changes to the spec before we are 100% confident these are the right changes. I'll set up a call.

@ycombinator ycombinator force-pushed the dataset-to-data_stream branch from c48677d to 72503d2 Compare September 15, 2020 13:49
@ycombinator
Copy link
Contributor Author

@ruflin @mtojek I'm resurrecting this PR after a while. I've rebased it on master so it has all the other changes now. Could you please re-review when you get a chance? Thanks!

@ycombinator ycombinator requested a review from ruflin September 15, 2020 13:50
@mtojek
Copy link
Contributor

mtojek commented Sep 15, 2020

Ok, so once you push this change to master, we'll block introducing next changes to this repository until the package-registry and all integrations are adjusted?

@ycombinator
Copy link
Contributor Author

Ok, so once you push this change to master, we'll block introducing next changes to this repository until the package-registry and all integrations are adjusted?

Yes. But to make the transition quick, I would say we should try to get the PRs ready in package-registry and integrations repos along side this one so we can merge all three at around the same time. We can also merge this PR into master and update elastic-package, but then make a PR to integrations updating elastic-package and making the necessary adjustments to the packages.

@mtojek
Copy link
Contributor

mtojek commented Sep 15, 2020

I think we can safely push this to master. Then:

  1. Update dependency on the elastic-package.
  2. Adjust package-registry (not sure if this is safe to push changes, @ruflin ?)
  3. Adjust integrations.

Points 1. and 2. can go in parallel.

I'm not sure about the package-storage, but I bet this one should go as the last one?

@mtojek
Copy link
Contributor

mtojek commented Sep 15, 2020

Actually the same question goes to Kibana. Are we safe to introduce such renaming or is it better to schedule it? @ruflin

@ruflin
Copy link
Contributor

ruflin commented Sep 16, 2020

We should schedule this change together with Kibana. For my two recent breaking changes to the registry @jfsiii has worked some magic that both version work which is also a best case scenario here. If we can do that, it means Kibana merges + Backports this first, then we get it in and roll it out to all of the parts mentioned above. Can you sync up with @jfsiii on this?

If it becomes complicated on the Kibana side to have it backward compatible, we do a coordinated merge which might break some things for 24h which depend on snapshots.

@ycombinator
Copy link
Contributor Author

I'm unclear as to why this PR needs to be scheduled with a Kibana PR. This PR just changes the spec. It does not change any packages. So I'm not understanding the impact of this PR on Kibana. What am I missing?

@ruflin
Copy link
Contributor

ruflin commented Sep 18, 2020

@ycombinator If we get this PR in without the changes ready on the Kibana side, it means if we have other changes to the package-spec and want to roll it out, we can't as it will break things. So just merging it will not have a direct impact on Kibana but it will hold development back on other things in the spec.

@jfsiii
Copy link

jfsiii commented Sep 18, 2020

@ruflin I'm not totally clear on the how the spec changes map to Registry & Kibana code/type changes but am happy to sync up with @mtojek. @jen-huang did a lot of dataset.* to data_stream.* conversion in elastic/kibana@042254f

We do reference datasets in our RegsitryPackage type which mirrors the Registry's Package.Datasets

@ruflin
Copy link
Contributor

ruflin commented Sep 21, 2020

@jfsiii @ycombinator Could you sync up to push this forward? Please pull me in if needed. Please directly also discuss #19 to make sure everyone is aware of all the changes coming.

@mtojek
Copy link
Contributor

mtojek commented Sep 28, 2020

@ycombinator I'm not sure why the CI didn't report problems with test directories:

https://github.com/ycombinator/package-spec/tree/dataset-to-data_stream/code/go/internal/validator/test/packages/bad_deploy_variants
https://github.com/ycombinator/package-spec/tree/dataset-to-data_stream/code/go/internal/validator/test/packages/good

There are still "dataset" folders and it failed for me in the local environment.

EDIT:

You can also rebase the PR.

@mtojek mtojek mentioned this pull request Sep 28, 2020
@ycombinator ycombinator force-pushed the dataset-to-data_stream branch from 72503d2 to f706b65 Compare September 28, 2020 12:49
jen-huang added a commit to jen-huang/kibana that referenced this pull request Sep 28, 2020
jen-huang added a commit to jen-huang/kibana that referenced this pull request Sep 28, 2020
jen-huang added a commit to jen-huang/kibana that referenced this pull request Sep 28, 2020
@ycombinator
Copy link
Contributor Author

@mtojek Looks like CI just failed as expected: https://beats-ci.elastic.co/blue/organizations/jenkins/Beats%2Fpackage-spec/detail/PR-24/6/pipeline. Yay!

Fixing...

@ycombinator ycombinator merged commit 734e3cc into elastic:master Sep 30, 2020
@ycombinator ycombinator deleted the dataset-to-data_stream branch September 30, 2020 11:05
jen-huang added a commit to elastic/kibana that referenced this pull request Oct 1, 2020
…fig_templates`->`policy_templates` renaming (#78699)

* Match elastic/package-spec#24 `datasets`->`data_streams` property renaming

* Match elastic/package-spec#24 `datasets.name`->`data_streams.dataset` property renaming

* Match elastic/package-spec#24 `/dataset`->`/data_stream` directory renaming

* Match elastic/package-spec#50 `config_templates`->`policy_templates` property renaming

* Update API integration test fixtures (test packages)

* Temporarily skip API integration tests

* Temporarily skip more API integration tests

* Pin to custom docker image, unskip test suites, clean up broken icon paths in test package manifests

* Skip the only (yay!) failing test suite

* Revert "Skip the only (yay!) failing test suite"

This reverts commit 3db32e2.

* Re-skip tests and revert docker image

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
jen-huang added a commit to jen-huang/kibana that referenced this pull request Oct 1, 2020
…fig_templates`->`policy_templates` renaming (elastic#78699)

* Match elastic/package-spec#24 `datasets`->`data_streams` property renaming

* Match elastic/package-spec#24 `datasets.name`->`data_streams.dataset` property renaming

* Match elastic/package-spec#24 `/dataset`->`/data_stream` directory renaming

* Match elastic/package-spec#50 `config_templates`->`policy_templates` property renaming

* Update API integration test fixtures (test packages)

* Temporarily skip API integration tests

* Temporarily skip more API integration tests

* Pin to custom docker image, unskip test suites, clean up broken icon paths in test package manifests

* Skip the only (yay!) failing test suite

* Revert "Skip the only (yay!) failing test suite"

This reverts commit 3db32e2.

* Re-skip tests and revert docker image

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
jen-huang added a commit to elastic/kibana that referenced this pull request Oct 1, 2020
…fig_templates`->`policy_templates` renaming (#78699) (#79183)

* Match elastic/package-spec#24 `datasets`->`data_streams` property renaming

* Match elastic/package-spec#24 `datasets.name`->`data_streams.dataset` property renaming

* Match elastic/package-spec#24 `/dataset`->`/data_stream` directory renaming

* Match elastic/package-spec#50 `config_templates`->`policy_templates` property renaming

* Update API integration test fixtures (test packages)

* Temporarily skip API integration tests

* Temporarily skip more API integration tests

* Pin to custom docker image, unskip test suites, clean up broken icon paths in test package manifests

* Skip the only (yay!) failing test suite

* Revert "Skip the only (yay!) failing test suite"

This reverts commit 3db32e2.

* Re-skip tests and revert docker image

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Discuss] Impact of renaming dataset to data_stream
6 participants