-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Package best practices doc #139
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So excited about this! Looks awesome 🤩
This looks good enough to me to merge as-is -- we could always revisit with updates as-needed.
The review comments include responses and ideas.
The GitHub emojis don't have nearly a wide enough range to express how excited I am about this! 🏆
Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
Shipping it! Thanks for the feedback @dbeatty10 - I've taken a very literal approach to defining version tags, and just said "it has to meet the regex". I think that's probably the best we can do if we have taken parts of two version tagging systems? If we have taken all of PEP-440 then this gets easier but I don't think we did. |
package-best-practices.md
Outdated
- Packages SHOULD NOT pin to a patch version of their imported package unless they are aware of an incompatibility. | ||
- Packages importing another package which has reached major version one or later SHOULD allow all subsequent minor and patch releases of that major version. | ||
- Packages importing another package which is major version zero MAY pin to the current minor version only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the last 3 bullet points can all be replaced with simple guidance:
- Packages SHOULD NOT pin to a patch version of their imported package unless they are aware of an incompatibility. | |
- Packages importing another package which has reached major version one or later SHOULD allow all subsequent minor and patch releases of that major version. | |
- Packages importing another package which is major version zero MAY pin to the current minor version only. | |
- The widest version range possible SHOULD be specified to give the dependency resolver the appropriate latitude. |
My guess is that after lots of discussion exploring many scenarios, ☝️ will still hold as an essential summary, a cataclysm sentence that "contains the most information in the fewest words".
The three bullet points make assumptions about the versioning scheme being used by the imported package which might be strict SemVer 2.0, but just as likely might be a variant or something else entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three bullet points make assumptions about the versioning scheme being used by the imported package which might be strict SemVer 2.0, but just as likely might be a variant or something else entirely.
Could we change that SHOULD
from Releases and Updates to a MUST
?
Those bullets are written in the same spirit as when you go to a fancy hotel and there's a No Elephants Allowed In the Ballroom sign - you know they didn't put it up until they had to.
The first one (no patches) is the most common and catastrophic failure state I see; I'm not wed to the others but they seemed like the obvious next question: "If I'm not allowed to pin to a patch, what do you recommend?"
What do you think of something like this?
Packages SHOULD specify the widest possible range of supported versions, to minimise issues in dependency resolution. In particular, packages SHOULD NOT pin to a patch version of their imported package unless they are aware of an incompatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Packages SHOULD specify the widest possible range of supported versions, to minimise issues in dependency resolution. In particular, packages SHOULD NOT pin to a patch version of their imported package unless they are aware of an incompatibility.
🎯 looks perfect
Could we change that
SHOULD
from Releases and Updates to aMUST
?
I'm guessing that we can't say "Packages MUST follow the guidance of Semantic Versioning 2.0.0" because we don't have any auditing mechanism -- there is no easy way to validate to what degree a package is following it or not. We can validate the version string with a regex, but we can't reasonably determine if they are following the SemVer guidance or not. They are on the honor system for this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair!
package-best-practices.md
Outdated
|
||
### Interoperability | ||
- Packages MUST NOT override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package. | ||
- Packages MUST use the cross-database macros built into dbt Core where available, such as `{{ datediff }}` and `{{ type_string() }}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MUST
or SHOULD
? Could a BigQuery-specific spend package BigQuery-isms? Or what if the current cross-database implementation doesn't cover some edge case that the package needs or wants to accommodate?
Optional ideas for re-wording slightly:
- Packages MUST use the cross-database macros built into dbt Core where available, such as `{{ datediff }}` and `{{ type_string() }}`. | |
- Packages SHOULD use the [cross-database macros](https://docs.getdbt.com/reference/dbt-jinja-functions/cross-database-macros) built into dbt Core where available, such as `{{ except() }}` and `{{ type_string() }}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went back and forth on this and maybe came down too hard 🤔. How about packages intended for use with multiple warehouses MUST...
I like the link to the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's an edge case that the macros don't cover, then I think the "where available" works as an escape hatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love your suggestion to introduce the nuanced language of RFC 6919 where it is useful! 🤩
News flash: I'm not an expert on the interpretation and application of RFC 2119 or 6919. The following is all informed by my nascent understanding and interpretations.
After reading the "Guidance in the use of these Imperatives" section of RFC 2119, it feels like SHOULD
is the applicable imperative here. The following stood out to me in particular:
they must not be used to try to impose a particular method on implementors where the method is not required for interoperability.
A package could ignore some of our "musts" and operate just fine. I didn't check, but I'm guessing there are examples currently in the Hub that would back this up.
But most (not all) of our current usage of MUST
seems like it should be SHOULD
instead. SHOULD
appears to be a really strong word in the spec! It shouldn't be confused at all with OPTIONAL
.
So I think we shouldn't feel like we are pandering or compromising on our expectations in any way when we use SHOULD
. We should feel like we're laying down something strong, confident, and very assertive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tragically, 6919 was published on 1 April 2013 and was never ratified. (See also https://datatracker.ietf.org/doc/html/rfc2324)
Happy to roll with SHOULD here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really excellent work -- looks fantastic 🤩
I left comments for a few nit-picky things. The "wide version ranges" comment deserves the most focus and consideration.
Co-authored-by: Doug Beatty <44704949+dbeatty10@users.noreply.github.com>
- Packages SHOULD specify the widest possible range of supported versions, to minimise issues in dependency resolution. | ||
- In particular, packages SHOULD NOT pin to a patch version of their imported package unless they are aware of an incompatibility. | ||
### Interoperability | ||
- Packages MUST NOT override dbt Core behaviour in such a way as to impact other dbt resources (models, tests, etc) not provided by the package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtcohen6 I wrote this thinking about the aftermath of that Fivetran package that overrode freshness collection and had unfortunate side effects.
But, if someone like this community member turned their gist into a package, that would be an acceptable overriding of behaviour yeah? Does this rule need a bit of wordsmithing or do we just drop it down to REALLY SHOULD NOT
per RFC 6919?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for REALLY SHOULD NOT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REALLY SHOULD NOT
doesn't actually exist 😢 (discussed above) - should've made clearer that this was a joke
Re-reading the SemVer 2.0.0 spec and specifically the FAQ, we should strongly consider adding the following:
If the Hub is intended to also host packages that should not be used by anyone in production, then I fully retract the suggestion above. |
Yeahhh... but.... we are nowhere near following that rule ourselves, nor are any of the other packages in the Hub. The rest of this document is is codifying existing behaviour to save us from abstractly saying "you don't meet our norms". This would be a change of policy which I don't really want to spend time enforcing at the moment. And FWIW, we already say that people SHOULD follow SemVer, so I don't think we need to call out one FAQ specifically. I'd be happy to suggest to people who are submitting future packages that they might want to tag them 1.0.0 instead of 0.1.0 though |
I'm attracted to encoding this for a couple reasons:
I'm very aware of our own failure to follow that one 🤒 Probably even more reason for us to turn the tide in ways that we can. |
What about things like dbt-project-evaluator, dbt-audit-helper or dbt-codegen? They're primarily for interactive use and I wouldn't necessarily encourage people to use in an unmonitored production environment, especially not codegen. I am particularly uncomfortable with the editorialising of "if it's not ready for prod it's not ready for the Hub". Can we meet in the middle and put an addendum on the "should follow SemVer" along the lines of "Note in particular the recommendation for production-ready packages to be version 1.0.0 or above"? |
Production-ready packages
This would be movement in the right direction. Would it end up looking something like this? ### Releases and updates
- Packages' git tags MUST validate against the regex defined in [version.py](/hubcap/version.py).
- Packages SHOULD follow the guidance of [Semantic Versioning 2.0.0](https://semver.org/spec/v2.0.0.html).
- Note in particular the recommendation for production-ready packages to be version `1.0.0` or above Right-shifted SemVer
What constitutes being "prod"? 🤷 To the best that I can tell, some or all of those projects follow "right-shifted SemVer" -- just shift everything one separator to the left to recover the If that is the case, then there's "major" releases hiding in there along with an implied public API -- they could have just started out at Starting out using ZeroVer probably has measurable gravity that is hard to escape. Interestingly, the SemVer FAQ explains scenarios when the bump to 1.0.0 is later than it should be. It doesn't explain scenarios when it is too early though. |
yeah I think you’re probably right that it's mostly just right shifted semver. im having horrible flashbacks to high school calculus when we learned about integration and always had to remember the I feel good about this now - we're encouraging good behaviour for new packages, we can gently nudge the existing ecosystem to make better choices over time, but we're not marking the entire package hub as non-compliant which feels a bit rough. I also suspect that this is another place where we were following in dbt Core's footsteps: if it had started at v1, would all the packages be zero-based? I doubt it. |
- The documentation SHOULD indicate which data warehouses are expected to work with this package. | ||
|
||
### Customisability | ||
- Packages MUST NOT hard-code table references, and MUST use `ref` or `source` instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking commentary.
The current text definitely conveys the spirit of the guidance, but an overly lawyerly maintainer could hard-code a view reference and claim full compliance.
Two opportunities for further clarity:
- table → relation
- use applicable adapter methods rather than directly utilizing
describe
,show
, orinformation_schema
I'm fuzzy on the detailed guidance to accomplish the latter, so that's why I don't have an explicit suggestion that covers both opportunities.
Our glossary doesn't include an entry "relation" currently, it only has the following disparate entries:
- https://docs.getdbt.com/terms/table
- https://docs.getdbt.com/terms/view
- https://docs.getdbt.com/terms/cte
The closest documentation I could find quickly focuses on the definition and usage of the Relation Python object:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 This will add great value.
Approving with the belief and trust that we'll make edits in the future as-needed.
Based on the lessons learned from adding a bunch of packages to the Hub, here's a spec for would-be contributors to validate against before opening a hubcap PR.