-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing performance #3034
Parsing performance #3034
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is huge. I took a first quick pass and left a few comments before signing out for the night.
from yaml import ( | ||
CLoader as Loader, | ||
CSafeLoader as SafeLoader, | ||
CDumper as Dumper | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's definitely document how folks can try installing the C version to take advantage of these speedups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will write up some documentation. That would go in docs.getdbt.com?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!
core/setup.py
Outdated
'logbook>=1.5,<1.6', | ||
'typing-extensions>=3.7.4,<3.8', | ||
'jsonschema', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Woah... are we done with
hologram
entirely?? - We'll need
dbt-mashumaro
here somewhere. I thinkdependency_links
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have removed the Hologram code in dbt.dataclass_schema and updated Hologram.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a FEAT! Really incredible work here @gshank. The numbers speak for themselves and I am so impressed at the performance perf gains we're making here.
I dropped a couple of questions inline, and some other flags for things I want to revisit with fresher eyes. Nice work so far!
@@ -454,182 +440,46 @@ class TestConfig(NodeConfig): | |||
severity: Severity = Severity('ERROR') | |||
|
|||
|
|||
SnapshotVariants = Union[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 👋 👋
eebcb32
to
6b8eeae
Compare
6b8eeae
to
ba8d1ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I don't have much feedback. I like the seperation of data validation and serialization! There are a few places where we lose typing, but I think we can trade that for readability. Let's figure out how we should organize dbt-mashumaro in order package this up when we get there. I'm requesting changes because I know some things are still in flux (?) so I want to know when I can take another pass of reading through this, some pieces of this are a black box to me still :).
@@ -13,3 +13,5 @@ mypy==0.782 | |||
wheel | |||
twine | |||
pytest-logbook>=1.2.0,<1.3 | |||
git+https://github.com/fishtown-analytics/dbt-mashumaro.git@dbt-customizations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will want to move this into setup.py before distrubuting obviously, I'm happy to follow up after the PR with how we should package this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We'll also have to do a new release of Hologram, which I added to dev_requirements.txt for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ This is the comment I'm still tracking. It's not possible to install from setup.py
on this branch. We need an answer here before we can put this out as a beta version, which I'd like to do as soon as possible. Could one of you open a follow-on issue?
Unless I'm mistaken, it also means we can't tell people to test this out (after merging) via:
pip install git+https://github.com/fishtown-analytics/dbt.git
76c17d5
to
bb8fe42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved on my end so as to not be a blocker to merge, but would definitely love to:
- Ensure that we're encoding timestamps in the ISO 8601 format (sounds like you're already on this one @gshank ?)
- Get to a good place w/ the mashurmaro fork and some other deps (i see that jsonschema is unpinned, for instance)
Given that we're early in the release cycle for Margaret Mead, I don't feel strongly about how much of this happens in this PR vs. a future one. Really excited about the perf gains here!
de91253
to
e9f9cb6
Compare
233065e
to
9448675
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Incredible work here @gshank. Thanks both @drewbanin and @kwigley for the detailed reviews.
The big thing on my mind is (unsurprisingly) distribution. I'd love to get this into beta-testers' hands as quickly as possible.
@@ -13,3 +13,5 @@ mypy==0.782 | |||
wheel | |||
twine | |||
pytest-logbook>=1.2.0,<1.3 | |||
git+https://github.com/fishtown-analytics/dbt-mashumaro.git@dbt-customizations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ This is the comment I'm still tracking. It's not possible to install from setup.py
on this branch. We need an answer here before we can put this out as a beta version, which I'd like to do as soon as possible. Could one of you open a follow-on issue?
Unless I'm mistaken, it also means we can't tell people to test this out (after merging) via:
pip install git+https://github.com/fishtown-analytics/dbt.git
mapped_fields in the classes for 'from_dict', removing deepcopy on fqn_search, separating validation from 'from_dict', and special handling for dbt internal not_null and unique tests. Use TestMacroNamespace instead of original in order to limit the number of macros in the context. Integrate mashumaro into dbt to improve performance of 'from_dict' and 'to_dict'
9448675
to
e5568a1
Compare
Resolves #2862 (partially -- there is still more performance work to come)
This work implements a number of changes to improve performance, including supporting libyaml, caching mapped_fields in the classes for 'from_dict', removing deepcopy on fqn_search, moving validation out of the 'from_dict' methods, amd special handling for dbt internal not_null and unique tests. Uses a new TestMacroNamespace instead of original in order to limit the number of macros in the context.
In addition it replaces the Hologram 'from_dict' and 'to_dict ' calls with Mashumaro calls, which compiles those methods into each subclass.
Checklist
CHANGELOG.md
and added information about my change to the "dbt next" section.