Add utils for working with Spark Plan #159
- Return Spark Plan as a string
- Try to estimate the size of a DataFrame

On branch feature/plan-utils
Changes to be committed:
new file: quinn/plan_utils.py
This is cool. I think we should add these APIs as "experimental". From what I've seen, these plans change arbitrarily over time, so this code will likely break as time goes on. I don't think that's an issue if we have the experimental annotation in the docs. I'm not sure if … Looks like we need a … Cool work!!!
On branch feature/plan-utils
Changes to be committed:
new file: quinn/experimental/__init__.py
renamed: quinn/plan_utils.py -> quinn/experimental/plan_utils.py

On branch feature/plan-utils
Changes to be committed:
modified: quinn/experimental/__init__.py
modified: quinn/experimental/plan_utils.py
@MrPowers Kind reminder
@MrPowers Should we close it without merging?
Closed as a very unstable API
Two new functions:
- Return Spark Plan as a string
- Try to estimate the size of a DataFrame
On branch feature/plan-utils
Changes to be committed:
new file: quinn/plan_utils.py
The function that returns the plan works like this:
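For illustration, a minimal sketch of what such a helper could look like. It assumes the plan is read from the JVM `QueryExecution` object through PySpark's private `_jdf` handle; the name `plan_as_string` and the `extended` flag are hypothetical and not necessarily what this PR implements.

```python
def plan_as_string(df, extended=True):
    """Return the query plan of a PySpark DataFrame as a plain string.

    Hypothetical helper: it relies on the private `_jdf` attribute, so it
    may break between Spark versions.
    """
    # `_jdf` is the py4j handle to the underlying JVM Dataset.
    query_execution = df._jdf.queryExecution()
    if extended:
        # toString() renders the logical plans together with the physical plan.
        return query_execution.toString()
    # simpleString() renders only the physical plan.
    return query_execution.simpleString()
```

With a running Spark session, this would be called as `plan_as_string(spark.range(10))`, and the result could then be split into lines and parsed.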
The difference from df.explain is that our function returns a string that can be parsed. It is a small function, but it can be used, for example, to generate a data lineage graph (when we want to get dependencies at the level of individual columns).

The function that estimates the size in bytes works like this:
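One way such an estimate could be read out of a plan string, sketched under the assumption that the text contains a `Statistics(sizeInBytes=...)` entry in the form Spark 3.x prints for `df.explain(mode="cost")`. The helper name `parse_size_in_bytes` is hypothetical, and this is not necessarily how the PR implements it.

```python
import re

# Binary unit suffixes as printed by Spark 3.x (assumption: older versions
# may print "KB", "MB", ... instead, which this sketch does not handle).
_UNIT_FACTORS = {
    "B": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
    "PiB": 1024 ** 5,
    "EiB": 1024 ** 6,
}

_SIZE_RE = re.compile(
    r"sizeInBytes=(\d+(?:\.\d+)?(?:E[+-]?\d+)?)\s*(B|KiB|MiB|GiB|TiB|PiB|EiB)\b"
)


def parse_size_in_bytes(plan_text):
    """Return the first sizeInBytes statistic found in plan_text, in bytes.

    Assumes the first match belongs to the root node of the optimized
    logical plan, i.e. to the DataFrame as a whole.
    """
    match = _SIZE_RE.search(plan_text)
    if match is None:
        raise ValueError("no sizeInBytes statistic found in plan text")
    value, unit = match.groups()
    return int(float(value) * _UNIT_FACTORS[unit])
```

The printed value is rounded, so the result is only approximate, and the statistic itself is an optimizer estimate rather than a measured size.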
This functionality is really tricky; I do not know of another way to estimate the size. The estimate matters, for example, when we need to estimate the number of resulting partitions, or when we want to understand where broadcast hints can be applied.
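A small sketch of how an estimated size could feed those two decisions. The function names and the 128 MiB target partition size are assumptions made for illustration; the 10 MiB threshold mirrors the default value of `spark.sql.autoBroadcastJoinThreshold`.

```python
# Default of spark.sql.autoBroadcastJoinThreshold (10 MiB).
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024

# Assumed target size of one partition; tune for the actual workload.
DEFAULT_TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_num_partitions(size_in_bytes, target_partition_bytes=DEFAULT_TARGET_PARTITION_BYTES):
    """Return how many partitions are needed so that each holds at most the target size."""
    # Integer ceiling division, with a minimum of one partition.
    return max(1, (size_in_bytes + target_partition_bytes - 1) // target_partition_bytes)


def fits_broadcast(size_in_bytes, threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Return True if the estimated size is within the broadcast threshold."""
    return 0 <= size_in_bytes <= threshold
```

For example, `suggest_num_partitions(estimated_size)` could be passed to `df.repartition(...)`, and `fits_broadcast(estimated_size)` could guard a broadcast hint on the smaller side of a join.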
Because this is a completely new API, any feedback is welcome!