visualization of pyjanitor chained method calls #1176
Comments
I was unable to find a way to attach labels. My intent was to label this issue as 'discussion-needed'. |
Hi @asmirnov69! Thanks for posting this issue. It's a really cool idea! I especially like being able to visualize both pandas and janitor functions simultaneously. Because the POC implementation is a tad complex, would you be kind enough to point out where the key changes were needed in order to enable keeping track of which function/method calls were made? I read through the code, but was confused, as I didn't see something like a globally-instantiated NetworkX graph object (or analogous thing). Additionally, I saw the use of a new I'd love to see this functionality implemented. We could probably house this in
|
There is a reference from sklearn's Pipeline
|
Hi @ericmjl
As for housing the viz in pyjanitor: I think it makes sense to treat that as the main approach, since visualization is supposed to come as a set of in-the-box features for all users. In any case, it will be up to you to decide, based on the progress of some sort of parallel development.
The PR approach seems to work well for bug fixes and small feature additions. I think we can plan to use PRs. However, I would expect that we will need to agree on the long-term existence of some parallel feature branches with separate limiter releases.
After your revelation elsewhere that you are using Obsidian, I decided to take a second look at that system. As a result, all my notes, both for office work and home, are now in various Obsidian vaults. I will provide more answers a bit later. Some of the answers actually belong in the project documentation, so clarity on the documentation approach would be nice to have. Thanks again for your support, I really appreciate it. |
@Zeroto521 thanks for posting about the viz available in sklearn. The pyjviz-poc proposal goes further in this direction. The main idea is to use RDF as the data format to capture the details needed for visualization and other uses. As a result, the visualization itself can be built as an independent component. |
@ericmjl Hi, I put additional answers into https://github.com/asmirnov69/pyjviz-poc/blob/main/docs/pyjviz-poc/Q%26A.md One thing to mention here: I renamed ChainedMethodsPipe to ChainedMethodsCall. That was actually the original name, which somehow didn't make it into the proposal's final commit. I think "pipe" is overused as a term, so ChainedMethodsCall looks better and more relevant. Let me know if you think we should still use ChainedMethodsPipe. |
Hi @ericmjl I would suggest introducing a Python module, pyjviz, which will use pyjanitor as a required dependency. I've made some initial experiments and it looks like it is possible to do it this way. pyjviz will give users the ability to visualize pyjanitor code in a manner close to the POC examples. There are changes in how visualization will look from the code perspective, which will make the current examples and docs useless. All new examples and docs will be in pyjviz. The POC repo pyjviz-poc will be archived in the next few days. I am starting work on pyjviz and will report on the progress. I hope to get something done this week so we can resume discussions on what this new module will look like. How exactly can we communicate on pyjviz dev efforts? Right now the only available option is this issue. Would it make sense to move further discussion elsewhere? |
@asmirnov69 thank you for giving it thought! I think it's a good idea. I apologize for the silence on my side; we've been busy with physical health issues and our 2nd baby's arrival. I've been thinking about the development and pyjviz, and having a separate repo with an independent space makes a lot of sense. That way, you can really drive forward and own the problem space without being too burdened by the existing code base. Would you like a repo space under the |
@ericmjl Best wishes for your new baby and your family!
Yes, a new repo in pyjanitor-devs for the visualization project would be best. I also agree to join the pyjanitor dev team as an open source contributor. Feel free to pick a better name than pyjviz. Let me know when I can start using the new repo and what the requirements are. I plan to close this issue and archive the pyjviz-poc repo. |
Thank you, @asmirnov69! I have added you as a core dev; you should be getting an invite shortly. I'm not sure what a better name would be, so we can go with pyjviz if you'd like. One thing I hope to encourage as a standard is the use of continual testing, linting, and automated publishing of docs. We can slowly make that happen; other contributors have built up plenty of good patterns in the pyjanitor repo over the years that we can copy over. The delivery is today, so I will be offline for a bit. In the meantime, could you send me an email at ericmajinglong@gmail.com? (Short-whale, which I usually use, is down.) I will also send you a link to join our Discord chat room. |
The new pyjviz repo will be used for further development of the ideas described above. |
Brief Description
I'd like to propose some new features for pyjanitor, with a focus on visualization and data organization. I've made a proof-of-concept repo to explain exactly what the proposal contains: https://github.com/asmirnov69/pyjviz-poc
To be really brief, the proposal is illustrated by a pyjanitor example with its corresponding diagram.
Examples
more examples are here
A bit more about the goal beyond the immediate focus
The immediate focus is to get generation of PNG diagram files working, using rdflib with provided SPARQL queries and graphviz. There is a bigger idea: RDF logs and similarly collected data would be stored in a graph database. This would be a database of research and production activity that uses SPARQL and/or openCypher to connect the collected data to other knowledge graph systems (e.g. Obsidian).
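To make the idea concrete, here is a minimal sketch of how chained method calls could be logged as subject-predicate-object triples and then turned into graphviz DOT text. It is not the pyjviz-poc implementation: plain Python tuples stand in for RDF triples (the proposal uses rdflib and SPARQL), and `CallRecorder`, `to_dot`, and the predicate names `input_of`, `method`, and `returns` are all hypothetical names chosen for illustration.

```python
from typing import Any, List, Tuple

# Assumption: a plain tuple stands in for an RDF triple (subject, predicate, object).
Triple = Tuple[str, str, str]


class CallRecorder:
    """Hypothetical wrapper that logs each chained method call as triples."""

    def __init__(self, obj: Any, triples: List[Triple], node: str = "obj0"):
        self._obj = obj
        self._triples = triples
        self._node = node

    @property
    def value(self) -> Any:
        """Return the wrapped object."""
        return self._obj

    def __getattr__(self, name: str):
        # Only reached for names not defined on CallRecorder itself,
        # so the lookup is forwarded to the wrapped object.
        method = getattr(self._obj, name)

        def wrapper(*args, **kwargs):
            result = method(*args, **kwargs)
            n = len(self._triples) // 3  # three triples are logged per call
            call_id, new_node = f"call{n}", f"obj{n + 1}"
            self._triples.extend([
                (self._node, "input_of", call_id),
                (call_id, "method", name),
                (call_id, "returns", new_node),
            ])
            # Wrap the result so the next call in the chain is logged too.
            return CallRecorder(result, self._triples, new_node)

        return wrapper


def to_dot(triples: List[Triple]) -> str:
    """Render the logged triples as graphviz DOT text."""
    lines = ["digraph calls {"]
    for s, p, o in triples:
        if p == "method":
            lines.append(f'  "{s}" [shape=box, label="{o}"];')
        else:
            lines.append(f'  "{s}" -> "{o}";')
    lines.append("}")
    return "\n".join(lines)


triples: List[Triple] = []
out = CallRecorder("  Hello World ", triples).strip().lower().replace(" ", "_")
print(out.value)
print(to_dot(triples))
```

Because the log is just data, the rendering step does not need to know anything about the code that produced it, which is the separation the proposal is after. The DOT text could be passed to the graphviz `dot` tool to produce a PNG.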