[OSPP2023] Refactor OpenDigger by sql query builder #1265
Comments
Hello @xgdyp, I am TianchenZhou from East China Normal University, and I am very interested in this project. As I understand it, OpenDigger is an open source analysis report project that aims to bring together developers worldwide to analyze open source data and gain insight from it, so that everyone can better understand and take part in open source. I am curious about the project and eager to contribute. What preliminary tasks could I do to start contributing, and what preparation would improve my chances of being selected? Thank you very much for your time and guidance. I look forward to your reply!
|
I would like to use PyPika as a reference for implementing a Python SQL builder. PyPika is a simple and flexible SQL query builder that supports multiple database platforms and analytical features. I think it is a good model because it reflects the structure and logic of the SQL language rather than simply concatenating strings. PyPika cannot fully support ClickHouse queries, but it already covers most of the features we need. I strongly agree with using a SQL builder to construct queries, because it improves the readability and maintainability of the code and helps avoid security risks such as SQL injection. I would therefore like to learn more about PyPika's design and implementation, and to discuss my work plan and progress with you. Could you give me some guidance on what standard my work should reach? |
Hi, welcome! Could you first try pypika to see how much of ClickHouse it supports? |
some information in this link |
OK, I'll report what I learn later. |
@xgdyp There are multiple queries that call this function: open-digger/python/metrics/basic.py Lines 241 to 254 in b35b78d
This function uses clickhouse's built-in function
Footnotes |
@xgdyp I need some further help with the following, and I hope you can guide me. I need a few simple SQL queries to use as examples. Most of the SQL in this codebase lives in string templates, but I need complete SQL statements for tests, so please tell me where I can find some, or provide a few if possible. |
Nice work. You can use the sample data as your test playground. For SQL, you can also look in the JS kernel and just print the spliced query. In OpenDigger, we don't use much ClickHouse-specific syntax. I agree with you that the first thing to do is to find out whether using pypika is feasible. You can also try to run OpenDigger first; feel free to ask me if you have any questions. |
@xgdyp It's very hard for me to reconstruct the original SQL from these string templates. Could you please point me to some simple ones? |
Here are some steps for your onboarding:
|
Thanks, I'll start from here. |
The main task of the Python refactoring is to organize the OpenDigger Python kernel code in a more object-oriented way. I have recently been studying Python design patterns, but I could not find one that suits OpenDigger well. In the refactoring, I used only the singleton pattern, for instantiating the database. In addition, I did the following:
Changes after refactoring:
Thanks very much for @xgdyp's guidance and help. |
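The singleton instantiation of the database mentioned in the refactoring comment above could look roughly like the sketch below. This is only an illustration under stated assumptions: the `Database` class, its `host` and `port` parameters, and the example host names are hypothetical and are not taken from the OpenDigger code.

```python
import threading


class Database:
    """Hypothetical database handle; NOT the actual OpenDigger class."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, host="localhost", port=9000):
        # Create the shared instance only once, even if several
        # threads construct Database() at the same time.
        with cls._lock:
            if cls._instance is None:
                inst = super().__new__(cls)
                inst.host = host  # assumed connection settings
                inst.port = port
                cls._instance = inst
        return cls._instance


# Every construction returns the same object; later arguments are ignored.
a = Database(host="clickhouse.example", port=9000)
b = Database(host="ignored.example")
print(a is b)   # True
print(b.host)   # clickhouse.example
```

One consequence of this design is that arguments passed after the first construction are silently dropped, so a real implementation might prefer to raise an error or read its settings from configuration instead.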
The previously unimplemented CHAOSS metrics in the Python kernel have now been implemented and refactored. In #1375, I have put all the changes from OSPP in a new folder called |
Description
Description: Long, complex SQL is difficult to maintain, which hinders the development of the project. One way to solve this is to generate SQL through a SQL builder, which makes SQL easier to read, especially subqueries. So we want to explore whether there is a mature framework that can help OpenDigger.
In Python, we can use pypika, a Python package that supports part of the ClickHouse SQL syntax. What we need to do is investigate in detail whether it can cover our SQL and whether it will really improve OpenDigger.
If so, we can refactor the Python kernel first as an OSPP task. I think I can follow up on this project.
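To illustrate why a builder can make subqueries easier to read than spliced string templates, here is a minimal hand-written sketch. It is not pypika and not OpenDigger code: the `Select` class, the `events` table, and the column names are all hypothetical, and conditions are passed as raw strings, so this toy version does not address SQL injection.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Select:
    """Hypothetical minimal SELECT builder, for illustration only."""

    source: Union[str, "Select"]  # table name or a nested subquery
    columns: List[str] = field(default_factory=lambda: ["*"])
    conditions: List[str] = field(default_factory=list)
    groups: List[str] = field(default_factory=list)
    alias: Optional[str] = None  # alias used when nested as a subquery

    def where(self, condition: str) -> "Select":
        self.conditions.append(condition)
        return self

    def group_by(self, *cols: str) -> "Select":
        self.groups.extend(cols)
        return self

    def to_sql(self) -> str:
        if isinstance(self.source, Select):
            # Render the nested query in parentheses, with its alias if set.
            src = f"({self.source.to_sql()})"
            if self.source.alias:
                src += f" AS {self.source.alias}"
        else:
            src = self.source
        sql = f"SELECT {', '.join(self.columns)} FROM {src}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        if self.groups:
            sql += " GROUP BY " + ", ".join(self.groups)
        return sql


# The subquery is an ordinary object that can be named, reused, and tested.
inner = (
    Select("events", ["repo_id", "count() AS cnt"], alias="t")
    .where("type = 'IssuesEvent'")
    .group_by("repo_id")
)
outer = Select(inner, ["repo_id", "cnt"]).where("cnt > 10")
print(outer.to_sql())
```

Because the inner query is a value rather than a fragment of text, it can be built and checked separately before being nested, which is the maintainability benefit this task is meant to evaluate.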
Expected outcomes:
Metrics generated by a Python SQL builder
Skills:
Python, TypeScript, JavaScript, SQL
References:
https://github.com/didi/gendry
https://github.com/sqlkata/querybuilder
https://github.com/ibis-project/ibis
https://github.com/doug-martin/goqu