refactor(guides): update json database guide #1086
Thank you for the pull request.
Overall I am a fan of the format. However, given the audience, I think we should expand a bit on the individual points and add examples that explain, for instance, why I would want data consistency, integrity, etc.
JSON databases face scalability challenges due to limited indexing and querying capabilities.
SQL databases offer better scalability options, including horizontal scaling.
I think we can expand on this a bit: how does horizontal scaling help SQL databases more than JSON? For instance, we could write that SQL databases can hold their indices and hot data in main memory, so adding more memory gives good performance gains, while JSON files have no built-in way to index data, so the main option you would have is loading the entire document into memory, which is far less efficient.
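To illustrate the indexing point, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, the `idx_users_email` index and the sample rows are hypothetical, made up for this example. It asks SQLite how it would run a lookup, and the plan should mention the index rather than a scan of every row.

```python
import sqlite3

# In-memory database with a hypothetical `users` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

# Ask SQLite how it would run the lookup; the last column describes each step.
plan = [
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?",
        ("user500@example.com",),
    )
]

# The lookup itself only touches the rows the index points at.
name = conn.execute(
    "SELECT name FROM users WHERE email = ?", ("user500@example.com",)
).fetchone()[0]
```

With a plain JSON file there is no equivalent: the whole document has to be parsed into memory before any lookup by a field other than the key can happen.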
JSON lacks predefined schemas and validation checks, leading to inconsistent and invalid data.
SQL databases enforce data integrity through structured schemas and data type constraints.
We should add an example here. How does SQL enforce data integrity? Which forms of bad data does it help against? For instance, in JSON (unless you write your own validation routine, which takes a lot of time and care), you can omit a key or set someone's age to "581 years" and it won't bat an eye. If you're unlucky, you will only catch it later when your app crashes. In an SQL database you would say age is a small integer and add a check constraint to ensure that age is between 0 and 120.
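A sketch of what such an example could look like, using Python's built-in `json` and `sqlite3` modules. The `users` table, the `try_insert` helper and the sample values are assumptions made for illustration, not something taken from the guide. SQLite is loosely typed by default, so the constraint below also checks the stored type explicitly.

```python
import json
import sqlite3

# JSON accepts any value for any key; nothing flags the bad age.
record = json.loads('{"name": "Ada", "age": "581 years"}')
json_accepted = isinstance(record["age"], str)

# In-memory SQLite database with a hypothetical `users` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        name TEXT NOT NULL,
        age  INTEGER NOT NULL
             CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 120)
    )
    """
)


def try_insert(name, age):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False
```

Here `try_insert("Bob", 42)` succeeds, while `try_insert("Eve", "581 years")`, `try_insert("Mallory", 581)` and `try_insert("Nobody", None)` are all rejected by the database itself, before the bad data can reach the rest of the application.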
JSON databases lack efficient querying and indexing mechanisms, making data retrieval slow.
SQL databases excel at quick data retrieval with optimized indexing.
I would expand here on the fact that JSON - or well, Python's representation of it, a dict - allows efficient access to single keys and that's about it. You can query by key, but anything else will by default need some advanced querying. As an example, we could add that to find any user whose account is more than 1 year old, you would need to write a function which does nothing but loop through all users to find that, whilst in SQL it's a one-liner query (and you can index it).
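The comparison could be sketched roughly like this. The user data, the `created` field and the fixed "today" date are all hypothetical and only there to keep the example reproducible. Dates are stored as ISO strings, so they compare correctly as text.

```python
import sqlite3
from datetime import date, timedelta

today = date(2024, 1, 1)  # fixed date so the example is reproducible
cutoff = (today - timedelta(days=365)).isoformat()

# Hypothetical user data, shaped as it might be after json.load().
users = {
    "alice": {"created": "2021-06-01"},
    "bob": {"created": "2023-11-15"},
    "carol": {"created": "2022-03-09"},
}

# With a dict, the only option is to visit every user.
old_from_json = sorted(
    name for name, user in users.items() if user["created"] < cutoff
)

# The same data in SQLite, with an index on the column we filter by.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, created TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_users_created ON users (created)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(name, user["created"]) for name, user in users.items()],
)

# One query; the database decides how to find the matching rows.
old_from_sql = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE created < ? ORDER BY name", (cutoff,)
    )
]
```

Both approaches return the same users here; the difference is that the loop always touches every entry, while the query can use the index as the table grows.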
JSON databases often lack proper transaction support, compromising data consistency.
SQL databases follow ACID principles, ensuring reliable transactions even during failures.
We should explain here: why do we want transaction support? What benefits does it have? An example would be an application with a large JSON file that crashes while it's writing out a new version. What will you end up with: the old data, a half-overwritten file, a half-written half-empty file, an empty file? With SQL databases and their transactions, the behavior on power loss or other fatal crashes is well defined.
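A small sketch of the transaction behaviour, with the caveat that it simulates the failure with an exception rather than a real power loss, and uses an in-memory database. The `accounts` table and the `transfer` and `balances` helpers are hypothetical names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(amount, fail_midway=False):
    """Move `amount` from alice to bob as a single transaction."""
    # `with conn:` commits if the block finishes and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'",
            (amount,),
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'",
            (amount,),
        )


def balances():
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If `transfer(30, fail_midway=True)` raises, `balances()` still reports the original amounts: the half-finished update is rolled back instead of leaving the data in an in-between state.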
JSON databases struggle with complex queries, lacking features like JOINs and aggregations.
SQL databases support advanced querying, enabling complex data operations.
Especially in regard to the previous point, we should explain the benefits of secondary indexes and the queries they enable here.
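As one possible illustration of a JOIN, an aggregation and a secondary index working together, here is a minimal sketch with `sqlite3`. The `users` and `messages` tables, the index name and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        channel TEXT NOT NULL
    );
    -- Secondary index: lets the join find a user's messages without a full scan.
    CREATE INDEX idx_messages_user ON messages (user_id);
    """
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")]
)
conn.executemany(
    "INSERT INTO messages (user_id, channel) VALUES (?, ?)",
    [(1, "general"), (1, "help"), (2, "general"), (1, "general")],
)

# Count messages per user, including users who have not written anything.
counts = conn.execute(
    """
    SELECT u.name, COUNT(m.id) AS total
    FROM users AS u
    LEFT JOIN messages AS m ON m.user_id = u.id
    GROUP BY u.id
    ORDER BY total DESC, u.name
    """
).fetchall()
```

Doing the same with two JSON documents would mean writing the matching and counting loops by hand, and repeating that work for every new question asked of the data.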
### Conclusion:

JSON's flexibility suits data exchange, but its shortcomings in data integrity, querying efficiency, transactions, and scalability make it unsuitable for robust databases. SQL databases, with structured schemas, powerful queries, ACID transactions, and scalability, provide better solutions for data-intensive applications. When choosing a database solution, consider your project's needs and the limitations of JSON, favoring SQL where appropriate.
One point we could add: with the default naive way of writing JSON (and I'm not aware of any trivial, more efficient alternatives), you need to read in and write out the entire JSON file every time you make a change. Grossly inefficient and prone to fatal problems when your app crashes at the wrong time.
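A rough sketch of that naive pattern and how it can go wrong. The file location and the `save` and `load` helpers are hypothetical, and the failure is simulated with a value `json.dump` cannot serialise rather than a real crash.

```python
import json
import os
import tempfile

# Hypothetical data file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "users.json")


def save(data):
    # Opening with "w" truncates the old file before any new data is written.
    with open(path, "w") as f:
        json.dump(data, f)


def load():
    with open(path) as f:
        return json.load(f)


save({"alice": {"score": 1}, "bob": {"score": 2}})

# Changing one value still means loading and rewriting the whole document.
data = load()
data["bob"]["score"] = 3
save(data)
after_update = load()

# Simulate a failure part-way through a write: a set cannot be serialised,
# so json.dump raises after the file has already been truncated.
data["carol"] = {"tags": {"new"}}
try:
    save(data)
except TypeError:
    pass

try:
    survivor = load()
except json.JSONDecodeError:
    survivor = None  # the previous, valid contents are gone
```

After the failed write, the file no longer parses and the last good version has been lost, which is exactly the situation a database transaction is designed to prevent.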
Thank you for the update. Did ChatGPT help you?
Not gonna lie it is helpful in such things.
Thank you for your pull request. I'm not convinced there is much value in providing a guide that parrots what an AI has said on the topic. If all our guides were written like this, I am not sure there would be any value in reading them at all. Content written by ChatGPT reads very much like SEO spam - probably because of its training set. But outside of the overall content format, a lot of content is just duplicated altogether, and some of the content is just completely wrong. These pages are written primarily by our helpers, who pour a lot of time and effort into writing clear, understandable, approachable, and most importantly, factually correct guides. I cannot say the same about an AI that replicates what it finds on the internet.
@jchristgit Personally, I modified the text I was given and added scenarios and examples. I read it and approved it myself. It made sense in my head, but I guess I missed the point of this being understandable by beginners. My bad, I can rework it if you wish.
Proposed in python-discord/meta#217