refactor(guides): update json database guide #1086
@@ -2,27 +2,36 @@ | |
title: Why JSON is unsuitable as a database | ||
description: The many reasons why you shouldn't use JSON as a database, and instead opt for SQL. | ||
relevant_links: | ||
Tips on Storing Data: https://tutorial.vco.sh/tips/storage/ | ||
Tips on Storing Data: https://tutorial.vco.sh/tips/storage/ | ||
--- | ||
|
||
JSON, quite simply, is not a database. It's not designed to be a data storage format, | ||
rather a way of transmitting data over a network. It's also often used as a way of doing configuration files for programs. | ||
|
||
There is no redundancy built in to JSON. JSON is just a format, and Python has libraries for it | ||
like json and ujson that let you load and dump it, sometimes to files, but that's all it does, write data to a file. | ||
There is no sort of DBMS (Database Management System), which means no sort of sophistication in how the data is stored, | ||
or built in ways to keep it safe and backed up, there's no built in encryption either - bear in mind | ||
in larger applications encryption may be necessary for GDPR/relevant data protection regulations compliance. | ||
|
||
JSON, unlike relational databases, has no way to store relational data, | ||
which is a very commonly needed way of storing data. | ||
Relational data, as the name may suggest, is data that relates to other data. | ||
For example if you have a table of users and a table of servers, the server table will probably have an owner field, | ||
where you'd reference a user from the users table. (**This is only relevant for relational data**). | ||
|
||
JSON is primarily a KV (key-value) format, for example `{"a": "b"}` where `a` is the key and `b` is the value, | ||
but what if you want to search not by that key but by a sub-key? Well, instead of being able to quickly use `var[key]`, | ||
which in a Python dictionary has a constant return time (for more info look up hash tables), | ||
you now have to iterate through every object in the dictionary and compare to find what you're looking for. | ||
Most relational database systems, like MySQL, MariaDB, and PostgreSQL have ways of indexing secondary fields | ||
apart from the primary key so that you can easily search by multiple attributes. | ||
JSON (JavaScript Object Notation) is commonly used for data interchange, but it's not a database solution. SQL (Structured Query Language) offers better alternatives due to the following reasons: | ||
|
||
## Data Integrity and Validation: | ||
|
||
JSON lacks predefined schemas and validation checks, leading to inconsistent and invalid data. | ||
SQL databases enforce data integrity through structured schemas and data type constraints. | ||
|
||
## Querying and Indexing: | ||
|
||
JSON databases lack efficient querying and indexing mechanisms, making data retrieval slow. | ||
SQL databases excel at quick data retrieval with optimized indexing. | ||
Review comment: I would expand here on the fact that JSON - or well, Python's representation of it, a dict - allows efficient access to single keys and that's about it. You can query by key, but anything else will by default need some advanced querying. As an example, we could add that to find any user whose account is more than 1 year old, you would need to write a function which does nothing but loop through all users to find that, whilst in SQL it's a one-liner query (and you can index it). |
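A minimal sketch of the example suggested in this comment, assuming Python's built-in `sqlite3` module as a stand-in for an SQL database. The `users` data, the `created_at` field, the index name, and the fixed `now` date are all hypothetical and chosen only for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical sample data, shaped like the result of json.load() on a users file.
users = {
    "1": {"name": "alice", "created_at": "2021-06-01T00:00:00"},
    "2": {"name": "bob", "created_at": "2023-11-15T00:00:00"},
    "3": {"name": "carol", "created_at": "2022-03-20T00:00:00"},
}

# Fixed "now" so the example is reproducible; real code would use datetime.now().
now = datetime(2024, 1, 1)
cutoff = (now - timedelta(days=365)).isoformat()


def accounts_older_than(users, cutoff):
    """Dict approach: there is no index on created_at, so every record is visited."""
    return sorted(
        record["name"]
        for record in users.values()
        if record["created_at"] < cutoff  # ISO 8601 strings sort chronologically
    )


old_from_dict = accounts_older_than(users, cutoff)

# SQL approach: the same question is one query, and the index lets the
# database avoid scanning every row as the table grows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, created_at TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_users_created_at ON users (created_at)")
conn.executemany(
    "INSERT INTO users (id, name, created_at) VALUES (?, ?, ?)",
    [(int(uid), r["name"], r["created_at"]) for uid, r in users.items()],
)
old_from_sql = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE created_at < ? ORDER BY name", (cutoff,)
    )
]
```

Both approaches return the same names here; the difference is that the dict version always costs a full pass over the data, while the SQL version can use the secondary index.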
||
|
||
## Complex Queries: | ||
|
||
JSON databases struggle with complex queries, lacking features like JOINs and aggregations. | ||
SQL databases support advanced querying, enabling complex data operations. | ||
Review comment: Especially in regard to the previous point we should explain the benefits for secondary indexes and their queries here. |
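One way this could be illustrated, reusing the users/servers relationship from the original guide text. This is only a sketch with `sqlite3`; the table layout, column names, index name, and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE servers (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES users (id)
    );
    -- Secondary index: lets lookups and joins on owner_id avoid a full scan.
    CREATE INDEX idx_servers_owner ON servers (owner_id);
    """
)
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
conn.executemany(
    "INSERT INTO servers (id, name, owner_id) VALUES (?, ?, ?)",
    [(1, "lounge", 1), (2, "dev", 1), (3, "games", 2)],
)

# JOIN plus aggregation: how many servers does each user own?
servers_per_owner = conn.execute(
    """
    SELECT u.name, COUNT(s.id)
    FROM users AS u
    JOIN servers AS s ON s.owner_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
    """
).fetchall()

# The foreign key also rejects a server whose owner does not exist.
try:
    conn.execute(
        "INSERT INTO servers (id, name, owner_id) VALUES (4, 'orphan', 99)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With plain JSON, the equivalent of the join is nested loops written by hand, and nothing stops an `owner` value from pointing at a user that was never stored.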
||
|
||
## ACID Transactions: | ||
|
||
JSON databases often lack proper transaction support, compromising data consistency. | ||
SQL databases follow ACID principles, ensuring reliable transactions even during failures. | ||
Review comment: We should explain here: why do we want transaction support? What benefits does it have? An example would be an application with a large JSON file that crashes while it's writing out a new version. What will you end up with: the old data, a half-overwritten file, a half-written half-empty file, an empty file? With SQL databases and their transactions, the behavior on power loss or other fatal crashes is well defined. |
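A possible illustration of the point about well-defined failure behavior, sketched with `sqlite3`. The `accounts` table, the `transfer` helper, and its `fail_midway` flag are hypothetical. Note that a raised exception only simulates an application crash; it does not demonstrate durability on power loss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass

# The first UPDATE was rolled back, so no money vanished.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer the balances are unchanged. A hand-rolled JSON store gives no such guarantee unless the application implements it itself.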
||
|
||
## Scalability: | ||
|
||
JSON databases face scalability challenges due to limited indexing and querying capabilities. | ||
SQL databases offer better scalability options, including horizontal scaling. | ||
|
||
Review comment: I think we can expand on this a bit: how does horizontal scaling help SQL databases more than JSON? For instance, we could write that SQL databases can hold their indices and hot data in main memory allowing upgrading of main memory size for good performance gains, while JSON files have no built-in way to index data, so the main option you would have is loading the entire document into memory, which is a lot more inefficient. |
||
### Conclusion: | ||
|
||
JSON's flexibility suits data exchange, but its shortcomings in data integrity, querying efficiency, transactions, and scalability make it unsuitable for robust databases. SQL databases, with structured schemas, powerful queries, ACID transactions, and scalability, provide better solutions for data-intensive applications. When choosing a database solution, consider your project's needs and the limitations of JSON, favoring SQL where appropriate. | ||
Review comment: One point we could add: with the default naive way of writing JSON (and I'm not aware of any trivial, more efficient alternatives), you need to read in and write out the entire JSON file every time you make a change. Grossly inefficient and prone to fatal problems when your app crashes at the wrong time. |
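A rough sketch of what this looks like in practice, assuming a hypothetical `users.json` file with 1000 records and an equivalent `users` table in SQLite. File names, field names, and sizes are made up for the example.

```python
import json
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "users.json")
db_path = os.path.join(workdir, "users.db")

# Hypothetical data set: 1000 users with a points counter.
users = {str(i): {"name": f"user{i}", "points": 0} for i in range(1000)}
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(users, f)

# JSON: changing one value means parsing the whole file...
with open(json_path, encoding="utf-8") as f:
    data = json.load(f)
data["42"]["points"] += 10
# ...and writing all of it back. Opening with "w" truncates the file first,
# so a crash before dump() finishes leaves partial or empty data behind.
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(data, f)
bytes_rewritten = os.path.getsize(json_path)

# SQL: the same change is a single statement that touches one row.
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, points INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (id, name, points) VALUES (?, ?, ?)",
    [(i, f"user{i}", 0) for i in range(1000)],
)
conn.commit()
with conn:
    conn.execute("UPDATE users SET points = points + 10 WHERE id = ?", (42,))
points = conn.execute("SELECT points FROM users WHERE id = 42").fetchone()[0]
conn.close()
```

Here `bytes_rewritten` is the full size of the JSON file, rewritten for a one-field change, and that cost grows with the file.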
Review comment: We should add an example here. How does SQL enforce data integrity? Which forms of bad data does it help against? For instance, in JSON (unless you write your own validation routine, which takes a lot of time and care), you can omit a key or set someone's age to `"581 years"` and it won't bat an eye. If you're unlucky, you will only catch it later when your app crashes. In an SQL database you would say `age` is a small integer and add a check constraint to ensure that age is between 0 and 120.
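A sketch of how the suggested example might look, using `sqlite3`. The `people` table and sample rows are hypothetical. SQLite is loosely typed and has no distinct small integer type, so this sketch adds an explicit `typeof` check as an assumption; a stricter database such as PostgreSQL would reject the string from the column type alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER NOT NULL
             CHECK (typeof(age) = 'integer' AND age BETWEEN 0 AND 120)
    )
    """
)

# A valid row is accepted.
conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", ("alice", 30))

# Bad data is refused at write time instead of crashing the app later.
rejected = []
bad_rows = [
    ("bob", "581 years"),  # wrong type
    ("carol", 581),        # out of range
    ("dave", None),        # missing value
]
for name, age in bad_rows:
    try:
        conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", (name, age))
    except sqlite3.IntegrityError:
        rejected.append(name)

stored = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]
```

Only the valid row is stored. With a JSON file, all four records would have been written without complaint.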