Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc_id is preemptively converted to string before being passed to storage class #459

Closed
cdwilson opened this issue Jan 30, 2022 · 2 comments

Comments

@cdwilson
Copy link

As someone who is brand new to TinyDB, my naive assumption was that when writing to storage, the doc_id would be passed to the storage class keeping the document_id_class type intact, and it would be the responsibility of each storage class to convert it into a type that is compatible for that storage type.

However, I was surprised to find that the Table class preemptively converts the doc_id to a string before passing the table to the storage class for writing:

tinydb/tinydb/table.py

Lines 727 to 736 in 967dbf9

# Convert the document IDs back to strings.
# This is required as some storages (most notably the JSON file format)
# don't support IDs other than strings.
tables[self.name] = {
str(doc_id): doc
for doc_id, doc in table.items()
}
# Write the newly updated data back to the storage
self._storage.write(tables)

I'm curious if there is a reason why this conversion needs to happen in the Table._update_table() method, or whether the conversion could be delegated to the storage class instead?

For example, the YAMLStorage class described in Write a Custom Storage supports writing a YAML document directly to storage with integer document IDs. Making the following change to the Table._update_table() method enables the YAMLStorage storage class to store a YAML document with integer document IDs:

         # Perform the table update operation
         updater(table)
 
-        # Convert the document IDs back to strings.
-        # This is required as some storages (most notably the JSON file format)
-        # don't support IDs other than strings.
-        tables[self.name] = {
-            str(doc_id): doc
-            for doc_id, doc in table.items()
-        }
+        tables[self.name] = table
 
         # Write the newly updated data back to the storage
         self._storage.write(tables)
import yaml
from tinydb import Query, TinyDB
from tinydb.storages import Storage

class YAMLStorage(Storage):
    def __init__(self, filename):
        self.filename = filename

    def read(self):
        with open(self.filename) as handle:
            try:
                data = yaml.safe_load(handle.read())
                return data
            except yaml.YAMLError:
                return None

    def write(self, data):
        with open(self.filename, "w+") as handle:
            yaml.dump(data, handle)

    def close(self):
        pass

db = TinyDB("db.yml", storage=YAMLStorage)
db.insert({"name": "John", "age": 22})
_default:
  1:
    age: 22
    name: John
@MrPigss
Copy link
Contributor

MrPigss commented Jan 31, 2022

I agree with @cdwilson ,

You can choose the type of the doc_id by setting doc_id_class, but it get converted to string anyway.
Why not let you choose the type of doc_id you actually want to write?
Also even when working with the json format, the moment you choose to use something other than the standard lib json u probably can use other types as key as well.
For example orjson has a nonstringkey option wich allows it to parse things other than string.

Anyhow, this seems more of a job for the storage than for the Table class.

@msiemens
Copy link
Owner

Short note: I'm moving this to a discussion as I feel that the discussions board is a better place to, well, discuss this topic 🙂

Repository owner locked and limited conversation to collaborators Feb 14, 2022
@msiemens msiemens converted this issue into discussion #466 Feb 14, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants