SQLite DB with SQLAlchemy #190
Nice job. Do you think it would make sense to add Alembic as well, in case the DB schema changes between versions? |
Alembic would definitely be a good addition once the schema has settled down. Things are still very, very unstable. |
The most recent commit adds some of the basic functionality: adding tags to entries, creating tags, and modifying tags. Adding/editing/removing tag box/text line fields should also work. It works with multi-selection. Many changes were made to several Qt widgets/modals to support the database types, and there is a significant overhaul of search results to leverage dataclasses instead of ambiguous tuples/lists.

There's still a lot left to do to get it back to functional parity. A lot of the codebase relies on mutating/reading what was an always accessible database. That's a lot harder to do now that there is a required interface layer with stricter access. I am still working on how best to resolve that.

In many cases, querying the database from Qt is done through the Library object. In some specific cases, like writing preview panel field widgets, a database session is opened in the logic to allow for lazy-loading joins deep in callbacks. I don't like mixing direct database access into the Qt code, but for now, it's the best option due to its simplicity. I'd appreciate any thoughts on whether it's acceptable to open database sessions inside Qt classes. From an MVC viewpoint, it feels wrong - but this codebase isn't really following an MVC pattern. |
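To make the trade-off above concrete, here is a minimal sketch of a Library-style facade that keeps database access out of UI code and returns dataclass results instead of tuples. It is an illustration only: it uses the standard-library `sqlite3` module rather than SQLAlchemy sessions, and `SearchResult`, `add_entry`, `search`, and the `entry` table layout are hypothetical names, not the PR's actual API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    """A named, typed result row instead of an ambiguous tuple (hypothetical)."""

    entry_id: int
    path: str


class Library:
    """Hypothetical facade: UI code calls methods and never sees the connection."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entry "
            "(id INTEGER PRIMARY KEY, path TEXT NOT NULL)"
        )

    def add_entry(self, path: str) -> int:
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            cursor = self._conn.execute(
                "INSERT INTO entry (path) VALUES (?)", (path,)
            )
        return cursor.lastrowid

    def search(self, text: str) -> list[SearchResult]:
        rows = self._conn.execute(
            "SELECT id, path FROM entry WHERE path LIKE ? ORDER BY id",
            (f"%{text}%",),
        )
        return [SearchResult(entry_id=row[0], path=row[1]) for row in rows]
```

A widget would then call something like `library.search("cat")` and read `result.path`, without opening a session of its own. Whether that is workable for the lazy-loading callbacks described above is exactly the open question.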
I really like that instead of just talking about what needs to be done and how, you just go and write the code. That's much easier to reason about than abstract plans. So in this regard I fully support this PR. However, I don't really have any decisive power in this project, so this doesn't mean that much :) @CyanVoxel - since git exists, it should be easy to keep a separate branch for bugfixes of the existing release, and this could get merged into the development branch where more people will be exposed to it, and it can be gradually polished until it fully works as expected. |
Yeah, I wasn't too sure if removing commented out code was a good idea for basically that reason. I tried to only touch comments that were in areas I was actively working in because they sort of got in the visual way, but I probably overreached in that aspect, apologies.
This is still very much a work in progress, and a lot of my changes have been ad hoc. Before considering merging into dev, I would like to continue refining some key aspects, like the Library methods and Session usage. As I refactored, their purposes became slightly muddied and I'd like to make them more unified. Also, a lot of dead code is still sitting in src.core, and there are some globals I need to move to the newer src.core.constants. Small things, but lots of them. |
Of course, I didn't mean to merge it now. I briefly tried to run the code from this branch, and it's not ready for that. I meant this should get merged when it has feature parity with the current state of |
So do you think the best approach going forward would be to open up a |
I'm on the fence regarding the general dev branch, so I would rather say don't change it if the current setup works. Regarding a "dev branch" for the DB migration... No matter which one gets chosen, I'd suggest merging it once Tags are working; the rest can be added later *. However, merging it will cause the same scenario as above, meaning the main branch will basically become a "development" branch unless it is 100% ready for release. From my point of view, that is still preferable to trying to implement everything (Collations etc.) before merging, because that would mean doing two things at the same time: migrating from JSON to DB, and adding new features which are not currently implemented even in JSON. Looking at the size of both competing PRs, they're already pretty big as they are. * The DB schema will change sooner or later no matter what, and at that point SQLAlchemy (+Alembic) provides a better way to deal with it than anything sqlite can provide (afaik). |
Just wanted to touch base with this PR. A db-migration branch has been opened so this migration can be worked on bit-by-bit without worrying about getting the entire thing functional in a single PR. #187, which had a non-SQLAlchemy approach but a newer planned schema, has been closed in favor of this SQLAlchemy approach. In essence, the planned direction we have is to move forward with SQLAlchemy, the schema shown in #187 (ideally), and on the db-migration branch. Assuming the conflicts are resolved here, I wouldn't mind pulling this PR and using it as a jumping off point. Alternatively, this could be split up into smaller PRs working off the current codebase in order to manage things better. I just wanted to get everyone's thoughts before getting this moving along 🤞 |
Add item delegate for Ignored File Extension to add leading `.` if left off extension
* Added various file formats to constants.py * Update tagstudio/src/core/constants.py Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com> * Update tagstudio/src/core/constants.py Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com> --------- Co-authored-by: Travis Abendshien <46939827+CyanVoxel@users.noreply.github.com>
I went ahead and reverted the majority of the changes for two reasons:
I feel it's best to mostly wipe things, reevaluate/update the schema in this PR, and then reimplement the prior work in following PRs. I don't anticipate reimplementation to take long.

One of the major issues I have with #187's schema is the entry_attributes table. Defining a table that is a catch-all for a mixture of datatypes does not sit well with me. In my opinion, there should be a Field class, with subclasses for each Field purpose, hence TextField for things like Note, Description, etc., DatetimeField for all date fields, and TagBoxField for meta tags and user tags, or any other tag boxes. Using multiple tables/objects for different field types allows for more concrete typing, which eliminates the need to fuzzily check the field type. For example, this pattern used in src.qt.widgets.preview_panel:

The other tables I did not implement from #187, entry_page and location, are simple. As it is now, the table declarations are functionally equivalent to #187, with the major difference in Field outlined above.

I would be fine with merging this into the db-migration branch as is, if the schema is deemed acceptable. I'd like to hear any thoughts on the schema, and I've added an ER diagram for those unfamiliar with SQLAlchemy below! |
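As a rough illustration of the typing argument made in this comment, the sketch below models a `Field` base class with one subclass per purpose, so display code can dispatch on the concrete class instead of guessing from the stored value. It uses plain dataclasses rather than SQLAlchemy mappings, and the attribute names (`value`, `tags`) and the `render` helper are assumptions made for the example, not code from the PR.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Field:
    """Base class: every field has a display name."""

    name: str


@dataclass
class TextField(Field):
    value: str


@dataclass
class DatetimeField(Field):
    value: datetime


@dataclass
class TagBoxField(Field):
    tags: list[str] = field(default_factory=list)


def render(item: Field) -> str:
    """Dispatch on the concrete class; no inspection of the value's shape."""
    if isinstance(item, TextField):
        return f"{item.name}: {item.value}"
    if isinstance(item, DatetimeField):
        return f"{item.name}: {item.value:%Y-%m-%d}"
    if isinstance(item, TagBoxField):
        return f"{item.name}: {', '.join(item.tags)}"
    raise TypeError(f"Unsupported field type: {type(item).__name__}")
```

Each branch knows the exact type it receives, which is the "concrete typing" benefit described above; an unknown subclass fails loudly instead of being rendered incorrectly.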
I'll have to do some re-reading on past discussions to confirm, but I believe the main goal of the mashup was that basically everything was a tag, so the field Author was really a tag with the name Author that had an associated value for a given entry, and the Meta Tags box was a tag that indicated to the UI that it should render a TagBox with all tags included that had Meta Tags as a

The locations table was intended to support a multiple directories feature, which should probably be excluded from the first set of changes.

Rough Idea for Locations Table

For the locations table the original intent was to support multiple directories for a single Library.

Location = {
    "id": 0,
    "path": "C:\Users\Loran425\Pictures",
    "name": "Pictures"
}
Entry = {
    "location": 0,
    "path": "test.png"
    ...
}

Would refer to the file "C:\Users\Loran425\Pictures\test.png" in my original PR. |
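The lookup described in the rough idea above can be sketched in a few lines. This is only an illustration of the idea: `resolve_entry_path` is a hypothetical helper, the dictionaries stand in for database rows, and `PureWindowsPath` is used so the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Stand-ins for rows in the proposed locations and entries tables.
locations = {0: {"path": r"C:\Users\Loran425\Pictures", "name": "Pictures"}}
entry = {"location": 0, "path": "test.png"}


def resolve_entry_path(entry: dict, locations: dict) -> PureWindowsPath:
    """Join the entry's relative path onto its location's base directory."""
    base = PureWindowsPath(locations[entry["location"]]["path"])
    return base / entry["path"]


full_path = resolve_entry_path(entry, locations)
```

Storing only the relative path on the entry means a location can be moved by updating a single row.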
@JoshuaMaddy I don't know if having the fields split among multiple tables is the best approach. I understand the issue with type-guessing, but what would you think about having a single table with multiple columns (one for each type), and then there would be a column There's also needed a column like
|
To me, this sounds like a common image board approach to tagging, in that each tag has a 'category'. I personally prefer this approach, but I was attempting to emulate the JSON structure which iirc does not treat said fields as tags.
Other image board implementations also use copyright and character categories, but I'm hesitant to include these - feels a bit opinionated as to how the program should be used. There are some optional fields that carry per-entry data, for example date published and notes, that I feel do not lend themselves to a tag representation and should stay as fields. I will change TagBox / Meta TagBox to function as you've described, with the intention of using Tag categories as the filter when displayed.
Thank you for the clarification, I misunderstood the purpose of that table. I agree that as of now it is probably outside of the scope, but should be added in the future. I'd be glad to add it now if desired. |
I personally do not see a benefit to having a monolithic field table, but there is a nice middle ground that SQLAlchemy provides for this exact problem.
This will be implemented with the Ordering List extension on the Entry class. I'll get these changes made today and push a revised schema for further comments! |
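For readers unfamiliar with the Ordering List extension mentioned here, the following is a minimal sketch of how it keeps a position column in sync with list order. The models are simplified stand-ins: the `position` and `name` columns and the `demo_field_order` helper are assumptions for the example, not the PR's actual table declarations. It requires SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.orderinglist import ordering_list
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Entry(Base):
    __tablename__ = "entry"

    id = Column(Integer, primary_key=True)
    path = Column(String, nullable=False)
    # ordering_list rewrites Field.position whenever the list is modified.
    fields = relationship(
        "Field",
        order_by="Field.position",
        collection_class=ordering_list("position"),
        cascade="all, delete-orphan",
    )


class Field(Base):
    __tablename__ = "field"

    id = Column(Integer, primary_key=True)
    entry_id = Column(Integer, ForeignKey("entry.id"), nullable=False)
    position = Column(Integer, nullable=False)
    name = Column(String, nullable=False)


def demo_field_order() -> list:
    """Insert fields out of order and return (position, name) as reloaded."""
    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        entry = Entry(path="test.png")
        entry.fields.append(Field(name="Title"))
        entry.fields.append(Field(name="Notes"))
        entry.fields.insert(0, Field(name="Author"))  # renumbers the rest
        session.add(entry)
        session.commit()
        entry_id = entry.id
    with Session(engine) as session:
        loaded = session.get(Entry, entry_id)
        return [(f.position, f.name) for f in loaded.fields]
```

The calling code only manipulates a Python list; the extension assigns and renumbers `position` so the order survives a round trip through the database.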
I've updated the Schema to use a single table for Fields, Fields are ordered, Tags are joined to Entries, and Tags have categories.
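Since the diagram for the revised schema is not reproduced in this thread, here is a rough sketch of what such a layout could look like, written as plain SQLite DDL through the standard-library `sqlite3` module. Every table and column name below is an assumption made for illustration; the PR's real declarations are SQLAlchemy classes and may differ.

```python
import sqlite3

# Hypothetical layout: one field table with a type discriminator and a
# position column, tags with a category, and an entry/tag join table.
SCHEMA = """
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE field (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    position INTEGER NOT NULL,
    type TEXT NOT NULL,
    name TEXT NOT NULL,
    text_value TEXT,
    datetime_value TEXT
);
CREATE TABLE tag (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE entry_tag (
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (entry_id, tag_id)
);
"""


def tags_in_category(conn: sqlite3.Connection, entry_id: int, category: str) -> list:
    """Return the names of an entry's tags that belong to one category."""
    rows = conn.execute(
        "SELECT tag.name FROM tag "
        "JOIN entry_tag ON entry_tag.tag_id = tag.id "
        "WHERE entry_tag.entry_id = ? AND tag.category = ? "
        "ORDER BY tag.name",
        (entry_id, category),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO entry (id, path) VALUES (1, 'test.png')")
conn.executemany(
    "INSERT INTO tag (id, name, category) VALUES (?, ?, ?)",
    [(1, "favorite", "meta"), (2, "archived", "meta"), (3, "landscape", "user")],
)
conn.executemany("INSERT INTO entry_tag VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)])
```

With this shape, a tag box becomes a filtered view: `tags_in_category(conn, 1, "meta")` returns only the entry's tags in the `meta` category.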
|
A quick mock-up of an SQLite database using SQLAlchemy as the ORM.
My reasoning for suggesting using SQLAlchemy:
Entry.has_tag / Entry.remove_tag
This is heavily inspired by #187, but not with the exact same table definitions.
src.database.table_declarations.* are individual files declaring each table/Python object. Adapted from classes in src.core.library.
src.database.manage is the bare minimum to create an engine, and to add/drop tables from declarations.
src.make_db is a quick, dirty script to demonstrate how to make an SQLite database from the declarations, then add records as objects and query records as objects.
This is just a proof of concept, so there are some notable issues.
crop, dimensions, etc. are not implemented, but could be cast, stored as a pickle blob, or stored as JSON.
If people like this direction, I'd be glad to flesh it out more! I also get that some devs dislike ORMs, so no worries if that's the case.
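As one possible take on the JSON option mentioned for values like dimensions, the sketch below serializes a pair of integers into a text column and restores it on read. It assumes dimensions is a (width, height) pair, which the thread does not specify, and the helper names and table layout are hypothetical. JSON is used here rather than pickle because unpickling data from an untrusted file can execute arbitrary code.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry "
    "(id INTEGER PRIMARY KEY, path TEXT NOT NULL, dimensions TEXT)"
)


def save_dimensions(conn: sqlite3.Connection, path: str, dimensions: tuple) -> int:
    """Store the pair as a JSON array string, e.g. '[1920, 1080]'."""
    cursor = conn.execute(
        "INSERT INTO entry (path, dimensions) VALUES (?, ?)",
        (path, json.dumps(dimensions)),
    )
    return cursor.lastrowid


def load_dimensions(conn: sqlite3.Connection, entry_id: int) -> tuple:
    """Read the JSON text back and rebuild the (width, height) tuple."""
    (raw,) = conn.execute(
        "SELECT dimensions FROM entry WHERE id = ?", (entry_id,)
    ).fetchone()
    width, height = json.loads(raw)
    return (width, height)
```

JSON turns tuples into arrays, so the loader rebuilds the tuple explicitly.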