Initial database schema #53

mitchdowney · 2024-08-12T19:37:23Z

No description provided.

…gn key

randoneering · 2024-08-16T03:46:15Z

docs/database/0002_account.sql

Using Serial cleans up the setup for 1) auto-increment, 2) NOT NULL and 3) Unique Primary Keys.
For the Foreign Keys (references), the default behavior is that you will not be able to delete a row if its being referenced in another table. There is nothing wrong with this approach, the only thing I would say is if there are going to be frequent deletes to tables with foreign keys, then the deletes have to be done in a specific order, or you can change the behavior of the foreign key to "CASCADE" on delete/update. This will make sure the other references are updated/deleted as well. There is a potential performance impact here.

randoneering · 2024-08-16T03:47:42Z

docs/database/0003_clip.sql

For the account_id column (referencing the ID in account)...I think you have to have the same "ON DELETE CASCADE" behavior in other tables that are also referencing ID in the account table. I will circle back on this

for start_time/end_time-is there a reason why you want to use numeric? Is to store epoch time?

For the account_id column (referencing the ID in account)...I think you have to have the same "ON DELETE CASCADE" behavior in other tables that are also referencing ID in the account table. I will circle back on this

Ah, I am missing all of the ON DELETE CASCADE rules in the account sql file. I will review the full code and try to determine which are missing, and which are unnecessary.

for start_time/end_time-is there a reason why you want to use numeric? Is to store epoch time?

No, that is a good catch. We only need up to 2 decimal points for start/end time.

Since we only need 2 decimal points, I guess we will use the FLOAT type...however, I suppose we could make it a numeric and limit the numeric to 2 decimal points? While the latter seems more efficient, I suspect numeric values maybe quirkier than floats (I think I saw that both 0 and 0.0 are possible separate numeric values), so I'll go ahead with FLOAT unless we can think of good reasons to go with numeric.

Can you not just use the datatype "time" there? its format is 00:00:00.

https://www.postgresql.org/docs/current/datatype-datetime.html

Hmm I hadn't thought of that. I guess we could use it since it has microsecond resolution? The only reason I can think of possibly not however, is that when parsing "time position" values from RSS feeds, or using them in a media player (ex. start playback from 90 seconds in), the values will be integers or decimals, not hh:mm:ss:ms...so I'm guessing we would need to convert those values to and from hh:mm:ss:ms time?

If there isn't an advantage, like a performance benefit, for using datatype "time", then I think I'd prefer to use "float".

randoneering · 2024-08-16T04:07:04Z

docs/database/0004_playlist.sql

I know code reuse is something we all strive for, which 'INHERITS' allows for, but this can cause some performance overhead because you will be querying more than one table. https://www.postgresql.org/docs/current/tutorial-inheritance.html. I generally don't use this in my tables, only because I don't mind writing out the same columns over (and over) again. Sure, it makes for more reading, but it isn't going to cause any potential performance hit that 'INHERITS' does. But, as the documentation says, its great for object-oriented databases. I would suggest keeping this in place for now, but if you feel like you want to start squeezing performance out of the database-I would start here.

I am coming back from speaking with my colleague and I think it would be wise to not use inherit. Here is why;

By using inherit, you are essentially created a link between two tables for the sake of cleaning up code (not having to write out the same code to create the same type of columns in several tables). The postgres documentation actually gives a good example using 'Cities' table and 'Capitals' table. The 'Capitals' table inherits the 'Cities' columns plus addition columns added specific to 'Capitals'. The query hitting the Cities table will then, essentially be querying two tables (not just Cities). If you are to look for cities that are not capitals, you then have to use specific syntax of 'FROM ONLY CITIES'. Seems kind of silly when you could just essentially have a cities table with a boolean column labeled 'capital' with a default of false.

So-TLDR-I would avoid using inherit and just write out the code.

I'm very glad you and your colleagues looked into this. The INHERITS usage is totally new to me, and I'm not sure if it would help or hurt more long run...

Part of why I used INHERITS is not only for code usability, but because we have concepts like a "playlist", and a playlist can have multiple different table types associated with it. For example, a playlist consists of playlist_items, and each playlist_item can have a foreign key relationship with 4 different data types: item, clip, chapter, soundbite.

I was thinking that if we use inherits, then when it is time to query the database for ALL of the playlist_items for a account's/user's playlist, then we could query the playlist_item_base table using the playlist_id, and use UNION so each playlist_item returned would select the different columns that correspond with the different tables that inherit playlist_item_base.

Does that make sense, or am I making incorrect assumptions? Is there another / better way to do this besides this INHERITS plus UNION approach?

(P.S. I know the table names may semantically sound like they can be consolidated into one table, like why can't "clip" and "soundbite" be the same table? But these are distinct entities according to RSS spec, so I'd suggest ignoring the apparent similarities for now.)

I think I mentioned this in the matrix channel, but I think the idea would be less on the table setup and more in the query to get the data you need. Because you can get the same values using joins, and if there are indexes on those tables, the performance should be better than using INHERITS .

For example

SELECT a.column, b.column FROM table_1 a INNER JOIN table_2 b on a.id = b.id WHERE a.column <> 42

But this would mean your sql queries will need to do the work with the joins, and not using the backend "magic" of INHERITS

Ok, if the performance should be better than using inherits, we're definitely trying to choose performance wherever possible. If not using inherits allows that, I'd like go with that option, but I'd like to explain the use case one more time to be sure, before I stop using inherits.

Our playlists will have "playlist items" in them like this, of 5 potentially different types:

clip

item (episode)

chapter

soundbite

item (episode)

chapter

clip

add_by_rss_item (episode)

etc

We'll need to make paginated queries, for example for the first 20 of those playlist items for a specific playlist, ordered by the position column.

Also, when updating a position of a playlist item (ex. the playlist item at position 7 moves to position 3), we will need to get the playlist items with the next highest and lowest position values, so that the playlist item in the new position can be given a position value in between those 2 values.

If there is a playlist_item_base table, then I was thinking it might be more performant to query that table for the items ordered by position, rather than querying all 5 of the playlist item tables, and ordering the items by position, and then somehow stitching the requested range (ex. the next 20 playlist items by position from a playlist) of items together and returning them.

Do you think that querying the 5 tables separately could be more performant than a base table for this purpose? (Part of my problem may be that I don't yet know what a query that can stitch and return this paginated data together would look like.)

Id say stick with the inherit since you have already been investing in the design (in time and effort). What we could do down the line would be to put time into writing the query to get what you need and compare the performance. Ultimately, you want to do something that not just you will be able to understand, and inherit gives people that breadcrumb trail. Keep on keeping on.

randoneering · 2024-08-16T04:14:33Z

docs/database/0005_now_playing_content.sql

+
+CREATE OR REPLACE FUNCTION check_account_id_uniqueness()
+RETURNS TRIGGER AS $$
+BEGIN


I think this is a good use case for a UNION instead of a JOIN here. Nice work.

Thank you :)

That said...we are actually discussing possibly merging all 3 of these tables...

history
now_playing
queue

And making them one table, where the position would be negative for history items, 0 for now playing item, and positive for upcoming queue items.

If that's the case, then I think we may not need a union like this.

randoneering · 2024-08-16T04:17:27Z

docs/database/0006_queue_content.sql

+CREATE TABLE queue_content_base (
+    id SERIAL PRIMARY KEY,
+    account_id INTEGER NOT NULL REFERENCES account(id) ON DELETE CASCADE,
+    position numeric_20_11 NOT NULL


Again-for the numeric data type. Are you anticipating the numbers to be large for position of "queued content?" What would this table hold specifically? Queued uploads to rss feeds? Just trying to clarify things for myself !

Good question, and I got the idea for this blog post. @suorcd and I have discussed this quite a bit, and aren't 100% set on if we want to go this way or not.

The reasoning is basically 1) using integer for position can lead to problems, as reordering items in the list means every item after that item also needs its position updated. 2) using a decimal column allows you to reorder by fitting a reordered item between 2 positions, by calculating the middle point between the two.

An alternative to this could be a jsonb column that maintains the position order data on a higher order table (like playlist), which may be doable for our use case, but we'll lose some referential benefits.

In this particular case, I think you are good to use the numeric datatype. Looking at this again, it makes more sense from a positioning perspective.

randoneering · 2024-08-16T04:18:42Z

docs/database/0001_init_podcasting_20_database.sql

+    ('mixed'), ('podcastL'), ('musicL'), ('videoL'), ('filmL'), ('audiobookL'), ('newsletterL'), ('blogL'), ('publisherL'), ('courseL')
+;
+
+CREATE TABLE medium_base (


AH there we go-found the table :) Ignore my comment about missing the 'Medium' table

randoneering

I need to still review the 0001_init_podcasting_20_database.sql file. I will do this tomorrow if I get a chance.

randoneering · 2024-08-16T20:01:29Z

docs/database/0000_init_helpers.sql

After speaking with my colleagues, what you are doing here makes sense-you are enforcing constraints/checks on commonly used columns that will be amongst your tables. You are essentially creating 'global check constraints.' So, in my eyes, its a great way to keep data integrity. You could also do the same thing per table, setting constraints on the columns by each table (instead of doing the domain). However, its just another way to 'skin a cat'. This just makes it easier across the board.

Some examples my colleague brought up was enforcing format of birthdays or email addresses.

Thanks for checking, and yes exactly. A DOMAIN like varchar_url I think is convenient because we can easily ensure each url field is limited to 2083 characters and checks it begins with http:// or https://

…_db container

…e.yml

mitchdowney added 22 commits August 12, 2024 02:12

Continue adding init database sql, and add inheritance tables

2d57521

Continue adding initial database schema

6f21f91

Rename Podcasting 2.0 database init file

b18ccdf

Add categories and missing NOT NULL to relationships

173ef69

Add account table

b156bd9

Set varchar_long to 2500; add chapter hash column

3f4d5c3

Add unique id to inherits tables that can only have one row per forei…

ddbc6f2

…gn key

Add clip table

f9f08c0

Add sharable_status to account and clip

f33f1c0

Fix item_chapter reference

3c1001a

Add playlist and playlist_item tables

bcaa8d8

Rename clip table to item_clip

cb89380

Remove extra linebreaks

858c505

Change short_id to short_id_v2

ee46019

Rename item_clip to clip

54b4edb

Add playlist_item_item_add_by_rss table

02436e6

Rename playlist_item to playlist_content

b9eb4a6

More initial helpers to their own sql file

4478fe3

Add updated_at handling to a couple columns

e7911f3

Add now_playing_content table

075f34d

Add queue_content table

1c45c62

Fix invalid table names

59f6827

randoneering reviewed Aug 16, 2024

View reviewed changes

mitchdowney added 30 commits August 29, 2024 15:43

Remove unnecessary channel_value_time_split tables

4761e32

Change medium_value to medium

231435b

Change medium_value to medium; add guid_enclosure_url to item

3252aec

Update comment

4e8887f

Add item_chapters_feed_log

1ba48d3

Remove unused item_chapters_feed log columns

e105f5c

Add missing column to item_Person and item_season_episode

9b293ed

Remove url column from item_soundbite

23af605

Use numeric instead of integer for player times

571438a

Fix typos in "recipient" spelling

c441896

Update item_season_episode table

8915aa8

Make protocol column required in chat tables

dae2c66

Add last_parsed_file_hash column to feed table

e5f05bf

Fix item_chapter column name id_text

336c3bc

Remove item_chapter_image table

15bb606

Update feed_log

868ae77

Add podverse_queue and podverse_workers to docker compose

4a30352

Rename podverse_queue to podverse_amqp

908d3d7

Remove podverse_workers from makefile

a540f55

Change podverse_queue to podverse_amqp

a38ae57

Add podverse_network to docker-compose file

4b51c3e

Add podverse_workers_debug

9e9d519

Add npm-link-modules.sh and add example env files

d206bac

Add dev/local-dev-setup.md; Attach init_database.sql file to podverse…

f5876b5

…_db container

Update env examples; update npm-link-modules.sh; update docker-compos…

c73a129

…e.yml

Update npm-link-modules.sh

0f90c1c

Add foreign key indexes

dc82f5a

Add comments to npm-link-modules.sh

250e6dd

Update npm-link-modules

608c460

Fix OneToOne unique relationships; add indexes to remote_item tables

bf2403f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial database schema #53

Initial database schema #53

mitchdowney commented Aug 12, 2024

randoneering Aug 16, 2024

randoneering Aug 16, 2024

randoneering Aug 16, 2024

mitchdowney Aug 16, 2024

randoneering Aug 18, 2024

mitchdowney Aug 18, 2024 •

edited

Loading

randoneering Aug 16, 2024 •

edited

Loading

randoneering Aug 16, 2024

mitchdowney Aug 16, 2024

randoneering Aug 18, 2024

mitchdowney Aug 18, 2024 •

edited

Loading

randoneering Aug 22, 2024

randoneering Aug 16, 2024

mitchdowney Aug 17, 2024

randoneering Aug 16, 2024

mitchdowney Aug 17, 2024

randoneering Aug 18, 2024

randoneering Aug 16, 2024

randoneering left a comment

randoneering Aug 16, 2024

mitchdowney Aug 17, 2024

Initial database schema #53

Are you sure you want to change the base?

Initial database schema #53

Conversation

mitchdowney commented Aug 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mitchdowney Aug 18, 2024 • edited Loading

Choose a reason for hiding this comment

randoneering Aug 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mitchdowney Aug 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

randoneering left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mitchdowney Aug 18, 2024 •

edited

Loading

randoneering Aug 16, 2024 •

edited

Loading

mitchdowney Aug 18, 2024 •

edited

Loading