feat: Support for additional metadata and multi-value metadata (genre, composer) #1852

chme · 2025-02-01T13:22:58Z

This PR is a PoC / RFC.
Its purpose is to revive the discussion about supporting more than one genre/composer for a song (#886) and, hopefully, to reach a solution.

It follows what @whatdoineed2do outlined as the third option in #886 (comment):

add supplementary meta table that can have linkage back to files entry
this is the one I explored most and is still troubling to resolve. If we keep the main files table as it stands, where each cols like genre continue to hold only one item element and any additional elements can be held in the meta table that could be name-value pairs

As described in the comment, a new table, files_metadata, is introduced. It links to the files table and contains key/value pairs. A file can have multiple entries for the same key.

When a file is scanned with ffmpeg, the files_metadata table is populated. Genre and composer tags are split by separators (currently hard-coded; they should be configurable), and one row per value is inserted for the file in files_metadata.
files_metadata entries are not updated in place. If a file has changed, all rows for its file_id are deleted and new ones are inserted.
The scan logic for the files table is unchanged.
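The splitting step could look roughly like the following C sketch. It is not the PR's code. The function name `split_multi_value` and the separator set passed to it are assumptions for illustration, since the PR's hard-coded separators are not listed here. Each value it returns would become one files_metadata row for the file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the PR): split a raw multi-value tag such
 * as "Rock; Pop / Jazz" in place into trimmed, non-empty values.
 * `seps` is the set of separator characters (an assumption; the PR
 * hard-codes its own). Writes at most `max_out` pointers into `out`,
 * each pointing into the modified `raw` buffer, and returns the count.
 */
static size_t
split_multi_value(char *raw, const char *seps, char **out, size_t max_out)
{
  size_t n = 0;
  char *p = raw;

  assert(raw != NULL && seps != NULL && out != NULL);

  while (*p != '\0' && n < max_out)
    {
      /* Length of the current token up to the next separator or the end */
      size_t len = strcspn(p, seps);
      char *next = p + len;
      int at_end = (*next == '\0');

      /* Terminate the token in place */
      *next = '\0';

      /* Trim leading whitespace */
      char *start = p;
      while (isspace((unsigned char)*start))
        start++;

      /* Trim trailing whitespace */
      char *end = start + strlen(start);
      while (end > start && isspace((unsigned char)end[-1]))
        end--;
      *end = '\0';

      /* Skip empty values, e.g. the gap in "Rock;;Pop" */
      if (*start != '\0')
        out[n++] = start;

      if (at_end)
        break;
      p = next + 1;
    }

  return n;
}
```

For example, splitting `"Rock; Pop / Jazz"` with the separators `";/"` yields `Rock`, `Pop` and `Jazz`. The same example shows why configurable separators matter: with `/` in the set, a genre or composer value that legitimately contains a slash would be split apart as well.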

To fetch data via the files_metadata table, new query_types are added for genre/composer browse queries.
Existing queries and code paths that fetch from the DB are unchanged.

The JSON API was updated to show how the queries can be used (note that this currently breaks the web UI).
How the JSON API should ultimately change needs further thought (e.g. whether breaking changes are acceptable).

In my opinion, this solution has several advantages:

Multi-value tags such as genre and composer become possible.
The new files_metadata table makes it easy to expose additional metadata fields without affecting existing queries. In this PR, for example, I added several MusicBrainz IDs (the web UI could, e.g., use them to fetch additional data from external services).
Currently, lyrics are stored in the files table, which can inflate the result set of files queries considerably for a user with many files that have lyrics tags. Lyrics are only relevant when looking at a single file/song, e.g. on the now playing page or in the file/song details dialog. When a list of songs is fetched, which is most of the time, they are not needed. The same might apply to other tags, and moving them out could improve query execution times for large libraries.
It can be implemented without breaking the existing APIs.

There are of course some limitations or difficulties with this approach:

Supporting files_metadata in smart playlist queries will be difficult and, if possible at all, most probably not performant.
Genre/composer are duplicated in files and files_metadata. I would argue that this is OK: the files table holds the raw value from the scan, while files_metadata holds the parsed/processed values. In my opinion, denormalizing a DB to improve performance is also acceptable.

@ejurgensen, it would be great if you could take a look and tell me whether this approach is OK and worth fleshing out in detail.


ejurgensen · 2025-02-09T10:22:18Z

Great that you have looked into this! It's come up a few times, and I have never been able to get around to it. I think this looks very promising.

Some questions I would like to hear your thoughts about:

For genres, composers, artists and probably some other fields, the logical relationship between a track (or a library "item") and the tag is n:n. The current files table only allows 1:1, and this change would allow 1:n. If we are going to change, should we consider going for a proper n:n? Or is that scope creep?
The addition of the mfmi struct is the database layer spilling into the application layer, which theoretically isn't ideal. However, I realize that changing mfi to have, e.g., genre as an array wouldn't be easy. What are your thoughts on those alternatives?
A PR review classic: Naming. Both "files" and "files_metadata" contain metadata, and neither is, strictly speaking, exclusively about files. Could we come up with a better naming scheme?
I'm not sure what the criteria would be for data to live in the new table in addition to the main table. Is it anything that is multiple value? I don't think lyrics should be moved due to size (is there an actual performance issue?), but there could be multiple localizations.

chme · 2025-02-15T06:10:16Z

For genres, composers, artists and probably some other fields, the logical relationship between a track (or a library "item") and the tag is n:n. The current files table only allows 1:1, and this change would allow 1:n. If we are going to change, should we consider going for a proper n:n? Or is that scope creep?

At the moment, I don't see a benefit in adding another table for genre/composer to get a proper m:n relationship. Joining three tables adds complexity. There might be a slight performance improvement when fetching the list of genres/composers, but I doubt it is significant, especially when additional info such as the number of tracks has to be returned (which then requires joining all three tables anyway).

What I am considering is adding a column to the files_metadata table that stores a persistent id (for genre/composer). This persistent id would be useful for generating better URLs and for avoiding duplicates that differ only in case.
It would also allow a proper index, which would speed up fetching the list of genres/composers.
And it might even make a future transition to n:m easier.

The addition of the mfmi struct is the database layer spilling into the application layer, which theoretically isn't ideal. However, I realize that changing mfi to having e.g. genre as an array wouldn't be easy. What are your thoughts on those alternatives?

I would keep it simple. Keeping the application model close to the DB model keeps it simple and performant. Adding a genre array would increase complexity a great deal without a real benefit.

A PR review classic: Naming. Both "files" and "files_metadata" contain metadata, and neither are strictly speaking exclusively files. Could we come up with a better naming scheme?

Some other not so great names :-)

extra_metadata or extra_data
additional_metadata or additional_data
extended_metadata or ...
files_details, files_attributes

I'm not sure what the criteria would be for data to live in the new table in addition to the main table. Is it anything that is multiple value? I don't think lyrics should be moved due to size (is there an actual performance issue?), but there could be multiple localizations.

The new table would be for tags with multiple values. I would also store data there that is only relevant when looking at one specific item/track/file, not when fetching a list of items. That means data that is nice to see in an item's details dialog but of no importance in a list of items (like lyrics :-)).

I have not measured the performance impact of a library with lyrics for the majority of items. On my development laptop with my medium-sized library it does not matter. I'll look into setting up my RPi again to get some numbers from a resource-constrained system.

chme added 3 commits February 1, 2025 12:19

[db] DB migration to 22.03; new table files_metadata

b7f91c1

[scan] Scan lyrics, genre, composer, musicbrainz IDs into files_metadata for local files

422bb1e

[jsonapi] Use files_metadata for genre/composer; add new browse-albums endpoint; add new tracks-metadata endpoint

937184c
