[Feature Request] Support for PDF Metadata #277
Comments
Even though I suppose PDFs can carry metadata, does that happen in real life for comics? Could you share what kind of metadata you're referring to, and which Komga field you would expect each piece to map to?
So personally, I use Komga for two things: 1) reading some comics, and 2) easy access to my vast collection of PDF hardware manuals. But if I look at one:
I feel like that is some great data to pull in. At least the date, title, subject, and author lines?
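For context on the date line: the date entries in a PDF information dictionary (`CreationDate`, `ModDate`) are stored as strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional. A minimal sketch of parsing one in Python follows; the function name `parse_pdf_date` is made up for illustration and is not part of Komga or any PDF library.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSSOHH'mm'.
# Only the year is required; the "D:" prefix is tolerated as optional here.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(raw):
    """Return a datetime for a PDF date string, or None if it cannot be parsed."""
    match = _PDF_DATE.match(raw.strip())
    if not match:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    try:
        tz = None
        if sign in ("Z", "z"):
            tz = timezone.utc
        elif sign in ("+", "-"):
            offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
            tz = timezone(-offset if sign == "-" else offset)
        # Missing components fall back to the defaults the format specifies.
        return datetime(
            int(year), int(month or 1), int(day or 1),
            int(hour or 0), int(minute or 0), int(second or 0),
            tzinfo=tz,
        )
    except ValueError:
        # Out-of-range values such as month 13 are treated as unparseable.
        return None
```

Returning `None` instead of raising keeps a bad date from blocking the rest of the metadata, which seems the safer choice for files produced by arbitrary conversion tools.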
It's a bit difficult to decide which field would go where with a sample of 1. Do you know where this information is coming from? Are those PDFs coming from a vendor, or is the data filled in by someone else before sharing? Is the content of the fields consistent across all your PDFs? I would surmise that this metadata could also be filled with garbage generated automatically by conversion software, for instance, in which case an auto-import feature would produce less than ideal results. I'm not sure the date is of any interest: it looks like the file creation date, not the release date. Author might be of interest, but what about creator and producer? Title and subject contain the same data, so subject is probably superfluous in that particular case. Basically, what I would need before doing anything in that direction is a better understanding of:
Well, let me look at a bunch of files then. I have a pretty random collection of stuff, some of it hand-made. From the docs of the program pdfinfo:
So I've just sat here digging through my hundreds of pdfs, and here is what I can report...
Some examples:
This looks like trash to you, but I know what a 2486Dxx is, so that's pretty great to have.
Maybe not the best title... but.. eh..
The author isn't wonderful there, but the subject/title are useful
Ok that one is annoying.. but it's really a corner case.
Ok, wow, they went all out.
Intel seems to be really consistent with putting the author in..
This might look non-ideal, but not having to remember that a 2441 is the thermostat unit would save me a bunch of time looking in each one.
Overview of my 500 manuals: 95% of them have useful titles. Overall, I'd say that importing this data would vastly improve my library. Where the data is present, it almost always improves the quality of the entry.
It looks like this metadata is part of the PDF spec. A quick Google search found that it's an optional section at the end of pre-2.0 spec PDF files, and it is documented in the Adobe PDF reference manual. https://www.adobe.com/devnet/pdf/pdf_reference.html
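A survey like the one above can be scripted. The sketch below assumes the `Key: value` text layout that pdfinfo prints and turns it into a dictionary; the helper name `parse_pdfinfo` and the sample text are invented for illustration and are not taken from any file in this thread.

```python
def parse_pdfinfo(text):
    """Turn `Key: value` lines into a dict, skipping lines with empty values."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as dates contain colons.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            fields[key] = value
    return fields


# Made-up sample for illustration only.
SAMPLE = """\
Title:          Example Device Reference Guide
Author:         Example Corp
Subject:
Producer:       Example PDF Library 1.0
CreationDate:   Thu Apr 30 12:00:00 2020
"""

info = parse_pdfinfo(SAMPLE)
```

In practice the text would come from running pdfinfo on each file (for example through `subprocess.run`), and counting which keys show up across the collection gives the kind of overview reported here.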
Thanks for that analysis on your files, it is really helpful. I found some information about the different fields here. I ran a similar analysis on my files, which are not as tidy as yours. I found out a few things:
For example, on a magazine published on 30 April 2020, I have this:
How would you see the usage of
So looking at mine, I see a few different types. I almost feel like it should just be placed in the summary when available:
I think that's a pretty good representation of mine. Most of those are chipset names, though a few have really nice descriptions, to be honest. A few say user/setup/reference guide, but those are not the majority. A bunch have the release revision of the document or chip. Only a very few had keywords; I guess those could become tags if they actually appear?
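The mapping floated in this thread (title as title, subject as summary, keywords as tags, author as author) could be sketched as below. The target field names and the function `map_pdf_info` are hypothetical and do not reflect Komga's actual data model; the rule that drops a subject identical to the title comes from the earlier observation that the two often hold the same data.

```python
import re


def map_pdf_info(info):
    """Map PDF information-dictionary entries to hypothetical library fields."""
    title = (info.get("Title") or "").strip()
    subject = (info.get("Subject") or "").strip()
    author = (info.get("Author") or "").strip()
    keywords = (info.get("Keywords") or "").strip()

    mapped = {}
    if title:
        mapped["title"] = title
    # Subject often repeats the title; use it as a summary only when it adds something.
    if subject and subject.casefold() != title.casefold():
        mapped["summary"] = subject
    if author:
        mapped["authors"] = [author]
    # Keywords are commonly separated by commas or semicolons (an assumption here).
    tags = [tag.strip() for tag in re.split(r"[,;]", keywords) if tag.strip()]
    if tags:
        mapped["tags"] = tags
    return mapped
```

Fields that are absent or empty are simply left out of the result, so existing library data would not be overwritten with blanks.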
The problem with
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Describe the solution you'd like
As was done for the ComicInfo and EPUB file formats, I'd like to know whether you plan to support PDF metadata extraction.
Regards