Reduce list of work_status, require publication date for Active Works in Thoth #595
Comments
RFC @brendan-oconnell @rhigman Reduce the list of Setting the following constraints on dates
|
@ja573 I agree with deprecating the codes you mention. The impact would be minimal; currently in Thoth, there are 2 Works I still wonder about requiring |
Might interact with #585. Not sure about (ETA @brendan-oconnell haha, snap - I spent too long composing my draft!) |
One more piece of data: |
Good point - setting the default value in the migration would be a can of worms. |
Yeah, I'm also divided about |
Yes, and it would definitely become an issue if we did start regularly disseminating ONIX records prior to publication. (Although, in that case, the individual exports should be preventing the creation of ONIX files where the platform requires a publication date and the record doesn't have it.) |
Do we currently disseminate ONIX records prior to publication, and if not, is that something that's important to Thoth users? |
early May? I'm out next week, but will continue working on this when I get back. |
No, at least for now we'll only distribute post-publication. But (a) we might do in the future (e.g. if we integrate platforms like Lightning Source) and (b) we don't know how people might use the records we output (we may not be pushing them, but people might be harvesting them). |
I think at present the only ONIX output you can generate pre-publication is Thoth's – and since it's meant to be the full implementation of ONIX, we should enforce having a publication_date set for forthcoming books |
Sorry, I don't follow... |
It seems like it's me who didn't follow what ETA meant in this context... I thought you meant "estimated time of arrival" :) |
Ah, my fault! I was using it as "edited to add" - just to acknowledge that you made a very similar point in the time it took me to post mine 😄 - should have avoided that ambiguity! |
Hmm, actually, the current implementation of In practice, I think you can output all the other ONIX flavours pre-publication except for Google Books and Overdrive - those are the only ones which explicitly mandate a publication date. |
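A rough sketch of the per-export check discussed above: refuse to build an ONIX file when the target platform mandates a publication date and the record has none. The output keys and the `can_export` helper are hypothetical and not Thoth's actual API. The only part taken from this thread is that Google Books and Overdrive are the outputs that explicitly require a date.

```python
from datetime import date
from typing import Optional

# Assumption: illustrative keys, not Thoth's real output identifiers.
# Per the comment above, these are the only ONIX flavours that
# explicitly mandate a publication date.
OUTPUTS_REQUIRING_PUBLICATION_DATE = {"google_books", "overdrive"}


def can_export(output: str, publication_date: Optional[date]) -> bool:
    """Return False when the output needs a publication date and none is set."""
    if output in OUTPUTS_REQUIRING_PUBLICATION_DATE:
        return publication_date is not None
    return True
```

With a check like this, the date requirement stays in the export layer, so a record without a date can still be saved and exported to the other outputs.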
Odd - I make it 79 vs 21. And 14 of those 21 dates are in the past! |
My fault... I looked at my (out-of-date) development data dump, instead of the production database! It seems like the proportion of I also did notice a lot of |
OK, so to sum up, this kind of gets to a tension between Thoth-as-metadata-management system vs. -dissemination system. As @rhigman notes above, publishers using Thoth as a metadata management system seem to want a kind of 'draft' record state, and they also seem to be currently using FORTHCOMING for this, as indicated by the relatively large number of FORTHCOMING works with no publication_date. If we create a catchall So this would seem to be an argument for making On the other hand, we want Thoth as a dissemination system to be able to disseminate successfully as much as possible, and not requiring What's the best way to proceed with this decision? I have the least experience and domain-specific knowledge of anyone on this project, so I don't want to make the decision myself :) Do we need to discuss at a future Thoth meeting? In any case, it seems like this question of how publishers are creating 'drafts' in Thoth is worth digging into further... |
Based on those five statuses, the ideal usage would be that books start as
graph TD;
FORTHCOMING -->|Postponed Indefinitely| POSTPONED_INDEFINITELY;
POSTPONED_INDEFINITELY -.->|Resumed| FORTHCOMING;
FORTHCOMING -->|Cancelled| CANCELLED;
FORTHCOMING -->|Published| ACTIVE;
ACTIVE -.->|Withdrawn| INACTIVE;
|
Then, if we agree on reducing the status to just those 5, we need to look at what constraints ONIX has between these statuses and other fields and implement them accordingly |
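One way to read the five-status diagram above is as a transition table. The sketch below copies its edges into a lookup. It assumes the dotted edges are ordinary allowed transitions, and the `is_allowed` helper is hypothetical. Nothing in this thread says Thoth would actually enforce transitions, so treat it as an illustration of the intended flow only.

```python
from enum import Enum


class WorkStatus(Enum):
    FORTHCOMING = "FORTHCOMING"
    POSTPONED_INDEFINITELY = "POSTPONED_INDEFINITELY"
    CANCELLED = "CANCELLED"
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


# Edges copied from the diagram; statuses with no outgoing edge map to an empty set.
ALLOWED_TRANSITIONS = {
    WorkStatus.FORTHCOMING: {
        WorkStatus.POSTPONED_INDEFINITELY,
        WorkStatus.CANCELLED,
        WorkStatus.ACTIVE,
    },
    WorkStatus.POSTPONED_INDEFINITELY: {WorkStatus.FORTHCOMING},
    WorkStatus.CANCELLED: set(),
    WorkStatus.ACTIVE: {WorkStatus.INACTIVE},
    WorkStatus.INACTIVE: set(),
}


def is_allowed(current: WorkStatus, new: WorkStatus) -> bool:
    """Return True if the diagram has an edge from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS[current]
```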
My view on this topic: since 2020 at SciELO Books we have started working with books that will be released. |
Publishers using Thoth as a metadata management tool will have two statuses, FORTHCOMING (publication date NOT known) and FORTHCOMING (publication date known), which are still not resolved in that flow @ja573, and so the basic issue remains! If a publication date is 'required' for FORTHCOMING then publishers will be forced to input a made-up date, and if ONIX is then successfully distributed, 'false' data is entered into various distribution systems (as well as Thoth). In addition, it is unlikely that publishers will check whether the inputted date has passed, again causing issues if distributed. |
The original idea was to require the publication date, but after the discussion it was clear that we should not be doing that, and just leave it to the ONIX output to complain about it not being set. Those who choose to enter a publication date for forthcoming titles (which is already possible) would need to check that the date is to some extent accurate, as we don't currently have any mechanisms to check the veracity of data that's input. But because this date is meant to be an estimate anyway, I don't think it'll be a problem if it's not completely accurate. At some point we could write notifications to publishers informing them of forthcoming books with dates in the past, though. |
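As an illustration of the notification idea, the selection step could look something like the sketch below. The `Work` dataclass and `overdue_forthcoming` function are hypothetical stand-ins rather than Thoth's data model, and the comparison date is passed in explicitly so the behaviour is easy to test.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Work:
    # Assumption: a minimal stand-in with only the fields this check needs.
    title: str
    work_status: str
    publication_date: Optional[date] = None


def overdue_forthcoming(works: Iterable[Work], today: date) -> List[Work]:
    """Return FORTHCOMING works whose publication date is set and already past."""
    return [
        w
        for w in works
        if w.work_status == "FORTHCOMING"
        and w.publication_date is not None
        and w.publication_date < today
    ]
```

Works with no date are deliberately skipped, since an unknown date is a legitimate state for a forthcoming title according to the discussion above.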
Apologies, this may be slightly adjacent to the core discussion here. If I understood things correctly, we are also considering making our metadata Crossmark-compliant (see also #582) ... |
graph TD;
FORTHCOMING -->|Postponed Indefinitely| POSTPONED_INDEFINITELY;
POSTPONED_INDEFINITELY -.->|Resumed| FORTHCOMING;
FORTHCOMING -->|Cancelled| CANCELLED;
FORTHCOMING -->|Published| ACTIVE;
ACTIVE -.->|Require removal| WITHDRAWN;
ACTIVE -.->|New edition| SUPERSEDED;
|
@ja573 Do you think On the other hand, these are works that, according to the workflow you outline in your diagram, should have passed through an On the other, other hand though, if publishers are adding back catalog titles to Thoth, and they want to add works that have already been withdrawn or superseded in their catalog, perhaps they might not know the publication date... which would support not requiring it, to avoid them introducing false metadata into Thoth. And I know the general philosophy has been to keep required fields to a minimum. What do you think? |
This was discussed in a team meeting, and we decided to make |
If a Work in Thoth has a work_status of ACTIVE, require publication_date.
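The rule stated above can be expressed as a small validation step. This is a minimal sketch under stated assumptions: `validate_work` and `MissingPublicationDate` are hypothetical names, and it only covers the ACTIVE case confirmed in this issue. It is not how Thoth implements the constraint.

```python
from datetime import date
from typing import Optional


class MissingPublicationDate(ValueError):
    """Raised when an ACTIVE work has no publication date (hypothetical error type)."""


def validate_work(work_status: str, publication_date: Optional[date]) -> None:
    """Reject ACTIVE works without a publication date; other statuses pass through."""
    if work_status == "ACTIVE" and publication_date is None:
        raise MissingPublicationDate(
            "publication_date is required when work_status is ACTIVE"
        )
```

Here a FORTHCOMING work without a date still validates, which matches the outcome of the discussion: the date is not forced on forthcoming titles.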