Reduce list of work_status, require publication date for Active Works in Thoth #595

Closed
brendan-oconnell opened this issue Apr 25, 2024 · 27 comments · Fixed by #603
Labels
data model (This issue requires or discusses a change in the data model) · feature (New feature or request)

Comments

@brendan-oconnell
Contributor

If a Work in Thoth has a work_status of ACTIVE, require publication_date.

@brendan-oconnell added the feature and data model labels on Apr 25, 2024
@brendan-oconnell self-assigned this on Apr 25, 2024
@ja573
Member

ja573 commented Apr 25, 2024

RFC @brendan-oconnell @rhigman

Reduce the list of work_status to ACTIVE and INACTIVE (both for post-publication), and FORTHCOMING, CANCELLED and POSTPONED_INDEFINITELY (for pre-publication)

Setting the following constraints on dates

publication_date

Must have (<= today)

  • ACTIVE
  • INACTIVE

Can have (>= today)

  • FORTHCOMING

Must not have

  • POSTPONED_INDEFINITELY
  • CANCELLED

withdrawn_date

Must have (<= today)

  • INACTIVE

Must not have

  • ACTIVE
  • POSTPONED_INDEFINITELY
  • CANCELLED
  • FORTHCOMING
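
The date constraints proposed above could be expressed as two lookup tables plus a single check. The following is only an illustrative Python sketch under stated assumptions, not Thoth's actual implementation: the `Requirement` enum, the rule tables, and `check_dates` are hypothetical names, and the FORTHCOMING rule reflects the RFC as first proposed here.

```python
from datetime import date
from enum import Enum
from typing import List, Optional


class Requirement(Enum):
    """Hypothetical encoding of the three rule kinds used in the RFC."""
    MUST_BE_PAST = "must be set and <= today"
    MAY_BE_FUTURE = "is optional but must be >= today if set"
    MUST_BE_ABSENT = "must not be set"


# Rule tables transcribed from the lists above.
PUBLICATION_DATE_RULES = {
    "ACTIVE": Requirement.MUST_BE_PAST,
    "INACTIVE": Requirement.MUST_BE_PAST,
    "FORTHCOMING": Requirement.MAY_BE_FUTURE,
    "POSTPONED_INDEFINITELY": Requirement.MUST_BE_ABSENT,
    "CANCELLED": Requirement.MUST_BE_ABSENT,
}
WITHDRAWN_DATE_RULES = {
    "INACTIVE": Requirement.MUST_BE_PAST,
    "ACTIVE": Requirement.MUST_BE_ABSENT,
    "POSTPONED_INDEFINITELY": Requirement.MUST_BE_ABSENT,
    "CANCELLED": Requirement.MUST_BE_ABSENT,
    "FORTHCOMING": Requirement.MUST_BE_ABSENT,
}


def _violates(rule: Requirement, value: Optional[date], today: date) -> bool:
    """Return True if `value` breaks `rule` on the given day."""
    if rule is Requirement.MUST_BE_PAST:
        return value is None or value > today
    if rule is Requirement.MAY_BE_FUTURE:
        return value is not None and value < today
    return value is not None  # MUST_BE_ABSENT


def check_dates(
    status: str,
    publication_date: Optional[date],
    withdrawn_date: Optional[date],
    today: date,
) -> List[str]:
    """Collect one message per date field that breaks the proposed rules."""
    errors = []
    checks = (
        ("publication_date", PUBLICATION_DATE_RULES, publication_date),
        ("withdrawn_date", WITHDRAWN_DATE_RULES, withdrawn_date),
    )
    for field, rules, value in checks:
        rule = rules[status]
        if _violates(rule, value, today):
            errors.append(f"{field} {rule.value} when work_status is {status}")
    return errors
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check deterministic, which makes it easier to test.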

Deprecate

Get rid of the following codes completely as they're either redundant or not relevant. When set, replace them with INACTIVE

  • UNSPECIFIED
  • OUT_OF_STOCK_INDEFINITELY
  • OUT_OF_PRINT
  • WITHDRAWN_FROM_SALE
  • NO_LONGER_OUR_PRODUCT
  • UNKNOWN
  • REMAINDERED
  • RECALLED
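
The replacement rule for the deprecated codes amounts to a simple mapping. A minimal Python sketch follows, assuming statuses are handled as plain strings; `migrate_status` is a hypothetical helper, not an existing Thoth function.

```python
# Codes proposed for removal in the list above.
DEPRECATED_STATUSES = frozenset({
    "UNSPECIFIED",
    "OUT_OF_STOCK_INDEFINITELY",
    "OUT_OF_PRINT",
    "WITHDRAWN_FROM_SALE",
    "NO_LONGER_OUR_PRODUCT",
    "UNKNOWN",
    "REMAINDERED",
    "RECALLED",
})

# Codes the RFC proposes to keep.
KEPT_STATUSES = frozenset({
    "ACTIVE",
    "INACTIVE",
    "FORTHCOMING",
    "CANCELLED",
    "POSTPONED_INDEFINITELY",
})


def migrate_status(status: str) -> str:
    """Map a deprecated work_status to INACTIVE; leave kept codes unchanged."""
    if status in DEPRECATED_STATUSES:
        return "INACTIVE"
    if status in KEPT_STATUSES:
        return status
    raise ValueError(f"unknown work_status: {status}")
```

Under the date constraints proposed above, a Work moved to INACTIVE would also need a withdrawn_date, which this sketch does not address.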

@brendan-oconnell
Contributor Author

@ja573 I agree with deprecating the codes you mention. The impact would be minimal; currently in Thoth, there are 2 Works that are WITHDRAWN_FROM_SALE, 13 that are OUT_OF_PRINT, and the rest of the codes have no Works associated with them.

I still wonder about requiring publication_date for FORTHCOMING works... I really don't know enough about how publishers use Thoth to know if this would pose a problem for them. I suppose it would always be possible to set a dummy publication_date, then the publisher hopefully updates it if necessary when they change the work_status to ACTIVE.

@rhigman
Member

rhigman commented Apr 25, 2024

Might interact with #585.

Not sure about publication_date being mandatory for FORTHCOMING. From the point of view of Thoth as a metadata management system i.e. somewhere where users can "draft" records for potential book projects from an early stage, FORTHCOMING is the most sensible status for such drafts. Forcing users to set a publication date as soon as they create a record would introduce friction, and increase the issues we already see with "fake" dates causing confusion.

(ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)

@brendan-oconnell
Contributor Author

One more piece of data:
Currently 178 FORTHCOMING Works in Thoth with no publication_date vs. 12 that have a publication_date

@rhigman
Member

rhigman commented Apr 25, 2024

> One more piece of data: Currently 178 FORTHCOMING Works in Thoth with no publication_date vs. 12 that have a publication_date

Good point - setting the default value in the migration would be a can of worms.

@ja573
Member

ja573 commented Apr 25, 2024

Yeah, I'm also divided about FORTHCOMING... The reason for proposing it was that technically, a date is required by ONIX, which is understood as the expected publication date. But yes – we should avoid dummy dates

@rhigman
Member

rhigman commented Apr 25, 2024

> Yeah, I'm also divided about FORTHCOMING... The reason for proposing it was that technically, a date is required by ONIX, which is understood as the expected publication date. But yes – we should avoid dummy dates

Yes, and it would definitely become an issue if we did start regularly disseminating ONIX records prior to publication. (Although, in that case, the individual exports should be preventing the creation of ONIX files where the platform requires a publication date and the record doesn't have it.)

@brendan-oconnell
Contributor Author

Do we currently disseminate ONIX records prior to publication, and if not, is that something that's important to Thoth users?

@brendan-oconnell
Contributor Author

> Might interact with #585.
>
> Not sure about publication_date being mandatory for FORTHCOMING. From the point of view of Thoth as a metadata management system i.e. somewhere where users can "draft" records for potential book projects from an early stage, FORTHCOMING is the most sensible status for such drafts. Forcing users to set a publication date as soon as they create a record would introduce friction, and increase the issues we already see with "fake" dates causing confusion.
>
> (ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)

early May? I'm out next week, but will continue working on this when I get back.

@ja573
Member

ja573 commented Apr 25, 2024

No, at least for now we'll only distribute post-publication. But (a) we might do so in the future (e.g. if we integrate platforms like Lightning Source) and (b) we don't know how people might use the records we output (we may not be pushing them, but people might be harvesting them).

@ja573
Member

ja573 commented Apr 25, 2024

I think at present the only ONIX output you can generate pre-publication is Thoth's – and since it's meant to be the full implementation of ONIX, we should enforce having a publication_date set for forthcoming books

@rhigman
Member

rhigman commented Apr 25, 2024

> > Might interact with #585.
> > Not sure about publication_date being mandatory for FORTHCOMING. From the point of view of Thoth as a metadata management system i.e. somewhere where users can "draft" records for potential book projects from an early stage, FORTHCOMING is the most sensible status for such drafts. Forcing users to set a publication date as soon as they create a record would introduce friction, and increase the issues we already see with "fake" dates causing confusion.
> > (ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)
>
> early May? I'm out next week, but will continue working on this when I get back.

Sorry, I don't follow...

@brendan-oconnell
Contributor Author

> > > Might interact with #585.
> > > Not sure about publication_date being mandatory for FORTHCOMING. From the point of view of Thoth as a metadata management system i.e. somewhere where users can "draft" records for potential book projects from an early stage, FORTHCOMING is the most sensible status for such drafts. Forcing users to set a publication date as soon as they create a record would introduce friction, and increase the issues we already see with "fake" dates causing confusion.
> > > (ETA @brendan-oconnell haha, snap - I spent too long composing my draft!)
> >
> > early May? I'm out next week, but will continue working on this when I get back.
>
> Sorry, I don't follow...

It seems like it's me who didn't follow what ETA meant in this context... I thought you meant "estimated time of arrival" :)

@rhigman
Member

rhigman commented Apr 25, 2024

> It seems like it's me who didn't follow what ETA meant in this context... I thought you meant "estimated time of arrival" :)

Ah, my fault! I was using it as "edited to add" - just to acknowledge that you made a very similar point in the time it took me to post mine 😄 - should have avoided that ambiguity!

@rhigman
Member

rhigman commented Apr 25, 2024

> I think at present the only ONIX output you can generate pre-publication is Thoth's – and since it's meant to be the full implementation of ONIX, we should enforce having a publication_date set for forthcoming books

Hmm, actually, the current implementation of onix::thoth is very permissive in terms of still letting you output something even if the record is incomplete. During development, I'd been thinking of it more as a way to get one's entire record "out" of Thoth in a familiar/standard format. (All of the high-level mandatory ONIX fields are already mandatory within Thoth, so that's not a concern, but identifying this kind of interaction between fields would have required a lot of close-reading.) Not that we can't change it.

In practice, I think you can output all the other ONIX flavours pre-publication except for Google Books and Overdrive - those are the only ones which explicitly mandate a publication date.
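
The idea that individual exports should refuse to produce a file when the target platform mandates a publication date could look roughly like the guard below. This is a hedged Python sketch, not the existing export code: the format identifiers `"google_books"` and `"overdrive"`, the `MissingPublicationDate` exception, and `ensure_exportable` are all hypothetical names chosen for illustration.

```python
from datetime import date
from typing import Optional

# Outputs named in this thread as explicitly mandating a publication date.
# The identifier strings are placeholders, not real specification IDs.
REQUIRES_PUBLICATION_DATE = frozenset({"google_books", "overdrive"})


class MissingPublicationDate(Exception):
    """Raised when an output that needs a publication date has none."""


def ensure_exportable(output_format: str, publication_date: Optional[date]) -> None:
    """Refuse to export only where the platform requires the missing date."""
    if output_format in REQUIRES_PUBLICATION_DATE and publication_date is None:
        raise MissingPublicationDate(
            f"{output_format} output requires a publication date"
        )
```

With this shape, a permissive output can still be generated for an incomplete record, while the stricter outputs fail early with a clear message.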

@rhigman
Member

rhigman commented Apr 25, 2024

> One more piece of data: Currently 178 FORTHCOMING Works in Thoth with no publication_date vs. 12 that have a publication_date

Odd - I make it 79 vs 21.

And 14 of those 21 dates are in the past!

@brendan-oconnell
Contributor Author

brendan-oconnell commented Apr 25, 2024

> > One more piece of data: Currently 178 FORTHCOMING Works in Thoth with no publication_date vs. 12 that have a publication_date
>
> Odd - I make it 79 vs 21.
>
> And 14 of those 21 dates are in the past!

My fault... I looked at my (out-of-date) development data dump, instead of the production database! It seems like the proportion of FORTHCOMING works with a publication_date vs. without one from my figures somewhat holds, though.

I also did notice a lot of Forthcoming dates in the past...

@brendan-oconnell
Contributor Author

OK, so to sum up, this kind of gets to a tension between Thoth-as-metadata-management system vs. -dissemination system.

As @rhigman notes above, publishers using Thoth as a metadata management system seem to want a kind of 'draft' record state, and they also seem to be currently using FORTHCOMING for this, as indicated by the relatively large number of FORTHCOMING works with no publication_date. If we create a catchall INACTIVE status as @ja573 has proposed, they could use this for 'drafts' of any kind, although the term "Inactive" has a different definition in ONIX Codes for Publishing Status: "The product was active, but is now permanently or indefinitely inactive in the sense that the publisher will not accept orders for it, though stock may still be available elsewhere in the supply chain." I'm not sure how important/well-known those ONIX Codes are to publishers?

So this would seem to be an argument for making publication_date optional for FORTHCOMING.

On the other hand, we want Thoth as a dissemination system to be able to disseminate successfully as much as possible, and not requiring FORTHCOMING works to have a publication_date would prevent some ONIX outputs, some of the time.

What's the best way to proceed with this decision? I have the least experience and domain-specific knowledge of anyone on this project, so I don't want to make the decision myself :) Do we need to discuss at a future Thoth meeting? In any case, it seems like this question of how publishers are creating 'drafts' in Thoth is worth digging into further...

@ja573
Member

ja573 commented Apr 25, 2024

Based on those five statuses, the ideal usage would be that books start as FORTHCOMING and then follow:

graph TD;
    FORTHCOMING -->|Postponed Indefinitely| POSTPONED_INDEFINITELY;
    POSTPONED_INDEFINITELY -.->|Resumed| FORTHCOMING;
    FORTHCOMING -->|Cancelled| CANCELLED;
    FORTHCOMING -->|Published| ACTIVE;
    ACTIVE -.->|Withdrawn| INACTIVE;

@ja573
Member

ja573 commented Apr 25, 2024

Then, if we agree on reducing the status to just those 5, we need to look at what constraints ONIX has between these statuses and other fields and implement them accordingly

@amandasramalho

My view on this topic: since 2020 at SciELO Books we have started working with books that will be released.
This means that the publication date is usually in the future and ONIX is sent in advance to Kobo, Amazon and Google so that the book is available as a ‘pre-release’. As a result, the book is listed in the catalogues, but the files are only released on the day specified as the publication date.
This also means that the entire set of metadata is prepared beforehand, but without the release date. The date is entered when it is set by the publisher and then the metadata is exported in ONIX.

@rupertgatti
Member

Publishers using Thoth as a metadata management tool will have statuses FORTHCOMING (publication date NOT known) and FORTHCOMING (publication date known), which are still not resolved in that flow @ja573, and so the basic issue remains! If a publication date is 'required' for FORTHCOMING then publishers will be forced to input a made-up date - and if ONIX is then successfully distributed, 'false' data is entered into various distribution systems (as well as Thoth) - in addition, it is unlikely that publishers will check if the inputted date has passed, again causing issues if distributed.
So - if publication date is enforced for FORTHCOMING, then I think we need a different name for a status where the publication date has not been determined. And if FORTHCOMING does not require a publication date, we need something which flags that the ONIX is not well formatted (as it is missing data) and/or prevents distribution of ONIX files to platforms that require a publication date.
Presumably we will need to have a flag/hold when trying to distribute a Forthcoming work with a past publication date in any case - so I guess I would prefer to add a check for existence of a publication date at the same point rather than create a new work status.

@ja573
Member

ja573 commented May 6, 2024

> Publishers using Thoth as a metadata management tool will have statuses FORTHCOMING (publication date NOT known) and FORTHCOMING (publication date known), which are still not resolved in that flow @ja573, and so the basic issue remains! If a publication date is 'required' for FORTHCOMING then publishers will be forced to input a made-up date - and if ONIX is then successfully distributed, 'false' data is entered into various distribution systems (as well as Thoth) - in addition, it is unlikely that publishers will check if the inputted date has passed, again causing issues if distributed. So - if publication date is enforced for FORTHCOMING, then I think we need a different name for a status where the publication date has not been determined. And if FORTHCOMING does not require a publication date, we need something which flags that the ONIX is not well formatted (as it is missing data) and/or prevents distribution of ONIX files to platforms that require a publication date. Presumably we will need to have a flag/hold when trying to distribute a Forthcoming work with a past publication date in any case - so I guess I would prefer to add a check for existence of a publication date at the same point rather than create a new work status.

The original idea was to require the publication date, but after the discussion it was clear that we should not be doing that, and should just leave it to the ONIX output to complain about it not being set.

Those who choose to enter a publication date for forthcoming titles (which is already possible) would need to check that the date is to some extent accurate, as we don't currently have any mechanisms to check the veracity of data that's input. But because this date is meant to be an estimate anyway, I don't think it'll be a problem if it's not completely accurate.

At some point we could write notifications to publishers informing them of forthcoming books with dates in the past, though
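
Finding the forthcoming books with dates in the past, as a first step towards such notifications, could be as simple as the filter below. This is an illustrative Python sketch only: the `Work` dataclass is a hypothetical stand-in for a Thoth record with just the fields needed here, and `overdue_forthcoming` is not an existing function.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Work:
    """Hypothetical, minimal stand-in for a Thoth Work record."""
    title: str
    work_status: str
    publication_date: Optional[date]


def overdue_forthcoming(works: Iterable[Work], today: date) -> List[Work]:
    """Return FORTHCOMING works whose publication_date has already passed."""
    return [
        work
        for work in works
        if work.work_status == "FORTHCOMING"
        and work.publication_date is not None
        and work.publication_date < today
    ]
```

Works with no publication_date are deliberately left out of the result, since a missing date is allowed for FORTHCOMING.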

@tosteiner
Member

Apologies, this may be slightly adjacent to the core discussion here - if I understood things correctly, we are also considering making our metadata Crossmark-compliant (see also #582)...
Now, with regard to updates to Work Status, Crossmark's categorisation of 12 different changes to a given Work Status might be relevant here as well (if we were to implement those): https://www.crossref.org/documentation/crossmark/participating-in-crossmark/#00279

@ja573
Member

ja573 commented May 10, 2024

graph TD;
    FORTHCOMING -->|Postponed Indefinitely| POSTPONED_INDEFINITELY;
    POSTPONED_INDEFINITELY -.->|Resumed| FORTHCOMING;
    FORTHCOMING -->|Cancelled| CANCELLED;
    FORTHCOMING -->|Published| ACTIVE;
    ACTIVE -.->|Require removal| WITHDRAWN;
    ACTIVE -.->|New edition| SUPERSEDED;
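
The flow in this diagram can also be read as a transition table. The Python sketch below is only one possible encoding, under the assumption that statuses with no outgoing arrow (CANCELLED, WITHDRAWN, SUPERSEDED) are terminal; `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and nothing in this thread says Thoth enforces these transitions.

```python
# Edges copied from the diagram; statuses without outgoing arrows are
# assumed to be terminal.
ALLOWED_TRANSITIONS = {
    "FORTHCOMING": {"POSTPONED_INDEFINITELY", "CANCELLED", "ACTIVE"},
    "POSTPONED_INDEFINITELY": {"FORTHCOMING"},
    "ACTIVE": {"WITHDRAWN", "SUPERSEDED"},
    "CANCELLED": set(),
    "WITHDRAWN": set(),
    "SUPERSEDED": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the diagram has an arrow from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

For example, `can_transition("FORTHCOMING", "ACTIVE")` is true, while `can_transition("ACTIVE", "FORTHCOMING")` is false because the diagram has no arrow back from ACTIVE.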

@brendan-oconnell changed the title from "Require publication date for Active Works in Thoth" to "Reduce list of work_status, require publication date for Active Works in Thoth" on May 28, 2024
@brendan-oconnell
Contributor Author

@ja573 Do you think publication_date should be required for WITHDRAWN and SUPERSEDED works? On the one hand, withdrawn_date will be required for these work_status, and we only need one date for Crossmark (the date of the update, whether it be a withdrawal, new edition, etc.). So for Crossmark purposes, it's not essential.

On the other hand, these are works that, according to the workflow you outline in your diagram, should have passed through an ACTIVE state and have been published at some point, which would mean they would need to have a publication_date when they're ACTIVE. This would support requiring publication_date, because it should (theoretically) always be present.

On the other, other hand though, if publishers are adding back catalog titles to Thoth, and they want to add works that have already been withdrawn or superseded in their catalog, perhaps they might not know the publication date... which would support not requiring it, to avoid them introducing false metadata into Thoth. And I know the general philosophy has been to keep required fields to a minimum.

What do you think?

@brendan-oconnell
Contributor Author

> @ja573 Do you think publication_date should be required for WITHDRAWN and SUPERSEDED works? On the one hand, withdrawn_date will be required for these work_status, and we only need one date for Crossmark (the date of the update, whether it be a withdrawal, new edition, etc.). So for Crossmark purposes, it's not essential.
>
> On the other hand, these are works that, according to the workflow you outline in your diagram, should have passed through an ACTIVE state and have been published at some point, which would mean they would need to have a publication_date when they're ACTIVE. This would support requiring publication_date, because it should (theoretically) always be present.
>
> On the other, other hand though, if publishers are adding back catalog titles to Thoth, and they want to add works that have already been withdrawn or superseded in their catalog, perhaps they might not know the publication date... which would support not requiring it, to avoid them introducing false metadata into Thoth. And I know the general philosophy has been to keep required fields to a minimum.
>
> What do you think?

This was discussed in a team meeting, and we decided to make publication_date required for WITHDRAWN and SUPERSEDED works

6 participants