Disable data use terms #3127

Open
chaoran-chen opened this issue Oct 30, 2024 · 9 comments · May be fixed by #3468
Assignees
Labels
backend related to the loculus backend component config Configuration related issues, i.e. helm processing deployment Code changes targetting the deployment infrastructure discussion Open questions feature Feature proposal v1.0 Tasks that are crucial for a Loculus 1.0 release website Tasks related to the web application

Comments

@chaoran-chen
Member

chaoran-chen commented Oct 30, 2024

It should be possible to disable data use terms entirely.

Acceptance criteria

  • There is a config setting to disable data use terms
  • If data use terms are disabled:
    • The selection on the submit page is removed
    • The Pathoplexus-specific acknowledgment section on the submit page is removed
    • The search field and table column on the browse page are removed
    • The section on the sequence details page is removed
    • The fields in LAPIS / SILO are removed
    • "By using our API you agree to our Data Use Terms." is removed from the API documentation page
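
A minimal sketch of how a single flag could drive the website-side items above. Everything named here is hypothetical: `InstanceConfig`, `dataUseTermsEnabled`, and the section identifiers are illustrative stand-ins, not the actual Loculus config schema or component names.

```typescript
// Hypothetical instance config; the real Loculus config keys may differ.
interface InstanceConfig {
    dataUseTermsEnabled: boolean;
}

// Illustrative identifiers for UI sections; 'submitFileUpload' stands for
// any section that is unrelated to data use terms.
type UiSection =
    | 'submitDataUseTermsSelector'
    | 'submitAcknowledgment'
    | 'browseDataUseTermsFilter'
    | 'browseDataUseTermsColumn'
    | 'detailsDataUseTermsSection'
    | 'apiDocsDataUseTermsNotice'
    | 'submitFileUpload';

// Sections that the acceptance criteria say should disappear when disabled.
const dataUseTermsSections: ReadonlySet<UiSection> = new Set<UiSection>([
    'submitDataUseTermsSelector',
    'submitAcknowledgment',
    'browseDataUseTermsFilter',
    'browseDataUseTermsColumn',
    'detailsDataUseTermsSection',
    'apiDocsDataUseTermsNotice',
]);

// Returns the sections to render for a given config.
function visibleSections(config: InstanceConfig, all: readonly UiSection[]): UiSection[] {
    if (config.dataUseTermsEnabled) {
        return [...all];
    }
    return all.filter((section) => !dataUseTermsSections.has(section));
}
```

Keeping the list of affected sections in one place is one way to avoid scattering the check across many files, though the real components may need individual handling.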
@chaoran-chen chaoran-chen added website Tasks related to the web application backend related to the loculus backend component deployment Code changes targetting the deployment infrastructure config Configuration related issues, i.e. helm processing feature Feature proposal v1.0 Tasks that are crucial for a Loculus 1.0 release labels Oct 30, 2024
@corneliusroemer
Contributor

Would be important to keep complexity of this low, changing as few things as possible, otherwise we could end up with lots of code paths.

@corneliusroemer
Contributor

To me this issue looks like it could end up taking days and increase complexity a lot as currently data use terms are hard coded, always there, in both website and backend.

@chaoran-chen What I'm missing is the reasoning for why data use terms should be disableable. What harm is done to GenSpectrum as is? Why is OPEN not sufficient?

Shouldn't all data either be open or restricted? Isn't this at the core of Loculus?

Data use terms are used in a lot of places in website and backend and making them entirely optional could require lots of changes in many files.

Website

For website, would we add type: DISABLED and handle that case explicitly?

export const dataUseTerms = z.union([
    restrictedDataUseTerms,
    z.object({
        type: z.literal(openDataUseTermsType),
    }),
]);

If we made it nullable we'd lose out on null checks.
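
To illustrate the explicit-variant option, here is a dependency-free sketch that uses plain TypeScript types instead of zod. The `DISABLED` variant, the string literals, the `restrictedUntil` field, and the `bannerText` helper are all assumptions made for illustration, not the existing Loculus types.

```typescript
// Hypothetical discriminated union with an explicit DISABLED variant.
type DataUseTerms =
    | { type: 'OPEN' }
    | { type: 'RESTRICTED'; restrictedUntil: string }
    | { type: 'DISABLED' };

// Example consumer: decides what banner text (if any) to show.
function bannerText(terms: DataUseTerms): string | null {
    switch (terms.type) {
        case 'OPEN':
            return 'Open data use terms';
        case 'RESTRICTED':
            return `Restricted until ${terms.restrictedUntil}`;
        case 'DISABLED':
            // Nothing to show when the feature is turned off.
            return null;
        default: {
            // Exhaustiveness check: this assignment fails to compile
            // if a new variant is added without a matching case.
            const unreachable: never = terms;
            return unreachable;
        }
    }
}
```

With an explicit variant, the compiler points at every `switch` that has not yet handled `DISABLED`, whereas a nullable value would be handled by generic null checks that carry no such meaning.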

Backend

Is it really necessary to allow disabling in backend? What purpose does that serve?

@emmahodcroft
Member

emmahodcroft commented Dec 2, 2024

From thinking about this over the weekend, I had the same question that Cornelius raised above: wouldn't having just one option, 'OPEN', be a solution? One could even write general DUT (is this configurable, i.e. where the DUT links go?) that just say things like 'the data is open, please use ethically and considerately' etc.

@theosanderson
Member

theosanderson commented Dec 2, 2024

I think in the long term of Loculus it should be possible to hide any UI related to data use terms - which I think will be irrelevant for many use cases. I don't feel strongly about when that needs to happen, and if we do want to prioritise it I would do it in easier ways which address the UI before investing work in the backend.

@chaoran-chen
Member Author

chaoran-chen commented Dec 2, 2024

The data use terms are terms that we have defined for Pathoplexus but they are not globally defined and also, unlike some licenses like GPL/CC-BY/etc., not widely known. I don't think that the current implementation with the "open" and "restricted" terms (and how they work) is really a useful feature for other databases. I remember that @theosanderson once suggested making them more configurable, i.e., allowing a maintainer to define their own set of terms with rules on how to transition between them: that would be nice but even more work than disabling them (and wouldn't replace the need for some instances to simply disable them).

In the case of GenSpectrum, I don't want to make any statement about the data use terms and I don't think it would be accurate. INSDC does not have "data use terms", so I don't think that we should claim/display that they have "open data use terms". This would get even more difficult if we download data from different sources (e.g., SC2 data from RKI or Canada's VirusSeq): it would be inappropriate to indicate that they are published under the same terms. Instead, I'd like to just have a "source" field to indicate where the data are from and link to their terms somewhere.

W.r.t. changing the backend: I don't know exactly how many code changes it would take, but it would be good to achieve the following:

  1. the columns are not in LAPIS/the downloaded dataset because, as argued, they are often at least useless, sometimes even inaccurate/inappropriate
  2. when submitting (both on the website and through the API), users should not be forced to choose a data use term and, if they do provide something, it should either result in an error (ideal) or at least not have weird side effects (like a banner showing up when opening the entry that doesn't make any sense)
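
Point 2 can be sketched as a small validation step. This is written in TypeScript for consistency with the snippet quoted earlier in the thread, regardless of the language the backend actually uses. `SubmissionRequest`, `dataUseTermsType`, and `validateDataUseTerms` are hypothetical names, and the sketch takes the "result in an error" option.

```typescript
// Hypothetical shape of a submission request; the real API parameters may differ.
interface SubmissionRequest {
    groupId: number;
    dataUseTermsType?: string;
}

type ValidationResult = { ok: true } | { ok: false; error: string };

// Checks the data use terms parameter against the instance setting.
function validateDataUseTerms(
    request: SubmissionRequest,
    dataUseTermsEnabled: boolean,
): ValidationResult {
    const provided = request.dataUseTermsType !== undefined;

    if (dataUseTermsEnabled && !provided) {
        // Feature on: a choice is still required.
        return { ok: false, error: 'A data use terms type is required.' };
    }
    if (!dataUseTermsEnabled && provided) {
        // Feature off: reject rather than silently store a value.
        return { ok: false, error: 'Data use terms are disabled on this instance.' };
    }
    return { ok: true };
}
```

Rejecting the parameter outright avoids the "weird side effects" case, because no data use terms value ever reaches storage while the feature is off.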

@emmahodcroft
Member

I fully agree that, long-term, allowing this to be disabled and fully customized would be important and helpful for others setting up their own versions, which will likely have DUT different from our own. I guess for me the question is more about prioritization now.

Do we have a way that DUT can be specified? (The document linked in the various places that link it?) That seems like it wouldn't be too bad to allow customization of, as a first step.

As a slightly more involved step can we allow users to rename 'OPEN' to be something else, like 'NONE' but behave the same? That might then be a 'bridging solution' that would help cover more use cases before we try to implement more comprehensive customization?

@theosanderson
Member

> Do we have a way that DUT can be specified? (The document linked in the various places that link it?) That seems like it wouldn't be too bad to allow customization of, as a first step.

For each of OPEN and CLOSED we can specify a URL atm

@chaoran-chen
Member Author

Yes, we can configure the link to a data use terms document which is great but doesn't change that there are many cases where we wouldn't want a data use terms column at all. For GenSpectrum, I don't think that it makes sense to write data use terms ourselves.

@fhennig fhennig added the discussion Open questions label Dec 4, 2024
@fhennig
Contributor

fhennig commented Dec 4, 2024

@corneliusroemer I'm curious how you arrive at multiple days of work and a large complexity increase. I think it's just a few if/else switches in the frontend and one such switch in the backend. I wouldn't touch the DB. I'd estimate it takes half a day to get it sort of done and then maybe allow for a day of docs/testing/leeway in case something was missed. I wouldn't consider a few "if enabled -> display component" type switches a large complexity increase.

I wouldn't add a new type of data use term. It's just about user exposed things (i.e. web UI, backend API). If the DUTs are gone from there, what's in the DB doesn't matter as it won't be exposed to the user.
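
As a rough sketch of that idea, the stored record can stay untouched while the data use terms fields are dropped at the point where data is exposed. The field names in `dataUseTermsFields` and the `exposeMetadata` helper are assumptions for illustration, not the actual Loculus column names or code.

```typescript
type Metadata = Record<string, string | null>;

// Hypothetical names of the data use terms fields in a stored record.
const dataUseTermsFields: readonly string[] = [
    'dataUseTerms',
    'dataUseTermsRestrictedUntil',
    'dataUseTermsUrl',
];

// Returns a copy of the record as it should be shown to users.
// The stored record itself is never modified.
function exposeMetadata(stored: Metadata, dataUseTermsEnabled: boolean): Metadata {
    const exposed: Metadata = {};
    for (const key of Object.keys(stored)) {
        const hidden = !dataUseTermsEnabled && dataUseTermsFields.indexOf(key) !== -1;
        const value = stored[key];
        if (!hidden && value !== undefined) {
            exposed[key] = value;
        }
    }
    return exposed;
}
```

Because the filter only runs on the way out, whatever default the DB holds (for example OPEN) has no user-visible effect while the setting is off.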

Maybe also relevant: I'd assume this setting is not expected to change, i.e. we don't support changing it after data was added. It might still work (all the functionality would appear, but then all the data would be OPEN by default or something).

I think in general it's good to discuss an implementation outline so we can have a better estimate of the complexity and effort of the task!

Re prio: Obv. can't say anything about that.

@fhennig fhennig self-assigned this Dec 17, 2024
@fhennig fhennig linked a pull request Dec 18, 2024 that will close this issue
2 tasks
5 participants