I want to be able to run my own Calkit cloud instance and have projects visible to other instances #190

Open
petebachant opened this issue Oct 31, 2024 · 0 comments
petebachant commented Oct 31, 2024

Imagine some lab wanted control over their own data, but wanted their projects/datasets to be searchable on other instances. This would require some sort of federation like Mastodon.

So if someone wanted to import a dataset from another instance, they could run something like

calkit import dataset https://calkit.somelab.gov/some-account/some-project/some-path.csv

If someone doesn't use a prefix, we assume it's https://calkit.io.
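
A minimal sketch of how that default could be applied when resolving an import argument. The function name `resolve_dataset_url` and the constant `DEFAULT_INSTANCE` are hypothetical and not part of the existing Calkit package; the only behavior taken from the proposal is "no prefix means https://calkit.io".

```python
from urllib.parse import urlsplit

# Assumption: the default instance named in the proposal above.
DEFAULT_INSTANCE = "https://calkit.io"


def resolve_dataset_url(ref: str, default_instance: str = DEFAULT_INSTANCE) -> str:
    """Return a full URL for a dataset reference (hypothetical helper).

    References that already carry an http(s) scheme and a host are returned
    unchanged; anything else is treated as a path on the default instance.
    """
    parts = urlsplit(ref)
    if parts.scheme in ("http", "https") and parts.netloc:
        return ref
    return default_instance.rstrip("/") + "/" + ref.lstrip("/")
```

With this sketch, `some-account/some-project/some-path.csv` would resolve against https://calkit.io, while a full URL on another instance would pass through untouched.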

If/when we support this, we can open source this software.

We would want the Python package to be able to manage different tokens for different servers, and projects should know which server they belong to. Should we disallow hosting the same project on multiple instances? Allowing it could make it unclear which one is the "true" project.
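
One way the per-server token handling could look, as a rough sketch only. The `TokenStore` class and its methods are hypothetical; how the real package stores credentials (config file, keyring, environment) is not specified here, so this version just keeps tokens in memory keyed by host.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


class TokenStore:
    """In-memory map from instance host to API token (hypothetical)."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    @staticmethod
    def _host(url: str) -> str:
        # Key on the lowercased host so any URL on an instance maps to it.
        return urlsplit(url).netloc.lower()

    def set_token(self, server_url: str, token: str) -> None:
        self._tokens[self._host(server_url)] = token

    def get_token(self, url: str) -> Optional[str]:
        # Returns None when no token is stored for the URL's instance.
        return self._tokens.get(self._host(url))
```

A project could then record its server URL once, and any request for that project would look up the matching token by host.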

One very simple way to enable this federation is to hard-code a list of instance domains. If another instance wants to join, they can submit a PR to add themselves to that list. Then, if someone wants to fetch a list of projects, they make a request to each of the instances and merge the results.

However, is there any way to have shared user accounts? Maybe that's not desirable. Users can join whatever instances they want separately. This means that in order to use GitHub authentication, each satellite will need to create its own GitHub app. However, if a user or web app makes a request to some other instance, they can send a token that the receiving instance verifies against the issuer. The user's email can be read from the verified token, which will allow them to be authorized on the other instance.
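
The hard-coded-list approach might be sketched like this. Everything here is an assumption for illustration: the `INSTANCES` list contents, the `list_all_projects` name, and the shape of a project record. The HTTP call is passed in as a `fetch` callable because no federated API endpoint has been defined yet.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: example entries only; real instances would be added by PR.
INSTANCES = ["https://calkit.io", "https://calkit.somelab.gov"]


def list_all_projects(
    fetch: Callable[[str], Iterable[Dict]],
    instances: Iterable[str] = INSTANCES,
) -> List[Dict]:
    """Query every known instance and merge the results (hypothetical).

    `fetch(base_url)` should return that instance's public projects.
    An instance that raises is skipped so one outage does not break search.
    """
    merged: List[Dict] = []
    for base in instances:
        try:
            projects = list(fetch(base))
        except Exception:
            continue  # Assumption: best-effort; unreachable instances are ignored.
        for project in projects:
            # Tag each record with its home instance so the source stays clear.
            merged.append({**project, "instance": base})
    return merged
```

Tagging each result with its instance also speaks to the "true project" question above, since every record carries the server it came from.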

Maybe we can also have aggregator servers that periodically fetch and cache public data from all of the satellites, so that actions like searching for datasets can be done with one request instead of many. The satellites will then need to notify the aggregators of any relevant events, e.g., public data being created or destroyed.
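
A rough sketch of the aggregator side, assuming satellites push "created" and "destroyed" events. The `AggregatorIndex` class, the event names, and the substring search are all hypothetical placeholders; a real aggregator would also need persistence, authentication of the sending satellite, and the periodic re-fetch mentioned above.

```python
from typing import Dict, List, Optional, Tuple


class AggregatorIndex:
    """In-memory cache of public items reported by satellites (hypothetical)."""

    def __init__(self) -> None:
        # Keyed by (instance URL, item path) so the same path on two
        # instances is tracked separately.
        self._items: Dict[Tuple[str, str], Dict] = {}

    def handle_event(
        self, instance: str, event: str, path: str, metadata: Optional[Dict] = None
    ) -> None:
        key = (instance, path)
        if event == "created":
            self._items[key] = metadata or {}
        elif event == "destroyed":
            self._items.pop(key, None)  # Ignore events for unknown items.
        else:
            raise ValueError(f"Unknown event type: {event}")

    def search(self, term: str) -> List[str]:
        """Return sorted full URLs whose path contains the term."""
        term = term.lower()
        return sorted(
            f"{instance}/{path}"
            for (instance, path) in self._items
            if term in path.lower()
        )
```

A client would then send a single search request to the aggregator instead of one per satellite.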

User stories

  • As an admin, I want to run my own Calkit instance and have its projects and datasets be searchable from any Calkit instance. This way, I can have control over my infrastructure and costs, while still allowing the research to be part of the overall web of knowledge and artifacts.
  • As a researcher, I want to be able to search for projects, datasets, publications, figures, and functions across all instances and make use of them in my own work. I also want others to be able to find my work without my needing to put it on the centralized server.

Things that can be unique about each instance

  • The domain
  • Cloud storage bucket
  • Subscription plans -- whether or not to charge
  • Whether or not it should make queries to the federated network
  • What is on the home page? If a lab set up their own instance, maybe they'd want to show something about their lab
  • Some title with the instance name, probably up in the nav bar
  • GitHub app
  • Zenodo app
  • Stripe app
@petebachant petebachant converted this from a draft issue Oct 31, 2024
@petebachant petebachant moved this from Backlog to In progress in Calkit Dec 1, 2024
@petebachant petebachant added this to the Open source milestone Dec 1, 2024
@petebachant petebachant moved this from In progress to Ready in Calkit Dec 17, 2024
@petebachant petebachant moved this from Ready to Backlog in Calkit Dec 17, 2024