Overall purpose/point... WiP #1

Open
yarikoptic opened this issue May 19, 2023 · 4 comments


@yarikoptic
Member

yarikoptic commented May 19, 2023

NB WiP -- yet to finish dumping ideas...

Somewhat in the spirit of automations like con/tinuous, datalad/git-annex, datalad-extensions testing, etc.

An automation to provide "tuned" forks of multiple git repos, possibly without offering "tune ups" back to original locations (since they might not want them). Use cases:

I see the following high-level configuration structure:

  • sources: list of original locations where to clone/fork from
    • organization, e.g. https://github.com/OpenNeuroDatasets (starting with github but later might be extended to gitlab)
    • repos-regex (optional), to what repos within organization to limit, .* by default
    • name-tuneup (optional, later): how to rename a fork in case names from multiple sources collide, e.g. "s,^,openneuro-,g" to add an openneuro- prefix
  • destination: e.g. https://github.com/OpenNeuroDatasets-NIDM (starting with github but who knows - might later want to support gitlab)
  • transformations: list of commands to run
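
The structure above could be modeled roughly as follows. This is only a sketch: the class and field names (`Source`, `Config`, `repos_regex`, `name_tuneup`) are hypothetical, and encoding the sed-style "s,^,openneuro-,g" as a (pattern, replacement) pair is an assumption, not a settled design.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Source:
    # Original location to fork from, e.g. "https://github.com/OpenNeuroDatasets"
    organization: str
    # Which repos within the organization to include; everything by default
    repos_regex: str = ".*"
    # Hypothetical encoding of a sed-like rename: (pattern, replacement)
    name_tuneup: Optional[Tuple[str, str]] = None

    def selects(self, repo: str) -> bool:
        """True if the repo name matches repos_regex (searched, not anchored)."""
        return re.search(self.repos_regex, repo) is not None

    def fork_name(self, repo: str) -> str:
        """Name the fork would get in the destination organization."""
        if self.name_tuneup is None:
            return repo
        pattern, replacement = self.name_tuneup
        return re.sub(pattern, replacement, repo)


@dataclass
class Config:
    sources: List[Source]
    destination: str
    # Commands to run on each fork, in order
    transformations: List[str] = field(default_factory=list)


config = Config(
    sources=[Source("https://github.com/OpenNeuroDatasets",
                    name_tuneup=("^", "openneuro-"))],
    destination="https://github.com/OpenNeuroDatasets-NIDM",
    transformations=["BIDS2NIDM"],
)
```

With this sketch, `config.sources[0].fork_name("ds000001")` gives `"openneuro-ds000001"`, matching the prefix example above.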

conjob init https://github.com/OpenNeuroDatasets https://github.com/OpenNeuroDatasets-NIDM/ BIDS2NIDM

which would

  • initiate OpenNeuroDatasets-NIDM organization
  • populate it with forks of all (default) repos from OpenNeuroDatasets
  • run BIDS2NIDM on the default branch
  • save the results in its own default branch
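
To make the intended sequence concrete, here is a dry-run sketch that only lists the steps described above without executing anything. `plan_init` and its parameters are hypothetical names; the actual `conjob` interface is not defined yet, and the repo list is passed in explicitly rather than discovered from GitHub.

```python
from typing import List


def plan_init(source_org: str, dest_org: str, transformation: str,
              repos: List[str]) -> List[str]:
    """Return a human-readable list of planned steps; nothing is executed."""
    source_org = source_org.rstrip("/")
    dest_org = dest_org.rstrip("/")
    steps = [f"initiate organization {dest_org}"]
    for repo in repos:
        steps.append(f"fork {source_org}/{repo} to {dest_org}/{repo}")
        steps.append(f"run {transformation} on the default branch of {dest_org}/{repo}")
        steps.append(f"save the results in the default branch of {dest_org}/{repo}")
    return steps


if __name__ == "__main__":
    for step in plan_init(
        "https://github.com/OpenNeuroDatasets",
        "https://github.com/OpenNeuroDatasets-NIDM/",
        "BIDS2NIDM",
        ["ds000001", "ds000002"],  # illustrative repo names
    ):
        print(step)
```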

Possible additional features:

  • initiation/update of PRs against original repos to "offer" changes introduced
  • a rerender command to update everything produced from templates

Aspects which come to mind

  • to scale up, we had better make every repo monitor its original location in its own CI, but that would also require a .github/workflows change in that repo, in some branch. That could be done by operating "out of branch" (e.g. some action running in a conjob/ branch while working on the main branch of the repo)
  • so it would b
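
One possible way for a fork's CI to notice upstream changes, sketched under assumptions: it compares the upstream HEAD commit reported by `git ls-remote <url> HEAD` against the last commit the fork has processed. The function names are hypothetical, and where the "last processed" commit would be stored is left open.

```python
import subprocess
from typing import Optional


def parse_ls_remote_head(output: str) -> Optional[str]:
    """Extract the HEAD commit from `git ls-remote <url> HEAD` output.

    Each output line has the form "<sha>\tHEAD".
    """
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref == "HEAD":
            return sha
    return None


def upstream_head(url: str) -> Optional[str]:
    """Ask the remote for its current HEAD commit (requires git and network)."""
    result = subprocess.run(
        ["git", "ls-remote", url, "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote_head(result.stdout)


def needs_update(upstream_sha: Optional[str], processed_sha: Optional[str]) -> bool:
    """True when upstream has a HEAD that differs from what was last processed."""
    return upstream_sha is not None and upstream_sha != processed_sha
```

A scheduled workflow in the fork could call `upstream_head(...)` and rerun the transformations only when `needs_update(...)` is true.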
@surchs

surchs commented May 19, 2023

This sounds very cool @yarikoptic. For Neurobagel, I think this would be the rough workflow for the first (retrospective) annotation for e.g. OpenNeuro:

  1. We create a participants.json (example, schema) for all (most) OpenNeuro datasets: Annotate the OpenNeuro datasets neurobagel/bulk_annotations#2
  2. We put this augmented participants.json file in a fork / branch of the corresponding OpenNeuro datalad dataset, replacing the previous participants.json (?)
  3. When that fork / branch gets updated (e.g. because of some clever bot watching the upstream dataset, or because we added the augmented .json),
  4. The most recent version of metadata can be searched at https://query.neurobagel.org/ (and would probably link back to the fork for datalad get purposes ?)

@Remi-Gau

Not sure if that's relevant, but the all-repos package may come in handy.
I have used it for the bids-apps organization maintenance; it is meant for when you want to perform the same operation on a whole bunch of repos.

https://github.com/bids-apps/maintenance-tools

I'd be curious to see if it plays well with datalad

@Remi-Gau

Remi-Gau commented May 21, 2023

possibly relevant: having "patch" datasets

bids-standard/bids-specification#814

if possible this would prevent having to create a sibling of each dataset we want to annotate

we would still want ways to make sure annotations are not obsolete and stay in sync with upstream
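
As one illustration of such a check (an assumption, not something proposed in the linked issue): record a digest of the upstream file at the time it was annotated, then flag the annotation as possibly obsolete once the upstream content no longer matches. The function names are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """SHA-256 hex digest of the file content."""
    return hashlib.sha256(data).hexdigest()


def annotation_is_stale(recorded_digest: str, current_upstream: bytes) -> bool:
    """True if upstream content changed since the annotation was made."""
    return content_digest(current_upstream) != recorded_digest


# Illustrative content only; real participants.json files will differ.
annotated_from = b'{"age": {"Description": "Age in years"}}'
recorded = content_digest(annotated_from)
```

Here `annotation_is_stale(recorded, annotated_from)` is false, and it becomes true as soon as the upstream bytes differ in any way.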

@yarikoptic
Member Author

in reply to @surchs above - 3 activities about annotations:

A prototype of a rough bash script that does everything for a given dataset is at https://github.com/OpenNeuroDatasets-JSONLD/.github/blob/main/code/prototype-neurobagel.sh and is now being run to populate that organization with adjusted forks. The body of the script is fairly generic; the only specific parts are the cloning of openneuro-annotations and the bottom, where we invoke update_json.
