
Filesets to swap #56
Closed · wants to merge 61 commits

Conversation

@will-moore (Member) commented Jun 13, 2023

Starting from the list of projects/screens in https://docs.google.com/spreadsheets/d/11Eg8JzY7dqRUMGVAjGIfTcGDs0BitJQJQ20BVoaiZOg/edit#gid=1516505897,
the get_filesets_to_swap.py script produced a list of 2062 Filesets (see idr_filesets.csv).

Following the workflow at IDR/omero-mkngff#2, we harvest the Fileset names and UUIDs from BioStudies, then run the parse_bia_uuids.py script from this PR (e.g. python parse_bia_uuids.py idr0051) to add the Fileset IDs, matched by name against the idr_filesets.csv table created above.

The output idr00XX.csv tables are included in this PR.
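The name-based join that parse_bia_uuids.py performs could be sketched with coreutils; the CSV column layouts below are assumptions for illustration, not the script's actual format:

```shell
# Assumed (hypothetical) layouts: bia_uuids.csv = "fileset_name,uuid"
# (harvested from BioStudies); idr_filesets.csv = "fileset_name,fileset_id"
# (from get_filesets_to_swap.py). join requires sorted input.
sort bia_uuids.csv > bia_sorted.csv
sort idr_filesets.csv > fs_sorted.csv
# Join on the shared Fileset name (field 1) -> fileset_name,uuid,fileset_id
join -t, bia_sorted.csv fs_sorted.csv > idr00XX.csv
```

Names present in only one of the two files are silently dropped by join, so a count check against idr_filesets.csv afterwards would catch unmatched Filesets.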

These tables are used as the input for omero mkngff sql, which generates the SQL outputs, also included in this PR.

We could move the idr00XX.csv files and the SQL scripts to a different repo. I don't know if/where we want to document the usage instructions/workflow above?

@will-moore (Member, Author) commented:

Validating the .sql files to check:

  • We aren't missing .zarray files (due to a temporary mkngff bug):
$ for f in ./*.sql; do echo "$f $(grep -c 'zarray' "$f")"; done
  • There is exactly one "UPDATE" statement in each:
$ for f in ./*.sql; do echo "$f $(grep -c 'UPDATE' "$f")"; done

Also stripped the unwanted "Directory" lines:
$ for f in ./*.sql; do grep -v "Directory" "$f" > temp.txt && mv temp.txt "$f"; done
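The two grep checks above can be combined into a single pass; this is a sketch that assumes the current directory contains only the generated .sql files:

```shell
# Report zarray and UPDATE counts per SQL file in one pass.
# Files with 0 zarray lines, or an UPDATE count other than 1,
# need a closer look before the SQL is applied.
for f in ./*.sql; do
  printf '%s zarray=%s UPDATE=%s\n' "$f" \
    "$(grep -c 'zarray' "$f")" "$(grep -c 'UPDATE' "$f")"
done
```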

@will-moore (Member, Author) commented Oct 16, 2023

To test the clientpath changes, we need an original (not yet edited) Fileset.
Testing on idr0138-pilot...

Checked out this PR and updated idr0090/4053140.sql with the SECRET.
Updated and ran setup.sql with IDR/omero-mkngff#12.
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@clientpath'
Then ran 4053140.sql:

[Screenshot 2023-10-16 at 22:10:37]

Also used that omero-mkngff PR to generate SQL and ran it:

(venv3) (base) [wmoore@pilot-idr0138-omeroreadwrite ~]$ omero mkngff sql 4053141 --clientpath=https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD852/f12bdada-57eb-4fab-90ef-9655e4106497/f12bdada-57eb-4fab-90ef-9655e4106497.zarr --secret=$SECRET /bia-integrator-data/S-BIAD852/f12bdada-57eb-4fab-90ef-9655e4106497/f12bdada-57eb-4fab-90ef-9655e4106497.zarr > 4053141.sql

and ran that SQL as above...

[Screenshot 2023-10-16 at 22:33:04]

@sbesson (Member) commented Oct 17, 2023

@will-moore requested review from joshmoore and sbesson 31 minutes ago

How is this meant to be reviewed? Should that involve a large round of testing, and thus the entire IDR team?

At the code level, the primary concern is that this PR adds thousands of files, so many that the GitHub Files UI cannot even cope. Are we certain this is not going to bloat this repository and make it unusable in the future? An alternative would be to migrate the generation script and all SQL files to a standalone, one-off upgrade repository.

@will-moore (Member, Author) commented:

To review, I'll list some things that need checking, and some ways to check them. I don't think it makes sense for others to run the SQL themselves; rather, check the result as applied to idr-testing...

  • Check that we haven't missed any .pattern data (or other data that won't be read by mainline Bio-Formats). All studies in the gdoc spreadsheet (see description above) are included in idr_filesets.py or have an idr00XX.csv file in this PR.
  • All the other scripts build on each other, resulting in new NGFF Filesets on idr-testing. So it makes sense to test that images there are viewable, and to check the Fileset files in webclient: the "Imported from" links should be valid URLs (not 404s).
  • Good to discuss where to put the csv and sql files. I also wonder whether we'll follow the mkngff workflow for future NGFF submissions to IDR, in which case we might have more of these in the future.
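For the "Imported from" link check, the candidate URLs can be pulled out of the generated SQL for spot-checking; the grep pattern below is an assumption about how clientpaths appear in the files (this PR's examples point at uk1s3.embassy.ebi.ac.uk):

```shell
# Extract the unique clientpath URLs embedded in the SQL files
# (-h: no filename prefix, -o: print only the matched URL).
grep -ho "https://uk1s3[^' ]*" ./*.sql | sort -u
```

Each extracted URL could then be checked with e.g. curl -I to confirm it does not return a 404.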

@sbesson (Member) commented Oct 17, 2023

Starting from a clean checkout

(base) sbesson@Sebastiens-MacBook-Pro-3 idr-utils % du -csh .
1.1M	.
1.1M	total
(base) sbesson@Sebastiens-MacBook-Pro-3 idr-utils % git fetch origin pull/56/merge
remote: Enumerating objects: 4333, done.
remote: Counting objects: 100% (41/41), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 4333 (delta 39), reused 28 (delta 28), pack-reused 4292
Receiving objects: 100% (4333/4333), 102.44 MiB | 4.50 MiB/s, done.
Resolving deltas: 100% (3002/3002), done.
From https://github.com/IDR/idr-utils
 * branch            refs/pull/56/merge -> FETCH_HEAD
(base) sbesson@Sebastiens-MacBook-Pro-3 idr-utils % du -csh .                     
113M	.
113M	total

Even if the size is still relatively small, a 100x size increase to commit SQL scripts expected to be used once somewhat defeats the aim of this utility repository, from my perspective. My vote would be to either use a separate repository or split each file into the appropriate study repository.

@will-moore (Member, Author) commented:

OK, I think I'll move each idr00XX.csv and the SQL scripts to the different study repos once we're done using them (and they're no longer changing). But while they're still being used and potentially edited, I'll keep working with this PR for convenience.
It's also much easier to check them out into one place in one go for running them on idr-next etc. while they're still in one branch.

@will-moore (Member, Author) commented:

@francesw - could you create a repo named mkngff_upgrade_scripts where I can push everything from this PR?
I think everything here is very much a one-off usage and doesn't need to be added to idr-utils for any future use.
Thanks.

@francesw (Member) commented Dec 7, 2023

@will-moore (Member, Author) commented:

Moved all contents from this PR to https://github.com/IDR/mkngff_upgrade_scripts. Closing...
