idr0091-julou-lacinduction S-BIAD852 #650

Open
will-moore opened this issue Feb 22, 2023 · 44 comments

Comments

@will-moore
Member

idr0091-julou-lacinduction

@will-moore will-moore moved this to test convert in NGFF conversion Feb 22, 2023
@dominikl
Member

Issue with conversion:

(base) [dlindner@pilot-zarr2-dev idr0091]$ time /home/dlindner/bioformats2raw/bin/bioformats2raw --memo-directory ../memo /uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/20170919_glyc_lac_1_MMStack_metadata.txt 20170919_glyc_lac_1_MMStack.ome.zarr
OpenJDK 64-Bit Server VM warning: You have loaded library /tmp/opencv_openpnp4590289654988610984/nu/pattern/opencv/linux/x86_64/libopencv_java342.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2023-02-22 13:49:12,806 [main] ERROR loci.formats.Memoizer - deleting invalid memo file: ../memo/uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/.20170919_glyc_lac_1_MMStack_metadata.txt.bfmemo
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at ome.xml.model.Annotation.<init>(Annotation.java:123)
        at ome.xml.model.TextAnnotation.<init>(TextAnnotation.java:91)
        at ome.xml.model.XMLAnnotation.<init>(XMLAnnotation.java:97)

I even tried with export BF_MAX_MEM=56G, but the process never got over 20G of memory usage before crashing.

@sbesson
Member

sbesson commented Feb 22, 2023

Pretty sure that BF_MAX_MEM is specific to the Bio-Formats command-line utilities and will not be recognized by bioformats2raw. Have you tried JAVA_OPTS="-Xmx<NN>G" ?

@dominikl
Member

👍 It finally worked with export JAVA_OPTS="-Xmx50G" !
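
For anyone scripting this conversion, the working setup could be sketched in Python roughly as below. It assumes `bioformats2raw` is on the PATH and that its launcher picks up `JAVA_OPTS`, as it did here; the function names and the default heap size are only illustrative.

```python
import os
import subprocess


def bioformats2raw_command(source, target, max_heap="50G", memo_dir=None):
    """Build the argv and environment for a bioformats2raw run.

    The heap size goes in JAVA_OPTS, because BF_MAX_MEM is only read by the
    Bio-Formats command-line tools (showinf, bfconvert, ...).
    """
    env = dict(os.environ)
    env["JAVA_OPTS"] = f"-Xmx{max_heap}"
    cmd = ["bioformats2raw"]
    if memo_dir is not None:
        cmd += ["--memo-directory", memo_dir]
    cmd += [source, target]
    return cmd, env


def convert(source, target, **kwargs):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    cmd, env = bioformats2raw_command(source, target, **kwargs)
    subprocess.run(cmd, env=env, check=True)
```

Keeping the command construction separate from the call makes it easy to print or log exactly what will be run before committing to a long conversion.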

@dominikl dominikl moved this from test convert to re-import test image in NGFF conversion Feb 27, 2023
@sbesson
Member

sbesson commented Feb 27, 2023

50G definitely feels excessive. I recall some improvements in the past that targeted similar issues with large Micro-Manager metadata files. One thing possibly worth testing independently is whether bioformats2raw 0.6.0 would handle the same data with lower memory requirements /cc @melissalinkert

Semi-related, I would expect this particular file format to work without issues with OMERO 5.6.6. What is our policy for these types of submissions of mixed file formats (probably only a handful of them)? Are we converting everything or only the minimal amount of data? /cc @jburel

@dominikl
Member

Oh, I should test a different image then. Didn't notice that this submission had different file formats.

@melissalinkert

ome/bioformats#3229 is the last time we addressed memory issues in Micro-Manager, so I'd be surprised if bioformats2raw 0.6.0 helps. Based on the partial stack trace, I'd guess it's original metadata annotations that are causing the problem.

Comparing memory usage for showinf -nopix -omexml /uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/20170919_glyc_lac_1_MMStack_metadata.txt and showinf -nopix -omexml -no-sas /uod/idr/filesets/idr0091-julou-lacinduction/20200622-ftp/Julou_2020_lacInduction_RawImages/20170919/20170919_glyc_lac_1/20170919_glyc_lac_1_MMStack_metadata.txt should confirm whether that is indeed the issue.
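
A rough way to make that comparison repeatable is sketched below. It assumes a POSIX system, Python 3.9+ and `showinf` on the PATH; `peak_rss` and `compare_no_sas` are hypothetical helper names, and the peak resident set size is only a proxy for what the JVM heap is doing.

```python
import os
import subprocess


def peak_rss(cmd):
    """Run cmd to completion and return (exit_code, peak resident set size).

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # wait4 reaps this specific child and returns its own resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    proc.returncode = os.waitstatus_to_exitcode(status)
    return proc.returncode, usage.ru_maxrss


def compare_no_sas(path):
    """Measure showinf with and without -no-sas for a single file."""
    base = ["showinf", "-nopix", "-omexml"]
    return {
        "with original metadata": peak_rss(base + [path]),
        "-no-sas": peak_rss(base + ["-no-sas", path]),
    }
```

A large gap between the two numbers would support the original-metadata theory; similar numbers would point elsewhere.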

@dominikl
Member

dominikl commented Mar 7, 2023

Also converted one of the pattern files and re-imported it, which worked fine. But the converted MMStack can't be re-imported, again because of a memory issue:

2023-03-07 11:54:22,437 17151      [      main] ERROR     ome.formats.importer.cli.ErrorHandler - FILE_EXCEPTION: /data/ngff/idr0091/20170920_glyc_lac_6h_1_MMStack.ome.zarr/OME/METADATA.ome.xml
java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

@dominikl dominikl added the bug label Mar 7, 2023
@will-moore
Member Author

@dominikl - are you able to try the -no-sas option suggested by @melissalinkert above (with showinf) and see if that affects memory usage?

@melissalinkert If that is the case, does it suggest a workaround for bioformats2raw or is a fix still a much bigger issue?

A possible option is to use omero-cli-zarr to export since it's only 342 Images (according to IDR/idr-utils#56)

@melissalinkert

bioformats2raw does not have a direct equivalent to bfconvert's -no-sas. The closest workaround at the moment is bioformats2raw --no-ome-meta-export, which entirely prevents OME/METADATA.ome.xml from being written; that's likely not what you want. I'm not opposed to adding an equivalent to -no-sas in bioformats2raw, but would like to know if that actually would solve the problem first.

@will-moore
Member Author

Going to start exporting with omero-cli-zarr since I can also do this on the idr-ftp machine which doesn't have the raw data mounted...

$ ssh -A idr-ftp.openmicroscopy.org
$ conda create -n omero_zarr_export -c ome python=3.9 zeroc-ice36-python
$ conda activate omero_zarr_export
$ conda install -c conda-forge omero-py
$ pip install git+https://github.com/will-moore/omero-cli-zarr.git@name_option
...
omero-cli-zarr-0.1.dev452+ge882a62

cd /data/ngff/
mkdir idr0091 && cd idr0091

Export 100 images

omero login
for id in 10648046 10648047 10648048 10648049 10648050 10648051 10648052 10648053 10648054 10648055 10648056 10648057 10648058 10648059 10648060 10648061 10648062 10648063 10648064 10648065 10648066 10648067 10648068 10648069 10648070 10648071 10648072 10648073 10648074 10648075 10648076 10648077 10648078 10648079 10648080 10648081 10648082 10648083 10648084 10648085 10648086 10648087 10648088 10648089 10648090 10648091 10648092 10648093 10648094 10648095 10648096 10648097 10648098 10648099 10648100 10648101 10648102 10648103 10648104 10648317 10648318 10648319 10648320 10648321 10648322 10648323 10648324 10648325 10648326 10648327 10648328 10648329 10648330 10648331 10648332 10648333 10648334 10648335 10648336 10648337 10648338 10648339 10648340 10648341 10648342 10648343 10648344 10648345 10648346 10648347 10648196 10648197 10648198 10648199 10648200 10648201 10648202 10648203 10648204 10648205; do
  echo $id;
  omero zarr export Image:$id --name_by name;
done

@will-moore
Member Author

After about 17 hours we have 50 images... (about 3 an hour):

(base) [wmoore@idrftp-ftp ~]$ ls -alh /data/ngff/idr0091
...
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 21:50 20151218_switch8h_pos2_GL02.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 22:16 20151218_switch8h_pos2_GL04.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 22:44 20151218_switch8h_pos2_GL05.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 23:07 20151218_switch8h_pos5_GL03.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 23:32 20151218_switch8h_pos5_GL05.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 11 23:52 20151218_switch8h_pos5_GL06.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 12 00:18 20151218_switch8h_pos5_GL08.pattern.ome.zarr
drwxrwxr-x.  3 wmoore wmoore   42 Jul 12 00:18 20151218_switch8h_pos5_GL09.pattern.ome.zarr

@will-moore
Member Author

will-moore commented Jul 12, 2023

Moved 51 zarrs to batch1 and renamed image.pattern.ome.zarr to image.ome.zarr...

(base) [wmoore@idrftp-ftp batch1]$  for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done

# zip with -m (move: each directory is deleted once it has been added to its archive)
(base) [wmoore@idrftp-ftp batch1]$ for i in */; do zip -mr "${i%/}.zip" "$i"; done
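
The rename step above relies on `ls` output and `sed`. A Python equivalent, sketched here with hypothetical helper names, only touches names that really end in `.pattern.ome.zarr` and copes with spaces in names:

```python
from pathlib import Path

SUFFIX_OLD = ".pattern.ome.zarr"
SUFFIX_NEW = ".ome.zarr"


def strip_pattern(name):
    """Return name with a trailing .pattern.ome.zarr replaced by .ome.zarr."""
    if name.endswith(SUFFIX_OLD):
        return name[: -len(SUFFIX_OLD)] + SUFFIX_NEW
    return name


def rename_all(directory):
    """Rename every *.pattern.ome.zarr entry in place; return the new paths."""
    renamed = []
    for path in sorted(Path(directory).glob("*" + SUFFIX_OLD)):
        # Path.rename returns the new path on Python 3.8+.
        renamed.append(path.rename(path.with_name(strip_pattern(path.name))))
    return renamed
```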

@will-moore
Member Author

Created s3 bucket for testing...

$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3 mb s3://idr0091
make_bucket: idr0091
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-policy --bucket idr0091 --policy file://policy.json
$ aws --endpoint-url https://uk1s3.embassy.ebi.ac.uk s3api put-bucket-cors --bucket idr0091 --cors-configuration file://cors.json
$ ./mc cp -r /data/ngff/idr0091/20151218_switch8h_pos6_GL01.pattern.ome.zarr uk1s3/idr0091/zarr
...pattern.ome.zarr/3/99/2/0/0: 574.64 MiB / 574.64 MiB ━━━━━━━━━━━━━━━━━━ 38.54 MiB/s 14s

Looks good: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0091/zarr/20151218_switch8h_pos6_GL01.pattern.ome.zarr

Screenshot 2023-07-12 at 04 40 15

@will-moore
Member Author

will-moore commented Jul 12, 2023

Zipping of 51 images in batch1 above only took an hour.

Upload to BioStudies...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091/batch1/idr0091 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxx
...
20151218_switch8h_pos5_GL13.ome.zarr.zip                       100%  433MB  487Mb/s    06:04    
20151218_switch8h_pos5_GL14.ome.zarr.zip                       100%  433MB  323Mb/s    06:12    
Completed: 22054051K bytes transferred in 372 seconds
 (484481K bits/sec), in 51 files, 1 directory.

# deleted
$ rm -rf batch1/

@will-moore will-moore moved this from re-import test image to convert all data to NGFF in NGFF conversion Jul 13, 2023
@will-moore
Member Author

Other 49 images from batch 1 completed...
Zipping..

Also starting to export ALL the remaining images...

for id in 10648206 10648207 10648208 10648209 10648210 10648211 10648212 10648213 10648214 10648215 10648216 10648217 10648218 10648219 10648220 10648221 10648222 10648223 10648224 10648225 10648226 10648227 10648228 10648229 10648230 10648231 10648232 10648233 10648234 10648235 10648236 10648237 10648238 10648239 10648240 10648241 10648242 10648243 10648244 10648245 10648246 10648247 10648248 10648249 10648250 10648251 10648252 10648253 10648254 10648255 10648256 10648257 10648258 10648259 10648260 10648261 10648262 10648263 10648264 10648265 10648266 10648267 10648268 10648269 10648270 10648271 10648272 10648273 10648274 10648275 10648276 10648277 10648278 10648279 10648280 10648281 10648282 10648283 10648284 10648285 10648286 10648287 10648288 10648289 10648290 10648291 10648292 10648293 10648294 10648295 10648296 10648297 10648298 10648299 10648300 10648301 10648302 10648303 10648304 10648305 10648306 10648307 10648348 10648349 10648350 10648351 10648352 10648353 10648354 10648355 10648356 10648357 10648358 10648359 10648360 10648361 10648362 10648363 10648364 10648365 10648366 10648367 10648368 10648369 10648370 10648371 10648372 10648373 10648374 10648375 10648376 10648377 10648378 10648379 10648380 10648381 10648382 10648383 10648384 10648385 10648386 10648387 10648388 10648389 10648390 10648391 10648392 10648393 10648394 10648395 10648396 10648397 10648398 10648399 10648400 10648401 10648402 10648403 10648404 10648405 10648406 10648407 10648408 10648409 10648410 10648411 10648412 10648413 10648414 10648699 10648700 10648701 10648702 10648703 10648704 10648705 10648706 10648707 10648708 10648709 10648710 10648711 10648712 10648713 10648714 10648715 10648716 10648717 10648718 10648719 10648720 10648721 10648722 10648723 10648724 10648725 10648726 10648727 10648728 10648729 10648730 10648731 10648732 10648733 10648734 10648735 10648736 10648737 10648738 10648739 10648740 10648741 10648742 10648743 10648744 10648745 10648746 10648747 10648748 10648749 10648750 
10648751 10648752 10648753 10648754 10648755 10648756 10648757 10648758 10648759 10648760 10648761 10648762 10648763 10648764 10648765 10648766 10648767 10648768 10648769 10648770 10648771; do
  omero zarr export Image:$id --name_by name;
done

@will-moore
Member Author

Looks like the last 2 images here (batch1) didn't export properly - too small:

(base) [wmoore@idrftp-ftp idr0091]$ ls -alh
...
-rw-rw-r--. 1 wmoore wmoore 436M Jul 13 03:49 20160912_Pos0_GL11.pattern.ome.zarr.zip
-rw-rw-r--. 1 wmoore wmoore 437M Jul 13 03:49 20160912_Pos0_GL12.pattern.ome.zarr.zip
drwxrwxr-x. 3 wmoore wmoore   42 Jul 13 00:49 20160912_Pos0_GL14.pattern.ome.zarr
-rw-rw-r--. 1 wmoore wmoore 2.8M Jul 13 05:13 20160912_Pos0_GL14.pattern.ome.zarr.zip
drwxrwxr-x. 3 wmoore wmoore   42 Jul 13 00:49 20160912_Pos0_GL15.pattern.ome.zarr
-rw-rw-r--. 1 wmoore wmoore 427K Jul 13 05:14 20160912_Pos0_GL15.pattern.ome.zarr.zip

Deleted them.
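
Spotting these by eye works for a handful of files. For a larger batch, a size check along these lines could flag candidates automatically. This is a sketch only: `undersized` is a hypothetical helper and the 10% cutoff is an arbitrary starting point, not something used in this thread.

```python
from statistics import median


def undersized(sizes, fraction=0.1):
    """Return the names whose size is below `fraction` of the median size.

    sizes maps a file name to its size in bytes.
    """
    if not sizes:
        return []
    cutoff = fraction * median(sizes.values())
    return sorted(name for name, size in sizes.items() if size < cutoff)
```

Anything flagged is only a candidate for re-export, since an image with fewer planes is legitimately smaller than its neighbours.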

Rename 49 others (remove .pattern) and zip..

for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done
(base) [wmoore@idrftp-ftp idr0091]$ ls
20151218_switch8h_pos6_GL01.ome.zarr  20160526_pos0_GL12.ome.zarr  20160526_pos0_GL26.ome.zarr  20160526_pos4_GL20.ome.zarr  20160912_Pos0_GL02.ome.zarr
20151218_switch8h_pos6_GL03.ome.zarr  20160526_pos0_GL13.ome.zarr  20160526_pos4_GL01.ome.zarr  20160526_pos4_GL21.ome.zarr  20160912_Pos0_GL03.ome.zarr
20151218_switch8h_pos6_GL04.ome.zarr  20160526_pos0_GL16.ome.zarr  20160526_pos4_GL03.ome.zarr  20160526_pos4_GL24.ome.zarr  20160912_Pos0_GL04.ome.zarr
20151218_switch8h_pos6_GL05.ome.zarr  20160526_pos0_GL17.ome.zarr  20160526_pos4_GL06.ome.zarr  20160526_pos4_GL25.ome.zarr  20160912_Pos0_GL05.ome.zarr
20151218_switch8h_pos6_GL06.ome.zarr  20160526_pos0_GL18.ome.zarr  20160526_pos4_GL09.ome.zarr  20160526_pos4_GL27.ome.zarr  20160912_Pos0_GL06.ome.zarr
20151218_switch8h_pos6_GL07.ome.zarr  20160526_pos0_GL19.ome.zarr  20160526_pos4_GL10.ome.zarr  20160526_pos5_GL03.ome.zarr  20160912_Pos0_GL07.ome.zarr
20151218_switch8h_pos6_GL09.ome.zarr  20160526_pos0_GL21.ome.zarr  20160526_pos4_GL11.ome.zarr  20160526_pos5_GL09.ome.zarr  20160912_Pos0_GL10.ome.zarr
20151218_switch8h_pos6_GL10.ome.zarr  20160526_pos0_GL22.ome.zarr  20160526_pos4_GL12.ome.zarr  20160526_pos5_GL12.ome.zarr  20160912_Pos0_GL11.ome.zarr
20160526_pos0_GL01.ome.zarr           20160526_pos0_GL23.ome.zarr  20160526_pos4_GL17.ome.zarr  20160526_pos5_GL13.ome.zarr  20160912_Pos0_GL12.ome.zarr
20160526_pos0_GL05.ome.zarr           20160526_pos0_GL24.ome.zarr  20160526_pos4_GL19.ome.zarr  20160912_Pos0_GL01.ome.zarr

@will-moore
Member Author

will-moore commented Jul 13, 2023

Upload the 2nd lot of 49 images from batch1...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/xxxxxx
...
20160912_Pos0_GL03.ome.zarr.zip                               100%  437MB  309Mb/s    07:53    
20160912_Pos0_GL10.ome.zarr.zip                                100%  436MB  171Mb/s    08:05    
Completed: 19024387K bytes transferred in 485 seconds
 (320849K bits/sec), in 49 files, 1 directory.

@will-moore
Member Author

Current progress....

Exported 127 of 342 Images.

(342 - 127) / 3 = 72 hours.

First batch of 100 images (2 failed and need re-exporting).
2nd batch of 242 images is running on idr-ftp server, into:

(base) [wmoore@idrftp-ftp ngff]$ ls -alh /data/ngff/idr0091_batch2/
total 4.0K
drwxrwxr-x. 29 wmoore wmoore 4.0K Jul 13 09:10 .
drwxr-xr-x.  9 wmoore root    208 Jul 13 00:52 ..
drwxrwxr-x.  6 wmoore wmoore  100 Jul 13 01:11 20160912_Pos0_GL14.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 13 01:29 20160912_Pos0_GL15.pattern.ome.zarr
drwxrwxr-x.  6 wmoore wmoore  100 Jul 13 01:48 20160912_Pos0_GL16.pattern.ome.zarr
...

...and this should complete in 3 days.
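
The estimate is just the remaining image count divided by the observed rate of about 3 images an hour, assuming that rate stays constant:

```python
def hours_remaining(total, done, per_hour):
    """Hours left at a constant export rate."""
    return (total - done) / per_hour


hours = hours_remaining(342, 127, 3)  # 215 images left at 3 per hour
print(round(hours), "hours, or about", round(hours / 24), "days")
```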

@will-moore
Member Author

Looks like all remaining zarrs exported OK...

$ ls /data/ngff/idr0091_batch2/ | wc
    242     242    9327

rename to remove .pattern and zip...

$ screen -r idr0091_zip
$ cd /data/ngff/idr0091_batch2/
$ for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done
$ for i in */; do zip -mr "${i%/}.zip" "$i"; done

@will-moore will-moore moved this from convert all data to NGFF to Zip and upload to BioStudies in NGFF conversion Aug 7, 2023
@will-moore
Member Author

will-moore commented Aug 15, 2023

Started uploading 242 zips...

$ screen -r idr0091_aspera
$ sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091_batch2/idr0091 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/****

@will-moore
Member Author

Checked size of zips on BioStudies. 20160912_Pos4_GL06.ome.zarr.zip is smaller than the others, as this image has only a single timepoint: https://idr.openmicroscopy.org/webclient/?show=image-10648217

Use JS to list files from submissions page:

let names = [];
[].forEach.call(document.querySelectorAll("div [role='row'] .ag-cell[col-id='name']"), function(div) {
  names.push(div.innerHTML.trim());
});
console.log(names.join("\nidr0091/"));
console.log(names.length);

@will-moore will-moore moved this from Zip and upload to BioStudies to BioStudies Submission in NGFF conversion Aug 16, 2023
@will-moore will-moore removed the bug label Aug 17, 2023
@will-moore
Member Author

Looks like the pixels table hasn't been updated for this image:

idr=> select path, name from pixels where image = 10648757;
                                       path                                       |            name            
----------------------------------------------------------------------------------+----------------------------
 demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837/20200817-pattern | 20161207_Pos1_GL06.pattern

@will-moore
Member Author

The sql doesn't contain OME/METADATA.ome.xml...

4053851.sql

begin;
    select mkngff_fileset(
      4053851,
      '22c41bb8-36e5-4386-9825-179b180d8238',
      'cdf35825-def1-4580-8d0b-9c349b8f78d6',
      'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/',
      array[
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '.zattrs', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '.zgroup', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '0', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/0/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '1', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/1/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '2', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/2/', '.zarray', 'application/octet-stream'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/', '3', 'Directory'],
          ['demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/3/', '.zarray', 'application/octet-stream']
      ]::text[][]
    );
commit;

@will-moore
Member Author

will-moore commented Aug 30, 2023

@joshmoore I see from https://github.com/IDR/omero-mkngff/blob/4c1e32bb32a7b92f427634630e6b552cbb186509/src/omero_mkngff/__init__.py#L108 that mkngff expects to find a METADATA.ome.xml with which to update the pixels table, but in the case of omero-cli-zarr-exported NGFF data, we don't have METADATA.ome.xml, so the pixels table won't get updated, leading to the errors above.

We'll need to pick another file to update the pixels table with.

I'll open an issue on the repo: IDR/omero-mkngff#7
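
One way to choose the replacement file is to take the `.zattrs` at the root of the Zarr from the same rows that are passed to `mkngff_fileset`. The sketch below only illustrates that idea and is not the code in omero-mkngff; `pick_pixels_target` is a hypothetical helper.

```python
def pick_pixels_target(rows):
    """Pick the (path, name) pair to store in the pixels table.

    rows are [path, name, mimetype] triples as in the generated SQL.
    The root .zattrs is the one with the shortest path.
    """
    candidates = [(path, name) for path, name, _ in rows if name == ".zattrs"]
    if not candidates:
        raise ValueError("no .zattrs found in fileset rows")
    path, name = min(candidates, key=lambda c: len(c[0]))
    # Drop the trailing slash so the path names the .zarr directory itself.
    return path.rstrip("/"), name
```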

@will-moore
Member Author

Running this sql fixes the image

UPDATE pixels SET name = '.zattrs', path = 'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr' where image in (select id from Image where fileset = 5287497);

http://localhost:1080/webclient/?show=image-10648757

Screenshot 2023-08-30 at 17 32 31

@will-moore
Member Author

Actually, it seems that Bio-Formats is not fussy about which file is referenced in the pixels table.
After this, the image is still viewable...

idr=> UPDATE pixels SET name = '.zarray', path = 'demo_2/Blitz-0-Ice.ThreadPool.Server-16/2020-10/03/18-15-40.837_mkngff/de82a935-3143-4ce3-9439-9ab986237b09.zarr/3' where image in (select id from Image where fileset = 5287497);

@will-moore will-moore moved this from Data on Embassy s3 to create new Filesets in idr-next in NGFF conversion Aug 31, 2023
@will-moore
Member Author

We now have all 342 Filesets available at https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD852.html

Let's use the next batch (not the first 11 above) for testing IDR/omero-mkngff#8
Testing on idr0138-pilot this time...

Update to branch

conda activate mkngff
pip uninstall omero-mkngff
pip install 'omero-mkngff @ git+https://github.com/will-moore/omero-mkngff@always_update_pixels'
idr0091/20160912_Pos8_GL21.ome.zarr,S-BIAD852/03e12e59-d0cd-456a-99fa-c55dba56b029,4053336
idr0091/20160526_pos5_GL03.ome.zarr,S-BIAD852/040d0262-cf47-4ddd-b5c7-cad13bf98ada,4053438
idr0091/20161130_switch_IPTG1uM_Pos0_GL06.ome.zarr,S-BIAD852/043c117e-1b42-4691-88e6-87f0bd67917d,4053797
idr0091/20161021_Pos5_GL04.ome.zarr,S-BIAD852/053569d5-6ca3-40ec-a1f0-ba163109cc0f,4053499
idr0091/20151218_switch8h_pos5_GL13.ome.zarr,S-BIAD852/057a0a1c-96d1-4cc5-8e4f-c63ce4961080,4053189
idr0091/20160526_pos4_GL21.ome.zarr,S-BIAD852/058b1fac-f751-48d1-8e54-65ce179e1bdb,4053434
idr0091/20161007_Pos0_GL05.ome.zarr,S-BIAD852/05eb785a-9989-4e93-a18d-adf6dd60615b,4053374
idr0091/20161212_Pos0_GL19.ome.zarr,S-BIAD852/07608c5c-ea6d-4e93-9443-efe56fc27ea0,4053451
idr0091/20151204_switch6h_pos0_GL10.ome.zarr,S-BIAD852/07cabbd4-5946-4cf7-ba0f-2b29b60f1184,4053146
idr0091/20160912_Pos4_GL12.ome.zarr,S-BIAD852/08f9303d-b58d-49f9-9655-b858d7218443,4053316

Took about 8 minutes to generate each sql file...

...
BEGIN
 mkngff_fileset 
----------------
        5811622
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811623
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811624
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811625
(1 row)
COMMIT
UPDATE 0
BEGIN
 mkngff_fileset 
----------------
        5811626
(1 row)
COMMIT
UPDATE 0

Find image from last Fileset created and check pixels name, path...

idr=> select id from image where fileset =5811626;
    id    
----------
 10648222
(1 row)

idr=> select name, path from pixels where image = 10648222;
            name            |                                      path                                       
----------------------------+---------------------------------------------------------------------------------
 20160912_Pos4_GL12.pattern | demo_2/Blitz-0-Ice.ThreadPool.Server-2/2020-10/02/23-00-58.921/20200817-pattern
(1 row)

Realised that this didn't work because I used the OLD Fileset ID to update pixels after the new Fileset was created.
Pushed fix to IDR/omero-mkngff@2314311

Then re-installed...

@will-moore
Member Author

Try with fresh filesets...

idr0091/20161014_Pos1_GL02.ome.zarr,S-BIAD852/09369079-50e6-486e-9e72-40e7a0eef8ec,4053346
idr0091/20151218_switch8h_pos5_GL12.ome.zarr,S-BIAD852/0a1ff011-a78f-4b11-b8f5-c24ffd0972f6,4053188
idr0091/20151204_switch6h_pos0_GL20.ome.zarr,S-BIAD852/0a812f66-99dd-4280-bb59-7d04f7e75b39,4053152
idr0091/20160912_Pos0_GL16.ome.zarr,S-BIAD852/0a858893-dcb1-40f7-ac4d-86cd80d1587d,4053302

@will-moore
Member Author

After running sql commands, get Image IDs from Fileset IDs..

idr=> select id from image where fileset in (5811627, 5811628, 5811629, 5811630)
idr-> ;
    id    
----------
 10648252
 10648094
 10648058
 10648208
(4 rows)

Check pixels...
=> select path, name from pixels where image = 10648252;
                                                       path                                                       |  name   
------------------------------------------------------------------------------------------------------------------+---------
 demo_2/Blitz-0-Ice.ThreadPool.Server-12/2020-10/03/02-02-34.667_mkngff/09369079-50e6-486e-9e72-40e7a0eef8ec.zarr | .zattrs

Image is directly viewable!

Screenshot 2023-08-31 at 15 55 59

@will-moore
Member Author

will-moore commented Sep 22, 2023

Going to generate mkngff sql on ALL Filesets on idr0125-pilot. https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD852.html
(The tests above were run on idr0138-pilot, so the DB there doesn't have the original Fileset IDs now.)

idr0091.csv commit

for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3)
  omero mkngff sql --symlink_repo /data/OMERO/ManagedRepository --secret=$SECRET $fsid "/bia-integrator-data/$biapath/$uuid.zarr" >> "$IDRID/$fsid.sql"
  psql -U omero -d idr -h $DBHOST -f "$IDRID/$fsid.sql"
done
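
For reference, the field handling in that loop can be written out in Python as below. This is a sketch with hypothetical names (`Row`, `parse_row`); it assumes every row has exactly three comma-separated fields, as in idr0091.csv.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str     # e.g. idr0091/20161207_Pos1_GL06.ome.zarr
    biapath: str  # e.g. S-BIAD852/<uuid>
    uuid: str
    fsid: int     # original Fileset ID

    @property
    def zarr_path(self):
        """Location of the Zarr under the bia-integrator-data mount."""
        return f"/bia-integrator-data/{self.biapath}/{self.uuid}.zarr"


def parse_row(line):
    """Split one idr0091.csv line into its fields."""
    name, biapath, fsid = line.strip().split(",")
    uuid = biapath.split("/")[1]
    return Row(name, biapath, uuid, int(fsid))
```

Converting the Fileset ID to an integer up front means a malformed row fails at parse time rather than part-way through the `omero mkngff sql` call.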

NB: The SQL for the first 10 failed as it had already been run on idr0125-pilot above. Need to sort this out...

... took 25 mins in total.

@will-moore
Member Author

Also saw another random fail for

idr0091/20161207_Pos1_GL06.ome.zarr,S-BIAD852/de82a935-3143-4ce3-9439-9ab986237b09,4053851

just caught this...

ERROR:  duplicate key value violates unique constraint "originalfile_repo_path_index"
DETAIL:  Key (repo, regexp_split_to_array((('/'::text || path) || name) || '/'::text, '/+'::text))=(cdf35825-def1-4580-8d0b-9c349b8f78d6, {"",demo_2,Blitz-0-Ice.ThreadPool.Server-16,2020-10,03,18-15-40.837_mkngff,de82a935-3143-4ce3-9439-9ab986237b09.zarr,.zattrs,""}) already exists.
CONTEXT:  SQL statement "insert into originalfile
          (id, permissions, creation_id, group_id, owner_id, update_id, mimetype, repo, path, name)
          values (nextval('seq_originalfile'), old_perms, new_event, old_group, old_owner, new_event,
            info[i][3], repo, info[i][1], uuid || info[i][2])
          returning id"
PL/pgSQL function mkngff_fileset(bigint,character varying,character varying,character varying,text[]) line 42 at SQL statement
ROLLBACK

@will-moore
Member Author

Re-exporting on idr-ftp with the pixels type fix from ome/omero-cli-zarr#157, using the merge branch

pip install 'omero-cli-zarr @ git+https://github.com/will-moore/omero-cli-zarr@merge_prs'

omero login
for id in 10648047 10648048 10648049 10648050 10648051 10648052 10648053 10648054 10648055 10648056 10648057 10648058 10648059 10648060 10648061 10648062 10648063 10648064 10648065 10648066 10648067 10648068 10648069 10648070 10648071 10648072 10648073 10648074 10648075 10648076 10648077 10648078 10648079 10648080 10648081 10648082 10648083 10648084 10648085 10648086 10648087 10648088 10648089 10648090 10648091 10648092 10648093 10648094 10648095 10648096 10648097 10648098 10648099 10648100 10648101 10648102 10648103 10648104 10648317 10648318 10648319 10648320 10648321 10648322 10648323 10648324 10648325 10648326 10648327 10648328 10648329 10648330 10648331 10648332 10648333 10648334 10648335 10648336 10648337 10648338 10648339 10648340 10648341 10648342 10648343 10648344 10648345 10648346 10648347 10648196 10648197 10648198 10648199 10648200 10648201 10648202 10648203 10648204 10648205; do
  echo $id;
  omero zarr export Image:$id --name_by name;
done

@will-moore
Member Author

Also exported "batch2" as above...

Renamed ALL 342 filesets to remove pattern

for i in $(ls .); do mv $i `echo $i | sed 's/pattern.ome.zarr$/ome.zarr/'`; done

Zip - not deleting...

$ for i in */; do zip -r "${i%/}.zip" "$i"; done

@will-moore
Member Author

will-moore commented Jan 4, 2024

on idr-testing... (goofys is at /usr/bin/goofys)...

sudo mkdir /idr0091 && sudo /usr/bin/goofys --endpoint https://uk1s3.embassy.ebi.ac.uk/ -o allow_other idr0091 /idr0091

(base) [wmoore@test120-omeroreadwrite ~]$ ls /idr0091/zarr/
20151218_switch8h_pos6_GL01.pattern.ome.zarr

On idr-ftp, delete the existing (invalid) data and upload all images...

./mc rm --recursive uk1s3/idr0091/zarr/20151218_switch8h_pos6_GL01.pattern.ome.zarr

./mc cp -r /data/ngff/idr0091/idr0091/ uk1s3/idr0091/zarr
..._Pos1_GL26.ome.zarr/3/99/2/0/0: 96.73 GiB / 96.73 GiB ━━━━━━━━━

idr-testing...

(base) [wmoore@test120-omeroreadwrite ~]$ ls /idr0091/zarr/ | wc
    342     342   10722

E.g. looks good: https://ome.github.io/ome-ngff-validator/?source=https://uk1s3.embassy.ebi.ac.uk/idr0091/zarr/20160526_pos0_GL01.ome.zarr

@will-moore
Member Author

On idr-testing, let's try to update symlink to fix dtype issues...

Test with Image: 20151204_switch6h_pos0_GL01.pattern, ID: 10648046...
Existing failure:

$ python check_pixels.py --max-planes=sizeC Image:10648046
Start: 2024-01-04 22:12:48.846978
Checking Image:10648046
max_planes: sizeC
max_images: 0
0/1 Check Image:10648046 20151204_switch6h_pos0_GL01.pattern
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 174528 bytes

That Image has symlink like this:

(venv3) (base) [wmoore@test120-omeroreadwrite scripts]$ ls -alh !$
ls -alh /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/
total 8.0K
drwxr-sr-x.  2 omero-server omero-server  126 Nov  1 15:40 .
drwxrwsr-x. 22 omero-server omero-server 4.0K Oct 11 09:49 ..
lrwxrwxrwx.  1 omero-server omero-server  109 Oct 11 09:41 971f2809-c748-4259-8044-81ba6c774fdd.zarr -> /bia-integrator-data/S-BIAD852/971f2809-c748-4259-8044-81ba6c774fdd/971f2809-c748-4259-8044-81ba6c774fdd.zarr
-rw-r--r--.  1 omero-server omero-server   25 Nov  1 15:40 971f2809-c748-4259-8044-81ba6c774fdd.zarr.bfoptions

As omero-server...

rm /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/971f2809-c748-4259-8044-81ba6c774fdd.zarr

$ ln -s /idr0091/zarr/20151204_switch6h_pos0_GL01.ome.zarr /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/971f2809-c748-4259-8044-81ba6c774fdd.zarr

Symlink looks good:

$ ls -alh /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-41-08.138_mkngff/
total 8.0K
drwxr-sr-x.  2 omero-server omero-server  126 Jan  4 22:32 .
drwxrwsr-x. 22 omero-server omero-server 4.0K Oct 11 09:49 ..
lrwxrwxrwx.  1 omero-server omero-server   50 Jan  4 22:32 971f2809-c748-4259-8044-81ba6c774fdd.zarr -> /idr0091/zarr/20151204_switch6h_pos0_GL01.ome.zarr
-rw-r--r--.  1 omero-server omero-server   25 Nov  1 15:40 971f2809-c748-4259-8044-81ba6c774fdd.zarr.bfoptions

Fixed!

$ python check_pixels.py --max-planes=sizeC Image:10648046
Start: 2024-01-04 22:37:35.901700
Checking Image:10648046
max_planes: sizeC
max_images: 0
0/1 Check Image:10648046 20151204_switch6h_pos0_GL01.pattern
End: 2024-01-04 22:38:05.999497
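
The `rm` followed by `ln -s` above leaves a short window in which the symlink does not exist. If that matters on a live server, a different technique is to create a temporary link and rename it over the old one. The sketch below shows that variant, not what was run here; `retarget_symlink` is a hypothetical helper.

```python
import os


def retarget_symlink(link, new_target):
    """Point an existing symlink at new_target without it ever being missing.

    A temporary link is created next to the original and then renamed over
    it; os.replace is atomic on POSIX when both paths are on one filesystem.
    """
    if not os.path.islink(link):
        raise ValueError(f"{link} is not a symlink")
    tmp = f"{link}.tmp-{os.getpid()}"
    os.symlink(new_target, tmp)
    os.replace(tmp, link)
```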

@will-moore
Member Author

We can actually use IDR/idr-utils#54 script to do this, if we provide mapping.csv

Test with a single Image on idr-testing...
ID: 10648047, Name 20151204_switch6h_pos0_GL02.pattern
mapping.csv (existing symlink -> new target)

f12bdada-57eb-4fab-90ef-9655e4106497.zarr,20151204_switch6h_pos0_GL02.ome.zarr

As omero-server...

$ echo f12bdada-57eb-4fab-90ef-9655e4106497.zarr,20151204_switch6h_pos0_GL02.ome.zarr > idr0091_symlinks.csv

Log in as public user, then...

$ python /uod/idr/metadata/idr-utils/scripts/managed_repo_symlinks.py Image:10648047 /idr0091/zarr/ --repo /data/OMERO/ManagedRepository --fileset-mappings idr0091_symlinks.csv --report

fileset_dirs {'f12bdada-57eb-4fab-90ef-9655e4106497.zarr': '20151204_switch6h_pos0_GL02.ome.zarr'}

Fileset: 6314412 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-46-33.031_mkngff/
Render Image 10648047
fs_contents ['f12bdada-57eb-4fab-90ef-9655e4106497.zarr', 'f12bdada-57eb-4fab-90ef-9655e4106497.zarr.bfoptions']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-20/2020-10/02/14-46-33.031_mkngff/f12bdada-57eb-4fab-90ef-9655e4106497.zarr to /idr0091/zarr/20151204_switch6h_pos0_GL02.ome.zarr
Symlink target not found: /idr0091/zarr/f12bdada-57eb-4fab-90ef-9655e4106497.zarr.bfoptions

Success!

$ python scripts/check_pixels.py Image:10648047 --max-planes=sizeC
Start: 2024-01-05 10:02:34.840789
Checking Image:10648047
max_planes: sizeC
max_images: 0
0/1 Check Image:10648047 20151204_switch6h_pos0_GL02.pattern
End: 2024-01-05 10:02:51.334694

@will-moore
Member Author

will-moore commented Jan 5, 2024

On idr-testing, make idr0091_temp.csv, which is idr0091.csv modified to remove the idr0091/ and S-BIAD852/ prefixes from each row:

20161212_Pos0_GL14.ome.zarr,0008e8fc-721f-4465-8ff2-bebcce8bca8a,4053448
20161212_Pos1_GL19.ome.zarr,0044dd95-07e1-4937-938b-dde53ebbb719,4053473
20161007_Pos0_GL01.ome.zarr,00602c54-e3bd-406c-83fd-a802b58182b0,4053371
...

From that, we can make symlinks mapping file as above:

for r in $(cat idr0091_temp.csv); do
  name=$(echo $r | cut -d',' -f1)
  uuid=$(echo $r | cut -d',' -f2)
  echo "$uuid.zarr,$name" >> idr0091_symlinks.csv
done
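
As an alternative sketch, the mapping could be produced in one pass with `awk`, straight from rows that still carry the `idr0091/` and `S-BIAD852/` prefixes, so no separate idr0091_temp.csv is needed. The function name `make_symlink_mapping` is hypothetical, and the row layout is assumed to be `name,uuid,fileset_id` as in the rows quoted in this thread.

```shell
# Hypothetical helper: turn rows of the form
#   idr0091/NAME.ome.zarr,S-BIAD852/UUID,FILESET_ID
# (or the already-trimmed NAME.ome.zarr,UUID,FILESET_ID)
# into "UUID.zarr,NAME.ome.zarr" rows for --fileset-mappings.
make_symlink_mapping() {
    awk -F',' '
        NF >= 2 {
            name = $1; uuid = $2
            sub(/^.*\//, "", name)   # drop any leading directory, e.g. "idr0091/"
            sub(/^.*\//, "", uuid)   # drop any leading directory, e.g. "S-BIAD852/"
            print uuid ".zarr," name
        }
    ' "$1"
}
```

For example, `make_symlink_mapping idr0091.csv > idr0091_symlinks.csv`. Writing with `>` rather than appending avoids duplicate rows if the command is re-run.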

Now we run managed_repo_symlinks for each Image...

for r in $(cat idr0091_imageids.csv); do
  python /uod/idr/metadata/idr-utils/scripts/managed_repo_symlinks.py Image:$r /idr0091/zarr/ --repo /data/OMERO/ManagedRepository --fileset-mappings idr0091_symlinks.csv --report
done
...
Fileset: 6314331 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-21/2020-10/02/17-11-44.019_mkngff/
Render Image 10648074
fs_contents ['b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr', 'b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr.bfoptions']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-21/2020-10/02/17-11-44.019_mkngff/b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr to /idr0091/zarr/20151204_switch6h_pos5_GL12.ome.zarr
Symlink target not found: /idr0091/zarr/b80ac5e8-ff4d-4235-aaab-4adfeec0db48.zarr.bfoptions
...

EDIT... took about 15 mins to do 342 images...

...
Fileset: 6314271 /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2020-10/03/18-48-59.765_mkngff/
Render Image 10648771
fs_contents ['882f80fa-f40f-455b-b923-09dce086675b.zarr', '882f80fa-f40f-455b-b923-09dce086675b.zarr.bfoptions']
Link from /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-5/2020-10/03/18-48-59.765_mkngff/882f80fa-f40f-455b-b923-09dce086675b.zarr to /idr0091/zarr/20161207_Pos1_GL26.ome.zarr
Symlink target not found: /idr0091/zarr/882f80fa-f40f-455b-b923-09dce086675b.zarr.bfoptions

@will-moore
Member Author

will-moore commented Jan 5, 2024

python /uod/idr/metadata/idr-utils/scripts/check_pixels.py Project:1351 --max-planes=sizeC > /tmp/check_pixels_20240105_idr0091.log

All good 👍

(base) [wmoore@test120-omeroreadwrite ~]$ grep pattern /tmp/check_pixels_20240105_idr0091.log | wc
    342    1368   20719
(base) [wmoore@test120-omeroreadwrite ~]$ grep Error /tmp/check_pixels_20240105_idr0091.log | wc
      0       0       0
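
The two `grep | wc` checks above could be folded into one helper that also sets an exit status, which is handy if this is repeated for other studies. A sketch only: `summarise_check_log` is a hypothetical name, and it assumes (as in this log) that each checked image line contains `pattern` and each failure line contains `Error`.

```shell
# Hypothetical helper: summarise a check_pixels log.
# Prints "<lines containing pattern> <lines containing Error>"
# and returns non-zero if any Error line is present.
summarise_check_log() {
    log=$1
    # grep -c exits non-zero when the count is 0, so ignore its status
    checked=$(grep -c 'pattern' "$log" || true)
    errors=$(grep -c 'Error' "$log" || true)
    echo "$checked $errors"
    [ "$errors" -eq 0 ]
}
```

Usage: `summarise_check_log /tmp/check_pixels_20240105_idr0091.log && echo "All good"`.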

@will-moore
Member Author

On idr-ftp, the zips created on 18th Dec (above) have been uploaded (not sure of exact date), following deletion of the old idr0091 folder on 16th Jan:
from history...

sudo /root/.aspera/cli/bin/ascp -P33001 -i /root/.aspera/cli/etc/asperaweb_id_dsa.openssh -d /data/ngff/idr0091/idr0091 bsaspera_w@hx-fasp-1.ebi.ac.uk:5f/136e8d-e...

@will-moore
Member Author

will-moore commented Feb 20, 2024

Images updated on https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD852.html

New idr0091.csv file at IDR/mkngff_upgrade_scripts@0522d43 and IDR/mkngff_upgrade_scripts@c92c217 based on csv provided by Kola.

Running mkngff on idr-next (since this has the NGFF filesets that we wish to replace), using --fs_suffix=None so we don't add an extra _mkngff to Fileset paths.

(venv3) [wmoore@prod120-omeroreadwrite ~]$ git clone https://github.com/IDR/mkngff_upgrade_scripts.git
(venv3) [wmoore@prod120-omeroreadwrite ~]$ cd mkngff_upgrade_scripts/ngff_filesets/


for r in $(cat $IDRID.csv); do
  biapath=$(echo $r | cut -d',' -f2)
  uuid=$(echo $biapath | cut -d'/' -f2)
  fsid=$(echo $r | cut -d',' -f3 | tr -d '[:space:]')
  omero mkngff sql $fsid --fs_suffix=None --clientpath="https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/$biapath/$uuid.zarr" "/bia-integrator-data/$biapath/$uuid.zarr" > "$IDRID/$fsid.sql"
done

EDIT: something went wrong as all the .sql files are empty!

Fixed the idr0091.csv (missing S-BIAD852/ from each row). Running again...
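
To catch this kind of problem earlier, two small checks could be run around the `omero mkngff sql` loop: one on the CSV before starting, one on the output directory afterwards. This is a sketch under the assumption that the second column of every row should start with `S-BIAD852/`; both function names are hypothetical.

```shell
# Hypothetical pre-flight check: print every row whose second column
# does not start with the expected prefix; fail if any such row exists.
check_bia_prefix() {
    csv=$1
    prefix=$2
    awk -F',' -v p="$prefix" '
        NF && index($2, p) != 1 { print; bad = 1 }
        END { exit bad }
    ' "$csv"
}

# Hypothetical post-run check: list zero-byte .sql files under a directory.
list_empty_sql() {
    find "$1" -type f -name '*.sql' -size 0
}
```

For example, `check_bia_prefix $IDRID.csv S-BIAD852/ || echo "fix the csv first"` before the loop, and `list_empty_sql $IDRID` after it (no output means no empty files).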

Pushed at IDR/mkngff_upgrade_scripts@03b02e7

Won't test these yet as idr-testing is being used for microservices testing.

@will-moore
Member Author

On new pilot #675 (comment)

Ran all the mkngff SQL scripts... ending for idr0091 with...

...
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963_mkngff/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b.zarr -> /bia-integrator-data/S-BIAD852/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-12/2024-02/28/17-06-10.963_mkngff/fdfdbb32-c1c2-4eec-8bbd-ffc3b729958b.zarr.bfoptions
UPDATE 1
BEGIN
 mkngff_fileset
----------------
        6319888
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233_mkngff/fe65c558-7099-48c4-8222-a5dc54da884a.zarr -> /bia-integrator-data/S-BIAD852/fe65c558-7099-48c4-8222-a5dc54da884a/fe65c558-7099-48c4-8222-a5dc54da884a.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-11/2024-02/28/16-45-47.233_mkngff/fe65c558-7099-48c4-8222-a5dc54da884a.zarr.bfoptions
UPDATE 1
BEGIN
 mkngff_fileset
----------------
        6319889
(1 row)

COMMIT
Using session for demo@localhost:4064. Idle timeout: 10 min. Current group: Public
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461
Creating dir at /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461_mkngff
Creating symlink /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461_mkngff/fe795db1-82c3-42b0-bbf8-5c4230bebdc9.zarr -> /bia-integrator-data/S-BIAD852/fe795db1-82c3-42b0-bbf8-5c4230bebdc9/fe795db1-82c3-42b0-bbf8-5c4230bebdc9.zarr
Checking for prefix_dir /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461
write bfoptions to: /data/OMERO/ManagedRepository/demo_2/Blitz-0-Ice.ThreadPool.Server-6/2024-02/28/17-13-27.461_mkngff/fe795db1-82c3-42b0-bbf8-5c4230bebdc9.zarr.bfoptions

Last row in idr0091.csv at https://github.com/IDR/mkngff_upgrade_scripts/blob/1b64ab85fab537faafd62d6e19c01cf5ab32d11f/ngff_filesets/idr0091.csv
is
idr0091/20161212_Pos1_GL04.ome.zarr,S-BIAD852/fe795db1-82c3-42b0-bbf8-5c4230bebdc9,6314392

this image is http://localhost:1080/webclient/?show=image-10648367
and the Fileset ID is 4053461.

So, the idr0091.csv above is out of date, and was missed from the update at IDR/mkngff_upgrade_scripts@03b02e7

@will-moore
Member Author

Try to clean-up (delete) the 342 Filesets we created above - last one ID 6319889.
First one ID = 6319548?

idr=> select id from Image where fileset=6319548;
 15150680
(1 row)

http://localhost:1080/webclient/?show=image-15150680 in webclient on pilot-idrngff is a tiff image but has wrong Fileset with 44e015db3952.zarr which corresponds to the first row of idr0090.csv.

For all Filesets 6319548 -> 6319889 we want to:

  • Find the original Fileset that it replaced.
  • Switch the Images back to the Original Fileset
  • Delete the new Fileset!

For Last Image/Fileset...

idr=> select child from FilesetAnnotationLink where parent=6319889;
  child   
----------
 38302449
idr=> select longvalue from Annotation where id=38302449;
 longvalue 
-----------
   6314392

This corresponds to the Fileset IDs updated in IDR/mkngff_upgrade_scripts@25c5372

So, NEW Fileset IDs are 6319548 -> 6319889
OLD Fileset IDs are in idr0091.csv before that commit.

First row...

  • Old Fileset ID 6314330 (from old idr0091.csv), New Fileset ID: 6319548 (to be deleted), Image: 15150680
update image set fileset = 6314330 where fileset = 6319548;
for i in {6319548..6319889}; do echo $i >> idr0091_ids.csv; done

idr0091_ids.csv (removed first line 6319548,6314330 - already done update above).

NEW Fileset ID, OLD Fileset ID

6319549,6314371
6319550,6314286
6319551,6314139
...
6319887,6314352
6319888,6314232
6319889,6314392
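
One way the NEW,OLD pairs above could be generated, rather than assembled from the old idr0091.csv, is to repeat the two lookups shown for the last Fileset (FilesetAnnotationLink, then Annotation.longvalue) for every new ID. This is a sketch and was not run here; `list_fileset_pairs` is a hypothetical name, and it assumes each new Fileset has exactly one linked annotation with a non-null `longvalue`.

```shell
# Sketch: for each NEW Fileset ID in a range, look up the OLD Fileset ID
# stored as a long annotation on the new Fileset, and print "NEW,OLD".
# Assumes one such annotation per Fileset and that $DBHOST is set.
list_fileset_pairs() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        # -A unaligned output, -t tuples only: prints just the value
        old=$(psql -U omero -d idr -h "$DBHOST" -At -c \
            "select a.longvalue from filesetannotationlink l
             join annotation a on a.id = l.child
             where l.parent = $i and a.longvalue is not null")
        echo "$i,$old"
        i=$((i + 1))
    done
}
```

Usage: `list_fileset_pairs 6319549 6319889 > idr0091_ids.csv`, then spot-check a few rows against the old idr0091.csv before using the file.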

Then

for r in $(cat idr0091_ids.csv); do
  newid=$(echo $r | cut -d',' -f1)
  oldid=$(echo $r | cut -d',' -f2)
  psql -U omero -d idr -h $DBHOST -c "update image set fileset = $oldid where fileset = $newid"
done


for r in $(cat idr0091_ids.csv); do
  newid=$(echo $r | cut -d',' -f1)
  echo $newid && omero delete Fileset:$newid
done
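
Since these updates change which Fileset each Image points at, it may be worth generating the SQL first and reading it before anything touches the database. A minimal sketch: `gen_rollback_sql` is a hypothetical helper that validates each `NEW,OLD` row and wraps the updates in a single transaction.

```shell
# Hypothetical helper: read "NEW,OLD" rows and print the UPDATE statements
# inside one transaction, so they can be reviewed before being run.
# Rows that are not two plain integers are reported on stderr; in that case
# the script ends with ROLLBACK and the function returns non-zero.
gen_rollback_sql() {
    awk -F',' '
        BEGIN { print "BEGIN;" }
        NF == 0 { next }
        NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            printf "update image set fileset = %s where fileset = %s;\n", $2, $1
            next
        }
        { print "bad row " NR ": " $0 > "/dev/stderr"; bad = 1 }
        END { print (bad ? "ROLLBACK;" : "COMMIT;"); exit bad }
    ' "$1"
}
```

Usage: `gen_rollback_sql idr0091_ids.csv > rollback.sql`, review the file, then `psql -U omero -d idr -h $DBHOST -f rollback.sql`. The `omero delete Fileset:$newid` loop would still follow as above.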

@will-moore will-moore moved this from check_pixels to NGFF studies in NGFF conversion May 21, 2024