Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for GE files with an EDF header #554

Merged
merged 4 commits into from
Mar 19, 2024
Merged

Conversation

psavery
Copy link
Contributor

@psavery psavery commented Mar 14, 2024

This file type started coming from APS in ~2022 with a .edf.ge5 extension.

In fabio, these were being interpreted as EDF files because of their EDF-like header. With the changes in this PR, the file is interpreted as a GE file instead and the frames are read in correctly.

The added logic is as follows:

  1. If the file is identified as EDF, we check the file extension, and if the file extension is ge plus a number, it is interpreted as GE instead.
  2. We check for EDF header formats in the GE reader, and if we have a match, we read in the header using EDF style and grab the number of frames from there.

I have an ~80 MB dark example... let me see if we can make that public for testing.

Fixes: #553

This file type started coming from APS in ~2022 with a `.edf.ge5` extension.

In fabio, these were being interpreted as EDF files because of their EDF-like
header. With the changes in this PR, the file is interpreted as a GE file
instead and the frames are read in correctly.

The added logic is as follows:

1. If the file is identified as EDF, we check the file extension, and if the
file extension is `ge` plus a number, it is interpreted as GE instead.
2. We check for EDF header formats in the GE reader, and if we have a match,
we read in the header using EDF style and grab the number of frames from there.

Signed-off-by: Patrick Avery <patrick.avery@kitware.com>
Signed-off-by: Patrick Avery <patrick.avery@kitware.com>
@psavery
Copy link
Contributor Author

psavery commented Mar 14, 2024

I can share the dark file, which is ~80 MB! Is that too large for public testing, or is it okay?

@jonwright
Copy link
Collaborator

Hi Patrick - I would suggest to gzip the file to make it smaller for testing. Normally fabio reads that transparently, but it breaks the format guessing from file extensions....

@kif
Copy link
Member

kif commented Mar 15, 2024

Hi, in FabIO, files are distributed as bz2 format and then decompressed and recompressed in gzip. This saves a lot of disk space. Don't worry too much about the size of that file.

Please send me such a file to put it on edna (our server, used for distributing those images).

Then remains the non regression test to implement.

@kif kif requested review from jonwright and kif March 15, 2024 10:05
Copy link
Collaborator

@jonwright jonwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now it just misses the testcase ... ?

@psavery
Copy link
Contributor Author

psavery commented Mar 15, 2024

I sent a test case by email. We can make that public.

Signed-off-by: Patrick Avery <patrick.avery@kitware.com>
@psavery
Copy link
Contributor Author

psavery commented Mar 15, 2024

I added the file to the GE tests! Let me know if anything further is needed for merging this PR.

@jonwright
Copy link
Collaborator

I had a quick look. and it seems like compression gets broken like this, as openimage is not picking up the filetype from the name. Historically, I thought edfimage used to complain when the file is the wrong size?

In [1]: import fabio, fabio.test.utilstest

In [2]: fabio.open( fabio.test.utilstest.UtilsTest.getimage("dark_before_000428.edf.ge5") ).nframes
Out[2]: 10

In [3]: fabio.open( fabio.test.utilstest.UtilsTest.getimage("dark_before_000428.edf.ge5.gz") ).nframes
Out[3]: 1

In [4]: fabio.open( fabio.test.utilstest.UtilsTest.getimage("dark_before_000428.edf.ge5.bz2") ).nframes
Out[4]: 1

@jonwright
Copy link
Collaborator

So I guess the question is whether anyone would want to support reading dark_before_000428.edf.ge5.bz2, I will ping the private email thread

@psavery
Copy link
Contributor Author

psavery commented Mar 18, 2024

We can potentially modify the regex to allow .ge5.* as well.

Also, in this PR, I am assuming that Num_Images will be in the EDF header, since it was there in our example files. I wonder if it's safer to just figure out the number of frames by the file size instead? But I guess we could always change that in the future too, if needed.

Signed-off-by: Patrick Avery <patrick.avery@kitware.com>
@psavery
Copy link
Contributor Author

psavery commented Mar 18, 2024

fabio.open( fabio.test.utilstest.UtilsTest.getimage("dark_before_000428.edf.ge5.bz2") ).nframes

I just fixed this.

Copy link
Collaborator

@jonwright jonwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@kif kif merged commit 564346f into silx-kit:main Mar 19, 2024
7 checks passed
@psavery psavery deleted the edf-ge-format branch March 19, 2024 12:40
@psavery
Copy link
Contributor Author

psavery commented Mar 19, 2024

Thanks @jonwright and @kif!

I guess we'll need to wait for a new release on pypi to get access to these features in a release?

@kif
Copy link
Member

kif commented Mar 19, 2024

indeed ... I am making 2 to 3 releases a year.
The support for numpy2 is something important, and once h5py is compatible it would make sense to have such a release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for newer GE files?
3 participants