Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open questions about supporting Exif #440

Open
sophie-h opened this issue Dec 16, 2023 · 2 comments
Open

Open questions about supporting Exif #440

sophie-h opened this issue Dec 16, 2023 · 2 comments

Comments

@sophie-h
Copy link
Contributor

Reading chunks after IDAT

Turns out that at least ImageMagic put the eXif chunk behind IDAT, not following PNG 1.3. So we will not catch those with read_until_image_data used in read_info(). Afaik there is no way to read anything behind IDAT chunks at all?

I would suggest at least adding an option to read to the end of the file to get all metadata. One argument for that is that text chunks are allowed behind IDAT even per standard. I have no idea yet how intrusive this change would be though.

Supporting legacy formats

The second thing is that there are legacy ways of storing Exif data in PNGs via text chunks. GIMP for example uses this method to this day. This information would be accessible to API users already without further support. The question is if we want to add support for it anyway. One argument would be that we can have a more reliable exif() function in image-rs one day. And maybe it makes sense to keep the code in the png crate? That's what the code would roughly look like.

The byteorder check is recommended for decoders supporting the eXif chunk as well.

    pub fn exif(&self) -> Option<Vec<u8>> {
        for chunk in self.compressed_latin1_text.iter() {
            if chunk.keyword.as_str() == "Raw profile type exif" {
                let mut chunk = chunk.clone();
                chunk.decompress_text().ok()?;
                let text = chunk.get_text().ok()?.replace('\n', "");
                let bytes = hex::decode(text.get(8..)?).ok()?;
                let relevant_bytes = bytes.get(8..)?;

                // Reject unkown byteorder
                let byteorder = relevant_bytes.get(..2)?;
                if byteorder != b"II" && byteorder != b"MM" {
                    return None;
                }

                return Some(bytes.to_vec());
            }
        }

        None
    }
@fintelia
Copy link
Contributor

In order to handle additional metadata after the image data, I think we'd have to change the decoder design to initially scan through all the chunks in the file like image-webp does, and then seek back to them later when needed. That would also have the benefit of not having to store/discard each piece of metadata as soon as Decoder::read_info is called. It would however require a breaking change to add the Seek bound on the reader.

When we add Exif support, I think it would be reasonable to also support detecting them from a text chunk. The two main questions there would be:

  1. What happens if there's both a text chunk with Exif data and a real eXif chunk?
  2. If the user request ignoring text chunks, should that also include Exif data encoded in text chunks?

@kornelski
Copy link
Contributor

Reader::info is free to update the Info whenever it finds a chunk, so it can support chunks after IDAT without API changes. The docs could be updated to tell users to check Info after reading all the image data whenever possible.

Decoder::read_header_info has header in the name, so I wouldn't blame it for not reading trailers.

Maybe there could be a new method Decoder::read_metadata that requires Seek bound? That would be backwards compatible, and wouldn't complicate the API for users who don't have a seekable source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants