Open questions about supporting Exif #440

sophie-h · 2023-12-16T22:14:19Z

Reading chunks after IDAT

It turns out that at least ImageMagick puts the eXif chunk after IDAT, which does not follow PNG 1.3. So we will not catch those with read_until_image_data, which is used in read_info(). Afaik there is currently no way to read anything after the IDAT chunks at all?

I would suggest at least adding an option to read to the end of the file to get all metadata. One argument for that is that text chunks are allowed after IDAT even by the standard. I have no idea yet how intrusive this change would be, though.
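
To illustrate what "reading to the end of the file" involves, here is a minimal standalone sketch that walks the chunk sequence of a PNG stream and lists the chunk types found after the last IDAT. It does not use the png crate; the function name `chunk_types_after_idat` is made up for this example, and CRCs are skipped rather than validated.

```rust
use std::io::{self, Read};

/// The 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Hypothetical helper (not part of the png crate): returns the types of all
/// chunks that appear after the last IDAT chunk, in file order, IEND included.
pub fn chunk_types_after_idat<R: Read>(mut reader: R) -> io::Result<Vec<[u8; 4]>> {
    let mut signature = [0u8; 8];
    reader.read_exact(&mut signature)?;
    if signature != PNG_SIGNATURE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a PNG file"));
    }

    let mut trailing = Vec::new();
    let mut seen_idat = false;
    loop {
        // Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        let mut header = [0u8; 8];
        reader.read_exact(&mut header)?;
        let length = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
        let chunk_type = [header[4], header[5], header[6], header[7]];

        // Discard the data and the CRC. The CRC is NOT checked in this sketch.
        let to_skip = u64::from(length) + 4;
        let skipped = io::copy(&mut reader.by_ref().take(to_skip), &mut io::sink())?;
        if skipped != to_skip {
            return Err(io::ErrorKind::UnexpectedEof.into());
        }

        if &chunk_type == b"IDAT" {
            // Restart the list so it only holds chunks after the *last* IDAT.
            seen_idat = true;
            trailing.clear();
        } else if seen_idat {
            trailing.push(chunk_type);
        }
        if &chunk_type == b"IEND" {
            return Ok(trailing);
        }
    }
}
```

Because this only needs `Read`, it works on non-seekable sources, at the cost of consuming the whole stream including the image data.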

Supporting legacy formats

The second thing is that there are legacy ways of storing Exif data in PNGs via text chunks. GIMP, for example, uses this method to this day. This information is already accessible to API users without further support. The question is whether we want to add support for it anyway. One argument would be that we could have a more reliable exif() function in image-rs one day. And maybe it makes sense to keep the code in the png crate? This is roughly what the code would look like.

The byteorder check is recommended for decoders supporting the eXif chunk as well.

    pub fn exif(&self) -> Option<Vec<u8>> {
        for chunk in self.compressed_latin1_text.iter() {
            if chunk.keyword.as_str() == "Raw profile type exif" {
                let mut chunk = chunk.clone();
                chunk.decompress_text().ok()?;
                let text = chunk.get_text().ok()?.replace('\n', "");
                let bytes = hex::decode(text.get(8..)?).ok()?;
                let relevant_bytes = bytes.get(8..)?;

                // Reject unknown byteorder
                let byteorder = relevant_bytes.get(..2)?;
                if byteorder != b"II" && byteorder != b"MM" {
                    return None;
                }

                return Some(bytes.to_vec());
            }
        }

        None
    }
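
The byte-order check on its own can be sketched as a small standalone function that would apply equally to the payload of a real eXif chunk. This is only an illustration: the names `ByteOrder` and `tiff_byte_order` are invented here, and the sketch is slightly stricter than the snippet above because it also verifies the TIFF magic number 42 that follows the byte-order mark.

```rust
/// Byte order declared by the TIFF header that starts an Exif payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteOrder {
    LittleEndian,
    BigEndian,
}

/// Hypothetical helper (not part of the png crate): returns the byte order if
/// `exif` starts with a plausible TIFF header, and `None` otherwise.
pub fn tiff_byte_order(exif: &[u8]) -> Option<ByteOrder> {
    // Bytes 0..2: "II" means little-endian, "MM" means big-endian.
    let order = match exif.get(..2)? {
        b"II" => ByteOrder::LittleEndian,
        b"MM" => ByteOrder::BigEndian,
        _ => return None,
    };

    // Bytes 2..4: the number 42, stored in the declared byte order.
    let magic_bytes = [*exif.get(2)?, *exif.get(3)?];
    let magic = match order {
        ByteOrder::LittleEndian => u16::from_le_bytes(magic_bytes),
        ByteOrder::BigEndian => u16::from_be_bytes(magic_bytes),
    };

    (magic == 42).then_some(order)
}
```

A caller would run this on the candidate bytes before returning them, and treat `None` as "no usable Exif data".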

fintelia · 2023-12-17T21:45:09Z

In order to handle additional metadata after the image data, I think we'd have to change the decoder design to initially scan through all the chunks in the file like image-webp does, and then seek back to them later when needed. That would also have the benefit of not having to store/discard each piece of metadata as soon as Decoder::read_info is called. It would however require a breaking change to add the Seek bound on the reader.
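
A rough sketch of that scan-then-seek-back design, independent of how image-webp or the png crate actually implement it: one pass records where each chunk's data lives, and individual chunks are read later on demand. `ChunkEntry`, `index_chunks` and `read_chunk_data` are hypothetical names, and the signature and CRCs are not validated here.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Location of one chunk's data within the file (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkEntry {
    pub chunk_type: [u8; 4],
    pub data_offset: u64,
    pub length: u32,
}

/// Scans every chunk header up to IEND without reading any chunk data.
pub fn index_chunks<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<ChunkEntry>> {
    // Skip the 8-byte PNG signature (assumed valid in this sketch).
    reader.seek(SeekFrom::Start(8))?;
    let mut index = Vec::new();
    loop {
        // Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        let mut header = [0u8; 8];
        reader.read_exact(&mut header)?;
        let length = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
        let chunk_type = [header[4], header[5], header[6], header[7]];
        let data_offset = reader.stream_position()?;
        index.push(ChunkEntry { chunk_type, data_offset, length });
        if &chunk_type == b"IEND" {
            return Ok(index);
        }
        // Jump over the data and the CRC to the next chunk header.
        reader.seek(SeekFrom::Current(i64::from(length) + 4))?;
    }
}

/// Seeks back to a previously indexed chunk and reads its data.
pub fn read_chunk_data<R: Read + Seek>(
    reader: &mut R,
    entry: &ChunkEntry,
) -> io::Result<Vec<u8>> {
    reader.seek(SeekFrom::Start(entry.data_offset))?;
    let mut data = Vec::new();
    reader
        .by_ref()
        .take(u64::from(entry.length))
        .read_to_end(&mut data)?;
    if data.len() != entry.length as usize {
        return Err(io::ErrorKind::UnexpectedEof.into());
    }
    Ok(data)
}
```

With an index like this, nothing has to be stored or discarded up front: a caller looks up the entry for, say, `eXif` and reads it only when asked, wherever it sits relative to IDAT.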

When we add Exif support, I think it would be reasonable to also support detecting it from a text chunk. The two main questions there would be:

1. What happens if there's both a text chunk with Exif data and a real eXif chunk?
2. If the user requests ignoring text chunks, should that also include Exif data encoded in text chunks?

kornelski · 2024-12-08T14:28:41Z

Reader::info is free to update the Info whenever it finds a chunk, so it can support chunks after IDAT without API changes. The docs could be updated to tell users to check Info after reading all the image data whenever possible.

Decoder::read_header_info has header in the name, so I wouldn't blame it for not reading trailers.

Maybe there could be a new method Decoder::read_metadata that requires a Seek bound? That would be backwards compatible, and wouldn't complicate the API for users who don't have a seekable source.
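
The backwards-compatibility argument rests on Rust allowing a method to live in a separate impl block with a stricter bound. A toy sketch of that pattern follows; the `Decoder` here is a stand-in and not the png crate's type, and the method bodies are placeholders, so only the shape of the bounds is the point.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Stand-in for a decoder type; NOT the png crate's actual `Decoder`.
pub struct Decoder<R> {
    reader: R,
}

impl<R: Read> Decoder<R> {
    pub fn new(reader: R) -> Self {
        Decoder { reader }
    }

    /// Available for any reader. Placeholder body: reads the next 8 bytes.
    pub fn read_header_info(&mut self) -> io::Result<[u8; 8]> {
        let mut bytes = [0u8; 8];
        self.reader.read_exact(&mut bytes)?;
        Ok(bytes)
    }
}

impl<R: Read + Seek> Decoder<R> {
    /// Only exists when the reader is seekable. Placeholder body: a real
    /// implementation would scan the remaining chunks; this one just reports
    /// the stream length and then restores the previous position.
    pub fn read_metadata(&mut self) -> io::Result<u64> {
        let resume_at = self.reader.stream_position()?;
        let end = self.reader.seek(SeekFrom::End(0))?;
        self.reader.seek(SeekFrom::Start(resume_at))?;
        Ok(end)
    }
}
```

A `Decoder<&[u8]>` still compiles and can call `read_header_info`, since `&[u8]` implements `Read` but not `Seek`. Calling `read_metadata` on it would be a compile error, while a `Decoder<Cursor<Vec<u8>>>` or `Decoder<File>` gets both methods.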
