Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PBM decoder robustness improvements and BufferedReadStream observability #2551

Merged
merged 7 commits into from
Oct 14, 2023

Conversation

antonfirsov
Copy link
Member

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Add additonal checks to handle corrupt files better in the PBM decoder.

Also extend BufferedReadStream to monitor the number of times it has been made to hit EOF by read calls. This improves the testability of decoder behavior. The perf impact of the BufferedReadStream change is within the margin of error.

Assert.Equal(default, info.Size);
Configuration.Default.ImageFormatsManager.TryFindFormatByFileExtension("pbm", out IImageFormat format);
Assert.Equal(format!, info.Metadata.DecodedImageFormat);
Assert.Throws<InvalidImageContentException>(() => Image.Identify(bytes));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter but I don't think decoding makes much sense when the EOF is right in the header.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm happy with that.

stream.SkipWhitespaceAndComments();
int height = stream.ReadDecimal();
stream.SkipWhitespaceAndComments();
if (!stream.SkipWhitespaceAndComments() ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only thing I would suggest is to change the method names to use Try prefix to match convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed those changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, didn't realize you'd explicitly removed them! I can revert at a later time if you feel strongly about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really prefer it to be the other way.

By convention, TryDoSomething(...) == false should indicate that the operation failed as a whole. This is not the case with TryReadDecimal(out value) == false when it happens at the very las digit in the file: the decoding of the digit was succesful (we should threat the result as valid!) just the file reached an EOF.

I like following such conventions to the letter, because ambiguity can lead to a misunderstandings and programming mistakes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not precious about it (and happy to revert) but there’s something a little clunky about the method there. We’re really returning two things, the decimal and the stream state. I’ve seen something in the runtime I’m sure that handles it better (Maybe inside guid parsing code)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered returning a (bool, int) tuple, but that would lead to much more complicated code on the callsites, since we couldn't do !stream.SkipWhitespaceAndComments() || .... Defining the names without the Try prefix + documenting the behavior was the best idea I was able to come up with.

byte value = (byte)stream.ReadDecimal();
stream.SkipWhitespaceAndComments();
rowSpan[x] = new L8(value);
stream.ReadDecimal(out int value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we test here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't matter since we're simply assigning 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, when it's the last digit of the file the value is valid! For simplicity, I'm ignoring the retval here and letting the SkipWhitespaceAndComments call below to detect EOF.

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Liking the EOF counter changes!

@JimBobSquarePants JimBobSquarePants merged commit d76fe6f into main Oct 14, 2023
7 checks passed
@JimBobSquarePants JimBobSquarePants deleted the pbm-eof-02 branch October 14, 2023 05:59
@@ -71,7 +71,11 @@ private static void ProcessGrayscale<TPixel>(Configuration configuration, Buffer

for (int y = 0; y < height; y++)
{
stream.Read(rowSpan);
if (stream.Read(rowSpan) == 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lame actually. I should have sliced down rowSpan with the result of stream.Read(), for the pixel conversion done later, or used the condition stream.Read(rowSpan) < rowSpan.Length (the latter isstricter, but simpler).

This has little practical relevance (BuffferedReadStream is hitting EOF twice & FromL8Bytes is converting some memory garbage for broken files), so no need to fix it now, but the code looks stupid :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can improve it with follow up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

antonfirsov added a commit that referenced this pull request Oct 14, 2023
JimBobSquarePants added a commit that referenced this pull request Oct 16, 2023
…eam observability (#2555)

* PBM decoder robustness improvements and BufferedReadStream observability

Backport of #2551 & #2552

* Remove DoesNotReturn attribute

---------

Co-authored-by: James Jackson-South <james_south@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants