Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fido hanging on skeleton stream (fmt/1000) #189

Open
ross-spencer opened this issue May 14, 2020 · 3 comments
Open

Fido hanging on skeleton stream (fmt/1000) #189

ross-spencer opened this issue May 14, 2020 · 3 comments
Labels
bug A product defect that needs fixing OAG Issues to be discussed by OPF Advisory Group
Milestone

Comments

@ross-spencer
Copy link

Bug spotted while writing docs, the following will hang when calling identify_stream(...).

# -*- coding: utf-8 -*-

"""Reference for Fido-based format identification

1. Create a byte-stream with a known magic number.
2. Call identify_stream(...) to identify the file against Fido's known formats.
"""

from __future__ import print_function

import io

from fido import fido

# Magic-number for fmt/1000.
BIN = b"\x5A\x58\x54\x61\x70\x65\x21\x1A\x01"

# Create the stream object with the known magic-number.
fstream = io.BytesIO(BIN)

# Create a Fido instance and call identify_stream. The identify_stream function
# will work on the stream as-is. This could be an open file handle that the
# caller is managing for itself.
f = fido.Fido()
f.identify_stream(fstream, "filename to display", extension=False)

I haven't looked at the code myself, but Carl has and identified we're getting to the end of the data during a loop and then it's blocking as we've nothing left to consume, but no way to exit. And it sounds like he has a fix, so that's good!

@ross-spencer
Copy link
Author

Of interest to #94

@ross-spencer ross-spencer added the bug A product defect that needs fixing label May 14, 2020
carlwilson added a commit that referenced this issue May 14, 2020
- added simple test, with no assert, for file identification; and
- added similar for stream identification which demonstrates hang.
@carlwilson carlwilson linked a pull request May 14, 2020 that will close this issue
@carlwilson
Copy link
Member

carlwilson commented May 14, 2020

The source of the error is in the blocking_read() function, and the problem is in this loop:

        while bytes_read < bytes_to_read:
            readbuffer = file.read(bytes_to_read - bytes_read)
            buffer += readbuffer
            bytes_read = len(buffer)
            # break out if EOF is reached.
            if readbuffer == '':
                break
        return buffer

specifically with the exit condition if readbuffer == '': which is never satisfied, I believe but haven't tested, that the test should be for None. What's happening is caused by the stream to identify's length been (much) shorter than the default buffer size, bytes_to_read = 131072. The first read slurps all five bytes, but that still leaves bytes_read < bytes_to_read by a margin of 131067. The loop then sticks, and this is demonstrated by this Travis job that hangs for both Python 3 tasks. The Python 2 version doesn't hang, which probably explains why the issue's appeared. It also demonstrates that we're a few tests short of a reliable build process.

I have a working fix for the tests but note that these tests are deficient, which segues into the link to #94 that @ross-spencer alluded to. I believe that the identify_file and identify_stream methods should return the matched set of results. They currently return nothing and pass the results to the CLI display function further down the line. A real API would allow these tests to examine some basic identification behaviour and check the results. A nice payoff from unit tests is that they force you to write testable code, which is usually well designed.

carlwilson added a commit that referenced this issue May 15, 2020
- fixed termination condition in `blocking_read`.
@carlwilson carlwilson added the OAG Issues to be discussed by OPF Advisory Group label Jun 15, 2022
@carlwilson carlwilson added this to the OPF Hackathon 2023 Tasks milestone Jun 19, 2023
@carlwilson
Copy link
Member

Hackathon 2023 Review: Selected, @darrendignam @sromkey this should be fixable from here I think.

@carlwilson carlwilson removed their assignment Jul 17, 2023
@carlwilson carlwilson modified the milestones: OPF Hackathon 2023 Tasks, v1.8 Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A product defect that needs fixing OAG Issues to be discussed by OPF Advisory Group
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants