Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] assert_fail is called #4388

Open
DougRogers opened this issue Aug 24, 2024 · 17 comments
Open

[BUG] assert_fail is called #4388

DougRogers opened this issue Aug 24, 2024 · 17 comments

Comments

@DougRogers
Copy link

I keep getting assert_fail called in my app. I am using vcpkg, version 2.5.14.0

I don't have a complete stack track, just

OpenImageIO.dll!fmt::v11::detail::assert_fail(const char * file, int line, const char * message) Line 38 C++
OpenImageIO.dll!fmt::v11::detail::fwrite_fully(const void * ptr, unsigned __int64 count, _iobuf * stream) Line 77 C++
[Inline Frame] OpenImageIO.dll!fmt::v11::detail::print(_iobuf *) Line 1707 C++
OpenImageIO.dll!fmt::v11::vprint_buffered(_iobuf * f, fmt::v11::basic_string_view fmt, fmt::v11::basic_format_argsfmt::v11::context args) Line 1714 C++

Should I define FMT_THROW to something, so it will throw vs. crash? It is currently crashing my app.
I do not have a repro case, it occurs every 10 minutes or so in my app.

Windows 11

image

@lgritz
Copy link
Collaborator

lgritz commented Aug 24, 2024

My guess is that somewhere there is an error message or some other formatting that isn't needed very commonly and isn't encountered in the testsuite, in which the arguments don't match the format string.

You seem to be in a debugger, can you go up the stack frame and fine the format/print statement in OIIO that triggers the assert?

Also, in our fmt.h, there's a line that says # define FMT_EXCEPTIONS 0, maybe you can change that to 1 to trigger the exception rather than the terminate?

@DougRogers
Copy link
Author

It’s a release mode build with symbols in the debugger. The stack that is displayed in the image is all I have. I am trying to repro in debug mode, but I have not been able to so far.

I use vcpkg, I will see if there is a simple way to set that define.

@DougRogers
Copy link
Author

Is this behavior something that could be moved to runtime vs. compile time?

@lgritz
Copy link
Collaborator

lgritz commented Aug 27, 2024

I think that in sufficiently advanced C++ standard, a bunch of it may become constexpr and thus compile-time checkable. Short of that, I think it is possible to fix this to be compile checkable, if we can find the place where it happens (or the much larger job of fixing EVERY place where strings are formatted).

I think it would be fine to stop suppressing the fmt exceptions. I guess it would be problematic if we called it and it threw from within some API call we currently declare as noexcept, but probably not in a way that's worse than the current problem you're facing.

@DougRogers
Copy link
Author

App still crashes in release mode when the throw occurs from unhandled exception.

image

@DougRogers
Copy link
Author

More info.

buffer contents in vprint_buffered

"OpenImageIO exited with a pending error message that was never retrieved via OIIO::geterror(). This was the error message:\nCould not open file "D:\PhotosAndVideos\2024\Whale Watch\dng\_DHR9039.dng"

What is the solution?

@DougRogers
Copy link
Author

DougRogers commented Aug 28, 2024

My app is reading new files as them become available on disk. They may not be completely written when OIIO tries to read them. I have wrapped all my functions with

    if (OIIO::has_error())
    {
        OIIO::geterror();
    }

    inp = ImageInput::open(diskFilePath.toStdString(), &configSpec);

    if (OIIO::has_error())
    {
        OIIO::geterror();
        return;
    } 

@DougRogers
Copy link
Author

I don't think a support library should crash the program for any reason. There should be an error log created if an error is ignored and second is generated. If my app ignores errors, OIIO should not crash the app, IMHO.

@lgritz
Copy link
Collaborator

lgritz commented Aug 28, 2024

I completely agree.

@lgritz
Copy link
Collaborator

lgritz commented Aug 28, 2024

See #4400 for my attempt to address this:

  • No terminate
  • No actual thrown exception

-> should mean that a bad format won't crash the app.

But you will get one of those unretrieved global error messages (if you didn't already retrieve it) that will clue us into a bad format call.

In a debugger, if you set a breakpoint at the pvt::log_fmt_error function, you ought to be able to catch it in the act and figure out which format call is the one that was hitting the exception.

@DougRogers
Copy link
Author

This error is format independent, but it was a DNG file as shown above in the error message. As I mentioned, I think this is a result of reading partially written DNG files. I have another app creating hundreds of DNG files and my app (Tinta) is monitoring the hard disk for files. When new file is detected, Tinta attempts to read it to display its thumbnail.

I am assuming that the crash is caused because OIIO is given files by that are only partially created and causing repeated errors that I am not checking.

@DougRogers
Copy link
Author

Also, I have not been able to repro this failure in debug mode. I will set the breakpoint in release mode and provide an update soon.

@lgritz
Copy link
Collaborator

lgritz commented Aug 28, 2024

Have I misunderstood issue the whole time?

The first post with the stack trace was of fmt itself hitting its internal assertion, which I thought could only happen in a case where fmt had a mismatched formatting string and arguments, or something of a similar nature. I thought we were chasing a bug in OIIO where we called some format function with incorrect arguments.

I'm not sure how that would be connected to trying to open an incompete file, which I would think would fail its open(), issue an error of some kind, and then move on without having any reason to call assert_fail.

So there may be a second bug somewhere in the mix, but I stand by that PR #4400 and its attempt to avoid any calls to std::terminate just because a fmt call gives bad arguments. You're right that exiting at that point would be an unhelpful behavior, independent of anything else we may need to track down.

@DougRogers
Copy link
Author

DougRogers commented Aug 29, 2024

The overall issue is that when repeated oiio errors occur, the app is terminated. Thinking that OIIO shutdown by calling assert_fail when errors were not retrieved with geterror, I changed the definition of FMT_EXCEPTIONS to 1. This changed he call to do_throw, but that made no difference. The app still crashes since this exception was not handled.

There are several things at play here.

  1. Images are failing then open call and calling errorfmt, which sets the errors. This is expected behavior.
  2. Images are loaded in a separate thread, so when the thread terminates, ~ErrorHolder is called. This is a high frequency event.
  3. The call to print in ~ErrorHolder is failing in fwrite_fully . std::fwrite does not write all the requested data, then uses FMT_THROW, which terminates the app.

It look suspicious to me that the thread_local function ~ErrorHolder uses a global variable pvt::oiio_print_uncaught_errors for error reporting.

The error checking message is not the issue, there should just be an informational message printed.

The issues as I see them are:

A) Why is the call to std::fwrite failing?
B) When std::write fails, the app is terminated.
C) Should a global variable be use in this way by ~ErrorHolder?

inline void fwrite_fully(const void* ptr, size_t count, FILE* stream) {
  size_t written = std::fwrite(ptr, 1, count, stream);
  if (written < count)
    FMT_THROW(system_error(errno, FMT_STRING("cannot write to file")));
}
// ErrorHolder houses a string, with the addition that when it is destroyed,
// it will disgorge any un-retrieved error messages, in an effort to help
// beginning users diagnose their problems if they have forgotten to call
// geterror().
struct ErrorHolder {
    std::string error_msg;

    ~ErrorHolder()
    {
        if (!error_msg.empty() && pvt::oiio_print_uncaught_errors) {
            OIIO::print(
                "OpenImageIO exited with a pending error message that was never\n"
                "retrieved via OIIO::geterror(). This was the error message:\n{}\n",
                error_msg);
        }
    }
};

So my 'fix' works only because it is constantly reading and clearing the errors. This prevents the print message being issued in ~ErrorHolder that terminates the app.

@DougRogers
Copy link
Author

In ~ErrorHolder. is it possible that error_msg might cleared (or invalidated) by another thread after it is checked?

@lgritz
Copy link
Collaborator

lgritz commented Sep 1, 2024

static thread_local ErrorHolder error_msg_holder;

The error holder itself is thread local.

@DougRogers
Copy link
Author

Yes, but is it possible for a different thread to modify the value of error_msg when the destructor is executing?

lgritz added a commit that referenced this issue Sep 2, 2024
When fmt arguments don't match the format string, fmt will throw an
exception, or terminate if we disable exceptions (which we do). For
gcc11+, we tried intercepting these, but let's do it for all platforms,
and let's not terminate ourselves, but insted print and log the error.

But, oof, to do this properly, I needed to move some error recording
functionality from libOpenImageIO to libOpenImageIO_Util and make it
owned more properly by strutil.cpp, so that this all works even when
only using the util library. The logic isn't changing, it's just moving
over to the other library.

This all helps to address #4388

Unfortunately, the exception thrown by fmt doesn't tell us the bad
format string itself. That would have really allowed to probably zero
right in on it. But at least we know it's occurring, and one could put a
breakpoint on pvt::log_fmt_error to catch it in the act and see where
it's being called, revealing the bad line.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
zachlewis pushed a commit to zachlewis/OpenImageIO that referenced this issue Sep 16, 2024
…tion#4400)

When fmt arguments don't match the format string, fmt will throw an
exception, or terminate if we disable exceptions (which we do). For
gcc11+, we tried intercepting these, but let's do it for all platforms,
and let's not terminate ourselves, but insted print and log the error.

But, oof, to do this properly, I needed to move some error recording
functionality from libOpenImageIO to libOpenImageIO_Util and make it
owned more properly by strutil.cpp, so that this all works even when
only using the util library. The logic isn't changing, it's just moving
over to the other library.

This all helps to address AcademySoftwareFoundation#4388

Unfortunately, the exception thrown by fmt doesn't tell us the bad
format string itself. That would have really allowed to probably zero
right in on it. But at least we know it's occurring, and one could put a
breakpoint on pvt::log_fmt_error to catch it in the act and see where
it's being called, revealing the bad line.

Signed-off-by: Larry Gritz <lg@larrygritz.com>
Signed-off-by: Zach Lewis <zachcanbereached@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants