Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unset TZDIR will cause crash without /usr/share/zoneinfo #1577

Closed
scdub opened this issue Aug 10, 2023 · 9 comments
Closed

Unset TZDIR will cause crash without /usr/share/zoneinfo #1577

scdub opened this issue Aug 10, 2023 · 9 comments
Assignees
Labels

Comments

@scdub
Copy link

scdub commented Aug 10, 2023

When using ORC on a platform that doesn't have /usr/share/zoneinfo defined nor the TZDIR environment variable set, the library will crash. Specifically on Windows, the DEFAULT_TZDIR location is an invalid path, but even on Linux the file is not present on minimal images like the default alpine container. It looks like the TZDATA is downloaded during the build and is referenced during the C++ tests of ORC, but this change won't affect the runtime DLLs. Perhaps a relative location or option to configure a TZDATA location would make sense? In my case working with conda, I patched the routine to locate the tzdata that is included within the conda environment, but that wouldn't be applicable to all use cases.

@dongjoon-hyun
Copy link
Member

Thank you for reporting, @scdub . Do you mean ORC C++ library or ORC Java library?

@scdub
Copy link
Author

scdub commented Aug 11, 2023

@dongjoon-hyun Sorry for not being more specific — the crash happens in the ORC C++ library. When the readFile call is made in getTimezoneByFilename, a crash will occur on Windows where /usr/share/zoneinfo/UTC is an invalid path:

image

@dongjoon-hyun
Copy link
Member

Thank you for the details, @scdub .

cc @wgtmac

@wgtmac
Copy link
Member

wgtmac commented Aug 11, 2023

Sure, let me take a look.

@wgtmac wgtmac self-assigned this Aug 11, 2023
@wgtmac
Copy link
Member

wgtmac commented Aug 12, 2023

@scdub After a quick check, it seems that either TZDIR or /usr/share/zoneinfo to be set is a must to run the ORC C++ library. From the WIN32 test, it downloads TZDATA and sets TZDIR=${TZDATA_DIR}. Could you set TZDIR to point to TZDATA in the enviroment?

@scdub
Copy link
Author

scdub commented Aug 14, 2023

@wgtmac Yes, I can confirm it will work if the TZDIR is set and extracted, as is the case with the tests themselves. However, the TZDATA isn't included in the steps for building the distribution, and relying on an environment variable has other challenges. I believe that the code should provide a clear error message rather than crash if the current design is intended. From my perspective it would be nice to not be reliant on a POSIX time interface or include one in the distribution output with instructions about the requirement to configure TZDIR or ensure the host operation system provides /usr/share/zoneinfo for e.g. barebones Docker containers.

@wgtmac
Copy link
Member

wgtmac commented Aug 16, 2023

It would be a lot of maintenance burden if the ORC library includes a TZDB in its distribution output. I prefer providing a better error message when the local tzdb is not found.

BTW, C++20 has standardized std::chrono::tzdb to represent a copy of the IANA TZDB. As we are still on C++17 yet, we can switch to it once we make the C++20 transition.

@wgtmac
Copy link
Member

wgtmac commented Aug 16, 2023

Proposed fix on the exception message: #1587

@dongjoon-hyun
Copy link
Member

I hope the proposed fix can mitigate the situation in Apache ORC 2.0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants