Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to read timezone file from $CONDA_PREFIX dir before trying the system dir #15214

Closed
wants to merge 5 commits into from

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Mar 2, 2024

Description

Closes #12926

Modified the logic when opening TZ files to try multiple paths before giving up. Without user-specified path, we try $CONDA_PREFIX/share/zoneinfo and then /usr/share/zoneinfo/.

Testing is limited to consistency with previous logic. We don't have tests for the $CONDA_PREFIX path.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added bug Something isn't working non-breaking Non-breaking change labels Mar 2, 2024
@vuule vuule self-assigned this Mar 2, 2024
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 2, 2024
@bdice
Copy link
Contributor

bdice commented Mar 4, 2024

I'd love to know if this has any impact on #7314. 🤔

@vuule vuule changed the title Bug conda prefix tz file Try to read timezone file from $CONDA_PREFIX dir before trying the system dir Mar 4, 2024
@vyasr
Copy link
Contributor

vyasr commented May 17, 2024

@vuule is this PR waiting on anything? Just noticed it from #12926.

For testing, one (somewhat drastic) option would be to run a separate test where you delete the system timezone file and then try running the tests. You'd want to isolate that test run so that you don't bork anything else that actually relies on that file existing, though, so it may not be worth the effort.

@vuule
Copy link
Contributor Author

vuule commented May 17, 2024

@vuule is this PR waiting on anything? Just noticed it from #12926.

Once I saw the CI failing with this change I got really discouraged to mess with this. I don't know what damage I can do on random systems by changing the default directory from which we read the timezone info.

@vyasr
Copy link
Contributor

vyasr commented May 20, 2024

This is... very weird. You seem to have gotten extraordinarily unlucky because all of the failures that I can see appear to be either unrelated issues that have since been fixed, or connectivity issues. The number of network issues is the especially weird part. Let's try again...

@vuule
Copy link
Contributor Author

vuule commented May 20, 2024

This is... very weird. You seem to have gotten extraordinarily unlucky because all of the failures that I can see appear to be either unrelated issues that have since been fixed, or connectivity issues. The number of network issues is the especially weird part. Let's try again...

Unless I'm misremembering, there were also ORC test failures with the change. Let's see :)

@vyasr
Copy link
Contributor

vyasr commented May 20, 2024

Ah OK I looked at like 4 different runs and didn't see anything that looked real, but there were multiple merges and reruns that I could have missed it. So we'll see this time :)

@vuule
Copy link
Contributor Author

vuule commented May 21, 2024

There it is
FAILED tests/test_orc.py::test_orc_reader_apache_negative_timestamp - AssertionError: DataFrame.iloc[:, 0] (column name="v") are different

@vyasr
Copy link
Contributor

vyasr commented May 21, 2024

From reading the code it looks like this PR changes things to make the conda-installed files the highest priority. Perhaps what we want is to use the conda files as backup if there isn't a system installation? It looks like you chose this order because of me, so sorry about that! I was going off of how we normally deal with conda, but maybe for timezones this isn't safe.

@karthikeyann @bdice WDYT? I've spent a lot less time on the tz work to know if this is a safe thing to do. If it's not, then we should just close the associated issue and this PR.

@bdice
Copy link
Contributor

bdice commented Jun 3, 2024

The most-correct answer is probably to respect $TZDIR if it is set, according to ogham/exa#856.

If we respected $TZDIR, it would also give the conda package tzdata a way to say "use this timezone data source!" without us having to explicitly reference $CONDA_PREFIX. However, this issue for tzdata would need to be resolved first: conda-forge/tzdata-feedstock#9

Would it be helpful to adjust this error message:

RuntimeError: CUDF failure at: /opt/conda/conda-bld/work/cpp/src/io/orc/timezone.cpp:138: Failed to open the timezone file.

... to say Failed to open the timezone file. Is tzdata installed on the system?

Given the complexity of this topic and the relatively low number of issue reports we've seen, I'm inclined to say we should continue to rely on only system-installed timezone information. I would vote to close this PR and the corresponding issue. We could revisit this in this future if it is misaligned with how other conda packages handle things.

@vuule
Copy link
Contributor Author

vuule commented Jun 3, 2024

@bdice should I modify the PR to try $TZDIR (if set) before the system dir?

@bdice
Copy link
Contributor

bdice commented Jun 4, 2024

We could do that. If we do, I would want to make sure of two things:

  • Setting $TZDIR to $CONDA_PREFIX/share/zoneinfo works as expected and uses data from the tzinfo package
  • Adding $TZDIR support doesn't make libcudf have subtle/unexpected behavior changes compared to pandas or Spark -- i.e. we should have some sense of how other libraries that we work with obey or ignore $TZDIR

@vyasr
Copy link
Contributor

vyasr commented Sep 26, 2024

Closing based on the above comment

Given the complexity of this topic and the relatively low number of issue reports we've seen, I'm inclined to say we should continue to rely on only system-installed timezone information. I would vote to close this PR and the corresponding issue. We could revisit this in this future if it is misaligned with how other conda packages handle things.

We can revisit if necessary in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] Issues while loading TimeStamp columns
3 participants