Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What's the best way to override the index download directory? #32

Closed
bdklahn opened this issue Mar 4, 2024 · 17 comments
Closed

What's the best way to override the index download directory? #32

bdklahn opened this issue Mar 4, 2024 · 17 comments

Comments

@bdklahn
Copy link

bdklahn commented Mar 4, 2024

I'm working on integrating this into a service.
If I want to cache the index files to a non-user-specific shared location (e.g.), would the best way to do that be to maybe override the XDG_DATA_DIR?

XDG_DATA_DIR = Path(user_data_dir("hostile", "Bede Constantinides"))

I was tracing through the code, and it looks like this isn't exposed as a parameter, like the output dir is.
But am I missing something?

Thanks!

@bede
Copy link
Owner

bede commented Mar 5, 2024

Hi Brian,
Thanks for reporting this, which definitely should be documented.

If you set the environment variable XDG_DATA_HOME to point to your custom shared location, user_data_dir() should notice and use it instead of the OS-specific default path. A subdir named hostile will be created containing downloaded indexes. I've just verified this, and it seems to work on Linux, but not MacOS (?!), meaning I might need to report an issue to the platformdirs maintainer. But unless using MacOS, this should work. Please let me know how you get on.

The other option is to download the indexes and directly reference them using paths supplied to --index, which is more clunky.

@bdklahn
Copy link
Author

bdklahn commented Mar 5, 2024

Hi Brian, Thanks for reporting this, which definitely should be documented.

If you set the environment variable XDG_DATA_HOME to point to your custom shared location, user_data_dir() should notice and use it instead of the OS-specific default path. A subdir named hostile will be created containing downloaded indexes. I've just verified this, and it seems to work on Linux, but not MacOS (?!), meaning I might need to report an issue to the platformdirs maintainer. But unless using MacOS, this should work. Please let me know how you get on.

The other option is to download the indexes and directly reference them using paths supplied to --index, which is more clunky.

Hi Bede,

Thanks for the response. My first thought was to maybe set via os.environ (or export . . . before running).
But then I thought, since I've been testing/trying via the Python API (since our tools wrapper is written in Python), maybe I can just import and set via the hostile.utils namespace(?). If possible, that ought to bypass any platformutils calls, right? It doesn't really matter for me, since our server is Linux (not OSX), but . . .

I can play with it in a conda env, and report back.
Thanks.

@bede
Copy link
Owner

bede commented Mar 5, 2024

Ah, you're using the Python API, nice. In that case I would suggest setting the environment variable in Python:

import os
os.environ["XDG_DATA_HOME"] = "/data"

If for some reason this doesn't work, I can implement a direct override using e.g. a function argument.

@bdklahn
Copy link
Author

bdklahn commented Mar 5, 2024

Ah, you're using the Python API, nice. In that case I would suggest setting the environment variable in Python:

import os
os.environ["XDG_DATA_HOME"] = "/data"

If for some reason this doesn't work, I can implement a direct override using e.g. a function argument.

Heh. That was my first thought (and what I meant, by "override the XDG_DATA_DIR"), but I seemed to talk myself out of it, thinking that was unnecessarily going out into the calling environment, only to allow loading back in. But maybe that ends up being identical (as part of any memory the Python instance already has loaded). At least it should be simple.
Thanks.

@bede
Copy link
Owner

bede commented Mar 5, 2024

I know where you're coming from! Environment variables are not especially pretty. If for some reason this doesn't work properly, let me know and we can sort it.

@bede
Copy link
Owner

bede commented Mar 7, 2024

In the next release, the Right Way to do with will be to use the new HOSTILE_CACHE_DIR environment variable which should work cross-platform. On Linux, XDG_DATA_HOME will still be respected so long as HOSTILE_CACHE_DIR is unset.

57be277

@bdklahn
Copy link
Author

bdklahn commented Mar 7, 2024

In the next release, the Right Way to do with will be to use the new HOSTILE_CACHE_DIR environment variable which should work cross-platform. On Linux, XDG_DATA_HOME will still be respected so long as HOSTILE_CACHE_DIR is unset.

57be277

Nice! Thanks!

@bdklahn
Copy link
Author

bdklahn commented Mar 7, 2024

BTW, I think os.environ is just a dictionary. So, I think, you should just be able to just use the second argument of get to set the default fallback, rather than using if.
setdefault is another useful one.

@bdklahn
Copy link
Author

bdklahn commented Mar 14, 2024

It doesn't look like setting . . .

os.environ['XDG_DATA_HOME']=. . .

. . . works for the Python API, for me.

My script reports the correct (overridden) value of . . .

f"os.environ['XDG_DATA_HOME']=}"

. . . but . . .
15:18:25 INFO: Saved human index (~/.local/share/hostile/human-t2t-hla)

Anyway, just thought I'd follow-up and let you know that.

Thanks.

@bede
Copy link
Owner

bede commented Mar 14, 2024 via email

@bdklahn
Copy link
Author

bdklahn commented Mar 18, 2024

Thank you for this and sorry for delay, I've been travelling this week. Which OS and Python version are you using? I'll look into it.

No worries. I've been traveling some, lately, as well.

GNU/Linux with Python 3.10
We're in a Singularity container, within a Conda env.
Looks like that is CentOS 6.10 (on a CentOS 6 host, I believe). The Conda env appears to be running Python 3.10.13.
Thanks.

@bede
Copy link
Owner

bede commented Apr 10, 2024

Hi Brian,
Thank you for your patience with this, I'm home from a second trip now – an unusual month for me. I'm afraid that I am not able to replicate your issue with an Ubuntu 22 VM, and testing the latest unreleased commit mentioned earlier allows the cache location to be overriden either using the legacy now-understood-to-be-linux-only $XDG_DATA_HOME, or the replacement and cross-platform $HOSTILE_CACHE_DIR

If it's possible for you to follow the instructions in the readme to create a development build with the latest changes, I would be very interested to know if things are working for you. Nevertheless I will probably create a new release soon.

@bede
Copy link
Owner

bede commented Apr 10, 2024

This change has been released in 1.1.0

@bdklahn
Copy link
Author

bdklahn commented Apr 11, 2024

This change has been released in 1.1.0

I'll just try it out after our service container is updated to the new package.
You can close this, if you want. I'll just open another issue, if necessary.

Also, I don't think its necessary/appropriate to transfer the minor version, when bumping the major.

https://semver.org

Also, I don't know if you knew a Python dict get method has a default parameter. No need to implement if else logic manually.
https://docs.python.org/3.11/library/stdtypes.html#dict.get

Thanks for the package update!

@bede
Copy link
Owner

bede commented Apr 12, 2024

Also, I don't think its necessary/appropriate to transfer the minor version, when bumping the major.

https://semver.org

You are mistaken – I neither bumped the major version nor 'transferred' the minor version in this release.

Also, I don't know if you knew a Python dict get method has a default parameter. No need to implement if else logic manually. https://docs.python.org/3.11/library/stdtypes.html#dict.get

Yes, and Hostile relies on it in places, feel free to submit a PR if you can simplify it without failing tests.

Thanks for the package update!

A pleasure! Feel free to reopen etc. if the problem persists.

@bdklahn
Copy link
Author

bdklahn commented Apr 12, 2024

You are mistaken – I neither bumped the major version nor 'transferred' the minor version in this release.

I guess I must be. For some reason I had the version pinned to 0.1.0 in our environment.yml file (which I thought you had bumped to 1.1.0). Maybe typo on my part, or I had come across that on an old conda web page.
Sorry.
Thanks.

@bede
Copy link
Owner

bede commented Apr 22, 2024

No problem. Sometimes conda solvers install an ancient package version in trying to satisfy pinned dependency versions, so perhaps something along these lines happened. Do let me know if you spot this happening again.

@bede bede closed this as completed Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants