Change load_dataset cache dir default #1530
I think @jakevdp just copied what scikit-learn does. I basically agree that it's not the best option. But I don't really have a sense for how widespread or intuitive the "freedesktop specification" would be (I've never heard of it). |
On Linux I believe it is widely used; at least the major desktop environments follow it. On my machine the On other platforms there are probably similar directories; a quick search for Windows brought me to a ticket in the Electron project where a similar problem is discussed. It indicates that on Windows |
The example datasets are not very large so I would like to avoid too much complexity. I'm open to changing where the data goes, but I would like to pick a single alternative. |
I don't think there is a one-size-fits-all solution, since seaborn is used on Windows, Mac OSX and Linux, each of which has different conventions. The only one I can think of is to use a hidden folder. Of course, the datasets are not large, but imho that doesn't really affect the problem, since it's more about a directory being created without the user noticing. Also, I believe that the added complexity isn't really too much. I had a look at how Qt handles this, since they specifically aim to ease multiplatform development. They provide a utility which returns a proper path for all platforms. It could be implemented like this:
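A minimal sketch of what such a helper might look like in Python. This is not the code originally posted, nor Qt's or seaborn's implementation. The name `default_cache_dir`, its parameters, and the `Cache` subfolder on Windows are assumptions made for illustration; the base locations (`%LOCALAPPDATA%`, `~/Library/Caches`, `$XDG_CACHE_HOME` or `~/.cache`) follow the usual per-platform conventions.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(app_name="seaborn", platform=None, environ=None, home=None):
    """Return a per-user cache directory for app_name (hypothetical helper).

    platform, environ and home can be overridden, mainly for testing;
    by default they come from sys.platform, os.environ and Path.home().
    """
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    if platform.startswith("win"):
        # Windows: per-user, non-roaming application data.
        base = environ.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        # The "Cache" subfolder is an assumption, not a platform requirement.
        return Path(base) / app_name / "Cache"
    if platform == "darwin":
        # macOS: conventional location for regenerable data.
        return home / "Library" / "Caches" / app_name
    # Linux and other Unix-likes: XDG cache directory, hidden by default.
    base = environ.get("XDG_CACHE_HOME") or str(home / ".cache")
    return Path(base) / app_name
```

The function only computes a path; creating the directory (for example with `Path.mkdir(parents=True, exist_ok=True)`) would be left to the code that actually writes the cache.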
|
If there isn't, it would weigh in favor of keeping the status quo.
The point is, if the datasets are not large, it matters less whether a user who is looking to clear space on their computer can easily find and remove them. So it's ok if they end up in a somewhat obscure place. |
For me, the argument is less about space that needs to be cleared and more that it is very easy to inadvertently create these directories in a very visible location. This is how I noticed the behavior in the first place and why I reported it. I believe it is relevant that for the major platforms there are official guidelines on where to put such data (MSDN, [Apple Developer](https://developer.apple.com/icloud/documentation/data-storage/index.html), Linux). I get that for such a minor feature one does not want to add too much code that might fail in the future; however, I feel that when targeting a platform one should stick to the recommended best practices.
If that isn't an option however, might using hidden files be an alternative, or disabling caching by default? EDIT: Added alternative. |
Yes, we're talking past each other. I agree that defaulting to $HOME is suboptimal. I'm giving an argument for using a consistent, possibly obscure (even hidden) location across file systems rather than trying to pick the "correct" place for every file system. I think |
Afaik using hidden files with Windows involves some magic with setting file attributes. Which other non-unix systems are there? I currently can't really think of any.
|
If you're okay with another dependency, there are things like appdirs which take care of following all the cross-platform standards for you. |
The cache locations So I’d argue those are the only correct locations for cache data on the respective OSs. |
It looks like |
You should use
|
It seems like appdirs solves the very specific problem here so it’s not obvious that stalled development is a problem. Also, flying-sheep is now blocked for being rude. |
The current default for `seaborn.load_dataset`'s `data_home` directory, which is used for caching, is `$HOME/seaborn-data`. This leads to unwanted pollution of the user's home directory. The calls used in many seaborn examples pass only the dataset's name; from this alone, it is unclear that by default a cache directory will be created in such a prominent location.

I believe a good alternative location would be `$XDG_CACHE_HOME`, which according to the freedesktop specification is supposed to be used for user-specific non-essential (cached) data (see specification). If the environment variable is unset, a default of `$HOME/.cache` is recommended.
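To make the fallback rule concrete, here is a small sketch of how the lookup could work. The variable defined by the freedesktop specification is `XDG_CACHE_HOME`; the spec says to fall back to `$HOME/.cache` when it is unset or empty, and to ignore relative paths. The function names and the `seaborn-data` subdirectory are assumptions for illustration, not seaborn's actual behavior.

```python
import os
from pathlib import Path


def xdg_cache_home(environ=None, home=None):
    """Resolve the XDG cache base directory (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    value = environ.get("XDG_CACHE_HOME", "")
    # Unset, empty, or relative values are ignored in favor of the default.
    if value and os.path.isabs(value):
        return Path(value)
    return home / ".cache"


def seaborn_cache_dir(environ=None, home=None):
    """Cache location for example datasets under the XDG base (assumed name)."""
    return xdg_cache_home(environ, home) / "seaborn-data"
```

Until the default changes, the `data_home` argument mentioned above can be pointed at a path like this to keep the cache out of `$HOME`.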