Should S3Path stop using the FilePathsBase interface? #226
Comments
I think it would make sense for an S3Path to roughly mimic the AWS console commands,
(I don't think we need I think they should match Base's semantics though, not the AWS console ones, e.g. And then some level of path-manipulation operations could make sense too (manipulating S3Path objects, i.e. not as an API to interact with s3):
But they need to be pretty careful to follow s3's semantics for prefixes. And I think these functions don't really make sense:
And these are the tricky ones:
which I don't know how to deal with. |
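A small illustration of the prefix caveat raised above. This is only a sketch using plain strings in memory, not the AWSS3.jl API; `bucket_keys`, `list_prefix`, and `list_dir` are hypothetical names invented for the example. S3 keys are flat strings, so a prefix listing is a string match rather than a directory lookup:

```julia
# Hypothetical in-memory stand-in for a bucket: flat keys, no real directories.
bucket_keys = ["data/a.csv", "data/sub/b.csv", "database/c.csv"]

# A raw prefix listing is a plain string match, so "data" also matches "database/...".
list_prefix(ks, prefix) = filter(k -> startswith(k, prefix), ks)

# A directory-like listing has to include the trailing delimiter in the prefix.
list_dir(ks, dir) = list_prefix(ks, endswith(dir, "/") ? dir : dir * "/")

raw = list_prefix(bucket_keys, "data")
dir = list_dir(bucket_keys, "data")
```

Here `raw` contains all three keys while `dir` contains only the two under `data/`, which is the kind of difference that path-manipulation functions on an S3Path would need to respect.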
...our team has lost a lot of time collectively to [diagnosing/working around/bug fixing/knowledge sharing] issues that boil down to
and
so I'm strongly in favor of this proposal. |
If I'm understanding this correctly, the problem isn't with trying to have an overlapping API for various filesystem operations, but rather with having fallbacks that don't work with the special cases introduced in this package and S3? Back when the API was initially designed around a type hierarchy, the assumption was that you'd just overload whatever didn't work for your particular type. If you think That being said, even if FilePathsBase.jl didn't make any fallback assumption regarding the interface, any operations that extend the Base.Filesystem or FilePathsBase.jl interface would need to provide consistent behaviour, otherwise we can't really write generic application code (same issue as we're finding with the default fallbacks). Perhaps the safest path forward is to introduce |
The issue with overloading all the methods that don’t work is that we don’t know which ones those are and tend to find out at bad times or in code that’s hard to quickly iterate on for unrelated reasons. That’s why I think it’s better to start with no fallbacks for AWSS3 and implement things as-needed with a particular eye to s3’s semantics. This is particular to s3 because the semantics are really different than other file systems, so these problems come up more. We could add a bunch of overloads with MethodErrors instead but I don’t really see the advantage of that; sounds annoying to maintain and update as FilePathsBase gets new functions. |
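To make the two styles being discussed concrete, here is a minimal sketch in plain Julia. All type and function names (`AbstractDemoPath`, `FlatDemoPath`, `LocalDemoPath`, `has_permissions`, `supports_chmod`) are hypothetical and are not part of FilePathsBase.jl or AWSS3.jl. With a generic fallback, an operation quietly "works" for every subtype; without one, an unsupported call fails immediately with a `MethodError`:

```julia
abstract type AbstractDemoPath end   # hypothetical shared supertype

struct FlatDemoPath <: AbstractDemoPath   # S3-like: a single flat key
    key::String
end

struct LocalDemoPath <: AbstractDemoPath  # local-file-system-like
    path::String
end

# Style 1: generic fallback. Every subtype inherits it, meaningful or not.
has_permissions(::AbstractDemoPath) = true

# Style 2: no fallback. Only types that explicitly opt in get a method.
supports_chmod(::LocalDemoPath) = true

p = FlatDemoPath("bucket/key")

inherited = has_permissions(p)   # succeeds via the fallback, silently

failed_loudly = try
    supports_chmod(p)            # no method for FlatDemoPath
    false
catch err
    err isa MethodError          # the unsupported call is rejected up front
end
```

The second style trades convenience for an early, explicit error, which is the behaviour argued for in the comment above.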
That's fine, but I think my recommendation still holds that you shouldn't plan to extend Base.Filesystem functions if you plan to have functions that behave fundamentally differently (e.g., changing what
That doesn't happen very often, but sure. |
I don't really get this point, do you mean returning file = first(readdir(path; join=true))
bytes = read(file) whether or not I would argue that returning |
urgh, rofinn/FilePathsBase.jl#152 broke things for AWSS3 since it assumes
Workaround: `Base.parse(::Type{P}, path::P; kwargs...) where {P <: S3Path} = path`
edit: turns out this was already filed here: #227 |
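For readers unfamiliar with the shape of that workaround, the sketch below reproduces it on a hypothetical stand-in type (`DemoPath` is invented here and is not `S3Path`). It only shows that the method makes `parse` a no-op when the argument is already a path of the requested type; it makes no claim about what the upstream change actually assumed:

```julia
# Hypothetical stand-in for a path type; not the real S3Path.
struct DemoPath
    key::String
end

# Ordinary string -> path parsing, as a path type would normally provide.
Base.parse(::Type{DemoPath}, s::AbstractString; kwargs...) = DemoPath(String(s))

# Same shape as the quoted workaround: parsing an existing path returns it unchanged.
Base.parse(::Type{P}, path::P; kwargs...) where {P <: DemoPath} = path

p = parse(DemoPath, "bucket/key")
q = parse(DemoPath, p)   # dispatches to the pass-through method
```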
Yeah, cause |
Ok, well I think that's breaking the interface presented by Base, and moreover adding the |
I don't think it's the responsibility of the FilePathsBase package to adjust to ad hoc API decisions within |
Ad hoc API decisions? You approved that PR back in January and then 10 months later decided to make a different API decision in FilePathsBase... |
I'm also referring to various changes to how configs, constructors and delimiters are handled over the past couple years, not just that particular PR. I approved cause there was an explicit use-case that needed to be solved and there wasn't anything technically wrong with the solution within that context. I noticed an issue that I missed during code review which made generalizing it problematic. I realize that's frustrating, but it happens. |
As AWSS3.jl is very sensitive to FilePathsBase.jl changes we should probably use a fixed version of FilePathsBase.jl or a fixed version range. At least this would be a good short term fix but we'd need extensive testing in AWSS3.jl to ensure we can update this range safely |
Would it be possible to use |
Yeah, I can just tag a 1.0 release of FilePathsBase.jl. Any new features (to support new julia releases) would then be minor releases that you can choose to accept or not.
Sure, but that would be yet another breaking release before 1.0. Why can't
|
My take on this is that the problem is that the underlying S3 API is a mess and contains some half-assed features without proper support (e.g. versioning). I suppose there's a good argument to be made that it was just a key-value store so nobody should have been treating it like a file system, but the fact is that all sorts of things that commonly make use of S3 assume it can be used that way, and this has certainly made its way into the API itself by now. I therefore cannot see the benefit in dropping FilePathsBase, as we all know it's going to be used like a file system and it's just going to wind up implementing all the methods anyway. The practical effect will be that we will have a FilePathsBase look-alike which is far less compatible than it could be. As @rofinn pointed out, even if every method from FilePathsBase had to be re-implemented, it seems preferable to do so rather than abandon it. That said, I think this package has exposed a number of issues that suggest some reformation of FilePathsBase is in order. It seems somewhat overzealous in assuming that everything which looks like a file system must implement everything required for POSIX compatibility. Certainly one could implement a file system without any concept of permissions, for example. It's been a while since I've looked at FilePathsBase, but I think that some traits-based interface for indicating different file system capabilities on the part of implementations is in order. |
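As a rough idea of what a traits-based capability interface could look like, here is a minimal sketch of the usual Julia trait-dispatch pattern. Every name in it (`PermissionSupport`, `HasPermissions`, `NoPermissions`, `PosixDemoPath`, `ObjectStoreDemoPath`, `describe_mode`) is hypothetical and chosen for illustration; none of it is taken from FilePathsBase.jl or from the draft PR mentioned in this thread:

```julia
# Trait describing whether a path type has any notion of permissions.
abstract type PermissionSupport end
struct HasPermissions <: PermissionSupport end
struct NoPermissions  <: PermissionSupport end

# Two hypothetical path types with different capabilities.
struct PosixDemoPath
    path::String
end
struct ObjectStoreDemoPath
    key::String
end

# Default: assume the capability is absent; implementations opt in explicitly.
PermissionSupport(::Type) = NoPermissions()
PermissionSupport(::Type{PosixDemoPath}) = HasPermissions()

# Generic entry point dispatches on the trait rather than on a type hierarchy.
describe_mode(p::T) where {T} = describe_mode(PermissionSupport(T), p)
describe_mode(::HasPermissions, p) = "mode bits available for $(p.path)"
describe_mode(::NoPermissions, p) = "permissions not supported"

posix_result = describe_mode(PosixDemoPath("/tmp/x"))
store_result = describe_mode(ObjectStoreDemoPath("bucket/key"))
```

With this shape, a path type without permissions does not need to inherit or stub out POSIX-specific methods; generic code can query the trait and branch on it.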
I have made a speculative draft PR outlining what I think a more general FilePathsBase might look like. I have done very little, but I was hoping that interested parties in this thread could comment on the approach before I put any real effort into it. Please let me know your thoughts. |
232: Set FilePathsBase compatibility to 0.9.11 - 0.9.15 r=omus a=omus Fixes #227 by avoiding FilePathsBase versions that require `readdir` with `join=true` to return strings. As changing the return type could be breaking for some users, we'll just limit the supported versions of FilePathsBase for this release. For AWSS3.jl version 0.10 we'll correct that problem in this package and start requiring a minimum FilePathsBase version of 0.9.16. At first glance this appears to address #226, but it does not, as we are just temporarily setting an upper bound. Also, we cannot use Pkg version ranges with a hyphen, as that requires a minimum Julia version of 1.4. Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
Another suggestion might be to strip this package down to fundamental functionality and create a separate package If I wound up being the steward of S3Paths.jl (which I'd be willing to do), I'd do the following;
|
I do worry that splitting this up further might just make the compatibility issues worse. Part of the problem is that FilePathsBase.jl doesn't have any control over how AWSS3 overrides functionality, apart from the provided test suite. If we were to introduce a 3rd package then it would be one more point of failure. I do think your suggestion of moving towards a more trait-based approach could help reduce those issues to some extent; I just haven't gotten around to reviewing your proposal yet. I wonder if it'd be simpler for an S3Paths.jl package to just extend AWS.jl directly and ignore some of the assumptions/conventions that AWSS3.jl is making?
You mean drop support for object versioning? Yeah, I agree, I don't think AWS ever fully fleshed that out. I always kinda thought versions should be accessible via a more specific convention (ie: |
Versioning is the worst... I hate being able to recover something I accidentally overwrote. |
That would be wonderful, but unfortunately the reality is that AWS.jl is very thin, and the AWS API is not very good, so sadly I don't think there's any way around wrapping it.
I wasn't addressing the merits of versioning, only that the AWS API treats it as a "second class" feature, not to mention that it's not a concept that's built into FilePathsBase. |
I think different folks have different needs. Recently, I've been advocating internally that we should write our own s3 utilities on top of AWS.jl, since that already provides a full wrapper of the AWS API, and then drop AWSS3. I see Beacon's needs (or at least that of my team) as...
Currently,
Patch releases of dependencies must not knowingly break the code. It's just not viable to have issues like that at the base of a software stack in my opinion. |
For me portability of code between S3 and local file systems is a huge part of why AWSS3.jl is so useful. There are a number of things such as parquet which sort of force you to treat them the same way, and it can be useful for "microservices" which treat S3 as a shared storage resource. I agree that pretending S3 paths are real file system paths is not ideal, but too many things already do that for me to embrace that. |
The purpose of the path-type for me has always been to support the creation of generic functions which require something path-like and to allow for specialization when needed. Over time FilePathsBase.jl has become more specific to file systems while AWSS3.jl's I think the best path forward would be some kind of trait system where Most likely how we actually proceed here will be to create new |
It provides a lot of methods for free, but an S3Path is not like a file system path in a lot of ways, and the abstractions break down in frustrating ways. I think it might be better for AWSS3 to define its own methods for most things, and have it such that things that are not explicitly supported by AWSS3 are not supported at all, rather than supported in a somewhat broken way (which you might not notice at first but can bite you later; for example `cp`, which works until you have nested directories, ref #225).
Inspired by rofinn/FilePathsBase.jl#128 (comment)