feat: Allow gcs paths without prefix #16
Merged
Context

I want to use Google Cloud Storage paths that do not have a prefix, like `gs://gwas_catalog_data/manifests/`, with the `GCSPath` object. This will be required for data relocation within the `genetics_etl` DAG.

Currently, the `GCSPath` regex requires a path to have `prefix` and `filename` groups, which prevents this class from handling paths without a prefix or paths that point directly to directories.

Implementation

This PR tries to solve this by changing the regex pattern to capture `path` instead of `filename` and `prefix`, and by extracting the `filename` and `prefix` parsing logic into the `GCSPath.segments` method.

- `POSIX_PATH_PATTERN` to capture `path` instead of `prefix` and `filename`.
- `GCSPath.segments` and `GCSPath.path` methods to preserve backwards compatibility.
- `GCSPath` object.
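
A minimal sketch of the approach described above, for illustration only. The real `POSIX_PATH_PATTERN` and `GCSPath` are not shown in this description, so the pattern, the `GCSPathSketch` class name, and the shape of the value returned by `segments` are all assumptions. It captures everything after the bucket as a single `path` group and derives `prefix` and `filename` afterwards.

```python
import re

# Hypothetical pattern, NOT the project's actual POSIX_PATH_PATTERN.
# The protocol is optional, the bucket is required, and everything after
# the bucket is captured as one optional `path` group.
PATH_PATTERN = re.compile(
    r"^(?:(?P<protocol>gs)://)?(?P<bucket>[^/:]+)(?:/(?P<path>.*))?$"
)


class GCSPathSketch:
    """Illustrative stand-in for GCSPath (the name is an assumption)."""

    def __init__(self, gcs_path: str) -> None:
        match = PATH_PATTERN.match(gcs_path)
        if match is None:
            raise ValueError(f"Invalid GCS path: {gcs_path}")
        self.bucket = match.group("bucket")
        # `path` is empty when the input points at the bucket root.
        self.path = match.group("path") or ""

    @property
    def segments(self) -> dict:
        """Split `path` into `prefix` and `filename` after matching."""
        if not self.path or self.path.endswith("/"):
            # Bucket root or directory: there is no filename.
            return {"prefix": self.path.rstrip("/"), "filename": ""}
        # Text after the last "/" is the filename; the rest is the prefix.
        prefix, _, filename = self.path.rpartition("/")
        return {"prefix": prefix, "filename": filename}


if __name__ == "__main__":
    for raw in (
        "gs://gwas_catalog_data/manifests/",  # directory, no filename
        "gs://bucket/file.txt",               # file, no prefix
        "gs://bucket/dir/sub/file.txt",       # file with a prefix
    ):
        parsed = GCSPathSketch(raw)
        print(raw, parsed.bucket, parsed.segments)
```

Splitting after the match, rather than inside the regex, lets one pattern accept directories, files without a prefix, and files with a prefix, while callers that still need `prefix` and `filename` can read them from `segments`.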