DM-46776: Add zip creation and ingest #1105
fd8d31d to 4eecd6a
c932f1c to 03874b4
@timj, ticket says that
@andy-slac it's in the pipe_base PR: lsst/pipe_base#452
Looks good, small bunch of comments/questions. I tried to understand as much as I could, but some pieces are very new to me (or I forgot much of it already).
python/lsst/daf/butler/datastores/file_datastore/retrieve_artifacts.py
41d08e1 to 37090e1
6a2484d to 9449aea
For a chained datastore, retrieveArtifacts was checking each dataset for file existence in a loop, which can be really slow on GCS. Instead, use knows_these and assume the datastore does have the file if it has the ref.
Refactor some of the zip index code as part of this.
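The idea of replacing per-dataset existence checks with one bulk knows_these call per child datastore can be sketched roughly as follows. This is only an illustration: ToyDatastore, partition_refs, and the use of plain strings as refs are hypothetical stand-ins, not the actual daf_butler classes or the code in this PR.

```python
from collections.abc import Iterable


class ToyDatastore:
    """Hypothetical stand-in for a child datastore; not the real butler class."""

    def __init__(self, known: set[str]) -> None:
        self._known = known
        self.remote_checks = 0  # counts simulated per-file existence calls

    def exists(self, ref: str) -> bool:
        # Simulates a slow per-artifact check (for example one remote request).
        self.remote_checks += 1
        return ref in self._known

    def knows_these(self, refs: Iterable[str]) -> dict[str, bool]:
        # Answers from the datastore's own records, with no per-file remote call.
        return {ref: ref in self._known for ref in refs}


def partition_refs(
    datastores: list[ToyDatastore], refs: list[str]
) -> dict[int, list[str]]:
    """Assign each ref to the first datastore that knows about it."""
    assigned: dict[int, list[str]] = {}
    pending = list(refs)
    for index, datastore in enumerate(datastores):
        # One bulk call per datastore instead of one exists() call per ref.
        known = datastore.knows_these(pending)
        assigned[index] = [ref for ref in pending if known[ref]]
        pending = [ref for ref in pending if not known[ref]]
    if pending:
        raise FileNotFoundError(f"No datastore knows about: {pending}")
    return assigned
```

The trade-off in this sketch is the one the commit message describes: a ref that is known to a datastore is assumed to have its file present, so a missing artifact would only surface later, at retrieval time.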
No longer use list[DatasetRef] but use SerializedDatasetRefContainer
This should be fine since we attempt to make the file names unique. It also works around an issue in ResourcePath where a "#" character is treated as part of a directory path if found in the non-file part of the path.
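As a rough illustration of why unique file names make directory paths unnecessary inside a zip, the sketch below writes artifacts using only their base names and refuses duplicates. It assumes the change in question drops the directory part of each path; write_flat_zip and the example paths are hypothetical and only the standard-library zipfile module is used, not the butler code.

```python
import io
import posixpath
import zipfile


def write_flat_zip(artifacts: dict[str, bytes]) -> bytes:
    """Write artifacts into an in-memory zip using only their file names."""
    buffer = io.BytesIO()
    seen: set[str] = set()
    with zipfile.ZipFile(buffer, "w") as zf:
        for path, payload in artifacts.items():
            # Drop the directory part, so a "#" in a directory name never
            # reaches the archive member name.
            name = posixpath.basename(path)
            if name in seen:
                raise ValueError(f"Duplicate file name in zip: {name}")
            seen.add(name)
            zf.writestr(name, payload)
    return buffer.getvalue()
```

For example, `write_flat_zip({"run#1/a/x.fits": b"x", "run#1/b/y.fits": b"y"})` produces an archive whose members are `x.fits` and `y.fits`, while two inputs sharing a base name raise `ValueError` instead of silently overwriting each other.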
For now rejects if any refs are rejected.
9449aea to 1b157d3
…rtifacts This allows Zip retrieval to work in QBB mode.
This makes it easier to work out which file comes from which dataset ref and guarantees each file will be unique without directory path.
This does not fix the problem of retrieve-artifacts retrieving Zip files and ending up with a bad index.
Co-authored-by: Andy Salnikov <salnikov@slac.stanford.edu>
And remove some duplicated code that was writing the index a second time.
This allows pydantic to drop them from the JSON.
Which is rare with the new query system but is possible.
There is nothing to be gained by including all the dimensions other than taking extra space.
This makes it much more explicit what the parameters mean.
The mapping included all the paths so no reason to return the paths separately.
This is important to ensure that the same prefix is used each time if the artifact has multiple refs associated with it.
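One way to get a stable prefix when an artifact is associated with several refs is to pick the prefix by a rule that ignores the order in which the refs arrive. The sketch below assumes the prefix is derived from a dataset UUID and simply takes the smallest one; artifact_prefix, prefixed_name, the 8-character length, and the "-" separator are all hypothetical choices for illustration and may differ from what the PR does.

```python
import uuid


def artifact_prefix(ref_ids: list[uuid.UUID], length: int = 8) -> str:
    """Return a prefix that does not depend on the order of ref_ids."""
    if not ref_ids:
        raise ValueError("An artifact needs at least one associated ref")
    # min() picks the same ID regardless of the order the refs arrive in.
    chosen = min(ref_ids)
    return chosen.hex[:length]


def prefixed_name(ref_ids: list[uuid.UUID], file_name: str) -> str:
    """Build a file name carrying the stable prefix for this artifact."""
    return f"{artifact_prefix(ref_ids)}-{file_name}"
```

With this rule, calling `prefixed_name` with the same set of IDs in any order yields the same file name, which is the property the commit message asks for.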
1b157d3 to c510c7f
Depends on lsst/resources#98
Checklist
doc/changes
configs/old_dimensions