copy_helper_opts.read_args_from_stdin code means that cat retrieve-files | gsutil cp -I . -L log != cat retrieve-files | gsutil cp -L log -I . #1785

Open
jsoref opened this issue May 27, 2024 · 0 comments


jsoref commented May 27, 2024

  -I             Use ``stdin`` to specify a list of files or objects to copy. You can use
                 gsutil in a pipeline to upload or download objects as generated by a program.
                 For example:

                   cat filelist | gsutil -m cp -I gs://my-bucket

                 where the output of ``cat filelist`` is a one-per-line list of
                 files, cloud URLs, and wildcards of files and cloud URLs.
  -L <file>      Outputs a manifest log file with detailed information about
                 each item that was copied. This manifest contains the following
                 information for each item:

                 - Source path.
                 - Destination path.
                 - Source size.
                 - Bytes transferred.
                 - MD5 hash.
                 - Transfer start time and date in UTC and ISO 8601 format.
                 - Transfer completion time and date in UTC and ISO 8601 format.
                 - Upload id, if a resumable upload was performed.
                 - Final result of the attempted transfer, either success or failure.
                 - Failure details, if any.

                 If the log file already exists, gsutil uses the file as an
                 input to the copy process, and appends log items to
                 the existing file. Objects that are marked in the
                 existing log file as having been successfully copied or
                 skipped are ignored. Objects without entries are
                 copied and ones previously marked as unsuccessful are
                 retried. This option can be used in conjunction with the ``-c`` option to
                 build a script that copies a large number of objects reliably,
                 using a bash script like the following:

                   until gsutil cp -c -L cp.log -r ./dir gs://bucket; do
                     sleep 1
                   done

                 The -c option enables copying to continue after failures
                 occur, and the -L option allows gsutil to pick up where it
                 left off without duplicating work. The loop continues
                 running as long as gsutil exits with a non-zero status. A non-zero
                 status indicates there was at least one failure during the copy
                 operation.

                 NOTE: If you are synchronizing the contents of a
                 directory and a bucket, or the contents of two buckets, see
                 "gsutil help rsync".

Nothing here appears to say you must place -L ... before -I ..., but the condition below forces it:

gsutil/gslib/commands/cp.py

Lines 1066 to 1068 in a32d8f5

if copy_helper_opts.read_args_from_stdin:
  if len(self.args) != 1:
    raise CommandException('Source URLs cannot be specified with -I option')

Run in one order, the command works fine; in the other, it fails with:

CommandException: Source URLs cannot be specified with -I option
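
A minimal sketch of why the order could matter, assuming gsutil parses cp's options with classic getopt-style parsing that stops at the first non-option argument (I have not confirmed this against the gsutil source). The option spec "IL:" and the helper names parse_cp_args and check_stdin_mode are hypothetical simplifications, not gsutil's actual code, and ValueError stands in for CommandException:

```python
import getopt


def parse_cp_args(argv):
    """Parse a simplified cp argument list into (opts, remaining_args)."""
    # "IL:" is a hypothetical, reduced option spec:
    # -I takes no value, -L takes one value (the manifest log path).
    # getopt.getopt stops scanning at the first non-option argument.
    return getopt.getopt(argv, "IL:")


def check_stdin_mode(argv):
    """Mimic the quoted check: with -I, exactly one positional arg may remain."""
    opts, args = parse_cp_args(argv)
    read_args_from_stdin = any(flag == "-I" for flag, _ in opts)
    if read_args_from_stdin and len(args) != 1:
        # ValueError is a stand-in for gsutil's CommandException.
        raise ValueError("Source URLs cannot be specified with -I option")
    return opts, args


if __name__ == "__main__":
    # -L log -I .  : both options are consumed, only "." remains.
    print(check_stdin_mode(["-L", "log", "-I", "."]))

    # -I . -L log  : parsing stops at ".", so "-L" and "log" are left
    # over as positional arguments and the length check fails.
    try:
        check_stdin_mode(["-I", ".", "-L", "log"])
    except ValueError as err:
        print("error:", err)
```

Under that assumption, the check does not really require -L before -I; it requires every option to precede the destination, because anything after the first positional argument is counted as a source URL.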
