This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Add azcopy upload support and switch to the default max_concurrency #1556

Merged
merged 10 commits into from
Jun 30, 2022

Conversation

puhley
Contributor

@puhley puhley commented Jan 3, 2022

Summary of the Pull Request

This pull request addresses issue #1477, in which low-bandwidth connections could not sustain multiple concurrent upload streams without timing out. Two changes are introduced to fix this.

The first change is to modify the upload_file method to use the azcopy command as its default approach. It was determined that azcopy has more robust code for handling different connection speeds than the Azure Python SDK. Therefore, the code will now attempt to use azcopy first. It is possible that the azcopy command could fail due to it not being present, an azcopy login step needing to be performed, or a similar scenario. If this occurs, then the code will fall back to using the Azure Python SDK. If they both fail, then the code will re-attempt both approaches up to the retry limit. In order to support this, a new method was added to azcopy.py for performing azcopy copy commands.
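The retry flow described above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: `upload_with_fallback`, its parameters, and the default of three attempts are hypothetical names and values chosen for the example, and the two uploaders are passed in as plain callables so the sketch stays self-contained.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical signature: an uploader takes a local path and a destination URL.
Uploader = Callable[[str, str], None]


def upload_with_fallback(
    file_path: str,
    blob_url: str,
    primary: Uploader,
    fallback: Uploader,
    max_retries: int = 3,  # assumed value; the real retry limit may differ
) -> None:
    """Try the primary uploader, then the fallback, repeating up to max_retries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        for name, uploader in (("primary", primary), ("fallback", fallback)):
            try:
                uploader(file_path, blob_url)
                return  # stop at the first success
            except Exception as err:  # broad on purpose: any failure triggers fallback
                last_error = err
                logger.warning("%s upload failed (attempt %d): %s", name, attempt, err)
    raise RuntimeError(f"upload failed after {max_retries} attempts") from last_error
```

In this sketch, `primary` would wrap the azcopy invocation and `fallback` the Azure Python SDK upload, so a missing azcopy binary or a pending azcopy login simply shifts the work to the SDK within the same attempt.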

The second change is to modify the upload_file_data method to call upload_file. This ensures that both file upload paths are consistent in their use of azcopy, max_concurrency, and retries.

PR Checklist

Info on Pull Request

This pull request adds azcopy copy support. The existing azcopy code used the sync functionality, which reconciles two files or directories that already exist. When creating a new job, however, you may be uploading a new file to a new destination container. Therefore, the azcopy_copy method was created for copy operations.
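As a rough illustration of what a copy wrapper might look like, the sketch below shells out to `azcopy copy <source> <destination>`. It is not the implementation added to azcopy.py: the helper `build_azcopy_copy_command`, the parameter names, and the PATH lookup via `shutil.which` are assumptions made for this example.

```python
import shutil
import subprocess
from typing import List


def build_azcopy_copy_command(
    src: str, dst_url: str, azcopy_path: str = "azcopy", recursive: bool = False
) -> List[str]:
    """Build the argument list for `azcopy copy` (hypothetical helper)."""
    cmd = [azcopy_path, "copy", src, dst_url]
    if recursive:
        # Needed when the source is a directory rather than a single file.
        cmd.append("--recursive")
    return cmd


def azcopy_copy(src: str, dst_url: str) -> None:
    """Copy src to dst_url with azcopy, raising if the tool is missing or fails."""
    azcopy_path = shutil.which("azcopy")  # assumption: azcopy is located via PATH
    if azcopy_path is None:
        raise FileNotFoundError("azcopy not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code,
    # which lets a caller fall back to another upload method.
    subprocess.run(build_azcopy_copy_command(src, dst_url, azcopy_path), check=True)
```

Keeping command construction separate from execution makes the argument list easy to inspect without azcopy being installed.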

The max_concurrency of 10 was removed from the Azure SDK upload_blob call within upload_file so that the default is used instead. This will help resolve an issue where low bandwidth connections could not support multiple concurrent uploads without encountering a timeout.
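The change amounts to dropping one keyword argument from the SDK call. The sketch below shows the idea against any object that exposes an `upload_blob` method, such as the Azure SDK's `BlobClient`; the function name `upload_with_default_concurrency` and the `overwrite=True` argument are illustrative assumptions, not taken from the PR.

```python
from typing import Any, BinaryIO, Protocol


class SupportsUploadBlob(Protocol):
    """Duck type for a blob client; azure.storage.blob.BlobClient fits this shape."""

    def upload_blob(self, data: BinaryIO, **kwargs: Any) -> Any: ...


def upload_with_default_concurrency(blob_client: SupportsUploadBlob, file_path: str) -> None:
    """Upload a file without overriding the SDK's default max_concurrency."""
    with open(file_path, "rb") as handle:
        # Before the change, the call also passed max_concurrency=10.
        # Omitting the argument lets the SDK apply its own default.
        blob_client.upload_blob(handle, overwrite=True)  # overwrite=True is an assumption
```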

The upload_file_data function was converted to call upload_file rather than have its own upload code. This approach ensures that both upload_file and upload_file_data have the same behavior as requested in the issue discussion. The data passed as a string to upload_file_data is now written to a temporary file for azcopy to access within upload_file. Testing for this approach is limited since upload_file_data is not currently used within onefuzz.
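One way to picture the temporary-file approach is the sketch below. It is an assumption-laden illustration rather than the PR's code: `upload_data_via_file`, the `upload_file` callable signature, and the temporary file name are all hypothetical.

```python
import os
import tempfile
from typing import Callable


def upload_data_via_file(
    data: str, blob_name: str, upload_file: Callable[[str, str], None]
) -> None:
    """Write string data to a temporary file and delegate to a file uploader."""
    # A temporary directory is used so the file is closed before the uploader
    # (for example an external azcopy process) reads it, and is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "upload.tmp")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(data)
        upload_file(path, blob_name)
```

Because the data path now goes through the same file uploader, it picks up the same azcopy, concurrency, and retry behavior without duplicating that logic.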

Validation Steps Performed

Creating a fuzzing job exercises the upload_file functionality. This was tested by creating a job that required both the target_exe and extra_files flags to upload files.

It was noted in the original discussion that low-bandwidth tests can be performed using:

# Create a big blob.
dd if=/dev/zero of=zeros.bin bs=300M count=1

# Create destination container.
onefuzz containers create big

# Simulate a low-bandwidth link.
sudo tc qdisc add dev eth0 root tbf rate 104kbit latency 50ms burst 1540

# Try to upload blob.
onefuzz containers files upload_file big -v zeros.bin

@puhley
Contributor Author

puhley commented May 9, 2022

Let me know if there is anything else that you need from my side on this request.

@stishkin
Contributor

stishkin commented May 9, 2022

@puhley, is it OK if I do "Update branch" on your PR before approving?

@puhley
Contributor Author

puhley commented May 10, 2022

I tried performing an update branch and two checks failed. Both are related to the new api-service code which is unrelated to this merge. One of them was a whitespace issue with one of the new files. The other was a missing zip file. Would you override these checks or should I just wait and merge again?

@stishkin
Contributor

this PR should fix the issue:
#1925

@stishkin stishkin self-requested a review May 11, 2022 19:54
@stishkin
Contributor

@puhley - what workflow would you prefer? Should I merge the PR, or will you merge it?

@puhley
Contributor Author

puhley commented Jun 26, 2022

@stishkin I am not a project maintainer. Therefore, I don't have the authority to merge pull requests.
