Add an option for using the Microsoft VSO-Hash digest function #84

Merged
merged 1 commit into from
Jun 6, 2019

@erikmav erikmav commented Jun 3, 2019

This allows Remote Exec implementations for Microsoft Build Accelerator (https://github.com/Microsoft/BuildXL) to exchange BuildXL's standard hashes.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


@googlebot googlebot added the cla: no label Jun 3, 2019
buchgr commented Jun 4, 2019

LGTM. Can you please sign the CLA?

erikmav commented Jun 4, 2019

Signed CLA

buchgr commented Jun 5, 2019

@Erikma, I believe it didn't work. Did you sign with the same e-mail address as the one used in your git commit?

erikmav commented Jun 5, 2019

CLA problems; I'm working on finding the right MS-internal CLA owner. ETA is probably a couple of days.

erikmav commented Jun 6, 2019

Retry on CLA

@googlebot

CLAs look good, thanks!

@googlebot googlebot added the cla: yes label and removed the cla: no label Jun 6, 2019
@buchgr buchgr merged commit a6504b1 into bazelbuild:master Jun 6, 2019
@johnterickson

Hey @EdSchouten, you asked about this PR at BazelCon, and I thought I'd add to @Erikma's answer :)

The pages in the VSO0 hash are there for performance: they are hashed concurrently on a single processor using SIMD operations (e.g. as provided by Windows' BCrypt API).
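A minimal sketch of the page idea, assuming 64 KiB pages that are each hashed with SHA-256 and a block hash taken over the concatenated page hashes. The page size, the combining rule, and the helper names `page_hashes` and `block_hash` are assumptions for illustration, not a transcription of the VSO-Hash spec. Because no page hash depends on another, a SIMD multi-buffer implementation can compute several of them at once; Python's `hashlib` below simply computes them one after another.

```python
import hashlib

# Assumption: 64 KiB pages (the page size is not stated in this thread).
PAGE_SIZE = 64 * 1024


def page_hashes(block: bytes) -> list[bytes]:
    """SHA-256 of each fixed-size page; the last page may be shorter."""
    return [
        hashlib.sha256(block[i:i + PAGE_SIZE]).digest()
        for i in range(0, len(block), PAGE_SIZE)
    ]


def block_hash(block: bytes) -> bytes:
    """Hypothetical block hash: SHA-256 over the concatenated page hashes."""
    return hashlib.sha256(b"".join(page_hashes(block))).digest()


if __name__ == "__main__":
    data = b"a" * 200_000  # spans 4 pages: 3 full pages plus 1 partial page
    print(len(page_hashes(data)), block_hash(data).hex())
```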

The blocks can be hashed concurrently on multiple threads or systems. This is super helpful when performing large uploads to a CAS: if you have an X GB file, no single HTTP handler ever has to hash all X GB. Each call is only responsible for hashing one 2 MB block.
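To illustrate the block-level parallelism, here is a sketch that splits a file into 2 MB blocks (taken here as 2 MiB), hashes each block independently on a thread pool, and then folds the block hashes in order into a single identifier. The 64 KiB page size, the hypothetical `content_id` helper, and the chaining rule (an all-zero seed, then SHA-256 over the running value, the block hash, and a final-block flag byte) are assumptions; the real VSO-Hash defines its own chaining and finalization. The point is the shape: each block hash needs only that block's bytes, so separate threads, processes, or HTTP handlers can each hash one block, and only the cheap final fold needs all the block hashes.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 2 * 1024 * 1024  # 2 MB blocks as described above (taken as MiB)
PAGE_SIZE = 64 * 1024         # assumption: 64 KiB pages


def block_hash(block: bytes) -> bytes:
    """Hash one block from its page hashes; needs only this block's bytes."""
    pages = (block[i:i + PAGE_SIZE] for i in range(0, len(block), PAGE_SIZE))
    return hashlib.sha256(b"".join(hashlib.sha256(p).digest() for p in pages)).digest()


def content_id(data: bytes, workers: int = 4) -> bytes:
    """Hypothetical whole-file id built from independently hashed blocks."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    # Blocks are independent, so they can be hashed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hashes = list(pool.map(block_hash, blocks))
    # Cheap sequential fold so the id depends on block order and block count.
    acc = bytes(32)
    for idx, h in enumerate(hashes):
        final_flag = b"\x01" if idx == len(hashes) - 1 else b"\x00"
        acc = hashlib.sha256(acc + h + final_flag).digest()
    return acc
```

In CPython, `hashlib` releases the GIL while hashing large buffers, so the thread pool can keep several cores busy; `content_id(data, workers=1)` returns the same value, just without the parallelism.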

We also have intra-file deduplication. We don't talk about it much (yet!), but it powers Universal Packages and Pipeline Artifacts in Azure DevOps. It's based on Windows Server Data Deduplication (https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/overview), so it uses variable-sized blocks.
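Windows Server's chunking algorithm isn't described in this thread, so the sketch below swaps in a generic content-defined chunking scheme: a gear-style rolling hash (similar in spirit to FastCDC) cuts a chunk wherever the hash of the last 64 bytes has its top 13 bits clear, subject to minimum and maximum chunk sizes. Because boundaries depend on local content rather than fixed offsets, inserting a few bytes near the start of a file changes only the chunks around the edit; the rest hash identically and deduplicate. `chunk`, `store_ids`, `GEAR`, and all size parameters are hypothetical, chosen for illustration.

```python
import hashlib
import random

# Hypothetical parameters, for illustration only.
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024
MASK_BITS = 13                 # ~8 KiB expected distance between cut points
MASK64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value. A fixed seed keeps
# chunk boundaries reproducible across runs and machines.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data into variable-sized, content-defined chunks."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shift-and-add: after 64 steps a byte's contribution is shifted out,
        # so h depends on at most the 64 most recent bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        at_boundary = size >= MIN_CHUNK and (h >> (64 - MASK_BITS)) == 0
        if at_boundary or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def store_ids(data: bytes) -> set[bytes]:
    """SHA-256 ids of the chunks; a CAS would upload only ids it lacks."""
    return {hashlib.sha256(c).digest() for c in chunk(data)}
```

For example, `store_ids(original) & store_ids(edited)` shows which chunks of an edited file a store would already have.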

The best way to play with it is to fire up a free Azure DevOps account and tinker with Pipeline Artifacts. You can see the efficacy in the output spew:

Total Content: 21,232.0 MB
Physical Content Uploaded: 105.7 MB
Logical Content Uploaded: 220.6 MB
Compression Saved: 114.9 MB
Deduplication Saved: 21,011.4 MB
Number of Chunks Uploaded: 3,655
Total Number of Chunks: 927,490
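The figures are internally consistent under one reading, which is our interpretation rather than something stated in the thread: "Logical Content Uploaded" is the size of the new chunks before compression, so deduplication saved Total minus Logical, and compression saved Logical minus Physical. A small check of that arithmetic (the `summarize` helper is hypothetical):

```python
# Figures copied from the Pipeline Artifacts output above (MB unless noted).
total_content = 21_232.0
logical_uploaded = 220.6
physical_uploaded = 105.7
chunks_uploaded = 3_655
total_chunks = 927_490


def summarize() -> dict[str, float]:
    """Derive the savings lines from the raw totals (our interpretation)."""
    dedup_saved = total_content - logical_uploaded            # already in the store
    compression_saved = logical_uploaded - physical_uploaded  # from compressing new chunks
    return {
        "dedup_saved_mb": round(dedup_saved, 1),
        "compression_saved_mb": round(compression_saved, 1),
        "reduction_factor": round(total_content / physical_uploaded, 1),
        "chunks_uploaded_pct": round(100 * chunks_uploaded / total_chunks, 2),
    }


print(summarize())
```

Under that reading, the derived savings match the reported 21,011.4 MB and 114.9 MB, and only about 0.5% of the bytes (roughly 200× less than the total) and about 0.39% of the chunks actually had to be uploaded.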

santigl pushed a commit to santigl/remote-apis that referenced this pull request Aug 26, 2020