[Perf] Replaced the S3 connector implementation to use the AWS transfer manager #4272
Conversation
Hi, I am going to need some help fixing the CI for this change. On my dev box I installed the AWS SDK and the build works, but apparently not all of the AWS SDK is available in the CI, since it cannot find the transfer library.
@paul-amonson can you share the details of the hardware you used to evaluate?
I used an r6i.8xlarge EC2 instance in AWS. The smaller runs (below 20,000,000 bytes) vary widely, but in all cases "pread mt" ALWAYS underperforms. Using the TransferManager API instead of home-grown threading was simply a choice to reduce code changes and new code (i.e. fewer new bugs).

GetObject: Run: 20,000,000 Gap: 0 Count: 1 Repeats: 5
TransferManager: Run: 20,000,000 Gap: 0 Count: 1 Repeats: 5
transfer_manager->DownloadFile(bucket_, key_, offset, length, [&]() {
  return Aws::New<UnderlyingStreamWrapper>("TestTag", &streamBuffer);
});
downloadHandle->WaitUntilFinished();
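For context, the quoted call is the download path of the AWS C++ SDK Transfer API. Below is a minimal sketch of how the pieces fit together; the executor setup, the plain Aws::IOStream, and all names are illustrative stand-ins for the PR's own members (the PR uses its UnderlyingStreamWrapper instead):

```cpp
#include <memory>

#include <aws/core/Aws.h>
#include <aws/core/utils/stream/PreallocatedStreamBuf.h>
#include <aws/core/utils/threading/Executor.h>
#include <aws/s3/S3Client.h>
#include <aws/transfer/TransferManager.h>

// Sketch: download [offset, offset + length) of an S3 object into a
// caller-provided buffer via the TransferManager. Names are illustrative.
std::shared_ptr<Aws::Transfer::TransferHandle> downloadRange(
    const std::shared_ptr<Aws::S3::S3Client>& client,
    const Aws::String& bucket,
    const Aws::String& key,
    uint64_t offset,
    uint64_t length,
    unsigned char* buffer,
    int maxThreads) {
  // The executor drives the TransferManager's parallel part downloads.
  auto executor = Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>(
      "S3ReadSketch", maxThreads);
  Aws::Transfer::TransferManagerConfiguration config(executor.get());
  config.s3Client = client;
  auto transferManager = Aws::Transfer::TransferManager::Create(config);

  // The stream buffer wraps the caller's memory so downloaded bytes land in it.
  auto streamBuffer =
      Aws::MakeShared<Aws::Utils::Stream::PreallocatedStreamBuf>(
          "S3ReadSketch", buffer, length);

  // DownloadFile returns a handle; the callback supplies the stream that the
  // downloaded bytes are written into.
  auto handle = transferManager->DownloadFile(
      bucket, key, offset, length, [streamBuffer]() -> Aws::IOStream* {
        return Aws::New<Aws::IOStream>("S3ReadSketch", streamBuffer.get());
      });
  handle->WaitUntilFinished(); // Blocks until COMPLETED, FAILED, or CANCELED.
  return handle;
}
```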
Does WaitUntilFinished() wait indefinitely, or does it time out after some time?
If there is a timeout, please mention the value in a comment.
Waits forever... good catch. Is there a timeout value that should be observed, or should this be a new configuration value?
Default to 5s, and expose the value as a configuration.
@majetideepak - what do you suggest?
Can we end up in a case where 5s is not enough? Ideally, we want to avoid configs.
Anything I do to add a timeout here would be a busy wait: there is no TransferHandle::Wait() with a timeout. I could poll for completion of all parts, but I don't think I can recommend a busy loop. I agree with @majetideepak about avoiding configs; I added one for the TransferManager max threads at a reviewer's request, but I always thought a maximum of 25 threads was a good number.
It might be good to log some message somewhere if these requests take a long time.
Ok, you all win! I will put in the busy loop with logging warnings. BTW: how do I log in Velox? I looked in random files but it's not obvious.
BTW: I will not implement a timed abort for now. This can be added later. The warnings will suffice for now.
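A minimal sketch of the busy-loop-with-warnings idea agreed on above, assuming glog-style logging (LOG(WARNING), which Velox uses) and polling TransferHandle::GetStatus(); the helper name, poll interval, and warning interval are illustrative, not values from this PR:

```cpp
#include <chrono>
#include <memory>
#include <thread>

#include <aws/transfer/TransferManager.h>
#include <glog/logging.h>

// Sketch: wait for a TransferManager download to finish, logging a warning
// whenever it has been running longer than the (assumed) warning interval.
void waitForDownload(
    const std::shared_ptr<Aws::Transfer::TransferHandle>& handle) {
  using namespace std::chrono;
  constexpr auto kPollInterval = milliseconds(50); // Illustrative value.
  constexpr auto kWarnInterval = seconds(10);      // Illustrative value.
  const auto start = steady_clock::now();
  auto lastWarn = start;
  while (handle->GetStatus() == Aws::Transfer::TransferStatus::NOT_STARTED ||
         handle->GetStatus() == Aws::Transfer::TransferStatus::IN_PROGRESS) {
    std::this_thread::sleep_for(kPollInterval);
    const auto now = steady_clock::now();
    if (now - lastWarn >= kWarnInterval) {
      LOG(WARNING) << "S3 TransferManager download still in progress after "
                   << duration_cast<seconds>(now - start).count() << "s";
      lastWarn = now;
    }
  }
}
```

This keeps the blocking semantics (no timed abort) while surfacing slow requests in the logs, matching the decision above.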
@paul-amonson Can you share the S3 API cost implications of the Transfer Manager? There is always enough parallelism at the operator level, and I worry that TM might introduce more overhead. See another PR #4276.
It appears setting max threads to 1 is NOT the same as GetObject. Looking at the TransferManager source did not reveal any huge differences, except that TM uses GetObjectAsync in the thread pool. It does use internal buffers and therefore has memcpys. I will continue looking to see why there is this difference with 1 thread.

GetObject Small: Run: 20,000,000 Gap: 0 Count: 1 Repeats: 5
TransferManager Small: Threads Available: 1
GetObject Large: Run: 900,000,000 Gap: 0 Count: 1 Repeats: 3
TransferManager Large: Threads Available: 1
@@ -222,6 +227,10 @@ class S3Config {
        "hive.s3.iam-role-session-name", std::string("velox-session"));
  }

  int getTranferManagerMaxThreads() const {
    return config_->get<int>("hive.s3.transfer-manager-max-threads", 25);
I know the file was already like this, but it would be nice to pull out these literal strings to the top of the file as constants and give them a name (kS3TransferManagerMaxThreadsConfig or something similar).
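A small sketch of what that could look like; the key string and the default of 25 come from the quoted diff, while the constant names and the tiny config stand-in are illustrative only:

```cpp
#include <string>
#include <unordered_map>

// Named constants at the top of the file instead of inline string literals.
constexpr const char* kS3TransferManagerMaxThreadsConfig =
    "hive.s3.transfer-manager-max-threads";
constexpr int kS3TransferManagerMaxThreadsDefault = 25;

// Minimal stand-in for the real config object, only to make the sketch
// self-contained; the actual code would keep using config_->get<int>(...).
struct ConfigStandIn {
  int getInt(const std::string& key, int defaultValue) const {
    auto it = values.find(key);
    return it == values.end() ? defaultValue : std::stoi(it->second);
  }
  std::unordered_map<std::string, std::string> values;
};

int getTransferManagerMaxThreads(const ConfigStandIn& config) {
  return config.getInt(
      kS3TransferManagerMaxThreadsConfig, kS3TransferManagerMaxThreadsDefault);
}
```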
Addressed.
We need to move this to HiveConfig. Filed #4300. Should be done in a separate PR.
@paul-amonson Do we know anything about this? We should also run the VeloxTpchBenchmark with data on S3 with this change to get further insights.
The difference appears to be inside TransferManager: even when max_threads is set to 1, two threads are actually used, i.e. the internal DoDownload API is called on one thread and GetObjectAsync then runs on a second thread. Plain GetObject, by contrast, does all the work on the caller's thread. I am trying to run the suggested benchmark, but I need to ramp up on Kube clusters first.
CI is failing while installing Minio. I am not sure why; my changes in this PR all built OK, including installing the new AWS SDK. I need help understanding what is wrong. Thanks. The failing step shows:

#!/bin/bash -eo pipefail
Exited with code exit status 1
The upgraded SDK package is being installed at a different location (from …).
So yet another failure:

FAILED: third_party/arrow_ep/src/arrow_ep-stamp/arrow_ep-download

I tried a wget on the above URL and got a 404 error. Ideas?
We should also run the VeloxTpchBenchmark with data on S3 with this change to get further insights.
@paul-amonson We need more benchmark data to evaluate this change. Let's start by comparing the TableScan times of velox-tpch-benchmark queries. The benchmark will need some modifications (listing files) since the data is now on S3.
I am also setting up an i4i EC2 instance to evaluate this change.
I ran query 6 with the modified tpch-benchmark here #4522 and saw a performance degradation. This branch took 57.83 seconds while the main branch took 31.76 seconds. The TableScan with this change is taking a lot more time.
What are your observations on other TPC-H queries?
TPC-H query 1 shows similar behavior.
Can you re-run the velox_s3read_benchmark with …?
Can you set the S3 threads to 1 to see if this is on par with GetObject()? FYI: I am still unable to run the velox_tpch_benchmark without a segfault.
There is currently no benefit to adding the transfer manager to the S3 connector as long as there are libcurl issues with higher numbers of download threads. A future solution that avoids (or fixes) libcurl may warrant revisiting this, but for now I am closing this PR. Our focus is going to be on tuning Velox IO for S3 (much work is already being done in this direction).
RE: #3938
Summary:
Averages a 3-4x or greater improvement in speed. The improvement is smaller for small files but still positive. There appears to be more run-to-run variability than I would like, but Orri said this is expected. Discussed improvements/tuning at higher layers with Orri as future work.
Note:
Commas added for readability on large numbers.
Both Runs were using these parameters:
-gap 0 -num_in_run 1 -bytes 900,000,000 -seed 6000 -s3_config s3_config -path s3://velox-benchmark-cesg-hyperscalers-aws/test-files/1000MB.file
s3_config Contents:
hive.s3.transfer-manager-max-threads=25
Old GetObject Implementation 900MB:
Run: 900,000,000 Gap: 0 Count: 1 Repeats: 3
85.04564 MB/s 1 pread
94.39024 MB/s 1 preadv
99.87883 MB/s multiple pread
245.17638 MB/s 1 pread mt
238.32858 MB/s 1 preadv mt
248.44832 MB/s multiple pread mt
New TransferManager Implementation 900MB:
Run: 900,000,000 Gap: 0 Count: 1 Repeats: 3
401.64597 MB/s 1 pread
589.92163 MB/s 1 preadv
647.39044 MB/s multiple pread
288.90533 MB/s 1 pread mt
699.8762 MB/s 1 preadv mt
826.0291 MB/s multiple pread mt