GCSFuse Read/Write Perf - gcloud storage api #1300
I apologize, it actually looks like it isn't a different API, but a difference in tooling within the SDK. Are there any plans to change the way GCSFuse interacts with the API to get the performance improvements obtained by the new `gcloud storage` CLI?
Hi @pmorse-cr, thank you for showing interest in GCSFuse! Is your workload read-heavy or write-heavy?

To give a little context, GCSFuse is a FUSE (Filesystem in Userspace) implementation that allows you to mount a GCS bucket as a local filesystem. This means you can access your GCS objects as if they were regular files on your machine. GCSFuse is a good option if you need to access GCS objects with standard file system operations such as cat, ls, cp, and mv.

As far as I know, both GCSFuse and gcloud storage use similar APIs to interact with GCS. However, in the case of GCSFuse, requests go through the kernel (by design), which can result in a performance difference for some operations. Still, you can follow this doc to improve performance based on your use case.

-Prince
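As a quick illustration of the FUSE path described above, a minimal sketch of mounting a bucket and touching it with standard tools might look like the following; the bucket name and mount point are placeholders, and available flags can vary by GCSFuse version:

```bash
# Mount a bucket (placeholder names); --implicit-dirs makes directory-style
# prefixes browsable without explicit directory objects.
mkdir -p /mnt/gcs
gcsfuse --implicit-dirs my-bucket /mnt/gcs

# Standard file system operations now work, but each one is translated
# kernel -> FUSE -> GCSFuse -> GCS API calls, which is where the overhead lives.
ls /mnt/gcs
cat /mnt/gcs/path/to/object.txt
cp /mnt/gcs/videos/source.mp4 /tmp/

# Unmount when finished.
fusermount -u /mnt/gcs
```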
Hi @pmorse-cr, in addition to the above: are you also talking about these improvements? I'll discuss with the team whether we can incorporate any of them in GCSFuse. We look forward to hearing from you. Thanks,
Thanks very much for the reply, and sorry about the delays. We are helping a Google Cloud customer build a new transcoding pipeline for VOD services delivered globally. They are currently on AWS, and one of the goals is to use as much serverless as possible while not moving files everywhere, which is why we were hoping to use GCSFuse. We have run into some performance challenges within the DRM/packaging services, where it looks like we are I/O-bound. As for the performance improvements with `gcloud storage`, we would still like to know whether GCSFuse can take advantage of the same approach.
Hi @pmorse-cr, you mention serverless (Cloud Run?) but then talk about using K8s (GKE). Which will be used?

Re perf: please see the performance best practices, limitations, and benchmarks. Does the performance with gcloud meet your needs? If not, GCS is not the right solution. If you send me your email address, I can reach out to you with more details, as I would also like to better understand the use case and see how we can collaborate.
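For anyone landing here with the same question, one rough way to check whether the bottleneck is the FUSE path or GCS itself is to time the same large object pulled both ways; this is only a sketch, and the object path and mount point below are placeholders:

```bash
# Direct download with the gcloud storage CLI (which does its own
# client-side parallelization for large objects).
time gcloud storage cp gs://my-bucket/videos/source-6gb.mp4 /tmp/direct.mp4

# Same object read through a GCSFuse mount at /mnt/gcs.
time cp /mnt/gcs/videos/source-6gb.mp4 /tmp/via-fuse.mp4
```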
I'd love to send you my email address and collaborate more. Is there a private way to do that rather than posting it openly on this issue? Thanks @marcoa6 @raj-prince
We will track this as a feature request to support parallel reads, in which large objects are read in chunks in parallel to improve performance.
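To make the idea concrete, here is a rough shell sketch of chunked parallel reads using ranged GETs against the GCS endpoint. It only illustrates the technique, not how GCSFuse implements it, and the bucket/object names and chunk count are placeholders:

```bash
#!/usr/bin/env bash
# Illustration only: fetch one large object as N byte ranges in parallel.
BUCKET=my-bucket
OBJECT=big-source.mp4
URL="https://storage.googleapis.com/${BUCKET}/${OBJECT}"
TOKEN="$(gcloud auth print-access-token)"
CHUNKS=4

# Object size from a HEAD request.
SIZE=$(curl -sI -H "Authorization: Bearer ${TOKEN}" "${URL}" \
  | awk 'tolower($1) == "content-length:" {print $2}' | tr -d '\r')
CHUNK=$(( (SIZE + CHUNKS - 1) / CHUNKS ))

# One background worker per byte range.
for i in $(seq 0 $((CHUNKS - 1))); do
  START=$(( i * CHUNK ))
  END=$(( START + CHUNK - 1 ))
  curl -s -H "Authorization: Bearer ${TOKEN}" \
       -H "Range: bytes=${START}-${END}" \
       -o "part_${i}" "${URL}" &
done
wait

# Reassemble the ranges in order (matches CHUNKS=4 above).
cat part_0 part_1 part_2 part_3 > "${OBJECT}"
rm -f part_0 part_1 part_2 part_3
```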
Parallel downloads, which accelerate reads of large files by using the file cache directory as a prefetch buffer and multiple workers to download a file's chunks in parallel, were released as part of GCSFuse v2.4.0.
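For reference, enabling this looks roughly like the sketch below. It assumes a v2.4.0+ binary and a fast local cache directory; the exact key names and defaults should be checked against the GCSFuse file cache documentation for your version.

```bash
# Sketch: enable the file cache with parallel downloads via a config file.
cat > /tmp/gcsfuse-config.yaml <<'EOF'
cache-dir: /mnt/disks/local-ssd/gcsfuse-cache   # fast local disk used as the prefetch buffer
file-cache:
  max-size-mb: -1                 # unlimited (bounded by the cache disk)
  enable-parallel-downloads: true
  parallel-downloads-per-file: 16
  download-chunk-size-mb: 50
EOF

gcsfuse --config-file /tmp/gcsfuse-config.yaml my-bucket /mnt/gcs
```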
Describe the issue
We are building a new video transcoding pipeline within Google Cloud and are looking to minimize the amount of files moving around. In video transcoding most of the time is spent in CPU/GPU, but we have found that GCSFuse works well for smaller source files, and as soon as the source video exceeds roughly 6 GB, performance drops. We are testing copying to local SSD/PD and Filestore, but would like to keep using GCSFuse, and would like to know what the plans are to move to the new `gcloud storage` API/functions vs `gsutil`. Thank you very much!