Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update README.md to point to new docs links on cloud.google.com #1187

Merged
merged 1 commit into from
Jun 21, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
204 changes: 11 additions & 193 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,138 +15,19 @@ Cloud Storage FUSE is open source software, released under the
# ABOUT
## What is Cloud Storage FUSE?

Cloud Storage FUSE is a FUSE adapter that lets you mount and access Cloud
Storage buckets as a local filesystem, so applications can read and write Cloud
Storage objects using standard file system semantics. Cloud Storage FUSE can be
run anywhere with connectivity to Cloud Storage, including Google Kubernetes
Engine, Google Compute Engine VMs or on-premises systems.

With Cloud Storage FUSE, you can take advantage of the scale, affordability,
throughput, and simplicity that Google Cloud Storage provides, while maintaining
compatibility with applications that use filesystem semantics without having to
refactor applications to use native Cloud Storage APIs
Cloud Storage FUSE is an open source FUSE adapter that lets you mount and access Cloud Storage buckets as local file systems. For a technical overview of Cloud Storage FUSE, see https://cloud.google.com/storage/docs/gcs-fuse.

## Cloud Storage FUSE for machine learning

Cloud Storage is a common choice for users looking to store training data,
models, checkpoints, and logs for machine learning projects in Cloud Storage
buckets. With Cloud Storage FUSE, objects in Cloud Storage buckets can be
accessed as files mounted as a local filesystem. Cloud Storage FUSE is
officially supported when used for workloads based on PyTorch and TensorFlow
Machine Learning frameworks.

Cloud Storage FUSE provides the following benefits:
- Mount and access Cloud Storage buckets with standard file system semantics,
providing a portable journey for ML workloads that eliminates application
refactoring costs
- From training, to inference, to checkpointing, leverage the native high scale
and performance of Cloud Storage to run your ML workloads at scale
- Start training jobs quickly by providing compute resources with direct access
to data in Cloud Storage, rather than having to copy it down to a local
filesystem instance.

Cloud Storage FUSE can be deployed as a regular Linux package, but is also
available as part of a managed turn-key offering with Google Kubernetes Engine
(GKE). In addition, Cloud Storage FUSE is integrated with [Vertex AI] including
Google pre-built Machine Learning images such as a [Deep Learning Virtual Machine]
image, or [Deep Learning Container image].

[Vertex AI]: https://cloud.google.com/vertex-ai/docs/training/code-requirements#fuse
[Deep Learning Virtual Machine]: https://cloud.google.com/deep-learning-vm
[Deep Learning Container image]: https://cloud.google.com/deep-learning-containers

## Technical Overview

Cloud Storage FUSE works by translating object storage names into a file and
directory system, interpreting the “/” character in object names as a directory
separator so that objects with the same common prefix are treated as files in
the same directory. Applications can interact with the mounted bucket like a
simple file system, providing virtually limitless file storage running in the cloud.

While Cloud Storage FUSE has a file system interface, it is not like a true
POSIX, NFS or CIFS file system on the backend. Cloud Storage FUSE retains the
same fundamental characteristics of Cloud Storage, preserving the scalability of
Cloud Storage in terms of size and aggregate performance while maintaining the
same latency and single object performance. As with the other access methods,
Cloud Storage does not support concurrency and locking. For example, if multiple
Cloud Storage FUSE clients are writing to the same file, the last flush wins.

## Key Differences from a POSIX file system:

Cloud Storage FUSE does not provide full POSIX support. While Cloud Storage
FUSE has a file system interface, it is not like a true POSIX, NFS or CIFS file
system on the backend. It is ideal for use cases where Cloud Storage has the right
performance and scalability characteristics for an application, and only basic file
system semantics are missing. When deciding if Cloud Storage FUSE is an appropriate
solution, there are some additional differences compared to local file systems that
should be taken into account:

**Performance**: Cloud Storage FUSE is a client interface to Cloud Storage and
therefore has the same performance characteristics of Cloud Storage. See
performance best practices [here](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/performance.md)

**Metadata**: Cloud Storage FUSE does not transfer metadata along with the file when
uploading to Cloud Storage. This means that if you wish to use Cloud Storage FUSE
as an uploading tool, you will not be able to set [metadata] such as content type
and ACLs as you would with other uploading methods. If metadata properties are
critical, considering using [gsutil], the [JSON API] or the [Google Cloud console].
The exception to this is that Cloud Storage FUSE does store mtime and symlink targets.

[metadata]: https://cloud.google.com/storage/docs/metadata#editable
[gsutil]: https://cloud.google.com/storage/docs/gsutil
[JSON API]: https://cloud.google.com/storage/docs/json_api
[Google Cloud console]: https://console.developers.google.com/

**Concurrency**: There is no concurrency control for multiple writers to a file.
When multiple writers try to replace a file the last write wins and all previous
writes are lost - there is no merging, version control, or user notification of
the subsequent overwrite.

**Linking**: Cloud Storage FUSE does not support hard links.

**Semantics**: Some semantics are not exactly what they would be in a traditional
file system, and are documented [here](https://github.com/googlecloudplatform/gcsfuse/blob/master/docs/semantics.md#surprising-behaviors).

**Access**: Authorization for files is governed by Cloud Storage permissions.
POSIX-style access control can only be specified at the time of mounting,
works for users accessing the mount on that machine and does not affect ACLs
of objects or buckets in Cloud Storage.

**Local storage**: Objects that are new or modified will be stored in their
entirety in a local temporary file until they are closed or synced.
When working with large files, be sure you have enough local storage capacity
for temporary copies of the files, particularly if you are working with
[Google Compute Engine instances].

[Google Compute Engine instances]:https://cloud.google.com/compute/docs/instances

**Directories**: By default, only directories that are explicitly defined
(that is, they are their own object in Cloud Storage) will appear in the
file system. A flag is available to change this behavior. Atomic directory
rename is not supported.

**Error Handling**: Transient errors can occur in distributed systems like Cloud
Storage, such as network timeouts. Cloud Storage FUSE implements the Cloud
Storage [retry best practices] with exponential backoff.

[retry best practices]:https://cloud.google.com/storage/docs/retry-strategy

For additional information, see the [semantics documentation].

[semantics documentation]: https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md#files-and-dirs
To learn about the benefits of using Cloud Storage FUSE for machine learning projects, see https://cloud.google.com/storage/docs/gcsfuse-integrations#machine-learning.

For a fully supported enterprise grade filesystem, see Google Cloud [Filestore].

[Filestore]: https://cloud.google.com/filestore
## Limitations and key differences from POSIX file systems

## Charges incurred with Cloud Storage FUSE
To learn about limitations and differences between Cloud Storage FUSE and POSIX file systems, see https://cloud.google.com/storage/docs/gcs-fuse#differences-and-limitations.

Cloud Storage FUSE is available free of charge, but the storage, metadata,
and network I/O it generates to and from Cloud Storage are charged like any
other Cloud Storage interface. For information on how to estimate the costs
from Cloud Storage FUSE operations, see [Cloud Storage FUSE pricing].
## Pricing for Cloud Storage FUSE

[Cloud Storage FUSE pricing]: https://cloud.google.com/storage/docs/gcs-fuse#charges
For information about pricing for Cloud Storage FUSE, see https://cloud.google.com/storage/docs/gcs-fuse#charges.

# CSI Driver

Expand All @@ -158,76 +39,13 @@ providing a turn-key experience.

# Support

## Supported applications and platforms

Cloud Storage FUSE officially supports the latest two versions of the following
Machine Learning frameworks:

- TensorFlow V2.x
- TensorFlow V1.x
- PyTorch V1.x
## Supported frameworks and operating systems

Support outside of the above Machine Learning use cases is non-guaranteed, and on
a best effort basis.
To find out which frameworks and operating systems are supported by Cloud Storage FUSE, see https://cloud.google.com/storage/docs/gcs-fuse#supported-frameworks-os.

## Supported operating systems
## Getting support

Cloud Storage FUSE supports usage with the following operating systems:
- Ubuntu 18.04 and above
- Debian 10 and above
- CentOS 7 and above
- RHEL 7 and above
You can get support, submit general questions, and request new features by [filing issues in GitHub](https://github.com/GoogleCloudPlatform/gcsfuse/issues). You can also get support by using one of [Google Cloud's official support channels](https://cloud.google.com/support-hub).

## Getting support
See [Troubleshooting](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/troubleshooting.md) for common issue handling.

Request support via your regular Google Cloud support channels, or by opening an
issue via GitHub [here](https://github.com/GoogleCloudPlatform/gcsfuse/issues)

See [Troubleshooting] for common issue handling

[Troubleshooting]:https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/troubleshooting.md

## Limitations and restrictions

Cloud Storage FUSE is not supported for usage with the following workloads or scenarios:

- Workloads that expect POSIX compliance.
- Cloud Storage FUSE is not a traditional POSIX filesystem. For a fully supported
enterprise grade filesystem, see [Cloud Filestore]. See [Key Differences from a
POSIX filesystem](https://cloud.google.com/storage/docs/gcs-fuse#expandable-1) and [Semantics doc](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md)
- Workloads that require file locking, or concurrent writes.
- You should not write to the same file at the same time from different clients,
as there is no concurrency control for multiple writers to a file. When multiple
writers try to replace a file the last write wins and all previous writes are
lost - there is no merging, version control, or user notification of the subsequent overwrite.
For example, if multiple Cloud Storage FUSE clients are writing to the same file, the last flush wins.

- Object versioning: Cloud Storage FUSE does not support buckets with object versioning enabled.
No guarantees are made about the behavior when used with a bucket with this feature enabled, which may
return non-current object versions.
- Transcoding: Reading or modifying a file backed by an object with a contentEncoding property set may
yield surprising results, and no guarantees are made about the behavior. It is recommended that you do not use
Cloud Storage FUSE to interact with such objects.
- Retention policy: Cloud Storage FUSE does not support writing to a bucket with a Retention policy enabled - writes will fail.
Reading objects from a bucket with a Retention policy enabled is supported, but the bucket should be mounted as Read-Only
by passing the ```-o RO``` flag during mount.
- Workloads that do file patching (overwrites in place).
- Cloud Storage write semantics are for whole objects only. There is no mechanism for patching. Therefore, write patterns
like these may result in poor performance and ballooning operations costs.
- Workloads that do directory renaming:
- [Renaming] directories is by default not supported. A directory rename cannot be performed atomically in Cloud Storage
and would therefore be expensive(object copy, then delete) in terms of Cloud Storage operations, and for large directories
would have high probability of failure, leaving the two directories in an inconsistent state. For example, Hadoop writes files to a
temporary directory and at the end re-names the directory to indicate completion.
- Databases
- Databases typically use very small data sizes. Cloud Storage FUSE higher latency than a local file system, and as such,
latency and throughput will be reduced when reading or writing one small file at a time.
- Traditional Databases on File Servers require overwriting in the middle of the file, which is not supported.
- Version control systems
- Typically require file locking and file patching
- A filer replacement
- Amongst many other things that users require from filers, Cloud Storage FUSE does not support file patching, file locking,
or support native directories.

[Cloud Filestore]: https://cloud.google.com/filestore
[Renaming]:https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/semantics.md#missing-features