Improve object_store crate documentation #2260

Merged · 4 commits · Aug 2, 2022
4 changes: 2 additions & 2 deletions object_store/Cargo.toml
@@ -21,7 +21,7 @@ version = "0.3.0"
edition = "2021"
license = "MIT/Apache-2.0"
readme = "README.md"
-description = "A generic object store interface for uniformly interacting with AWS S3, Google Cloud Storage and Azure Blob Storage"
+description = "A generic object store interface for uniformly interacting with AWS S3, Google Cloud Storage, Azure Blob Storage and local files."
keywords = [
"object",
"storage",
@@ -77,4 +77,4 @@ aws = ["rusoto_core", "rusoto_credential", "rusoto_s3", "rusoto_sts", "hyper", "
[dev-dependencies] # In alphabetical order
dotenv = "0.15.0"
tempfile = "3.1.0"
-futures-test = "0.3"
+futures-test = "0.3"
17 changes: 15 additions & 2 deletions object_store/README.md
@@ -19,8 +19,21 @@

# Rust Object Store

-A crate providing a generic interface to object stores, such as S3, Azure Blob Storage and Google Cloud Storage.
+A focused, easy to use, idiomatic, high performance, `async` object
+store library interacting with object stores.
Comment on lines +22 to +23

Contributor:

Suggested change:
-A focused, easy to use, idiomatic, high performance, `async` object
-store library interacting with object stores.
+A high performance, `async` object store crate that provides a
+uniform interface for interacting with various kinds of object store

I think all the qualifiers is a bit much 😆

Contributor Author:

I am practicing "marketing" -- I would like to run an experiment to see how much attention I can garner with a buzzword laden lead compared to a more utilitarian, less self-aggrandizing, summary.

-Originally developed for [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/) and later split out and donated to Apache Arrow.
Using this crate, the same binary and code can easily run in multiple
clouds and local test environments, via a simple runtime configuration
change. Supported object stores include:

* [AWS S3](https://aws.amazon.com/s3/)
* [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
* [Google Cloud Storage](https://cloud.google.com/storage)
* Local files
* Memory
* Custom implementations


+Originally developed for [InfluxDB IOx](https://github.com/influxdata/influxdb_iox/) and later split out and donated to [Apache Arrow](https://arrow.apache.org/).

See [docs.rs](https://docs.rs/object_store) for usage instructions
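The README's claim that "the same binary and code can easily run in multiple clouds and local test environments, via a simple runtime configuration change" rests on Rust trait objects: application code is written against the `ObjectStore` trait, and the concrete backend is picked at startup. A minimal, std-only sketch of that pattern (toy trait and store, not the crate's real async API; all names here are illustrative):

```rust
use std::collections::HashMap;

// Toy stand-in for the crate's ObjectStore trait (the real one is
// async and much richer); for illustration of the pattern only.
trait ObjectStore {
    fn put(&mut self, path: &str, data: Vec<u8>);
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

// One backend: in-memory, analogous to the crate's InMemory store.
#[derive(Default)]
struct InMemory {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for InMemory {
    fn put(&mut self, path: &str, data: Vec<u8>) {
        self.objects.insert(path.to_string(), data);
    }
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }
}

// The backend is chosen from configuration at startup; application
// code only ever sees `Box<dyn ObjectStore>`.
fn store_from_config(backend: &str) -> Box<dyn ObjectStore> {
    match backend {
        "memory" => Box::new(InMemory::default()),
        // "s3" | "gcs" | "azure" would construct cloud clients here.
        other => panic!("unknown backend: {other}"),
    }
}

fn main() {
    let mut store = store_from_config("memory");
    store.put("data/file01.parquet", vec![1, 2, 3]);
    assert_eq!(store.get("data/file01.parquet"), Some(vec![1, 2, 3]));
    println!("ok");
}
```

Swapping `"memory"` for a cloud backend changes only the configuration string, not any calling code, which is what makes the "local test environment vs. cloud" switch cheap.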
5 changes: 3 additions & 2 deletions object_store/src/aws.rs
@@ -260,7 +260,7 @@ impl From<Error> for super::Error {
}
}

-/// Configuration for connecting to [Amazon S3](https://aws.amazon.com/s3/).
+/// Interface for [Amazon S3](https://aws.amazon.com/s3/).
pub struct AmazonS3 {
/// S3 client w/o any connection limit.
///
@@ -599,7 +599,8 @@ fn convert_object_meta(object: rusoto_s3::Object, bucket: &str) -> Result<Object
/// # let BUCKET_NAME = "foo";
/// # let ACCESS_KEY_ID = "foo";
/// # let SECRET_KEY = "foo";
-/// let s3 = object_store::aws::AmazonS3Builder::new()
+/// # use object_store::aws::AmazonS3Builder;
+/// let s3 = AmazonS3Builder::new()
/// .with_region(REGION)
/// .with_bucket_name(BUCKET_NAME)
/// .with_access_key_id(ACCESS_KEY_ID)
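The chained `with_*` calls in the doc example above are the classic Rust builder pattern: each method consumes the builder and returns it, so configuration reads as one expression. A self-contained sketch of the mechanics (toy struct and method names for illustration, not the crate's real API surface):

```rust
// Hypothetical config type standing in for what a builder produces.
#[derive(Debug, Default, PartialEq)]
struct S3Config {
    region: String,
    bucket: String,
    access_key_id: String,
}

#[derive(Default)]
struct S3ConfigBuilder {
    config: S3Config,
}

impl S3ConfigBuilder {
    fn new() -> Self {
        Self::default()
    }
    // Each `with_*` method takes `self` by value and returns it,
    // which is what lets the calls chain.
    fn with_region(mut self, region: &str) -> Self {
        self.config.region = region.to_string();
        self
    }
    fn with_bucket_name(mut self, bucket: &str) -> Self {
        self.config.bucket = bucket.to_string();
        self
    }
    fn with_access_key_id(mut self, key: &str) -> Self {
        self.config.access_key_id = key.to_string();
        self
    }
    fn build(self) -> S3Config {
        self.config
    }
}

fn main() {
    let cfg = S3ConfigBuilder::new()
        .with_region("us-east-1")
        .with_bucket_name("foo")
        .with_access_key_id("AKIA-example")
        .build();
    println!("{cfg:?}");
}
```

Consuming `self` (rather than `&mut self`) keeps a half-configured builder from being reused after `build()`.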
5 changes: 3 additions & 2 deletions object_store/src/azure.rs
@@ -209,7 +209,7 @@ impl From<Error> for super::Error {
}
}

-/// Configuration for connecting to [Microsoft Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/).
+/// Interface for [Microsoft Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/).
#[derive(Debug)]
pub struct MicrosoftAzure {
container_client: Arc<ContainerClient>,
@@ -587,7 +587,8 @@ fn url_from_env(env_name: &str, default_url: &str) -> Result<Url> {
/// # let ACCOUNT = "foo";
/// # let BUCKET_NAME = "foo";
/// # let ACCESS_KEY = "foo";
-/// let azure = object_store::azure::MicrosoftAzureBuilder::new()
+/// # use object_store::azure::MicrosoftAzureBuilder;
+/// let azure = MicrosoftAzureBuilder::new()
/// .with_account(ACCOUNT)
/// .with_access_key(ACCESS_KEY)
/// .with_container_name(BUCKET_NAME)
5 changes: 3 additions & 2 deletions object_store/src/gcp.rs
@@ -192,7 +192,7 @@ struct CompleteMultipartUpload {
parts: Vec<MultipartPart>,
}

-/// Configuration for connecting to [Google Cloud Storage](https://cloud.google.com/storage/).
+/// Interface for [Google Cloud Storage](https://cloud.google.com/storage/).
#[derive(Debug)]
pub struct GoogleCloudStorage {
client: Arc<GoogleCloudStorageClient>,
@@ -792,7 +792,8 @@ fn reader_credentials_file(
/// ```
/// # let BUCKET_NAME = "foo";
/// # let SERVICE_ACCOUNT_PATH = "/tmp/foo.json";
-/// let gcs = object_store::gcp::GoogleCloudStorageBuilder::new()
+/// # use object_store::gcp::GoogleCloudStorageBuilder;
+/// let gcs = GoogleCloudStorageBuilder::new()
/// .with_service_account_path(SERVICE_ACCOUNT_PATH)
/// .with_bucket_name(BUCKET_NAME)
/// .build();
128 changes: 121 additions & 7 deletions object_store/src/lib.rs
@@ -28,15 +28,129 @@

//! # object_store
//!
-//! This crate provides APIs for interacting with object storage services.
+//! This crate provides a uniform API for interacting with object storage services and
+//! local files via the [`ObjectStore`] trait.
//!
-//! It currently supports PUT (single or chunked/concurrent), GET, DELETE, HEAD and list for:
+//! # Create an [`ObjectStore`] implementation:
 //!
-//! * [Google Cloud Storage](https://cloud.google.com/storage/)
-//! * [Amazon S3](https://aws.amazon.com/s3/)
-//! * [Azure Blob Storage](https://azure.microsoft.com/en-gb/services/storage/blobs/#overview)
-//! * In-memory
-//! * Local file storage
+//! * [Google Cloud Storage](https://cloud.google.com/storage/): [`GoogleCloudStorageBuilder`](gcp::GoogleCloudStorageBuilder)
+//! * [Amazon S3](https://aws.amazon.com/s3/): [`AmazonS3Builder`](aws::AmazonS3Builder)
+//! * [Azure Blob Storage](https://azure.microsoft.com/en-gb/services/storage/blobs/): [`MicrosoftAzureBuilder`](azure::MicrosoftAzureBuilder)
+//! * In Memory: [`InMemory`](memory::InMemory)
+//! * Local filesystem: [`LocalFileSystem`](local::LocalFileSystem)
//!
//! # Adapters
//!
//! [`ObjectStore`] instances can be composed with various adapters
//! which add additional functionality:
//!
//! * Rate Throttling: [`ThrottleConfig`](throttle::ThrottleConfig)
//! * Concurrent Request Limit: [`LimitStore`](limit::LimitStore)
//!
//!
//! # Listing objects:
//!
//! Use the [`ObjectStore::list`] method to iterate over objects in
//! remote storage or files in the local filesystem:
//!
//! ```
//! # use object_store::local::LocalFileSystem;
//! # // use LocalFileSystem for example
//! # fn get_object_store() -> LocalFileSystem {
//! #     LocalFileSystem::new_with_prefix("/tmp").unwrap()
//! # }
//!
//! # async fn example() {
//! use std::sync::Arc;
//! use object_store::{path::Path, ObjectStore};
//! use futures::stream::StreamExt;
//!
//! // create an ObjectStore
//! let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());
//!
//! // Recursively list all files below the 'data' path.
//! // 1. On AWS S3 this would be the 'data/' prefix
//! // 2. On a local filesystem, this would be the 'data' directory
//! let prefix: Path = "data".try_into().unwrap();
//!
//! // Get an `async` stream of Metadata objects:
//! let list_stream = object_store
//!     .list(Some(&prefix))
//!     .await
//!     .expect("Error listing files");
//!
//! // Print a line about each object based on its metadata
//! // using for_each from `StreamExt` trait.
//! list_stream
//!     .for_each(move |meta| {
//!         async {
//!             let meta = meta.expect("Error listing");
//!             println!("Name: {}, size: {}", meta.location, meta.size);
//!         }
//!     })
//!     .await;
//! # }
//! ```
//!
//! Which will print out something like the following:
//!
//! ```text
//! Name: data/file01.parquet, size: 112832
//! Name: data/file02.parquet, size: 143119
//! Name: data/child/file03.parquet, size: 100
//! ...
//! ```
//!
//! # Fetching objects
//!
//! Use the [`ObjectStore::get`] method to fetch the data bytes
//! from remote storage or files in the local filesystem as a stream.
//!
//! ```
//! # use object_store::local::LocalFileSystem;
//! # // use LocalFileSystem for example
//! # fn get_object_store() -> LocalFileSystem {
//! #     LocalFileSystem::new_with_prefix("/tmp").unwrap()
//! # }
//!
//! # async fn example() {
//! use std::sync::Arc;
//! use object_store::{path::Path, ObjectStore};
//! use futures::stream::StreamExt;
//!
//! // create an ObjectStore
//! let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());
//!
//! // Retrieve a specific file
//! let path: Path = "data/file01.parquet".try_into().unwrap();
//!
//! // fetch the bytes from object store
//! let stream = object_store
//!     .get(&path)
//!     .await
//!     .unwrap()
//!     .into_stream();
//!
//! // Count the '0's using `map` from `StreamExt` trait
//! let num_zeros = stream
//!     .map(|bytes| {
//!         let bytes = bytes.unwrap();
//!         bytes.iter().filter(|b| **b == 0).count()
//!     })
//!     .collect::<Vec<usize>>()
//!     .await
//!     .into_iter()
//!     .sum::<usize>();
//!
//! println!("Num zeros in {} is {}", path, num_zeros);
//! # }
//! ```
//!
//! Which will print out something like the following:
//!
//! ```text
//! Num zeros in data/file01.parquet is 657
//! ```
//!

#[cfg(feature = "aws")]
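The "Adapters" section added to the lib.rs docs (rate throttling, concurrent request limits) describes decorator-style stores: each wraps another `ObjectStore` and forwards calls while adding a policy. A std-only sketch of that wrapping shape, here with a hypothetical call-counting adapter rather than the crate's real `LimitStore`/`ThrottledStore` types:

```rust
use std::cell::Cell;

// Toy stand-in for the crate's ObjectStore trait; illustrative only.
trait ObjectStore {
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

struct InMemory;

impl ObjectStore for InMemory {
    fn get(&self, _path: &str) -> Option<Vec<u8>> {
        Some(vec![0, 1, 0])
    }
}

// Adapter: wraps any ObjectStore and counts requests, the same shape
// a limiting or throttling wrapper would use to add its policy.
struct CountingStore<T: ObjectStore> {
    inner: T,
    calls: Cell<usize>,
}

impl<T: ObjectStore> ObjectStore for CountingStore<T> {
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.calls.set(self.calls.get() + 1); // added behaviour
        self.inner.get(path) // then forward to the wrapped store
    }
}

fn main() {
    let store = CountingStore {
        inner: InMemory,
        calls: Cell::new(0),
    };
    store.get("data/file01.parquet");
    store.get("data/file02.parquet");
    assert_eq!(store.calls.get(), 2);
    println!("calls = {}", store.calls.get());
}
```

Because the adapter itself implements `ObjectStore`, adapters compose: a throttled store can be wrapped in a limiting store, and callers still see a single trait object.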