add more download and trace information
1a1a11a committed Oct 18, 2020
1 parent cf828b5 commit 5ad9e4c
Showing 3 changed files with 113 additions and 16 deletions.
54 changes: 38 additions & 16 deletions README.md
@@ -1,11 +1,12 @@
## Anonymized Cache Request Traces from Twitter Production

### Trace Overview
This repository describes the traces from Twitter's in-memory caching ([Twemcache](https://github.com/twitter/twemcache)/[Pelikan](https://github.com/twitter/pelikan)) clusters. The current traces were collected from one instance of 54 clusters in Mar 2020. The traces are one-week-long.
This repository describes the traces from Twitter's in-memory caching ([Twemcache](https://github.com/twitter/twemcache)/[Pelikan](https://github.com/twitter/pelikan)) clusters. The current traces were collected from 54 clusters in Mar 2020. The traces are one-week-long.
More details are described in the following paper and blog.
* [Juncheng Yang, Yao Yue, Rashmi Vinayak, A large scale analysis of hundreds of in-memory cache clusters at Twitter. _14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)_, 2020](https://www.usenix.org/conference/osdi20/presentation/yang).
* blog post

---

### Trace Format
The traces are compressed with [zstd](https://github.com/facebook/zstd); to decompress, run `zstd -d /path/file`.
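The decompression step above can also be scripted. The following is a minimal sketch, assuming the `zstd` command-line tool is installed and on `PATH` and that trace files end in `.zst`; the helper names `zstd_decompress_command` and `decompress` are hypothetical and not part of this repository.

```python
import shutil
import subprocess
from pathlib import Path


def zstd_decompress_command(path):
    """Build the `zstd -d` command line for one compressed trace file."""
    return ["zstd", "-d", str(path)]


def decompress(path):
    """Run `zstd -d` on `path` and return the expected output path.

    Assumption: the file name ends in `.zst`, so zstd writes the
    decompressed output next to the input with that suffix removed.
    """
    if shutil.which("zstd") is None:
        raise RuntimeError("zstd command-line tool not found on PATH")
    # check=True raises CalledProcessError if zstd exits with an error.
    subprocess.run(zstd_decompress_command(path), check=True)
    return Path(path).with_suffix("")
```

Given the uncompressed size of the full traces, it may be preferable to decompress one file at a time rather than the whole data set.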
@@ -26,43 +27,64 @@ Note that during key anonymization, we preserve the namespaces, for example, if
A sample of the traces is attached under samples.


---

### Trace Download
The full traces are large (3.2 TB in compressed form, 14 TB uncompressed), and can be downloaded from the following places.
The full traces are large (2.8 TB in compressed form, 14 TB uncompressed), and can be downloaded from the following places.

#### Carnegie Mellon University PDL cluster
https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/open_source

#### SNIA

#### Storj
See [storj](storj) for access instructions (good for worldwide access, especially Asia and Europe, but not available after Dec 2020).

#### Baidu pan
https://pan.baidu.com/s/1Jm2nAW-UhsjXU6JYoA07LA access code: wcws (good for access from Asia, but the UI is available only in Chinese)


* Carnegie Mellon University PDL lab cluster: https://ftp.pdl.cmu.edu/pub/datasets/twemcacheWorkload/open_source
* SNIA
* Storj
* Baidu pan
These traces are split into smaller files of 1000000000 lines each (smaller for SNIA) and compressed with zstd, so a file named clusterN.0.zst contains the first 1000000000 requests of cluster N.

These traces are split into smaller files of 1000000000 lines each (100000000 for SNIA) and compressed with zstd, so a file named X_cache.0.zst contains the first 1000000000 requests of the X_cache cluster trace.
Feel free to contact us if you have problems downloading the traces.
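The file naming described above can be sketched as a small lookup. This is an illustrative helper only: the function name `chunk_file_name` is hypothetical, request indices are assumed to be 0-based, and the numbering of later chunks (`.1.zst`, `.2.zst`, ...) is an assumption extrapolated from the `.0.zst` example, not something the README states.

```python
# Chunk size stated above for the non-SNIA copies of the traces.
REQUESTS_PER_FILE = 1_000_000_000


def chunk_file_name(cluster, request_index, requests_per_file=REQUESTS_PER_FILE):
    """Return the name of the file expected to hold a given request.

    `cluster` is the file name prefix (for example "cluster52").
    `request_index` is a 0-based position in the cluster's trace (assumption).
    """
    if request_index < 0:
        raise ValueError("request_index must be non-negative")
    # Integer division gives the 0-based chunk number.
    chunk = request_index // requests_per_file
    return f"{cluster}.{chunk}.zst"


print(chunk_file_name("cluster52", 0))              # cluster52.0.zst
print(chunk_file_name("cluster52", 1_000_000_000))  # cluster52.1.zst
```

For the SNIA copies, which use smaller files, a different `requests_per_file` value would need to be passed.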


---

### Choice of traces for different evaluations
miss ratio related (admission, eviction)
For different evaluation purposes, we recommend the following clusters/workloads:

* **miss ratio related (admission, eviction)**: cluster52, cluster17 (low miss ratio), cluster18 (low miss ratio), cluster24, cluster44, cluster45, cluster29.

write-heavy workloads

* **write-heavy workloads**: cluster12, cluster15, cluster31, cluster37.

TTL-related
mix of small and large TTLs

* **TTL-related**: mix of small and large TTLs: cluster52, cluster22, cluster25, cluster11; small TTLs only: cluster18, cluster19, cluster6, cluster7.

small TTLs only

---

Object sizes
Small objects

### More information about each workload is included under `stat/`
We release computed statistics for each cluster workload under `stat/`; the latest are [here](stat/2020Mar.md).
The table includes the following fields; each is the mean value of the metric, taken either from production or from the traces.

Large objects
The fields include `production miss ratio`,
`workload category` (1: storage, 2: computation, 3: transient item), `key size`, `value size`, `request rate`, `mean object frequency`, `one-hit-wonder ratio (%)`, `compulsory miss ratio (%)`, `common TTLs`, `working set size`, `operations`, `Zipf alpha`.


---

### Misc
* Please join our [discussion channel](http://groups.google.com/group/?) for questions and updates.
* We provide a **[trace bibliography](bibliography.bib)** of papers that have used and/or analyzed the traces, and encourage anybody who publishes one to add it to the bibliography by creating an issue or pull request on GitHub.


---

### Acknowledgement
We thank SNIA and Carnegie Mellon University PDL lab for hosting the traces.
We thank Carnegie Mellon University PDL lab, SNIA and Storj for hosting the traces.


### License