
momento: support store workflow #240

Merged · 4 commits · Aug 5, 2024

Conversation

@tylerburdsall (Contributor) commented on Jul 16, 2024:

Problem

  • The Momento client needs to be updated to v0.42.0
  • We need support for the Momento store workflow
    • Unlike Redis/Memcache/Momento cache, a store is a high-performance, low-latency system for persistent items
    • Unlike Redis/Memcache/Momento cache, the set of APIs a Momento store supports is very small for now: get/set/delete
  • We need a storage workflow that can run alongside a cache workflow
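The narrow three-operation surface described above can be sketched as a minimal interface. The trait and in-memory implementation below are illustrative only, not the Momento SDK's actual types:

```rust
use std::collections::HashMap;

// Hypothetical sketch of a store's small API surface: get/set/delete.
// These names are illustrative, not the real Momento client API.
trait Store {
    fn set(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<&Vec<u8>>;
    fn delete(&mut self, key: &str);
}

// An in-memory stand-in used only to exercise the interface.
struct MemStore {
    items: HashMap<String, Vec<u8>>,
}

impl Store for MemStore {
    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.items.insert(key.to_string(), value);
    }
    fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.items.get(key)
    }
    fn delete(&mut self, key: &str) {
        self.items.remove(key);
    }
}

fn main() {
    let mut store = MemStore { items: HashMap::new() };
    store.set("k", b"v".to_vec());
    assert_eq!(store.get("k"), Some(&b"v".to_vec()));
    store.delete("k");
    assert!(store.get("k").is_none());
    println!("ok");
}
```

Unlike a cache, there is no TTL or eviction in play, which is why the API surface stays this small.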

Solution

  • Upgraded the Momento client
  • Added support for a separate storage client config; client and storage should not share the same config settings
  • Added support for separate store workflow(s), only supporting the Momento protocol (for now)
  • DRY'd up the WorkItem typing so that reconnection code can be reused between the cache client and store client code, while each work item type keeps its own distinct enum. This preserves compile-time checks that the right enum is sent to a Receiver<T> (see https://github.com/iopsystems/rpc-perf/pull/240/files#r1680025693)

Result

Built the binary and deployed it to a local testing environment, running with only a store config. Example config:

[general]
# specify the protocol to be used
protocol = "momento"
# the interval for stats integration and reporting
interval = 60
# the number of intervals to run the test for
duration = 3600
# optionally, we can write some detailed stats to a file during the run
#json_output = "stats.json"
# run the admin thread with an HTTP listener at the address provided; this
# allows stats exposition via HTTP
admin = "127.0.0.1:4444"

[debug]
# choose from: error, warn, info, debug, trace
log_level = "info"
# optionally, log to the file below instead of standard out
# log_file = "rpc-perf.log"
# backup file name for use with log rotation
log_backup = "rpc-perf.log.old"
# trigger log rotation when the file grows beyond this size (in bytes). Set this
# option to '0' to disable log rotation.
log_max_size = 1073741824

[target]
# we don't need to specify any endpoints for momento
endpoints = []

[storage]
# number of threads used to drive client requests
threads = 4
# number of gRPC clients to initialize, each maintains at least one TCP stream
poolsize = 43
# an upper limit on the number of concurrent requests per gRPC client
concurrency = 20
# the connect timeout in milliseconds
connect_timeout = 1000
# set the timeout in milliseconds
request_timeout = 1000
store_name = "my-store"

[workload]
# the number of threads that will be used to generate the workload
threads = 1

[workload.ratelimit]
# set a global ratelimit for the workload
start = 20_000


# Note that we can constrain the number of keys in the keyspace and specify that
# the generated values are random bytes with 128B values.
[[workload.stores]]
# sets the relative weight of this keyspace: defaults to 1
weight = 1
# sets the length of the key, in bytes
klen = 100
# sets the number of keys that will be generated
nkeys = 10_000
# sets the value length, in bytes
vlen = 4000
# use random bytes for the values
vkind = "bytes"
# controls what commands will be used in this keyspace
commands = [
    { verb = "get", weight = 1 },
    { verb = "set", weight = 1 },
]
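As a hedged sketch (not rpc-perf's actual implementation), relative command weights like the `get`/`set` entries above can drive request generation by walking cumulative weights:

```rust
// Illustrative only: select a verb according to relative weights like
// `{ verb = "get", weight = 1 }` from the config above. `pick` and `Verb`
// are hypothetical names, not rpc-perf's real workload-generator types.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verb {
    Get,
    Set,
}

// `roll` is assumed to be uniform in 0..total_weight; walk the cumulative
// weights until the roll falls inside a command's bucket.
fn pick(commands: &[(Verb, u64)], roll: u64) -> Verb {
    let mut acc = 0;
    for &(verb, weight) in commands {
        acc += weight;
        if roll < acc {
            return verb;
        }
    }
    commands.last().expect("non-empty command list").0
}

fn main() {
    // With equal weights, rolls 0 and 1 land on get and set respectively.
    let commands = [(Verb::Get, 1), (Verb::Set, 1)];
    assert_eq!(pick(&commands, 0), Verb::Get);
    assert_eq!(pick(&commands, 1), Verb::Set);
    println!("ok");
}
```

A real generator would draw `roll` from a PRNG each request; the cumulative walk is what makes the `weight` fields relative rather than absolute.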

Upon executing the newly generated binary, I confirmed that the store workflow successfully exercised traffic and recorded the expected metrics, with distinct SET and GET metrics.

Comment on lines +888 to +897
#[derive(Debug, PartialEq)]
pub enum ClientWorkItemKind<T> {
    Reconnect,
    Request { request: T, sequence: u64 },
}

pub async fn reconnect<TRequestKind>(
    work_sender: Sender<ClientWorkItemKind<TRequestKind>>,
    config: Config,
) -> Result<()> {
@tylerburdsall (author) commented:
This is the impetus for DRY'ing up the WorkItem types. It seemed unnecessary to copy and paste this entire sequence of code to support a Store client when the end result and the means are exactly the same.

The only difference between the two is the inner request type, which can be specified at compile-time (as seen in changes throughout the codebase).
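That reuse can be sketched as follows; the request enums and channel setup here are illustrative stand-ins (using std's synchronous mpsc rather than the async sender in the diff), not the real rpc-perf types:

```rust
use std::sync::mpsc::channel;

// One generic work-item enum, mirroring the shape in the diff above; the
// inner request type is fixed per client at compile time.
#[derive(Debug, PartialEq)]
enum ClientWorkItemKind<T> {
    Reconnect,
    Request { request: T, sequence: u64 },
}

// Hypothetical per-client request enums, distinct types by construction.
#[derive(Debug, PartialEq)]
enum CacheRequest {
    Get,
}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum StoreRequest {
    Get,
    Put,
    Delete,
}

fn main() {
    // Each client gets a concretely-typed channel, so sending a
    // StoreRequest down the cache channel is a compile-time error.
    let (cache_tx, cache_rx) = channel::<ClientWorkItemKind<CacheRequest>>();
    let (store_tx, store_rx) = channel::<ClientWorkItemKind<StoreRequest>>();

    cache_tx
        .send(ClientWorkItemKind::Request { request: CacheRequest::Get, sequence: 1 })
        .unwrap();
    store_tx.send(ClientWorkItemKind::Reconnect).unwrap();

    assert_eq!(
        cache_rx.recv().unwrap(),
        ClientWorkItemKind::Request { request: CacheRequest::Get, sequence: 1 }
    );
    assert_eq!(store_rx.recv().unwrap(), ClientWorkItemKind::Reconnect);
    println!("ok");
}
```

The shared reconnect logic only needs the generic `ClientWorkItemKind<T>`, while the type parameter keeps the two clients' queues incompatible.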

(A review thread on src/config/workload.rs was marked outdated and resolved.)
@brayniac (Contributor) commented:

Overall this looks good to me. We can ignore the cargo audit failure. I think if you merge main into this feature branch and address the merge conflict, this might go away.

Let's also add some CLI metrics that are meaningful for storage workloads.

Once that's all addressed, I'm fine with merging this. Thanks for opening the PR!

@tylerburdsall (author) replied:

> Overall this looks good to me. We can ignore the cargo audit failure. I think if you merge main into this feature branch and address the merge conflict, this might go away.
>
> Let's also add some CLI metrics that are meaningful for storage workloads.
>
> Once that's all addressed, I'm fine with merging this. Thanks for opening the PR!

@brayniac thank you for the feedback! I've gone ahead and fixed merge conflicts as well as added the CLI metrics for storage workloads in c168050

Yes, a lot of it is copy-paste from the client metrics, but I realized as I was adding it that the storage client indeed has a distinct set of metrics, and we should capture that. As a user, there is a very real possibility that I have a cache client with a different set of configurations and resulting behaviors compared to the store client, and I would want a good comparison of how the two are performing.

I've also gone ahead and changed SET to PUT, since that is what the actual API is.

Finally, I added distinct FOUND and NOT_FOUND metrics. A store is intended to be persistent, hence the distinction. HIT and MISS are cache-specific and imply that an item might have been evicted or expired.
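As an illustration of that distinction, a store GET result might map onto separate counters like this; the metric names and `record_store_get` helper are hypothetical, not rpc-perf's actual metric identifiers:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counters: a persistent store reports FOUND / NOT_FOUND,
// rather than the cache-specific HIT / MISS.
static STORE_GET_FOUND: AtomicU64 = AtomicU64::new(0);
static STORE_GET_NOT_FOUND: AtomicU64 = AtomicU64::new(0);

// Record the outcome of a store GET into the appropriate counter.
fn record_store_get(result: Option<&[u8]>) {
    match result {
        Some(_) => STORE_GET_FOUND.fetch_add(1, Ordering::Relaxed),
        None => STORE_GET_NOT_FOUND.fetch_add(1, Ordering::Relaxed),
    };
}

fn main() {
    record_store_get(Some(b"value"));
    record_store_get(None);
    assert_eq!(STORE_GET_FOUND.load(Ordering::Relaxed), 1);
    assert_eq!(STORE_GET_NOT_FOUND.load(Ordering::Relaxed), 1);
    println!("ok");
}
```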

@brayniac brayniac merged commit 23b1c11 into iopsystems:main Aug 5, 2024
13 of 14 checks passed
@tylerburdsall tylerburdsall deleted the momento-0.42.0 branch August 5, 2024 19:17