Abstract Data Storage away from Metric objects, and introduce Swappable Data Stores #1

Closed
wants to merge 11 commits

Conversation


@dmagliola dmagliola commented Sep 13, 2018

This PR does a lot of things. It should really be 10 different PRs, one per commit, but that was highly impractical.

However, please don't click "Files Changed" at the top right; that diff will be impossible to digest. I really suggest going through these commits one by one.


Main things this PR does:

  • A few refactorings / improvements here and there
  • Add keyword arguments to methods, to be more idiomatic by modern Ruby standards
  • Create the concept of Data Stores, separate from the Metrics the user declares, to have a clear interface that allows us to swap different implementations.
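To make the keyword-argument point concrete, here is a sketch of the calling style this PR moves towards. This is a toy class with illustrative names, not the exact client API:

```ruby
# Toy counter-like metric, sketched only to show the kwargs calling style
# this PR introduces. Names and signatures are illustrative.
class Counter
  def initialize(name, docstring:, labels: [])
    @name = name
    @docstring = docstring
    @labels = labels
    @values = Hash.new(0.0)
  end

  def increment(by: 1, labels: {})
    @values[labels] += by
  end

  def get(labels: {})
    @values[labels]
  end
end

counter = Counter.new(:http_requests, docstring: "Requests served", labels: [:code])
counter.increment(by: 2, labels: { code: "200" })
counter.get(labels: { code: "200" }) # => 2.0
```

Compared to positional arguments, this makes call sites self-documenting and lets callers omit defaults such as `by: 1`.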

All of this is groundwork to allow us to tackle the real issue we want to tackle: Pre-fork servers

We will be working on that, and on other improvements to the Client to bring it closer in line with the Prometheus Best Practices for Client Libraries, in subsequent PRs.

@dmagliola dmagliola force-pushed the pluggable_data_stores branch 4 times, most recently from c787dab to e0e87e7 Compare September 14, 2018 16:45
@dmagliola dmagliola force-pushed the pluggable_data_stores branch 7 times, most recently from 4c9a9be to e2c1154 Compare October 16, 2018 14:54
@dmagliola dmagliola changed the title WIPWIPWIWPIPWIPWIWPIPW Pluggable data stores Abstract Data Storage away from Metric objects, and introduce Swappable Data Stores Oct 16, 2018

@tscolari tscolari left a comment


It LGTM!
I think it's looking a lot better.

@dmagliola dmagliola force-pushed the pluggable_data_stores branch from 0c1c0d4 to 1943b2a Compare October 18, 2018 16:34
Daniel Magliola added 10 commits October 18, 2018 17:37
Ruby < 2.0 was EOL'd over 3 years ago, and we want to use kwargs,
so we're dropping support for it.
This should make it cleaner and more obvious to call into these methods.
Instead of allowing the first observation / setting of a metric to have
any labels, and then validating that any further observations match those,
we now require that the labels a metric will use be declared when the
metric is instantiated.
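A sketch of what declaration-time label validation looks like. This is illustrative code, not the client's actual implementation:

```ruby
# Illustrative sketch: labels are declared at instantiation, and every
# observation must provide exactly that set of labels.
class Metric
  def initialize(name, labels: [])
    @name = name
    @labels = labels
    @values = Hash.new(0.0)
  end

  def observe(value, labels: {})
    unless labels.keys.sort == @labels.sort
      raise ArgumentError, "labels must match the declared set: #{@labels}"
    end
    @values[labels] += value
  end

  def get(labels: {})
    @values[labels]
  end
end

m = Metric.new(:queue_size, labels: [:queue])
m.observe(3, labels: { queue: "default" })  # OK: matches the declared labels
m.get(labels: { queue: "default" })         # => 3.0

begin
  m.observe(1, labels: { shard: "a" })      # rejected: undeclared label
rescue ArgumentError => e
  e.message # mentions the declared set
end
```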

The Validator now only needs to verify that the provided labels match the
expectation, without needing to keep track of the first observation. For
now, it's still caching the validations, for performance, but that should
disappear in a soon-to-come commit.

`base_labels` now works slightly differently. Instead of being a "base"
value that is set when creating the metric and then combined with the
"other" labels at export time, it's now called `preset_values`, and it
gets merged into the labels stored with every observation.

This means the storage that holds the values will always have all the labels,
including the common ones. This makes it possible to have a method like
the `labels()` one recommended by the best practices, which provides a
"view" of a metric with some labels pre-applied. This will be added in a
future commit, once we have a central data store.

Finally, this needed a little adaptation of the Collector middleware
that comes bundled in. This Collector automatically tracks metrics with
some pre-defined labels, but allows the user to define which labels
they want, on a per-request basis, by specifying a `proc` that will generate
the labels given a Rack `env`.

Given this design, we can't know what labels the custom `proc` will return
at the time of initializing the metric, and we don't have an `env` to
pass to them, so the interface of these procs has changed such that they
now need to be able to handle an empty `env`, and still return a `hash`
with all the right keys.
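An illustrative label-builder proc under the new contract. The env keys shown are standard Rack keys, but the overall shape is a sketch, not the Collector's exact code:

```ruby
# A label-builder proc for the bundled Collector middleware must now
# return a hash with the same keys even when called with an empty env,
# because the metric's labels are declared before any request arrives.
label_builder = proc do |env|
  {
    method: env["REQUEST_METHOD"].to_s.downcase,
    path:   env["PATH_INFO"].to_s,
  }
end

label_builder.call({}).keys
# => [:method, :path]  (empty values, but all the right keys)

label_builder.call("REQUEST_METHOD" => "GET", "PATH_INFO" => "/foo")
# => { method: "get", path: "/foo" }
```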
Expose only `sum` and `count` instead, with no quantiles / percentiles.

The main reason to do this is that this whole change is based on the idea
of having a "Store" interface that can be shared between Ruby processes.

The `quantile` gem doesn't play well with this, since we'd need to store
instances of `Quantile::Estimator`, which is a complex data structure, and
tricky to share between Ruby processes.

Moreover, individual Summaries on different processes cannot be aggregated,
so all processes would actually have to share one instance of this class,
which makes it extremely tricky, particularly to do performantly.

Even though this is a loss of functionality, this puts the Ruby client
on par with other client libraries, like the Python one, which also only
offers sum and count without quantiles.

Also, this is actually more compliant with the Client Library best practices:
- Summary is ENCOURAGED to offer quantiles as exports
- MUST allow not having quantiles, since just count and sum is quite useful
- This MUST be the default

The original client didn't comply with the last 2 rules, whereas this one
does, just like the Python client.

And quantiles, while seemingly the point of summaries, are encouraged but
not required.

I'm not discarding the idea of adding quantiles back. I have ideas on how
this could be done, but it'd probably be pretty expensive, as a single
Estimator instance would have to be marshalled / unmarshalled into bytes
for every call to `observe`, and it'd have a hard requirement of having
some sort of low-cardinality Process ID as a label (possible with Unicorn)
to avoid aggregation.
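A quantile-less Summary reduces to very little state, which is exactly what makes it easy to share between processes. A minimal sketch (illustrative, not the client's code):

```ruby
# Minimal sketch of a quantile-less Summary: only `sum` and `count` are
# tracked, which is all this commit exposes. Unlike a Quantile::Estimator,
# these two numbers are trivially storable and aggregatable across processes.
class Summary
  attr_reader :sum, :count

  def initialize
    @sum = 0.0
    @count = 0
  end

  def observe(value)
    @sum += value
    @count += 1
  end
end

s = Summary.new
[0.1, 0.3, 0.2].each { |v| s.observe(v) }
s.sum   # ~0.6 (within float precision)
s.count # => 3
```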
The reason we need this gem is to be able to provide a good concurrency
story in our "Data Stores".

We want the data stores to be thread-safe to make it easy for metrics to
use them, and to shield away the complexity so that the store can do what's
most efficient, and metrics don't need to make assumptions on how it
works.

However, we also need metrics to be able to update multiple values at once
so in some cases (most notably, histograms), the store does need to provide
a synchronization method.

Since the normal `set` / `increment` methods already synchronize on a
Mutex, wrapping them in another Mutex block means we end up calling
`Mutex#synchronize` twice, which *should* work, but Ruby Mutexes are not
reentrant.

`concurrent-ruby` provides a reentrant lock. It's a read-write lock, not
a Mutex, but by always grabbing write locks, it's functionally the same.
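The non-reentrancy is easy to demonstrate with the stdlib alone. This is the exact failure a nested `synchronize` would hit:

```ruby
# Stdlib demonstration: locking the same Mutex twice from the same
# thread raises, because Ruby Mutexes are not reentrant. This is what
# would happen if a histogram's synchronization block wrapped the
# already-synchronized `set` / `increment` calls.
mutex = Mutex.new

error = nil
begin
  mutex.synchronize do
    mutex.synchronize { } # second lock attempt by the same thread
  end
rescue ThreadError => e
  error = e
end

error.class # => ThreadError ("deadlock; recursive locking" on MRI)
```

`Concurrent::ReentrantReadWriteLock#with_write_lock`, by contrast, can be nested by the same thread, which is why the gem is being pulled in.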
Currently, each Metric object stores its data (labels and values) in a Hash
object internal to itself. This change introduces the concept of a Data Store
that is external to the metrics themselves, which holds all the data.

The main reason to do this is that by having a standardized and simple
interface that metrics use to access the store, we abstract away the details
of storing the data from the specific needs of each metric.

This allows us to then simply swap around the stores based on the needs
of different circumstances, with no changes to the rest of the client. The
main use case for this is solving the "pre-fork" server scenario, where
multiple processes need to share the metric storage, to be able to report
coherent numbers to Prometheus.

This change provides one store, a simple Hash, that has pretty much the same
characteristics as the existing storage. Future commits may introduce
new ones, and also each consumer of the Prometheus Client can make their own
and simply swap them for the built-in ones, if they have specific needs.

The interface and requirements of Stores are specified in detail in a README.md
file in the `client/data_stores` directory. This is the documentation that must
be used by anyone wishing to create their own store.
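A rough sketch of what such a store looks like. Method names here are hypothetical; the README.md mentioned above is the authoritative spec for the real interface:

```ruby
# Illustrative data store with roughly Hash-like characteristics: all
# metric values live here, keyed by labelset, behind a single interface
# that metrics use without knowing how storage works. Method names are
# hypothetical, not the documented store interface.
class SimpleHashStore
  def initialize
    @store = Hash.new(0.0)
    @lock = Mutex.new
  end

  def set(labels:, val:)
    @lock.synchronize { @store[labels] = val }
  end

  def increment(labels:, by: 1)
    @lock.synchronize { @store[labels] += by }
  end

  def get(labels:)
    @lock.synchronize { @store[labels] }
  end

  # Export needs a consistent snapshot of everything.
  def all_values
    @lock.synchronize { @store.dup }
  end
end

store = SimpleHashStore.new
store.increment(labels: { code: "500" }, by: 3)
store.set(labels: { code: "200" }, val: 5)
store.get(labels: { code: "500" }) # => 3.0
```

Because metrics only ever talk to this interface, a pre-fork deployment can swap in a store backed by shared memory or files without touching the metric classes.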
The existing spec on `Formats::Text` is based on mocking a registry with
fake metrics and values inside, which means if anything changes in the
interface for metrics, the test will not catch it.

And there's no test validating that interface.

This test is more realistic, and it actually catches the kind of bugs
we introduced in the process of this refactor.
`with_labels` returns a metric object, augmented with some pre-set labels.
This is useful to be able to reduce repetition if a certain part of the
code is passing the same label over and over again, but the label can
have different values in different parts of the codebase (so passing in
`preset_labels` when declaring the metric is not appropriate).
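A sketch of the idea (illustrative implementation, not the client's): the augmented object shares the underlying storage and merges its pre-set labels into every call.

```ruby
# Illustrative `with_labels`: returns a metric-like view with some labels
# pre-applied, sharing the same underlying values as the original.
class Counter
  def initialize(values = Hash.new(0), preset_labels = {})
    @values = values
    @preset_labels = preset_labels
  end

  def with_labels(labels)
    # Same store, more preset labels.
    Counter.new(@values, @preset_labels.merge(labels))
  end

  def increment(labels: {})
    @values[@preset_labels.merge(labels)] += 1
  end

  def get(labels: {})
    @values[@preset_labels.merge(labels)]
  end
end

counter = Counter.new
api = counter.with_labels(service: "api")
api.increment(labels: { code: "200" })

# The observation is visible through the original metric, fully labeled:
counter.get(labels: { service: "api", code: "200" }) # => 1
```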
Every time a metric gets observed / incremented, we are composing the
labels hash, merging the labels that were just passed in as part of the
observation with the metric's `preset_labels`, and validating that the
labelset is valid.

If all labels are pre-set already, however (either as `preset_labels` when
declaring the metric, or by use of `with_labels`), we can validate when
the labels are pre-set, and then skip the validation on every observation,
which can lead to some decent time savings if a metric is observed often.
LabelSetValidator has methods `valid?` and `validate`. It's not very clear
what these do, what's the difference between them, and the name `valid?`
would imply (by convention) that it returns a Boolean, rather than raising
an exception on invalid input.

Renamed these to `validate_symbols!` and `validate_labelset!`, to make it
clearer what each of them does. The `!` also follows the usual Ruby
convention of "do X, and if that doesn't work, raise".

This commit also starts drawing the distinction between an array of `labels`,
and a hash of label keys and values (which we call a `labelset`). The term
`labels` is used interchangeably for both concepts throughout the code.
This commit doesn't fix all instances, but it's a step in that direction.
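The renamed methods and the labels/labelset distinction, sketched out. The method names match the commit; the internals are illustrative:

```ruby
# Sketch of the renamed validator. Terminology: `labels` is an array of
# label names ([:code, :method]); a `labelset` is a hash of label keys
# to values ({ code: "200", method: "get" }).
class LabelSetValidator
  class InvalidLabelError < StandardError; end

  def initialize(expected_labels:)
    @expected_labels = expected_labels.sort
  end

  # "do X, and if that doesn't work, raise"
  def validate_symbols!(labelset)
    labelset.each_key do |key|
      raise InvalidLabelError, "label #{key} is not a symbol" unless key.is_a?(Symbol)
    end
    true
  end

  def validate_labelset!(labelset)
    validate_symbols!(labelset)
    unless labelset.keys.sort == @expected_labels
      raise InvalidLabelError, "labelset keys don't match expected labels"
    end
    labelset
  end
end

v = LabelSetValidator.new(expected_labels: [:code, :method])
v.validate_labelset!(code: "200", method: "get") # returns the labelset
```

Unlike the old `valid?`, neither name now suggests a Boolean return, so raising on bad input no longer breaks convention.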
@dmagliola dmagliola force-pushed the pluggable_data_stores branch from 1943b2a to 8480c94 Compare October 18, 2018 16:37
@dmagliola

Look, we haz Travis now!

@dmagliola dmagliola closed this Oct 23, 2018
@dmagliola

Closed to avoid confusion with the actual PR we've created in the prometheus client repo.
