Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raw mysql fields to the mysql event #3001

Merged
merged 1 commit into from
Nov 18, 2016

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Nov 14, 2016

This will include all mysql stats into the event under mysql.stats.raw. All keys will change to lower case.

Discussions:

  • What should be namespace for these fields? raw, extended
  • Should we normalise the keys to lower case?
  • Should we replace _ by .?
    • This can cause namespace conflicts if a_b and a_b_c has a value (happens in mysql)
    • Also we could hit index.mapping.depth.limit (default 20)
  • Should this be experimental and available for all metricsets?
  • What are we doing about the dynamic filed limit? -> disable it? Add size or number of fields limit on dynamic mapping elasticsearch#11443 (default 1000)

Potential improvements:

  • Remove duplicates which are already part of the event

TODO:

  •  Update docs
  •  Update changelog

@ruflin ruflin added the discuss Issue needs further discussion. label Nov 14, 2016
@tsg
Copy link
Contributor

tsg commented Nov 14, 2016

+1 for raw and I think we should stick to the original names as much as possible (since this is the "raw" section). That means keeping _ as they are, which also avoids the transformation issues.

I guess it would be nicer to have this in it's own metricset.

@ruflin
Copy link
Contributor Author

ruflin commented Nov 15, 2016

@tsg I did some tests with key names in elasticsearch and also spaces and lots of special characters just work. That means also for MongoDB we can use the original keys. In case one of the services uses dots we then have to decide if these should be replaced by _ to prevent potential conflicts.

I would prefer to have the "raw" fields inside the same metricset as it is like an extension of the metricset itself and only one config option is need to enable the additional fields. Overtime I think more and more fields will be moving from the raw to the structured document.

Implementation wise I would probably make raw a default config option that is available in each metricset, but it is up to the MetricSet to implement the additional fields.

@andrewkroh
Copy link
Member

Adding both the curated and raw fields seems like it creates a lot of duplicated data. As a user I think I would only want to store one copy of the data. Maybe we could remove fields that are in the schema from the raw fields. This means the module can send data for fields that are unknown a priori, and we can still provide an optimized experience for the fields that are known (like normalizing storage units to bytes so they render nicely in Kibana or using scaled_floats in mappings for percentages).

+1 for lower-casing raw fields

@ruflin
Copy link
Contributor Author

ruflin commented Nov 15, 2016

Agree that we should not duplicate fields. It would be nice if we could use apply.Schema() to remove the fields which were actually found so we would only iterate over the existing fields afterwards.

@tsg
Copy link
Contributor

tsg commented Nov 15, 2016

I was trying to think through a likely user scenario:

  1. user tries Metricbeat with default settings. She is generally happy.
  2. But one day, she discovers that a metric that is exported by the service is not present in Metricbeat
  3. Looks up a solution, learns about the raw option. Enables it.
  4. Happy about the one metric she wanted, but not happy about the other 435 metrics added 😢

To me it sounds like we should not care about duplication but rather allow to cherry pick raw metrics. I'm not sure how to do that from the usability point of view, especially how to find out which metrics are available.

CC @brandonmensing

@brandonmensing
Copy link

I think part of the joy of Metricbeat is that it's easy so I'm with you on the sadness of that user scenario!

I think this usage pattern might not be the most common so maybe a phased approach here would be good. Maybe we do 1 today, 2 later?

  1. Just grab the whole thing. Having the options is still better than not.
  2. Allow the user to specify a list of extra fields to pick up. User has to look it up in the docs or a 3rd party reference to figure out what those fields are.
  3. When we learn more about how much anyone uses this, maybe we'll migrate extra fields into the main metricset and/or come up with a better way to allow them to configure extra fields. Maybe we develop an extra metricset of 'advanced' fields.

@ruflin
Copy link
Contributor Author

ruflin commented Nov 15, 2016

@tsg I think your use case will describe it pretty well. Our "standard" metricset should cover 90-95% of the uses cases. More experience users will look for additional fields. Based on the unhappiness in point 4 I hope 5 happens: Open Github issue or PR to add the missing metric. The hard part will be where do we say "no" and not add it to the base metricset?

If someone only wants to add one metric I would hope we can solve this with filters:

Drop event if (mysql.status.info.raw.* AND NOT mysql.status.info.raw.your_metric)

Otherwise a whitelist approach as proposed by @brandonmensing could also be interesting. But both approaches should probably only be added in a second step when we see how / if people are going to use it.

BTW: This reminds me of the first metricbeat implementation where there was a base, extended and full metricset instead of the filters.

@ruflin ruflin force-pushed the mysql-extended branch 2 times, most recently from 2f7537d to c6ee4d5 Compare November 17, 2016 15:51
@ruflin ruflin added Metricbeat Metricbeat review in progress Pull request is currently in progress. and removed discuss Issue needs further discussion. labels Nov 17, 2016
@@ -57,6 +57,7 @@ type MetricSet interface {
Host() string // Host returns a hostname or other module specific value
// that identifies a specific host or service instance from which to collect
// metrics.
Raw() bool // Raw defines if raw metrics should be sent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid adding new methods to this interface were possible. In this case, when you need this information it is available in the MetricSet by m.Module().Config().Raw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, changed.

@@ -111,6 +118,7 @@ type ModuleConfig struct {
MetricSets []string `config:"metricsets" validate:"required"`
Enabled bool `config:"enabled"`
Filters processors.PluginConfig `config:"filters"`
Raw bool `config:"raw"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to also add a godoc comment explaining the meaning of this field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

_, exists := rawData["Threads_cached"]
assert.False(t, exists)

// Check random oder raw field if available
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Check a raw field if it is available

@@ -111,6 +118,7 @@ type ModuleConfig struct {
MetricSets []string `config:"metricsets" validate:"required"`
Enabled bool `config:"enabled"`
Filters processors.PluginConfig `config:"filters"`
Raw bool `config:"raw"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to also add a godoc comment explaining the meaning of this field.

assert.True(t, schema.HasKey("test_b"))
assert.True(t, schema.HasKey("test_a"))
assert.False(t, schema.HasKey("nooo"))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think HasKey should return true for both the "source" keys and the "output" keys. Here's the behavior I expected which returns true only for the "source" keys.

func TestHasKey(t *testing.T) {
    schema := Schema{
        "test": Conv{Key: "Test", Func: nop},
        "test_obj": Object{
            "test_a": Conv{Key: "TestA", Func: nop},
            "test_b": Conv{Key: "TestB", Func: nop},
        },  
    }   

    assert.True(t, schema.HasKey("Test"))
    assert.True(t, schema.HasKey("TestA"))
    assert.True(t, schema.HasKey("TestB"))
    assert.False(t, schema.HasKey("test"))
    assert.False(t, schema.HasKey("test_obj"))
    assert.False(t, schema.HasKey("test_a"))
    assert.False(t, schema.HasKey("test_b"))
    assert.False(t, schema.HasKey("other"))
}

which would mean that hasKey() would be:

func hasKey(key string, mappers map[string]Mapper) bool {
    for _, mapper := range mappers {
        if mapper.HasKey(key) {
            return true
        }
    }
    return false
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously you are right -> wrong test leads to wrong implementation 👎

@ruflin
Copy link
Contributor Author

ruflin commented Nov 17, 2016

@andrewkroh New version pushed. The current config option is generic. Even though I currently only documented it in the status metricset as this is the only place available. As soon as we add it to other metricsets, we should have a general documentation.

@@ -32,6 +32,7 @@ type MetricSet struct {
mb.BaseMetricSet
dsn string
db *sql.DB
raw bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@@ -81,6 +81,7 @@ type BaseMetricSet struct {
name string
module Module
host string
raw bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

@ruflin ruflin removed the in progress Pull request is currently in progress. label Nov 18, 2016
Each metricset can add raw fields under the `raw` namespace in addition to the existing fields. The raw fields should contain the raw data coming the service itself. Fields are not changed in any way.

As a first example, raw fields were added to mysql status.

Remove unused raw variables
@andrewkroh andrewkroh merged commit 0b33ee3 into elastic:master Nov 18, 2016
@ruflin ruflin deleted the mysql-extended branch November 18, 2016 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants