Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libbeat: Refactor error handling in schema.Apply() #7335

Merged
merged 1 commit into from
Jul 3, 2018

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Jun 14, 2018

schema.Apply() returned a schema.Errors object as error, with this
object is not easy to check if there was really an error, specially when
optional fields are involved. This leads to lots of cases where these
errors were being ignored.

This change introduces a series of changes to have better control on the
errors happening during the application of an schema.

The interface is kept as event, err := schema.Apply(data), but now the
error returned is always a plain error in case of one or more errors
happened, or nil if no error happened.

schema.ApplyTo() returns the list of errors, and can be used where
multiple schemas are applied for the same event.

Apply() and ApplyTo() can now receive a list of options, these
options are functions that can be used to filter the errors returned or
to access the unfiltered list of errors. This allows different
combinations of resulting errors in case of optional or required fields.
It also allows to add additional checks on all the errors happened, even
on the ones ignored for the final result. Three options are added by now:

  • AllRequired to fail if any non-optional field is missing, this
    mimics the current behaviour, and is set by default if no other option
    is set.
  • FailOnRequired to fail if any required field is missing. A new
    Required option is also added to be able to declare required fields.
  • NotFoundKeys(cb func([]string)) is an option builder that receives
    a function, this function is called with the list of missing keys, so
    additional custom checks or tests can be done.

For lists of errors, the multierror library is used now, some custom
errors have been added to help on previous modifications and to improve
some error messages. Errors on structured data contain now the full
paths of the fields.

@jsoriano jsoriano added in progress Pull request is currently in progress. libbeat labels Jun 14, 2018
@@ -55,7 +51,7 @@ func (conv Conv) HasKey(key string) bool {
// implements Mapper interface for structure
type Object map[string]Mapper

func (o Object) Map(key string, event common.MapStr, data map[string]interface{}) *Errors {
func (o Object) Map(key string, event common.MapStr, data map[string]interface{}) multierror.Errors {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Object.Map should have comment or be unexported

key string
message string
errorType ErrorType
type KeyNotFoundError struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type KeyNotFoundError should have comment or be unexported

@ruflin
Copy link
Contributor

ruflin commented Jun 15, 2018

+100 on that change. Now that we have a result struct we can also add potentially more / other data in the future if we need it.

@ruflin
Copy link
Contributor

ruflin commented Jun 15, 2018

Two other options I was thinking of on what we could return:

event, result, err := schema.Apply()
if err != nil {
   return err
}

It would make error checking more go like and often people would probably skip the result.

result, err := schema.Apply()
if err != nil {
   return err
}
result.Event ...

An other option is to have event as part of the result. When I read the code I asked myself why is event not part of the result as it's kind of the result.

I'm good with all 3 option as all 3 are much better then we have now. Thanks for coming up with this.

@jsoriano
Copy link
Member Author

jsoriano commented Jun 15, 2018

The main reason why I chose to keep the event as first returned value and the "result" as second was to keep it more similar to the current interface so it continue working the same as before for the majority of cases where the error is being ignored.

But it is true that in any case the signature is changing and maybe it is a good idea to force the review of uses of this function, specially where the error is being ignored. I like the result, err := schema.Apply() form, I'll give a try to this.

}

func (err *Error) IsType(errorType ErrorType) bool {
return err.errorType == errorType
func NewWrongFormatError(key string, msg string) *WrongFormatError {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function NewWrongFormatError should have comment or be unexported

}

func (err *Error) SetType(errorType ErrorType) {
err.errorType = errorType
type WrongFormatError struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type WrongFormatError should have comment or be unexported

key string
message string
errorType ErrorType
func NewKeyNotFoundError(key string) *KeyNotFoundError {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function NewKeyNotFoundError should have comment or be unexported

k.key = v
}

type KeyNotFoundError struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type KeyNotFoundError should have comment or be unexported

)

type ErrorType int
type KeyError interface {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type KeyError should have comment or be unexported

for _, err := range errors {
if err, ok := err.(schema.KeyError); ok {
err.SetKey(convMap.Key + "." + err.Key())
}
Copy link
Member Author

@jsoriano jsoriano Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to have more meaningful errors on nested objects, previously a missing key would report something like Key "segments" not found, now it reports key "total.segments" not found.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, that was not very nice in the past.

@@ -16,68 +16,67 @@ var (
"docs": c.Dict("docs", s.Schema{
"count": c.Int("count"),
"deleted": c.Int("deleted"),
}),
}, c.DictOptional),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added these optional modifiers because there are test files that don't contain these fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of that, added option to Apply to fail only on required fields.

errors, ok := errs.(*s.Errors)
if ok {
assert.True(t, errors.HasRequiredErrors(), "mapping error: %s", errors)
assert.EqualError(t, errors, "Required fields are missing: ,source")
Copy link
Member Author

@jsoriano jsoriano Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ruflin this is the only place where I have seen that we'd make use of ApplyResult.NotFound, to check if source is in the list of missing fields, but to take advantage of this we'd have to complicate the eventsMapping function to return the result and not only the event and the errors. Also, I think that here it'd be enough with testing that it returns an error for the case of having a missing field, going further looks like if we were testing here the behaviour of schema.Apply, that is already being tested in the schema package.

Given that, I wonder again if it worths it to have the ApplyResult object. From the perspective of a consumer of the API, it is cleaner to just have event, err := schema.Apply(data), with an event and an error that can be used directly. I think it is also more valuable to be able to give directly the semantics of the use case, as in event.MetricSetFields, err = schema.Apply(index) or nodeData, _ := schemaXpack.Apply(node) instead of obtaining the result and then assigning result.Event to a meaningful variable. From our perspective, if we want to test more particular cases of missing fields, I think that a better place would be the schema package.

I'd go back to the approach in #7167, but with the additional improvements in error handling added here too, and with ApplyTo still returning a list of errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that from a user perspective it is much more convenient if the event can be assigned directly. It's also the reason why I started to think about 3 arguments.

For testing for the missing fields which are optional I think this is something specific to metricsets and not necessarly internal testing of Apply. The reason it only exists in one place so far I think is more that it was too complicated to do it. When we start to add more versions and have very subtle differences between each version I think tests for missing fields become more common.

I assume moving forward we will make optional fields the default as we discussed in the past and not required. In this scenario most fields will be optional. So I our testing will potentially more look like: max 3 missing fields and we get an error if there are more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the approach again, to event, err := schema.Apply(data, options...), options is optional, and can be used to select the mode of operation of Apply (so for example we can make all fields optional by default). Or to access unfiltered errors, so additional tests can be done on the missing fields for example. The test that was checking the missing fields has been refactored to this approach.

I have updated the description of the PR accordingly.

}

func TestInvalidJsonForBadFormatShouldThrowError(t *testing.T) {
t.Skip("the code is not really checking for uknown fields")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was passing because schema.Apply() returned the empty schema.Errors, it was not detecting the unknown field in the (valid) JSON file. This test can be probably removed if the missing field is optional, if is not optional, an explicit check should be added in eventsMapping to check that it exists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is not optional, explicit check added and skip removed.

@jsoriano jsoriano added the discuss Issue needs further discussion. label Jun 22, 2018
"total": c.Int("total"),
"disconnects": c.Int("disconnects"),
"total": c.Int("total", s.Optional),
"disconnects": c.Int("disconnects", s.Optional),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields don't exist in the test input:

	Error:      	Received unexpected error:
	            	2 errors: key `requests.total` not found; key `requests.disconnects` not found
	Test:       	TestStats
	Messages:   	_meta/test/stats.700.json

keyErr, ok := errs[0].(s.KeyError)
if assert.True(t, ok, c.Description) {
assert.Equal(t, c.Expected, keyErr.Key(), c.Description)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyTo can be used as a more advanced method for special cases, it still returns the full list of errors, that can be further analyzed if needed.

@jsoriano jsoriano added the Metricbeat Metricbeat label Jun 22, 2018
@jsoriano
Copy link
Member Author

jenkins, test this again, please

"github.com/elastic/beats/libbeat/common"
)

var DefaultApplyOptions = []ApplyOption{AllRequired}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported var DefaultApplyOptions should have comment or be unexported

@@ -130,6 +135,13 @@ func Optional(c Conv) Conv {
return c
}

// The required flag provokes an error in case the key

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on exported function Required should be of the form "Required ..."

@@ -294,6 +312,12 @@ func DictOptional(c ConvMap) ConvMap {
return c
}

// The optional flag provokes an error in case the key doesn't exist.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on exported function DictRequired should be of the form "DictRequired ..."

@jsoriano jsoriano changed the title libbeat: Return a result object on schema.Apply() libbeat: Refactor error handling in schema.Apply() Jun 28, 2018
@jsoriano jsoriano removed the in progress Pull request is currently in progress. label Jun 28, 2018
Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach and now that I see it I'm quite happy that we kept iterating on it.

This will give us a lot of flexibility on how we can test things and making the switch to optional by default really easy.

for _, err := range errors {
if err, ok := err.(schema.KeyError); ok {
err.SetKey(convMap.Key + "." + err.Key())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, that was not very nice in the past.

@@ -66,25 +66,24 @@ var (
}
)

func eventMapping(r mb.ReporterV2, info elasticsearch.Info, content []byte) []error {
func eventMapping(r mb.ReporterV2, info elasticsearch.Info, content []byte) error {
var all struct {
Data map[string]interface{} `json:"_all"`
}

err := json.Unmarshal(content, &all)
if err != nil {
r.Error(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nice thing that we can do now is that we can check for the error from eventMapping and report it in only 1 place. The current code with reporting and returning the error now is only returning the error as there is only 1 type of errors left.

@@ -29,28 +31,34 @@ var (
schema = s.Schema{
"insert_order": c.Int("insert_order"),
"priority": c.Str("priority"),
"source": c.Str("source"),
"source": c.Str("source", s.Required),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you pick this field as required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a test failing if this field is not present, but I'm thinking now that all fields here seem to be required, I'm going to revert this part.

@jsoriano jsoriano force-pushed the schema-result branch 4 times, most recently from 4f3231c to b63168f Compare June 29, 2018 14:43
`schema.Apply()` returned a `schema.Errors` object as error, with this
object is not easy to check if there was really an error, specially when
optional fields are involved. This leads to lots of cases where these
errors were being ignored.

This change introduces a series of changes to have better control on the
errors happening during the application of an schema.

The interface is kept as `event, err := schema.Apply(data)`, but now the
error returned is always a plain error in case of one or more errors
happened, or nil if no error happened.

`schema.ApplyTo()` returns the list of errors, and can be used where
multiple schemas are applied for the same event.

`Apply()` and `ApplyTo()` can now receive a list of options, these
options are functions that can be used to filter the errors returned or
to access the unfiltered list of errors.  This allows different
combinations of resulting errors in case of optional or required fields.
It also allows to add additional checks on all the errors happened, even
on the ones ignored for the final result. Three options are added by now:
* `AllRequired` to fail if any non-optional field is missing, this
mimics the current behaviour, and is set by default if no other option
is set.
* `FailOnRequired` to fail if any required field is missing. A new
`Required` option is also added to be able to declare required fields.
* `NotFoundKeys(cb func([]string))` is an option builder that receives
a function, this function is called with the list of missing keys, so
additional custom checks or tests can be done.

For lists of errors, the `multierror` library is used now, some custom
errors have been added to help on previous modifications and to improve
some error messages. Errors on structured data contain now the full
paths of the fields.
@jsoriano jsoriano removed the discuss Issue needs further discussion. label Jun 29, 2018
@jsoriano
Copy link
Member Author

jsoriano commented Jul 2, 2018

@exekias @andrewkroh if you have time, could you also take a look to this PR? Thanks!

@jsoriano jsoriano requested review from exekias and andrewkroh July 2, 2018 10:23
Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work here! I think error handling has improved a lot with this, thank you for taking it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants