Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dependencies between monitor, SLO & dashboards #668

Closed

Conversation

Tony-Proum
Copy link

Don't know if this could handle of the complexity of this issue : #667
But maybe this simple fix could do the trick

@Tony-Proum Tony-Proum requested review from a team as code owners September 17, 2020 09:34
@jirikuncar
Copy link
Contributor

@Tony-Proum it looks good. Would you mind adding a test scenario?

@Tony-Proum
Copy link
Author

Oh, ok, I thought that the solution may be more complicated, but as it looks good, I'll add a test for this feature

@Tony-Proum
Copy link
Author

@jirikuncar I think that those test are able to demonstrate what we are doing here. But I'm not sure that they follow the practices in this repository ( naming / way of writing and so forth ). Feel free to tell me if something is wrong.

@Tony-Proum
Copy link
Author

Oh, sorry, I just realized that I forgot to assert that the dashboard were forced to be recreated in the last test I have commit, I'll fix that

@Tony-Proum Tony-Proum force-pushed the feat/handle-dependencies branch 2 times, most recently from 5b78d3e to 7155738 Compare September 18, 2020 11:23
@Tony-Proum
Copy link
Author

Tony-Proum commented Sep 18, 2020

@jirikuncar It works on my machine 😆
I removed the fail fast flag on the github action part in order to see if tests succeed only on my machine. But seems like they works on linux and not on other types of OS. unfortunately I does not have access to a mac or windows build environment. And even if it was the case, I'm not a golang expert and don't know if I'll be able to locate the problem.

Do you have any idea on what could go wrong with this ?

@Tony-Proum Tony-Proum force-pushed the feat/handle-dependencies branch 2 times, most recently from 955ad89 to bc9a103 Compare September 18, 2020 19:46
@Tony-Proum
Copy link
Author

I managed to fix the test for MacOs and Windows case by removing the destroyCheck of this test. Even if it was running properly on linux, it seems to have some kind of race condition on mac or windows that's make the test fail for those OS.

Do you think that we could omit this check for this particular test ?

@@ -27,6 +27,7 @@ jobs:
env:
GOFLAGS: "-mod=vendor"
strategy:
fail-fast: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please revert it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@nmuesch nmuesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a look at the related issue I'm wondering if there's a way to use the newer force_delete flag on the monitor resource - https://github.com/DataDog/terraform-provider-datadog/pull/535/files to satisfy this use case.

I believe that would avoid having to delete+create any dependent dashboards/SLOs and let those update in place with the new monitor id.

@@ -3802,6 +3802,7 @@ func getServiceLevelObjectiveDefinitionSchema() map[string]*schema.Schema {
"slo_id": {
Type: schema.TypeString,
Required: true,
ForceNew: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause the whole dashboard to be deleted + recreated if the slo changes? I'm not sure if that'll be expected for users, since it'd mean any dashboard lists or favorited dashboards will need to be updated with the new dashboard id.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's an issue any more: the dasboard_list can now be handle by the dashboard itself (see: https://github.com/DataDog/terraform-provider-datadog/pull/654/files) so in theory it should works just fine. But I' m agree, delete + recreate could be a little ... overkill 😄

@Tony-Proum
Copy link
Author

Tony-Proum commented Sep 28, 2020

Taking a look at the related issue I'm wondering if there's a way to use the newer force_delete flag on the monitor resource - https://github.com/DataDog/terraform-provider-datadog/pull/535/files to satisfy this use case.

Oh seems to be a way better solution ! (I didn't known that feature exists through the API). If this flag exists on each of this resources: dashboard, slo and monitor, it could do the trick.

I believe that would avoid having to delete+create any dependent dashboards/SLOs and let those update in place with the new monitor id.

I think that it exists a similar link for dashboard and SLO resource (but not sure). So maybe will I have to add this force_delete attribute to the SLO resource before being able to cover this entire issue (if the API supports it).

Also, I'm working on another subject today, I'll check when it's possible

Finally, I have a question about this force_delete flag, for the terraform use-case, maybe could it be default set to true ?
IMHO having to have a terraform apply fail in order to be able to see that the resource is not able to be deleted, and having to find out a force option in the documentation is not really user friendly.

Maybe could it be a lot more simpler forcing this attribute in the first place or even to use the delete + recreate solution by default ?

I do not have a strong opinion on which solution is better than the other, but I think that the user experience for terraform users could be really impacted by the solution that will emerge

@nmuesch
Copy link
Contributor

nmuesch commented Oct 9, 2020

Hey @Tony-Proum I apologize for my delay here.

Please let me know if this isn't the case, but I believe monitors is the only resource that would fail to delete if its referenced in an SLO? So we'd only need the force_delete option there?

Originally we decided against enabling forceDelete to be on by default, since the option is designed as a safegaurd (see - #451 (comment)) from inadvertently deleting monitors that are referenced inside an SLO definition. Though I do think the experience could be a bit better. Maybe we could call this out in the error thats logged and raised to the user and explicitly note the force_delete option is available?

@Tony-Proum
Copy link
Author

Tony-Proum commented Oct 14, 2020

if I remember correctly, when doing some tests, the same validation exists between SLO widget and Dasboards (but I'm not sure anymore).
I'll test the flag and see if it helps in my case: Luckily I have a monitor to change today so it will be a good test case.

@Tony-Proum
Copy link
Author

Tony-Proum commented Oct 14, 2020

@nmuesch
I tried to use the force_delete and it seems to do the job in a certain way. I got some SLOs with no monitor when something went wrong during deleting and recreating a monitor (successful delete and erroneous creation) As you said, this is a good safeguard and I'm not sure that it should be used in a ¨normal¨ use case.
Also, I'm agree, the error message could help to find this solution.

I also tried, to change the SLO on a monitor and for this part, I think that the issue were a PEBKAC as I were able to change it without issue.

If you are agree with this, I'll remove the ForceNew flag from dashboard resource as suggested by @jirikuncar, I also have to delete the associated integration test that remains. And would like to pursue with this PR: I think that it allows a better developer experience on this specific situation and a is a more straight forward solution for a normal use case. I keep in mind that the force_delete flag exists but I'm keep struggled by the workflow I had encounter to finally have it working. It's not really simple as we have to know that the force_delete flag exists, so even if the solution is in the error, the user will have to rollback is changes and then applying the force_delete flag on all impacted monitor and then apply a change on the impacted monitor.

IMHO Instead of simply delete the SLO, it seems cumbersome.
Let me know if I missed something because I'm a complete beginner with datadog 😅

@Tony-Proum
Copy link
Author

Tony-Proum commented Oct 23, 2020

@nmuesch & @jirikuncar just happens to me this morning while trying to modify an SLO:
Error: error deleting service level objective: json: cannot unmarshal string into Go struct field SLODeleteResponse.errors of type map[string]string: {"errors":"slo <slo_id> is referenced in dashboard_id: <dashboard_id>","data":null}

So I finally think that the link exists both between dashboard and slo, and between slo and monitors.

@Tony-Proum Tony-Proum requested review from jirikuncar and removed request for a team October 23, 2020 10:00
@therve
Copy link
Contributor

therve commented Jan 6, 2021

We've added the force_delete flag to SLO resources, I think this is the recommended solution for now. Thanks for your patience around this issue.

@therve therve closed this Jan 6, 2021
@Tony-Proum
Copy link
Author

Tony-Proum commented Jan 6, 2021

@therve Maybe should this flag be added to monitors as well ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants