Add dependencies between monitor, SLO & dashboards #668

Tony-Proum · 2020-09-17T09:34:33Z

I don't know if this can handle the full complexity of this issue: #667
But maybe this simple fix could do the trick.

jirikuncar · 2020-09-17T10:32:48Z

@Tony-Proum it looks good. Would you mind adding a test scenario?

Tony-Proum · 2020-09-17T12:54:56Z

Oh, OK. I thought the solution might be more complicated, but since it looks good, I'll add a test for this feature.

Tony-Proum · 2020-09-18T06:17:41Z

@jirikuncar I think these tests demonstrate what we are doing here, but I'm not sure they follow the practices of this repository (naming, style, and so forth). Feel free to tell me if something is wrong.

Tony-Proum · 2020-09-18T07:00:03Z

Oh, sorry, I just realized that in the last test I committed I forgot to assert that the dashboard is forced to be recreated. I'll fix that.

Tony-Proum · 2020-09-18T12:45:53Z

@jirikuncar It works on my machine 😆
I disabled fail-fast in the GitHub Actions workflow to see whether the tests pass only on my machine. It seems they pass on Linux but not on the other operating systems. Unfortunately, I don't have access to a macOS or Windows build environment, and even if I did, I'm not a Go expert and don't know whether I'd be able to locate the problem.

Do you have any idea what could be going wrong?

Tony-Proum · 2020-09-18T20:50:03Z

I managed to fix the test on macOS and Windows by removing the destroyCheck from this test. Although it ran properly on Linux, there seems to be some kind of race condition on macOS and Windows that makes the test fail on those operating systems.

Do you think we could omit this check for this particular test?

jirikuncar · 2020-09-18T12:48:24Z

.github/workflows/test.yml

@@ -27,6 +27,7 @@ jobs:
    env:
      GOFLAGS: "-mod=vendor"
    strategy:
+      fail-fast: false


Can you please revert it?

nmuesch

Taking a look at the related issue I'm wondering if there's a way to use the newer force_delete flag on the monitor resource - https://github.com/DataDog/terraform-provider-datadog/pull/535/files to satisfy this use case.

I believe that would avoid having to delete+create any dependent dashboards/SLOs and let those update in place with the new monitor id.
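To make the trade-off concrete, here is a minimal in-memory sketch of the safeguard being discussed: deleting a monitor that an SLO still references is refused unless a force flag is set. The `store` type, `deleteMonitor`, and the `force` parameter are hypothetical stand-ins, not the Datadog API or the provider's code. The sketch assumes a forced delete leaves the SLO pointing at a monitor that no longer exists, which matches the "SLOs with no monitor" outcome reported later in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical in-memory model of monitors and the SLOs that
// reference them; it is not the Datadog API or the provider's code.
type store struct {
	monitors map[int]bool     // existing monitor IDs
	slos     map[string][]int // SLO ID -> referenced monitor IDs
}

// sloRefs returns the IDs of SLOs that reference the given monitor, sorted so
// that error messages are deterministic.
func (s *store) sloRefs(monitorID int) []string {
	var refs []string
	for sloID, ids := range s.slos {
		for _, id := range ids {
			if id == monitorID {
				refs = append(refs, sloID)
				break
			}
		}
	}
	sort.Strings(refs)
	return refs
}

// deleteMonitor refuses to delete a referenced monitor unless force is set.
func (s *store) deleteMonitor(monitorID int, force bool) error {
	if refs := s.sloRefs(monitorID); len(refs) > 0 && !force {
		return fmt.Errorf("monitor %d is referenced in SLOs %v", monitorID, refs)
	}
	delete(s.monitors, monitorID)
	return nil
}

// orphanedSLOs lists SLOs that point at a monitor which no longer exists,
// the state a forced delete can leave behind.
func (s *store) orphanedSLOs() []string {
	var out []string
	for sloID, ids := range s.slos {
		for _, id := range ids {
			if !s.monitors[id] {
				out = append(out, sloID)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}
```

The point of the sketch is that the flag only removes the guard; it does not by itself repoint the SLO at a replacement monitor.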

nmuesch · 2020-09-25T18:03:29Z

datadog/resource_datadog_dashboard.go

@@ -3802,6 +3802,7 @@ func getServiceLevelObjectiveDefinitionSchema() map[string]*schema.Schema {
 		"slo_id": {
 			Type:     schema.TypeString,
 			Required: true,
+			ForceNew: true,


Will this cause the whole dashboard to be deleted and recreated if the SLO changes? I'm not sure users will expect that, since it would mean any dashboard lists or favorited dashboards need to be updated with the new dashboard ID.

I don't know if that's still an issue: the dashboard list can now be handled by the dashboard itself (see: https://github.com/DataDog/terraform-provider-datadog/pull/654/files), so in theory it should work just fine. But I agree, delete + recreate could be a little... overkill 😄
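The concern about `ForceNew: true` on `slo_id` comes down to how Terraform plans a change: a change to any ForceNew attribute turns an in-place update into a replacement (delete + create, hence a new dashboard ID). This is a simplified sketch of that decision, not Terraform's implementation; `attrSchema` and `planAction` are hypothetical, and the nested widget path is flattened to a single `slo_id` key.

```go
package main

import "sort"

// attrSchema is a hypothetical, trimmed-down stand-in for a Terraform schema
// attribute; only the ForceNew bit matters for this sketch.
type attrSchema struct {
	ForceNew bool
}

// planAction approximates how an action is chosen for one resource:
// no changes -> "no-op"; a change to any ForceNew attribute -> "replace"
// (delete + create, so a new ID); any other change -> "update" in place.
// It also returns the sorted names of the attributes forcing replacement.
func planAction(schema map[string]attrSchema, oldState, newConfig map[string]string) (string, []string) {
	var forcing []string
	changed := false
	for name, attr := range schema {
		if oldState[name] == newConfig[name] {
			continue
		}
		changed = true
		if attr.ForceNew {
			forcing = append(forcing, name)
		}
	}
	sort.Strings(forcing)
	switch {
	case len(forcing) > 0:
		return "replace", forcing
	case changed:
		return "update", nil
	default:
		return "no-op", nil
	}
}
```

With `slo_id` marked ForceNew, retargeting a single SLO widget replaces the whole dashboard, which is why dashboard lists and favorites would then need the new ID.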

Tony-Proum · 2020-09-28T12:35:52Z

> Taking a look at the related issue I'm wondering if there's a way to use the newer force_delete flag on the monitor resource - https://github.com/DataDog/terraform-provider-datadog/pull/535/files to satisfy this use case.

Oh, that seems like a much better solution! (I didn't know that feature existed in the API.) If this flag exists on each of these resources (dashboard, SLO and monitor), it could do the trick.

> I believe that would avoid having to delete+create any dependent dashboards/SLOs and let those update in place with the new monitor id.

I think a similar link exists between the dashboard and SLO resources (but I'm not sure). So I may have to add this force_delete attribute to the SLO resource before this entire issue can be covered (if the API supports it).

Also, I'm working on something else today; I'll check when I can.

Finally, I have a question about this force_delete flag: for the Terraform use case, could it default to true?
IMHO, having a terraform apply fail before you can see that the resource cannot be deleted, and then having to find a force option in the documentation, is not very user friendly.

Maybe it would be a lot simpler to force this attribute in the first place, or even to use the delete + recreate solution by default?

I don't have a strong opinion on which solution is better, but I think the experience of Terraform users could be strongly affected by whichever solution emerges.

nmuesch · 2020-10-09T16:05:22Z

Hey @Tony-Proum I apologize for my delay here.

Please let me know if this isn't the case, but I believe monitors are the only resource that would fail to delete if referenced in an SLO? So we'd only need the force_delete option there?

Originally we decided against enabling forceDelete by default, since the option is designed as a safeguard (see - #451 (comment)) against inadvertently deleting monitors that are referenced inside an SLO definition. Though I do think the experience could be a bit better. Maybe we could call this out in the error that's logged and raised to the user, and explicitly note that the force_delete option is available?

Tony-Proum · 2020-10-14T09:22:25Z

If I remember correctly from some tests, the same validation exists between SLO widgets and dashboards (but I'm not sure anymore).
I'll test the flag and see if it helps in my case: luckily, I have a monitor to change today, so it will be a good test case.

datadog/resource_datadog_dashboard.go

datadog/resource_datadog_dashboard_dependencies_test.go

Tony-Proum · 2020-10-14T13:39:52Z

@nmuesch
I tried force_delete and it does the job, to a certain extent. I ended up with some SLOs without a monitor when something went wrong while deleting and recreating a monitor (the delete succeeded but the creation failed). As you said, this is a good safeguard, and I'm not sure it should be used in a "normal" use case.
Also, I agree, the error message could help users find this solution.

I also tried changing the SLO on a monitor, and for this part I think the issue was a PEBKAC, as I was able to change it without problems.

If you agree, I'll remove the ForceNew flag from the dashboard resource as suggested by @jirikuncar, and delete the associated integration test that remains. I'd like to continue with this PR: I think it gives a better developer experience in this specific situation and is a more straightforward solution for the normal use case. I keep in mind that the force_delete flag exists, but I'm still bothered by the workflow I had to go through to finally get it working. It isn't simple, because you have to know that the force_delete flag exists. Even if the solution is mentioned in the error, the user has to roll back their changes, apply the force_delete flag on every impacted monitor, and then apply the change to the impacted monitor again.

IMHO, compared with simply deleting the SLO, this seems cumbersome.
Let me know if I missed something, because I'm a complete beginner with Datadog 😅

datadog/resource_datadog_service_level_objective_test.go

Tony-Proum · 2020-10-23T09:59:30Z

@nmuesch & @jirikuncar this just happened to me this morning while trying to modify an SLO:
Error: error deleting service level objective: json: cannot unmarshal string into Go struct field SLODeleteResponse.errors of type map[string]string: {"errors":"slo <slo_id> is referenced in dashboard_id: <dashboard_id>","data":null}

So I now think the link exists both between dashboards and SLOs, and between SLOs and monitors.

therve · 2021-01-06T09:04:06Z

We've added the force_delete flag to SLO resources, I think this is the recommended solution for now. Thanks for your patience around this issue.

Tony-Proum · 2021-01-06T09:17:11Z

@therve Maybe this flag should be added to monitors as well?

Tony-Proum requested review from a team as code owners September 17, 2020 09:34

jirikuncar added resource/dashboard labels Sep 17, 2020

Tony-Proum force-pushed the feat/handle-dependencies branch 2 times, most recently from 5b78d3e to 7155738 September 18, 2020 11:23

Tony-Proum force-pushed the feat/handle-dependencies branch 2 times, most recently from 955ad89 to bc9a103 September 18, 2020 19:46

jirikuncar reviewed Sep 24, 2020

Tony-Proum force-pushed the feat/handle-dependencies branch from bc9a103 to f3de3db September 25, 2020 09:11

Tony-Proum requested a review from jirikuncar September 25, 2020 09:15

nmuesch reviewed Sep 25, 2020

Tony-Proum requested a review from nmuesch September 28, 2020 13:00

Tony-Proum force-pushed the feat/handle-dependencies branch from f3de3db to f4438a8 October 8, 2020 15:55

Tony-Proum requested a review from a team as a code owner October 8, 2020 15:55

nmuesch mentioned this pull request Oct 13, 2020

Handle dependency between monitors, SLOs and dashboards #667 (open)