[amtool] - Add a new `silence import` command #1082

lebinh · 2017-11-06T09:30:19Z

This command reads silence data from the JSON output of a query and imports it
into Alertmanager. It allows amtool to be used as a backup/restore tool for
silences, i.e. it fixes #1000

Backup / export:

amtool silence -o json > silences.json

Restore / import:

amtool silence import silences.json

This adds silences one at a time, so it will be slow if you have thousands of them, but the Alertmanager API doesn't accept multiple silences in a single POST, so ¯\(ツ)/¯

Kellel · 2017-11-06T20:18:00Z

cli/silence_import.go

+	for _, silence := range silences {
+		// reset the ID, otherwise alertmanager API will try to replace an existing silence
+		// this *might* create duplicated silences but that should generally be OK
+		silence.ID = ""


I think we want to avoid duplicate silences if possible. Does the Alertmanager API accept posting a new silence with the ID field present?

It will accept it if the silence ID exists, and will replace that silence with the new one. But if the ID doesn't exist, it returns a "not found" error. We could remove the ID only after receiving a "not found" error, but that would add a lot of extra round trips in the backup/restore use case (every add would first have to wait for a "not found" error).

Yeah, I don't think we should have duplicated silences either. Maybe skip the import if the silence already exists?

Since this is for disaster recovery/data migration, I'm assuming maintaining the IDs wouldn't be important (and wouldn't expect there to be any pre-existing silences in the new instance of AM).

I will add a flag to force this; it is needed because we are running Alertmanager 0.6 and thus cannot update existing silences. I will keep the silence ID by default, as it appears Alertmanager will only update the existing one without creating any new silence in that case.

lebinh · 2017-11-27T03:07:33Z

Hi @stuartnelson3, @Kellel can you guys help take a look and see if we can merge this PR?

stuartnelson3 · 2017-11-27T04:31:41Z

Hey sorry for the long delay, I've been away for the last few weeks and have been behind on reviewing things. I'll take a look at this in the next few days, thanks for your patience!

lebinh · 2017-11-27T07:12:40Z

Sure @stuartnelson3 no problem, thank you for maintaining this.

josedonizetti · 2017-11-30T13:07:52Z

cli/silence_import.go

+	var err error
+
+	if len(args) == 1 {
+		input, err = os.Open(args[0])


You need to close the opened file descriptor: defer input.Close()

This should be added after the err is checked below

stuartnelson3 · 2017-12-03T12:04:28Z

cli/silence_import.go

+	var err error
+
+	if len(args) == 1 {
+		input, err = os.Open(args[0])


This should be added after the err is checked below

stuartnelson3 · 2017-12-03T12:05:16Z

cli/silence_import.go

+		}
+	}
+
+	data, err := ioutil.ReadAll(input)


ioutil.ReadAll will load the entire input into memory. It would be better to process the information as a stream.

See https://golang.org/pkg/encoding/json/#example_Decoder_Decode_stream

Actually it ended up being a super easy change:

    dec := json.NewDecoder(input)
    // read the opening bracket
    _, err := dec.Token()
    if err != nil {
        return err
    }
    for dec.More() {
        var s types.Silence
        err := dec.Decode(&s)
        if err != nil {
            return err
        }
        s.ID = ""
        err = addSilence(s)
        if err != nil {
            msg := fmt.Sprintf("couldn't add silence: %s", s)
            return errors.Wrap(err, msg)
        }
    }
    // read the closing bracket
    _, err = dec.Token()
    if err != nil {
        return err
    }
    return nil

stuartnelson3 · 2017-12-03T12:33:27Z

cli/silence_import.go

+		// this *might* create duplicated silences but that should generally be OK
+		silence.ID = ""
+
+		err = addSilence(silence)


Since this is being done in serial, it will be relatively slow. We can always create a small worker pool (https://gobyexample.com/worker-pools) without much trouble. I'm fine adding it later, but if you want to add it now, feel free.

stuartnelson3 · 2017-12-03T12:35:25Z

cli/silence_import.go

+}
+
+func bulkImport(cmd *cobra.Command, args []string) error {
+	input := os.Stdin


also accepting a json stream on stdin is 👌

lebinh · 2017-12-05T06:37:17Z

Sorry for the close / re-open, I pushed the wrong commit 😞
I added a new flag, --force, which removes the silence ID before calling the add API. The default behavior, i.e. without --force, is to remove the ID and retry only if we receive a "not found" error.

There is also a new flag --worker to specify the number of concurrent workers calling Alertmanager API.

lebinh · 2017-12-05T07:03:08Z

Probably worth mentioning that the force flag can also be used to speed up the import when the silences are new:

$ time ./amtool silence import -w 4 silences.json
...
real	0m6.188s
user	0m0.247s
sys	0m0.087s

$ time ./amtool silence import -w 4 -f silences.json
...
real	0m3.549s
user	0m0.218s
sys	0m0.068s

The end results (with or without -f) are the same but the difference in speed is because we don't have to wait for a network roundtrip of "not found" error and then re-try.

stuartnelson3 · 2017-12-05T17:22:18Z

Awesome work! I'll give this a try tomorrow morning.

stuartnelson3

Thanks for adding the worker pool! Unfortunately, the way it's currently constructed, it will deadlock if you try to import more than 100 silences. The error channel isn't drained until AFTER all silences have been created, so the send to errc in addSilenceWorker will block.

I've added some inline code comments that fix the problem.

stuartnelson3 · 2017-12-06T10:04:58Z

cli/silence_import.go

+	}
+
+	silences := make(chan *types.Silence, 100)
+	errs := make(chan error, 100)


can you rename this to errc?

stuartnelson3 · 2017-12-06T10:05:44Z

cli/silence_import.go

+		return errors.Wrap(err, "couldn't unmarshal input data, is it JSON?")
+	}
+
+	silences := make(chan *types.Silence, 100)


The Go convention for naming channels is typically to name the thing and then put c after it (to indicate channel). Could you change this to silencec?

stuartnelson3 · 2017-12-06T10:11:02Z

cli/silence_import.go

+	}
+	for w := 0; w < workers; w++ {
+		go addSilenceWorker(silences, errs)
+	}


These three lines can be updated to help prevent a deadlock:

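    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        // register the worker before its goroutine starts,
        // so wg.Wait() cannot return early
        wg.Add(1)
        go func(w int) {
            addSilenceWorker(silencec, errc)
            wg.Done()
        }(w)
    }
    // drain errc concurrently so workers never block on a full channel
    go func() {
        for err := range errc {
            if err != nil {
                errCount++
            }
        }
    }()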

Check out down below where we wg.Wait(), which indicates that all the silence workers are finished, and then we can close(errc).

stuartnelson3 · 2017-12-06T10:12:54Z

cli/silence_import.go

+		silences <- &s
+		count++
+	}
+	close(silences)


The final piece is here:

    close(silencec)
    wg.Wait()
    close(errc)

Once we're done sending all the parsed silences, we can close the channel the workers are ranging over (anything still in the channel waiting to be processed will be processed). From there, we wg.Wait() to know that all the silences have been created and their errors sent on the error channel, and then we can close(errc).

stuartnelson3 · 2017-12-06T10:13:21Z

cli/silence_import.go

+		if err != nil {
+			errCount++
+		}
+	}


This code should be removed, it has been placed into the goroutine further up.

lebinh · 2017-12-06T16:12:52Z

Thanks @stuartnelson3, this feedback is very helpful as I only started with Go recently. I've updated the PR as suggested.

lebinh · 2017-12-06T16:14:57Z

cli/silence_import.go

-	// read closing bracket
-	_, err = dec.Token()
-	if err != nil {
-		return errors.Wrap(err, "invalid JSON")


Removed this as I don't think it really makes sense to return an error about the JSON format here; the silences have already been added at this point.

I would actually prefer to keep it in. Even though all the silences have been added, a user should still be informed if they are attempting to use invalid JSON.

stuartnelson3 · 2017-12-06T16:21:07Z

I'll take a look at this tomorrow to confirm it's working, thanks for the work on this

stuartnelson3

Ok, works awesome! Took 3sec to insert 10,000 silences (and half that with --force)

lebinh · 2017-12-08T01:38:47Z

Great! Thanks @stuartnelson3

Kellel reviewed Nov 6, 2017

lebinh changed the title ~~Add a new silence import command to amtool~~ [amtool] - Add a new silence import command to amtool Nov 27, 2017

lebinh changed the title ~~[amtool] - Add a new silence import command to amtool~~ [amtool] - Add a new silence import command Nov 27, 2017

josedonizetti reviewed Nov 30, 2017

stuartnelson3 suggested changes Dec 3, 2017

lebinh closed this Dec 5, 2017

lebinh force-pushed the master branch from 4f1ea40 to ee8ac8e Compare December 5, 2017 06:30

lebinh reopened this Dec 5, 2017

lebinh force-pushed the master branch from 32b936e to cb494a7 Compare December 5, 2017 06:43

lebinh force-pushed the master branch from cb494a7 to c157667 Compare December 5, 2017 06:48

stuartnelson3 suggested changes Dec 6, 2017

Add a WaitGroup barrier

5ee9f5b

Move error channel reading to a goroutine to prevent deadlock and thus add a WaitGroup to synchronize.

lebinh commented Dec 6, 2017

stuartnelson3 approved these changes Dec 7, 2017

stuartnelson3 merged commit 9b12714 into prometheus:master Dec 7, 2017
