Commit f452ecd

Add Getting Started Guide for SLM (#42878)
This commit adds a basic Getting Started Guide for SLM.
1 parent 0adf14d commit f452ecd

4 files changed

+179
-0
lines changed

docs/reference/ilm/apis/slm-api.asciidoc

Lines changed: 4 additions & 0 deletions
@@ -16,6 +16,7 @@ well as a separate API for immediately invoking a snapshot based on a policy.
1616
Since SLM falls under the same category as ILM, it is stopped and started by
1717
using the <<start-stop-ilm,start and stop>> ILM APIs.
1818

19+
[[slm-api-put]]
1920
=== Put Snapshot Lifecycle Policy API
2021

2122
Creates or updates a snapshot policy. If the policy already exists, the version
@@ -95,6 +96,7 @@ The top-level keys that the policy supports are described below:
9596
To update an existing policy, simply use the put snapshot lifecycle policy API
9697
with the same policy id as an existing policy.
9798

99+
[[slm-api-get]]
98100
=== Get Snapshot Lifecycle Policy API
99101

100102
Once a policy is in place, you can retrieve one or more of the policies using
@@ -155,6 +157,7 @@ GET /_slm/policy
155157
// CONSOLE
156158
// TEST[continued]
157159

160+
[[slm-api-execute]]
158161
=== Execute Snapshot Lifecycle Policy API
159162

160163
Sometimes it can be useful to immediately execute a snapshot based on policy,
@@ -325,6 +328,7 @@ Which now includes the successful snapshot information:
325328

326329
It is a good idea to test policies using the execute API to ensure they work.
327330

331+
[[slm-api-delete]]
328332
=== Delete Snapshot Lifecycle Policy API
329333

330334
A policy can be deleted by issuing a delete request with the policy id. Note
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
[role="xpack"]
2+
[testenv="basic"]
3+
[[getting-started-snapshot-lifecycle-management]]
4+
== Getting started with snapshot lifecycle management
5+
6+
Let's get started with snapshot lifecycle management (SLM) by working through a
7+
hands-on scenario. The goal of this example is to automatically back up {es}
8+
indices using <<modules-snapshots,snapshots>> every day at a particular
9+
time.
10+
11+
[float]
12+
[[slm-gs-create-policy]]
13+
=== Setting up a repository
14+
15+
Before we can set up an SLM policy, we'll need to set up a
16+
<<snapshots-repositories,snapshot repository>> where the snapshots will be
17+
stored. Repositories can use {plugins}/repository.html[many different backends],
18+
including cloud storage providers. You'll probably want to use one of these in
19+
production, but for this example we'll use a shared file system repository:
20+
21+
[source,js]
22+
-----------------------------------
23+
PUT /_snapshot/my_repository
24+
{
25+
"type": "fs",
26+
"settings": {
27+
"location": "my_backup_location"
28+
}
29+
}
30+
-----------------------------------
31+
// CONSOLE
32+
// TEST
33+
34+
[float]
35+
=== Setting up a policy
36+
37+
Now that we have a repository in place, we can create a policy to automatically
38+
take snapshots. Policies are written in JSON and will define when to take
39+
snapshots, what the snapshots should be named, and which indices should be
40+
included, among other things. We'll use the <<slm-api-put,Put Policy>> API
41+
to create the policy.
42+
43+
[source,js]
44+
--------------------------------------------------
45+
PUT /_slm/policy/nightly-snapshots
46+
{
47+
"schedule": "0 30 1 * * ?", <1>
48+
"name": "<nightly-snap-{now/d}>", <2>
49+
"repository": "my_repository", <3>
50+
"config": { <4>
51+
"indices": ["*"] <5>
52+
}
53+
}
54+
--------------------------------------------------
55+
// CONSOLE
56+
// TEST[continued]
57+
<1> when the snapshot should be taken, using
58+
{xpack-ref}/trigger-schedule.html#schedule-cron[Cron syntax], in this
59+
case at 1:30AM each day
60+
<2> the name each snapshot should be given, using
61+
<<date-math-index-names,date math>> to include the current date in the name
62+
of the snapshot
63+
<3> the repository the snapshot should be stored in
64+
<4> the configuration to be used for the snapshot requests (see below)
65+
<5> which indices should be included in the snapshot, in this case, every index
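As a rough illustration of how the `name` pattern resolves, the Python sketch below handles only the `{now/d}` expression and assumes the default `yyyy.MM.dd` date format in UTC. The function name `resolve_snapshot_prefix` is hypothetical and is not part of any {es} client. SLM also appends a unique suffix to the resolved name, as the sample response in this guide shows; the sketch does not generate that suffix.

```python
from datetime import datetime, timezone


def resolve_snapshot_prefix(name_pattern: str, now: datetime) -> str:
    """Resolve '<prefix-{now/d}>' to 'prefix-yyyy.MM.dd' (illustrative only)."""
    # Patterns without the surrounding angle brackets are returned unchanged.
    if not (name_pattern.startswith("<") and name_pattern.endswith(">")):
        return name_pattern
    inner = name_pattern[1:-1]
    # Assumption: default date format (yyyy.MM.dd) and UTC time zone.
    day = now.astimezone(timezone.utc).strftime("%Y.%m.%d")
    # Only the exact '{now/d}' expression is handled in this sketch.
    return inner.replace("{now/d}", day)


example = resolve_snapshot_prefix(
    "<nightly-snap-{now/d}>",
    datetime(2019, 4, 24, 16, 43, 49, tzinfo=timezone.utc),
)
print(example)  # nightly-snap-2019.04.24
```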
66+
67+
This policy will take a snapshot of every index each day at 1:30AM UTC.
68+
Snapshots are incremental, allowing frequent snapshots to be stored efficiently,
69+
so don't be afraid to configure a policy to take frequent snapshots.
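To make the schedule concrete: the cron fields in `0 30 1 * * ?` are read as seconds, minutes, hours, day of month, month, and day of week, so the expression fires once a day at 01:30:00. The Python sketch below computes the next such time in UTC for a given instant. It is not a cron parser; the helper name `next_daily_run` is hypothetical and it only models a fixed daily time.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 1, minute: int = 30) -> datetime:
    """Return the next daily hour:minute (UTC) strictly after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's run time has already passed (or is exactly now), use tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


upcoming = next_daily_run(datetime(2019, 4, 23, 19, 35, tzinfo=timezone.utc))
print(upcoming.isoformat())  # 2019-04-24T01:30:00+00:00
```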
70+
71+
In addition to specifying the indices that should be included in the snapshot,
72+
the `config` field can be used to customize other aspects of the snapshot. You
73+
can use any option allowed in <<snapshots-take-snapshot,a regular snapshot
74+
request>>, so you can specify, for example, whether the snapshot should fail in
75+
special cases, such as if one of the specified indices cannot be found.
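As a sketch of what such a customized `config` might look like, the Python snippet below builds a policy body that adds two options from the regular snapshot request, `ignore_unavailable` and `include_global_state`. The values chosen here are illustrative assumptions, not recommendations; the snippet only prints the JSON body and does not send it to a cluster.

```python
import json

# Policy body based on the example above, with two extra snapshot options.
policy = {
    "schedule": "0 30 1 * * ?",
    "name": "<nightly-snap-{now/d}>",
    "repository": "my_repository",
    "config": {
        "indices": ["*"],
        # False: the snapshot request fails if a listed index is missing.
        "ignore_unavailable": False,
        # True: include the cluster's global state in the snapshot.
        "include_global_state": True,
    },
}

# This JSON could be used as the body of PUT /_slm/policy/nightly-snapshots.
body = json.dumps(policy, indent=2)
print(body)
```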
76+
77+
[float]
78+
=== Making sure the policy works
79+
80+
While snapshots taken by SLM policies can be viewed through the standard snapshot
81+
API, SLM also keeps track of policy successes and failures in ways that are a bit
82+
easier to use to make sure the policy is working. Once a policy has executed at
83+
least once, when you view the policy using the <<slm-api-get,Get Policy API>>,
84+
some metadata will be returned indicating whether the snapshot was successfully
85+
initiated or not.
86+
87+
Let's tell SLM to take a snapshot right now, using the configuration from our
88+
policy, instead of waiting for the scheduled run at
89+
1:30AM.
90+
91+
[source,js]
92+
--------------------------------------------------
93+
PUT /_slm/policy/nightly-snapshots/_execute
94+
--------------------------------------------------
95+
// CONSOLE
96+
// TEST[skip:we can't easily handle snapshots from docs tests]
97+
98+
This request will kick off a snapshot for our policy right now, regardless of
99+
the schedule in the policy. This is useful for taking snapshots before making
100+
a configuration change, upgrading, or for our purposes, making sure our policy
101+
is going to work successfully. The policy will continue to run on its configured
102+
schedule after this execution of the policy.
103+
104+
[source,js]
105+
--------------------------------------------------
106+
GET /_slm/policy/nightly-snapshots?human
107+
--------------------------------------------------
108+
// CONSOLE
109+
// TEST[continued]
110+
111+
This request will return a response that includes the policy, information
112+
about the last time the policy succeeded and failed, and the
113+
next time the policy will be executed.
114+
115+
[source,js]
116+
--------------------------------------------------
117+
{
118+
"nightly-snapshots" : {
119+
"version": 1,
120+
"modified_date": "2019-04-23T01:30:00.000Z",
121+
"modified_date_millis": 1556048137314,
122+
"policy" : {
123+
"schedule": "0 30 1 * * ?",
124+
"name": "<nightly-snap-{now/d}>",
125+
"repository": "my_repository",
126+
"config": {
127+
"indices": ["*"]
128+
}
129+
},
130+
"last_success": { <1>
131+
"snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <2>
132+
"time_string": "2019-04-24T16:43:49.316Z",
133+
"time": 1556124229316
134+
},
135+
"last_failure": { <3>
136+
"snapshot_name": "nightly-snap-2019.04.02-lohisb5ith2n8hxacaq3mw",
137+
"time_string": "2019-04-02T01:30:00.000Z",
138+
"time": 1556042030000,
139+
"details": "{\"type\":\"index_not_found_exception\",\"reason\":\"no such index [important]\",\"resource.type\":\"index_or_alias\",\"resource.id\":\"important\",\"index_uuid\":\"_na_\",\"index\":\"important\",\"stack_trace\":\"[important] IndexNotFoundException[no such index [important]]\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.indexNotFoundException(IndexNameExpressionResolver.java:762)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:714)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:670)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:163)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:142)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:102)\\n\\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:280)\\n\\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\\n\\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)\\n\\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)\\n\\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)\\n\\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\\n\\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688)\\n\\tat 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:834)\\n\"}"
140+
},
141+
"next_execution": "2019-04-24T01:30:00.000Z", <4>
142+
"next_execution_millis": 1556048160000
143+
}
144+
}
145+
--------------------------------------------------
146+
// TESTRESPONSE[skip:the presence of last_failure and last_success is asynchronous and will be present for users, but is untestable]
147+
<1> information about the last time the policy successfully initiated a snapshot
148+
<2> the name of the snapshot that was successfully initiated
149+
<3> information about the last time the policy failed to initiate a snapshot
150+
<4> the next time the policy will execute
151+
152+
NOTE: This metadata only indicates whether the request to initiate the snapshot was
153+
made successfully or not. After the snapshot has been successfully started, it
154+
is possible for the snapshot to fail if, for example, the connection to a remote
155+
repository is lost while copying files.
156+
157+
If you're following along, the returned SLM policy shouldn't have a `last_failure`
158+
field; it's included above only as an example. You should, however, see a
159+
`last_success` field and a snapshot name. If you do, you've successfully taken
160+
your first snapshot using SLM!
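If you want to check this programmatically, the Python sketch below inspects a parsed Get Policy response using only the fields shown in the example response (`last_success`, `last_failure`, `snapshot_name`, and `time`). The function name `summarize_policy` and the status strings are hypothetical. Keep in mind that this reflects only whether the snapshot was initiated, not whether it completed.

```python
def summarize_policy(response: dict, policy_id: str) -> str:
    """Summarize the most recent outcome recorded for an SLM policy."""
    entry = response[policy_id]
    success = entry.get("last_success")
    failure = entry.get("last_failure")
    # A success counts as current if there is no failure or it is at least as recent.
    if success and (not failure or success["time"] >= failure["time"]):
        return f"ok: {success['snapshot_name']}"
    if failure:
        return f"failed: {failure['snapshot_name']}"
    return "not executed yet"


# Trimmed-down response using values from the example above.
sample = {
    "nightly-snapshots": {
        "version": 1,
        "last_success": {
            "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a",
            "time": 1556124229316,
        },
    }
}
print(summarize_policy(sample, "nightly-snapshots"))
```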
161+
162+
While only the most recent success and failure are available through the Get Policy
163+
API, all policy executions are recorded to a history index, which may be queried
164+
by searching the index pattern `.slm-history*`.
165+
166+
That's it! We have our first SLM policy set up to periodically take snapshots
167+
so that our backups are always up to date. You can read more details in the
168+
<<snapshot-lifecycle-management-api,SLM API documentation>> and the
169+
<<modules-snapshots,general snapshot documentation>>.

docs/reference/ilm/index.asciidoc

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,8 @@ separate APIs for managing snapshot lifecycles. Please see the
5555
<<snapshot-lifecycle-management-api,Snapshot Lifecycle Management>>
5656
documentation for information about configuring snapshots.
5757

58+
See <<getting-started-snapshot-lifecycle-management,getting started with SLM>>.
59+
5860
[IMPORTANT]
5961
===========================
6062
{ilm} does not support mixed-version cluster usage. Although it
@@ -81,3 +83,5 @@ include::error-handling.asciidoc[]
8183
include::ilm-and-snapshots.asciidoc[]
8284

8385
include::start-stop-ilm.asciidoc[]
86+
87+
include::getting-started-slm.asciidoc[]

docs/reference/modules/snapshots.asciidoc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@ recommend testing the reindex from remote process with a subset of your data to
6363
understand the time requirements before proceeding.
6464

6565
[float]
66+
[[snapshots-repositories]]
6667
=== Repositories
6768

6869
You must register a snapshot repository before you can perform snapshot and
@@ -322,6 +323,7 @@ POST /_snapshot/my_unverified_backup/_verify
322323
It returns a list of nodes where repository was successfully verified or an error message if verification process failed.
323324

324325
[float]
326+
[[snapshots-take-snapshot]]
325327
=== Snapshot
326328

327329
A repository can contain multiple snapshots of the same cluster. Snapshots are identified by unique names within the
