[feat] repository-bos
fakeyanss committed Feb 8, 2023
1 parent 5076477 commit 9bf918e
Showing 22 changed files with 2,776 additions and 1 deletion.
209 changes: 209 additions & 0 deletions .gitignore
@@ -0,0 +1,209 @@
*.zip
target

# Created by https://www.toptal.com/developers/gitignore/api/java
# Edit at https://www.toptal.com/developers/gitignore?templates=java

### Java ###
# Compiled class file
*.class

# Log file
*.log

# BlueJ files
*.ctxt

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.nar
*.ear
*.zip
*.tar.gz
*.rar

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*
replay_pid*

# End of https://www.toptal.com/developers/gitignore/api/java
# Created by https://www.toptal.com/developers/gitignore/api/visualstudiocode
# Edit at https://www.toptal.com/developers/gitignore?templates=visualstudiocode

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets

# Local History for Visual Studio Code
.history/

# Built Visual Studio Code Extensions
*.vsix

### VisualStudioCode Patch ###
# Ignore all local history of files
.history
.ionide

# End of https://www.toptal.com/developers/gitignore/api/visualstudiocode
# Created by https://www.toptal.com/developers/gitignore/api/intellij
# Edit at https://www.toptal.com/developers/gitignore?templates=intellij

### Intellij ###
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# AWS User-specific
.idea/**/aws.xml

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn. Uncomment if using
# auto-import.
# .idea/artifacts
# .idea/compiler.xml
# .idea/jarRepositories.xml
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# SonarLint plugin
.idea/sonarlint/

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest Client
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser

### Intellij Patch ###
# Comment Reason: https://github.com/joeblau/gitignore.io/issues/186#issuecomment-215987721

# *.iml
# modules.xml
# .idea/misc.xml
# *.ipr

# Sonarlint plugin
# https://plugins.jetbrains.com/plugin/7973-sonarlint
.idea/**/sonarlint/

# SonarQube Plugin
# https://plugins.jetbrains.com/plugin/7238-sonarqube-community-plugin
.idea/**/sonarIssues.xml

# Markdown Navigator plugin
# https://plugins.jetbrains.com/plugin/7896-markdown-navigator-enhanced
.idea/**/markdown-navigator.xml
.idea/**/markdown-navigator-enh.xml
.idea/**/markdown-navigator/

# Cache file creation bug
# See https://youtrack.jetbrains.com/issue/JBR-2257
.idea/$CACHE_FILE$

# CodeStream plugin
# https://plugins.jetbrains.com/plugin/12206-codestream
.idea/codestream.xml

# Azure Toolkit for IntelliJ plugin
# https://plugins.jetbrains.com/plugin/8053-azure-toolkit-for-intellij
.idea/**/azureSettings.xml

# End of https://www.toptal.com/developers/gitignore/api/intellij
# Created by https://www.toptal.com/developers/gitignore/api/macos
# Edit at https://www.toptal.com/developers/gitignore?templates=macos

### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon


# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

### macOS Patch ###
# iCloud generated files
*.icloud

# End of https://www.toptal.com/developers/gitignore/api/macos
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
    "java.configuration.updateBuildConfiguration": "automatic"
}
134 changes: 133 additions & 1 deletion README.md
@@ -1,2 +1,134 @@
# elasticsearch-repository-bos

Snapshot and restore for Elasticsearch, backed by BOS.

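The implementation is based on the [official repository-s3 module](https://github.com/elastic/elasticsearch/tree/main/modules/repository-s3).

# Use cases

This library is a good fit if both of the following apply:

- You use BOS object storage, which is cheap.
- You run Elasticsearch yourself rather than Baidu Cloud Elasticsearch (yes, it is too expensive).

With this library you can schedule incremental backups of the index data in a production Elasticsearch cluster, with the data stored in BOS.

That way you can recover from a cluster failure with minimal data loss.

You can also maintain a disaster-recovery cluster and use snapshots to synchronize data across clusters, instead of relying on dual writes in the business modules. When the primary cluster fails, switch the traffic entry point directly to the disaster-recovery cluster.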

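# Usage

## Prerequisites
Familiarity with Elasticsearch repos and snapshots is assumed; you can read up on them in this article.

Elasticsearch supports FS, HDFS, and S3 as file storage for backup repositories, which makes BOS (compatible with the S3 protocol) a candidate for a backup repository.

Out of the box, Elasticsearch snapshot backups do not work with BOS. Searching the Elasticsearch source code for the error log and comparing it with the notes on BOS's S3-compatible API showed that the problem is that the BOS S3-compatible API does not support batch deletion of objects. The plugin source was therefore downloaded and that logic changed to delete objects one at a time; after compiling the plugin and installing it offline, backups run successfully.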

## Installing the elasticsearch-repository-bos plugin

Build the plugin first, then install it on the ES cluster.

### Manual installation

Build the plugin with Maven:
```
mvn clean package -Dmaven.test.skip=true -Dmaven.javadoc.skip=true
```

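Unzip the built zip package, copy it into the plugins directory of every Elasticsearch instance, and restart the cluster.

### Building an image with the plugin preinstalled

Build the plugin with Maven: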
```
mvn clean package -Dmaven.test.skip=true -Dmaven.javadoc.skip=true
```
Build the image:
```
docker build -f build/Dockerfile -t elasticsearch-with-repo-bos:7.6.2 .
```

Alternatively, run the build.sh script to perform both steps:
```
bash scripts/build.sh
```

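## Choosing a BOS region

Choose the BOS region that will store the backups by consulting the [BOS service domain names](https://cloud.baidu.com/doc/BOS/s/xjwvyq9l4).

## Creating a repo

You can specify the AK and SK directly in the request, but this method will be deprecated in upcoming versions; the credentials can be set with the elasticsearch-keystore tool instead. base_path can be set to the cluster name or the repo name, so that a single bucket can be associated with multiple repos.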
```
PUT /_snapshot/test-repo
{
  "type": "s3",
  "settings": {
    "bucket": "bucket",
    "endpoint": "https://s3.bj.bcebos.com",
    "access_key": "xxx",
    "secret_key": "yyy",
    "base_path": "test-repo"
  }
}
```

## Creating and deleting snapshots
**Create a snapshot**
```
PUT /_snapshot/test-repo/snapshot_1
```

**Create a snapshot of specific indices**
```
PUT /_snapshot/test-repo/snapshot_1
{
  "indices": "test_v1,test_v2"
}
```

**Create a snapshot of specific indices, named by date**; note that the snapshot name must be URL-encoded
```
PUT /_snapshot/test-repo/%3Csnapshot-%7Bnow%2Fd%7D%3E
{
  "indices": "test_v1,test_v2"
}
```
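
The encoded name in the request above is the date-math expression `<snapshot-{now/d}>` with its reserved characters percent-encoded. A minimal Java sketch of producing that path segment, assuming Java 10 or later; the class and method names are illustrative and not part of the plugin:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of the plugin.
class SnapshotNameEncoder {

    // Percent-encodes a date-math snapshot name for use as a URL path segment.
    // URLEncoder targets form encoding and turns spaces into '+', so this
    // sketch assumes the name contains no spaces.
    static String encode(String dateMathName) {
        return URLEncoder.encode(dateMathName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("<snapshot-{now/d}>");
        // Prints: PUT /_snapshot/test-repo/%3Csnapshot-%7Bnow%2Fd%7D%3E
        System.out.println("PUT /_snapshot/test-repo/" + encoded);
    }
}
```

Encoding the name in code rather than by hand avoids mistakes when the date-math expression changes, for example to a monthly `{now/M}` rounding.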

**List snapshots**
```
GET _snapshot/test-repo/_all
```

**Check snapshot progress**
```
GET _snapshot/test-repo/snapshot_1/_status
```

## Restoring a snapshot

When restoring a snapshot without specifying a rename pattern, you must first close any index that already exists; an index that does not exist is created directly.

> The restore operation can be performed on a functioning cluster. However, an existing index can be only restored if it’s closed and has the same number of shards as the index in the snapshot. The restore operation automatically opens restored indices if they were closed and creates new indices if they didn’t exist in the cluster.

**Restoring within a cluster**

Restore the indices whose names start with test_, renaming the restored indices so that they start with restored_index_test_:
```
POST /_snapshot/test-repo/snapshot-2021.06.22/_restore
{
  "rename_pattern": "test_(.+)",
  "rename_replacement": "restored_index_test_$1"
}
```
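
To preview how a rename pattern maps index names before running a restore, the substitution can be tried locally. The sketch below assumes that Elasticsearch applies `rename_pattern` and `rename_replacement` with Java regular-expression replacement semantics, as its restore documentation describes; the class name is hypothetical and the code only approximates what the cluster does:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical local preview of a restore rename; not part of the plugin.
class RestoreRenamePreview {

    // Applies the pattern and replacement to one index name.
    // Names that do not match the pattern are returned unchanged.
    static String rename(String index, String pattern, String replacement) {
        return Pattern.compile(pattern).matcher(index).replaceAll(replacement);
    }

    public static void main(String[] args) {
        for (String index : List.of("test_v1", "test_v2", "other")) {
            System.out.println(index + " -> "
                + rename(index, "test_(.+)", "restored_index_test_$1"));
        }
    }
}
```

With the pattern from the request above, `test_v1` becomes `restored_index_test_v1`, while `other` is left as it is.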

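**Migrating between clusters**

Copy the snapshot folder from cluster A's backup repository to a new bucket, or to another folder in the same bucket, point cluster B's backup repository at that location, and then run a restore.

## More operations

See the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/master/repository-s3.html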
10 changes: 10 additions & 0 deletions build/Dockerfile
@@ -0,0 +1,10 @@
FROM elasticsearch:7.6.2

COPY target/repository-bos-7.6.2.zip /usr/share/elasticsearch/
COPY build/entrypoint.sh /usr/share/elasticsearch/

RUN sh -c 'chown -R 1000 /usr/share/elasticsearch/repository-bos-7.6.2.zip'

RUN chmod a+x /usr/share/elasticsearch/entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]
6 changes: 6 additions & 0 deletions build/entrypoint.sh
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

unzip /usr/share/elasticsearch/repository-bos-7.6.2.zip
mv repository-bos plugins/

bin/elasticsearch
17 changes: 17 additions & 0 deletions docs/README.md
@@ -0,0 +1,17 @@
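# How it works
Because BOS exposes an S3-compatible API, the official repository-s3 implementation can be reused almost as is.

While debugging repository-s3 against BOS and working through the errors one by one, the problem was traced to s3.bj.bcebos.com not providing batch deletion of objects, so the stock repository-s3 plugin cannot be used directly.

The batch-delete method in `S3BlobContainer.java` is changed to delete the objects one at a time in a loop: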
```
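// BOS does not support the S3 batch-delete call, so it is disabled:
// clientReference.client().deleteObjects(deleteRequest);
// Delete each object individually instead.
deleteRequest.getKeys().forEach(key -> {
    try {
        clientReference.client().deleteObject(deleteRequest.getBucketName(), key.getKey());
    } catch (AmazonClientException e) {
        // Log and continue so that one failed delete does not abort the rest.
        LOGGER.warn("delete blobs error, key: {}", key.getKey(), e);
    }
});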
```
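Performance may drop somewhat, but the impact is negligible. If it matters, the objects can be deleted concurrently with a parallel stream or with a thread pool plus CompletableFuture.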
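
A minimal sketch of the thread-pool plus CompletableFuture variant follows. It is not the plugin's code: `ConcurrentBlobDeleter` and its `deleter` callback are hypothetical, and in the plugin the callback would wrap `clientReference.client().deleteObject(bucket, key)`. As in the sequential loop, a failed delete is recorded and the remaining deletes continue.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical helper, shown only to illustrate the concurrent approach.
class ConcurrentBlobDeleter {

    // Deletes every key on a fixed-size pool and returns the keys that failed.
    static List<String> deleteAll(List<String> keys, Consumer<String> deleter, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ConcurrentLinkedQueue<String> failed = new ConcurrentLinkedQueue<>();
        try {
            CompletableFuture<?>[] futures = keys.stream()
                .map(key -> CompletableFuture.runAsync(() -> deleter.accept(key), pool)
                    // Record the failure and keep going, like the sequential loop.
                    .exceptionally(e -> { failed.add(key); return null; }))
                .toArray(CompletableFuture[]::new);
            // Wait until every delete has either finished or failed.
            CompletableFuture.allOf(futures).join();
        } finally {
            pool.shutdown();
        }
        return List.copyOf(failed);
    }

    public static void main(String[] args) {
        // Stand-in deleter that fails for one key instead of calling BOS.
        List<String> failed = deleteAll(
            List.of("a", "b", "c"),
            key -> { if (key.equals("b")) throw new IllegalStateException("simulated failure"); },
            2);
        System.out.println("failed keys: " + failed);
    }
}
```

Returning the failed keys instead of only logging them lets the caller decide whether to retry or to report the leftover objects.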