Fix test issues #23

Merged
merged 6 commits on Jan 21, 2022
37 changes: 24 additions & 13 deletions README.md
@@ -19,7 +19,11 @@ Backup and Restore (BR) is a CommandLine Interface Tool to back up data of graph
- The target cluster to restore must have the same topology as the cluster the backup comes from

# Prerequisites
- Nebula cluster to backup/restore should start the agent service in each host

## Nebula Agent
The Nebula cluster to back up or restore must run the [agent](https://github.com/vesoft-inc/nebula-agent) service on every cluster host (that is, every host running metad, storaged, or graphd). Note that if a host runs multiple services, it still needs only one agent: exactly one agent per cluster host, no matter how many services run on it.
In the future the nebula-agent will be started automatically; for now you have to start it yourself on each cluster machine. Download it from the nebula-agent repo and start it following the guide there, for example as sketched below.
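A minimal sketch of starting the agent on one host. The flag names are assumptions taken from the nebula-agent repo (`--agent` for the agent's own listen address, `--meta` for a metad address); verify them against that repo before use.
```
# one agent per cluster host; flag names and ports are assumptions, check the nebula-agent repo
nohup ./agent --agent="<host_ip>:8888" --meta="<metad_ip>:9559" > agent.log 2>&1 &
```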


# Quick Start
- Clone the tool repo:
@@ -54,8 +58,8 @@ bin/br version
If not specified, all spaces will be backed up.

--storage string backup target url, format: <SCHEME>://<PATH>.
<SCHEME>: a string indicating which backend type. optional: local, hdfs.
now hdfs and local is supported, s3 and oss are still experimental.
<SCHEME>: a string indicating which backend type. optional: local, s3.
currently, local and s3-compatible backends are supported.
example:
for local - "local:///the/local/path/to/backup"
for s3 - "s3://example/url/to/the/backup"
@@ -86,16 +90,16 @@ bin/br version
--s3.region string S3 Option: set region or location to upload or download backup
--s3.secret_key string S3 Option: set secret key for access id
--storage string backup target url, format: <SCHEME>://<PATH>.
<SCHEME>: a string indicating which backend type. optional: local, hdfs.
now hdfs and local is supported, s3 and oss are still experimental.
<SCHEME>: a string indicating which backend type. optional: local, s3.
currently, local and s3-compatible backends are supported.
example:
for local - "local:///the/local/path/to/backup"
for s3 - "s3://example/url/to/the/backup"
```

For example, the command below will list the information of existing backups in HDFS URL `hdfs://0.0.0.0:9000/example/backup/path`
For example, the command below will list the information of existing backups stored at the S3-compatible endpoint `http://127.0.0.1:9000` under `s3://br-test/backup/`
```
br show --s3.endpoint "http://192.168.8.214:9000" --storage="s3://br-test/backup/" --s3.access_key=minioadmin --s3.secret_key=minioadmin
br show --s3.endpoint "http://127.0.0.1:9000" --storage="s3://br-test/backup/" --s3.access_key=minioadmin --s3.secret_key=minioadmin
```

The output of the `show` subcommand looks like the following:
@@ -125,8 +129,8 @@ bin/br version
--name string Specify backup name

--storage string backup target url, format: <SCHEME>://<PATH>.
<SCHEME>: a string indicating which backend type. optional: local, hdfs.
now hdfs and local is supported, s3 and oss are still experimental.
<SCHEME>: a string indicating which backend type. optional: local, s3.
currently, local and s3-compatible backends are supported.
example:
for local - "local:///the/local/path/to/backup"
for s3 - "s3://example/url/to/the/backup"
@@ -144,7 +148,7 @@ bin/br version
br restore full --storage "local:///home/nebula/backup/" --meta "127.0.0.1:9559" --name BACKUP_2021_12_08_18_38_08
```

- Clean up temporary files if any error occurred during backup. It will clean the files in the cluster and in the external storage.
- Clean up temporary files if any error occurred during backup. It will clean the files in the cluster and in the external storage. You can also use it to delete old backup files from the external storage.
```
Usage:
br cleanup [flags]
@@ -156,8 +160,7 @@ bin/br version
--name string Specify backup name

--storage string backup target url, format: <SCHEME>://<PATH>.
<SCHEME>: a string indicating which backend type. optional: local, hdfs.
now hdfs and local is supported, s3 and oss are still experimental.
<SCHEME>: a string indicating which backend type. optional: local, s3.
example:
for local - "local:///the/local/path/to/backup"
for s3 - "s3://example/url/to/the/backup
@@ -174,7 +177,7 @@ bin/br version
The BR CLI sends an RPC request to the leader of the meta services of Nebula Graph to back up the cluster. Before the backup is created, the meta service blocks any writes to the cluster, including DDL and DML statements; the blocking is implemented in the raft layer of the cluster. After that, the meta service sends an RPC request to every storage service to create a snapshot. The metadata of the cluster stored in the meta services is backed up as well. The backup files include:
- The backup files of the storage services are snapshots of the raft-layer WAL and snapshots of the lower-level storage engine, for example RocksDB checkpoints.
- The backup files of the meta service are a list of SSTables exported by scanning particular metadata.
After the backup files are generated, a meta file describing this backup is generated as well. Along with the backup files, the BR CLI uploads those files and the meta file to the user-specified backend. Note that for the local disk backend, the backup files are copied to a local path of the services defined by `--storage`, and the meta file is copied to a local path on the host where the BR CLI runs.
After the backup files are generated, a meta file describing this backup is generated as well. Along with the backup files, the BR CLI uploads those files and the meta file to the user-specified backend. Note that for the local disk backend, the backup files are copied to the local path defined by `--storage` on each service's host, and the meta file is copied to a local path on the host where the BR CLI runs. That is to say, when restoring, the BR CLI must run on the same host where it ran during the backup.
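For illustration, a minimal local-backend round trip might look like the following. The `br backup full` invocation is an assumption that mirrors the `br restore full` example shown earlier, and the backup name is a placeholder for whatever your backup run produces.
```
# both commands must run on the same host when the local backend is used
br backup full --meta "127.0.0.1:9559" --storage "local:///home/nebula/backup/"
br restore full --meta "127.0.0.1:9559" --storage "local:///home/nebula/backup/" --name BACKUP_2021_12_08_18_38_08
```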

## Restore
The BR CLI first checks the topologies of the target cluster and the backup. If they do not match the requirements, the restore operation is aborted.
@@ -186,3 +189,11 @@ bin/br version

Note: the BR CLI depends on the agents on the cluster hosts to upload/download the backup files between the external storage and the cluster machines.
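To confirm that every host has a running agent before backing up or restoring, you can query the cluster from the console. The statement and console flags below are assumptions based on recent NebulaGraph versions; verify them against the documentation for your version.
```
# connect with nebula-console (address, port and credentials are illustrative) and list registered agents
nebula-console -addr 127.0.0.1 -port 9669 -u root -p nebula -e "SHOW HOSTS AGENT"
```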

## Local Storage Mode

Local mode has strict usage preconditions:
1. The BR CLI must always be run on the same machine for backup/restore/cleanup/show.
2. If you have multiple metad services, the local URI should point to a shared filesystem path (such as NFS or another distributed filesystem) that is mounted on all the cluster machines; otherwise restoring the metad services will fail. See the sketch after this list.

Therefore we suggest using local storage only in experimental environments. In production environments, an s3-compatible storage backend is highly recommended.
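A minimal sketch of the shared-filesystem setup for local mode, assuming an NFS export at `nfs-server:/export/nebula-br` and a mount point `/mnt/nebula-br`; both names, and the `br backup full` flags, are illustrative.
```
# mount the same NFS export on every cluster machine (paths are illustrative)
sudo mount -t nfs nfs-server:/export/nebula-br /mnt/nebula-br

# then point --storage at the shared path
br backup full --meta "127.0.0.1:9559" --storage "local:///mnt/nebula-br/backup/"
```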

2 changes: 1 addition & 1 deletion cmd/backup.go
@@ -13,7 +13,7 @@ import (
func NewBackupCmd() *cobra.Command {
backupCmd := &cobra.Command{
Use: "backup",
Short: "backup Nebula Graph Database",
Short: "backup Nebula Graph Database to external storage for restore",
SilenceUsage: true,
}

2 changes: 1 addition & 1 deletion cmd/restore.go
@@ -13,7 +13,7 @@ import (
func NewRestoreCmd() *cobra.Command {
restoreCmd := &cobra.Command{
Use: "restore",
Short: "restore Nebula Graph Database",
Short: "restore Nebula Graph Database, notice that it will restart the cluster",
SilenceUsage: true,
}
config.AddCommonFlags(restoreCmd.PersistentFlags())

2 changes: 1 addition & 1 deletion cmd/show.go
@@ -15,7 +15,7 @@ var backendUrl string
func NewShowCmd() *cobra.Command {
showCmd := &cobra.Command{
Use: "show",
Short: "show backup info",
Short: "show backup info list in external storage",
SilenceUsage: true,
RunE: func(cmd *cobra.Command, args []string) error {
err := log.SetLog(cmd.Flags())

2 changes: 1 addition & 1 deletion pkg/backup/backup.go
@@ -59,7 +59,7 @@ func NewBackup(ctx context.Context, cfg *config.BackupConfig) (*Backup, error) {
// upload the meta backup files in the host to the external uri
// localDir is the absolute meta checkpoint folder in the host filesystem
// targetUri is the external storage's uri, which is the meta's root dir,
// has pattern like local://xxx, hdfs://xxx
// has pattern like local://xxx, s3://xxx
func (b *Backup) uploadMeta(host *nebula.HostAddr, targetUri string, localDir string) error {
agentAddr, err := b.hosts.GetAgentFor(b.meta.LeaderAddr())
if err != nil {

51 changes: 47 additions & 4 deletions pkg/cleanup/cleanup.go
@@ -3,20 +3,25 @@ package cleanup
import (
"context"
"fmt"
"strings"

"github.com/vesoft-inc/nebula-agent/pkg/storage"
"github.com/vesoft-inc/nebula-br/pkg/clients"
"github.com/vesoft-inc/nebula-br/pkg/config"
"github.com/vesoft-inc/nebula-br/pkg/utils"

log "github.com/sirupsen/logrus"
pb "github.com/vesoft-inc/nebula-agent/pkg/proto"
)

type Cleanup struct {
ctx context.Context
cfg config.CleanupConfig
client *clients.NebulaMeta
sto storage.ExternalStorage

hosts *utils.NebulaHosts
agentMgr *clients.AgentManager
}

func NewCleanup(ctx context.Context, cfg config.CleanupConfig) (*Cleanup, error) {
@@ -30,11 +35,23 @@ func NewCleanup(ctx context.Context, cfg config.CleanupConfig) (*Cleanup, error)
return nil, fmt.Errorf("create meta client failed: %w", err)
}

listRes, err := client.ListCluster()
if err != nil {
return nil, fmt.Errorf("list cluster failed: %w", err)
}
hosts := &utils.NebulaHosts{}
err = hosts.LoadFrom(listRes)
if err != nil {
return nil, fmt.Errorf("parse cluster response failed: %w", err)
}

return &Cleanup{
ctx: ctx,
cfg: cfg,
client: client,
sto: sto,
ctx: ctx,
cfg: cfg,
client: client,
sto: sto,
hosts: hosts,
agentMgr: clients.NewAgentManager(ctx, hosts),
}, nil
}

@@ -58,6 +75,32 @@ func (c *Cleanup) cleanExternal() error {
if err != nil {
return fmt.Errorf("remove %s in external storage failed: %w", backupUri, err)
}
log.Debugf("Remove %s successfullly", backupUri)

// Local backend's data lies on different cluster machines,
// which should be handled separately
if c.cfg.Backend.GetLocal() != nil {
for _, addr := range c.hosts.GetAgents() {
agent, err := clients.NewAgent(c.ctx, addr)
if err != nil {
return fmt.Errorf("create agent for %s failed: %w when clean local data",
utils.StringifyAddr(addr), err)
}

// This is a hack: in general we cannot get the local path
// from the uri by trimming the scheme prefix directly
backupPath := strings.TrimPrefix(backupUri, "local://")
removeReq := &pb.RemoveDirRequest{
Path: backupPath,
}
_, err = agent.RemoveDir(removeReq)
if err != nil {
return fmt.Errorf("remove %s in host: %s failed: %w", backupPath, addr.Host, err)
}
log.Debugf("Remove local data %s in %s successfullly", backupPath, addr.Host)
}
}

return nil
}
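As a usage sketch, cleaning up a backup stored on the local backend might look like the command below. The flags are assumed to mirror the backup/restore commands in the README, and the backup name is a placeholder.
```
# removes the named backup from external storage and, for the local backend, from every cluster host via its agent
br cleanup --meta "127.0.0.1:9559" --storage "local:///home/nebula/backup/" --name BACKUP_2021_12_08_18_38_08
```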

4 changes: 3 additions & 1 deletion pkg/config/common.go
@@ -10,13 +10,15 @@ const (
FlagMetaAddr = "meta"
FlagSpaces = "spaces"

FlagLogPath = "log"
FlagLogPath = "log"
FlagLogDebug = "debug"

flagBackupName = "name"
)

func AddCommonFlags(flags *pflag.FlagSet) {
flags.String(FlagLogPath, "br.log", "Specify br detail log path")
flags.Bool(FlagLogDebug, false, "Output log in debug level or not")
storage.AddFlags(flags)
}
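With the new flag wired into the common flag set, debug logging can be enabled from any subcommand by passing `--debug`. A hypothetical invocation, reusing the `show` example from the README:
```
# --debug corresponds to FlagLogDebug above; the other flags mirror the README's show example
br show --debug --s3.endpoint "http://127.0.0.1:9000" --storage="s3://br-test/backup/" --s3.access_key=minioadmin --s3.secret_key=minioadmin
```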

11 changes: 10 additions & 1 deletion pkg/log/log.go
@@ -14,7 +14,16 @@ func SetLog(flags *pflag.FlagSet) error {
logrus.SetFormatter(&logrus.JSONFormatter{
TimestampFormat: "2006-01-02T15:04:05.000Z",
})
logrus.SetLevel(logrus.InfoLevel)

debug, err := flags.GetBool(config.FlagLogDebug)
if err != nil {
return err
}
if debug {
logrus.SetLevel(logrus.DebugLevel)
} else {
logrus.SetLevel(logrus.InfoLevel)
}

path, err := flags.GetString(config.FlagLogPath)
if err != nil {

35 changes: 23 additions & 12 deletions pkg/restore/restore.go
@@ -326,32 +326,43 @@ func (r *Restore) startMetaService() error {
}

func (r *Restore) stopCluster() error {
rootDirs := r.hosts.GetRootDirs()
for _, agentAddr := range r.hosts.GetAgents() {
for host, services := range r.hosts.GetHostServices() {
logger := log.WithField("host", host)

var agentAddr *nebula.HostAddr
for _, s := range services {
if s.GetRole() == meta.HostRole_AGENT {
if agentAddr == nil {
agentAddr = s.GetAddr()
} else {
return fmt.Errorf("there are two agents in host %s: %s, %s", s.GetAddr().GetHost(),
utils.StringifyAddr(agentAddr), utils.StringifyAddr(s.GetAddr()))
}
}
}
agent, err := r.agentMgr.GetAgent(agentAddr)
if err != nil {
return fmt.Errorf("get agent %s failed: %w", utils.StringifyAddr(agentAddr), err)
}

dirs, ok := rootDirs[agentAddr.Host]
if !ok {
log.WithField("host", agentAddr.Host).Info("Does not find nebula root dirs in this host")
continue
}
for _, s := range services {
if s.GetRole() == meta.HostRole_AGENT {
continue
}

logger := log.WithField("host", agentAddr.Host)
for _, d := range dirs {
req := &pb.StopServiceRequest{
Role: pb.ServiceRole_ALL,
Dir: d.Dir,
Role: pb.ServiceRole(s.GetRole()),
Dir: string(s.GetDir().GetRoot()),
}
logger.WithField("dir", d.Dir).Info("Stop services")

logger.WithField("dir", req.Dir).WithField("role", s.GetRole().String()).Info("Stop services")
_, err := agent.StopService(req)
if err != nil {
return fmt.Errorf("stop services in host %s failed: %w", agentAddr.Host, err)
}
}
}

return nil
}

4 changes: 2 additions & 2 deletions pkg/storage/flags.go
@@ -22,8 +22,8 @@ const (
func AddFlags(flags *pflag.FlagSet) {
flags.String(flagStorage, "",
`backup target url, format: <SCHEME>://<PATH>.
<SCHEME>: a string indicating which backend type. optional: local, hdfs.
now hdfs and local is supported, s3 and oss are still experimental.
<SCHEME>: a string indicating which backend type. optional: local, s3.
currently, local and s3-compatible backends are supported.
example:
for local - "local:///the/local/path/to/backup"
for s3 - "s3://example/url/to/the/backup"

4 changes: 4 additions & 0 deletions pkg/utils/hosts.go
@@ -147,6 +147,10 @@ func (h *NebulaHosts) GetRootDirs() map[string][]*HostDir {
return hostRoots
}

func (h *NebulaHosts) GetHostServices() map[string][]*meta.ServiceInfo {
return h.hosts
}

func (h *NebulaHosts) GetAgents() []*nebula.HostAddr {
var al []*nebula.HostAddr
for _, services := range h.hosts {