Creation of Snapshot fails #707

Open
samirvb opened this issue Nov 8, 2021 · 9 comments
Comments

@samirvb

samirvb commented Nov 8, 2021

Your question

On one of my existing nodes, snapshot creation fails with the following log output and stack trace:

2021-11-05 17:56:29.200 [ ] [JRaft-Closure-Executor-4] [init-64] ERROR c.a.s.j.s.s.l.LocalSnapshotWriter - Fail to create directory /node/data//sofajraft/stacs/snapshot/temp.
2021-11-05 17:56:29.201 [ ] [JRaft-Closure-Executor-4] [create-285] ERROR c.a.s.j.s.s.l.LocalSnapshotStorage - Fail to init snapshot writer.
2021-11-05 17:56:29.202 [ ] [JRaft-FSMCaller-Disruptor-0] [onError-72] ERROR c.a.s.j.c.StateMachineAdapter - Encountered an error=Status[EIO<1014>: Fail to create snapshot writer.] on StateMachine io.stacs.nav.consensus.sofajraft.config.SofajraftStateMachine, it's highly recommended to implement this method as raft stops working since some error occurs, you should figure out the cause and repair or remove this node.
com.alipay.sofa.jraft.error.RaftException: ERROR_TYPE_SNAPSHOT
    at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.reportError(SnapshotExecutorImpl.java:691)
    at com.alipay.sofa.jraft.storage.snapshot.SnapshotExecutorImpl.doSnapshot(SnapshotExecutorImpl.java:346)
    at com.alipay.sofa.jraft.core.NodeImpl.doSnapshot(NodeImpl.java:3098)
    at com.alipay.sofa.jraft.core.NodeImpl.lambda$handleSnapshotTimeout$0(NodeImpl.java:607)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

Note that the location already contains snapshot folders, so it's not a permissions issue. There is also no shortage of disk space. Any idea what might be happening? This error occurs on different nodes that had been running fine with sofajraft 1.3.5.

Environment

  • SOFAJRaft version: 1.3.5
  • JVM version (e.g. java -version): openjdk version "11.0.10" 2021-01-19
  • OS version (e.g. uname -a): Linux native-e-64d59994b-dgtk5 5.4.149-73.259.amzn2.x86_64 #1 SMP Mon Sep 27 12:48:12 UTC 2021 x86_64 Linux
  • Maven version: 3.6.3
  • IDE version: IntelliJ IDEA 2021.2 Community Edition
@fengjiachun
Contributor

/node/data//sofajraft/stacs/snapshot/temp exists and is not a directory?
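One way to answer that question from inside a JVM, rather than from a shell, is a small standalone check like the one below. This is only a sketch: `SnapshotPathCheck` and `describe` are hypothetical names made up for illustration, not part of SOFAJRaft, and the path to inspect is passed on the command line.

```java
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of SOFAJRaft.
class SnapshotPathCheck {

    /** Reports what occupies a path, without following symbolic links. */
    static String describe(Path p) {
        if (Files.isSymbolicLink(p)) {
            return "symlink";
        }
        if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
            return "directory";
        }
        if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
            return "regular file";
        }
        if (Files.exists(p, LinkOption.NOFOLLOW_LINKS)) {
            return "other";
        }
        // Files.exists also returns false when existence cannot be determined.
        return "missing";
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java SnapshotPathCheck <path>");
            return;
        }
        Path target = Paths.get(args[0]).toAbsolutePath();
        Path parent = target.getParent();
        System.out.println(target + " -> " + describe(target));
        if (parent != null) {
            // Writability is evaluated for the user running this JVM.
            System.out.println(parent + " -> " + describe(parent)
                    + ", writable by this JVM: " + Files.isWritable(parent));
        }
    }
}
```

Running it as the same user and in the same container as the Raft node, with the `temp` path from the log as the argument, shows whether something other than a directory sits at that path and whether the parent is writable from the JVM's point of view.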

@samirvb
Author

samirvb commented Nov 8, 2021

> /node/data//sofajraft/stacs/snapshot/temp exists and is not a directory?

That location doesn't exist, either as a directory or as a file.

@fengjiachun
Contributor

Can you show the output of ls -lsh for /stacs/snapshot/?

@samirvb
Author

samirvb commented Nov 8, 2021

Unfortunately I no longer have the old node, since we had to restore it. Before that, I ran "ls -la" on the location and found no "temp" folder or file there. Attached is a screenshot of the restored node:

image

Is there any way we can reproduce this issue? This is quite important: our cluster goes down when it happens, so we need to fix it.

@fengjiachun
Contributor

Most likely it was a permission issue, but the logger did not print the exception message. I fixed the logging in #708.
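To illustrate why the exception message matters here: a directory-creation call that reports failure only as a boolean or a generic log line hides the reason, while `java.nio.file.Files.createDirectories` throws an `IOException` subtype that names it. The sketch below is not the change made in #708 and makes no claim about what SOFAJRaft calls internally; `DirCreateDiagnostics` and `tryCreate` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of SOFAJRaft and not the fix from #708.
class DirCreateDiagnostics {

    /**
     * Tries to create the directory (including missing parents).
     * Returns null on success, or the exception type and message on failure.
     */
    static String tryCreate(Path dir) {
        try {
            Files.createDirectories(dir);
            return null;
        } catch (IOException | SecurityException e) {
            // The exception type distinguishes cases such as a permission
            // problem from a non-directory already sitting at the path.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java DirCreateDiagnostics <dir>");
            return;
        }
        String failure = tryCreate(Paths.get(args[0]));
        System.out.println(failure == null ? "created or already present" : "failed: " + failure);
    }
}
```

Because the check runs inside a JVM, it exercises the same user, mount namespace, and security settings as a Java process would, which a manual `mkdir` from a shell does not necessarily do.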

@killme2008
Contributor

I think it's a permission problem. Which user do you run the Java program as? In the screenshot above, the snapshot directory belongs to the root user.

@samirvb
Author

samirvb commented Nov 9, 2021

> I think it's a permission problem. Which user do you run the Java program as? In the screenshot above, the snapshot directory belongs to the root user.

Hi, all processes run as the root user. See the screenshot below:

image

The process runs as the "root" user.
I was also able to create a directory in the same location with the mkdir command.

Can you let me know if there is any way to trigger snapshot creation so that we can try to reproduce this issue?

@fengjiachun
Contributor

The failing step only creates a single directory and does nothing else, so we couldn't find a good way to reproduce it.

@killme2008
Contributor

We will release a new version with more logging, and if the problem reproduces in the future, we can find the root cause.
