Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve some action for install snapshot and add peer #1527

Merged
merged 15 commits into from
May 16, 2022

Conversation

Linary
Copy link
Contributor

@Linary Linary commented Jun 29, 2021

  1. Fix bug that relative path need normalize when load snapshot
  2. Use zip decompresser since decompress tar file may lead dead loop
  3. Move raft.endpoint and raft.group_peers into rest-server.properties
  4. Pass raft.endpoint and raft.group_peers from server to core
  5. Let RestServer start before GremlinServer
  6. Use read-index to ensure the raft log caught up for new node
  7. Let add_peer and remove_peer API work in async job
  8. Let truncateBackend don't truncate system store

Change-Id: I7de3d98cb51c3b7c32a4f19a0d99ed622f78d331

@codecov
Copy link

codecov bot commented Jun 29, 2021

Codecov Report

Merging #1527 (7592b23) into master (bb88c36) will increase coverage by 4.16%.
The diff coverage is 57.32%.

@@             Coverage Diff              @@
##             master    #1527      +/-   ##
============================================
+ Coverage     66.89%   71.05%   +4.16%     
+ Complexity      728      693      -35     
============================================
  Files           449      451       +2     
  Lines         38116    38204      +88     
  Branches       5404     5407       +3     
============================================
+ Hits          25496    27145    +1649     
+ Misses        10104     8409    -1695     
- Partials       2516     2650     +134     
Impacted Files Coverage Δ
...e/src/main/java/com/baidu/hugegraph/HugeGraph.java 65.30% <ø> (ø)
...ph/backend/store/AbstractBackendStoreProvider.java 80.85% <ø> (+1.06%) ⬆️
...u/hugegraph/backend/store/raft/RaftAddPeerJob.java 0.00% <0.00%> (ø)
...ugegraph/backend/store/raft/RaftRemovePeerJob.java 0.00% <0.00%> (ø)
...aph/backend/store/raft/rpc/ListPeersProcessor.java 21.73% <0.00%> (+21.73%) ⬆️
...hugegraph/backend/store/raft/rpc/RpcForwarder.java 71.18% <ø> (+71.18%) ⬆️
...aph/backend/store/raft/rpc/SetLeaderProcessor.java 27.77% <0.00%> (+27.77%) ⬆️
.../backend/store/raft/rpc/StoreCommandProcessor.java 65.21% <ø> (+65.21%) ⬆️
.../baidu/hugegraph/backend/tx/SchemaTransaction.java 89.31% <ø> (+0.38%) ⬆️
...ain/java/com/baidu/hugegraph/api/raft/RaftAPI.java 17.30% <4.76%> (+17.30%) ⬆️
... and 73 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb88c36...7592b23. Read the comment docs.

@@ -131,10 +133,15 @@ public void clear() throws BackendException {
}

@Override
public void truncate() {
public void truncate(HugeGraph graph) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass systemStoreName as param

@github-actions
Copy link

Due to the lack of activity, the current pr is marked as stale and will be closed after 180 days, any update will remove the stale label

@imbajin imbajin added improvement General improvement and removed inactive labels Sep 7, 2021
@github-actions
Copy link

github-actions bot commented Oct 7, 2021

Due to the lack of activity, the current pr is marked as stale and will be closed after 180 days, any update will remove the stale label

@imbajin imbajin added feature New feature and removed inactive feature New feature labels Oct 8, 2021
this.checkOpened();
HugeConfig config = (HugeConfig) graph.configuration();
String systemStoreName = config.get(CoreOptions.STORE_SYSTEM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can split stores into multi vars, like schemaStore + systemStore + graphStore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny comment, address optionally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't deal with it this time.

Comment on lines 141 to 145
// Don't truncate system store
if (store.store().equals(systemStoreName)) {
store.truncate();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more reasonable to truncate systemStore + graphStore, can we move meta table from systemStore to schemaStore?

Linary added 2 commits April 16, 2022 11:35
1. Fix bug that relative path need normalize when load snapshot
2. Use zip decompresser since decompress tar file may lead dead loop
3. Move raft.endpoint and raft.group_peers into rest-server.properties
4. Pass raft.endpoint and raft.group_peers from server to core
5. Let RestServer start before GremlinServer
6. Use read-index to ensure the raft log caught up for new node
7. Let add_peer and remove_peer API work in async job
8. Let truncateBackend don't truncate system store

Change-Id: I7de3d98cb51c3b7c32a4f19a0d99ed622f78d331
Change-Id: Iba6e9cbfab15f28cab4126b7ef8ab7210370a0d6
@Linary Linary force-pushed the raft-add-peer-improve branch from b3f1c1d to e9883f9 Compare April 16, 2022 04:07
@github-actions
Copy link

github-actions bot commented Apr 16, 2022

CLA Assistant Lite bot Good! All Contributors have signed the CLA.

Change-Id: Iefe83f4781ad77ece88939f59eb81552175189cd
@Linary Linary force-pushed the raft-add-peer-improve branch from e9883f9 to 435edfe Compare April 16, 2022 04:17
@Linary
Copy link
Contributor Author

Linary commented Apr 16, 2022

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Apr 18, 2022
LOG.debug("Graph '{}' has writed snapshot", this.graph());
RaftClosure<?> future = this.context.node().submitAndWait(command,
closure);
if (future != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the future may be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not. Maybe it's defensive programming

Change-Id: Ic27656f2fb8793d02baf40e720b609b4a6fb9a12
@Linary Linary closed this May 15, 2022
@javeme javeme reopened this May 15, 2022
} else {
this.snapshotExecutor = null;
}
this.snapshotExecutor = this.createSnapshotExecutor(4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define a const var

@@ -19,7 +19,7 @@

package com.baidu.hugegraph.backend.store.raft.rpc;

import static com.baidu.hugegraph.backend.store.raft.RaftSharedContext.WAIT_RPC_TIMEOUT;
import static com.baidu.hugegraph.backend.store.raft.RaftContext.WAIT_RPC_TIMEOUT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avoid static import

"raft.install_snapshot_rpc_timeout",
"The install snapshot rpc timeout for jraft rpc.",
positiveInt(),
// jraft default value is 5 * 60 * 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 minutes?

@@ -252,10 +228,19 @@ public static synchronized CoreOptions instance() {
public static final ConfigOption<Integer> RAFT_RPC_TIMEOUT =
new ConfigOption<>(
"raft.rpc_timeout",
"The rpc timeout for jraft rpc.",
"The general rpc timeout for jraft rpc.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in millisecond

@@ -181,10 +213,15 @@ public void clear() {
}

@Override
public void truncate() {
public void truncate(HugeGraph graph) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass systemStoreName?

Comment on lines 139 to 141
this.raftGroupService.shutdown();
try {
this.raftGroupService.join();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to shutdown raftGroupService when raftNode closed?

Comment on lines 63 to 65
this.raftGroupService = this.initRaftNode();
// Start node
this.node = this.raftGroupService.start(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do need to hold raftGroupService?


/*
* TODO Depending on the name of the config item for server options,
* need to get through ServerConfig and CoreConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raft.group_peers is transfered from ServerConfig instead of CoreConfig, since it's shared by all graphs.

@@ -375,6 +377,7 @@ private RpcServer initAndStartRpcServer() {
NodeManager.getInstance().addAddress(endpoint.getEndpoint());
RpcServer rpcServer = RaftRpcServerFactory.createAndStartRaftRpcServer(
endpoint.getEndpoint());
LOG.info("RPC server is started successfully");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HugeConfig config = params.configuration();
Integer lowWaterMark = config.get(
CoreOptions.RAFT_RPC_BUF_LOW_WATER_MARK);
System.setProperty("bolt.channel_write_buf_low_water_mark",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reuse from RaftContext.initAndStartRpcServer()?

Change-Id: Ie16b77460133a9b7a5c27922efc82de171b775c7
@javeme javeme force-pushed the raft-add-peer-improve branch from fd649fb to 9df37f0 Compare May 15, 2022 15:39
javeme added 2 commits May 15, 2022 23:59
Change-Id: If902adf36d41fbe87843fdd4ed436143937486a1
Change-Id: I86348c0ce7be613150ec097adfe8276f0cd8a22a
javeme added 6 commits May 16, 2022 00:22
Change-Id: Ida94cbd278b340e784baf0d85439789236c300d4
Change-Id: Ib86e892929bfa1a1d88ec1ceb1bf43f962437794
Change-Id: I628bcb4dc3545bb702b4cac3ebe151e5fd9f8e54
Change-Id: I9f4506a1bc4115d010255b584e8bbfa7d20fe67c
Change-Id: I68517141d8bbc6a9b2991849fda5fc686f23b513
Change-Id: I6be1a445412693701e8f10f5f3e8597000497530
zyxxoo
zyxxoo previously approved these changes May 16, 2022
@javeme
Copy link
Contributor

javeme commented May 16, 2022

ci error:

2022-05-15 19:04:18 [main] [INFO] c.b.h.b.s.BackendProviderFactory - Opening backend store 'rocksdb' in raft mode for graph 'hugegraph'
2022-05-15 19:04:18 [main] [INFO] c.b.h.b.s.r.RaftBackendStoreProvider - Init raft backend system store
Error: -15 19:04:18 [main] [ERROR] c.b.h.d.HugeGraphServer - HugeRestServer start error: 
java.lang.RuntimeException: GraphFactory could not instantiate this Graph implementation [class com.baidu.hugegraph.auth.HugeFactoryAuthProxy]
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:84) ~[gremlin-core-3.5.1.jar:3.5.1]
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:72) ~[gremlin-core-3.5.1.jar:3.5.1]
	at com.baidu.hugegraph.auth.StandardAuthenticator.setup(StandardAuthenticator.java:121) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at com.baidu.hugegraph.auth.HugeAuthenticator.loadAuthenticator(HugeAuthenticator.java:148) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at com.baidu.hugegraph.core.GraphManager.<init>(GraphManager.java:96) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at com.baidu.hugegraph.server.ApplicationConfig$GraphManagerFactory$1.onEvent(ApplicationConfig.java:112) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at org.glassfish.jersey.server.internal.monitoring.CompositeApplicationEventListener.onEvent(CompositeApplicationEventListener.java:49) ~[jersey-server-3.0.3.jar:?]
	at org.glassfish.jersey.server.internal.monitoring.MonitoringContainerListener.onStartup(MonitoringContainerListener.java:56) ~[jersey-server-3.0.3.jar:?]
	at org.glassfish.jersey.server.ApplicationHandler.onStartup(ApplicationHandler.java:711) ~[jersey-server-3.0.3.jar:?]
	at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.start(GrizzlyHttpContainer.java:330) ~[jersey-container-grizzly2-http-3.0.3.jar:?]
	at org.glassfish.grizzly.http.server.HttpHandlerChain.start(HttpHandlerChain.java:376) ~[grizzly-http-server-3.0.1.jar:3.0.1]
	at org.glassfish.grizzly.http.server.HttpServer.setupHttpHandler(HttpServer.java:268) ~[grizzly-http-server-3.0.1.jar:3.0.1]
	at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:245) ~[grizzly-http-server-3.0.1.jar:3.0.1]
	at com.baidu.hugegraph.server.RestServer.start(RestServer.java:68) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at com.baidu.hugegraph.server.RestServer.start(RestServer.java:175) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at com.baidu.hugegraph.dist.HugeRestServer.start(HugeRestServer.java:34) ~[hugegraph-dist-0.13.0.jar:?]
	at com.baidu.hugegraph.dist.HugeGraphServer.<init>(HugeGraphServer.java:62) ~[hugegraph-dist-0.13.0.jar:?]
	at com.baidu.hugegraph.dist.HugeGraphServer.main(HugeGraphServer.java:121) ~[hugegraph-dist-0.13.0.jar:?]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_332]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_332]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_332]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_332]
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:80) ~[gremlin-core-3.5.1.jar:3.5.1]
	... 17 more
Caused by: java.lang.IllegalStateException: Please ensure init raft context
	at com.google.common.base.Preconditions.checkState(Preconditions.java:531) ~[guava-25.1-jre.jar:?]
	at com.baidu.hugegraph.util.E.checkState(E.java:68) ~[hugegraph-common-2.1.2.jar:2.1.2.0]
	at com.baidu.hugegraph.backend.store.raft.RaftBackendStoreProvider.context(RaftBackendStoreProvider.java:74) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.backend.store.raft.RaftBackendStoreProvider.loadSystemStore(RaftBackendStoreProvider.java:143) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.StandardHugeGraph.loadSystemStore(StandardHugeGraph.java:466) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.StandardHugeGraph.access$1300(StandardHugeGraph.java:113) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.StandardHugeGraph$StandardHugeGraphParams.loadSystemStore(StandardHugeGraph.java:1185) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.task.ServerInfoManager.listenChanges(ServerInfoManager.java:109) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.task.ServerInfoManager.<init>(ServerInfoManager.java:85) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.task.StandardTaskScheduler.<init>(StandardTaskScheduler.java:106) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.task.TaskManager.addScheduler(TaskManager.java:92) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.StandardHugeGraph.<init>(StandardHugeGraph.java:218) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.HugeFactory.open(HugeFactory.java:91) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.HugeFactory.open(HugeFactory.java:69) ~[hugegraph-core-0.13.0.jar:0.13.0.0]
	at com.baidu.hugegraph.auth.HugeFactoryAuthProxy.open(HugeFactoryAuthProxy.java:83) ~[hugegraph-api-0.13.0.jar:0.69.0.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_332]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_332]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_332]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_332]
	at org.apache.tinkerpop.gremlin.structure.util.GraphFactory.open(GraphFactory.java:80) ~[gremlin-core-3.5.1.jar:3.5.1]
	... 17 more
The operation timed out(60s) when attempting to connect to http://127.0.0.1:8082/graphs

javeme added 2 commits May 16, 2022 12:33
Change-Id: Ib128b3221ee3acafa757069409625888790e14cf
Change-Id: I34abe845ba28d91a05e96065413a6148d39fa5a6
@Linary
Copy link
Contributor Author

Linary commented May 16, 2022

+2, Thanks!

@javeme javeme merged commit 8609968 into master May 16, 2022
@javeme javeme deleted the raft-add-peer-improve branch May 16, 2022 11:34
@wangyao2016
Copy link
Contributor

wangyao2016 commented May 17, 2022

With this PR, I cannot start hugegraph successfully. Err:

image

I found the object serverConfig.getServer() was null. Actually ServerConfig is [protocol=bolt, port=8091, host=127.0.0.1]

Can you take a look?Thanks. @Linary

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement General improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants