Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixbug/install snapshot bug #80

Merged
merged 16 commits into from
Apr 1, 2019
Merged

fixbug/install snapshot bug #80

merged 16 commits into from
Apr 1, 2019

Conversation

fengjiachun
Copy link
Contributor

@fengjiachun fengjiachun commented Mar 29, 2019

Motivation:

Snapshot 过大引起的安装失败 bug,会影响新增节点的加入

Modification:

  1. GetFileResponse#eof 字段必须设置
  2. 新增 rpcInstallSnapshotTimeout 配置项,默认 5 min,避免 snapshot 分段传输时间过长导致的不断超时重试

Result:

Fixes #79

@fengjiachun fengjiachun added the bug Something isn't working label Mar 29, 2019
@killme2008
Copy link
Contributor

@fengjiachun 怎么测试没过

@fengjiachun
Copy link
Contributor Author

@killme2008 是不是应该在 jepsen 验证中把 snapshot 考虑进去?

}

@Test
public void testGetLargeFileData() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个测试还是不够的,还是希望能模拟真实的情况,节点添加下安装较大的 snapshot ,可以模拟出来的。

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个测试更类似单元测试,放到 FileServiceTest 更合适,这里更多还是模拟运行的集成测试。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

测试还没完成,这个只是 FileService 的测试,在 NodeTest 中还会补充一个 InstallSnapshot 的测试

@killme2008
Copy link
Contributor

@fengjiachun 可以考虑改下配置,加快生成 snapshot,降低分片阈值。

@@ -187,6 +181,9 @@ public Status status() {

private void onFinished() {
if (!this.finished) {
if (!this.st.isOk()) {
LOG.error("Fail to copy data with status: {}.", this.st);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里需要打印错误日志信息,否则拷贝失败时不好排查

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

补充下 file reader id 等信息,最好有更多上下文。

@@ -88,7 +87,7 @@ public int readFile(ByteBufferCollector metaBufferCollector, String fileName, lo
}
}

return this.readFileWithMeta(metaBufferCollector, fileName, fileMeta, offset, newMaxCount);
return super.readFileWithMeta(metaBufferCollector, fileName, fileMeta, offset, newMaxCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 super 改动是不必要的,并且会引入风险,未来如果这个类覆写了上面这个方法,反而可能遗忘修改这里。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

恩,有道理

@@ -172,7 +172,7 @@ public static long getProcessId(final long fallback) {
}

/**
* Default init and expand buffer size, it can be set by -Draft.byte_buf.size=n, default 1024.
* Default init and expand buffer size, it can be set by -Djraft.byte_buf.size=n, default 1024.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个改动需要加入 changelog 说明了。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个没有动代码,是原来注释写错了,顺手改了下


// add follower
final PeerId newPeer = peers.get(3);
final SnapshotThrottle snapshotThrottle = new ThroughputSnapshotThrottle(128, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是故意流控了?确认有超过限制?我以为你会修改那个 maxByteCountPerRpc 的参数。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

恩,设置流控了,设置流控和调小 maxByteCountPerRpc 参数理论上是一样功效的呀,
我有在 FileService read 中打印调试日志,确认是分段传输的,maxByteCountPerRpc 其实也测试了,同样效果

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

保险起见,我又增加了一个 unit test, 调整了 maxByteCountPerRpc 参数,两种不同方式目前都通过测试

@fengjiachun fengjiachun mentioned this pull request Apr 1, 2019
4 tasks
@killme2008 killme2008 merged commit fa23d59 into develop Apr 1, 2019
@fengjiachun fengjiachun deleted the fixbug/get_file_eof branch April 1, 2019 03:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants