Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add diskrecovery command #1843

Merged
merged 3 commits into from
Aug 3, 2023
Merged

feat: add diskrecovery command #1843

merged 3 commits into from
Aug 3, 2023

Conversation

Mixficsol
Copy link
Collaborator

@Mixficsol Mixficsol commented Jul 30, 2023

feat: add diskrecovery command
Be related to issue: #1150
Be related to PR: #1765

背景

Pika 实例硬盘容量满,清理数据后,Pika 仍然保持错误的状态,需要重启才能恢复

解决方案

由于 RocksDB 在磁盘满了之后会进入写保护状态,此时不接受任何写请求,所以这种情况下如果 Pika 在清理掉磁盘空间进行接收写请求时,由于 RocksDB 还是处于写保护状态,这时候依然不能进行写响应,我们之前的一个方案是为 Pika 新增加一个线程,每过 5s 对磁盘进行检测,如果在预计容量内,则遍历整个 slot 对写请求状态进行恢复,但是由于这种操作每次需要遍历整个 DB 和 整个 slot,有上锁解锁操作,我们后面决定以一个命令的形式对这一情况进行解决,Diskcovery 命令,这个命令用于恢复对 RocksDB 的写保护状态,同时恢复 Pika 中写 binlog 错误的状态,在当前磁盘容量小于 write-buffer-size 的时候不进行恢复,只有在当前磁盘容量大于 write-buffer-size 的时候进行恢复,这里的 write-buffer-size 指 Memtable 的大小,设置为 256M. 以下是设计方案的代码:

截屏2023-08-07 10 56 37

复现方式

  1. 我们首先创建一个 1GB 大小的虚拟磁盘 (Centos环境),并查看是否成功

    dd if=/dev/zero of=my_virtual_disk.img bs=1G count=1

    ls -lh my_virtual_disk.img

截屏2023-08-02 20 31 54
  1. 然后执行命令来对文件进行格式化为特定的文件系统(ext4)

mkfs.ext4 my_virtual_disk.img

截屏2023-08-02 14 59 28
  1. 把虚拟文件系统挂载到目录 /mnt/vfs

sudo mkdir -p /mnt/my_disk

sudo mount -o loop my_virtual_disk.img /mnt/my_disk

df -h

截屏2023-08-02 20 18 23
  1. 我们对 pika.conf 里面的 log-path 和 db-path 进行修改,改成我们创建的虚拟文件系统路径
截屏2023-08-02 15 24 02
  1. 启动 Pika 并进行压测,我们先记录一下压测前磁盘的使用情况
截屏2023-08-02 20 11 58
  1. 等到压测到磁盘上限时我们可以发现此时执行 set 操作(写操作)时已经不能正常执行,我们调用 Diskrecovery 命令可以看到此时的磁盘使用情况(此时这种情况是不能恢复的,因为磁盘 free 的大小小于 write-buffter-size)
截屏2023-08-02 16 11 50
  1. 我们继续更改 pika.conf 的 log-path 和 db-path 然后把 之前的一个 db 文件删掉,继续启动 Pika 进行测试
截屏2023-08-02 20 17 29
  1. 在磁盘再次被打满时,我们依然不能执行 set 操作,此时我们把之前的 db 文件夹删掉,然后执行一次 Diskrecovery 命令,发现再次执行 set 操作时已经可以正常执行了
截屏2023-08-02 20 16 36

docs/ops/APIDifference.md Outdated Show resolved Hide resolved
docs/ops/APIDifference.md Outdated Show resolved Hide resolved
src/pika_server.cc Show resolved Hide resolved
docs/ops/APIDifference.md Outdated Show resolved Hide resolved
private:
void DoInitial() override;
};

#ifdef WITH_COMMAND_DOCS
class CommandCmd : public Cmd {
public:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon reviewing the provided code patch, a few observations and suggestions can be made:

  1. DiskRecoveryCmd Class:

    • The Split and Merge functions in the DiskRecoveryCmd class are empty. If they are intended to perform specific functionalities, you should implement them accordingly.
    • In the Clone function of DiskRecoveryCmd, it is recommended to use smart pointers (std::unique_ptr or std::shared_ptr) instead of raw pointers for better memory management.
  2. Code Formatting:

    • Ensure consistent indentation throughout the code to improve readability and maintainability.
  3. Error Handling:

    • It is important to handle any potential errors that may occur during execution. Make sure to include appropriate error handling mechanisms such as exception handling or error return codes.
  4. Documentation:

    • Consider adding comments or documentation to clarify the purpose and functionality of the classes and functions. This will help other developers understand and maintain the codebase.

Note: The provided code snippet is relatively small, making it difficult to identify any specific bugs or risks without a broader context of the codebase and its requirements. A comprehensive code review would typically involve examining the entire codebase and understanding its architecture and design choices.

src/pika_server.cc Show resolved Hide resolved

### diskrecovery
Pika 原创命令,功能为当磁盘意外写满后,RocksDB 会进入写保护状态,当我们将空间调整为充足空间时,这个命令可以将 RocksDB 的写保护状态解除,变为可以继续写的状态, 避免了 Pika 因为磁盘写满后需要重启才能恢复写的情况,执行成功时返回 OK,如果当前磁盘空间依然不足,执行这个命令返回`"The available disk capacity is insufficient`,该命令执行时不需要额外参数,只需要执行 diskrecovery 即可。 No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the provided code patch, here are some observations:

  1. There is no specific bug identified in the code patch.
  2. Improvement suggestion: Ensure consistent formatting and follow coding conventions like adding a newline at the end of each file.

However, it's important to note that the provided code patch is just a small excerpt, and without the full context and surrounding code, it's difficult to provide an extensive review. It is recommended to conduct a thorough review of the entire codebase to identify any potential issues or improvements.

private:
void DoInitial() override;
};

#ifdef WITH_COMMAND_DOCS
class CommandCmd : public Cmd {
public:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the code patch you provided looks fine. However, there are a few suggestions for improvement and potential bug risks:

  1. In the DiskRecoveryCmd class, the Split function is declared but does not have an implementation. Make sure to implement it according to your requirements. If it's not needed, consider removing the declaration.

  2. In the Merge function of the DiskRecoveryCmd class, there is an empty body. If the Merge function is intended to be empty, it would be clearer to indicate that by using the override keyword followed by a semicolon, like this: void Merge() override {};.

  3. In the CommandCmd class (not shown in the code patch), ensure that it has the necessary member functions implemented and properly defined. Verify that any overridden functions have the correct implementation according to their base class declarations.

  4. Consider using smart pointers (std::unique_ptr or std::shared_ptr) instead of raw pointers where appropriate. Smart pointers help manage memory automatically, reducing the likelihood of memory leaks and other related bugs.

  5. Refactor repetitive code, such as the DoInitial function, into a common base class if possible. This encourages code reuse and reduces duplication.

Remember to pay attention to the implementation details outside the code patch you provided to ensure the overall correctness and efficiency of your code.


### diskrecovery
Pika 原创命令,功能为当磁盘意外写满后,RocksDB 会进入写保护状态,当我们将空间调整为充足空间时,这个命令可以将 RocksDB 的写保护状态解除,变为可以继续写的状态, 避免了 Pika 因为磁盘写满后需要重启才能恢复写的情况,执行成功时返回 OK,如果当前磁盘空间依然不足,执行这个命令返回`"The available disk capacity is insufficient`,该命令执行时不需要额外参数,只需要执行 diskrecovery 即可。 No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code patch you provided appears to be a modification to a documentation file rather than actual code. Therefore, it is difficult to perform a comprehensive code review. However, based on the information provided, there are a few suggestions:

  1. Ensure consistent formatting: Maintain consistent line spacing and indentation throughout the file.
  2. Add a newline at the end of the file: It's recommended to have a newline character at the end of the file for better readability.
  3. Clarify usage instructions: Provide more detailed instructions on how to use the "diskrecovery" command, such as any required parameters or additional considerations.

Since this patch only contains changes to the documentation, it is unlikely to introduce bugs or risks in the code itself. Nonetheless, it's always important to ensure accuracy and clarity in the documentation to assist users in understanding and utilizing your code effectively.

private:
void DoInitial() override;
};

#ifdef WITH_COMMAND_DOCS
class CommandCmd : public Cmd {
public:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to provide a comprehensive code review without seeing the entire codebase or having more context. However, based on the provided code patch:

  1. In DiskRecoveryCmd, the Split and Merge functions are empty. It's important to make sure these functions are properly implemented if they are expected to perform specific tasks. Otherwise, consider removing them if they are not needed.

  2. In the Clone function of DiskRecoveryCmd, you're returning a raw pointer to a dynamically allocated object. This can lead to memory leaks if not handled correctly. Consider using smart pointers like std::unique_ptr instead or ensuring proper memory management.

  3. It's unclear what DoInitial is meant to do in both HelloCmd and DiskRecoveryCmd. Make sure the implementation of this function aligns with its intended purpose.

  4. Without further information about the functionality and usage of the Cmd class, it's challenging to provide additional improvement suggestions or assess potential bug risks.

Consider providing more code context or specific areas of concern for a more detailed review.

@Mixficsol Mixficsol changed the title feat: Add diskrecovery command feat: add diskrecovery command Jul 30, 2023
@Mixficsol Mixficsol requested a review from Axlgrep July 31, 2023 02:37
@Mixficsol Mixficsol changed the title feat: add diskrecovery command WIF: add diskrecovery command Aug 1, 2023
@Mixficsol Mixficsol changed the title WIF: add diskrecovery command WIP: add diskrecovery command Aug 1, 2023
@Mixficsol Mixficsol changed the title WIP: add diskrecovery command feat: add diskrecovery command Aug 2, 2023
@yaoyinnan
Copy link
Contributor

可以把复现流程在 PR 中描述吗?

slot_item.second->DbRWUnLock();
for (const auto &item: background_errors_) {
if (item.second != 0) {
rocksdb::Status s = slot_item.second->db()->GetDBByType(item.first)->Resume();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是不是没有处理当resume失败时,设置对应返回值

return;
}

std::shared_mutex dbs_rw, slots_rw;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

把这些都挪到成员变量中,同时不要两个何谓一行

@Mixficsol Mixficsol merged commit 1f66565 into OpenAtomFoundation:unstable Aug 3, 2023
@Mixficsol Mixficsol deleted the add_Disk_recovery_command branch August 3, 2023 11:25
Mixficsol added a commit that referenced this pull request Aug 4, 2023
* Optimized code logic (#1834)

* fix:delete consensus level (#1844)

* delete_consensus_level

* Update pika.conf (#1854)

* fix:delete_consensus_level (#1852)

* delete_consensus_level

* fix:shutdown&&slaveof no one too slow (#1859)

* fix_slow_shut_down

* add go integrate test (#1792)

* add go integrate test

* add go integrate test

* add go integrate test

* add go integrate test

* temp test

* temp test

* temp test

* temp test

* temp test

* temp test

* add test

* temp test

* temp test

* add tcl

* add tcl

* reformat code

* add tcl

* reformat code

* add tcl

* reformat code

* add tcl

* fix setxx

* fix setxx

* add tcl

* add tcl

* fix: fix some compile error (#1861)

Co-authored-by: wangshaoyi <wangshaoyi@meituan.com>

* change-version

* feat: add diskrecovery command (#1843)

* Add diskrecovery command

* Add diskrecovery command

---------

Co-authored-by: chejinge <945997690@qq.com>
Co-authored-by: Yuecai Liu <38887641+luky116@users.noreply.github.com>
Co-authored-by: wangshao1 <30471730+wangshao1@users.noreply.github.com>
Co-authored-by: wangshaoyi <wangshaoyi@meituan.com>
bigdaronlee163 pushed a commit to bigdaronlee163/pika that referenced this pull request Jun 8, 2024
* Add diskrecovery command

* Add diskrecovery command
cheniujh pushed a commit to cheniujh/pika that referenced this pull request Sep 24, 2024
* Add diskrecovery command

* Add diskrecovery command
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants