Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rawpart #7774

Merged
merged 7 commits into from
Jul 3, 2022
Merged

Rawpart #7774

merged 7 commits into from
Jul 3, 2022

Conversation

hyc
Copy link
Collaborator

@hyc hyc commented Jul 7, 2021

Add support for storing database on a raw device/partition. Can improve write performance.
Also may be useful for storing DB on removable storage (USB flash drives, etc.) for transporting
between online and offline daemons, and precludes the possibility of viruses hiding in a filesystem
(since there is no filesystem to hide in).

For use with monerod, would need to manually symlink <data-dir>/lmdb to the target device name.

This is only supported for POSIX systems, not WIN32. While there may be a way to access raw
devices in Windows, that usually requires Administrator access.

Note: currently monerod can't use this feature. Its startup checks for whether the target path is
a directory and if it supports mmap are interfering, they need to be skipped.

@hyc hyc changed the title Rawpart WIP: Rawpart Jul 7, 2021
@hyc
Copy link
Collaborator Author

hyc commented Jul 7, 2021

These checks must be disabled in order to use the PR:
dif.txt

OK never mind, I've removed those checks in the following 2 commits.

@hyc hyc changed the title WIP: Rawpart Rawpart Jul 9, 2021
@Malinero
Copy link
Collaborator

For use with monerod, would need to manually symlink /lmdb to the target device name

What do you think about an option to avoid this manual operation ? stg like --lmdb-device /dev/.....

@hyc
Copy link
Collaborator Author

hyc commented Sep 18, 2021

I think anybody who's smart enough to use this without accidentally destroying their live filesystem partitions is smart enough to set a symlink. And I feel there's less chance of accidentally typo'ing the wrong device name on the commandline, by relying on a manually created symlink that you can doublecheck a few times before starting up...

Copy link
Contributor

@vtnerd vtnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is long overdue, just a few questions.

@@ -1293,26 +1286,6 @@ BlockchainLMDB::BlockchainLMDB(bool batch_transactions): BlockchainDB()
m_hardfork = nullptr;
}

void BlockchainLMDB::check_mmap_support()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check be done after LMDB initializes (and therefore the flags could be checked for !rawpart) ?

@@ -4148,6 +4179,18 @@ mdb_env_init_meta(MDB_env *env, MDB_meta *meta)

psize = env->me_psize;

if ((env->me_flags & (MDB_RAWPART|MDB_WRITEMAP)) == (MDB_RAWPART|MDB_WRITEMAP)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have RAWPART == true but WRITEMAP == false? Not questioning the logic or asking for a code change, I'm asking for my understanding of how LMDB works.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and WRITEMAP is usually false on most deployments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry, brainfart. Everything in *Nix is a file, so this is just opening the rawpart like any other file (albeit one with a fixed size and location on disk) that gets written to like normal.

@@ -4054,6 +4058,31 @@ mdb_env_read_header(MDB_env *env, int prev, MDB_meta *meta)
int i, rc, off;
enum { Size = sizeof(pbuf) };

if (env->me_flags & MDB_RAWPART) {
#define VM_ALIGN 0x200000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has me completely lost - I have a rough idea of what the code is doing, I just don't understand how this constant was chosen.

Shouldn't this somehow probe the size of the "rawpart" ? Or was this chosen as a sufficiently large value; the OS handles "trapping" writes outside of the RAWPART through interrupts/errors.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2MB size is an optimal alignment for Flash memory erase blocks. Smaller may not be properly aligned for all devices, larger is not necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh ok I did in fact misunderstand this slightly, thanks for the clarification.

hyc and others added 7 commits June 3, 2022 16:45
Autodetects that a block device is being used.
Use mmap to read and initialize the meta pages, raw device
may not support read/write syscalls.
This reverts commit bd96536.

The check interferes with raw device/partition support.
The check interferes with raw device/partition support.
@hyc
Copy link
Collaborator Author

hyc commented Jun 3, 2022

Rebased onto current master

@vtnerd
Copy link
Contributor

vtnerd commented Jun 3, 2022

The only holdup is the lack of warning if mmap fails ... can/should that be left in? It's a somewhat useful warning to users, but maybe its ignored most of the time?

@hyc
Copy link
Collaborator Author

hyc commented Jun 4, 2022

If mmap fails, LMDB will fail, and the daemon is dead in the water. It's not ignorable.

@luigi1111 luigi1111 merged commit 30a9183 into monero-project:master Jul 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants