Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Return size of block (in bytes) from web3.eth.getBlock RPC function #5829

Merged
merged 9 commits into from
Dec 4, 2019

Conversation

halfalicious
Copy link
Contributor

@halfalicious halfalicious commented Nov 17, 2019

Fix #5797, #5814

Return the size of a block in bytes to calls to web3.eth.getBlock. This requires extending the BlockDetails structure (by adding a new blockSizeBytes field) which in turn requires incrementing the db minor version number (since BlockDetails are stored in the Extras database) which triggers an automatic database rebuild (which can take a while depending on how many blocks are in your local chain).

I've also made some minor changes to the database rebuild code to handle edge cases:

  • Detect a previously prematurely terminated rebuild and log an error / throw exception if detected
  • Throw an exception on rebuild failure (previously only logged an error message)
  • Only update the database minor version file after a successful rebuild
  • Additional logging during database rebuild

Tested changes by calling web3.eth.getBlock with the following block numbers and comparing the results against Etherscan: 0, 1, 65001 (3 txs), 101146 (2 txs), 109424 (1 tx), 115367 (7 txs)

@halfalicious halfalicious changed the title Return size of block RLP in bytes from web3.eth.getBlock RPC function [WIP] Return size of block RLP in bytes from web3.eth.getBlock RPC function Nov 18, 2019
@halfalicious halfalicious changed the title [WIP] Return size of block RLP in bytes from web3.eth.getBlock RPC function [WIP] Return size of block in bytes from web3.eth.getBlock RPC function Nov 18, 2019
@codecov-io
Copy link

codecov-io commented Nov 18, 2019

Codecov Report

Merging #5829 into master will decrease coverage by 0.03%.
The diff coverage is 50.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5829      +/-   ##
==========================================
- Coverage   64.05%   64.02%   -0.04%     
==========================================
  Files         362      364       +2     
  Lines       30903    30970      +67     
  Branches     3431     3434       +3     
==========================================
+ Hits        19796    19828      +32     
- Misses       9878     9918      +40     
+ Partials     1229     1224       -5

@halfalicious
Copy link
Contributor Author

Size returned for genesis block is still 0 bytes, need to investigate.

@halfalicious halfalicious force-pushed the blocksize branch 2 times, most recently from 33b6590 to 956528b Compare November 18, 2019 06:29
@halfalicious halfalicious requested review from gumb0 and chfast and removed request for gumb0 November 18, 2019 07:04
@halfalicious halfalicious changed the title [WIP] Return size of block in bytes from web3.eth.getBlock RPC function Return size of block (in bytes) from web3.eth.getBlock RPC function Nov 18, 2019
libethereum/BlockDetails.h Outdated Show resolved Hide resolved
libethereum/BlockDetails.h Outdated Show resolved Hide resolved
libethereum/BlockDetails.cpp Outdated Show resolved Hide resolved
libethereum/BlockDetails.h Show resolved Hide resolved
libweb3jsonrpc/Eth.cpp Show resolved Hide resolved
libweb3jsonrpc/Eth.cpp Outdated Show resolved Hide resolved
fs::remove_all(chainSubPathBlocks);
fs::remove_all(extrasSubPathExtras);
LOG(m_loggerInfo) << "Deleting all databases. This will require a resync from genesis.";
fs::remove_all(chainPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it was removing only blocks and extras because BlockChain class is responsible on only for these DB, and State is responsible for state.

I'd leave it without change for better separation of concertns.
(with this change there's at least a problem when it gets to State::openDB and logs Killing state database - that's not true anymore)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see below that you did it to remove also extras.old, if it's there... Maybe better to remove existing extras.old in that case?

Copy link
Contributor Author

@halfalicious halfalicious Nov 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 I think we should leave extras.old (unless one explicitly specifies --kill) since there's the possibility of using it to restore your extras DB (if you know what you're doing), keeping it around gives the user the chance to do that.

fs::remove_all(chainSubPathBlocks);
fs::remove_all(extrasSubPathExtras);
LOG(m_loggerInfo) << "Deleting all databases. This will require a resync from genesis.";
fs::remove_all(chainPath);
}

fs::create_directories(extrasPath);
DEV_IGNORE_EXCEPTIONS(fs::permissions(extrasPath, fs::owner_all));

auto const extrasSubPathMinor = extrasPath / fs::path("minor");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make a constant out of "minor" as it's used in more that one place now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 I don't see where it's being used multiple times in a function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In another function (rebuild)

Copy link
Contributor Author

@halfalicious halfalicious Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, most of the paths are used in both places...thought you were referring to within the same function. I can move them to anonymous namespace vars.

libethereum/BlockChain.cpp Outdated Show resolved Hide resolved
libethereum/ClientBase.cpp Outdated Show resolved Hide resolved
@halfalicious
Copy link
Contributor Author

Need to also handle case of the minor version file not existing and the DB format changing

@halfalicious
Copy link
Contributor Author

Will squash some changes to clean up the commit history before merging

@halfalicious
Copy link
Contributor Author

Rebased to fix changelog merge conflict

@halfalicious
Copy link
Contributor Author

Also did a successful rebuild of ~2M blocks.

{
bool databasePathsSet();
void setDatabasePaths(boost::filesystem::path const& _rootPath, h256 const& _genesisHash);
boost::filesystem::path rootPath();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would return const references from these, to avoid copying

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat confusing that we have db::databasePath() and now these. Should all these maybe better put into DBFactory.h?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 I considered that but DBFactory strikes me as more general purpose..it doesn’t include anything related to chains and such. Should I maybe rename the new namespace and files to something like BlockchainDatabasePaths to make the differences more clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The namespace and files are fine, but then maybe it makes sense to move db::setDatabasePath to this file and/or merge with setDatabasePaths?

Isn't rootPath here basically the same as db::databasePath ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 Yes, rootPath is the same as db::databasePath. After looking at this a little more I like your original suggestion of merging these together...I'll move the DatabasePaths functions out of that namespace and into the dev::db namespace.

@halfalicious
Copy link
Contributor Author

Still need to merge DatabasePaths with dev::db

libethereum/DatabasePaths.h Outdated Show resolved Hide resolved
Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockChain and State shouldn't set the global data

libethereum/ClientBase.cpp Outdated Show resolved Hide resolved
libethereum/ClientBase.cpp Outdated Show resolved Hide resolved
if (db::isDiskDatabase())
{
db_paths::setDatabasePaths(_basePath, _genesisHash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You call this from here and in BlockChain class, so it will be called at least twice, won't it?
Also it makes it impossible to have two instantces of State pointing to different databases.

It should rather be called once somewhere at the higher level (like main function or Client initialization).

Copy link
Contributor Author

@halfalicious halfalicious Nov 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 Yes, it will be called twice, thought that it was simpler this way rather than checking if it was already set and then only setting it in that case. I agree that setting the paths at a higher level makes sense.

For context, when would we ever have two State instances pointing to different databases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in tests, maybe nowhere, but it's one thing to set the global data once and then only read it in various places, but writing global data from various places just doesn't seem good practice - both from thread-safety perspective and from general complexity management perspective

libethereum/State.cpp Show resolved Hide resolved
libethereum/BlockChain.cpp Outdated Show resolved Hide resolved
@halfalicious halfalicious force-pushed the blocksize branch 3 times, most recently from 99a1213 to 5d76fe6 Compare December 3, 2019 05:37
@halfalicious
Copy link
Contributor Author

Rebased to address CHANGELOG merge conflict

@halfalicious
Copy link
Contributor Author

Need to fix OpenDB test error

public:
DatabasePaths(boost::filesystem::path const& _rootPath, h256 const& _genesisHash) noexcept;
DatabasePaths() = default;
void setPaths(boost::filesystem::path const& _rootPath, h256 const& _genesisHash) noexcept;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it immutable class, i.e. have only constructor without setter, and create another instance each time you need to reset the paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 I like the immutable idea, I went with the current approach since I was hesitant to use dynamic allocation for something trivial like paths. I can use unique_ptr and remove the default ctor + setter method which will effectively make the class immutable.

DatabasePaths() = default;
void setPaths(boost::filesystem::path const& _rootPath, h256 const& _genesisHash) noexcept;
bool pathsSet() const noexcept;
boost::filesystem::path const& rootPath() noexcept;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these getters should be const methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course 😀

fs::path(toString(c_databaseVersion));
auto const statePath = chainPath / fs::path("state");

DatabasePaths dbPaths{_basePath, _genesisHash};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const

unsigned lastMinor = c_databaseMinorVersion;
bool rebuildNeeded = false;

if (db::isDiskDatabase())
{
if (!_path.empty())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is it empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, when we reopen?
Why is it needed, why can't you just use the path provided to reopen and just reset m_dbPaths in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0: We can, but that seems unnecessary if m_dbPaths is already set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could then just

if (!m_dbPaths.pathsSet()) 
  m_dbPaths.setPaths(_path, m_genesisHash);

But I would probably make it unique_ptr and

if (!m_dbPaths) 
  m_dbPaths = std::make_unique<DatabasePaths>(_path, m_genesisHash);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 unique_ptr makes sense, but I like the current behavior of only setting the path if the supplied path arg is empty since I think that's more intuitive, i.e. if I call a function and pass it an arg I expect that arg to be used and not silently dropped. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but I don't like the hidden features of the function interface, like magic value "empty path" meaning "use previous path".
How about instead of empty path calling it with the current path (as before) and then here:

if (!m_dbPaths || m_dbPaths.rootPath() != _path) 
  m_dbPaths = std::make_unique<DatabasePaths>(_path, m_genesisHash);

then it's kind of an optimization to not reset the paths when not needed. And in practice re-creation will never happen, because it's called with the same path only.

libethereum/BlockChain.cpp Outdated Show resolved Hide resolved
cerror << "Unknown error occurred. Exception details: " << ex.what();

LOG(m_loggerError) << "Unknown error occurred when opening database. Exception details: "
<< ex.what();
throw;
}

if (_we != WithExisting::Verify && !rebuildNeeded && !details(m_genesisHash))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can slightly streamline things by initializing rebuildNeeded to (_we == WithExisting::Verify) and then checking !rebuildNeeded here rather than also checking _we. This would also enable me to skip the minor version checks if rebuildNeeded == true. However, this file already has a ton of churn and this streamlining is pretty minor so I can take care of it in a future change.

Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash some commits before merging

Also expose this to RPC callers
Throw exception on rebuild failure, detect existence of temporary extras
database and throw exception, only update database minor version file on
successful rebuild
Remove individual directories in BlockChain::open on WithExisting::Kill
and save genesis block details RLP in local variable before inserting
into extras db.

Untabify BlockDetails and pass ctor args by const&

Cleanup web3.eth.getBlock* calls

Make BlockDetails::blockSizeBytes a size_t instead of unsigned

Minor formatting/logging changes (e.g. untabify BlockDetails files)
Case = minor version file not present but extras database exists. This
should only happen if someone is upgrading to a release of Aleth which
writes the minor version file, and this release will also have an extras
database upgrade so require a rebuild.
New namespace is called dev::eth::database_paths. Also tweak BlockChain::open and BlockChain::rebuild so that
BlockChain::m_dbPath is no longer needed, and set BlockDetails::size in BlockDetails::rlp(). Finally, remove unnecessary boost::filesystem::create_directories calls in BlockChain::open / State::openDB.
Remove misleading log messages when memoryDB is used

Only set database paths if disk db is being used

Remove unused exceptions
Required by one of the tests (BlockChainFrontierSuite/opendb)
Remove unnecessary static_cast when creating BlockDetails and change return
values of DatabasePaths getters to const&

Use const and noexcept where possible and move DatabasePaths from namespace
to immutable class
@halfalicious
Copy link
Contributor Author

Squashed a bunch of commits and rebased

@halfalicious halfalicious merged commit d873d73 into master Dec 4, 2019
@halfalicious halfalicious deleted the blocksize branch December 4, 2019 17:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aleth's block JSON (retrieved via web3.eth.getBlock) has different "size" value than Geth's
3 participants