WIP: database: Major redesign of database package.
This commit contains a complete redesign and rewrite of the database
package that approaches things in a vastly different manner than the
previous version.  This is the first part of several stages that will be
needed to ultimately make use of this new package.

Some of the reasons for this were discussed in btcsuite#255, however a quick
summary is as follows:

- The previous database could only contain blocks on the main chain and
  reorgs required deleting the blocks from the database.  This made it
  impossible to store orphans and could make external RPC calls for
  information about blocks during the middle of a reorg fail.
- The previous database interface forced a high level of bitcoin-specific
  intelligence such as spend tracking into each backend driver.
- As a result, new backend drivers were difficult to implement because
  each one had to repeat a lot of non-trivial logic which is better
  handled at a higher layer, such as the blockchain package.
- The old database stored all blocks in leveldb.  This made it extremely
  inefficient to do things such as lookup headers and individual
  transactions since the entire block had to be loaded from leveldb (which
  entails it doing data copies) to get access.
- The vast majority of database activity after the initial block download
  is read activity, however leveldb, as its name implies, is optimized for
  leveled write performance at the expense of read performance.

In order to address all of these concerns, and others not mentioned, the
database interface has been redesigned as follows:

- Two main categories of functionality are provided: block storage and
  metadata storage
- All block storage and metadata storage are done via read-only and
  read-write MVCC transactions with both manual and managed modes
  - Support for multiple concurrent readers and a single writer
  - Readers use a snapshot and therefore are not blocked by the writer
- Some key properties of the block storage and retrieval API:
  - It is generic and does NOT contain additional bitcoin logic such as
    spend tracking and block linking
  - Provides access to the raw serialized bytes so deserialization is not
    forced for callers that don't need it
  - Support for fetching headers via independent functions which allows
    implementations to provide significant optimizations
  - Ability to efficiently retrieve arbitrary regions of blocks
    (transactions, scripts, etc)
- A rich metadata storage API is provided:
  - Key/value with arbitrary data
  - Support for buckets and nested buckets
  - Bucket iteration through a couple of different mechanisms
  - Cursors for efficient and direct key seeking
- Supports registration of backend database implementations
- Comprehensive test coverage
- Provides strong documentation with example usage
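
The managed transaction mode listed above can be illustrated with a small,
self-contained sketch. This is not the package's API: the `store`, `tx`,
`View`, `Update`, and `errAbort` names below are hypothetical stand-ins, and
the copy-on-write map is only one simple way to give readers a snapshot while
a single writer works. It shows the two properties described: a read-write
closure that returns an error is rolled back, and readers never see
uncommitted data.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAbort is a hypothetical error used to force a rollback in the example.
var errAbort = errors.New("abort")

// store is a hypothetical in-memory stand-in for a metadata store.
type store struct {
	writer sync.Mutex        // held for the lifetime of a read-write transaction
	mu     sync.RWMutex      // guards only the data pointer
	data   map[string]string // committed snapshot; never mutated in place
}

// tx is a hypothetical transaction handle over one snapshot.
type tx struct {
	data     map[string]string
	writable bool
}

func newStore() *store { return &store{data: map[string]string{}} }

// Get reads a key from the transaction's snapshot.
func (t *tx) Get(key string) (string, bool) {
	v, ok := t.data[key]
	return v, ok
}

// Put writes a key; it fails inside read-only transactions.
func (t *tx) Put(key, value string) error {
	if !t.writable {
		return errors.New("transaction is read-only")
	}
	t.data[key] = value
	return nil
}

// View runs fn against the current committed snapshot. The lock is held only
// long enough to grab the snapshot, so a running writer does not block it.
func (s *store) View(fn func(*tx) error) error {
	s.mu.RLock()
	snap := s.data
	s.mu.RUnlock()
	return fn(&tx{data: snap})
}

// Update runs fn against a private copy and commits it only if fn returns nil.
func (s *store) Update(fn func(*tx) error) error {
	s.writer.Lock()
	defer s.writer.Unlock()

	s.mu.RLock()
	working := make(map[string]string, len(s.data))
	for k, v := range s.data {
		working[k] = v
	}
	s.mu.RUnlock()

	if err := fn(&tx{data: working, writable: true}); err != nil {
		return err // rollback: the working copy is simply discarded
	}
	s.mu.Lock()
	s.data = working
	s.mu.Unlock()
	return nil
}

func main() {
	s := newStore()
	_ = s.Update(func(t *tx) error { return t.Put("tip", "block-1") })
	err := s.Update(func(t *tx) error {
		_ = t.Put("tip", "block-2")
		return errAbort
	})
	_ = s.View(func(t *tx) error {
		v, _ := t.Get("tip")
		fmt.Println(v, err) // prints "block-1 abort"
		return nil
	})
}
```

Copying the whole map on every write is far too expensive for a real
database; it is used here only because it keeps the snapshot semantics
obvious.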

This commit also contains an implementation of the previously discussed
interface named ffboltdb (flat file plus boltdb metadata backend).  Here
is a quick overview:

- Highly optimized for read performance
- All blocks are stored in flat files on the file system
- Bulk block region fetching is optimized to perform linear reads which
  improves performance on spindle disks
- The metadata storage uses boltdb under the hood which provides fast
  memory-mapped access
  - NOTE: Due to the fact the database can get quite large and due to how
    the OS does memory-mapped files this can result in the process showing
    a huge amount of memory used.  This is typical for memory-mapped
    files, but the key thing to remember is the reported memory is not
    really used because the OS will relinquish it as soon as other
    processes need it
- Anti-corruption mechanisms:
  - Flat files contain full block checksums to quickly and easily detect
    database corruption without needing to do expensive merkle root
    calculations
  - Metadata checksums
  - Open reconciliation
- Extensive test coverage:
  - Comprehensive blackbox interface testing
  - Whitebox testing which uses intimate knowledge to exercise uncommon
    failure paths such as deleting files out from under the database
  - Corruption tests (replacing random data in the files)
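
As a rough illustration of the flat-file checksum idea above, the sketch
below frames a block as a length prefix, the raw bytes, and a trailing
CRC-32 (Castagnoli) checksum, then verifies it on read. The record layout,
the choice of CRC-32C, and the names `encodeRecord`, `decodeRecord`,
`errChecksum`, and `errShort` are assumptions made for this example; the
actual on-disk format used by ffboltdb is not described in this commit
message.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

var (
	errChecksum = errors.New("block record checksum mismatch")
	errShort    = errors.New("block record truncated")

	castagnoli = crc32.MakeTable(crc32.Castagnoli)
)

// encodeRecord frames a block using a hypothetical layout:
// [4-byte big-endian length][block bytes][4-byte CRC-32C of all prior bytes].
func encodeRecord(block []byte) []byte {
	rec := make([]byte, 4, 4+len(block)+4)
	binary.BigEndian.PutUint32(rec, uint32(len(block)))
	rec = append(rec, block...)

	var sum [4]byte
	binary.BigEndian.PutUint32(sum[:], crc32.Checksum(rec, castagnoli))
	return append(rec, sum[:]...)
}

// decodeRecord verifies the framing and checksum and returns the block bytes.
func decodeRecord(rec []byte) ([]byte, error) {
	if len(rec) < 8 {
		return nil, errShort
	}
	n := binary.BigEndian.Uint32(rec)
	if uint64(len(rec)) != 8+uint64(n) {
		return nil, errShort
	}
	body := rec[:4+n]
	want := binary.BigEndian.Uint32(rec[4+n:])
	if crc32.Checksum(body, castagnoli) != want {
		return nil, errChecksum
	}
	return body[4:], nil
}

func main() {
	rec := encodeRecord([]byte("raw block bytes"))
	block, err := decodeRecord(rec)
	fmt.Printf("%q %v\n", block, err)

	rec[6] ^= 0x01 // flip one bit to simulate on-disk corruption
	_, err = decodeRecord(rec)
	fmt.Println(err)
}
```

A checksum over the stored bytes catches corruption with a single linear
pass, which is the reason given above for preferring it to recomputing
merkle roots.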

In addition, this commit also contains a new tool under the new database
directory named dbtool which provides a few basic commands for testing the
database.  It is designed around commands, so it could be useful to expand
on in the future.

Finally, this commit addresses the following issues:

- Adds support for and therefore closes btcsuite#255
- Fixes btcsuite#199
- Fixes btcsuite#201
- Implements and closes btcsuite#256
- Obsoletes and closes btcsuite#257
- Closes btcsuite#247 once the required chain and btcd modifications are in place
  to make use of this new code
davecgh committed May 1, 2015
1 parent 6e402de commit 1f40b6d
Showing 31 changed files with 8,570 additions and 0 deletions.
77 changes: 77 additions & 0 deletions database2/README.md
@@ -0,0 +1,77 @@
database
========

[![Build Status](https://travis-ci.org/btcsuite/btcd.png?branch=master)](https://travis-ci.org/btcsuite/btcd)

Package database provides a block and metadata storage database.

Please note that this package is intended to enable btcd to support different
database backends and is not something that a client can directly access as only
one entity can have the database open at a time (for most database backends),
and that entity will be btcd.

When a client wants programmatic access to the data provided by btcd, they'll
likely want to use the [btcrpcclient](https://github.com/btcsuite/btcrpcclient)
package which makes use of the
[JSON-RPC API](https://github.com/btcsuite/btcd/tree/master/docs/json_rpc_api.md).

However, this package could be extremely useful for any applications requiring
Bitcoin block storage capabilities.

As of April 2015, there are over 350,000 blocks in the Bitcoin block chain
and over 64 million transactions (which turns out to be over 35GB of data).
This package provides a database layer to store and retrieve this data in a
simple and efficient manner.

The default backend, ffboltdb, has a strong focus on speed, efficiency, and
robustness. It makes use of zero-copy memory mapping for the metadata, flat
files for block storage, and checksums in key areas to ensure data integrity.

## Feature Overview

- Key/value metadata store
- Bitcoin block storage
- Efficient retrieval of block headers and regions (transactions, scripts, etc)
- Read-only and read-write transactions with both manual and managed modes
- Nested buckets
- Iteration support including cursors with seek capability
- Supports registration of backend databases
- Comprehensive test coverage

## Documentation

[![GoDoc](https://godoc.org/github.com/btcsuite/btcd/database?status.png)](http://godoc.org/github.com/btcsuite/btcd/database)

Full `go doc` style documentation for the project can be viewed online without
installing this package by using the GoDoc site here:
http://godoc.org/github.com/btcsuite/btcd/database

You can also view the documentation locally once the package is installed with
the `godoc` tool by running `godoc -http=":6060"` and pointing your browser to
http://localhost:6060/pkg/github.com/btcsuite/btcd/database

## Installation

```bash
$ go get github.com/btcsuite/btcd/database
```

## Examples

* [Basic Usage Example](http://godoc.org/github.com/btcsuite/btcd/database#example-package--BasicUsage)
Demonstrates creating a new database and using a managed read-write
transaction to store and retrieve metadata.

* [Block Storage and Retrieval Example](http://godoc.org/github.com/btcsuite/btcd/database#example-package--BlockStorageAndRetrieval)
Demonstrates creating a new database, using a managed read-write transaction
to store a block, and then using a managed read-only transaction to fetch the
block.

## License

Package database is licensed under the [copyfree](http://copyfree.org) ISC
License.
62 changes: 62 additions & 0 deletions database2/cmd/dbtool/fetchblock.go
@@ -0,0 +1,62 @@
// Copyright (c) 2015 Conformal Systems LLC.
// Use of this source code is governed by an ISC
// license that can be found in the LICENSE file.

package main

import (
	"encoding/hex"
	"errors"
	"time"

	"github.com/btcsuite/btcd/database2"
	"github.com/btcsuite/btcd/wire"
)

// fetchBlockCmd defines the configuration options for the fetchblock command.
type fetchBlockCmd struct{}

var (
	// fetchBlockCfg defines the configuration options for the command.
	fetchBlockCfg = fetchBlockCmd{}
)

// Execute is the main entry point for the command. It's invoked by the parser.
func (cmd *fetchBlockCmd) Execute(args []string) error {
	// Setup the global config options and ensure they are valid.
	if err := setupGlobalConfig(); err != nil {
		return err
	}

	if len(args) != 1 {
		return errors.New("required block hash parameter not specified")
	}
	blockHash, err := wire.NewShaHashFromStr(args[0])
	if err != nil {
		return err
	}

	// Load the block database.
	db, err := loadBlockDB()
	if err != nil {
		return err
	}
	defer db.Close()

	return db.View(func(tx database.Tx) error {
		log.Infof("Fetching block %s", blockHash)
		startTime := time.Now()
		blockBytes, err := tx.FetchBlock(blockHash)
		if err != nil {
			return err
		}
		log.Infof("Loaded block in %v", time.Since(startTime))
		log.Infof("Block Hex: %s", hex.EncodeToString(blockBytes))
		return nil
	})
}

// Usage overrides the usage display for the command.
func (cmd *fetchBlockCmd) Usage() string {
	return "<block-hash>"
}
89 changes: 89 additions & 0 deletions database2/cmd/dbtool/fetchblockregion.go
@@ -0,0 +1,89 @@
// Copyright (c) 2015 Conformal Systems LLC.
// Use of this source code is governed by an ISC
// license that can be found in the LICENSE file.

package main

import (
	"encoding/hex"
	"errors"
	"strconv"
	"time"

	"github.com/btcsuite/btcd/database2"
	"github.com/btcsuite/btcd/wire"
)

// blockRegionCmd defines the configuration options for the fetchblockregion
// command.
type blockRegionCmd struct{}

var (
	// blockRegionCfg defines the configuration options for the command.
	blockRegionCfg = blockRegionCmd{}
)

// Execute is the main entry point for the command. It's invoked by the parser.
func (cmd *blockRegionCmd) Execute(args []string) error {
	// Setup the global config options and ensure they are valid.
	if err := setupGlobalConfig(); err != nil {
		return err
	}

	// Ensure expected arguments.
	if len(args) < 1 {
		return errors.New("required block hash parameter not specified")
	}
	if len(args) < 2 {
		return errors.New("required start offset parameter not " +
			"specified")
	}
	if len(args) < 3 {
		return errors.New("required region length parameter not " +
			"specified")
	}

	// Parse arguments.
	blockHash, err := wire.NewShaHashFromStr(args[0])
	if err != nil {
		return err
	}
	startOffset, err := strconv.ParseUint(args[1], 10, 32)
	if err != nil {
		return err
	}
	regionLen, err := strconv.ParseUint(args[2], 10, 32)
	if err != nil {
		return err
	}

	// Load the block database.
	db, err := loadBlockDB()
	if err != nil {
		return err
	}
	defer db.Close()

	return db.View(func(tx database.Tx) error {
		log.Infof("Fetching block region %s<%d:%d>", blockHash,
			startOffset, regionLen)
		region := database.BlockRegion{
			Hash:   blockHash,
			Offset: uint32(startOffset),
			Len:    uint32(regionLen),
		}
		startTime := time.Now()
		regionBytes, err := tx.FetchBlockRegion(&region)
		if err != nil {
			return err
		}
		log.Infof("Loaded block region in %v", time.Since(startTime))
		log.Infof("Region Hex: %s", hex.EncodeToString(regionBytes))
		return nil
	})
}

// Usage overrides the usage display for the command.
func (cmd *blockRegionCmd) Usage() string {
	return "<block-hash> <start-offset> <length-of-region>"
}
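
To make the offset and length arguments of the command above concrete, here
is a minimal sketch of what fetching a region from a block's raw serialized
bytes amounts to. It is not how the backend implements `FetchBlockRegion`
(which reads from flat files); the `blockRegion` type, `fetchRegion`
function, and `errRegionBounds` error are hypothetical names for this
example. The bounds check is done in 64-bit arithmetic so that a large
offset plus a large length cannot wrap around.

```go
package main

import (
	"errors"
	"fmt"
)

// errRegionBounds is returned when a region extends past the end of a block.
var errRegionBounds = errors.New("region exceeds block size")

// blockRegion is a hypothetical, hash-less version of a region request.
type blockRegion struct {
	Offset uint32
	Len    uint32
}

// fetchRegion returns the requested bytes of raw without copying them.
func fetchRegion(raw []byte, r blockRegion) ([]byte, error) {
	end := uint64(r.Offset) + uint64(r.Len) // cannot overflow in 64 bits
	if end > uint64(len(raw)) {
		return nil, errRegionBounds
	}
	return raw[r.Offset:end], nil
}

func main() {
	// A serialized Bitcoin block starts with its 80-byte header, so the
	// header is simply the region at offset 0 with length 80.
	raw := make([]byte, 200)
	header, err := fetchRegion(raw, blockRegion{Offset: 0, Len: 80})
	fmt.Println(len(header), err)

	_, err = fetchRegion(raw, blockRegion{Offset: 190, Len: 20})
	fmt.Println(err)
}
```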
121 changes: 121 additions & 0 deletions database2/cmd/dbtool/globalconfig.go
@@ -0,0 +1,121 @@
// Copyright (c) 2015 Conformal Systems LLC.
// Use of this source code is governed by an ISC
// license that can be found in the LICENSE file.

package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"

	"github.com/btcsuite/btcd/chaincfg"
	"github.com/btcsuite/btcd/database2"
	_ "github.com/btcsuite/btcd/database2/ffboltdb"
	"github.com/btcsuite/btcd/wire"
	"github.com/btcsuite/btcutil"
)

var (
	btcdHomeDir     = btcutil.AppDataDir("btcd", false)
	knownDbTypes    = database.SupportedDrivers()
	activeNetParams = &chaincfg.MainNetParams

	// Default global config.
	cfg = &config{
		DataDir: filepath.Join(btcdHomeDir, "data"),
		DbType:  "ffboltdb",
	}
)

// config defines the global configuration options.
type config struct {
	DataDir        string `short:"b" long:"datadir" description:"Location of the btcd data directory"`
	DbType         string `long:"dbtype" description:"Database backend to use for the Block Chain"`
	TestNet3       bool   `long:"testnet" description:"Use the test network"`
	RegressionTest bool   `long:"regtest" description:"Use the regression test network"`
	SimNet         bool   `long:"simnet" description:"Use the simulation test network"`
}

// fileExists reports whether the named file or directory exists.
func fileExists(name string) bool {
	if _, err := os.Stat(name); err != nil {
		if os.IsNotExist(err) {
			return false
		}
	}
	return true
}

// validDbType returns whether or not dbType is a supported database type.
func validDbType(dbType string) bool {
	for _, knownType := range knownDbTypes {
		if dbType == knownType {
			return true
		}
	}

	return false
}

// netName returns the name used when referring to a bitcoin network. At the
// time of writing, btcd currently places blocks for testnet version 3 in the
// data and log directory "testnet", which does not match the Name field of the
// chaincfg parameters. This function can be used to override this directory name
// as "testnet" when the passed active network matches wire.TestNet3.
//
// A proper upgrade to move the data and log directories for this network to
// "testnet3" is planned for the future, at which point this function can be
// removed and the network parameter's name used instead.
func netName(chainParams *chaincfg.Params) string {
	switch chainParams.Net {
	case wire.TestNet3:
		return "testnet"
	default:
		return chainParams.Name
	}
}

// setupGlobalConfig examines the global configuration options for any
// conditions which are invalid and performs any additional setup necessary
// after the initial parse.
func setupGlobalConfig() error {
	// Multiple networks can't be selected simultaneously.
	// Count the number of network flags passed and assign the active
	// network params while we're at it.
	numNets := 0
	if cfg.TestNet3 {
		numNets++
		activeNetParams = &chaincfg.TestNet3Params
	}
	if cfg.RegressionTest {
		numNets++
		activeNetParams = &chaincfg.RegressionNetParams
	}
	if cfg.SimNet {
		numNets++
		activeNetParams = &chaincfg.SimNetParams
	}
	if numNets > 1 {
		return errors.New("The testnet, regtest, and simnet params " +
			"can't be used together -- choose one of the three")
	}

	// Validate database type.
	if !validDbType(cfg.DbType) {
		str := "The specified database type [%v] is invalid -- " +
			"supported types %v"
		return fmt.Errorf(str, cfg.DbType, knownDbTypes)
	}

	// Append the network type to the data directory so it is "namespaced"
	// per network. In addition to the block database, there are other
	// pieces of data that are saved to disk such as address manager state.
	// All data is specific to a network, so namespacing the data directory
	// means each individual piece of serialized data does not have to
	// worry about changing names per network and such.
	cfg.DataDir = filepath.Join(cfg.DataDir, netName(activeNetParams))

	return nil
}
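
The network selection and directory namespacing in setupGlobalConfig can be
exercised in isolation with a small stand-alone sketch. It replaces the
chaincfg parameters with plain strings, so the `netFlags` type, the `netDir`
function, the `errMultipleNets` error, and the directory names used here are
assumptions for illustration rather than code from this commit. A
table-driven loop is used in place of the three separate `if` blocks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// errMultipleNets is returned when more than one network flag is set.
var errMultipleNets = errors.New("testnet, regtest, and simnet can't be used together")

// netFlags mirrors the three boolean network options of the config struct.
type netFlags struct {
	TestNet3, RegressionTest, SimNet bool
}

// netDir returns the directory name used to namespace data for the selected
// network. The names are assumed for this sketch; "testnet" reflects the
// override that netName applies for testnet version 3.
func netDir(f netFlags) (string, error) {
	choices := []struct {
		set bool
		dir string
	}{
		{f.TestNet3, "testnet"},
		{f.RegressionTest, "regtest"},
		{f.SimNet, "simnet"},
	}

	// Default to the main network when no flag is set.
	dir, selected := "mainnet", 0
	for _, c := range choices {
		if c.set {
			dir = c.dir
			selected++
		}
	}
	if selected > 1 {
		return "", errMultipleNets
	}
	return dir, nil
}

func main() {
	dir, err := netDir(netFlags{TestNet3: true})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The data directory is namespaced per network, as in setupGlobalConfig.
	fmt.Println(filepath.Join("data", dir))
}
```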