Appearance
btcd数据库设计
块的存储
由于区块链块信息的不变性,btcd将整个比特币的区块想象成一个超级大的平坦文件,块与块相邻存放. 由于文件系统的限制,将这个超级大文件拆分成n多文件块,每块都是512M. 假定我知道我要读的第3000block的起始是1000000000000,大小是3M,那么很简单的算法. 一个文件块是512M字节=51210241024=536870912字节, 因此,我位于第1862(1000000000000/536870912)文件块,文件块内偏移是346361856(1000000000000%536870912)字节, 那么我就读取第1862文件块中的346361856到349507584即可.
当然具体实现还要考虑到缓存以及block跨越文件块的问题
接口
btcd关于数据的设计完全是面向接口的,最终暴露出来的是ffldb/interface.go中的几个接口, 主要是DB,Cursor,Bucket,Tx 全部都是接口. 真正的实现位于ffldb/db.go中
- Bucket是真正存数据的地方,并且提供了直接读写KV接口
- 一组读写Bucket使用Tx进行管理,
- Cursor是用来遍历Bucket
go
// db represents a collection of namespaces which are persisted and implements
// the database.DB interface. All database access is performed through
// transactions which are obtained through the specific Namespace.
type db struct {
writeLock sync.Mutex // Limit to one write transaction at a time.
closeLock sync.RWMutex // Make database close block while txns active.
closed bool // Is the database closed?
store *blockStore // Handles read/writing blocks to flat files.
cache *dbCache // Cache layer which wraps underlying leveldb DB.
}
// transaction represents a database transaction. It can either be read-only or
// read-write and implements the database.Bucket interface. The transaction
// provides a root bucket against which all read and writes occur.
type transaction struct {
managed bool // Is the transaction managed?
closed bool // Is the transaction closed?
writable bool // Is the transaction writable?
db *db // DB instance the tx was created from.
snapshot *dbCacheSnapshot // Underlying snapshot for txns.
metaBucket *bucket // The root metadata bucket.
blockIdxBucket *bucket // The block index bucket.
// Blocks that need to be stored on commit. The pendingBlocks map is
// kept to allow quick lookups of pending data by block hash.
pendingBlocks map[chainhash.Hash]int
pendingBlockData []pendingBlock
// Keys that need to be stored or deleted on commit.
pendingKeys *treap.Mutable
pendingRemove *treap.Mutable
// Active iterators that need to be notified when the pending keys have
// been updated so the cursors can properly handle updates to the
// transaction state.
activeIterLock sync.RWMutex
activeIters []*treap.Iterator
}
// bucket is an internal type used to represent a collection of key/value pairs
// and implements the database.Bucket interface.
type bucket struct {
tx *transaction
id [4]byte
}
// cursor is an internal type used to represent a cursor over key/value pairs
// and nested buckets of a bucket and implements the database.Cursor interface.
type cursor struct {
bucket *bucket
dbIter iterator.Iterator
pendingIter iterator.Iterator
currentIter iterator.Iterator
}
这里面实现机制比较复杂,基本思路是用leveldb来存block的元数据信息,真正的block是按照文件进行存储的. 这样的设计好处是规避了leveldb写大量数据时的低性能问题.
同时还有缓存管理,尤其是transaction中使用的treap模块 很有意思,需要专门花时间研究, treap这个数据结构可以做到检索,插入,删除都是O(logn), 是一个比较巧妙的设计.