filterblock用于确定某个key是否存在于某个datablock中,在插入一个key到datablock中时也会插入一个key到filterblock中,filterblock中会记录所有的key,并通过布隆过滤器来确定一个key是否存于这个datablock中。
下面来看下filterblock的代码:
class FilterBlockBuilder {public:explicit FilterBlockBuilder(const FilterPolicy*);FilterBlockBuilder(const FilterBlockBuilder&) = delete;FilterBlockBuilder& operator=(const FilterBlockBuilder&) = delete;void StartBlock(uint64_t block_offset);void AddKey(const Slice& key);Slice Finish();private:void GenerateFilter();const FilterPolicy* policy_;std::string keys_; // Flattened key contentsstd::vector<size_t> start_; // Starting index in keys_ of each keystd::string result_; // Filter data computed so farstd::vector<Slice> tmp_keys_; // policy_->CreateFilter() argumentstd::vector<uint32_t> filter_offsets_;
};class FilterBlockReader {public:// REQUIRES: "contents" and *policy must stay live while *this is live.FilterBlockReader(const FilterPolicy* policy, const Slice& contents);bool KeyMayMatch(uint64_t block_offset, const Slice& key);private:const FilterPolicy* policy_;const char* data_; // Pointer to filter data (at block-start)const char* offset_; // Pointer to beginning of offset array (at block-end)size_t num_; // Number of entries in offset arraysize_t base_lg_; // Encoding parameter (see kFilterBaseLg in .cc file)
};
FilterBlockBuilder类中,主要有四个方法,分别是StartBlock,AddKey,Finish和GenerateFilter,
AddKey
首先看AddKey,代码如下:
void FilterBlockBuilder::AddKey(const Slice& key) {Slice k = key;start_.push_back(keys_.size());keys_.append(k.data(), k.size());
}
每添加一个key value到databloc,就同步存一个key到filterblock,keys_保存每个key字符串,start_数组每次保存keys_字符串的长度,也就是每个key相对于起始keys_字符串的起始位置的偏移量。
GenerateFilter
可以生成一个filterblock的数据部分,其原理是,当存入datablock的数据已经达到指定大小时,生成一个filterblock的数据块,这个数据块存储的是所有key的hash值,因此可以通过这个filterblock块来确定一个key是否在这个block中。
void FilterBlockBuilder::GenerateFilter() {const size_t num_keys = start_.size();if (num_keys == 0) {// Fast path if there are no keys for this filter//GenerateFilter调用一次会将当前keys存储的所有key进行一次filter计算,并将结果存储到result_中,当连续调用第二次时,fiter_offsets_记录的偏移量还是一次result_的大小,并且不会再重新计算result_,//但是fiter_offsets_数组的元素个数与block_offset/2kb是一致的,即可以通过block_offset/2kb去查询这个block对应的filter的位置。filter_offsets_.push_back(result_.size());return;}// Make list of keys from flattened key structure//这次的push_back是为了便于计算,因为start_[i+1]在i=num_keys-1时会越界。start_.push_back(keys_.size()); // Simplify length computationtmp_keys_.resize(num_keys);//获取所有的keyfor (size_t i = 0; i < num_keys; i++) {const char* base = keys_.data() + start_[i];size_t length = start_[i + 1] - start_[i];tmp_keys_[i] = Slice(base, length);}// Generate filter for current set of keys and append to result_.//将上次生成的filter的大小存入result_,这就是生成的当前这个filter的起始偏移点。filter_offsets_.push_back(result_.size());policy_->CreateFilter(&tmp_keys_[0], static_cast<int>(num_keys), &result_);tmp_keys_.clear();keys_.clear();start_.clear();
}
StartBlock
根据datablock的大小,如果超过2kb的大小,就会生成一个filterblock,这个filterblock会存储当前这个block的所有key的hash值,然后block_offset/kFilterBase就是这个datablock在filterblock中的索引,只不过有可能出现几个filterblock索引指向的是相同的filterblock数据区。
所以查某个datablock中是否有一个key的流程大致是:
该datablock的大小/kFilterBase得到一个索引index,在filterblock中查找这个index所指向的filterblock的数据区,在filterblock的数据区中查找是否有这个key。
void FilterBlockBuilder::StartBlock(uint64_t block_offset) {uint64_t filter_index = (block_offset / kFilterBase);assert(filter_index >= filter_offsets_.size());while (filter_index > filter_offsets_.size()) {GenerateFilter();}
}
Finish
//filterblock生成完成,添加元数据到最后
Slice FilterBlockBuilder::Finish() {//将未生成filter的keys生成一次filterif (!start_.empty()) {GenerateFilter();}// Append array of per-filter offsetsconst uint32_t array_offset = result_.size();for (size_t i = 0; i < filter_offsets_.size(); i++) {PutFixed32(&result_, filter_offsets_[i]);}PutFixed32(&result_, array_offset);result_.push_back(kFilterBaseLg); // Save encoding parameter in resultreturn Slice(result_);
}