2022-09-22 存储引擎不同的索引结构对比

学习资料：

https://shashankbaravani.medium.com/database-storage-engines-under-the-hood-705418dc0e35

https://blog.csdn.net/qq_31807385/article/details/113662819

核心问题：

了解存储引擎使用的不同的索引结构的特点和场景

存储引擎分类日志结构存储引擎面向页面的存储引擎

1.Hash

基于Hash的索引

无法进行范围查询&规模小稠密索引需要很大的内存存放稠密索引

比如Redis

2.SSTables和LSM Tree

基于SSTable的索引

压缩和合并更快支持分块天生更适合分布式

更好的利用Most Recent原则，多级结构

可以整块加载到内存进行范围查询

更适合写数量级高于读的场景

写 O(1) 读 k*O(n) 块数*每块的大小

内存中啥memtable，需要WAL来恢复数据

SSTable的合并依赖于块的大小以及数据的age

HBase and Cassandra是经典的使用该索引的数据结构

3.B-Trees

基于B树的索引

B-Trees also maintain a sorted in memory map but the underlying file sizes are much smallerblocks of 4KB in line with underlying hardware instead of much larger segments of many MBs.

Thenon leaflevel nodes in the BST contain information around the range of of keys beneath each node. To arrive at a record, we start at therootnode and traverse downwards, breaking down the range at each step ad moving in the direction of the nodes containing the sub-range to which the key belongs.

Theleaflevel nodes either contain the data itself inline or contain references to page blocks which hold the record we are interested in. The number of references to child pages in one page of the B-tree is called thebranching factor(usually in 100s).

B树的写需要把整个Block加载到内存，然后整体更新；这导致了读与写放大

4.Inverted Index

倒排索引

In search indexes such as ElasticSearch, an index is composed of shards

and each shard is an further broken down into segments. Each segment is

an inverted index in itself and this is how it looks like: a collection

of document Ids and the term frequency.

2022-09-22 存储引擎不同的索引结构对比

推荐阅读更多精彩内容