学习资料:
https://shashankbaravani.medium.com/database-storage-engines-under-the-hood-705418dc0e35
https://blog.csdn.net/qq_31807385/article/details/113662819
核心问题:
了解存储引擎使用的不同的索引结构的特点和场景
存储引擎分类 日志结构存储引擎 面向页面的存储引擎
1.Hash
无法进行范围查询&规模小 稠密索引 需要很大的内存存放稠密索引
比如Redis
2.SSTables和LSM Tree
压缩和合并更快 支持分块 天生更适合分布式
更好的利用Most Recent原则,多级结构
可以整块加载到内存进行范围查询
更适合写数量级高于读的场景
写 O(1) 读 k*O(n) 块数*每块的大小
内存中啥memtable,需要WAL来恢复数据
SSTable的合并依赖于块的大小以及数据的age
HBase and Cassandra是经典的使用该索引的数据结构
3.B-Trees
B-Trees also maintain a sorted in memory map but the underlying file sizes are much smallerblocks of 4KB in line with underlying hardware instead of much larger segments of many MBs.
Thenon leaflevel nodes in the BST contain information around the range of of keys beneath each node. To arrive at a record, we start at therootnode and traverse downwards, breaking down the range at each step ad moving in the direction of the nodes containing the sub-range to which the key belongs.
Theleaflevel nodes either contain the data itself inline or contain references to page blocks which hold the record we are interested in. The number of references to child pages in one page of the B-tree is called thebranching factor(usually in 100s).
B树的写需要把整个Block加载到内存,然后整体更新;这导致了读与写放大
4.Inverted Index
In search indexes such as ElasticSearch, an index is composed of shards
and each shard is an further broken down into segments. Each segment is
an inverted index in itself and this is how it looks like: a collection
of document Ids and the term frequency.