Hyperloglog介绍

极小空间完成独立数量统计。本质是个string。千万级别的存储只会消耗极少的内存(几Mb)，但是错误率比较高(0.81%)

命令介绍

命令与用法	说明
pfadd key element [element ...]	向hyperloglog中添加元素
pfcount key	统计hyperloglog中不重复元素的数量
pfmerge targetkey key [otherKey]	合并多个hyperloglog

127.0.0.1:6379> pfadd hyper test1 test2 test3
(integer) 1
127.0.0.1:6379> pfadd hyper test1 test4 test3
(integer) 1
127.0.0.1:6379> pfadd hyper test1 test4 test5 test6
(integer) 1
127.0.0.1:6379> pfcount hyper
(integer) 6
127.0.0.1:6379> pfadd test test1 test2 test3 test7
(integer) 1
127.0.0.1:6379> pfmerge testhyper test hyper
OK
127.0.0.1:6379> pfcount testhyper
(integer) 7
127.0.0.1:6379> pfcount test
(integer) 4
127.0.0.1:6379> pfcount hyper
(integer) 6
127.0.0.1:6379>

Hyperloglog原理

一个Hyperlolog固定需要12K的内存空间。
最直白的解释是，给定一个集合 S，对集合中的每一个元素，我们做一个哈希，假设生成一个 16 位的比特串，从所有生成的比特串中挑选出前面连续 0 次数最多的比特串，假设为 0000000011010110，连续 0 的次数为 8，因此我们可以估计该集合 S 的基数为 2^9。当然单独用这样的单一估计偶然性较大，导致误差较大，因此在实际的 HyperLogLog 算法中，采取分桶平均原理了来消除误差。（这段话引用了 HyperLogLog 原理中的描述，还有一些细节实现感兴趣可阅读 https://blockchain.iethpay.com/hyperloglog-theory.html）

特点：实现牺牲了一定的准确度（在一些场景下是可以忽略的），但却实现了空间复杂度上的极大的压缩，可以说是性价比很高的。

虽然基数不完全准确，但是可以符合，随着数量的递增，基数也是递增的。

统计每日的UV

只需要新建一个hyperlolog，比如pfadd user_20190815 {userid}
如果要统计某一周的，则只需要通过pfmerge命令。

redis hyperloglog实现统计UV

redis hyperloglog实现统计UV

Hyperloglog介绍

命令介绍

Hyperloglog原理

统计每日的UV