Optimizing JPA Dynamic Paging Queries over Millions of Rows

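Preface

Paging queries are about as common as business operations get. When the data volume is small and indexes are used properly, ordinary dynamic queries rarely have performance problems. Once a table reaches millions or tens of millions of rows, however, conventional paging queries usually become slow. This article does not cover sharding (splitting databases and tables), caching, or similar optimizations: material on those is abundant and largely repetitive online, so it is not worth discussing here. Instead, it uses concrete examples to show how paging should be done, and how the code should be written, once the data reaches the millions.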

Solutions

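Index-based search: on top of the regular indexes, build a dedicated search index and combine it with the query parameters for fuzzy matching, so that the search results are confined to a small range.

Divide and conquer: split the million-row data set into several subsets with well-defined ranges, run the search on each subset, and then merge the results. This is the classic divide-and-conquer approach; it is simple to implement and can greatly improve query efficiency.

Hash table search: a hash index suits million-row tables that are searched frequently. Use the searched field as the key, build the hash table and cache all possible results in advance, which effectively reduces search time.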
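The last two strategies can be illustrated without a database. The sketch below is a hypothetical in-memory model: the Row record and its fields are our own and do not correspond to a real table. divideAndConquer splits the rows into fixed-size chunks, filters each chunk in parallel and merges the results; buildHashIndex builds a table keyed by the searched field once, so each later lookup is a hash get instead of a scan.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical in-memory model of two search strategies; Row is illustrative only.
public class SearchStrategies {

    record Row(int id, String lastName) {}

    // Divide and conquer: split the rows into fixed-size chunks, filter each
    // chunk in parallel, then merge the partial results (encounter order is kept).
    static List<Row> divideAndConquer(List<Row> rows, String lastName, int chunkSize) {
        int chunks = (rows.size() + chunkSize - 1) / chunkSize;
        return IntStream.range(0, chunks).parallel()
                .mapToObj(c -> rows.subList(c * chunkSize, Math.min(rows.size(), (c + 1) * chunkSize)))
                .flatMap(chunk -> chunk.stream().filter(r -> r.lastName().equals(lastName)))
                .collect(Collectors.toList());
    }

    // Hash search: build the table once, keyed by the searched field,
    // so each lookup is a constant-time get instead of a full scan.
    static Map<String, List<Row>> buildHashIndex(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(Row::lastName));
    }

    public static void main(String[] args) {
        List<Row> rows = IntStream.range(0, 1_000)
                .mapToObj(i -> new Row(i, i % 3 == 0 ? "Bosco" : "Smith"))
                .collect(Collectors.toList());
        System.out.println(divideAndConquer(rows, "Bosco", 128).size()); // 334
        System.out.println(buildHashIndex(rows).get("Bosco").size());    // 334
    }
}
```

The hash table trades memory and build time for fast lookups, so it only pays off when the same field is searched repeatedly.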

Optimizing regular paging queries

Spring Data JPA's PagingAndSortingRepository interface makes paging easy: we only need to extend this interface, or its sub-interface JpaRepository, to get paging support.

Let's start with a simple example: paging without any query parameters.

public interface AuthorsRepository extends JpaRepository<Authors, Integer> {
}

@Service
public class AuthorsQueryService {

    private final AuthorsRepository authorsRepository;

    public AuthorsQueryService(AuthorsRepository authorsRepository) {
        this.authorsRepository = authorsRepository;
    }

    public Page<Authors> queryPage(Integer pageNo, Integer pageSize) {
        return authorsRepository.findAll (PageRequest.of (pageNo, pageSize));
    }
}
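One detail worth knowing when reading the test below: PageRequest.of(page, size) uses zero-based page indexes, and the offset bound into the generated "limit ?, ?" is page * size. So queryPage(1, 10) reads the second page. The OffsetMath class is our own illustration of that arithmetic, not a Spring class.

```java
// Sketch of the arithmetic behind PageRequest.of(page, size): page indexes are
// zero-based and the offset bound to "limit ?, ?" is page * size.
// OffsetMath is an illustration only, not part of Spring Data.
public class OffsetMath {

    static long offset(int page, int size) {
        // PageRequest.of rejects the same invalid arguments
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        return (long) page * size;
    }

    public static void main(String[] args) {
        // queryPage(1, 10) therefore reads rows 11..20, the second page
        System.out.println("limit " + offset(1, 10) + ", " + 10); // limit 10, 10
    }
}
```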

The current test data set has more than 2.7 million rows. How long does this query take? Run the following code in a unit test:

        long t1 = System.currentTimeMillis ();
        Page<Authors> page = authorsQueryService.queryPage (1,10);
        long t2 = System.currentTimeMillis ();
        System.out.println ("page query cost time : " + (t2-t1));

Console output:

Hibernate: 
select authors0_.id as id1_0_, 
    authors0_.added as added2_0_, 
    authors0_.birthdate as birthdat3_0_, 
    authors0_.email as email4_0_, 
    authors0_.first_name as first_na5_0_, 
    authors0_.last_name as last_nam6_0_ 
from authors authors0_ limit ?, ?
Hibernate: 
select count(authors0_.id) as col_0_0_ from authors authors0_
page query cost time : 1205

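The whole call took about 1.2 s. That is already slow, and the time the user experiences is even longer once the transfer to the browser is added. On a commercial website, if a page stalls for more than a second, users are quite likely to close it.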

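This query can still be optimized, though. If we run the two SQL statements printed to the console in Navicat, we find that almost all the time goes into the second one, which counts the total number of rows. That count exists only to compute the total number of pages.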
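The page count itself is trivial arithmetic: the total row count divided by the page size, rounded up. The expensive part is obtaining that total, because the database has to count every matching row. The sketch below shows the arithmetic; TotalPages is our own illustrative class, and 2,700,000 is an assumed round figure for the "more than 2.7 million" rows.

```java
// The total number of pages is the row count divided by the page size,
// rounded up. Illustration only; 2,700,000 is an assumed round figure.
public class TotalPages {

    static long totalPages(long totalElements, int pageSize) {
        return (totalElements + pageSize - 1) / pageSize; // ceiling division, pageSize >= 1
    }

    public static void main(String[] args) {
        System.out.println(totalPages(2_700_000L, 10)); // 270000
    }
}
```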

So the first way to optimize paging queries is:

Avoid counting the total

This approach is ideal for scenarios that do not need to display the total number of pages.

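Spring Data JPA lets a repository method return a Slice, which skips the count during paging. We only need to add a method that returns Slice to the DAO layer: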

public interface AuthorsRepository extends JpaRepository<Authors, Integer> {
    Slice<Authors> findAllBy(Pageable pageable);
}

Add to the Service:

    public Slice<Authors> querySlice(Integer pageNo, Integer pageSize) {
        return authorsRepository.findAllBy (PageRequest.of (pageNo, pageSize));
    }

Add to the unit test:

        t2 = System.currentTimeMillis (); // t2 is already declared in the previous snippet
        Slice<Authors> slice = authorsQueryService.querySlice (1,10);
        long t3 = System.currentTimeMillis ();
        System.out.println ("slice query cost time : " + (t3-t2));

The console shows that Slice does skip the count query; this call took only 32 ms.

Hibernate: 
select authors0_.id as id1_0_, 
authors0_.added as added2_0_, 
authors0_.birthdate as birthdat3_0_, 
authors0_.email as email4_0_, 
authors0_.first_name as first_na5_0_, 
authors0_.last_name as last_nam6_0_ 
from authors authors0_ limit ?, ?
slice query cost time : 32

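The object actually returned here is a SliceImpl. It no longer provides the total count or the total number of pages, but we can call hasNext() to find out whether there is a next page.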

[Screenshot: the returned SliceImpl object]
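How can a Slice answer hasNext() without counting? A common technique is to ask the database for pageSize + 1 rows and check whether the extra row came back; to our understanding Spring Data JPA's sliced query execution works this way, but treat that as an assumption and check your version. The Slice record below is our own model, not org.springframework.data.domain.Slice.

```java
import java.util.List;

// Sketch: answer hasNext() without a count query by requesting pageSize + 1 rows.
// This Slice record is our own model, not Spring's Slice interface.
public class SliceSketch {

    record Slice<T>(List<T> content, boolean hasNext) {}

    // `fetched` stands for the rows returned by "limit offset, pageSize + 1"
    static <T> Slice<T> toSlice(List<T> fetched, int pageSize) {
        boolean hasNext = fetched.size() > pageSize;             // the extra row exists
        List<T> content = hasNext ? fetched.subList(0, pageSize) : fetched;
        return new Slice<>(List.copyOf(content), hasNext);       // drop the extra row
    }

    public static void main(String[] args) {
        Slice<Integer> s = toSlice(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), 10);
        System.out.println(s.content().size() + " rows, hasNext=" + s.hasNext()); // 10 rows, hasNext=true
    }
}
```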

This paging case is simple. What about dynamic queries with complex conditions?

Optimizing dynamic paging queries

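Simply put, a dynamic query uses a field as a query condition if the field has a value and ignores it otherwise. Spring Data JPA provides the JpaSpecificationExecutor interface for building this kind of SQL dynamically; our DAO interface only needs to extend it: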

public interface AuthorsRepository extends JpaRepository<Authors, Integer>, JpaSpecificationExecutor<Authors> {
    Slice<Authors> findAllBy(Pageable pageable);
}

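Add the following code to the Service. It is a very simple dynamic query: if firstName has a value, it performs a left-prefix LIKE match (value%); if lastName or email has a value, it performs an equality match.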

    public Slice<Authors> dynamicQuery(Authors authors, Integer pageNo, Integer pageSize) {
        return authorsRepository.findAll ((Specification<Authors>) (root, query, criteriaBuilder) -> {
            List<Predicate> list = new ArrayList<> ();
            if (authors.getFirstName () != null && !authors.getFirstName ().trim ().isEmpty ()) {
                list.add(criteriaBuilder
                        .like (root.get("firstName").as(String.class), authors.getFirstName ()+"%"));
            }
            if (authors.getLastName () != null && !authors.getLastName ().trim ().isEmpty ()) {
                list.add(criteriaBuilder
                        .equal(root.get("lastName").as(String.class), authors.getLastName ()));
            }
            if (authors.getEmail () != null && !authors.getEmail ().trim ().isEmpty ()) {
                list.add(criteriaBuilder
                        .equal(root.get("email").as(String.class), authors.getEmail ()));
            }
            Predicate[] p = new Predicate[list.size()];
            return criteriaBuilder.and(list.toArray(p));
        }, PageRequest.of (pageNo, pageSize));
    }
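The Specification above follows a simple rule: each non-blank field contributes one condition, and the conditions are ANDed together. The plain-Java model below reproduces that rule as a WHERE clause with bound parameters, so it can run without JPA. DynamicWhere, WhereClause and build(...) are hypothetical names for illustration; Hibernate generates its own SQL from the Criteria API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the Specification: each non-blank field adds one
// condition, and the conditions are ANDed. Names here are hypothetical.
public class DynamicWhere {

    record WhereClause(String sql, List<Object> params) {}

    static boolean hasText(String s) {
        return s != null && !s.trim().isEmpty();
    }

    static WhereClause build(String firstName, String lastName, String email) {
        List<String> conditions = new ArrayList<>();
        List<Object> params = new ArrayList<>();
        if (hasText(firstName)) {          // left-prefix match
            conditions.add("first_name like ?");
            params.add(firstName + "%");
        }
        if (hasText(lastName)) {
            conditions.add("last_name = ?");
            params.add(lastName);
        }
        if (hasText(email)) {
            conditions.add("email = ?");
            params.add(email);
        }
        // No conditions: no WHERE clause at all, i.e. every row matches
        String sql = conditions.isEmpty() ? "" : " where " + String.join(" and ", conditions);
        return new WhereClause(sql, params);
    }

    public static void main(String[] args) {
        System.out.println(build("A", "Bosco", "eve54@example.org").sql());
    }
}
```

A left-prefix pattern such as 'A%' can use a B-tree index on first_name if one exists, whereas a leading wildcard ('%A') cannot.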

Add test code to the unit test:

        Authors queryDto = new Authors ();
        queryDto.setFirstName ("A");
        queryDto.setLastName ("Bosco");
        queryDto.setEmail ("eve54@example.org");
        long t4 = System.currentTimeMillis ();
        Slice<Authors> authorsSlice = authorsQueryService.dynamicQuery (queryDto, 1, 10);
        long t5 = System.currentTimeMillis ();
        System.out.println ("dynamic query cost time :" + (t5-t4));

Observe the console output:

Hibernate: 
select authors0_.id as id1_0_, 
authors0_.added as added2_0_, 
authors0_.birthdate as birthdat3_0_, 
authors0_.email as email4_0_, 
authors0_.first_name as first_na5_0_, 
authors0_.last_name as last_nam6_0_ 
from authors authors0_ 
where (authors0_.first_name like ?) 
and authors0_.last_name=? and authors0_.email=? limit ?, ?
Hibernate: 
select count(authors0_.id) as col_0_0_ from authors authors0_ 
where (authors0_.first_name like ?) 
and authors0_.last_name=? and authors0_.email=?
dynamic query cost time :1025

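The call took about 1 s in total, but there is a more obvious problem here:
even though the method is declared to return Slice, the count query is still executed underneath.
Inspecting authorsSlice in the debugger shows that its actual type is PageImpl, not SliceImpl!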

[Screenshot: debugger showing authorsSlice as a PageImpl]

Going back to the source code, we can see that Page is a sub-interface of Slice, and the object that actually implements paging without a count is SliceImpl.

[Screenshot: source showing that Page extends Slice]

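So although our service method is declared to return Slice, the JpaSpecificationExecutor#findAll(Specification, Pageable) method it calls returns a Page, implemented by PageImpl, and the paging still counts the total.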
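Why doesn't the declared return type help? Because Page extends Slice, a method typed to return Slice may legally hand back a Page whose total was already computed; the declaration narrows what the caller sees, not what the callee does. The minimal model below uses our own nested interfaces and record, not Spring's types, to show the runtime object staying a PageImpl.

```java
import java.util.List;

// Minimal model (our own types, not Spring's): Page extends Slice, so a method
// declared to return Slice can still build and return a Page with a total.
public class SliceVsPage {

    interface Slice<T> { List<T> getContent(); }
    interface Page<T> extends Slice<T> { long getTotalElements(); }

    record PageImpl<T>(List<T> content, long total) implements Page<T> {
        public List<T> getContent() { return content; }
        public long getTotalElements() { return total; }
    }

    // Declared as Slice, but always builds a Page, so the total had to be counted
    static <T> Slice<T> findAll(List<T> content, long totalFromCountQuery) {
        return new PageImpl<>(content, totalFromCountQuery);
    }

    public static void main(String[] args) {
        Slice<String> slice = findAll(List.of("a", "b"), 42);
        System.out.println(slice.getClass().getSimpleName()); // PageImpl
        System.out.println(slice instanceof Page);             // true
    }
}
```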

Digging into the source, here is the implementation of JpaSpecificationExecutor#findAll (in SimpleJpaRepository):

[Screenshot: source of SimpleJpaRepository#findAll(Specification, Pageable)]

Because we passed paging parameters, execution enters the readPage method:

[Screenshot: source of readPage, with the count query highlighted in a red box]

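The red-boxed part shows that readPage hands a count query to the page it builds, so a full page always triggers the count. (Spring Data skips the count only when the total can be inferred from a partially filled page, which does not help for typical full pages.)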
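The decision about the total can be modeled in plain Java. The sketch below follows PageableExecutionUtils.getPage from Spring Data Commons 2.x as we understand it (an assumption; check the version you use, and note that unpaged requests are omitted). CountDecision and total(...) are our own names, and 2,700,000 is an assumed figure returned by the stand-in count query. The count supplier runs for every full page and is skipped only when a partially filled page makes the total obvious.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongSupplier;

// Model of how the total is decided (after PageableExecutionUtils.getPage,
// Spring Data Commons 2.x, as we understand it). CountDecision is our own class.
public class CountDecision {

    static long total(List<?> content, long offset, int pageSize, LongSupplier countQuery) {
        if (offset == 0) {
            // First page: the total is known only if the page is not full
            return pageSize > content.size() ? content.size() : countQuery.getAsLong();
        }
        if (!content.isEmpty() && pageSize > content.size()) {
            // A partially filled later page must be the last one
            return offset + content.size();
        }
        // Full page, or an empty page past the end: run the count query
        return countQuery.getAsLong();
    }

    public static void main(String[] args) {
        long[] counts = {0};
        LongSupplier countQuery = () -> { counts[0]++; return 2_700_000L; }; // assumed figure
        total(Collections.nCopies(10, "row"), 10, 10, countQuery); // full page: counts
        total(List.of("row", "row"), 20, 10, countQuery);          // partial last page: inferred
        System.out.println("count queries executed: " + counts[0]); // count queries executed: 1
    }
}
```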

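Although the count is built into the implementation, the method's access modifier is protected, as if the JPA authors were telling us: if you are not happy with this method, override it! So the core of optimizing dynamic paging is: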

Override readPage

The override is not complicated: drop executeCountQuery and assemble the PageImpl object ourselves.

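We define a static nested class SimpleJpaNoCountRepository that extends SimpleJpaRepository and overrides readPage with our own paging logic, and then expose a findAll method as the entry point. Because findAll is invoked on the subclass, the readPage it calls is the subclass's version, which avoids the count query.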
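The mechanism is ordinary dynamic dispatch, and it can be seen in plain Java first. In the sketch below, BaseRepository and NoCountRepository are hypothetical stand-ins for SimpleJpaRepository and SimpleJpaNoCountRepository: the public findAll in the base class calls a protected readPage, so overriding readPage changes findAll's behaviour without redefining findAll.

```java
import java.util.List;

// Plain-Java illustration of overriding a protected method called by an inherited
// public one. Both classes are hypothetical stand-ins for the Spring classes.
public class OverrideSketch {

    static class BaseRepository {
        int countQueries = 0;

        public List<String> findAll(List<String> rows) {
            return readPage(rows);                 // dispatched to the runtime class
        }

        protected List<String> readPage(List<String> rows) {
            countQueries++;                        // stands in for executeCountQuery(...)
            return rows;
        }
    }

    static class NoCountRepository extends BaseRepository {
        @Override
        protected List<String> readPage(List<String> rows) {
            return rows;                           // same rows, no count query
        }
    }

    public static void main(String[] args) {
        BaseRepository base = new BaseRepository();
        BaseRepository noCount = new NoCountRepository();
        base.findAll(List.of("a", "b"));
        noCount.findAll(List.of("a", "b"));
        System.out.println(base.countQueries + " vs " + noCount.countQueries); // 1 vs 0
    }
}
```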

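import java.io.Serializable;
import java.util.List;
// On Spring Boot 3 / Hibernate 6, use jakarta.persistence instead of javax.persistence
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.TypedQuery;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageImpl;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Slice;
import org.springframework.data.jpa.domain.Specification;
import org.springframework.data.jpa.repository.support.SimpleJpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public class CriteriaNoCountDao {

    @PersistenceContext
    protected EntityManager em;

    public <T, I extends Serializable> Slice<T> findAll(final Specification<T> spec, final Pageable pageable,
                                                        final Class<T> domainClass) {
        final SimpleJpaNoCountRepository<T, I> noCountDao = new SimpleJpaNoCountRepository<> (domainClass, em);
        // findAll is inherited from SimpleJpaRepository; for paged requests it calls our readPage below
        return noCountDao.findAll (spec, pageable);
    }

    /**
     * Custom repository type that disables the count query.
     */
    public static class SimpleJpaNoCountRepository<T, ID extends Serializable> extends SimpleJpaRepository<T, ID> {

        public SimpleJpaNoCountRepository(Class<T> domainClass, EntityManager em) {
            super (domainClass, em);
        }

        @Override
        protected <S extends T> Page<S> readPage(TypedQuery<S> query, Class<S> domainClass, Pageable pageable, Specification<S> spec) {
            final int pageSize = pageable.getPageSize ();
            query.setFirstResult ((int) pageable.getOffset ());
            // Fetch one extra row: if it comes back, there is a next page (no count query needed)
            query.setMaxResults (pageSize + 1);
            final List<S> rows = query.getResultList ();
            final boolean hasNext = rows.size () > pageSize;
            final List<S> content = hasNext ? rows.subList (0, pageSize) : rows;
            // PageImpl derives hasNext() from the total, so report one row past this page when more exist.
            // getTotalElements()/getTotalPages() are NOT real totals here; callers should rely on hasNext() only.
            final long total = pageable.getOffset () + content.size () + (hasNext ? 1 : 0);
            return new PageImpl<> (content, pageable, total);
        }
    }
}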
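With a count-free readPage, the only thing the returned PageImpl knows about the total is the number passed to its constructor, and PageImpl derives hasNext() from that number. The sketch below models that derivation as we understand Spring Data Commons 2.x (an assumption; PageImplMath is our own class). It shows that passing just content.size() as the total makes hasNext() false for every full page, while reporting one row past a full page, when an extra row was fetched, makes it true.

```java
// Model of how PageImpl turns the total it is given into hasNext(), following
// Spring Data Commons 2.x as we understand it. PageImplMath is our own class.
public class PageImplMath {

    // PageImpl corrects a total that is smaller than offset + pageSize
    static long adjustedTotal(int contentSize, long offset, int pageSize, long total) {
        if (contentSize > 0 && offset + pageSize > total) {
            return offset + contentSize;
        }
        return total;
    }

    // hasNext() is "current page number + 1 < total pages"
    static boolean hasNext(int pageNumber, int pageSize, long total) {
        long totalPages = (total + pageSize - 1) / pageSize;
        return pageNumber + 1 < totalPages;
    }

    public static void main(String[] args) {
        int page = 1, size = 10;
        long offset = (long) page * size;
        // Full page, total passed as content.size(): hasNext() comes out false
        long naive = adjustedTotal(10, offset, size, 10);
        System.out.println(hasNext(page, size, naive));     // false
        // Full page plus one extra fetched row: report offset + size + 1
        long withExtra = adjustedTotal(10, offset, size, offset + size + 1);
        System.out.println(hasNext(page, size, withExtra)); // true
    }
}
```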

Add the call in the Service:

    // noCountPagingRepository is the injected CriteriaNoCountDao defined above
    public Slice<Authors> noPagingDynamicQuery(Authors authors, Integer pageNo, Integer pageSize) {
        return noCountPagingRepository.findAll ((Specification<Authors>) (root, query, criteriaBuilder) -> {
            List<Predicate> list = new ArrayList<> ();
            if (authors.getFirstName () != null && !authors.getFirstName ().trim ().isEmpty ()) {
                list.add(criteriaBuilder
                        .like (root.get("firstName").as(String.class), authors.getFirstName ()+"%"));
            }
            if (authors.getLastName () != null && !authors.getLastName ().trim ().isEmpty ()) {
                list.add(criteriaBuilder
                        .equal(root.get("lastName").as(String.class), authors.getLastName ()));
            }
            if (authors.getEmail () != null && !authors.getEmail ().trim ().isEmpty ()) {
                list.add(criteriaBuilder
                        .equal(root.get("email").as(String.class), authors.getEmail ()));
            }
            Predicate[] p = new Predicate[list.size()];
            return criteriaBuilder.and(list.toArray(p));
        }, PageRequest.of (pageNo, pageSize), Authors.class);
    }

Unit test and console output:

        t5 = System.currentTimeMillis (); // t5 is already declared in the previous snippet
        Slice<Authors> noCountSlice = authorsQueryService.noPagingDynamicQuery (queryDto, 1, 10);
        long t6 = System.currentTimeMillis ();
        System.out.println ("no paging dynamic query cost time :" + (t6-t5));

Hibernate: 
select authors0_.id as id1_0_, 
authors0_.added as added2_0_, 
authors0_.birthdate as birthdat3_0_, 
authors0_.email as email4_0_, 
authors0_.first_name as first_na5_0_, 
authors0_.last_name as last_nam6_0_ 
from authors authors0_ 
where (authors0_.first_name like ?) 
and authors0_.last_name=? and authors0_.email=? limit ?, ?
no paging dynamic query cost time :148

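The output makes it clear that our override of the underlying code took effect: the count query is gone and the call dropped from about 1 s to 148 ms. This override solves the problem of Slice paging in dynamic queries always running the count query.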
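A caller that only has hasNext() typically walks the result page by page until it returns false. The sketch below shows that loop. fetchPage is a hypothetical stand-in for a call such as authorsQueryService.noPagingDynamicQuery(queryDto, page, size), and the Slice record is our own model rather than Spring's Slice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch: consume a count-free result page by page until hasNext() is false.
// fetchPage and this Slice record are hypothetical stand-ins.
public class SliceWalker {

    record Slice<T>(List<T> content, boolean hasNext) {}

    static <T> List<T> readAll(BiFunction<Integer, Integer, Slice<T>> fetchPage, int pageSize) {
        List<T> all = new ArrayList<>();
        int page = 0;                               // Spring Data pages are zero-based
        Slice<T> slice;
        do {
            slice = fetchPage.apply(page++, pageSize);
            all.addAll(slice.content());
        } while (slice.hasNext());
        return all;
    }

    public static void main(String[] args) {
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 25; i++) table.add(i);
        List<Integer> rows = SliceWalker.<Integer>readAll((page, size) -> {
            int from = Math.min(page * size, table.size());
            int to = Math.min(from + size, table.size());
            return new Slice<>(table.subList(from, to), to < table.size());
        }, 10);
        System.out.println(rows.size()); // 25
    }
}
```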

Last edited: 2023.02.14 09:03:40
