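An example of rewriting a complex scalar subquery as a left join. In the CASE WHEN part, the subquery correlates several tables with the outer table and also aggregates. The scalar-subquery SQL is shown below; it takes 600 seconds to run: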
SELECT
AAZ661 ZFLSH,
'1' ZFXH,
AAE002 TIMESTAMP,
AAE924 JSXM,
SUM(
CASE
WHEN (
SELECT COUNT(DISTINCT B.BAA526)
FROM t1 A, t2 B
WHERE A.AAA345 = B.AAA345
AND A.AAZ661 = Q.AAZ661
AND (B.BAA531 = Q.LTXBZ OR B.BAA531 = '9' OR Q.LTXBZ = '9')
AND B.BAA526 IN ('3', '4')
) = 2
THEN
CASE
WHEN BAA526 = '4' THEN AAE774
WHEN AAE924 NOT IN ('7004', '7008', '7009', '7010', '7011', '7029') AND BAA526 = '5' THEN AAE019 - ZZZHBF
ELSE AAE019
END
ELSE AAE019
END
) JE
FROM Q
GROUP BY AAZ661, AAE002, AAE924;
First rewrite:
SELECT
Q.AAZ661 ZFLSH,
'1' ZFXH,
Q.AAE002 TIMESTAMP,
Q.AAE924 JSXM,
SUM(
CASE
WHEN V.COUNT_BAA526 = 2
THEN
CASE
WHEN Q.BAA526 = '4' THEN Q.AAE774
WHEN Q.AAE924 NOT IN ('7004', '7008', '7009', '7010', '7011', '7029') AND Q.BAA526 = '5' THEN Q.AAE019 - Q.ZZZHBF
ELSE Q.AAE019
END
ELSE Q.AAE019
END
) JE
FROM Q
LEFT JOIN (
SELECT
A.AAZ661, B.BAA531,
COUNT(DISTINCT B.BAA526) AS COUNT_BAA526
FROM t1 A
JOIN t2 B ON A.AAA345 = B.AAA345
WHERE B.BAA526 IN ('3', '4')
GROUP BY A.AAZ661,B.BAA531
) V ON Q.AAZ661 = V.AAZ661
AND (Q.LTXBZ = V.BAA531 OR V.BAA531 = '9' OR Q.LTXBZ = '9')
GROUP BY Q.AAZ661, Q.AAE002, Q.AAE924;
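After the rewrite it takes only 50 seconds, but the result is wrong and needs to be fixed. (ChatGPT helped with the fix. The way of asking is worth mentioning: at first I asked it how the original SQL should be rewritten, and it kept producing obviously wrong SQL. Later I gave it an already rewritten SQL and asked whether the rewrite was correct, and then it came up with the answer below):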
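- The GROUP BY in the subquery only needs to group by A.AAZ661, because we are computing COUNT(DISTINCT B.BAA526) and taking the maximum of BAA531.
- Use MAX(B.BAA531) in the subquery to get the maximum BAA531, so that the condition in the LEFT JOIN can be evaluated correctly.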
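Honestly, I don't quite understand why MAX(B.BAA531) is needed, even though its result is correct. In principle, the original SQL takes Q.AAZ661 and Q.LTXBZ from each row of Q, plugs them into the subquery, and computes COUNT(DISTINCT B.BAA526) over the records that satisfy the conditions. So after converting to a left join, the subquery should group by AAZ661 and BAA531, which is exactly the logic of the first rewrite, yet its result is wrong. I hope someone more experienced can point me in the right direction; I would be very grateful.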
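One way to see where the first rewrite can diverge is to run both forms on a tiny made-up data set. The sketch below uses Python's sqlite3 with hypothetical rows and a simplified inner CASE (the AAE924 branch and the AAE002/AAE924 grouping columns are dropped), so it only illustrates the mechanism, not the real data. Once V is grouped by (AAZ661, BAA531), two things appear to go wrong: the distinct count is split per BAA531 value, so a key that has '3' under one BAA531 and '4' under another never reaches 2; and a single Q row can match several V rows through the OR condition, so its amount is summed more than once.

```python
import sqlite3

# Hypothetical toy data: one Q row with LTXBZ = '1', and two t2 rows reachable
# through t1, one with BAA531 = '1' and one with BAA531 = '9'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Q  (AAZ661, LTXBZ, BAA526, AAE019, AAE774);
CREATE TABLE t1 (AAZ661, AAA345);
CREATE TABLE t2 (AAA345, BAA531, BAA526);
INSERT INTO Q  VALUES ('k1', '1', '4', 100, 40);
INSERT INTO t1 VALUES ('k1', 'a1');
INSERT INTO t2 VALUES ('a1', '1', '3'), ('a1', '9', '4');
""")

# Original form: the correlated scalar subquery sees both t2 rows at once,
# so COUNT(DISTINCT B.BAA526) is 2 for this Q row.
scalar_sql = """
SELECT Q.AAZ661,
       SUM(CASE WHEN (SELECT COUNT(DISTINCT B.BAA526)
                      FROM t1 A, t2 B
                      WHERE A.AAA345 = B.AAA345
                        AND A.AAZ661 = Q.AAZ661
                        AND (B.BAA531 = Q.LTXBZ OR B.BAA531 = '9' OR Q.LTXBZ = '9')
                        AND B.BAA526 IN ('3', '4')) = 2
                THEN CASE WHEN Q.BAA526 = '4' THEN Q.AAE774 ELSE Q.AAE019 END
                ELSE Q.AAE019 END) AS JE
FROM Q
GROUP BY Q.AAZ661
"""

# First rewrite: V has one row per (AAZ661, BAA531), each with a count of 1,
# and the Q row joins to both of them.
join_sql = """
SELECT Q.AAZ661,
       SUM(CASE WHEN V.COUNT_BAA526 = 2
                THEN CASE WHEN Q.BAA526 = '4' THEN Q.AAE774 ELSE Q.AAE019 END
                ELSE Q.AAE019 END) AS JE
FROM Q
LEFT JOIN (SELECT A.AAZ661, B.BAA531,
                  COUNT(DISTINCT B.BAA526) AS COUNT_BAA526
           FROM t1 A JOIN t2 B ON A.AAA345 = B.AAA345
           WHERE B.BAA526 IN ('3', '4')
           GROUP BY A.AAZ661, B.BAA531) V
  ON Q.AAZ661 = V.AAZ661
 AND (Q.LTXBZ = V.BAA531 OR V.BAA531 = '9' OR Q.LTXBZ = '9')
GROUP BY Q.AAZ661
"""

scalar_result = conn.execute(scalar_sql).fetchall()
join_result = conn.execute(join_sql).fetchall()
print("scalar subquery:", scalar_result)
print("first rewrite:  ", join_result)
```

With these rows the scalar-subquery form gives 40 (the count is 2, so AAE774 is taken), while the first rewrite gives 200 (two joined rows, each with a count of 1, so AAE019 is added twice).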
SELECT
Q.AAZ661 AS ZFLSH,
'1' AS ZFXH,
Q.AAE002 AS TIMESTAMP,
Q.AAE924 AS JSXM,
SUM(
CASE
WHEN V.COUNT_BAA526 = 2
THEN
CASE
WHEN Q.BAA526 = '4' THEN Q.AAE774
WHEN Q.AAE924 NOT IN ('7004', '7008', '7009', '7010', '7011', '7029') AND Q.BAA526 = '5' THEN Q.AAE019 - Q.ZZZHBF
ELSE Q.AAE019
END
ELSE Q.AAE019
END
) AS JE
FROM Q
LEFT JOIN (
SELECT
A.AAZ661,
COUNT(DISTINCT B.BAA526) AS COUNT_BAA526,
MAX(B.BAA531) AS BAA531
FROM t1 A
JOIN t2 B ON A.AAA345 = B.AAA345
WHERE B.BAA526 IN ('3', '4')
GROUP BY A.AAZ661
) V ON Q.AAZ661 = V.AAZ661
AND (V.BAA531 = Q.LTXBZ OR V.BAA531 = '9' OR Q.LTXBZ = '9')
GROUP BY Q.AAZ661, Q.AAE002, Q.AAE924;
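On the open question about MAX(B.BAA531): the version above matched on the author's data, but it does not look equivalent to the original in general. COUNT_BAA526 is computed over every BAA531 value of a key, while only the single largest BAA531 is compared with Q.LTXBZ. The sqlite3 sketch below shows this on hypothetical rows with a simplified CASE. It also tries one possible alternative, which is an assumption and not part of the original post: join the distinct (AAZ661, LTXBZ) pairs of Q into the derived table and group by them, so the count is taken over exactly the rows the scalar subquery would see. The alternative assumes LTXBZ is never NULL, and its performance on the real tables is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Q  (AAZ661, LTXBZ, BAA526, AAE019, AAE774);
CREATE TABLE t1 (AAZ661, AAA345);
CREATE TABLE t2 (AAA345, BAA531, BAA526);
-- Hypothetical rows: Q.LTXBZ is '2'; only the second t2 row has BAA531 = '2'.
INSERT INTO Q  VALUES ('k1', '2', '4', 100, 40);
INSERT INTO t1 VALUES ('k1', 'a1');
INSERT INTO t2 VALUES ('a1', '1', '3'), ('a1', '2', '4');
""")

def je(count_expr):
    # Simplified JE expression shared by the three queries below.
    return f"""SUM(CASE WHEN {count_expr} = 2
                THEN CASE WHEN Q.BAA526 = '4' THEN Q.AAE774 ELSE Q.AAE019 END
                ELSE Q.AAE019 END)"""

# Original correlated scalar subquery.
scalar_count = """(SELECT COUNT(DISTINCT B.BAA526)
                   FROM t1 A, t2 B
                   WHERE A.AAA345 = B.AAA345
                     AND A.AAZ661 = Q.AAZ661
                     AND (B.BAA531 = Q.LTXBZ OR B.BAA531 = '9' OR Q.LTXBZ = '9')
                     AND B.BAA526 IN ('3', '4'))"""
original_sql = f"SELECT Q.AAZ661, {je(scalar_count)} FROM Q GROUP BY Q.AAZ661"

# MAX(BAA531) version from the post.
max_sql = f"""
SELECT Q.AAZ661, {je('V.COUNT_BAA526')}
FROM Q
LEFT JOIN (SELECT A.AAZ661,
                  COUNT(DISTINCT B.BAA526) AS COUNT_BAA526,
                  MAX(B.BAA531) AS BAA531
           FROM t1 A JOIN t2 B ON A.AAA345 = B.AAA345
           WHERE B.BAA526 IN ('3', '4')
           GROUP BY A.AAZ661) V
  ON Q.AAZ661 = V.AAZ661
 AND (V.BAA531 = Q.LTXBZ OR V.BAA531 = '9' OR Q.LTXBZ = '9')
GROUP BY Q.AAZ661
"""

# Hypothetical alternative: group by the correlation keys taken from Q itself.
keyed_sql = f"""
SELECT Q.AAZ661, {je('V.COUNT_BAA526')}
FROM Q
LEFT JOIN (SELECT K.AAZ661, K.LTXBZ,
                  COUNT(DISTINCT B.BAA526) AS COUNT_BAA526
           FROM (SELECT DISTINCT AAZ661, LTXBZ FROM Q) K
           JOIN t1 A ON A.AAZ661 = K.AAZ661
           JOIN t2 B ON A.AAA345 = B.AAA345
           WHERE B.BAA526 IN ('3', '4')
             AND (B.BAA531 = K.LTXBZ OR B.BAA531 = '9' OR K.LTXBZ = '9')
           GROUP BY K.AAZ661, K.LTXBZ) V
  ON Q.AAZ661 = V.AAZ661 AND Q.LTXBZ = V.LTXBZ
GROUP BY Q.AAZ661
"""

original_result = conn.execute(original_sql).fetchall()
max_result = conn.execute(max_sql).fetchall()
keyed_result = conn.execute(keyed_sql).fetchall()
print("original:", original_result)
print("MAX:     ", max_result)
print("keyed:   ", keyed_result)
```

On these rows the original and the keyed variant both give 100, while the MAX version gives 40: it counts both t2 rows even though only the one with BAA531 = '2' satisfies the original condition. Because V has at most one row per (AAZ661, LTXBZ) in the keyed variant, the join cannot multiply Q rows. Whether it keeps the speed-up of the join form would need to be checked against the real execution plan.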