Hive SQL: the difference between putting a condition in ON and in WHERE

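Principle:

In a left join, a condition in ON is part of the match condition: it only decides which rows of the right table t2 are paired with each row of the left table t1, and a t1 row with no match is still returned with NULLs in the t2 columns. A condition in WHERE is applied to the result of the join, so it can remove rows, including the NULL-padded ones. Hive may push a WHERE condition down and filter a table before joining as an optimization, but the result is the same as filtering after the join.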

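Conclusion:

With the condition in ON, every row of the main (left) table is kept, and the secondary (right) table only contributes rows that satisfy the condition.
With the condition in WHERE, the join runs first and the condition then filters the joined rows, so rows of the main table can be dropped.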
To run the test, just replace the tmp database with one of your own.

-- Difference between putting a condition in ON and in WHERE:
use tmp;
drop table if exists tmp.yl_test_1;
drop table if exists tmp.yl_test_2;
create table tmp.yl_test_1(id int,name string,birthday string);
create table tmp.yl_test_2(id int,age int,birthday string);
insert into tmp.yl_test_1 values(1,'aa','2023-12-01');
insert into tmp.yl_test_1 values(2,'bb','2023-12-12');
insert into tmp.yl_test_1 values(3,'cc','2023-12-30');
insert into tmp.yl_test_2 values(1,40,'2023-12-04');
insert into tmp.yl_test_2 values(2,50,'2023-12-10');
insert into tmp.yl_test_2 values(2,50,'2023-12-20');

select * from tmp.yl_test_1 t1;
id  name    birthday
1   aa  2023-12-01
2   bb  2023-12-12
3   cc  2023-12-30

select * from tmp.yl_test_2 t2;
id  age birthday
1   40  2023-12-04
2   50  2023-12-10
2   50  2023-12-20

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id;
id  name    birthday    id2 age birthday2
2   bb  2023-12-12  2   50  2023-12-10
2   bb  2023-12-12  2   50  2023-12-20
1   aa  2023-12-01  1   40  2023-12-04
3   cc  2023-12-30  \N  \N  \N

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id and t1.id = 3;
id  name    birthday    id2 age birthday2
1   aa  2023-12-01  \N  \N  \N
2   bb  2023-12-12  \N  \N  \N
3   cc  2023-12-30  \N  \N  \N

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id where t1.id = 3;
id  name    birthday    id2 age birthday2
3   cc  2023-12-30  \N  \N  \N

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id and t2.id = 3;
id  name    birthday    id2 age birthday2
1   aa  2023-12-01  \N  \N  \N
2   bb  2023-12-12  \N  \N  \N
3   cc  2023-12-30  \N  \N  \N
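
A condition in ON that only touches the right table behaves like filtering that table before the join. As a rough sketch (the subquery form is an illustration, not part of the original test), the query above can be written as:

```sql
-- Pre-filter the right table in a subquery, then left join.
-- No row of yl_test_2 has id = 3, so every t1 row is kept
-- and all t2 columns come back as NULL.
select * from tmp.yl_test_1 t1
left join (
    select * from tmp.yl_test_2 where id = 3
) t2 on t1.id = t2.id;
```

With the sample data this should return the same three NULL-padded rows as the ON version.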

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id where t2.id = 3;
id  name    birthday    id2 age birthday2
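
The result is empty because the WHERE condition on t2.id also rejects the NULL-padded rows that the left join produced, so the query effectively behaves like an inner join. A sketch of the equivalent inner-join form, for comparison only:

```sql
-- An inner join with the same filter: rows without a match in t2
-- are never produced, and no t2 row has id = 3, so nothing is returned.
select * from tmp.yl_test_1 t1
join
tmp.yl_test_2 t2 on t1.id = t2.id
where t2.id = 3;
```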

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id and t1.birthday > t2.birthday;
id  name    birthday    id2 age birthday2
3   cc  2023-12-30  \N  \N  \N
2   bb  2023-12-12  2   50  2023-12-10
1   aa  2023-12-01  \N  \N  \N

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id where t1.birthday > t2.birthday;
id  name    birthday    id2 age birthday2
2   bb  2023-12-12  2   50  2023-12-10

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id and t1.birthday < t2.birthday;
id  name    birthday    id2 age birthday2
3   cc  2023-12-30  \N  \N  \N
2   bb  2023-12-12  2   50  2023-12-20
1   aa  2023-12-01  1   40  2023-12-04

select * from tmp.yl_test_1 t1 
left join 
tmp.yl_test_2 t2 on t1.id = t2.id where t1.birthday < t2.birthday;
id  name    birthday    id2 age birthday2
2   bb  2023-12-12  2   50  2023-12-20
1   aa  2023-12-01  1   40  2023-12-04
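
Because WHERE runs on the joined result, it can also be used on purpose to pick out the NULL-padded rows. The following sketch (an added illustration using the same test tables) lists the main-table rows that have no match in the secondary table:

```sql
-- Anti-join pattern: keep only t1 rows whose id does not appear in t2.
-- With the sample data, only id 3 (cc, 2023-12-30) has no match.
select t1.* from tmp.yl_test_1 t1
left join
tmp.yl_test_2 t2 on t1.id = t2.id
where t2.id is null;
```

Moving `t2.id is null` into ON would not work here: it would be evaluated as part of the match condition, so no t2 row would ever match and all three t1 rows would be returned.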