Hive查询练习二

目标

inner join
left join
full join
union all
综合练习

表结构

image.png

image.png

image.png

image.png

image.png

image.png

image.png


inner join

1.在2019年购买后又退款的用户
'''
※【注意去重】去重31条,没有去重61条
尽量在做表连接之前去重,这样效率才高
'''
select a.user_name
from 
(select distinct user_name
from user_trade 
where year(dt)='2019') a
join 
(select distinct user_name
from user_refund
where year(dt)='2019') b
on a.user_name=b.user_name;
2.在2017年和2018年都购买的用户【134条】
select a.user_name
from
(select distinct user_name
from user_trade
where year(dt)='2017') a
join 
(select distinct user_name
from user_trade
where year(dt)='2018') b
on a.user_name=b.user_name;
3.在2017、2018、2019年都有交易的用户
select a.user_name
from
(select distinct user_name
from trade_2017) a
join 
(select distinct user_name
from trade_2018) b
on a.user_name=b.user_name
join 
(select distinct user_name
from trade_2019) c
on b.user_name=c.user_name;


left join

1.在user_list_1表中但是不在user_list_2的用户
select a.user_id,
a.user_name
from user_list_1 a
left join user_list_2 b
on a.user_name=b.user_name
where b.user_name is null;
'''
【hive不可以在in后面接子查询】
select user_id, 
user_name
from user_list_1 
where user_id not in
(select user_id
from user_list_2);
'''
image.png
2.在2019年购买,但是没有退款的用户【32条】
select a.user_name
from
(select distinct user_name
from user_trade
where year(dt)='2019') a
left join 
(select distinct user_name
from user_refund
where year(dt)='2019') b
on a.user_name=b.user_name
where b.user_name is null;
3.在2019年购买和用户的学历分布【 ※※※】

1.在user_trade表中提取19年购买的用户姓名
2.在user_info表中提取每个人的学历信息
3.根据用户姓名进行连接

select 
b.education,
count(a.user_name)
from
(select distinct user_name
from user_trade
where year(dt)='2019') a
left join 
(select user_name,
get_json_object(extra1,'$.education') education
from user_info ) b
on a.user_name=b.user_name
group by b.education;
image.png
4.在2017年和2018年都购买,但是没有在2019年购买的用户
'''
不需要重命名再左连接
'''
select a.user_name
from
(select distinct user_name
from trade_2017) a
join 
(select distinct user_name
from trade_2018) b
on a.user_name=b.user_name
left join 
(select distinct user_name
from trade_2019) c
on b.user_name=c.user_name
where c.user_name is null;


full join【并集】

1.user_list_1和user_list_2的所有用户

user_list_1:【1,2,3,4,5,6】
user_list_2:【1,3,4,7,8,9】
合并:【1,2,3,4,5,6,7,8,9】

'''
--coalesce函数,按照列名依次读取,遇到非null即停止并返回该值,
如果两个参数都是null,最终返回null
最终显示一列
'''
select coalesce(a.user_name,b.user_name)
from user_list_1 a 
full join user_list_2 b
on a.user_id=b.user_id;

image.png


union all【追加】

  • 字段名称必须一致【不同,重命名即可】
  • 字段顺序必须一致
  • 没有连接条件
将user_list_1和user_list_3合并在一起

user_list_1:【1,2,3,4,5,6】
user_list_3:【90,91,92,93,94】
合并:【1,2,3,4,5,6,90,91,92,93,94】

select user_id,
user_name
from user_list_1
union all 
select user_id,
user_name
from user_list_3;
image.png
1.2017-2019年有交易的所有用户数【※※】
  • 推荐:先把每个表要合并的字段distinct再union all,但最后都要distinct;而union时,最后面就不需要再加distinct
  • 常见的错误:
    1.没有重命名
    2.直接union all 表名,而不是select...from形式
select count(distinct a.user_name)
from 
(select user_name
from trade_2017
union all 
select user_name
from trade_2018
union all
select user_name
from trade_2019) a;
2.2019年每个用户的支付和退款金额汇总
  • 业务逻辑:不是每个人在2019年都有支付金额和退款金额
    即:支付金额为0,退款金额不为0 ;支付金额不为0 ,退款金额为0
'''
用 union all 来实现
1.在trade表中,按照用户分组,每个用户的支付金额和退款金额(补0)
2.在refund表中,按照用户分组,每个用户的退款金额和支付金额(补0)
3.用 union all 追加,然后 group by 用户,再各自求和

'''
select a.user_name,
sum(a.pay_amount),
sum(a.refund_amount)
from 
(select user_name,
    sum(pay_amount) pay_amount,
0 refund_amount
from user_trade
where year(dt)='2019'
group by user_name
union all
select user_name,
0 pay_amount,
sum(refund_amount) refund_amount
from user_refund
where year(dt)='2019'
group by user_name) a
group by a.user_name;


'''
用full join 来实现 (比union all慢很多) 合并时4列
关键:将null值变为0
1.各自按照用户分组进行支付和退款汇总;
2.按用户连接,用 coalesce 函数;
3.将null改为0
'''
select coalesce(a.user_name,b.user_name),
if(a.pay_amount is null,0,a.pay_amount),
if(b.refund_amount is null,0,b.refund_amount)
from
(select user_name,
sum(pay_amount) pay_amount
from user_trade
where year(dt)='2019'
group by user_name) a
full join
(select user_name,
sum(refund_amount) refund_amount
from user_refund
where year(dt)='2019'
group by user_name) b
on a.user_name=b.user_name;
image.png

image.png
'''
拓展:2019年每个支付用户的支付金额和退款金额:
'''
select a.user_name,
a.pay_amount,
if(b.refund_amount is null,0,b.refund_amount)
from
(select user_name,
sum(pay_amount) pay_amount
from user_trade
where year(dt)='2019'
group by user_name) a
left join 
(select user_name,
sum(refund_amount) refund_amount
from user_refund
where year(dt)='2019'
group by user_name) b
on a.user_name=b.user_name;
image.png


综合练习

1.首次激活时间在2019年,但是一直没有支付的用户年龄段分布(user_trade、user_info)
方法一:dt>0条件很重要
'''1.查询首次激活时间在2019年的用户年龄分布'''

select user_name,
case when age<20 then '20岁以下'
            when age>=20 and age<30 then '20-30岁'
            when age>=30 and age<40 then '30-40岁'
            else '40岁以上' end age_level
from user_info
where year(firstactivetime)='2019'

'''2.将上面查询左连接trade,查支付金额为空的部分
(只要在trade这张表中就表示有支付金额)'''

select distinct user_name
from user_trade
where dt>0

'''3.查询首次激活时间在2019年且支付金额为0的用户年龄分布'''

select a.age_level,
count(a.user_name)
from
(select user_name,
case when age<20 then '20岁以下'
            when age>=20 and age<30 then '20-30岁'
            when age>=30 and age<40 then '30-40岁'
            else '40岁以上' end age_level
from user_info
where year(firstactivetime)='2019') a
left join 
(select distinct user_name
from user_trade) b
on a.user_name=b.user_name
where b.user_name is null
group by a.age_level;
image.png
方法二:dt>0条件很重要
'''1.首次激活时间在2019年且支付金额为0的用户姓名和年龄:'''
select a.user_name,
a.age
from
(select distinct user_name,
age       
from user_info 
where year(firstactivetime)='2019') a
left join 
(select user_name
from user_trade 
where dt>0) b
on a.user_name=b.user_name
where b.user_name is null;

'''2.年龄分布查询:'''
select case when age<20 then '20岁以下'
            when age>=20 and age<30 then '20-30岁'
            when age>=30 and age<40 then '30-40岁'
            else '40岁以上' end,
        count(distinct user_id) user_num
from user_info
group by case when age<20 then '20岁以下'
            when age>=20 and age<30 then '20-30岁'
            when age>=30 and age<40 then '30-40岁'
            else '40岁以上' end;

'''3.将上面的查询作为新表,然后进行年龄分布查询:'''
select case when c.age<20 then '20岁以下'
            when c.age>=20 and c.age<30 then '20-30岁'
            when c.age>=30 and c.age<40 then '30-40岁'
            else '40岁以上' end,
        count(distinct c.user_name) user_num
from
(select a.user_name,
a.age
from
(select distinct user_name,
age       
from user_info 
where year(firstactivetime)='2019') a
left join 
(select user_name
from user_trade 
where dt>0) b
on a.user_name=b.user_name
where b.user_name is null) c
group by case when c.age<20 then '20岁以下'
            when c.age>=20 and c.age<30 then '20-30岁'
            when c.age>=30 and c.age<40 then '30-40岁'
            else '40岁以上' end;
2.2018、2019年交易的用户,其激活时间段分布(trade_2018、trade_2019、user_info)
''' 用union不需要重复写distinct
1.查出2018、2019年交易用户的并集
2.用户激活时间段分布
3.合并
'''
select substr(firstactivetime,12,2),
count(a.user_name)
from
(select user_name
from trade_2018
union 
select user_name
from trade_2019) a
left join
user_info c
on a.user_name=c.user_name
group by substr(firstactivetime,12,2);
'''
group by substr(firstactivetime,12,2)等价于
group by hour(firstactivetime)
'''
image.png
3.在2019年购买后又退款的用户性别分布(user_trade、user_refund)
'''1.查出2019年购买后又退款的用户及性别
2.性别分布'''

select c.sex,
count(a.user_name)
from
(select distinct user_name
from user_trade 
where year(dt)='2019') a
join 
(select distinct user_name
from user_refund
where year(dt)='2019') b
on a.user_name=b.user_name
join 
(select distinct user_name,
sex
from user_info) c
on b.user_name=c.user_name
group by c.sex;
image.png
【错误写法】:
select c.user_name
from
((select distinct user_name
from user_trade 
where year(dt)='2019') a
join 
(select distinct user_name
from user_refund
where year(dt)='2019') b
on a.user_name=b.user_name)c;

【正确写法】:
select a.user_name
from
(select distinct user_name
from user_trade 
where year(dt)='2019') a
join 
(select distinct user_name
from user_refund
where year(dt)='2019') b
on a.user_name=b.user_name;
4.在2018年购买,但是没有在2019年购买的用户城市分布(user_trade、user_refund)

重点

select d.city,
count(c.user_name)
from
(select a.user_name
from
(select distinct user_name
from user_trade
where year(dt)='2018') a
left join
(select distinct user_name
from user_trade
where year(dt)='2019') b
on a.user_name=b.user_name
where b.user_name is null) c
left join 
(select distinct user_name,
city
from user_info) d
on c.user_name=d.user_name
group by d.city;
image.png
5.在2017-2019年,有交易但是没有退款的用户的手机品牌分布(trade_2017、trade_2018、trade_2019、user_refund、user_info)
select d.phonebrand,
count(c.user_name)
from
(select a.user_name
from
(select user_name
from trade_2017
union 
select user_name
from trade_2018
union
select user_name
from trade_2019) a
left join 
(select distinct user_name
from user_refund
where year(dt) between '2017' and '2019') b
on a.user_name=b.user_name
where b.user_name is null) c
left join 
(select distinct user_name,
get_json_object(extra1,'$.phonebrand') phonebrand
from user_info) d
on c.user_name=d.user_name
group by phonebrand;
image.png
【错】
select c.user_name
from
((select distinct user_name
from user_trade
where year(dt)='2018') a
left join
(select distinct user_name
from user_trade
where year(dt)='2019') b
on a.user_name=b.user_name
where b.user_name is null) c

【对】
select a.user_name
from
(select user_name
from trade_2017
union 
select user_name
from trade_2018
union
select user_name
from trade_2019) a;


【对】
select a.user_name
from
(select distinct user_name
from user_trade
where year(dt)='2018') a
left join
(select distinct user_name
from user_trade
where year(dt)='2019') b
on a.user_name=b.user_name
where b.user_name is null
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 218,546评论 6 507
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 93,224评论 3 395
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 164,911评论 0 354
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,737评论 1 294
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,753评论 6 392
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,598评论 1 305
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,338评论 3 418
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,249评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,696评论 1 314
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,888评论 3 336
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 40,013评论 1 348
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,731评论 5 346
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,348评论 3 330
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,929评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 33,048评论 1 270
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,203评论 3 370
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,960评论 2 355