Hive一行拆多行、多行拼一行

一行拆多行结合使用了lateral viewexplode
多行拼一行结合使用了concat_wscollect_set,某些hive版本支持group_concat可以代替前者。

举例说明:
有两张表sen_tbl(敏感字段定义表)msg_tbl(消息表)。其中消息表的消息是一个json字符串,最多有两层json嵌套。而json消息中的某些value是含有敏感信息的,key-value是否敏感的定义存放在sen_tbl中。

现在要做的是将json消息中的敏感value加密,然后分成不含敏感的json,和只含敏感的json。所以需要分为两步:

  1. 拆分json,并加密敏感value
  2. 分别合并敏感和非敏感键值对,合并成2个json,放在不同的列
1. 建表,插入数据:
create table sen_tbl (
    col_nam string,  --字段(key)名称
    is_sen string  --是否敏感:1敏感,2不敏感
);

insert into sen_tbl values('CRDACCPTNMELCT','1');  --敏感
insert into sen_tbl values('TXNMERCHNO','1');  --敏感
insert into sen_tbl values('ACCTNBR','0');
insert into sen_tbl values('CRSERVICECODE','0');
insert into sen_tbl values('modelFileId','0');
insert into sen_tbl values('trs_ist','0');


drop table if exists msg_tbl;
create table msg_tbl (
    msg_id string,  --消息id
    json_msg string  --消息主体
);


insert into msg_tbl values(
'1',
'{
    "sceneParams": {
        "ACCTNBR": "0100020100001001",
        "CRDACCPTNMELCT": "如家和美酒店管理有限公司如家快捷天坛店",
        "CRSERVICECODE": "340",
        "TXNMERCHNO": "303605280000102"
    },
    "modelFileId": 2,
    "modelFileThreshold": {
        "trs_ist": 0.4
    }
}'
);

insert into msg_tbl values(
'2',
'{
    "sceneParams": {
        "ACCTNBR": "0100020100001222",
        "CRDACCPTNMELCT": "招商银行",
        "CRSERVICECODE": "340",
        "TXNMERCHNO": "303605285555555"
    }
}'
);
2. 将json消息的第一层拆分成多行数据:
create table explode_msg as
select msg_id,   --消息id
       '1' as json_lv,   --key在json中的层级
       tbl1.key, 
       tbl1.value
  from msg_tbl
  lateral view explode(default.json_to_map(json_msg)) tbl1 as key,value
;

SQL执行结果数据在简书中没有对齐,在其他编辑器入notepad++中是可以对齐的

+---------------------+----------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| explode_msg.msg_id  | explode_msg.json_lv  |   explode_msg.key   |                                                               explode_msg.value                                                                |
+---------------------+----------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| 1                   | 1                    | modelFileId         | 2                                                                                                                                              |
| 1                   | 1                    | sceneParams         | {"CRDACCPTNMELCT":"如家和美酒店管理有限公司如家快捷天坛店","CRSERVICECODE":"340","ACCTNBR":"0100020100001001","TXNMERCHNO":"303605280000102"}  |
| 1                   | 1                    | modelFileThreshold  | {"trs_ist":0.4}                                                                                                                                |
| 2                   | 1                    | sceneParams         | {"CRDACCPTNMELCT":"招商银行","CRSERVICECODE":"340","ACCTNBR":"0100020100001222","TXNMERCHNO":"303605285555555"}                                |
+---------------------+----------------------+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
3. 将json消息第二层拆分成多行数据:
drop table if exists explode_msg_2;
create table explode_msg_2 as
select msg_id, 
       '2' as json_lv, 
       key, 
       tbl1.key2, 
       tbl1.value2
  from explode_msg
  lateral view explode(default.json_to_map(value)) tbl1 as key2,value2
;
+-----------------------+------------------------+---------------------+---------------------+-----------------------------------------+
| explode_msg_2.msg_id  | explode_msg_2.json_lv  |  explode_msg_2.key  | explode_msg_2.key2  |          explode_msg_2.value2           |
+-----------------------+------------------------+---------------------+---------------------+-----------------------------------------+
| 1                     | 2                      | sceneParams         | CRDACCPTNMELCT      | 如家和美酒店管理有限公司如家快捷天坛店  |
| 1                     | 2                      | sceneParams         | CRSERVICECODE       | 340                                     |
| 1                     | 2                      | sceneParams         | ACCTNBR             | 0100020100001001                        |
| 1                     | 2                      | sceneParams         | TXNMERCHNO          | 303605280000102                         |
| 1                     | 2                      | modelFileThreshold  | trs_ist             | 0.4                                     |
| 2                     | 2                      | sceneParams         | CRDACCPTNMELCT      | 招商银行                                |
| 2                     | 2                      | sceneParams         | CRSERVICECODE       | 340                                     |
| 2                     | 2                      | sceneParams         | ACCTNBR             | 0100020100001222                        |
| 2                     | 2                      | sceneParams         | TXNMERCHNO          | 303605285555555                         |
+-----------------------+------------------------+---------------------+---------------------+-----------------------------------------+
4. 合并以上两步的数据,并对敏感value加密:
drop table if exists explode_msg_fnl;
create table explode_msg_fnl as
select msg_id,
       json_lv,
       key as key1,
       key2,
       value2 as value,
       case when b.is_sen = '1' then md5(a.value2) else a.value2 end as value_sen
  from explode_msg_2 a
  join sen_tbl b
    on a.key2 = b.col_nam
union all
select msg_id,
       json_lv,
       key as key1,
       null,
       value,
       case when b.is_sen = '1' then md5(a.value) else a.value end as value_sen
  from explode_msg a
  join sen_tbl b
    on a.key = b.col_nam
 where value not like '{"%'
;
+-------------------------+--------------------------+-----------------------+-----------------------+-----------------------------------------+-----------------------------------+
| explode_msg_fnl.msg_id  | explode_msg_fnl.json_lv  | explode_msg_fnl.key1  | explode_msg_fnl.key2  |          explode_msg_fnl.value          |     explode_msg_fnl.value_sen     |
+-------------------------+--------------------------+-----------------------+-----------------------+-----------------------------------------+-----------------------------------+
| 1                       | 1                        | modelFileId           | NULL                  | 2                                       | 2                                 |
| 1                       | 2                        | sceneParams           | ACCTNBR               | 0100020100001001                        | 0100020100001001                  |
| 1                       | 2                        | modelFileThreshold    | trs_ist               | 0.4                                     | 0.4                               |
| 1                       | 2                        | sceneParams           | CRSERVICECODE         | 340                                     | 340                               |
| 1                       | 2                        | sceneParams           | TXNMERCHNO            | 303605280000102                         | ecb0166601a8264180164810a2df4ee9  |
| 1                       | 2                        | sceneParams           | CRDACCPTNMELCT        | 如家和美酒店管理有限公司如家快捷天坛店  | d531cfc939890cfbb3127f59bc30060a  |
| 2                       | 2                        | sceneParams           | ACCTNBR               | 0100020100001222                        | 0100020100001222                  |
| 2                       | 2                        | sceneParams           | CRSERVICECODE         | 340                                     | 340                               |
| 2                       | 2                        | sceneParams           | TXNMERCHNO            | 303605285555555                         | 0ab5cdd213a8313d69b3d8e1b5e1eadb  |
| 2                       | 2                        | sceneParams           | CRDACCPTNMELCT        | 招商银行                                | e0f88f4dbec781d1ab8402e53f0e25c3  |
+-------------------------+--------------------------+-----------------------+-----------------------+-----------------------------------------+-----------------------------------+
5. 重新合并json的第二层的key、value,多行合并为一行:
drop table if exists result_msg;
create table result_msg as
select msg_id, key1, is_sen, concat('"',value,'"') as key1_val
  from explode_msg_fnl
 where json_lv = 1

union all
select msg_id, 
       key1,
       is_sen,
       concat('{', concat_ws(',', collect_set(map_val)), '}') as key1_val
  from (select msg_id, 
               key1, 
               concat('"',key2,'":"',value,'"') as map_val,
               is_sen
          from explode_msg_fnl
         where json_lv = 2
       ) t
 group by msg_id, key1, is_sen
;
+--------------------+---------------------+--------------------+--------------------------------------------------------------------------------------------------------+
| result_msg.msg_id  |   result_msg.key1   | result_msg.is_sen  |                                          result_msg.key1_val                                           |
+--------------------+---------------------+--------------------+--------------------------------------------------------------------------------------------------------+
| 1                  | modelFileThreshold  | 0                  | {"trs_ist":"0.4"}                                                                                      |
| 1                  | sceneParams         | 0                  | {"CRSERVICECODE":"340","ACCTNBR":"0100020100001001"}                                                   |
| 1                  | sceneParams         | 1                  | {"CRDACCPTNMELCT":"d531cfc939890cfbb3127f59bc30060a","TXNMERCHNO":"ecb0166601a8264180164810a2df4ee9"}  |
| 2                  | sceneParams         | 0                  | {"CRSERVICECODE":"340","ACCTNBR":"0100020100001222"}                                                   |
| 2                  | sceneParams         | 1                  | {"CRDACCPTNMELCT":"e0f88f4dbec781d1ab8402e53f0e25c3","TXNMERCHNO":"0ab5cdd213a8313d69b3d8e1b5e1eadb"}  |
| 1                  | modelFileId         | 0                  | "2"                                                                                                    |
+--------------------+---------------------+--------------------+--------------------------------------------------------------------------------------------------------+
6. 合并json第一层的key、value:
drop table if exists result_msg_2;
create table result_msg_2 as
select msg_id, 
       is_sen,
       concat('{', concat_ws(',', collect_set(json_val)), '}') as json_val
  from (select msg_id,
               is_sen,
               concat('"',key1,'": ',key1_val) as json_val
          from result_msg
       ) t
 group by msg_id, is_sen
;
+----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| result_msg_2.msg_id  | result_msg_2.is_sen  |                                                       result_msg_2.json_val                                                       |
+----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| 1                    | 0                    | {"modelFileThreshold": {"trs_ist":"0.4"},"sceneParams": {"CRSERVICECODE":"340","ACCTNBR":"0100020100001001"},"modelFileId": "2"}  |
| 1                    | 1                    | {"sceneParams": {"CRDACCPTNMELCT":"d531cfc939890cfbb3127f59bc30060a","TXNMERCHNO":"ecb0166601a8264180164810a2df4ee9"}}            |
| 2                    | 0                    | {"sceneParams": {"CRSERVICECODE":"340","ACCTNBR":"0100020100001222"}}                                                             |
| 2                    | 1                    | {"sceneParams": {"CRDACCPTNMELCT":"e0f88f4dbec781d1ab8402e53f0e25c3","TXNMERCHNO":"0ab5cdd213a8313d69b3d8e1b5e1eadb"}}            |
+----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------+
7. 行转列,一列为不包含敏感的json,另一列为包含敏感的json:
create table result_msg_fnl as
select msg_id,
       max(case when is_sen = 0 then json_val end) as json_val,
       max(case when is_sen = 1 then json_val end) as json_val_sen
  from result_msg_2
 group by msg_id
;
+------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
| result_msg_fnl.msg_id  |                                                      result_msg_fnl.json_val                                                      |                                               result_msg_fnl.json_val_sen                                               |
+------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
| 1                      | {"modelFileThreshold": {"trs_ist":"0.4"},"sceneParams": {"CRSERVICECODE":"340","ACCTNBR":"0100020100001001"},"modelFileId": "2"}  | {"sceneParams": {"CRDACCPTNMELCT":"d531cfc939890cfbb3127f59bc30060a","TXNMERCHNO":"ecb0166601a8264180164810a2df4ee9"}}  |
| 2                      | {"sceneParams": {"CRSERVICECODE":"340","ACCTNBR":"0100020100001222"}}                                                             | {"sceneParams": {"CRDACCPTNMELCT":"e0f88f4dbec781d1ab8402e53f0e25c3","TXNMERCHNO":"0ab5cdd213a8313d69b3d8e1b5e1eadb"}}  |
+------------------------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 211,265评论 6 490
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 90,078评论 2 385
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 156,852评论 0 347
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 56,408评论 1 283
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 65,445评论 5 384
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 49,772评论 1 290
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 38,921评论 3 406
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,688评论 0 266
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,130评论 1 303
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,467评论 2 325
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,617评论 1 340
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,276评论 4 329
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 39,882评论 3 312
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,740评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,967评论 1 265
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,315评论 2 360
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,486评论 2 348