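In Hive, regular expressions can be used to filter landline phone numbers. Unlike mobile numbers, landline numbers are harder to match and there are more cases to consider, so here I use regular expressions to match every landline area code. The condition is fairly long, but with a little patience it is easy to put together.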
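A number consists of digits or '-' and contains 10 to 12 digits in total: a 3- to 4-digit area code followed by a 7- to 8-digit subscriber number.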
length(regexp_replace(receiver,'-','')) between 10 and 12 and
(substr(receiver,1,3)='010'
or substr(receiver,1,3) rlike '^02[0-57-9].*$'
or substr(receiver,1,4) rlike '^031[0-9].*$'
or substr(receiver,1,4) rlike '^0335.*$'
or substr(receiver,1,4) rlike '^0349.*$'
or substr(receiver,1,4) rlike '^035[1-9].*$'
or substr(receiver,1,4) rlike '^037[0-79].*$'
or substr(receiver,1,4) rlike '^039[1-8].*$'
or substr(receiver,1,4) rlike '^041[125-9].*$'
or substr(receiver,1,4) rlike '^042[179].*$'
or substr(receiver,1,4) rlike '^04[35][1-9].*$'
or substr(receiver,1,4) rlike '^046[4789].*$'
or substr(receiver,1,4) rlike '^047[0-9].*$'
or substr(receiver,1,4) rlike '^048[23].*$'
or substr(receiver,1,4) rlike '^052[37].*$'
or substr(receiver,1,4) rlike '^053[0-9].*$'
or substr(receiver,1,4) rlike '^054[36].*$'
or substr(receiver,1,4) rlike '^055[0-9].*$'
or substr(receiver,1,4) rlike '^056[1-46].*$'
or substr(receiver,1,4) rlike '^057[0-9].*$'
or substr(receiver,1,4) rlike '^0580.*$'
or substr(receiver,1,4) rlike '^063[1-5].*$'
or substr(receiver,1,4) rlike '^066[0238].*$'
or substr(receiver,1,4) rlike '^069[12].*$'
or substr(receiver,1,4) rlike '^0701.*$'
or substr(receiver,1,4) rlike '^07[1579][0-9].*$'
or substr(receiver,1,4) rlike '^072[248].*$'
or substr(receiver,1,4) rlike '^073[014-9].*$'
or substr(receiver,1,4) rlike '^074[3-6].*$'
or substr(receiver,1,4) rlike '^076[023689].*$'
or substr(receiver,1,4) rlike '^081[23678].*$'
or substr(receiver,1,4) rlike '^082[567].*$'
or substr(receiver,1,4) rlike '^08[37][0-9].*$'
or substr(receiver,1,4) rlike '^085[14-9].*$'
or substr(receiver,1,4) rlike '^088[3678].*$'
or substr(receiver,1,4) rlike '^089[1-8].*$'
or substr(receiver,1,4) rlike '^090[123689].*$'
or substr(receiver,1,4) rlike '^091[12345679].*$'
or substr(receiver,1,4) rlike '^09[39][0-9].*$'
or substr(receiver,1,4) rlike '^094[13].*$'
or substr(receiver,1,4) rlike '^095[12345].*$'
or substr(receiver,1,4) rlike '^097[012345679].*$'
)
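
The condition above is only a fragment of a WHERE clause. As a rough sketch of how it might sit in a full query, the area-code checks can also be merged into one anchored alternation, since RLIKE matches when the pattern is found anywhere in the string and the leading ^ pins it to the start. The table name orders is hypothetical; receiver is the column used above. The sketch also checks that the value contains only digits and '-', as the rule states. Treat it as one possible arrangement, not the only way to write it.

```sql
-- Sketch only: `orders` is a hypothetical table name.
-- `receiver` is the column used in the condition above.
SELECT receiver
FROM orders
WHERE
  -- only digits and '-' are allowed
  receiver RLIKE '^[0-9-]+$'
  -- 10 to 12 digits once the '-' separators are removed
  AND length(regexp_replace(receiver, '-', '')) BETWEEN 10 AND 12
  -- the same area-code prefixes as above, merged into one pattern
  AND receiver RLIKE concat(
        '^(010|02[0-57-9]',
        '|031[0-9]|0335|0349|035[1-9]|037[0-79]|039[1-8]',
        '|041[125-9]|042[179]|04[35][1-9]|046[4789]|047[0-9]|048[23]',
        '|052[37]|053[0-9]|054[36]|055[0-9]|056[1-46]|057[0-9]|0580',
        '|063[1-5]|066[0238]|069[12]',
        '|0701|07[1579][0-9]|072[248]|073[014-9]|074[3-6]|076[023689]',
        '|081[23678]|082[567]|08[37][0-9]|085[14-9]|088[3678]|089[1-8]',
        '|090[123689]|091[12345679]|09[39][0-9]|094[13]|095[12345]|097[012345679]',
        ')'
      );
```

Splitting the pattern across concat arguments, one line per leading digit of the area code, keeps each group easy to compare with the individual checks above.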