A Treat for Data Analysts: Connecting to Databases with Python SQLAlchemy, Plus a Compendium of Queries

Before you start, make sure the following software/third-party libraries are installed and working:

  • Oracle Database 11g client + server
  • python 3.7.7
    Note: if your Python is 64-bit, the Oracle Database client must be 64-bit as well.
  • cx-Oracle 7.3.0
    Note: to connect to Oracle you need cx_Oracle; for MySQL, pymysql; and for Hive on Hadoop, pyhive. Whichever database you use, SQLAlchemy's query syntax is almost identical, so code is highly reusable. Raw SQL, by contrast, differs between dialects: the same statement may stop working when you switch databases. (A sketch of the connection URLs for these drivers appears at the end of this intro.)
  • SQLAlchemy 1.3.16
  • pandas 1.0.3
    If pandas installed successfully but import pandas fails (on Windows 10), copy concrt140.dll and msvcp140.dll into the Python installation directory, as shown:


    [Image: the portable runtime DLLs]

    Or install this instead:


    [Image: the runtime installer downloaded from Microsoft's official site]

    pandas is a data-analysis library that beats Excel hands down. After learning Python, I uninstalled Office altogether.
    Yes, my machine really is that clean.

I am obsessive about keeping my machine clean: my Python and VS Code are both portable versions.
The Oracle client is portable too; the server cannot be made portable, so it is installed on a server at home and accessed over the LAN.
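
As the note in the list above says, only the driver and the connection URL change between databases; the query code stays the same. A minimal sketch of the URL formats for the three drivers mentioned (hosts, ports, credentials and database names here are placeholders, not values used elsewhere in this article):

from sqlalchemy import create_engine

engine_oracle = create_engine('oracle://user:password@127.0.0.1:1521/orcl')          # via cx_Oracle
engine_mysql = create_engine('mysql+pymysql://user:password@127.0.0.1:3306/testdb')  # via pymysql
engine_hive = create_engine('hive://user@127.0.0.1:10000/default')                   # via pyhive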

Preparing the demo database

For the demo, first create the tables and data.
Open a new cmd terminal and log in to Oracle Database as the database administrator:

> sqlplus sys/123456 as sysdba

Create a new user

>>> create user abc identified by 123;

Grant DBA privileges

>>> grant connect,resource,dba to abc;

Log in as the new user

> sqlplus abc/123

Create the first table

>>> CREATE TABLE Websites
(
id integer PRIMARY KEY,
name varchar(255),
url varchar(255),
alexa integer,
country varchar(255)
);

Insert rows into the first table

>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('1','Google','https://www.google.cm/','1','USA');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('2','淘宝','https://www.taobao.com/','13','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('3','菜鸟教程','http://www.runoob.com/','4689','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('4','微博','http://weibo.com/','20','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('5','Facebook','https://www.facebook.com/','3','USA');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('6','百度','https://www.baidu.com/','4','CN');
>>> INSERT INTO Websites (id, name, url, alexa, country)
VALUES ('7','stackoverflow','http://stackoverflow.com/','0','IND');

View the data

>>> select * from Websites;
+----+---------------+---------------------------+-------+---------+
| id | name          | url                       | alexa | country |
+----+---------------+---------------------------+-------+---------+
| 1  | Google        | https://www.google.cm/    | 1     | USA     |
| 2  | 淘宝          | https://www.taobao.com/   | 13    | CN      |
| 3  | 菜鸟教程      | http://www.runoob.com/    | 4689  | CN      |
| 4  | 微博          | http://weibo.com/         | 20    | CN      |
| 5  | Facebook      | https://www.facebook.com/ | 3     | USA     |
| 6  | 百度          | https://www.baidu.com/    | 4     | CN      |
| 7  | stackoverflow | http://stackoverflow.com/ | 0     | IND     |
+----+---------------+---------------------------+-------+---------+

Create the second table

>>> CREATE TABLE access_log
(
aid integer PRIMARY KEY,
site_id integer,
count integer,
col_date date,
CONSTRAINT fk_site_id FOREIGN KEY (site_id) REFERENCES Websites(id)
);

Insert rows

>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('1','1','45',to_date('2016-05-10','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('2','3','100',to_date('2016-05-13','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('3','1','230',to_date('2016-05-14','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('4','2','10',to_date('2016-05-14','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('5','5','205',to_date('2016-05-14','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('6','4','13',to_date('2016-05-15','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('7','3','220',to_date('2016-05-15','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('8','5','545',to_date('2016-05-16','yyyy-mm-dd'));
>>> INSERT INTO access_log (aid,site_id,count,col_date)
VALUES ('9','3','201',to_date('2016-05-17','yyyy-mm-dd'));

View the data

>>> select * from access_log;
+-----+---------+-------+------------+
| aid | site_id | count | col_date   |
+-----+---------+-------+------------+
|   1 |       1 |    45 | 2016-05-10 |
|   2 |       3 |   100 | 2016-05-13 |
|   3 |       1 |   230 | 2016-05-14 |
|   4 |       2 |    10 | 2016-05-14 |
|   5 |       5 |   205 | 2016-05-14 |
|   6 |       4 |    13 | 2016-05-15 |
|   7 |       3 |   220 | 2016-05-15 |
|   8 |       5 |   545 | 2016-05-16 |
|   9 |       3 |   201 | 2016-05-17 |
+-----+---------+-------+------------+

Create the third table

>>> CREATE TABLE apps
(
id integer PRIMARY KEY,
app_name varchar(255),
url varchar(255),
country varchar(255)
);

Insert rows

>>> INSERT INTO apps (id,app_name,url,country)
VALUES ('1','QQ APP','http://im.qq.com/','CN');
>>> INSERT INTO apps (id,app_name,url,country)
VALUES ('2','微博 APP','http://weibo.com/','CN');
>>> INSERT INTO apps (id,app_name,url,country)
VALUES ('3','淘宝 APP','https://www.taobao.com/','CN');

View the table

>>> select * from apps;
+----+------------+-------------------------+---------+
| id | app_name   | url                     | country |
+----+------------+-------------------------+---------+
|  1 | QQ APP     | http://im.qq.com/       | CN      |
|  2 | 微博 APP   | http://weibo.com/       | CN      |
|  3 | 淘宝 APP   | https://www.taobao.com/ | CN      |
+----+------------+-------------------------+---------+

That completes the table setup.

Connecting to the database with the SQLAlchemy ORM

Open an interactive Python session and start writing the program:

>>> import os
>>> os.environ["PATH"] += ";D:\\oracle"  # append (note the ";" separator) the directory where the portable Oracle client zip was extracted. This is not sys.path; sys.path is where Python searches for packages on import.
>>> os.environ["NLS_LANG"] = "SIMPLIFIED CHINESE_CHINA.UTF8"  # fixes Oracle character-encoding issues
>>> from sqlalchemy import create_engine
>>> from sqlalchemy import MetaData, Table
>>> from sqlalchemy.orm import sessionmaker
>>> import pandas as pd
>>> from sqlalchemy import func

Create the database connection (this only creates the engine; it is not yet bound to MetaData).
Note: the example connects to Oracle; the URL format for other databases is essentially the same, see the SQLAlchemy docs.
To log in as sysdba, use 'oracle://abc:123@127.0.0.1:1521/orcl?mode=SYSDBA' instead.

>>> engine = create_engine('oracle://abc:123@127.0.0.1:1521/orcl', encoding='utf8', convert_unicode=True, echo = False)

To prepare for reflecting the existing tables, create a metadata object (this binds the connection to the MetaData):

>>> metadata = MetaData(bind=engine)

Reflect the whole database (this is the first operation that actually touches the database):

>>> metadata.reflect()
>>> print(metadata.tables.keys())  # list all the table names
dict_keys(['apps', 'access_log', 'websites'])
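
If a schema holds many tables, reflecting everything can be slow. MetaData.reflect() also accepts an only parameter (standard SQLAlchemy) to restrict reflection to the tables you actually need:

>>> metadata.reflect(only=['websites', 'access_log'])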

Reflecting individual tables

>>> # reflect table one
>>> web_t = Table('websites', metadata, autoload=True, autoload_with=engine)
>>> web = web_t.alias('web')  # give it an alias
>>> # reflect table two
>>> acc_t = Table('access_log', metadata, autoload=True, autoload_with=engine)
>>> acc = acc_t.alias('acc')  # give it an alias
>>> # reflect table three
>>> app_t = Table('apps', metadata, autoload=True, autoload_with=engine)
>>> app = app_t.alias('app')  # give it an alias

>>> # List all the column names. (The original eval('u'+repr(i)) trick was a
>>> # Python 2 unicode workaround; in Python 3 the keys are already str.)
>>> web_col_name = list(web.columns.keys())
>>> acc_col_name = list(acc.columns.keys())
>>> app_col_name = list(app.columns.keys())
>>> print(web_col_name,acc_col_name,app_col_name)
['id', 'name', 'url', 'alexa', 'country'] ['aid', 'site_id', 'count', 'col_date'] ['id', 'app_name', 'url', 'country']

Aliasing this way works just as well; the drawback is one extra import:

from sqlalchemy.orm import aliased
web = aliased(web_t,name='web')

On cross-schema queries:
Suppose my database instance ORCL has two more users, 'def' and 'ghi'.
'def' owns table 'A', 'ghi' owns table 'B', and user 'abc' has SELECT privileges on both 'A' and 'B'.
Just do this:

from sqlalchemy.ext.automap import automap_base
autobase = automap_base()
sc = ['def','ghi']
for i in sc:
    autobase.metadata.reflect(engine, schema=i)
autobase.prepare()
autobase.metadata.bind = engine
A_t = Table('def.A', autobase.metadata, autoload=True, autoload_with=engine)
B_t = Table('ghi.B', autobase.metadata, autoload=True, autoload_with=engine)
autobase.metadata.reflect(views=True)  # views=True pulls views in as well
web_t = Table('websites', autobase.metadata, autoload=True, autoload_with=engine)

With that, web_t, A_t and B_t can all be used in one query. Don't forget to alias them, or you can easily hit an "identifier is too long" error (ORA-00972).
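
A minimal sketch of that aliasing (the join condition is hypothetical, since the real columns of 'A' and 'B' are not shown here):

a = A_t.alias('a')
b = B_t.alias('b')
# e.g. session.query(a, b).filter(a.c.some_id == b.c.some_id)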

============ Basic Queries ============

To run queries through the ORM, first create a global session instance; statements are submitted to the database through this session.

>>> Session = sessionmaker(bind=engine)  # bind explicitly; in 1.3 the engine would also be found via the bound MetaData
>>> session = Session()

Aliases

Table aliases: giving a table an alias was covered when reflecting the tables; translated into SQL it looks like this:

>>> SQL = '''SELECT w.name FROM websites w'''
>>> pd.read_sql(SQL,engine)
            name
0         Google
1             淘宝
2           菜鸟教程
3             微博
4       Facebook
5             百度
6  stackoverflow

Now for column aliases.
Raw SQL column alias:

>>> SQL = "SELECT name AS n, country AS c FROM Websites"
>>> pd.read_sql(SQL,engine)
               n    c
0         Google  USA
1             淘宝   CN
2           菜鸟教程   CN
3             微博   CN
4       Facebook  USA
5             百度   CN
6  stackoverflow  IND

ORM column alias

>>> ret = session.query(web.c.name.label('n'),web.c.country.label('c'))
>>> pd.DataFrame(ret.all())
               n    c
0         Google  USA
1             淘宝   CN
2           菜鸟教程   CN
3             微博   CN
4       Facebook  USA
5             百度   CN
6  stackoverflow  IND

Querying an entire table

The SELECT statement

Raw SQL

>>> SQL = 'SELECT * FROM Websites'
>>> pd.read_sql(SQL,engine)
   id           name                        url  alexa country
0   1         Google     https://www.google.cm/      1     USA
1   2             淘宝    https://www.taobao.com/     13      CN
2   3           菜鸟教程     http://www.runoob.com/   4689      CN
3   4             微博          http://weibo.com/     20      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   6             百度     https://www.baidu.com/      4      CN
6   7  stackoverflow  http://stackoverflow.com/      0     IND

ORM

>>> ret = session.query(web)
>>> pd.DataFrame(ret.all())
   id           name                        url  alexa country
0   1         Google     https://www.google.cm/      1     USA
1   2             淘宝    https://www.taobao.com/     13      CN
2   3           菜鸟教程     http://www.runoob.com/   4689      CN
3   4             微博          http://weibo.com/     20      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   6             百度     https://www.baidu.com/      4      CN
6   7  stackoverflow  http://stackoverflow.com/      0     IND
>>> pd.read_sql("access_log",engine)
   aid  site_id  count   col_date
0    1        1     45 2016-05-10
1    2        3    100 2016-05-13
2    3        1    230 2016-05-14
3    4        2     10 2016-05-14
4    5        5    205 2016-05-14
5    6        4     13 2016-05-15
6    7        3    220 2016-05-15
7    8        5    545 2016-05-16
8    9        3    201 2016-05-17

Querying a subset of rows

ORM

>>> ret.first()
(1, 'Google', 'https://www.google.cm/', 1, 'USA')
>>> ret = session.query(web).slice(1,3)
>>> pd.DataFrame(ret.all())
   id  name                      url  alexa country
0   2    淘宝  https://www.taobao.com/     13      CN
1   3  菜鸟教程   http://www.runoob.com/   4689      CN
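
slice(1, 3) skips the first row and returns the next two. The same rows can be fetched with the more familiar offset/limit pair (both are standard Query methods):

>>> ret = session.query(web).offset(1).limit(2)  # same two rows as slice(1, 3) above
>>> pd.DataFrame(ret.all())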

Conditional queries

The WHERE clause

Raw SQL

>>> SQL = "SELECT url,alexa FROM Websites WHERE name='淘宝'"
>>> pd.read_sql(SQL,engine)
                       url  alexa
0  https://www.taobao.com/     13

ORM: access a column as table_alias.c.column_name

>>> ret = session.query(*[web.c.url,web.c.alexa]).filter(web.c.name == '淘宝')
>>> pd.DataFrame(ret.all())
                       url  alexa
0  https://www.taobao.com/     13

Raw SQL

>>> SQL = "SELECT url,alexa FROM Websites WHERE id>2 AND country = 'CN'"
>>> pd.read_sql(SQL,engine)
                      url  alexa
0  http://www.runoob.com/   4689
1       http://weibo.com/     20
2  https://www.baidu.com/      4

ORM

>>> ret = session.query(web.c.url,web.c.alexa).filter(web.c.id > 2, web.c.country == "CN")
>>> pd.DataFrame(ret.all())
                      url  alexa
0  http://www.runoob.com/   4689
1       http://weibo.com/     20
2  https://www.baidu.com/      4

Multi-condition queries

First, import a few functions we'll need:

>>> from sqlalchemy import or_, and_, case

AND OR

Raw SQL (note that AND binds tighter than OR, so this is parsed as id<=2 OR (id>2 AND country='CN'))

>>> SQL = "SELECT * FROM Websites WHERE id<=2 or id>2 AND country = 'CN'"
>>> pd.read_sql(SQL,engine)
   id    name                      url  alexa country
0   1  Google   https://www.google.cm/      1     USA
1   2      淘宝  https://www.taobao.com/     13      CN
2   3    菜鸟教程   http://www.runoob.com/   4689      CN
3   4      微博        http://weibo.com/     20      CN
4   6      百度   https://www.baidu.com/      4      CN

ORM

>>> ret = session.query(web).filter(or_(web.c.id<=2, and_(web.c.id>2, web.c.country == "CN")))
>>> pd.DataFrame(ret.all())
   id    name                      url  alexa country
0   1  Google   https://www.google.cm/      1     USA
1   2      淘宝  https://www.taobao.com/     13      CN
2   3    菜鸟教程   http://www.runoob.com/   4689      CN
3   4      微博        http://weibo.com/     20      CN
4   6      百度   https://www.baidu.com/      4      CN

LIKE/NOT LIKE and the % wildcard

Raw SQL

>>> SQL = "SELECT * FROM Websites WHERE url NOT LIKE '%.com%'"
>>> pd.read_sql(SQL,engine)
   id    name                     url  alexa country
0   1  Google  https://www.google.cm/      1     USA

ORM: "~" negates a condition

>>> ret = session.query(web).filter(~web.c.url.like("%.com%"))
>>> pd.DataFrame(ret.all())
   id    name                     url  alexa country
0   1  Google  https://www.google.cm/      1     USA
>>> ret = session.query(web).filter(~web.c.url.ilike("%.com%"))  # use ilike for case-insensitive matching
>>> pd.DataFrame(ret.all())
   id    name                     url  alexa country
0   1  Google  https://www.google.cm/      1     USA
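
like()/ilike() take the raw pattern; the standard column operators also include wrappers that add the % wildcards for you:

>>> ret = session.query(web).filter(web.c.url.startswith('https'))  # LIKE 'https%'
>>> ret = session.query(web).filter(web.c.name.contains('oo'))      # LIKE '%oo%'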

IN/NOT IN

Raw SQL (for NOT IN, just put NOT in front)

>>> SQL = "SELECT * FROM Websites WHERE name IN ('Google','菜鸟教程')"
>>> pd.read_sql(SQL,engine)
   id    name                     url  alexa country
0   1  Google  https://www.google.cm/      1     USA
1   3    菜鸟教程  http://www.runoob.com/   4689      CN

ORM (for NOT IN, just put "~" in front; an explicit spelling is sketched below)

>>> ret = session.query(web).filter(web.c.name.in_(['Google','菜鸟教程']))
>>> pd.DataFrame(ret.all())
   id    name                     url  alexa country
0   1  Google  https://www.google.cm/      1     USA
1   3    菜鸟教程  http://www.runoob.com/   4689      CN
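
Besides prefixing "~", SQLAlchemy 1.3 also offers an explicit spelling:

>>> ret = session.query(web).filter(web.c.name.notin_(['Google','菜鸟教程']))  # returns the other five rows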

IS NULL/IS NOT NULL

Raw SQL

>>> SQL = "SELECT * FROM Websites WHERE name IS NULL"
>>> pd.read_sql(SQL,engine)
Empty DataFrame
Columns: [id, name, url, alexa, country]
Index: []

ORM

>>> ret = session.query(web).filter(web.c.name.is_(None))
>>> pd.DataFrame(ret.all())
Empty DataFrame
Columns: []
Index: []
>>> ret = session.query(web).filter(web.c.name == None)  # or like this
>>> pd.DataFrame(ret.all())
Empty DataFrame
Columns: []
Index: []

Raw SQL

>>> SQL = "SELECT * FROM Websites WHERE name IS NOT NULL"
>>> pd.read_sql(SQL,engine)
   id           name                        url  alexa country
0   1         Google     https://www.google.cm/      1     USA
1   2             淘宝    https://www.taobao.com/     13      CN
2   3           菜鸟教程     http://www.runoob.com/   4689      CN
3   4             微博          http://weibo.com/     20      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   6             百度     https://www.baidu.com/      4      CN
6   7  stackoverflow  http://stackoverflow.com/      0     IND

ORM

>>> ret = session.query(web).filter(web.c.name.isnot(None))
>>> pd.DataFrame(ret.all())
   id           name                        url  alexa country
0   1         Google     https://www.google.cm/      1     USA
1   2             淘宝    https://www.taobao.com/     13      CN
2   3           菜鸟教程     http://www.runoob.com/   4689      CN
3   4             微博          http://weibo.com/     20      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   6             百度     https://www.baidu.com/      4      CN
6   7  stackoverflow  http://stackoverflow.com/      0     IND
>>> ret = session.query(web).filter(web.c.name != None)  # or like this
>>> pd.DataFrame(ret.all())
   id           name                        url  alexa country
0   1         Google     https://www.google.cm/      1     USA
1   2             淘宝    https://www.taobao.com/     13      CN
2   3           菜鸟教程     http://www.runoob.com/   4689      CN
3   4             微博          http://weibo.com/     20      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   6             百度     https://www.baidu.com/      4      CN
6   7  stackoverflow  http://stackoverflow.com/      0     IND

BETWEEN

Raw SQL

>>> SQL = "SELECT * FROM Websites WHERE alexa BETWEEN 1 AND 20"
>>> pd.read_sql(SQL,engine)
   id      name                        url  alexa country
0   1    Google     https://www.google.cm/      1     USA
1   2        淘宝    https://www.taobao.com/     13      CN
2   4        微博          http://weibo.com/     20      CN
3   5  Facebook  https://www.facebook.com/      3     USA
4   6        百度     https://www.baidu.com/      4      CN

ORM

>>> ret = session.query(web).filter(web.c.alexa.between(1, 20))
>>> pd.DataFrame(ret.all())
   id      name                        url  alexa country
0   1    Google     https://www.google.cm/      1     USA
1   2        淘宝    https://www.taobao.com/     13      CN
2   4        微博          http://weibo.com/     20      CN
3   5  Facebook  https://www.facebook.com/      3     USA
4   6        百度     https://www.baidu.com/      4      CN

The CASE WHEN statement

Raw SQL

>>> SQL = '''SELECT name,CASE
... WHEN alexa BETWEEN 1 AND 10 THEN '1 to 10'
... WHEN alexa BETWEEN 11 AND 100 THEN '11 to 100'
... ELSE 'more than 100'
... END AS class
... FROM Websites'''
>>> pd.read_sql(SQL,engine)
            name          class
0         Google        1 to 10
1             淘宝      11 to 100
2           菜鸟教程  more than 100
3             微博      11 to 100
4       Facebook        1 to 10
5             百度        1 to 10
6  stackoverflow  more than 100

ORM

>>> ret = session.query(web.c.name,case([(web.c.alexa.between(1,10),'1 to 10'),(web.c.alexa.between(11,100),'11 to 100')],else_='more than 100').label('class'))
>>> pd.DataFrame(ret.all())
            name          class
0         Google        1 to 10
1             淘宝      11 to 100
2           菜鸟教程  more than 100
3             微博      11 to 100
4       Facebook        1 to 10
5             百度        1 to 10
6  stackoverflow  more than 100

Data-tidying queries

Deduplication: SELECT DISTINCT

>>> from sqlalchemy import distinct

Raw SQL

>>> SQL = "SELECT DISTINCT country FROM Websites"
>>> pd.read_sql(SQL,engine)
  country
0     USA
1     IND
2      CN

ORM

>>> ret = session.query(web.c.country).distinct()
>>> pd.DataFrame(ret.all())
  country
0     USA
1     IND
2      CN

Or:

>>> ret = session.query(distinct(web.c.country).label('country'))
>>> pd.DataFrame(ret.all())
  country
0     USA
1     IND
2      CN

Sorting: ORDER BY

Raw SQL

>>> SQL = "SELECT * FROM Websites ORDER BY alexa ASC"#ASC正序
>>> SQL = "SELECT * FROM Websites ORDER BY alexa DESC"#DESC倒序
>>> pd.read_sql(SQL,engine)
   id           name                        url  alexa country
0   3           菜鸟教程     http://www.runoob.com/   4689      CN
1   4             微博          http://weibo.com/     20      CN
2   2             淘宝    https://www.taobao.com/     13      CN
3   6             百度     https://www.baidu.com/      4      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   1         Google     https://www.google.cm/      1     USA
6   7  stackoverflow  http://stackoverflow.com/      0     IND

ORM

>>> ret = session.query(web).order_by(web.c.alexa.asc())   # ascending
>>> ret = session.query(web).order_by(web.c.alexa.desc())  # descending (executed below)
>>> pd.DataFrame(ret.all())
   id           name                        url  alexa country
0   3           菜鸟教程     http://www.runoob.com/   4689      CN
1   4             微博          http://weibo.com/     20      CN
2   2             淘宝    https://www.taobao.com/     13      CN
3   6             百度     https://www.baidu.com/      4      CN
4   5       Facebook  https://www.facebook.com/      3     USA
5   1         Google     https://www.google.cm/      1     USA
6   7  stackoverflow  http://stackoverflow.com/      0     IND
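
order_by() also accepts multiple keys; for example, sorting by country first and by alexa descending within each country:

>>> ret = session.query(web).order_by(web.c.country.asc(), web.c.alexa.desc())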

Joins: JOIN ... ON

Raw SQL INNER JOIN: returns every pairing of rows that satisfies Websites.id=access_log.site_id

>>> SQL = """SELECT * FROM Websites JOIN access_log ON Websites.id=access_log.site_id"""
>>> pd.read_sql(SQL,engine)
   id      name                        url  alexa country  aid  site_id  count   col_date
0   1    Google     https://www.google.cm/      1     USA    1        1     45 2016-05-10
1   1    Google     https://www.google.cm/      1     USA    3        1    230 2016-05-14
2   2        淘宝    https://www.taobao.com/     13      CN    4        2     10 2016-05-14
3   3      菜鸟教程     http://www.runoob.com/   4689      CN    2        3    100 2016-05-13
4   3      菜鸟教程     http://www.runoob.com/   4689      CN    9        3    201 2016-05-17
5   3      菜鸟教程     http://www.runoob.com/   4689      CN    7        3    220 2016-05-15
6   4        微博          http://weibo.com/     20      CN    6        4     13 2016-05-15
7   5  Facebook  https://www.facebook.com/      3     USA    8        5    545 2016-05-16
8   5  Facebook  https://www.facebook.com/      3     USA    5        5    205 2016-05-14
>>> SQL = """SELECT * FROM Websites INNER JOIN access_log ON Websites.id=access_log.site_id"""
>>> pd.read_sql(SQL,engine)
   id      name                        url  alexa country  aid  site_id  count   col_date
0   1    Google     https://www.google.cm/      1     USA    1        1     45 2016-05-10
1   1    Google     https://www.google.cm/      1     USA    3        1    230 2016-05-14
2   2        淘宝    https://www.taobao.com/     13      CN    4        2     10 2016-05-14
3   3      菜鸟教程     http://www.runoob.com/   4689      CN    2        3    100 2016-05-13
4   3      菜鸟教程     http://www.runoob.com/   4689      CN    9        3    201 2016-05-17
5   3      菜鸟教程     http://www.runoob.com/   4689      CN    7        3    220 2016-05-15
6   4        微博          http://weibo.com/     20      CN    6        4     13 2016-05-15
7   5  Facebook  https://www.facebook.com/      3     USA    8        5    545 2016-05-16
8   5  Facebook  https://www.facebook.com/      3     USA    5        5    205 2016-05-14

ORM INNER JOIN

>>> ret = session.query(web, acc).filter(web.c.id == acc.c.site_id)  # an inner join expressed through filter
>>> pd.DataFrame(ret.all())
   id      name                        url  alexa country  aid  site_id  count   col_date
0   1    Google     https://www.google.cm/      1     USA    1        1     45 2016-05-10
1   1    Google     https://www.google.cm/      1     USA    3        1    230 2016-05-14
2   2        淘宝    https://www.taobao.com/     13      CN    4        2     10 2016-05-14
3   3      菜鸟教程     http://www.runoob.com/   4689      CN    2        3    100 2016-05-13
4   3      菜鸟教程     http://www.runoob.com/   4689      CN    9        3    201 2016-05-17
5   3      菜鸟教程     http://www.runoob.com/   4689      CN    7        3    220 2016-05-15
6   4        微博          http://weibo.com/     20      CN    6        4     13 2016-05-15
7   5  Facebook  https://www.facebook.com/      3     USA    8        5    545 2016-05-16
8   5  Facebook  https://www.facebook.com/      3     USA    5        5    205 2016-05-14
>>> ret = session.query(web, acc).join(acc,web.c.id == acc.c.site_id)  # an inner join with the ON clause given inline
>>> pd.DataFrame(ret.all())
   id      name                        url  alexa country  aid  site_id  count   col_date
0   1    Google     https://www.google.cm/      1     USA    1        1     45 2016-05-10
1   1    Google     https://www.google.cm/      1     USA    3        1    230 2016-05-14
2   2        淘宝    https://www.taobao.com/     13      CN    4        2     10 2016-05-14
3   3      菜鸟教程     http://www.runoob.com/   4689      CN    2        3    100 2016-05-13
4   3      菜鸟教程     http://www.runoob.com/   4689      CN    9        3    201 2016-05-17
5   3      菜鸟教程     http://www.runoob.com/   4689      CN    7        3    220 2016-05-15
6   4        微博          http://weibo.com/     20      CN    6        4     13 2016-05-15
7   5  Facebook  https://www.facebook.com/      3     USA    8        5    545 2016-05-16
8   5  Facebook  https://www.facebook.com/      3     USA    5        5    205 2016-05-14
>>> ret = session.query(web).join(acc)  # inner join inferred from the foreign key; only web's columns are selected
>>> pd.DataFrame(ret.all())
   id      name                        url  alexa country
0   1    Google     https://www.google.cm/      1     USA
1   1    Google     https://www.google.cm/      1     USA
2   2        淘宝    https://www.taobao.com/     13      CN
3   3      菜鸟教程     http://www.runoob.com/   4689      CN
4   3      菜鸟教程     http://www.runoob.com/   4689      CN
5   3      菜鸟教程     http://www.runoob.com/   4689      CN
6   4        微博          http://weibo.com/     20      CN
7   5  Facebook  https://www.facebook.com/      3     USA
8   5  Facebook  https://www.facebook.com/      3     USA
>>> ret = session.query(acc).join(web)  # inner join inferred from the foreign key; only acc's columns are selected
>>> pd.DataFrame(ret.all())
   aid  site_id  count   col_date
0    1        1     45 2016-05-10
1    2        3    100 2016-05-13
2    3        1    230 2016-05-14
3    4        2     10 2016-05-14
4    5        5    205 2016-05-14
5    6        4     13 2016-05-15
6    7        3    220 2016-05-15
7    8        5    545 2016-05-16
8    9        3    201 2016-05-17
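
When a join does not behave the way you expect, printing the SQL the ORM generated usually explains why; Query.statement plus compile() is standard SQLAlchemy:

>>> ret = session.query(web).join(acc)
>>> print(ret.statement.compile(engine, compile_kwargs={"literal_binds": True}))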

Raw SQL LEFT JOIN: rows with no match in the right-hand table are shown as well

>>> SQL = """SELECT * FROM Websites LEFT JOIN access_log ON Websites.id=access_log.site_id"""
>>> pd.read_sql(SQL,engine)
    id           name                        url  alexa country  aid  site_id  count   col_date
0    1         Google     https://www.google.cm/      1     USA  1.0      1.0   45.0 2016-05-10
1    1         Google     https://www.google.cm/      1     USA  3.0      1.0  230.0 2016-05-14
2    2             淘宝    https://www.taobao.com/     13      CN  4.0      2.0   10.0 2016-05-14
3    3           菜鸟教程     http://www.runoob.com/   4689      CN  2.0      3.0  100.0 2016-05-13
4    3           菜鸟教程     http://www.runoob.com/   4689      CN  9.0      3.0  201.0 2016-05-17
5    3           菜鸟教程     http://www.runoob.com/   4689      CN  7.0      3.0  220.0 2016-05-15
6    4             微博          http://weibo.com/     20      CN  6.0      4.0   13.0 2016-05-15
7    5       Facebook  https://www.facebook.com/      3     USA  8.0      5.0  545.0 2016-05-16
8    5       Facebook  https://www.facebook.com/      3     USA  5.0      5.0  205.0 2016-05-14
9    6             百度     https://www.baidu.com/      4      CN  NaN      NaN    NaN        NaT
10   7  stackoverflow  http://stackoverflow.com/      0     IND  NaN      NaN    NaN        NaT

ORM: passing isouter=True is the equivalent of LEFT JOIN (there is no right-join flag; to get a right join, swap the two tables)

>>> ret = session.query(web,acc).join(acc, isouter=True)  # left join via the foreign key
>>> pd.DataFrame(ret.all())
    id           name                        url  alexa country  aid  site_id  count   col_date
0    1         Google     https://www.google.cm/      1     USA  1.0      1.0   45.0 2016-05-10
1    1         Google     https://www.google.cm/      1     USA  3.0      1.0  230.0 2016-05-14
2    2             淘宝    https://www.taobao.com/     13      CN  4.0      2.0   10.0 2016-05-14
3    3           菜鸟教程     http://www.runoob.com/   4689      CN  2.0      3.0  100.0 2016-05-13
4    3           菜鸟教程     http://www.runoob.com/   4689      CN  9.0      3.0  201.0 2016-05-17
5    3           菜鸟教程     http://www.runoob.com/   4689      CN  7.0      3.0  220.0 2016-05-15
6    4             微博          http://weibo.com/     20      CN  6.0      4.0   13.0 2016-05-15
7    5       Facebook  https://www.facebook.com/      3     USA  8.0      5.0  545.0 2016-05-16
8    5       Facebook  https://www.facebook.com/      3     USA  5.0      5.0  205.0 2016-05-14
9    6             百度     https://www.baidu.com/      4      CN  NaN      NaN    NaN        NaT
10   7  stackoverflow  http://stackoverflow.com/      0     IND  NaN      NaN    NaN        NaT
>>> ret = session.query(web,acc).join(acc,full=True)  # full join via the foreign key
>>> pd.DataFrame(ret.all())
    id           name                        url  alexa country  aid  site_id  count   col_date
0    1         Google     https://www.google.cm/      1     USA  1.0      1.0   45.0 2016-05-10
1    3           菜鸟教程     http://www.runoob.com/   4689      CN  2.0      3.0  100.0 2016-05-13
2    1         Google     https://www.google.cm/      1     USA  3.0      1.0  230.0 2016-05-14
3    2             淘宝    https://www.taobao.com/     13      CN  4.0      2.0   10.0 2016-05-14
4    5       Facebook  https://www.facebook.com/      3     USA  5.0      5.0  205.0 2016-05-14
5    4             微博          http://weibo.com/     20      CN  6.0      4.0   13.0 2016-05-15
6    3           菜鸟教程     http://www.runoob.com/   4689      CN  7.0      3.0  220.0 2016-05-15
7    5       Facebook  https://www.facebook.com/      3     USA  8.0      5.0  545.0 2016-05-16
8    3           菜鸟教程     http://www.runoob.com/   4689      CN  9.0      3.0  201.0 2016-05-17
9    6             百度     https://www.baidu.com/      4      CN  NaN      NaN    NaN        NaT
10   7  stackoverflow  http://stackoverflow.com/      0     IND  NaN      NaN    NaN        NaT

Combining tables: UNION

Raw SQL UNION: selects the distinct country values from the "Websites" and "apps" tables (duplicates removed):

>>> SQL = '''SELECT country FROM Websites
... UNION
... SELECT country FROM apps
... ORDER BY country'''
>>> pd.read_sql(SQL,engine)
  country
0      CN
1     IND
2     USA

ORM

>>> ret = session.query(web.c.country).union(session.query(app.c.country))
>>> pd.DataFrame(ret.all())
  country
0      CN
1     IND
2     USA

Raw SQL UNION ALL: selects all country values from the "Websites" and "apps" tables (duplicates kept):

>>> SQL = '''SELECT country FROM Websites
... UNION ALL
... SELECT country FROM apps
... ORDER BY country'''
>>> pd.read_sql(SQL,engine)
  country
0      CN
1      CN
2      CN
3      CN
4      CN
5      CN
6      CN
7     IND
8     USA
9     USA

ORM

>>> ret = session.query(web.c.country).union_all(session.query(app.c.country))
>>> pd.DataFrame(ret.all())
  country
0     USA
1      CN
2      CN
3      CN
4     USA
5      CN
6     IND
7      CN
8      CN
9      CN
>>> # cte() is the equivalent of Oracle's WITH AS. Unlike the approach above, it materializes a temporary table, so the same result set is not queried repeatedly, which improves performance.
>>> sub = session.query(web.c.country).union_all(session.query(app.c.country)).cte("temp_table")
>>> pd.DataFrame(session.query(sub).all())
  web_country
0         USA
1          CN
2          CN
3          CN
4         USA
5          CN
6         IND
7          CN
8          CN
9          CN
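
The payoff of cte() is reuse: the temporary table is written once and can be referenced several times within the same statement. A small sketch aggregating over it (the column name web_country is taken from the output above):

>>> ret = session.query(sub.c.web_country, func.count('*').label('cnt')).group_by(sub.c.web_country)
>>> pd.DataFrame(ret.all())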

Concatenation: CONCAT()

Raw SQL

>>> SQL = "SELECT CONCAT(id, name) FROM Websites"
>>> pd.read_sql(SQL,engine)
  CONCAT(ID,NAME)
0         1Google
1             2淘宝
2           3菜鸟教程
3             4微博
4       5Facebook
5             6百度
6  7stackoverflow

ORM

>>> ret = session.query(func.concat(web.c.id, web.c.name).label('id,name'))
>>> pd.DataFrame(ret.all())
          id,name
0         1Google
1             2淘宝
2           3菜鸟教程
3             4微博
4       5Facebook
5             6百度
6  7stackoverflow

Raw SQL

>>> SQL = "SELECT CONCAT(CONCAT(id, name), url) FROM Websites"
>>> pd.read_sql(SQL,engine)
               CONCAT(CONCAT(ID,NAME),URL)
0            1Googlehttps://www.google.cm/
1               2淘宝https://www.taobao.com/
2              3菜鸟教程http://www.runoob.com/
3                     4微博http://weibo.com/
4       5Facebookhttps://www.facebook.com/
5                6百度https://www.baidu.com/
6  7stackoverflowhttp://stackoverflow.com/

ORM

>>> ret = session.query(func.concat(func.concat(web.c.id, web.c.name),web.c.url).label('id,name,url'))
>>> pd.DataFrame(ret.all())
                               id,name,url
0            1Googlehttps://www.google.cm/
1               2淘宝https://www.taobao.com/
2              3菜鸟教程http://www.runoob.com/
3                     4微博http://weibo.com/
4       5Facebookhttps://www.facebook.com/
5                6百度https://www.baidu.com/
6  7stackoverflowhttp://stackoverflow.com/
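
For two string columns, the + operator compiles to Oracle's || concatenation (standard behaviour of SQLAlchemy's String type), which avoids nesting CONCAT() calls:

>>> ret = session.query((web.c.name + web.c.url).label('name_url'))
>>> pd.DataFrame(ret.all())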

Advanced queries

Aggregate functions

Average: AVG()

Raw SQL: select the "site_id" and "count" of rows whose count is above the average:

>>> SQL = "SELECT site_id, count FROM access_log WHERE count > (SELECT AVG(count) FROM access_log)"
>>> pd.read_sql(SQL,engine)
   site_id  count
0        1    230
1        5    205
2        3    220
3        5    545
4        3    201

ORM

>>> ret = session.query(acc.c.site_id,acc.c.count).filter(acc.c.count>session.query(func.avg(acc.c.count)))
>>> pd.DataFrame(ret.all())
   site_id  count
0        1    230
1        5    205
2        3    220
3        5    545
4        3    201
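
In the filter above, the inner Query is coerced to a scalar subquery automatically. The explicit SQLAlchemy 1.3 spelling is as_scalar() (renamed scalar_subquery() in 1.4):

>>> avg_count = session.query(func.avg(acc.c.count)).as_scalar()
>>> ret = session.query(acc.c.site_id, acc.c.count).filter(acc.c.count > avg_count)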

Counting: COUNT()

Raw SQL

>>> SQL = "SELECT count(*) FROM access_log"
>>> pd.read_sql(SQL,engine)
   COUNT(*)
0         9

ORM

>>> session.query(acc).count()  # a convenient shortcut
9
>>> session.query(func.count('*')).select_from(acc).scalar()  # the standard approach
9
>>> session.query(func.count(acc.c.aid)).scalar()  # count the primary key acc.c.aid instead of using select_from()
9

Raw SQL: count the number of distinct site_id values in the "access_log" table:

>>> SQL = "SELECT COUNT(DISTINCT site_id) AS nums FROM access_log"
>>> pd.read_sql(SQL,engine)
   nums
0     5

ORM

>>> session.query(distinct(acc.c.site_id)).count()  # a convenient shortcut
5
>>> ret = session.query(func.count(distinct(acc.c.site_id)).label('nums'))
>>> pd.DataFrame(ret.all())
   nums
0     5
>>> # and, while we're at it, a subquery
>>> retSub = session.query(acc.c.site_id).distinct().subquery('t')
>>> ret = session.query(func.count(retSub.c.site_id).label('nums'))
>>> pd.DataFrame(ret.all())
   nums
0     5

Maximum: MAX()

Raw SQL

>>> SQL = "SELECT MAX(alexa) AS max_alexa FROM Websites"
>>> pd.read_sql(SQL,engine)
   max_alexa
0       4689

ORM

>>> ret = session.query(func.max(web.c.alexa).label('max_alexa'))
>>> pd.DataFrame(ret.all())
   max_alexa
0       4689

Minimum: MIN()

Raw SQL

>>> SQL = "SELECT MIN(alexa) AS min_alexa FROM Websites"
>>> pd.read_sql(SQL,engine)
   min_alexa
0          0

ORM

>>> ret = session.query(func.min(web.c.alexa).label('min_alexa'))
>>> pd.DataFrame(ret.all())
   min_alexa
0          0

Sum: SUM()

Raw SQL

>>> SQL = "SELECT SUM(count) AS nums FROM access_log"
>>> pd.read_sql(SQL,engine)
   nums
0  1569

ORM

>>> ret = session.query(func.sum(acc.c.count).label("nums"))
>>> pd.DataFrame(ret.all())
   nums
0  1569

Formatting

Raw SQL

>>> SQL = "SELECT aid, site_id, count, to_char(col_date,'yyyymm') as ym FROM access_log"#将字符串转日期用to_date
>>> pd.read_sql(SQL,engine)
   aid  site_id  count      ym
0    1        1     45  201605
1    2        3    100  201605
2    3        1    230  201605
3    4        2     10  201605
4    5        5    205  201605
5    6        4     13  201605
6    7        3    220  201605
7    8        5    545  201605
8    9        3    201  201605

ORM

>>> ret = session.query(acc.c.aid, acc.c.site_id, acc.c.count,
...                     func.to_char(acc.c.col_date,'yyyymm').label("ym"))
>>> pd.DataFrame(ret.all())
   aid  site_id  count      ym
0    1        1     45  201605
1    2        3    100  201605
2    3        1    230  201605
3    4        2     10  201605
4    5        5    205  201605
5    6        4     13  201605
6    7        3    220  201605
7    8        5    545  201605
8    9        3    201  201605
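
Going the other direction (string to DATE) uses to_date, as noted above; for example, restricting the log to a single day:

>>> ret = session.query(acc).filter(acc.c.col_date == func.to_date('2016-05-14','yyyy-mm-dd'))
>>> pd.DataFrame(ret.all())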

Grouping and aggregation: GROUP BY and HAVING

Raw SQL

>>> SQL = """SELECT site_id, to_char(col_date,'yyyymm') as ym,SUM(count) as sc
 FROM access_log GROUP BY site_id, to_char(col_date,'yyyymm')"""
>>> pd.read_sql(SQL,engine)
   site_id      ym   sc
0        1  201605  275
1        2  201605   10
2        3  201605  521
3        5  201605  750
4        4  201605   13

ORM

>>> cols = [acc.c.site_id,func.to_char(acc.c.col_date,'yyyymm').label('ym')]
>>> ret = session.query(*cols,func.sum(acc.c.count).label('sc')).group_by(*cols)
>>> pd.DataFrame(ret.all())
   site_id      ym   sc
0        1  201605  275
1        2  201605   10
2        3  201605  521
3        5  201605  750
4        4  201605   13

Note: do not write it the following way:

# Wrong: don't build the expressions a second time. Each fresh func.to_char(...)
# call gets its own bind parameter, so the SELECT and GROUP BY expressions may
# no longer match and Oracle rejects the statement; reuse the objects as above.
session.query(acc.c.site_id,func.to_char(acc.c.col_date,'yyyymm').label('ym'),
                       func.sum(acc.c.count).label('sc')).group_by(acc.c.site_id,
                              func.to_char(acc.c.col_date,'yyyymm').label('ym'))

Raw SQL

>>> SQL = """SELECT Websites.name, SUM(access_log.count) AS nums FROM Websites
... INNER JOIN access_log
... ON Websites.id=access_log.site_id
... WHERE Websites.alexa < 200
... GROUP BY Websites.name
... HAVING SUM(access_log.count) > 200"""
>>> pd.read_sql(SQL,engine)
       name  nums
0    Google   275
1  Facebook   750

ORM

>>> # join implied by the foreign key
>>> ret = session.query(web.c.name,func.sum(acc.c.count).label("nums")).\
... join(acc).filter(web.c.alexa < 200).\
... group_by(web.c.name).having(func.sum(acc.c.count) > 200)
>>> pd.DataFrame(ret.all())
       name  nums
0    Google   275
1  Facebook   750
>>> # join condition written out explicitly
>>> ret = session.query(web.c.name,func.sum(acc.c.count).label("nums")).\
... join(acc).filter(web.c.id==acc.c.site_id,web.c.alexa < 200).\
... group_by(web.c.name).having(func.sum(acc.c.count) > 200)
>>> pd.DataFrame(ret.all())
       name  nums
0    Google   275
1  Facebook   750

EXISTS/NOT EXISTS

Raw SQL EXISTS

>>> SQL = "SELECT Websites.name, Websites.url FROM Websites \
... WHERE EXISTS (SELECT count FROM access_log WHERE Websites.id = access_log.site_id AND count > 200)"
>>> pd.read_sql(SQL,engine)
       name                        url
0    Google     https://www.google.cm/
1      菜鸟教程     http://www.runoob.com/
2  Facebook  https://www.facebook.com/

ORM

>>> ret = session.query().filter(web.c.id==acc.c.site_id,acc.c.count>200).exists()
>>> ret = session.query(web.c.name,web.c.url).filter(ret)
>>> pd.DataFrame(ret.all())
       name                        url
0    Google     https://www.google.cm/
1      菜鸟教程     http://www.runoob.com/
2  Facebook  https://www.facebook.com/
>>> # another approach, closer to the raw SQL
>>> from sqlalchemy.sql import exists
>>> ret = exists([acc.c.count]).where(and_(web.c.id==acc.c.site_id,acc.c.count>200))
>>> ret = session.query(web.c.name,web.c.url).filter(ret)
>>> pd.DataFrame(ret.all())
       name                        url
0    Google     https://www.google.cm/
1      菜鸟教程     http://www.runoob.com/
2  Facebook  https://www.facebook.com/

Raw SQL NOT EXISTS

>>> SQL = "SELECT Websites.name, Websites.url FROM Websites \
... WHERE NOT EXISTS (SELECT count FROM access_log WHERE Websites.id = access_log.site_id AND count > 200)"
>>> pd.read_sql(SQL,engine)
            name                        url
0             淘宝    https://www.taobao.com/
1             微博          http://weibo.com/
2             百度     https://www.baidu.com/
3  stackoverflow  http://stackoverflow.com/

ORM

>>> ret = session.query().filter(web.c.id==acc.c.site_id,acc.c.count>200).exists()
>>> ret = session.query(web.c.name,web.c.url).filter(~ret)
>>> pd.DataFrame(ret.all())
            name                        url
0             淘宝    https://www.taobao.com/
1             微博          http://weibo.com/
2             百度     https://www.baidu.com/
3  stackoverflow  http://stackoverflow.com/
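
Finally, since every result above is already a DataFrame, persisting it for a report is a one-liner; to_csv/to_excel are standard pandas (to_excel additionally needs openpyxl or xlsxwriter installed):

>>> df = pd.DataFrame(ret.all())
>>> df.to_csv('result.csv', index=False, encoding='utf-8-sig')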

(End)
References: https://www.runoob.com/sql/sql-tutorial.html
