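Adventure Project Analysis (Part 1)

This article summarizes an analysis of the Adventure project case study. The data is processed in Jupyter, the processed results are stored in a database, and Power BI connects to that database for visualization.

Contents

Project Overview
Analysis Approach and Process
Building the PPT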

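I. Project Overview

Company Background

Adventure Works Cycle is a domestic manufacturing company that produces metal and composite bicycles and sells them in markets nationwide. It has two main sales channels. Early on it relied mainly on distributors, but after meeting its 2018 revenue target, in 2019 it began acquiring online customers through its own website to expand its market further.

Analysis Background

In December 2019 we need to report November 2019 bicycle sales to management, providing data support for fine-grained operations and helping to pinpoint target customer groups.

Analysis Goals

1. Set the sales strategy and adjust the product mix to sustain fast growth, increase revenue, and win more market share.
2. Continuously monitor and analyze company-wide bicycle sales to track the current state and trend of sales, providing a basis for setting, adjusting, and reviewing the sales strategy and for improving the product mix.

Data Sources

Based on the business requirements, three tables were selected from the database:
1. ods_sales_orders, the order detail table, used for user behavior analysis.
2. dw_customer_order, the table aggregated by time, region, and product, used for overall sales performance, regional sales performance, product sales performance, and best-seller analysis.
3. ods_customer, the daily new customer table, used for user behavior analysis.

ods_sales_orders: order detail table

dw_customer_order: table aggregated by time, region, and product

ods_customer: daily new customer table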

II. Analysis Approach and Process

Analysis Approach

Analysis Process

0. Inspecting the Dataset

(1) Import the common packages

# Data processing modules
import pandas as pd
import numpy as np
# Use pymysql as the MySQL driver
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
import datetime

(2) Load the dataset

# Read the source data from MySQL: load dw_customer_order into a DataFrame named gather_customer_order
# Create the database engine
engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83/adventure_ods?charset=utf8')
yuan = engine
gather_customer_order = pd.read_sql_query('select * from dw_customer_order ',con = yuan)

(3) A first look at the dataset

gather_customer_order.head()

gather_customer_order.info()

To make the later month-by-month analysis easier, add a field create_year_month that stores the year and month of each record.

# Derive the create_year_month field from create_date
gather_customer_order['create_year_month'] = gather_customer_order['create_date'].apply(lambda x :x.strftime('%Y-%m'))
gather_customer_order['create_year_month'].head()
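
As a small illustration of the year-month field, here is a sketch on made-up data. The `sample` frame and its values are hypothetical stand-ins for gather_customer_order. Converting with `pd.to_datetime` first is an assumption on my part: it lets the same line work whether create_date arrives as a string or as a date.

```python
import pandas as pd

# Hypothetical sample standing in for gather_customer_order
sample = pd.DataFrame({
    'create_date': ['2019-10-31', '2019-11-01', '2019-11-15'],
    'order_num': [3, 5, 2],
})

# Normalize to datetime, then format as 'YYYY-MM' with the vectorized .dt accessor
sample['create_date'] = pd.to_datetime(sample['create_date'])
sample['create_year_month'] = sample['create_date'].dt.strftime('%Y-%m')
print(sample['create_year_month'].tolist())
```

Because the result is a zero-padded string, it sorts chronologically and can be used directly as a groupby key.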

(4) Keep only the bicycle data

# Keep the rows whose product category cplb_zw is '自行车' (bicycle) as the new gather_customer_order
gather_customer_order=gather_customer_order.loc[gather_customer_order['cplb_zw']=='自行车']
gather_customer_order

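1. Overall Sales Performance: bicycle sales from January to November 2019

Field glossary
create_date: order date
product_name: product name
cpzl_zw: product sub-category
cplb_zw: product category
order_num: number of units sold
customer_num: number of purchasing customers
sum_amount: sales revenue
is_current_year: whether in the current year (1: yes, 0: no)
is_last_year: whether in the previous year (1: yes, 0: no)
is_yesterday: whether yesterday (1: yes, 0: no)
is_today: whether today (1: yes, 0: no)
is_current_month: whether in the current month (1: yes, 0: no)
is_current_quarter: whether in the current quarter (1: yes, 0: no)
chinese_province: province
chinese_city: city
chinese_territory: region

pd.set_option('display.float_format', lambda x: '%.6f' % x)  # turn off scientific notation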

(1) Overall bicycle sales volume

# Aggregate order volume and sales revenue by month: groupby creates a new object, order_num and sum_amount are summed,
# the index is reset, and the result is sorted by month in descending order
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num' : 'sum' , 'sum_amount' : 'sum'}).reset_index().\
                             sort_values('create_year_month',ascending = False)
overall_sales_performance

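# Add a column order_num_diff: the month-over-month change in bicycle order volume (this month vs. the previous month)
order_num_diff =list((overall_sales_performance.order_num.diff())/(overall_sales_performance.order_num)/-1)
order_num_diff.pop(0)
order_num_diff.append(0)
order_num_diff

The month-over-month rate is computed with diff(), which returns each row minus the row before it. Because the table is sorted by month in descending order, that difference is the earlier month minus the later month; dividing by the earlier month's value and by -1 gives the later month's growth rate. Dropping the leading NaN and appending 0 shifts every rate onto its own month's row, and the earliest month, which has no predecessor, gets 0. The list is then converted to a DataFrame with a single column named order_num_diff and concatenated onto overall_sales_performance.

order_num_diff=pd.DataFrame(order_num_diff, columns=['order_num_diff'])

# Rebuild the index as 0..n-1 in the sorted order so that concat lines the rows up correctly
overall_sales_performance =overall_sales_performance.set_index('create_year_month').reset_index()
overall_sales_performance = pd.concat([overall_sales_performance ,order_num_diff] ,axis = 1)
overall_sales_performance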
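
To check the month-over-month arithmetic, the sketch below reproduces the diff-based steps on three made-up months and compares them with `pct_change()` on the same data sorted in ascending order. The `monthly` frame and its numbers are hypothetical; the `pct_change()` route is an alternative shown for comparison, not the method used in this article.

```python
import pandas as pd

# Hypothetical monthly totals, sorted by month in descending order as in the article
monthly = pd.DataFrame({
    'create_year_month': ['2019-11', '2019-10', '2019-09'],
    'order_num': [120, 100, 80],
})

# The article's approach: diff on descending data, flip the sign, then shift up one row
diff_based = list(monthly.order_num.diff() / monthly.order_num / -1)
diff_based.pop(0)      # drop the leading NaN
diff_based.append(0)   # the earliest month has no predecessor

# Alternative: sort ascending and let pct_change compute (current - previous) / previous
asc = monthly.sort_values('create_year_month').reset_index(drop=True)
asc['order_num_diff'] = asc['order_num'].pct_change().fillna(0)

print(diff_based)
print(asc)
```

Both routes assign the same rate to each month (0.2 for November and 0.25 for October in this toy data, up to floating-point rounding); only the row order differs.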

(2) Overall bicycle sales revenue

# Add a column sum_amount_diff: the month-over-month change in bicycle sales revenue; same method, different field
sum_amount_diff = list((overall_sales_performance.sum_amount.diff())/(overall_sales_performance.sum_amount)/-1)
sum_amount_diff.pop(0)
sum_amount_diff.append(0)
sum_amount_diff
# Convert the month-over-month list to a DataFrame
sum_amount_diff = pd.DataFrame(sum_amount_diff,columns=['sum_amount_diff'])
sum_amount_diff
# Concatenate overall_sales_performance and sum_amount_diff
overall_sales_performance = pd.concat([overall_sales_performance ,sum_amount_diff],axis = 1)
overall_sales_performance

Store the final overall_sales_performance DataFrame in the MySQL table pt_overall_sale_performance_1_yuan.

engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
overall_sales_performance.to_sql('pt_overall_sale_performance_1_yuan',con = yuan ,if_exists= 'replace')

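Visualization

Bicycle sales volume trend chart

Over the past 11 months, November had the highest bicycle sales volume at 3,316 units, up 7.1% from October.

Bicycle sales revenue trend chart

November also had the highest sales revenue of the past 11 months at 61.9 million yuan, up 8.7% from October. Revenue follows the same trend as volume.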

2. Regional Bicycle Sales Performance, November 2019

(1) Sales by region, November 2019

Data cleaning: keep the bicycle data for October and November.

# Keep the October and November bicycle data as gather_customer_order_10_11
gather_customer_order_10_11 = gather_customer_order[(gather_customer_order[ 'create_year_month' ]=='2019-10')|(gather_customer_order[ 'create_year_month' ]=='2019-11') ]
gather_customer_order_10_11

# Group by region and month ('chinese_territory', 'create_year_month'), sum order volume and sales revenue,
# assign the result to gather_customer_order_10_11_group, and reset the index
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory','create_year_month']).agg({'order_num':'sum' , 'sum_amount':'sum'}).reset_index()
gather_customer_order_10_11_group

Collect the distinct regions, in preparation for computing November's month-over-month change for each one.

region_list = gather_customer_order_10_11['chinese_territory'].unique()
region_list

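The per-region results are collected into two Series, order_x and amount_x, which hold the month-over-month change in sales volume and sales revenue for each region. pct_change() computes (current value - previous value) / previous value.

order_parts = []
amount_parts = []
# This subset has no September data, so October's rate comes out as NaN; replace NaN with 0
for i in region_list:
    a = gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory'] == i]['order_num'].pct_change().fillna(0)
    b = gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory'] == i]['sum_amount'].pct_change().fillna(0)
    order_parts.append(a)
    amount_parts.append(b)
# Series.append was removed in pandas 2.0, so the per-region pieces are combined with pd.concat
order_x = pd.concat(order_parts)
amount_x = pd.concat(amount_parts)
gather_customer_order_10_11_group['order_diff']=order_x
gather_customer_order_10_11_group['amount_diff']= amount_x
gather_customer_order_10_11_group.head()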
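
The per-region loop can also be written as a single grouped call. The following is a minimal sketch on made-up data, not the code used in this article; the `grouped` frame, the region names 'East' and 'South', and the numbers are all hypothetical. It assumes the rows are sorted by region and then by month, which is what the groupby result above already gives.

```python
import pandas as pd

# Hypothetical stand-in for gather_customer_order_10_11_group
grouped = pd.DataFrame({
    'chinese_territory': ['East', 'East', 'South', 'South'],
    'create_year_month': ['2019-10', '2019-11', '2019-10', '2019-11'],
    'order_num': [100, 110, 50, 60],
})

# Make sure months are in chronological order inside each region
grouped = grouped.sort_values(['chinese_territory', 'create_year_month'])

# pct_change is applied within each region, so October never "sees" another region's November
grouped['order_diff'] = (
    grouped.groupby('chinese_territory')['order_num'].pct_change().fillna(0)
)
print(grouped)
```

The result keeps the original row index, so it can be assigned straight back as a column without any concatenation step.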

Store the final gather_customer_order_10_11_group DataFrame in the MySQL table pt_bicy_november_territory_2_yaun.

engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_yaun',con = yuan ,if_exists= 'replace')

Export it to Excel as well.

gather_customer_order_10_11_group.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicy_november_territory_2_yaun.xlsx')

(2) Month-over-month change for the top 10 cities by bicycle sales volume, November 2019

Filter the November bicycle transactions into gather_customer_order_11.

gather_customer_order_11 = gather_customer_order_10_11.loc[gather_customer_order_10_11['create_year_month']== '2019-11']
gather_customer_order_11

Group by city, sum the sales volume, sort in descending order, and look at the ten cities with the highest volume.

# Group gather_customer_order_11 by chinese_city and sum the sales volume order_num,
# then keep the top ten cities by November bicycle sales volume as gather_customer_order_11_head
gather_customer_order_11_head = gather_customer_order_11.groupby('chinese_city').agg({'order_num': 'sum'}).reset_index().sort_values(by = 'order_num', ascending = False)
gather_customer_order_11_head = gather_customer_order_11_head.head(10)
gather_customer_order_11_head

# Using the top ten cities in gather_customer_order_11_head, pull their October and November
# bicycle sales from gather_customer_order_10_11 and assign them to gather_customer_order_10_11_head
# Look at the October and November bicycle sales data
gather_customer_order_10_11.head()
# Keep only the rows for November's top 10 cities, using isin()
gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11.chinese_city.isin(list(gather_customer_order_11_head['chinese_city']))]
# Group by city and month and sum bicycle sales volume and revenue for the top ten cities
gather_customer_order_city_10_11 =gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month']).agg({'order_num':'sum' ,'sum_amount':'sum'}).reset_index()
gather_customer_order_city_10_11
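
The "rank on one month, then pull both months for the winners" pattern is easy to get wrong, so here is a toy sketch of it. The `sales` frame, the city names 'A' to 'C', and the top-2 cutoff are hypothetical; `nlargest` is used here as a shortcut for sort-then-head and is not what the article's code calls.

```python
import pandas as pd

# Hypothetical two-month sales for three cities
sales = pd.DataFrame({
    'chinese_city': ['A', 'A', 'B', 'B', 'C', 'C'],
    'create_year_month': ['2019-10', '2019-11'] * 3,
    'order_num': [10, 30, 20, 5, 8, 12],
})

# Rank cities on November only
nov = sales[sales['create_year_month'] == '2019-11']
top_cities = nov.groupby('chinese_city')['order_num'].sum().nlargest(2).index

# isin() then keeps BOTH months for those cities, which the month-over-month step needs
both_months = sales[sales['chinese_city'].isin(top_cities)]
print(list(top_cities))
print(both_months)
```

City B has the highest October volume in this toy data but is excluded, because the ranking is based on November alone.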

Note that isin() is applied to gather_customer_order_10_11 here, so that both October and November rows are kept for November's top 10 cities.
Next, compute November's month-over-month change in sales revenue and sales volume.

# From gather_customer_order_city_10_11, compute the month-over-month change in revenue and volume for the top 10 cities
city_top_list = gather_customer_order_city_10_11.chinese_city.unique()
order_top_parts = []
amount_top_parts = []
for i in city_top_list:
    a =gather_customer_order_city_10_11[gather_customer_order_city_10_11['chinese_city']==i]['order_num'].pct_change().fillna(0)
    b =gather_customer_order_city_10_11[gather_customer_order_city_10_11['chinese_city']==i]['sum_amount'].pct_change().fillna(0)
    order_top_parts.append(a)
    amount_top_parts.append(b)
# Series.append was removed in pandas 2.0, so the per-city pieces are combined with pd.concat
order_top_x = pd.concat(order_top_parts)
amount_top_x = pd.concat(amount_top_parts)
gather_customer_order_city_10_11['order_diff']=order_top_x
gather_customer_order_city_10_11['amount_diff'] =  amount_top_x
gather_customer_order_city_10_11

Store the data in MySQL and export it to Excel.

engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_yuan',con = yuan ,if_exists= 'replace')

gather_customer_order_city_10_11.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicy_november_october_city_3_yuan.xlsx')

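Visualization

Month-over-month growth by region

In November, East China had the highest bicycle sales volume of the 8 regions. South China grew fastest, up 23.6% from October.

Sales volume of the top 10 cities

Market share of the top cities

Beijing and Shanghai had the highest sales volume, and Zhengzhou had the fastest month-over-month growth at 4.8%.
The top cities together account for 13.41% of the market.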

3. Bicycle Product Sales Performance, November 2019

(1) Sales volume by market segment

Group gather_customer_order by month to get the total bicycle sales volume for each month.

# Group gather_customer_order by month and sum the bicycle sales volume, assigned to gather_customer_order_group_month
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_group_month

Use pd.merge to join the bicycle sales table (gather_customer_order) with the monthly total table (gather_customer_order_group_month).

# Merge the bicycle sales table with the monthly totals on create_year_month
# and assign the result to order_num_proportion
order_num_proportion = pd.merge(gather_customer_order ,gather_customer_order_group_month, on = ['create_year_month'])
order_num_proportion

Divide each row's sales volume by that month's total to get each row's share of the monthly sales volume.

# Row volume / monthly total volume, stored in a new column 'order_proportion'
# (after the merge, order_num_x is the row's own volume and order_num_y is the monthly total)
order_num_proportion['order_proportion']=(order_num_proportion['order_num_x'])/(order_num_proportion['order_num_y'])
order_num_proportion
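
The same share can be computed without a merge by broadcasting the monthly total back onto each row. This is a minimal sketch on made-up numbers rather than the article's own code; the `orders` frame and the column name `month_total` are hypothetical.

```python
import pandas as pd

# Hypothetical rows standing in for gather_customer_order
orders = pd.DataFrame({
    'create_year_month': ['2019-10', '2019-10', '2019-11', '2019-11'],
    'order_num': [1, 3, 2, 2],
})

# transform('sum') returns one value per original row: the total of that row's month
orders['month_total'] = orders.groupby('create_year_month')['order_num'].transform('sum')
orders['order_proportion'] = orders['order_num'] / orders['month_total']
print(orders)
```

This avoids the order_num_x / order_num_y suffixes that pd.merge introduces, at the cost of not having the monthly totals as a separate table.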

Store the final order_num_proportion DataFrame, the monthly bicycle sales data, in the MySQL table ppt_bicycle_product_sales_month_4_yuan.

engine = create_engine('mysql+pymysql://frogdataXX:密码@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
order_num_proportion.to_sql('ppt_bicycle_product_sales_month_4_yuan',con = yuan ,if_exists= 'replace')

Export it to Excel.

order_num_proportion.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\ppt_bicycle_product_sales_month_4_yuan.xlsx')

Check which product sub-categories cpzl_zw contains.

# List the product sub-categories in cpzl_zw
gather_customer_order['cpzl_zw'].unique()

(2) Road bike segment

Filter the road bikes, group them by month and model, sum the sales volume, and reset the index.

# Filter the road bikes ('公路自行车') into gather_customer_order_road
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
# Sales volume of each road bike model ('product_name'), assigned to gather_customer_order_road_month
gather_customer_order_road_month = gather_customer_order_road.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_road_month

# Total road bike sales volume for each month, assigned to gather_customer_order_road_month_sum; reset the index
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_road_month_sum.head()

# Merge the monthly road bike totals gather_customer_order_road_month_sum into gather_customer_order_road_month on 'create_year_month'
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month , gather_customer_order_road_month_sum,on='create_year_month')
gather_customer_order_road_month

(3) Mountain bikes

The steps are the same as for road bikes, using the variable gather_customer_order_Mountain: filter the mountain bikes, sum the sales volume of each model, compute the monthly totals, and merge. The result is used to compare month-over-month changes across product sub-categories.

# Filter the mountain bikes
gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw']=='山地自行车']
# Sales volume of each mountain bike model
gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
gather_customer_order_Mountain_month

# Total sales volume for each month
gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby('create_year_month').agg({'order_num': 'sum'}).reset_index()
gather_customer_order_Mountain_month_sum

# Merge gather_customer_order_Mountain_month and gather_customer_order_Mountain_month_sum
gather_customer_order_Mountain_month=pd.merge(gather_customer_order_Mountain_month ,gather_customer_order_Mountain_month_sum ,on = 'create_year_month' )
gather_customer_order_Mountain_month

(4) Touring bikes

The steps are the same as for road bikes, using the variable gather_customer_order_tour: filter the touring bikes, sum the sales volume of each model, compute the monthly totals, and merge. The result is used to compare month-over-month changes across product sub-categories.

# Filter the touring bikes
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
gather_customer_order_tour

# Sales volume of each touring bike model
gather_customer_order_tour_month = gather_customer_order_tour.groupby(['create_year_month','product_name']).agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month ['cpzl_zw'] = '旅游自行车'
gather_customer_order_tour_month

# Total sales volume for each month
gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby('create_year_month').agg({'order_num':'sum'}).reset_index()
gather_customer_order_tour_month_sum

# Merge
gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month ,gather_customer_order_tour_month_sum ,on= 'create_year_month')
gather_customer_order_tour_month

Combine the monthly sales data for mountain, touring, and road bikes, and compute the shares.

# Combine the monthly sales data for road, mountain, and touring bikes
# (ignore_index=True gives the combined table a unique 0..n-1 index)
gather_customer_order_month = pd.concat([gather_customer_order_road_month , gather_customer_order_Mountain_month,gather_customer_order_tour_month], ignore_index=True)
gather_customer_order_month
# Add a column 'order_num_proportio': each model's share of that month's sales volume within its own sub-category
# (order_num_x is the model's monthly volume, order_num_y is the monthly total of its sub-category)
gather_customer_order_month['order_num_proportio'] = (gather_customer_order_month['order_num_x'])/(gather_customer_order_month['order_num_y'])
gather_customer_order_month

Rename the columns.

gather_customer_order_month = gather_customer_order_month.rename(columns = {'order_num_x':'order_month_product','order_num_y':'sum_order_month'})
gather_customer_order_month

Store the data in the database and export it to Excel.

# Store the data in the database
engine = create_engine('mysql+pymysql://frogdataXXXX:密码@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_yuan',con = yuan,if_exists = 'replace')

gather_customer_order_month.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicycle_product_sales_order_month_4_yuan.xlsx')

(5) Month-over-month change by model, November 2019

Keep the bicycle data for October and November 2019.

gather_customer_order_month_10_11 =gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
gather_customer_order_month_10_11

Sort the October and November sales data by model and month, so that each model's October row comes directly before its November row.

# Sort the October and November bicycle sales data
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])
gather_customer_order_month_10_11.head()

List the bicycle models.

product_name =list(gather_customer_order_month_10_11['product_name'].unique())
product_name

Compute November's month-over-month change for each model.

# November month-over-month change for each bicycle model
order_top_parts = []
for i in product_name:
    a =gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']==i]['order_month_product'].pct_change().fillna(0)
    order_top_parts.append(a)
# Series.append was removed in pandas 2.0, so the per-model pieces are combined with pd.concat
order_top_x = pd.concat(order_top_parts)
gather_customer_order_month_10_11['order_num_diff'] =order_top_x
gather_customer_order_month_10_11

Keep only the November rows.

gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month']== '2019-11']
gather_customer_order_month_11

(6) Cumulative sales volume by product, January to November 2019

Keep the data from January to November.

# str.contains() keeps the 2019 rows, and ~ negates the second condition to drop December; str.contains() works much like LIKE in SQL
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'].str.contains('2019') & ~gather_customer_order_month['create_year_month'].str.contains('12')]
gather_customer_order_month_1_11.head()
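
For reference, here is a small sketch of how the two `str.contains()` conditions behave, next to a range comparison that expresses the same January-to-November window directly. The `months` frame is hypothetical, and the `between()` variant is an alternative I am suggesting, not something the original code uses; it relies on 'YYYY-MM' strings sorting chronologically.

```python
import pandas as pd

# Hypothetical year-month values, including months outside the target window
months = pd.DataFrame({
    'create_year_month': ['2018-12', '2019-01', '2019-11', '2019-12'],
    'order_month_product': [1, 2, 3, 4],
})

# The article's filter: the year is 2019 and the string does not contain '12'
mask_contains = (
    months['create_year_month'].str.contains('2019')
    & ~months['create_year_month'].str.contains('12')
)

# Alternative: an inclusive string range over zero-padded 'YYYY-MM' values
mask_between = months['create_year_month'].between('2019-01', '2019-11')

print(mask_contains.tolist())
print(mask_between.tolist())
```

Both masks select the same rows here. The range form states the window explicitly, so it does not depend on '12' appearing only in the month part of the string.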

# Cumulative bicycle sales volume per model, January to November 2019
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby('product_name').agg({'order_month_product':'sum'}).reset_index()
gather_customer_order_month_1_11_sum

# Rename the column to sum_order_1_11: cumulative sales volume from January to November
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})
gather_customer_order_month_1_11_sum.head()

(7) November 2019 sales volume, month-over-month change, and cumulative volume by product

The cumulative volume is already in gather_customer_order_month_1_11_sum, and November's month-over-month change and share of sales are already in gather_customer_order_month_11, so all that remains is to join the two tables with pd.merge().

# Merge the two tables on the shared field product_name
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on = 'product_name')
gather_customer_order_month_11

Store the final gather_customer_order_month_11 DataFrame in the MySQL table pt_bicycle_product_sales_order_month_11_yuan.

# Write gather_customer_order_month_11 to MySQL
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_yuan',con = yuan,if_exists = 'replace')

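Sales volume by market segment

Sales volume by market segment (table)

In November, road bikes had the largest share of sales, and touring bikes grew fastest compared with October.

Road bike segment sales volume

Road bike segment sales volume (table)

In November, every road bike model except Road-350-W Yellow rose month over month. Road-650 grew fastest, up 14.29% from October. Road-150 Red had the largest share of sales, at about 19.63%.

Mountain bike segment sales performance

Mountain bike segment sales performance (table)

In November, every mountain bike model except Mountain-200 Black rose month over month. Mountain-500 Silver grew fastest, at 19.51%. Mountain-200 Silver had the largest share of sales.

Touring bike segment sales performance

Touring bike segment sales performance (table)

In November, every touring bike model except Touring-2000 Blue and Touring-3000 Blue rose month over month. Touring-1000 Yellow grew fastest, up 27.18% from October. Touring-1000 Blue had the largest share of sales, at 32.52%.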

4. User Behavior Analysis

This part uses the order detail table ods_sales_orders and the customer table ods_customer.
Start by reading the customer information table from the database.

# Read the customer information table from the database
# Load the ods_customer table
engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)

# Load the ods_sales_orders table
engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_sales_orders_11 = pd.read_sql_query("select *  from ods_sales_orders where create_date>='2019-11-1' and   create_date<'2019-12-1'",con = datafrog)

The sales order table has no customer attributes such as age or gender, so the sales table and the customer table need to be merged.

sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how= 'left')
sales_customer_order_11

Use split to extract the birth year from sales_customer_order_11['birth_date'] and store it as a new string column.

# n is keyword-only in str.split since pandas 2.0
customer_birth_year  = sales_customer_order_11['birth_date'].str.split('-', n=2).apply(lambda x :x[0] if type(x) == list else x)
customer_birth_year.name='birth_year'
sales_customer_order_11 = pd.concat([sales_customer_order_11,customer_birth_year],axis = 1)
sales_customer_order_11

(1) Customer age analysis

# Convert the birth year to int; missing values are forward-filled from the previous row first
sales_customer_order_11['birth_year'] = sales_customer_order_11['birth_year'].ffill().astype('int')
# Compute the customer's age
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
sales_customer_order_11.head()
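
As a point of comparison, the birth year and age can also be derived by parsing the date instead of splitting the string. The sketch below is an alternative under stated assumptions, not the article's method: the `customers` frame is hypothetical, and it deliberately leaves a missing birth date as NaN rather than forward-filling it, since a forward fill gives the customer the previous row's birth year.

```python
import pandas as pd

# Hypothetical customers; the second one has no birth date on record
customers = pd.DataFrame({
    'customer_key': [1, 2, 3],
    'birth_date': ['1980-05-17', None, '1975-12-01'],
})

# errors='coerce' turns unparseable or missing dates into NaT instead of raising
customers['birth_year'] = pd.to_datetime(customers['birth_date'], errors='coerce').dt.year
# Age as of 2019, matching the reference year used in this analysis
customers['customer_age'] = 2019 - customers['birth_year']
print(customers)
```

Rows with NaN age simply fall outside every bin in the later pd.cut step, so they drop out of the age shares instead of being counted under someone else's age.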

Use pd.cut() to bin the ages.

# Bin customer_age into the bands "30-34","35-39","40-44","45-49","50-54","55-59","60-64" and store the result in age_level
# right=False makes each bin closed on the left and open on the right, e.g. [30, 35)
customer_age_lst =[i for i in range(30 , 68 ,5)]
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'] , bins =  customer_age_lst ,right =False, labels = ['30-34','35-39','40-44','45-49','50-54','55-59','60-64'])
sales_customer_order_11
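
To see how the bin edges behave, here is a short sketch with a handful of made-up ages around the boundaries. The `ages` values are hypothetical; the bins and labels are the same as in the step above.

```python
import pandas as pd

# Hypothetical ages chosen to sit on or next to the bin edges
ages = pd.Series([29, 30, 34, 35, 64, 65])

bins = list(range(30, 68, 5))  # [30, 35, 40, 45, 50, 55, 60, 65]
labels = ['30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64']

# right=False: each bin includes its left edge and excludes its right edge
age_level = pd.cut(ages, bins=bins, right=False, labels=labels)
print(age_level.tolist())
```

Ages below 30 or at 65 and above fall outside every bin and come back as NaN, so it is worth checking the minimum and maximum age before fixing the edges.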

Keep only the orders whose product category is bicycle.

# Keep the bicycle orders; .copy() makes the result an independent frame so new columns can be added safely
df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车'].copy()
df_customer_order_bycle

Compute the age share.

# Each row carries a weight of 1 / (total number of rows), stored in df_customer_order_bycle['age_level_rate']
df_customer_order_bycle ['age_level_rate'] = 1/(df_customer_order_bycle.customer_key.count())

Split age into three bands: '<=29', '30-39', and '>=40'. The oldest customer is 62, so an upper bound of 100 safely covers everyone.

# Split age into three bands: '<=29', '30-39', '>=40'
df_customer_order_bycle['age_level2'] = pd.cut(df_customer_order_bycle['customer_age'], bins = [0,30,40,100] ,right= False, labels = ['<=29','30-39','>=40'])
# Number of records in each age band
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()
age_level2_count

(2) Customer gender
Count the number of records for each gender.

gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
gender_count

Compute each record's weight within its age band (1 divided by the number of records in that band).

# Merge age_level2_count into df_customer_order_bycle and rename the count column
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})

df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']

Compute each record's weight within its gender (1 divided by the number of records for that gender).

# Merge gender_count into df_customer_order_bycle and rename the count column
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})

df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
df_customer_order_bycle.head()
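
The per-row weights are easier to understand with a tiny example. In the sketch below, the `buyers` frame and the column name `overall_rate` are hypothetical, and `transform('count')` stands in for the merge-and-rename steps above. The point is that a weight of 1/N per row sums to a group's share when aggregated, which is presumably how these columns are consumed in the Power BI visuals.

```python
import pandas as pd

# Hypothetical bicycle orders: three by men, one by a woman
buyers = pd.DataFrame({
    'customer_key': [1, 2, 3, 4],
    'gender': ['M', 'F', 'M', 'M'],
})

# Weight of each row within the whole table (the age_level_rate idea)
buyers['overall_rate'] = 1 / len(buyers)

# Weight of each row within its own gender (the gender_rate idea)
buyers['gender_count'] = buyers.groupby('gender')['customer_key'].transform('count')
buyers['gender_rate'] = 1 / buyers['gender_count']

# Summing the overall weights per group yields that group's share of all orders
share = buyers.groupby('gender')['overall_rate'].sum()
print(share)
```

Summing `overall_rate` by gender gives 0.75 for M and 0.25 for F, while summing `gender_rate` within a gender always gives 1, which makes it suitable for breaking a single gender down by another dimension.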

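Store df_customer_order_bycle, the November bicycle customers, in the database.

# Write the November bicycle customers in df_customer_order_bycle to the database
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
df_customer_order_bycle.to_sql('pt_user_behavior_november_yuan',con = yuan ,if_exists='replace')

Nationwide age distribution of online customers, November 2019

Buyer analysis by age band

By age band, customers aged 35-39 make up the largest share of buyers at 29%; after that the share declines steadily with age.
Cross-analyzing age (over 30) against market segment, road bikes account for the largest share of purchases and touring bikes the smallest.

Nationwide gender split

Buyer analysis by gender

Men and women account for almost equal shares of bicycle purchases.
Cross-analyzing gender against market segment, both men and women buy road bikes most often and touring bikes least often.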

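5. Best-Selling Product Analysis, November 2019

(1) Top 10 products by sales volume in November: volume and month-over-month change

Filter the November data.

# Filter the November data
gather_customer_order_11 = gather_customer_order.loc[gather_customer_order['create_year_month'] == '2019-11']
gather_customer_order_11

Compute each product's sales volume, sort in descending order, and take the top 10 products.

# Compute each product's sales volume by summing order_num; \ is the line-continuation character
# Sort by volume in descending order and take the top 10 products
customer_order_11_top10 = gather_customer_order_11.groupby('product_name').agg({'order_num': 'sum' }).reset_index().\
                            sort_values(by = 'order_num',ascending = False).head(10)
# Names of the top 10 products by volume
list(customer_order_11_top10['product_name'])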

Compute the volume and month-over-month change of the top 10.

# Look at the November month-over-month data
gather_customer_order_month_10_11.head()

Only five fields are needed here: create_year_month (month), product_name (product name), order_month_product (this month's sales volume), cpzl_zw (product sub-category), and order_num_diff (this month's month-over-month change in volume).

customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].\
                            isin(list(customer_order_11_top10['product_name']))].copy()
customer_order_month_10_11

Tag the top 10 models by volume with a category field set to '本月TOP10销量' (this month's top 10 by volume).

customer_order_month_10_11['category'] = '本月TOP10销量'
customer_order_month_10_11.head()

(2) Top 10 products by growth in November: volume and month-over-month change

customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].\
                            sort_values(by = 'order_num_diff',ascending = False).head(10)
customer_order_month_11

Keep the rows for November's 10 fastest-growing models.

customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].\
                            isin(list(customer_order_month_11['product_name']))]

Select the five fields we need: create_year_month (month), product_name (product name), order_month_product (this month's sales volume), cpzl_zw (product sub-category), and order_num_diff (this month's month-over-month change in volume).

customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']].copy()
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'
customer_order_month_11_top10_seep

Combine the top 10 by growth with the top 10 by volume.

# axis = 0 stacks the tables row-wise; axis = 1 joins them column-wise
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
hot_products_11
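
One detail of stacking with axis = 0 is that each input keeps its own row labels. The sketch below shows this on two tiny hypothetical tables (`top_sales` and `top_growth`, with made-up product names); `ignore_index=True` is an optional extra, not something the code above uses.

```python
import pandas as pd

# Hypothetical stand-ins for the two top-10 tables
top_sales = pd.DataFrame({'product_name': ['P1', 'P2'], 'category': '本月TOP10销量'})
top_growth = pd.DataFrame({'product_name': ['P2', 'P3'], 'category': '本月TOP10增速'})

# Row-wise stack: the original row labels are kept, so they repeat
stacked = pd.concat([top_sales, top_growth], axis=0)

# Same stack with a fresh 0..n-1 index
stacked_clean = pd.concat([top_sales, top_growth], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(stacked_clean.index.tolist())
```

A product that appears in both lists (P2 here) shows up twice, once per category, which is the intended result for a combined best-seller table. The repeated labels matter mainly because to_sql writes the index out as a column by default.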

Store the data in the MySQL database.

engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
hot_products_11.to_sql('pt_hot_products_november_yuan',con = yuan,if_exists = 'replace')

In November, Mountain-200 Silver had the highest sales volume at 395 units, up 10.64% from October.

In November, Touring-1000 Yellow grew fastest, up 27.18% from October.