Single groups
import pandas as pd
import seaborn as sns
Saving the grouped object
tips_10 = sns.load_dataset('tips').sample(10, random_state=42)
print(tips_10)
'''
total_bill tip sex smoker day time size
24 19.82 3.18 Male No Sat Dinner 2
6 8.77 2.00 Male No Sun Dinner 2
153 24.55 2.00 Male No Sun Dinner 4
211 25.89 5.16 Male Yes Sat Dinner 4
198 13.00 2.00 Female Yes Thur Lunch 2
176 17.89 2.00 Male Yes Sun Dinner 2
192 28.44 2.56 Male Yes Thur Lunch 2
124 12.48 2.52 Female No Thur Lunch 2
9 14.78 3.23 Male No Sun Dinner 2
101 15.38 3.00 Female Yes Fri Dinner 2
'''
grouped = tips_10.groupby('sex')
# Inspect the actual groups
print(grouped.groups)
'''
{'Male': [24, 6, 153, 211, 176, 192, 9], 'Female': [198, 124, 101]}
'''
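The mapping returned by `.groups` can also be turned into plain lists and row counts. The following is a minimal sketch; it assumes the ten sampled rows are rebuilt by hand (so no dataset download is needed) and stores `sex` as plain strings rather than seaborn's categorical dtype, which means the group order may differ from the output above.

```python
import pandas as pd

# The ten sampled rows shown above, rebuilt by hand (only three columns kept).
# Assumption: 'sex' is a plain string column, not a categorical.
tips_10 = pd.DataFrame(
    {
        "total_bill": [19.82, 8.77, 24.55, 25.89, 13.00,
                       17.89, 28.44, 12.48, 14.78, 15.38],
        "tip": [3.18, 2.00, 2.00, 5.16, 2.00, 2.00, 2.56, 2.52, 3.23, 3.00],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)

grouped = tips_10.groupby("sex")

# .groups maps each group label to the index labels of its rows.
row_labels = {key: list(idx) for key, idx in grouped.groups.items()}

# .size() counts the rows in each group.
group_sizes = grouped.size()

print(row_labels)
print(group_sizes)
```

Nothing is computed per group until an aggregation such as `.size()` is called; the grouped object itself only records which rows belong together.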
Selecting a group
female = grouped.get_group('Female')
print(female)
'''
total_bill tip sex smoker day time size
198 13.00 2.00 Female Yes Thur Lunch 2
124 12.48 2.52 Female No Thur Lunch 2
101 15.38 3.00 Female Yes Fri Dinner 2
'''
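For a single key, `get_group` picks out the same rows as an ordinary boolean filter. The sketch below checks this on a hand-built copy of the ten sampled rows (an assumption made so the example runs without downloading the dataset).

```python
import pandas as pd

# Hand-built copy of the ten sampled rows (subset of columns).
tips_10 = pd.DataFrame(
    {
        "total_bill": [19.82, 8.77, 24.55, 25.89, 13.00,
                       17.89, 28.44, 12.48, 14.78, 15.38],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)

grouped = tips_10.groupby("sex")

# Rows belonging to the 'Female' group, taken from the grouped object.
female = grouped.get_group("Female")

# The same rows, selected with a boolean mask on the original frame.
filtered = tips_10[tips_10["sex"] == "Female"]

# Both selections contain the same row labels in the same order.
same_rows = female.index.equals(filtered.index)
print(same_rows)
```

`get_group` is convenient when the grouped object already exists; for a one-off selection the boolean mask is usually just as clear.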
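Group calculations involving multiple variables
The calculation is applied to every column where it makes sense; columns that cannot be averaged are left out.
# In pandas 2.0 and later, mean() no longer drops non-numeric columns silently,
# so numeric_only=True is passed explicitly
avg = grouped.mean(numeric_only=True)
# Columns for which a mean is meaningless are neither computed nor shown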
print(avg)
'''
total_bill tip size
sex
Male 20.02 2.875714 2.571429
Female 13.62 2.506667 2.000000
'''
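There are two common ways to keep non-numeric columns out of a grouped mean: ask pandas to skip them with `numeric_only=True`, or name the columns to aggregate. Here is a minimal sketch of both, again on a hand-built copy of the ten sampled rows (an assumption, so the example is self-contained).

```python
import pandas as pd

# Hand-built copy of the ten sampled rows; 'smoker' is a non-numeric column.
tips_10 = pd.DataFrame(
    {
        "total_bill": [19.82, 8.77, 24.55, 25.89, 13.00,
                       17.89, 28.44, 12.48, 14.78, 15.38],
        "tip": [3.18, 2.00, 2.00, 5.16, 2.00, 2.00, 2.56, 2.52, 3.23, 3.00],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
        "smoker": ["No", "No", "No", "Yes", "Yes",
                   "Yes", "Yes", "No", "No", "Yes"],
        "size": [2, 2, 4, 4, 2, 2, 2, 2, 2, 2],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)

grouped = tips_10.groupby("sex")

# Option 1: let pandas skip every non-numeric column.
avg_numeric = grouped.mean(numeric_only=True)

# Option 2: select the columns to average explicitly.
avg_selected = grouped[["total_bill", "tip"]].mean()

print(avg_numeric)
print(avg_selected)
```

Selecting columns explicitly makes the intent visible in the code and avoids averaging numeric columns that are really identifiers or codes.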
Iterating over groups
for sex_group in grouped:
    print(sex_group)
'''
('Male', total_bill tip sex smoker day time size
24 19.82 3.18 Male No Sat Dinner 2
6 8.77 2.00 Male No Sun Dinner 2
153 24.55 2.00 Male No Sun Dinner 4
211 25.89 5.16 Male Yes Sat Dinner 4
176 17.89 2.00 Male Yes Sun Dinner 2
192 28.44 2.56 Male Yes Thur Lunch 2
9 14.78 3.23 Male No Sun Dinner 2)
('Female', total_bill tip sex smoker day time size
198 13.00 2.00 Female Yes Thur Lunch 2
124 12.48 2.52 Female No Thur Lunch 2
101 15.38 3.00 Female Yes Fri Dinner 2)
'''
Each element sex_group of grouped is a tuple: its first element is a string (similar to a "key") and its second element is a DataFrame (similar to a "value").
for sex_group in grouped:
    print('the type is: {}'.format(type(sex_group)))
    print('the length is: {}\n'.format(len(sex_group)))
    first_element = sex_group[0]
    print('the first element is:{}'.format(first_element))
    print('it has a type of: {}\n'.format(type(first_element)))
    second_element = sex_group[1]
    print('the second element is:\n{}'.format(second_element))
    print('it has a type of: {}\n'.format(type(second_element)))
    print('what we have:')
    print(sex_group)
    # Stop after the first group so only one tuple is inspected
    break
'''
the type is: <class 'tuple'>
the length is: 2
the first element is:Male
it has a type of: <class 'str'>
the second element is:
total_bill tip sex smoker day time size
24 19.82 3.18 Male No Sat Dinner 2
6 8.77 2.00 Male No Sun Dinner 2
153 24.55 2.00 Male No Sun Dinner 4
211 25.89 5.16 Male Yes Sat Dinner 4
176 17.89 2.00 Male Yes Sun Dinner 2
192 28.44 2.56 Male Yes Thur Lunch 2
9 14.78 3.23 Male No Sun Dinner 2
it has a type of: <class 'pandas.core.frame.DataFrame'>
what we have:
('Male', total_bill tip sex smoker day time size
24 19.82 3.18 Male No Sat Dinner 2
6 8.77 2.00 Male No Sun Dinner 2
153 24.55 2.00 Male No Sun Dinner 4
211 25.89 5.16 Male Yes Sat Dinner 4
176 17.89 2.00 Male Yes Sun Dinner 2
192 28.44 2.56 Male Yes Thur Lunch 2
9 14.78 3.23 Male No Sun Dinner 2)
'''
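Because each item is a (key, DataFrame) pair, the loop can unpack it directly instead of indexing with `[0]` and `[1]`. A minimal sketch, assuming a hand-built copy of the ten sampled rows with `sex` stored as plain strings:

```python
import pandas as pd

# Hand-built copy of the ten sampled rows (subset of columns).
tips_10 = pd.DataFrame(
    {
        "total_bill": [19.82, 8.77, 24.55, 25.89, 13.00,
                       17.89, 28.44, 12.48, 14.78, 15.38],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)

grouped = tips_10.groupby("sex")

rows_per_group = {}
for sex, frame in grouped:
    # 'sex' is the group key, 'frame' is the DataFrame holding that group's rows.
    rows_per_group[sex] = len(frame)
    print(sex, type(frame).__name__, len(frame))
```

Unpacking gives both parts a meaningful name, which tends to read better than `sex_group[0]` and `sex_group[1]` once the loop body grows.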