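The data come from the Journal of the American Medical Association (http://jse.amstat.org/v4n2/datasets.shoemaker.html): clinical measurements of body temperature, sex, and heart rate.
Here we resample the male body temperatures and compute a 95% confidence interval for the population mean.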
1. Read the data
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
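# Read the data. Columns: 体温 = body temperature, 性别 = sex (1 = male, 2 = female), 心率 = heart rate
df = pd.read_csv('http://jse.amstat.org/datasets/normtemp.dat.txt', header=None, sep=r'\s+', names=['体温', '性别', '心率'])
2. Choose the sample size and inspect the data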
np.random.seed(42)
#df.describe()
# Draw a sample of 90 rows and inspect it
df_sam = df.sample(90)
df_sam.head()
3. Compute the mean body temperature of the men in the drawn sample
df3 = df_sam.loc[df_sam['性别']==1]
df3['体温'].mean()
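4. Resample repeatedly and compute the mean male body temperature in each resample to obtain the sampling distribution
boot_means = []
for _ in range(10000):
    # Draw 90 rows with replacement, then average the men's temperatures (性别 == 1)
    bootsample = df.sample(90, replace=True)
    mean = bootsample[bootsample['性别'] == 1]['体温'].mean()
    boot_means.append(mean)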
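The loop above resamples 90 rows from the whole table and only then filters for men, so the number of men differs from one resample to the next. One alternative, shown here only as a sketch, is to resample the male temperatures directly with a fixed size. The function name `bootstrap_means` and the synthetic `demo_temps` array are hypothetical stand-ins, not part of the original analysis or the normtemp data.

```python
import numpy as np

def bootstrap_means(values, n_boot=10_000, seed=42):
    """Return n_boot means of resamples drawn with replacement from values."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx picks one resample with the same size as the input.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    return values[idx].mean(axis=1)

# Synthetic stand-in for the male temperatures (assumed values, not real data).
demo_temps = np.random.default_rng(0).normal(loc=98.1, scale=0.7, size=45)
demo_boot_means = bootstrap_means(demo_temps)
print(demo_boot_means.shape)
```

With the real data, something like `bootstrap_means(df_sam.loc[df_sam['性别'] == 1, '体温'])` would play the same role as the loop, although the resulting interval need not match the one reported here because the resampling scheme differs.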
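5. Plot the sampling distribution of the mean male body temperature
6. Compute a confidence interval from the sampling distribution to estimate the population mean, at 95% confidence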
np.percentile(boot_means, 2.5), np.percentile(boot_means, 97.5)
(97.89249519230768, 98.30741452991455)
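The plotting step has no code in the text, so the following is only a sketch of one way to draw the sampling distribution with the percentile interval marked on it. The helper name `plot_boot_means` is hypothetical, and `demo_boot_means` is synthetic data standing in for the `boot_means` list built earlier.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_boot_means(boot_means, level=95):
    """Histogram of bootstrap means with the percentile interval marked."""
    tail = (100 - level) / 2
    lower, upper = np.percentile(boot_means, [tail, 100 - tail])
    fig, ax = plt.subplots()
    ax.hist(boot_means, bins=40)
    # Dashed vertical lines mark the lower and upper interval bounds.
    ax.axvline(lower, color="red", linestyle="--")
    ax.axvline(upper, color="red", linestyle="--")
    ax.set_xlabel("mean body temperature")
    ax.set_ylabel("number of resamples")
    return fig, ax, (lower, upper)

# Synthetic stand-in for boot_means (assumed values, not the real results).
demo_boot_means = np.random.default_rng(0).normal(98.1, 0.1, size=10_000)
fig, ax, (lower, upper) = plot_boot_means(demo_boot_means)
```

In a notebook the figure is displayed automatically; in a script, calling `plt.show()` or `fig.savefig(...)` afterwards would be needed. Passing the real `boot_means` list instead of the synthetic array should reproduce the interval bounds from step 6.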