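We compare Southern and non-Southern states, and the outcome variable is the probability of imprisonment. A two-group independent-samples t test can be used to test the hypothesis that the two population means are equal. The test assumes that the two groups are independent and that the data are sampled from normal populations. The format of the test is
t.test(y ~ x, data), where y is a numeric variable and x is a dichotomous variable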
> library(MASS)
> t.test(Prob ~ So, data=UScrime)
        Welch Two Sample t-test
data:  Prob by So
t = -3.8954, df = 24.925, p-value = 0.0006506
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.03852569 -0.01187439
sample estimates:
mean in group 0 mean in group 1
     0.03851265      0.06371269
> # The results show that Southern and non-Southern states differ significantly in the probability of imprisonment (p < .001).
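To make the output above less of a black box, here is a minimal sketch that recomputes the Welch statistic, its degrees of freedom, and the two-sided p-value by hand and compares them with what t.test() returns. The variable names (g0, g1, se0, se1 and so on) are illustrative choices for this sketch, not part of any API.

```r
library(MASS)  # provides the UScrime data set

# Split the imprisonment probabilities by region (So: 0 = non-Southern, 1 = Southern)
g0 <- UScrime$Prob[UScrime$So == 0]
g1 <- UScrime$Prob[UScrime$So == 1]

# Squared standard error of each group mean
se0 <- var(g0) / length(g0)
se1 <- var(g1) / length(g1)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(g0) - mean(g1)) / sqrt(se0 + se1)
df <- (se0 + se1)^2 /
  (se0^2 / (length(g0) - 1) + se1^2 / (length(g1) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in test; each line should print TRUE
res <- t.test(Prob ~ So, data = UScrime)
print(all.equal(unname(res$statistic), t_stat))
print(all.equal(unname(res$parameter), df))
print(all.equal(res$p.value, p_value))
```

Because t.test() uses the Welch correction by default, the degrees of freedom are fractional (24.925 here). Passing var.equal=TRUE would instead request the classic pooled-variance test.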
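Next, we ask whether the unemployment rate of younger males (U1, urban males aged 14-24) differs from that of older males (U2, urban males aged 35-39). Both rates are measured in the same states, so the two groups are not independent.
> library(MASS)
> sapply(UScrime[c("U1","U2")], function(x)(c(mean=mean(x), sd=sd(x))))
           U1       U2
mean 95.46809 33.97872
sd   18.02878  8.44545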
A dependent-samples (paired) t test assumes that the differences between the groups are normally distributed. The format is:
t.test(y1, y2, paired=TRUE)
where y1 and y2 are the numeric vectors for the two dependent groups
> with(UScrime, t.test(U1, U2, paired=TRUE))
        Paired t-test
data:  U1 and U2
t = 32.407, df = 46, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 57.67003 65.30870
sample estimates:
mean of the differences
               61.48936
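> # The mean difference (61.5) is large enough to reject the hypothesis that the mean unemployment rates of older and younger males are the same. Younger males have the higher rate.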
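A paired t test is equivalent to a one-sample t test on the within-pair differences. The sketch below illustrates this on the same data; the names d, paired_res, one_sample_res and t_manual are illustrative only.

```r
library(MASS)  # provides the UScrime data set

# Within-state difference between the younger and older unemployment rates
d <- UScrime$U1 - UScrime$U2

# Paired test on the two columns, and one-sample test on the differences
paired_res <- with(UScrime, t.test(U1, U2, paired = TRUE))
one_sample_res <- t.test(d, mu = 0)

# The same statistic computed by hand: mean difference over its standard error
n <- length(d)
t_manual <- mean(d) / (sd(d) / sqrt(n))

# All three routes give the same t statistic, with n - 1 degrees of freedom
print(all.equal(unname(paired_res$statistic), unname(one_sample_res$statistic)))
print(all.equal(unname(paired_res$statistic), t_manual))
print(unname(paired_res$parameter) == n - 1)
```

This also shows why the normality assumption applies to the differences and not to U1 and U2 separately: only d enters the calculation.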
That wraps up the basics of the t test in R. See you next time! O(∩_∩)O haha~