Hypothesis testing, also called significance testing, is a statistical method that judges whether a hypothesis holds using the logic of proof by contradiction with small probabilities. It first assumes that the population parameter (or distribution) underlying the sample equals some known population parameter (or distribution), then analyzes the sample data according to the sampling distribution of a test statistic, uses the sample information to judge whether it supports the assumption, and accepts or rejects the hypothesis accordingly. The conclusion is probabilistic, never an absolute confirmation or refutation.
General procedure
The general workflow of a hypothesis test: state the hypotheses, including the null hypothesis and the alternative hypothesis; construct a statistic, also called the test statistic; compute the value of the statistic; choose a significance level and the corresponding rejection region; check whether the statistic falls into the rejection region; if it does, reject the null hypothesis, otherwise do not reject it.
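As a minimal sketch, the steps above can be traced by hand in R for a one-sample t-test; the data and the null value μ0 = 5 below are hypothetical:

```r
# Hypothetical data; test H0: mu = 5 against H1: mu != 5 at alpha = 0.05
x <- c(4.2, 5.1, 5.8, 4.9, 6.3, 5.5)
n <- length(x)
t.stat <- (mean(x) - 5) / (sd(x) / sqrt(n))  # construct and compute the test statistic
t.crit <- qt(1 - 0.05/2, df = n - 1)         # rejection region: |t| > t.crit
abs(t.stat) > t.crit                         # FALSE here, so H0 is not rejected
```

`t.test(x, mu = 5)` reproduces the same decision in one call.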
Two types of errors
Because the true state of the population is never known, and we judge a hypothesis from a finite sample, whatever decision we make about the hypothesis carries some probability of being wrong. Rejecting a null hypothesis that is in fact true is called a Type I error (an α error, occurring with probability α). The second kind of error is accepting a null hypothesis that is in fact false; this is called a Type II error (a β error, occurring with probability β).
We would like both kinds of error to be as small as possible, but with a fixed sample size we cannot make both error probabilities small at once. In general, whichever error has the more serious consequences is the one to control first. In practice, controlling the Type I error is usually the primary goal, because we generally place the well-established claim that we care more about in the null hypothesis, which means we do not reject H0 lightly.
 | H0 not rejected | H0 rejected |
---|---|---|
H0 true | 1-α | α |
H0 false | β | 1-β |
A test carried out in this framework does not tell us whether H0 is actually true. What the decision procedure guarantees is that the risk of a Type I error is fixed at α (it still exists), while the risk of a Type II error is minimized (it, too, still exists).
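A quick simulation sketch (with arbitrary settings) illustrates the fixed Type I error rate: when H0 is true, the test rejects at close to the nominal α.

```r
# With H0 true (mu really is 0), about alpha of the p-values fall below alpha
set.seed(1)  # arbitrary seed for reproducibility
p <- replicate(10000, t.test(rnorm(20), mu = 0)$p.value)
mean(p < 0.05)  # close to 0.05, the fixed Type I error rate
```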
t-test
The t-test examines the magnitude of the difference between the means of two samples (or of a sample and a population). It uses the theory of the t distribution to infer the probability of the observed difference, and thereby judges whether the difference between the two means is significant.
The t-test requires that the sample comes from a normal distribution.
Conditions for applying the t-test:
when the sample size is small, the sample must be drawn from a normal population;
when comparing two sample means, the two populations must additionally have equal variances.
Uses of the t-test: (1) comparing a sample mean with a population mean; (2) comparing two sample means.
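For instance, the equal-variance condition for a two-sample comparison can be checked with an F test before pooling; the data below are simulated purely for illustration:

```r
# Simulated groups with equal true variance
set.seed(2)
a <- rnorm(15, mean = 10, sd = 2)
b <- rnorm(15, mean = 11, sd = 2)
var.test(a, b)                  # F test of the ratio of the two variances
t.test(a, b, var.equal = TRUE)  # pooled (equal-variance) two-sample t-test
```

Without `var.equal = TRUE`, `t.test()` defaults to the Welch test, which does not assume equal variances.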
### two-sided p-value and alpha
##postscript(file=paste(plotDIR, "rejectR2.eps", sep="/"),
## height=2., width=4.5, horizontal=F)
# tikz(file=paste(plotDIRch4, "rejectR2.tex", sep="/"),
# height=2., width=4.5, standAlone=F)
par(mar=c(2,0,0,0), mgp=c(1.25, 0.125, 0), las=1, tck=0.01)
## the null
plot(seq(-3,3,,100), dnorm(seq(-3,3,,100)), type="l",
xlab="", ylab="", axes=F)
polygon(c(1.75, seq(1.75, 3,,50), 3), c(0, dnorm(seq(1.75, 3,,50)), 0),
col=gray(.8))
polygon(c(-3, seq(-3, -1.75,,50), -1.75), c(0, dnorm(seq(-3, -1.75,,50)), 0),
col=gray(.8))
polygon(c(2.25, seq(2.25, 3,,50), 3), c(0, dnorm(seq(2.25, 3,,50)), 0),
density=20, angle=45)
polygon(c(-3, seq(-3,-2.25,,50), -2.25), c(0, dnorm(seq(-3, -2.25,,50)), 0),
density=20, angle=-45)
text(3, dnorm(2.5)*4.2, "$p$-value/2", adj=c(1,0), cex=0.75)
text(-3, dnorm(2.5)*4.2, "$p$-value/2", adj=c(0,0), cex=0.75)
arrows(x0=2.5, y0=dnorm(2.5)/2, x1=2.75, y1=dnorm(2.5)*4, length=0.1,
angle=25, code=1)
arrows(x0=-2.5, y0=dnorm(2.5)/2, x1=-2.75, y1=dnorm(2.5)*4, length=0.1,
angle=25, code=1)
axis(1, at=c(1.75), label="$t_{cutoff}$")
axis(1, at=c(2.25), label="$t_{obs}$")
axis(1, at=c(-1.75), label="$-t_{cutoff}$")
axis(1, at=c(-2.25), label="$-t_{obs}$")
axis(1, at=c(0), label="$\\mu_0$")
text( 1.8, dnorm(1.9)/2.2, "$\\alpha/2$", adj=c(0, 0.5), cex=0.75)
text(-1.8, dnorm(1.9)/2.2, "$\\alpha/2$", adj=c(1, 0.5), cex=0.75)
axis(1, at=c(-3,3), label=c("", ""))
One-sided and two-sided tests
• When the aim of a t-test is to check whether two population means are equal, it is a two-sided test. For example: is a new antihypertensive drug as effective as the one in common use? The new drug might be better than the old one, worse, or the same; all are possible.
• If we already know the new treatment's effectiveness cannot be below the old one's (for example, theory tells us that sulfanilamide plus a potentiator cannot be less effective than sulfanilamide alone), then the null hypothesis is H0: μ1 = μ2 and the alternative is H1: μ1 > μ2. This is called a one-sided test.
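The distinction is set with the `alternative` argument of `t.test()`; a sketch with simulated data (all numbers hypothetical):

```r
set.seed(5)
old <- rnorm(12, mean = 10)  # simulated responses under the old treatment
new <- rnorm(12, mean = 11)  # simulated responses under the new treatment
t.test(new, old, alternative = "greater")$p.value  # one-sided: H1: mu_new > mu_old
t.test(new, old)$p.value                           # two-sided: H1: mu_new != mu_old
```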
par(mfrow=c(2,1),mar=c(2,0,0,0), mgp=c(1.25, 0.125, 0), las=1, tck=0.01)
## the null
plot(seq(-3,3,,100), dnorm(seq(-3,3,,100)), type="l",
xlab="", ylab="", axes=F, xlim=c(-3,6))
polygon(c(1.75, seq(1.75, 3,,50), 3), c(0, dnorm(seq(1.75, 3,,50)), 0),
col=gray(.8))
polygon(c(2.25, seq(2.25, 3,,50), 3), c(0, dnorm(seq(2.25, 3,,50)), 0),
density=20, angle=45)
text(1.9, dnorm(1.9)/2, "$\\alpha$")
text(2.25, dnorm(0)/1.8, "$t_{obs}$", adj=c(0,1))
segments(x0=2.25, x1=2.25, y0=0, y1=dnorm(0)/2.1)
text(2.75, dnorm(2.5)*4.2, "$p$-value", adj=c(0,0))
text(6, dnorm(0)*(2/3), "$H_0$", adj=1)
arrows(x0=2.5, y0=dnorm(2.5)/2, x1=2.75, y1=dnorm(2.5)*4, length=0.1,
angle=25, code=1)
abline(v=1.75)
axis(1, at=seq(-2,6,2)[-3], label=rep("", 4))
axis(1, at=c(1.75), label="$t_{cutoff}$")
axis(1, at=c(0), label="$\\mu_0$")
## the alternative
plot(seq(0,6,,100), dnorm(seq(0,6,,100), mean=3), type="l",
xlab="", ylab="", axes=F, xlim=c(-3,6))
#polygon(c(-1, seq(-1, 1.3,,50), 1.3), c(0, dnorm(seq(-1, 1.3,,50)), 0),
#col=gray(.65))
axis(1, at=c(3), label="$\\mu_a$")
axis(1, at=seq(-2,6,2)[-3], label=rep("", 4))
abline(v=1.75)
#polygon(c(1.75, seq(1.75, 6,,50), 6), c(0, dnorm(seq(1.75, 6,,50),3 ), 0),
#density=20, angle=45)
polygon(c(0, seq(0, 1.75,,50), 1.75), c(0, dnorm(seq(0, 1.75,,50),3 ), 0),
density=15, angle=-45)
text(3, dnorm(0)/2, "$1-\\beta$")
text(3, dnorm(0)/4, "(power)")
text(1, dnorm(1.75, 3)/2, "$\\beta$")
text(-2, dnorm(0)*(2/3), "$H_a$", adj=0)
- Comparing a sample mean with a population mean
The one-sample test is a hypothesis test on a single group of samples. The null hypothesis is H0: μ = μ0.
x<-c(3,8,19,24,6,1)
y<-c(1,25,21,3,2,1)
t.test(y,mu=-15)
One Sample t-test
data: y
t = 5.2732, df = 5, p-value = 0.003263
alternative hypothesis: true mean is not equal to -15
95 percent confidence interval:
-2.784951 20.451618
sample estimates:
mean of x
8.833333
- Unpaired two-sample test
The unpaired two-sample test applies to two independent groups of samples, which are assumed to be drawn from two different normal distributions. Depending on whether the two normal distributions share the same standard deviation, the unpaired test splits into two cases. In the equal-variance case, the null hypothesis is that the two distributions have equal means.
> t.test(y,x)
Welch Two Sample t-test
data: y and x
t = -0.22649, df = 9.6899, p-value = 0.8255
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-14.50729 11.84062
sample estimates:
mean of x mean of y
8.833333 10.166667
- Paired two-sample test
The paired two-sample test applies to two matched groups of samples and assumes that the pairwise differences follow a normal distribution. If that distribution has mean zero, the two groups show no significant difference. The null hypothesis is H0: μd = 0, where μd is the mean of the paired differences.
> t.test(y,x,paired=TRUE)
Paired t-test
data: y and x
t = -0.26786, df = 5, p-value = 0.7995
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-14.12899 11.46232
sample estimates:
mean of the differences
-1.333333
Non-parametric methods
If the test statistic in a hypothesis test does not depend on the population's distribution or parameters (roughly speaking, it contains neither population parameters nor estimates of them), the procedure is called a non-parametric method or non-parametric test. Almost all non-parametric methods are based on a rank transformation of the data: within the sample, each value is replaced by its rank in the sorted order.
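In R the rank transformation is just `rank()`; tied values receive the average of the ranks they occupy:

```r
x <- c(3.2, 1.5, 4.8, 1.5, 2.9)
rank(x)  # 4.0 1.5 5.0 1.5 3.0 -- the two tied values share rank (1+2)/2
```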
- Sign Test
The sign test is used to judge whether the two outcomes of a binomial distribution are equally probable.
binom.test(5, 18)
Exact binomial test
data: 5 and 18
number of successes = 5, number of trials = 18, p-value = 0.09625
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.09694921 0.53480197
sample estimates:
probability of success
0.2777778
- Wilcoxon Signed-Rank Test
Two data samples are matched if they are repeated observations on the same subjects. The Wilcoxon signed-rank test judges whether the corresponding population distributions are identical, without assuming the data are normally distributed.
library(MASS) # load the MASS package
head(immer)
wilcox.test(immer$Y1, immer$Y2, paired=TRUE)
Wilcoxon signed rank test with continuity correction
data: immer$Y1 and immer$Y2
V = 368.5, p-value = 0.005318
alternative hypothesis: true location shift is not equal to 0
- Mann-Whitney-Wilcoxon Test
Two data samples are independent if they come from distinct populations and do not affect each other. The Mann-Whitney-Wilcoxon test judges whether the population distributions are identical, without assuming they are normal.
mtcars$mpg
mtcars$am
wilcox.test(mpg ~ am, data=mtcars)
Wilcoxon rank sum test with continuity correction
data: mpg by am
W = 42, p-value = 0.001871
alternative hypothesis: true location shift is not equal to 0
- Kruskal-Wallis Test
Data samples are collected independently if they come from unrelated populations and do not influence one another. The Kruskal-Wallis test judges whether the population distributions are identical, without assuming normality.
head(airquality)
kruskal.test(Ozone ~ Month, data = airquality)
Kruskal-Wallis rank sum test
data: Ozone by Month
Kruskal-Wallis chi-squared = 29.267, df = 4, p-value = 6.901e-06
Analysis of variance
Analysis of variance (ANOVA) was invented by R. A. Fisher as a significance test for differences among the means of two or more samples. Data collected in a study fluctuate under many influences; the causes fall into two classes: uncontrollable random factors, and controllable factors deliberately applied in the study that affect the result.
ANOVA rests on the premise that differences among the means of the sample groups have two basic sources:
(1) The experimental variable, i.e. differences caused by the main distinction between groups (for example, male vs. female), called between-group variation. It is measured by the sum, over groups, of the squared deviations of each group mean from the grand mean of all observations, written SSb, with degrees of freedom dfb.
(2) Random error, such as measurement error or differences between individuals, called within-group variation. It is measured by the sum of squared deviations of the values in each group from that group's mean, written SSw, with within-group degrees of freedom dfw.
The total sum of squared deviations is SSt = SSb + SSw.
Dividing SSw and SSb by their degrees of freedom (dfw = n - m within groups and dfb = m - 1 between groups, where n is the total sample size and m the number of groups) gives the mean squares MSw and MSb. If the treatment has no effect, i.e. all groups are samples from the same identically distributed population, then MSb/MSw ≈ 1. If the treatment does have an effect, the between-group mean square reflects both error and treatment, i.e. the samples come from different populations, and then MSb >> MSw (the ratio is far greater than 1).
The ratio MSb/MSw follows an F distribution. Comparing the F value with its critical value provides the basis for inferring, at a given significance level, whether the samples come from the same population.
The basic idea of ANOVA is to analyze how much variation from each source contributes to the total variation, and thereby determine how strongly the controllable variable affects the outcome.
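As a sketch with simulated data (all numbers hypothetical), the decomposition and the F ratio above can be computed by hand and checked against R's `aov()`:

```r
# Simulated one-way layout: m = 3 groups, 10 observations each (n = 30)
set.seed(3)
d <- data.frame(y = c(rnorm(10, 5), rnorm(10, 6), rnorm(10, 7)),
                g = factor(rep(1:3, each = 10)))
grand <- mean(d$y)  # grand mean of all observations
SSb <- sum(tapply(d$y, d$g, function(v) length(v) * (mean(v) - grand)^2))
SSw <- sum(tapply(d$y, d$g, function(v) sum((v - mean(v))^2)))
Fval <- (SSb / 2) / (SSw / 27)  # MSb/MSw with dfb = m-1 = 2, dfw = n-m = 27
Fval
summary(aov(y ~ g, data = d))   # the "F value" column matches Fval
```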
ANOVA in R
- Checking homogeneity of variance
## Everglades example
TP.reference<-read.table("..//Data//EvergladesP.txt",header = T)
library("reshape2")
library("lattice")  # needed below for histogram() and qqmath()
TP.reference<-melt(TP.reference,measure.vars=c("JAN","FEB","MAR" , "APR" , "MAY" ,"JUN", "JUL" , "AUG" , "SEP" , "OCT" , "NOV", "DEC" , "ANN" ),
variable.name ="Month",
value.name ="TP")
TP.reference<-read.csv("..//Data//WCA2TP.csv",header = T)
TP.reference$Month<-sapply(as.character(TP.reference$Date),function(x){strsplit(x,"/")[[1]][1]})
table(TP.reference$Year, TP.reference$Month)
histogram(~log(RESULT) | SITE, data=TP.reference)
print(
qqmath(~log(RESULT) | factor(Year),
panel=function(x, ...){
panel.grid()
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
}, data=TP.reference,
subset=SITE!="E5"&SITE!="F5",
par.strip.text=list(cex=0.5),
xlab="Unit Normal Quantile",
ylab="Log TP Concentration (ppb)")
)
qqmath(~log(RESULT) | SITE,
panel=function(x, ...){
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
}, data=TP.reference,
subset=SITE!="E5"&SITE!="F5")
qqmath(~log(RESULT) | Month,
panel=function(x, ...){
panel.qqmathline(x, ...)
panel.qqmath(x, ...)
}, data=TP.reference)
EvergPrecip <- read.table(paste(dataDIR, "EvergPrecip.txt", sep="/"),
header=F)
EvergPrecip2 <- read.table(paste(dataDIR, "EvergladesP.txt", sep="/"),
header=T)
EvergPrecip2$ANN2 <- apply(EvergPrecip2[,2:13], 1, sum)
EvergPrecip2$Msd <- apply(EvergPrecip2[,2:13], 1, sd)
Y.lim <- c(0.99, 1.1)*range(c(EvergPrecip2$ANN2+2*EvergPrecip2$Msd),
c(EvergPrecip2$ANN2-2*EvergPrecip2$Msd), na.rm=T)
#postscript(file=paste(plotDIR, "EvergPrecip.eps", sep="/"),
# width=3.75, height=2.5, horizontal=F)
# tikz(file=paste(plotDIRch4, "EvergPrecip.tex", sep="/"),
# width=3.75, height=2.5, standAlone=F)
par(mar=c(2.5,2.5,0.5,0.25), mgp=c(1.5, 0.5, 0))
plot(EvergPrecip2$ANN2/12 ~ EvergPrecip2$YEAR, type="n",
xlab="Year", ylab="Annual Precipitation (in)", ylim= Y.lim, xlim=c(1990,2003))
segments(x0= EvergPrecip2$YEAR, x1=EvergPrecip2$YEAR,
y0=EvergPrecip2$ANN2-2*EvergPrecip2$Msd,
y1=EvergPrecip2$ANN2+2*EvergPrecip2$Msd)
segments(x0= EvergPrecip2$YEAR, x1=EvergPrecip2$YEAR,
y0=EvergPrecip2$ANN2-EvergPrecip2$Msd,
y1=EvergPrecip2$ANN2+EvergPrecip2$Msd, lwd=3)
points(EvergPrecip2$YEAR, EvergPrecip2$ANN2)
segments(x0=c(1993.5, 1999.5), x1=c(1993.5, 1999.5),
y0=rep(Y.lim[1], 2), y1=rep(Y.lim[2], 2), lwd=2, col="grey")
segments(x0=rep(1993.5, 2), x1=rep(1999.5, 2),
y0=c(Y.lim[1], Y.lim[2]), y1=c(Y.lim[1], Y.lim[2]), lwd=2, col="grey")
abline(h=mean(EvergPrecip2$ANN2, na.rm=T), col=gray(0.5) )
(Everg.aov <- aov(log(RESULT) ~ factor(Year), data=TP.reference))
Call:
aov(formula = log(RESULT) ~ factor(Year), data = TP.reference)
Terms:
factor(Year) Residuals
Sum of Squares 16.0253 1579.8779
Deg. of Freedom 5 1272
Residual standard error: 1.11447
Estimated effects may be unbalanced
qqmath(~resid(Everg.aov),
panel = function(x,...) {
panel.grid()
panel.qqmath(x,...)
panel.qqmathline(x,...)
}, ylab="Residuals", xlab="Unit Normal Quantile"
)
xyplot(sqrt(abs(resid(Everg.aov)))~fitted(Everg.aov),
panel=function(x,y,...){
panel.grid()
panel.xyplot(x, y,...)
panel.loess(x, y, span=1, col="grey",...)
}, ylab="Sqrt. Abs. Residuals", xlab="Fitted")
## anova as a linear model (dummy variables)
anova.data <- data.frame(y=TP.reference$RESULT,
x2=ifelse(TP.reference$Year==1995, 1, 0),
x3=ifelse(TP.reference$Year==1996, 1, 0),
x4=ifelse(TP.reference$Year==1997, 1, 0),
x5=ifelse(TP.reference$Year==1998, 1, 0),
x6=ifelse(TP.reference$Year==1999, 1, 0))
anova.lm <- lm(log(y) ~ x2+x3+x4+x5+x6, data=anova.data)
summary(anova.lm)
Call:
lm(formula = log(y) ~ x2 + x3 + x4 + x5 + x6, data = anova.data)
Residuals:
Min 1Q Median 3Q Max
-2.0171 -0.9545 -0.0989 0.8610 4.5633
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.50436 0.08894 -39.399 < 2e-16 ***
x2 -0.36943 0.11170 -3.307 0.000968 ***
x3 -0.16319 0.11478 -1.422 0.155345
x4 -0.17488 0.11896 -1.470 0.141811
x5 -0.27044 0.11323 -2.389 0.017062 *
x6 -0.15415 0.12500 -1.233 0.217751
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.114 on 1272 degrees of freedom
Multiple R-squared: 0.01004, Adjusted R-squared: 0.00615
F-statistic: 2.58 on 5 and 1272 DF, p-value: 0.0248
powerT(TP.reference$RESULT)
One Way Analysis of Variance
datafilename="http://personality-project.org/R/datasets/R.appendix1.data"
data.ex1=read.table(datafilename,header=T) #read the data into a table
aov.ex1 = aov(Alertness~Dosage,data=data.ex1) #do the analysis of variance
summary(aov.ex1) #show the summary table
Df Sum Sq Mean Sq F value Pr(>F)
Dosage 2 426.2 213.12 8.789 0.00298 **
Residuals 15 363.8 24.25
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
print(model.tables(aov.ex1,"means"),digits=3) #report the means and the number of subjects/cell
boxplot(Alertness~Dosage,data=data.ex1) #graphical summary
Two way - between subject analysis of variance
> datafilename="http://personality-project.org/r/datasets/R.appendix2.data"
> data.ex2=read.table(datafilename,header=T) #read the data into a table
> head(data.ex2) #show the data
Observation Gender Dosage Alertness
1 1 m a 8
2 2 m a 12
3 3 m a 13
4 4 m a 12
5 5 m b 6
6 6 m b 7
> aov.ex2 = aov(Alertness~Gender*Dosage,data=data.ex2) #do the analysis of variance
> summary(aov.ex2) #show the summary table
Df Sum Sq Mean Sq F value Pr(>F)
Gender 1 76.56 76.56 2.952 0.111
Dosage 1 5.06 5.06 0.195 0.666
Gender:Dosage 1 0.06 0.06 0.002 0.962
Residuals 12 311.25 25.94
> print(model.tables(aov.ex2,"means"),digits=3) #report the means and the number of subjects/cell
Tables of means
Grand mean
14.0625
Gender
Gender
f m
16.25 11.88
Dosage
Dosage
a b
13.50 14.62
Gender:Dosage
Dosage
Gender a b
f 15.75 16.75
m 11.25 12.50
> boxplot(Alertness~Dosage*Gender,data=data.ex2) #graphical summary of means of the 4 cells
1 way ANOVA - within subjects
> datafilename="http://personality-project.org/r/datasets/R.appendix3.data"
> data.ex3=read.table(datafilename,header=T) #read the data into a table
> data.ex3 #show the data
Observation Subject Valence Recall
1 1 Jim Neg 32
2 2 Jim Neu 15
3 3 Jim Pos 45
4 4 Victor Neg 30
5 5 Victor Neu 13
6 6 Victor Pos 40
7 7 Faye Neg 26
8 8 Faye Neu 12
9 9 Faye Pos 42
10 10 Ron Neg 22
11 11 Ron Neu 10
12 12 Ron Pos 38
13 13 Jason Neg 29
14 14 Jason Neu 8
15 15 Jason Pos 35
> aov.ex3 = aov(Recall~Valence+Error(Subject/Valence),data.ex3)
> summary(aov.ex3)
Error: Subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 4 105.1 26.27
Error: Subject:Valence
Df Sum Sq Mean Sq F value Pr(>F)
Valence 2 2029.7 1014.9 189.1 1.84e-07 ***
Residuals 8 42.9 5.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> print(model.tables(aov.ex3,"means"),digits=3) #report the means and the number of subjects/cell
Tables of means
Grand mean
26.46667
Valence
Valence
Neg Neu Pos
27.8 11.6 40.0
> boxplot(Recall~Valence,data=data.ex3) #graphical output
Two-way Within Subjects ANOVA
> datafilename="http://personality-project.org/r/datasets/R.appendix4.data"
> data.ex4=read.table(datafilename,header=T) #read the data into a table
> data.ex4 #show the data
Observation Subject Task Valence Recall
1 1 Jim Free Neg 8
2 2 Jim Free Neu 9
3 3 Jim Free Pos 5
4 4 Jim Cued Neg 7
5 5 Jim Cued Neu 9
6 6 Jim Cued Pos 10
7 7 Victor Free Neg 12
8 8 Victor Free Neu 13
9 9 Victor Free Pos 14
10 10 Victor Cued Neg 16
11 11 Victor Cued Neu 13
12 12 Victor Cued Pos 14
13 13 Faye Free Neg 13
14 14 Faye Free Neu 13
15 15 Faye Free Pos 12
16 16 Faye Cued Neg 15
17 17 Faye Cued Neu 16
18 18 Faye Cued Pos 14
19 19 Ron Free Neg 12
20 20 Ron Free Neu 14
21 21 Ron Free Pos 15
22 22 Ron Cued Neg 17
23 23 Ron Cued Neu 18
24 24 Ron Cued Pos 20
25 25 Jason Free Neg 6
26 26 Jason Free Neu 7
27 27 Jason Free Pos 9
28 28 Jason Cued Neg 4
29 29 Jason Cued Neu 9
30 30 Jason Cued Pos 10
> aov.ex4=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),data.ex4 )
>
> summary(aov.ex4)
Error: Subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 4 349.1 87.28
Error: Subject:Task
Df Sum Sq Mean Sq F value Pr(>F)
Task 1 30.00 30.000 7.347 0.0535 .
Residuals 4 16.33 4.083
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Subject:Valence
Df Sum Sq Mean Sq F value Pr(>F)
Valence 2 9.80 4.900 1.459 0.288
Residuals 8 26.87 3.358
Error: Subject:Task:Valence
Df Sum Sq Mean Sq F value Pr(>F)
Task:Valence 2 1.40 0.700 0.291 0.755
Residuals 8 19.27 2.408
> print(model.tables(aov.ex4,"means"),digits=3) #report the means and the number of subjects/cell
Tables of means
Grand mean
11.8
Task
Task
Cued Free
12.8 10.8
Valence
Valence
Neg Neu Pos
11.0 12.1 12.3
Task:Valence
Valence
Task Neg Neu Pos
Cued 11.8 13.0 13.6
Free 10.2 11.2 11.0
> boxplot(Recall~Task*Valence,data=data.ex4) #graphical summary of means of the 6 cells
Mixed (between and Within) designs
>
> datafilename="http://personality-project.org/r/datasets/R.appendix5.data"
> data.ex5=read.table(datafilename,header=T) #read the data into a table
> head(data.ex5) #show the data
Obs Subject Gender Dosage Task Valence Recall
1 1 A M A F Neg 8
2 2 A M A F Neu 9
3 3 A M A F Pos 5
4 4 A M A C Neg 7
5 5 A M A C Neu 9
6 6 A M A C Pos 10
> aov.ex5 = aov(Recall~(Task*Valence*Gender*Dosage)+Error(Subject/(Task*Valence))+(Gender*Dosage),data.ex5 )
>
> summary(aov.ex5)
Error: Subject
Df Sum Sq Mean Sq F value Pr(>F)
Gender 1 542.3 542.3 5.685 0.0345 *
Dosage 2 694.9 347.5 3.643 0.0580 .
Gender:Dosage 2 70.8 35.4 0.371 0.6976
Residuals 12 1144.6 95.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Subject:Task
Df Sum Sq Mean Sq F value Pr(>F)
Task 1 96.33 96.33 39.862 3.87e-05 ***
Task:Gender 1 1.33 1.33 0.552 0.472
Task:Dosage 2 8.17 4.08 1.690 0.226
Task:Gender:Dosage 2 3.17 1.58 0.655 0.537
Residuals 12 29.00 2.42
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Subject:Valence
Df Sum Sq Mean Sq F value Pr(>F)
Valence 2 14.69 7.343 2.998 0.0688 .
Valence:Gender 2 3.91 1.954 0.798 0.4619
Valence:Dosage 4 20.26 5.065 2.068 0.1166
Valence:Gender:Dosage 4 1.04 0.259 0.106 0.9793
Residuals 24 58.78 2.449
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Subject:Task:Valence
Df Sum Sq Mean Sq F value Pr(>F)
Task:Valence 2 5.39 2.6944 1.320 0.286
Task:Valence:Gender 2 2.17 1.0833 0.531 0.595
Task:Valence:Dosage 4 2.78 0.6944 0.340 0.848
Task:Valence:Gender:Dosage 4 2.67 0.6667 0.327 0.857
Residuals 24 49.00 2.0417
> print(model.tables(aov.ex5,"means"),digits=3) #report the means and the number of subjects/cell
Tables of means
Grand mean
15.62963
Task
Task
C F
16.57 14.69
Valence
Valence
Neg Neu Pos
15.28 15.47 16.14
Gender
Gender
F M
17.87 13.39
Dosage
Dosage
A B C
14.19 13.50 19.19
Task:Valence
Valence
Task Neg Neu Pos
C 16.00 16.72 17.00
F 14.56 14.22 15.28
Task:Gender
Gender
Task F M
C 18.93 14.22
F 16.81 12.56
Valence:Gender
Gender
Valence F M
Neg 17.67 12.89
Neu 17.44 13.50
Pos 18.50 13.78
Task:Dosage
Dosage
Task A B C
C 14.94 14.83 19.94
F 13.44 12.17 18.44
Valence:Dosage
Dosage
Valence A B C
Neg 14.25 12.67 18.92
Neu 14.25 13.00 19.17
Pos 14.08 14.83 19.50
Gender:Dosage
Dosage
Gender A B C
F 16.56 16.67 20.39
M 11.83 10.33 18.00
Task:Valence:Gender
, , Gender = F
Valence
Task Neg Neu Pos
C 18.44 19.00 19.33
F 16.89 15.89 17.67
, , Gender = M
Valence
Task Neg Neu Pos
C 13.56 14.44 14.67
F 12.22 12.56 12.89
Task:Valence:Dosage
, , Dosage = A
Valence
Task Neg Neu Pos
C 15.00 15.00 14.83
F 13.50 13.50 13.33
, , Dosage = B
Valence
Task Neg Neu Pos
C 13.67 14.83 16.00
F 11.67 11.17 13.67
, , Dosage = C
Valence
Task Neg Neu Pos
C 19.33 20.33 20.17
F 18.50 18.00 18.83
Task:Gender:Dosage
, , Dosage = A
Gender
Task F M
C 17.22 12.67
F 15.89 11.00
, , Dosage = B
Gender
Task F M
C 18.33 11.33
F 15.00 9.33
, , Dosage = C
Gender
Task F M
C 21.22 18.67
F 19.56 17.33
Valence:Gender:Dosage
, , Dosage = A
Gender
Valence F M
Neg 16.67 11.83
Neu 16.33 12.17
Pos 16.67 11.50
, , Dosage = B
Gender
Valence F M
Neg 16.17 9.17
Neu 15.83 10.17
Pos 18.00 11.67
, , Dosage = C
Gender
Valence F M
Neg 20.17 17.67
Neu 20.17 18.17
Pos 20.83 18.17
Task:Valence:Gender:Dosage
, , Gender = F, Dosage = A
Valence
Task Neg Neu Pos
C 17.33 17.33 17.00
F 16.00 15.33 16.33
, , Gender = M, Dosage = A
Valence
Task Neg Neu Pos
C 12.67 12.67 12.67
F 11.00 11.67 10.33
, , Gender = F, Dosage = B
Valence
Task Neg Neu Pos
C 17.33 18.00 19.67
F 15.00 13.67 16.33
, , Gender = M, Dosage = B
Valence
Task Neg Neu Pos
C 10.00 11.67 12.33
F 8.33 8.67 11.00
, , Gender = F, Dosage = C
Valence
Task Neg Neu Pos
C 20.67 21.67 21.33
F 19.67 18.67 20.33
, , Gender = M, Dosage = C
Valence
Task Neg Neu Pos
C 18.00 19.00 19.00
F 17.33 17.33 17.33
> par(mfcol=c(2,1))
boxplot(Recall~Task*Valence*Gender*Dosage,data=data.ex5) #graphical summary of means of the 36 cells
boxplot(Recall~Task*Valence*Dosage,data=data.ex5) #graphical summary of means of 18 cells
Multiple testing
- Origin of the multiple-testing problem
When the two groups being compared carry many variables (features), the more tests we run on one dataset, the more likely we are to reject a null hypothesis that is actually true. This follows directly from the logic of hypothesis testing: each individual test has a 5% probability of committing a Type I error, so across many tests the probability of at least one Type I error exceeds 5%. In general, if we run C independent tests, each at α = 5%, the probability of at least one Type I error is 1 - (1 - α)^C. This is the inflation that occurs under multiple testing.
For example, suppose I measure 10000 indicators for every specimen in two groups (exposed vs. control), with 10 specimens per group. To know how large the differences are I must run 10000 comparisons, concretely 10000 two-sample t-tests. If I set the significance level of each t-test at 0.05, i.e. 5% false positives, then after these 10000 tests I expect to see about 500 false positives: 500 "significantly different" indicators that are in fact insensitive to the grouping and could arise at random. If I actually observe 600 significant indicators, how do I tell which are genuinely sensitive to the grouping and which are merely random? And will the random ones number exactly 500?
This is the multiple-testing problem. It is often overlooked in classical experimental work, fundamentally because classical experiments are driven by theory or experience toward a specific hypothesis to test. For example, measuring the white-blood-cell count in your blood tells me whether you have an inflammation, and medical knowledge stands behind that. In omics and other high-throughput experiments, however, the research is data-driven: useful or not, a pile of indicators gets measured and then compared for differences, and the problem above follows. We may be unable to tell which associations are real and which appear by chance.
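The inflation is easy to see in a simulation sketch: group sizes as in the example above, but with pure noise, so that H0 holds for every test.

```r
# 10000 two-sample t-tests in which H0 is true every time
set.seed(4)  # arbitrary seed for reproducibility
p <- replicate(10000, t.test(rnorm(10), rnorm(10))$p.value)
sum(p < 0.05)  # roughly 500 "significant" results, all of them false positives
```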
- Remedies
Adjusting the p-value according to the number of tests performed means controlling the proportion of false positives (Type I errors).
FWER correction methods:
- Single-step correction
- Bonferroni correction
Reject the null hypothesis when p-value ≤ α/m, where m is the number of tests.
Example: with 10000 simultaneous hypothesis tests, a result is statistically significant only when its p-value is at most 0.05/10000 = 5e-6.
- Sequential correction
- Holm's method
FDR correction methods:
- Benjamini and Hochberg FDR
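The Bonferroni, Holm, and Benjamini-Hochberg adjustments are all available through base R's `p.adjust()`; the raw p-values below are hypothetical:

```r
p <- c(0.001, 0.012, 0.018, 0.030, 0.041)  # hypothetical raw p-values, m = 5
p.adjust(p, method = "bonferroni")  # multiply each p by m (capped at 1)
p.adjust(p, method = "holm")        # step-down sequential version
p.adjust(p, method = "BH")          # Benjamini-Hochberg FDR
```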
- Tukey HSD Test
- Worked example for this chapter
### ANOVA and multiple comparisons ###
# Ellison et al 1996
mangrove.sponge <- read.table(paste(dataDIR, "completespongedata.txt",
sep="/"), header=T)
##postscript(file=paste(plotDIR, "mangroveData1.eps", sep="/"),
## width=4, height=3, horizontal=F)
# tikz(file=paste(plotDIRch4, "mangroveData1.tex", sep="/"),
# width=4, height=3, standAlone=F)
par(mar=c(3,3,0.5,0.5), mgp=c(1.25,0.125,0), las=1, tck=0.01)
plot(RootGrowthRate ~ Treatment, data=mangrove.sponge,
xlab="Treatment", ylab="Root Growth Rate (mm/day)", cex.axis=0.75)
qqmath(~RootGrowthRate|Treatment, data=mangrove.sponge,
panel=function(x, ...){
panel.qqmath(x, ...)
panel.qqmathline(x, ...)
panel.grid()
}, xlab="Unit Normal Quantile", ylab="Root Growth Rate (mm/day)")
> mangrove.lm <- lm(RootGrowthRate ~ Treatment, data=mangrove.sponge)
> summary.aov(mangrove.lm)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 3 4.402 1.4673 6.875 0.000411 ***
Residuals 68 14.513 0.2134
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> mangrove.lm2 <- lm(RootGrowthRate ~ Treatment*Location, data=mangrove.sponge)
> summary.aov(mangrove.lm2)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 3 4.402 1.4673 6.488 0.000759 ***
Location 3 0.806 0.2687 1.188 0.322598
Treatment:Location 9 1.043 0.1159 0.512 0.859452
Residuals 56 12.664 0.2261
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
lm.plots(mangrove.lm)
> (mangrove.aov <- aov(RootGrowthRate ~ Treatment, data=mangrove.sponge))
Call:
aov(formula = RootGrowthRate ~ Treatment, data = mangrove.sponge)
Terms:
Treatment Residuals
Sum of Squares 4.401876 14.513052
Deg. of Freedom 3 68
Residual standard error: 0.4619819
Estimated effects may be unbalanced
> (mangrove.multcomp <- TukeyHSD(mangrove.aov))
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = RootGrowthRate ~ Treatment, data = mangrove.sponge)
$`Treatment`
diff lwr upr p adj
Foam-Control 0.3543571 -0.02579840 0.7345127 0.0764978
Haliclona-Control 0.4910924 0.09412788 0.8880570 0.0092735
Tedania-Control 0.6764286 0.25661700 1.0962401 0.0003895
Haliclona-Foam 0.1367353 -0.26464445 0.5381150 0.8062980
Tedania-Foam 0.3220714 -0.10191747 0.7460603 0.1979042
Tedania-Haliclona 0.1853361 -0.25378710 0.6244594 0.6836926
par(mar=c(3, 10, 3, 1), las=1)
plot(TukeyHSD(mangrove.aov))
> library(multcomp)
> q2<-glht(mangrove.aov, linfct=mcp(Treatment="Tukey"))
> summary(q2)
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: aov(formula = RootGrowthRate ~ Treatment, data = mangrove.sponge)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
Foam - Control == 0 0.3544 0.1443 2.455 0.07620 .
Haliclona - Control == 0 0.4911 0.1507 3.258 0.00922 **
Tedania - Control == 0 0.6764 0.1594 4.244 < 0.001 ***
Haliclona - Foam == 0 0.1367 0.1524 0.897 0.80584
Tedania - Foam == 0 0.3221 0.1610 2.001 0.19731
Tedania - Haliclona == 0 0.1853 0.1667 1.112 0.68304
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)
par(mar=c(3, 10, 3, 1), mgp=c(1.25,0.125,0), las=1, tck=0.01)
plot(q2, main="95\\% family-wise confidence level")
Significance level α, power 1-β, effect size, and sample size
- Determining the required sample size
- Computing the effect size
- Evaluating statistical power
Power analysis can tell you the sample size needed to detect a given effect size at a given confidence level, and conversely the probability of detecting a required effect size at a given sample size.
How many samples does my study actually need?
Hypothesis testing tells us that the sample size n, the significance level α, the power 1-β, and the effect size are interrelated.
Typically the goal is to keep the significance level acceptable, use as few samples as possible, and maximize the power of the test: that is, maximize the chance of detecting a real effect and minimize the chance of declaring a spurious one, while holding the cost of the study within a reasonable range.
Power analysis with the pwr package
Function | Computes power for |
---|---|
pwr.2p.test() | two proportions (equal n) |
pwr.2p2n.test() | two proportions (unequal n) |
pwr.anova.test() | balanced one-way ANOVA |
pwr.chisq.test() | chi-squared test |
pwr.f2.test() | general linear model |
pwr.p.test() | proportion (one sample) |
pwr.r.test() | correlation coefficient |
pwr.t.test() | t-test (one-sample, two-sample, paired) |
pwr.t2n.test() | t-test (two samples, unequal n) |
Suppose you want to evaluate the effect of mobile-phone use on drivers' reaction time. The null hypothesis is H0: μ1 - μ2 = 0, where μ1 is the mean reaction time while using a phone and μ2 the mean reaction time without one (here μ1 and μ2 are the population parameters of interest). If you reject this null hypothesis, the alternative or research hypothesis is H1: μ1 - μ2 ≠ 0, which is equivalent to μ1 ≠ μ2, i.e. the mean reaction times under the two conditions differ.
Now recruit a sample of different individuals and randomly assign each to one of the two conditions. In the first, participants respond to a series of driving challenges in a simulator while talking on a phone; in the second, they complete the same driving challenges without the phone. Each individual's overall reaction time is then measured.
Assume a two-tailed independent-samples t-test will compare the mean reaction times under the two conditions. If past experience tells you that reaction time has a standard deviation of 1.25 s, and you regard a 1 s difference in reaction time as a large one, you can set the effect size to detect at d = 1/1.25 = 0.8 or larger. Moreover, if the difference exists, you want a 90% chance of detecting it, and, because of random variability, you also want a 95% chance of not falsely declaring a significant difference. How many subjects does this study need?
> pwr.t.test(d=0.8,sig.level = 0.05,power = 0.9,type ="two.sample",alternative = "two.sided")
Two-sample t test power calculation
n = 33.82555
d = 0.8
sig.level = 0.05
power = 0.9
alternative = two.sided
NOTE: n is number in *each* group
n is the sample size.
d is the effect size, i.e. the standardized difference of means.
sig.level is the significance level (0.05 by default).
power is the power level.
type is the kind of t-test: two-sample (two.sample), one-sample (one.sample), or paired (paired); the default is the two-sample test.
The result shows that you need 34 subjects per group (68 in total) to have a 90% chance of detecting an effect size of 0.8 while keeping the chance of falsely declaring a difference at 5% or less.
Curves of the sample size needed to detect correlations of various effect sizes
# install.packages("pwr")
library(pwr)
# set up the ranges of correlation coefficients (r) and power values (p)
r <- seq(.1,.5,.01)
nr <- length(r)
p <- seq(.4,.9,.1)
np <- length(p)
# compute the required sample sizes
samsize <- array(numeric(nr*np), dim=c(nr,np))
for (i in 1:np){
for (j in 1:nr){
result <- pwr.r.test(n = NULL, r = r[j],
sig.level = .05, power = p[i],
alternative = "two.sided")
samsize[j,i] <- ceiling(result$n)
}
}
# set up the plot region
xrange <- range(r)
yrange <- round(range(samsize))
colors <- rainbow(length(p))
plot(xrange, yrange, type="n",
xlab="Correlation Coefficient (r)",
ylab="Sample Size (n)" )
# add the power curves
for (i in 1:np){
lines(r, samsize[,i], type="l", lwd=2, col=colors[i])
}
# add grid lines
abline(v=0, h=seq(0,yrange[2],50), lty=2, col="grey89")
abline(h=0, v=seq(xrange[1],xrange[2],.02), lty=2, col="gray89")
# add the title and legend
title("Sample Size Estimation for Correlation Studies\n
Sig=0.05 (Two-tailed)")
legend("topright", title="Power", as.character(p),
fill=colors)
At a power of 40%, detecting a correlation of 0.20 requires a sample size of about 75. At 90% power, detecting the same correlation requires about 185 additional observations (n = 260). With minor changes, this method can be used to build sample-size and power curves for many statistical tests.
Power Analysis in R
blog_R-inaction
par(mfrow=c(2,2), mar=c(1.5, .25, 0.5, 0.), mgp=c(1.5, 0.5, 0))
z <- qnorm(1-0.05/2)
plot(seq(7, 17,,100), dnorm(seq(7,17,,100), 10, 2/sqrt(10)), type="l",
xlab="", ylab="", ylim=c(0, 1.1), axes=F)
lines(seq(7, 17,,100), dnorm(seq(7,17,,100), 12, 2/sqrt(10)), type="l",
col=gray(0.5))
abline(v=10+z*2/sqrt(10), lty=5, col=gray(0.5))
text(10, dnorm(10,10, 2/sqrt(10)), "$H_0$", adj=c(0.5,0))
text(12, dnorm(12,12, 2/sqrt(10)), "$H_a$", adj=c(0.5,0))
text(13, 0.6, "$n=10$", adj=c(0,0.5))
text(13, 0.5, "$\\sigma=2$", adj=c(0,0.5))
text(13, 0.4, "$\\alpha=0.05$", adj=c(0,0.5))
text(13, 0.75, paste("Power = ",
round(1-pnorm(10+z*2/sqrt(10), 12, 2/sqrt(10)), digit=3),
sep=""), adj=c(0,0.5))
axis(1)
plot(seq(7, 17,,100), dnorm(seq(7,17,,100), 10, 3/sqrt(10)), type="l",
xlab="", ylab="", ylim=c(0, 1.1), axes=F)
lines(seq(7, 17,,100), dnorm(seq(7,17,,100), 12, 3/sqrt(10)), type="l",
col=gray(0.5))
abline(v=10+z*3/sqrt(10), lty=5, col=gray(0.5))
#abline(v=5:10, col=1:6)
text(10, dnorm(10, 10, 3/sqrt(10)), "$H_0$", adj=c(0.5,0))
text(12, dnorm(12, 12, 3/sqrt(10)), "$H_a$", adj=c(0.5,0))
text(12.75, 0.6, "$n=10$", adj=c(0,0.5))
text(12.75, 0.5, "$\\sigma=3$", adj=c(0,0.5))
text(12.75, 0.4, "$\\alpha=0.05$", adj=c(0,0.5))
text(12.75, 0.75,
paste("Power = ", round(1-pnorm(10+z*3/sqrt(10), 12, 3/sqrt(10)), digit=3),
sep=""), adj=c(0,0.5))
axis(1)
plot(seq(7, 17,,100), dnorm(seq(7,17,,100), 10, 2/sqrt(20)), type="l",
xlab="", ylab="", ylim=c(0, 1.1), axes=F)
lines(seq(7, 17,,100), dnorm(seq(7,17,,100), 12, 2/sqrt(20)), type="l",
col=gray(0.5))
abline(v=10+z*2/sqrt(20), lty=5, col=gray(0.5))
text(10, dnorm(10, 10, 2/sqrt(20)), expression(H[0]), adj=c(0.5,0))
text(12, dnorm(12, 12, 2/sqrt(20)), expression(H[a]), adj=c(0.5,0))
text(12.75, 0.6, "$n=20$", adj=c(0,0.5))
text(12.75, 0.5, "$\\sigma=2$", adj=c(0,0.5))
text(12.75, 0.4, "$\\alpha=0.05$", adj=c(0,0.5))
text(12.75, 0.75, paste("Power = ",
round(1-pnorm(10+z*2/sqrt(20), 12, 2/sqrt(20)), digit=3),
sep=""), adj=c(0,0.5))
axis(1)
z <- qnorm(1-0.1/2)
plot(seq(7, 17,,100), dnorm(seq(7,17,,100), 10, 3/sqrt(10)), type="l",
xlab="", ylab="", ylim=c(0, 1), axes=F)
lines(seq(7, 17,,100), dnorm(seq(7,17,,100), 12, 3/sqrt(10)), type="l",
col=gray(0.5))
abline(v=10+z*3/sqrt(10), lty=5, col=gray(0.5))
text(10, dnorm(10, 10, 3/sqrt(10)), expression(H[0]), adj=c(0.5, 0))
text(12, dnorm(12, 12, 3/sqrt(10)), expression(H[a]), adj=c(0.5, 0))
text(12.75, 0.6, "$n=10$", adj=c(0,0.5))
text(12.75, 0.5, "$\\sigma=3$", adj=c(0,0.5))
text(12.75, 0.4, "$\\alpha=0.1$", adj=c(0,0.5))
text(12.75, 0.75, paste("Power = ",
round(1-pnorm(10+z*3/sqrt(10), 12, 3/sqrt(10)), digit=3),
sep=""), adj=c(0,0.5))
axis(1)