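The figure reproduced here comes from this article: https://www.sohu.com/a/242022054_278730
I won't dwell on what the figure means. Hunting down data is a dreadful chore for me, so this is just a simple demonstration using the differential expression results of two comparison groups. The differential expression results for group A look like this:
[Figure: differential expression results for group A]
The group B file is similar. To tell the two apart, the log2FC column in the B file is renamed to log2fold.
Step 1: Read in the two files and merge them.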
> A<-read.table(file="A.exp.xls",header=T,comment.char = "",stringsAsFactors = F,check.names = F)
> B<-read.table(file="B.exp.xls",header=T,comment.char = "",stringsAsFactors = F,check.names = F)
> tmpA <-A[,c("#ID","log2FC")]
> tmpB <-B[,c("#ID","log2fold")]
> colnames(tmpA)<-c("ID","log2FC")
> colnames(tmpB)<-c("ID","log2fold")
> merge<-merge(tmpA,tmpB,by=c("ID"))
> head(merge)
ID log2FC log2fold
1 ENSMUSG00000000001 0.01938984 0.0406943
2 ENSMUSG00000000028 -0.24899079 -0.7511275
3 ENSMUSG00000000037 -0.24278368 -0.3645860
4 ENSMUSG00000000049 -0.22920711 0.4679781
5 ENSMUSG00000000056 0.19978973 0.6480403
6 ENSMUSG00000000058 -0.48370072 -0.7290207
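Two details of Step 1 are easy to miss: comment.char="" and check.names=F are what let read.table keep a header that starts with "#ID", and merge() by default keeps only the IDs present in both tables. Here is a minimal sketch on invented data. The gene names, values, temporary file, and sep="\t" (assuming a tab-delimited file) are all assumptions, not the author's real input.

```r
# Write a tiny tab-delimited file whose header starts with "#ID" (toy data).
toy_file <- tempfile(fileext = ".xls")
writeLines(c("#ID\tlog2FC", "g1\t1.2", "g2\t-0.8", "g3\t0.1"), toy_file)

# comment.char = "" stops read.table() from treating the "#ID" header as a comment;
# check.names = FALSE keeps the column name "#ID" unchanged.
toy_a <- read.table(toy_file, header = TRUE, sep = "\t", comment.char = "",
                    stringsAsFactors = FALSE, check.names = FALSE)
raw_names <- colnames(toy_a)
colnames(toy_a) <- c("ID", "log2FC")

# Second toy table, built directly in memory.
toy_b <- data.frame(ID = c("g2", "g3", "g4"), log2fold = c(-0.9, 0.3, 2.0),
                    stringsAsFactors = FALSE)

# Default merge() is an inner join: only IDs found in both tables survive.
toy_inner <- merge(toy_a, toy_b, by = "ID")

# all = TRUE keeps every ID and fills the missing side with NA.
toy_outer <- merge(toy_a, toy_b, by = "ID", all = TRUE)

print(toy_inner)
print(toy_outer)
```

If the merged table has fewer rows than either input, it may be worth checking with all = TRUE which IDs are missing from one side.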
Step 2: Assign a color to each point (gene) in the plot.
> merge$color<-"black"
> merge$color[merge$log2FC>=0.5 & merge$log2fold>=0.5]<-"red"
> merge$color[merge$log2FC>=0.5 & merge$log2fold<=-0.5] <-"green"
> merge$color[merge$log2FC<=-0.5 & merge$log2fold>=0.5] <-"blue"
> merge$color[merge$log2FC<=-0.5 & merge$log2fold<=-0.5] <-"purple"
> head(merge)
ID log2FC log2fold color
1 ENSMUSG00000000001 0.01938984 0.0406943 black
2 ENSMUSG00000000028 -0.24899079 -0.7511275 black
3 ENSMUSG00000000037 -0.24278368 -0.3645860 black
4 ENSMUSG00000000049 -0.22920711 0.4679781 black
5 ENSMUSG00000000056 0.19978973 0.6480403 black
6 ENSMUSG00000000058 -0.48370072 -0.7290207 black
> mycol<-merge$color
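The same thresholds can be wrapped in a small function so that the cutoff is easy to change, and table() gives a quick count of points per color. This is only a sketch: quadrant_color and the toy vectors are hypothetical names and values, not part of the original workflow.

```r
# Hypothetical helper reproducing the color rules used above (default cutoff 0.5).
quadrant_color <- function(x, y, cutoff = 0.5) {
  col <- rep("black", length(x))                   # everything starts as black
  col[x >=  cutoff & y >=  cutoff] <- "red"        # up in both
  col[x >=  cutoff & y <= -cutoff] <- "green"      # up in A, down in B
  col[x <= -cutoff & y >=  cutoff] <- "blue"       # down in A, up in B
  col[x <= -cutoff & y <= -cutoff] <- "purple"     # down in both
  col
}

# Invented values, one per category plus two points that stay black.
toy_x <- c(1.0,  0.8, -0.7, -1.2, 0.1, 0.9)
toy_y <- c(0.9, -0.6,  0.5, -0.5, 0.2, 0.1)
toy_col <- quadrant_color(toy_x, toy_y)

# Count how many points fall into each color class.
print(table(toy_col))
```

Note that the last toy point (0.9, 0.1) stays black even though its first value passes the cutoff, because a point is only recolored when both values pass.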
Step 3: Compute the correlation coefficient.
For example, here I compute the Pearson correlation coefficient between log2FC and log2fold.
> test<-cor.test(merge$log2FC,merge$log2fold)
> test
Pearson's product-moment correlation
data: merge$log2FC and merge$log2fold
t = 100.38, df = 11147, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.6791772 0.6986788
sample estimates:
cor
0.6890527
> pearson<-test$estimate
The value stored in pearson is 0.6890527.
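cor.test() returns a list, and test$estimate is a named number (its name is "cor"). A rough sketch of pulling out the pieces and rounding the coefficient for display follows. The simulated data and the variable names are assumptions for illustration only.

```r
# Simulated, positively correlated toy data (not the real expression values).
set.seed(1)
toy_x <- rnorm(100)
toy_y <- 0.7 * toy_x + rnorm(100, sd = 0.5)

toy_test <- cor.test(toy_x, toy_y, method = "pearson")

r_value <- unname(toy_test$estimate)  # drop the "cor" name, keep the number
p_value <- toy_test$p.value           # p-value of the test
ci      <- toy_test$conf.int          # 95 percent confidence interval (length 2)

# A shorter label for plot annotations: three decimals instead of seven.
r_label <- paste0("R(Pearson)=", round(r_value, 3))
print(r_label)
```

Setting method = "spearman" in the same call gives a rank-based coefficient instead, which may suit data with outliers better.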
Step 4: Draw the plot.
Use the base plotting function plot to draw a scatter plot (pch=20) and shrink the points (cex=.5)
Point colors (col=mycol)
Plot title (main="four quadrant")
Plot subtitle (sub="sub title")
x-axis label (xlab="A expression log2FC")
y-axis label (ylab="B expression log2fold")
> plot(merge$log2FC,merge$log2fold,pch=20,cex=.5,col=mycol,main="four quadrant",sub="sub title",xlab="A expression log2FC",ylab="B expression log2fold")
[Figure: the scatter plot produced by plot()]
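Add horizontal and vertical lines to the plot to separate the quadrants.
For a four-quadrant plot, use this command: abline(h=0,v=0,lty=1)
For a nine-quadrant plot, use this command: abline(h=c(0.5,-0.5),v=c(0.5,-0.5),lty=c(2,2))
I added both: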
> abline(h=c(0.5,0,-0.5),v=c(0.5,0,-0.5),lty=c(2,1,2))
[Figure: the scatter plot with the quadrant lines added]
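A plot drawn in an interactive session is lost once the window is closed. One possible way to keep it is to open a file device before plotting and close it afterwards. The sketch below uses the base pdf() device with random toy data; the output file name and the data are assumptions.

```r
# Output path in the session's temporary directory (hypothetical file name).
out_file <- file.path(tempdir(), "four_quadrant_sketch.pdf")

# Random toy points standing in for the two log2 fold-change columns.
set.seed(2)
toy_x <- rnorm(200)
toy_y <- rnorm(200)

pdf(out_file, width = 6, height = 6)   # open the PDF device
plot(toy_x, toy_y, pch = 20, cex = 0.5,
     main = "four quadrant",
     xlab = "A expression log2FC", ylab = "B expression log2fold")
# Solid lines at 0 and dashed lines at +/-0.5, as in the command above.
abline(h = c(0.5, 0, -0.5), v = c(0.5, 0, -0.5), lty = c(2, 1, 2))
invisible(dev.off())                   # close the device so the file is written

print(out_file)
```

Any later legend() call has to come before dev.off(), since nothing can be added to the file once the device is closed.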
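Add a legend in the top-left corner of the plot.
Since this is a scatter plot, the legend should use point symbols too, so pch=20.
The point colors in the legend must match the colors used in plot one to one, which is what the col vector is for.
The legend argument says what each color stands for, in the same order as the col vector. Black covers every point outside the four corner regions, that is, any gene with |log2FC|<0.5 or |log2fold|<0.5. Putting the Pearson correlation coefficient in the legend title keeps it centered.
> legend(x="topleft",pch=20,col=c("black","red","green","blue","purple"),legend =c("|log2FC|<0.5 || |log2fold|<0.5","log2FC>=0.5 && log2fold>=0.5","log2FC>=0.5 && log2fold<=-0.5","log2FC<=-0.5 && log2fold>=0.5","log2FC<=-0.5 && log2fold<=-0.5"),title=paste0("R(Pearson)=",pearson))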
[Figure: the final plot with the legend]
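Because the legend colors and labels have to stay in the same order, one option is to keep them together in a single named vector and round the coefficient for the title. This is a sketch rather than the original method: legend_map is a hypothetical name, the wording of the black label is my own, and the plot is drawn on a throw-away device with an empty canvas.

```r
# Color -> label pairs in one named vector, so the two cannot drift apart.
legend_map <- c(
  black  = "|log2FC|<0.5 || |log2fold|<0.5",
  red    = "log2FC>=0.5 && log2fold>=0.5",
  green  = "log2FC>=0.5 && log2fold<=-0.5",
  blue   = "log2FC<=-0.5 && log2fold>=0.5",
  purple = "log2FC<=-0.5 && log2fold<=-0.5"
)

r_value <- 0.6890527                                   # coefficient from Step 3
legend_title <- paste0("R(Pearson)=", round(r_value, 3))

pdf(NULL)                                              # device that writes no file
plot(0, 0, type = "n", xlim = c(-3, 3), ylim = c(-3, 3),
     xlab = "A expression log2FC", ylab = "B expression log2fold")
lg <- legend("topleft", pch = 20, cex = 0.7,
             col = names(legend_map), legend = unname(legend_map),
             title = legend_title)
invisible(dev.off())

print(legend_title)
```

The cex = 0.7 argument shrinks the legend text, which may help when the long labels cover too many points.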
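A closing note from this beginner: the figure is pretty ugly and some of it could still be tweaked, but this beginner is tired. Bye~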