今天的随笔主要涉及运用DESeq2对基因进行差异分析,先上代码:
# 差异表达 -----------------------------------------------------------------------------------
BiocManager::install("DESeq2")
library(DESeq2)
##这里导入了mRNA或miRNA的矩阵--------------------------------------------------------------------
mRNA_exp=readRDS("TCGA_THCA_mRNA.rds")
miRNA_exp <- data.frame(mirCounts_tcgabiolinks)
##1.区分癌和正常组织-----------------------------------------------------------------------------
metadata_RNA <- data.frame(sample=ifelse(substring(names(mRNA_exp),14,15)<10,"cancer","normal"))
##2.添加样本因子---------------------------------------------------------------------------------
metadata_RNA$sample <- as.factor(metadata_RNA$sample)
##3.用已知列名命名行名---------------------------------------------------------------------------
rownames(metadata_RNA)=names(mRNA_exp)
##4.列出癌与正常组织数---------------------------------------------------------------------------
table(metadata_RNA$sample)
##5.表达矩阵的标准化-----------------------------------------------------------------------------
mycounts <- mRNA_exp
dds <-DESeqDataSetFromMatrix(countData=mycounts,
colData=metadata_RNA,
design=~sample)
##6.表达差异分析(DESeq)------------------------------------------------------------------------
DESeq_mRNA_dds <- DESeq(dds)
##7.保存差异分析文件-----------------------------------------------------------------------------
save(DESeq_mRNA_dds,file = "TCGA_mRNA_DESeq.Rdata")
load("TCGA_mRNA_DESeq.Rdata")
##8.得到差异分析结果-----------------------------------------------------------------------------
res_RNA=results(DESeq_mRNA_dds, contrast=c("sample","cancer","normal"),tidy=TRUE)
DEseq_diff=subset(res_RNA,padj<0.05&abs(log2FoldChange)>1)
##前者为分子,后者为分母
那么,当我们在完成了mRNA的差异分析后,ID的命名我们看到是如下情况:
mRNA差异分析.png
rowname的代码是以ENSG开头的,这里我们就要提一下EMBL数据库命名方式(http://asia.ensembl.org/index.html),在网站内能够查到一对一的对应关系,但是,当遇到很多ID的转换时,我们就要考虑一下如何借助R语言完成了。
# KEGG ---------------------------------------------------------------------------------------------------------
##01 安装 clusterProfiler和org.Hs.eg.db---------------------------------------------------------------------
BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db")
##02 加载clusterProfiler和org.Hs.eg.db----------------------------------------------------------------------
library(clusterProfiler)
library(org.Hs.eg.db)
##03 将ENSEMBL转换为SYMBOL
eg = bitr(res_RNA$row, fromType="ENSEMBL", toType=c("ENTREZID","SYMBOL"), OrgDb="org.Hs.eg.db")
write.csv(eg,"res_RNA.csv")
##04 运用KEGG富集相关功能基因-------------------------------------------------------------------------------
kk <- enrichKEGG(gene = eg$ENTREZID,
organism = 'hsa',
pvalueCutoff = 0.05)
head(kk)
##05 富集功能作图-------------------------------------------------------------------------------------------
library(enrichplot)
barplot(kk, showCategory=20)
dotplot(kk, showCategory=20)
以上是通过clusterProfiler和org.Hs.eg.db包完成的ID转换。