因为我用的GTF文件较新,为realease101版本,因此需要自己生成转换文件需要的TxDb文件,首先需要安装GenomicFeatures包,然后直接在线应用命令进行数据库建立TxDb文件,因为我的表达矩阵的行名为ensemble,因此我直接从官网得到数据库文件
library(GenomicFeatures)#载入包
txdb <- makeTxDbFromEnsembl(organism="Homo sapiens",#设定种族
release=101,设定版本
server="ensembldb.ensembl.org")#审定服务器地址
saveDb(txdb, file='txdbensemble101.sqlite')#保存生存的TXDB文件
后续转换的时候可以直接应用Github上别人写的包,我直接下载。下载的地址为https://codeload.github.com/mapawlak/normalizeGeneCounts/zip/master
下载下来后解压,本地直接source加载后就可以了
TxDb <- loadDb(file='txdbensemble101.sqlite')#载入参考数据集
counts <- read.table("all.id.txt",#读入原始counts文件
header = T,#第一行为列名
row.names = 1,#第一列为行名
check.names = F)#保持原文名称
RPKM <- normalizeGeneCounts(counts, TxDb, method = "RPKM")
source("normalizeGeneCounts.R")#载入本地脚本
#以下分别生成校准后文件
RPKM <- normalizeGeneCounts(counts, TxDb, method = "RPKM")
TPM <- normalizeGeneCounts(counts, TxDb, method = "TPM")
CPM <- normalizeGeneCounts(counts, TxDb, method = "CPM")
save(CPM,RPKM,TPM,file = "normal.Rdata")#保持到本地
以下命令是normalizeGeneCounts作者的说明文件中的命令
# normalizeGeneCounts
## Description
Function takes raw counts of NGS data and normalize to [CPM, RPKM or TPM](https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/).
## Usage
normalizeGeneCounts(counts, TxDb, method)
## Arguments
*counts* A data frame with gene ID in rownames and sample ID in colnames
*TxDb* `GenomicFeatures` object created from the same gtf as counts object
*method* Should be "CPM", "RPKM" or "TPM"
## Example
CPM <- normalizeCounts(counts, TxDb, method = "CPM")
以下命令是作者R文件的原始代码
normalizeGeneCounts <- function(counts, TxDb, method) {
require("GenomicFeatures")
length <- width(genes(TxDb))/10 ^ 3 #gene length per kb
if (method == "CPM") {
normalized <-
as.data.frame(apply(counts,2, function(x) (x/sum(x)) * 10 ^ 6))
} else if (method == "RPKM") {
normalized <-
as.data.frame(t(apply(counts, 1, "/", colSums(counts) / 10 ^ 6)) / length)
} else if (method == "TPM") {
normalized <-
as.data.frame(t(apply(
counts / length, 1, "/", colSums(counts / length) / 10 ^ 6
)))
} else {
stop("method must be CPM, RPKM or TPM")
}
return(normalized)
}