For downstream analysis, samples should first be randomly rarefied to a common sequencing depth (usually the smallest library size among the samples).
Once you have an OTU table from amplicon sequencing, you are usually asked to rarefy it before any comparative analysis, because comparisons between samples only make sense when their sequencing depths are the same. The drawback is that when library sizes differ a lot, a great deal of data is thrown away: if sample A has 30,000 reads and sample B has 100,000, then after rarefaction 70,000 of sample B's reads are simply discarded. Rarefaction is not the only option, of course; the literature also uses approaches such as DESeq2, which has its own normalization method that anyone who has done RNA-seq work will recognize. Here I will start with the former and show how to implement rarefaction.
Option 1: the vegan package
library(vegan)
library(dplyr)   # needed for %>% and select()
# read the OTU table (OTUs in rows, samples in columns); column 13 is presumably a taxonomy/non-count column and is dropped
otu = read.table('16s_OTU_Table.txt', header = TRUE, sep = "\t", quote = "", row.names = 1, comment.char = "", stringsAsFactors = FALSE) %>% select(-13)
colSums(otu)   # library size of each sample before rarefying
# rrarefy() expects samples in rows: transpose, rarefy to the smallest library size, transpose back
otu_rare = as.data.frame(t(rrarefy(t(otu), min(colSums(otu)))))
colSums(otu_rare)   # every sample now has the same depth
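A quick sanity check, plus writing the result to disk; this is just a sketch and the output file name below is a placeholder, not from the original post.
all(colSums(otu_rare) == min(colSums(otu)))   # should be TRUE
write.table(otu_rare, 'otu_rarefied.txt', sep = "\t", quote = FALSE, col.names = NA)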
Option 2: the phyloseq package
library(phyloseq)
set.seed(123)   # with this method it is best to set a random seed so the result is reproducible
otu1 = otu_table(otu, taxa_are_rows = TRUE)
# note: this object name shadows the phyloseq() constructor; a distinct name would be clearer
phyloseq = phyloseq(otu1)
# this method automatically removes some low-abundance OTUs (those left with zero counts in every sample after subsampling)
rare.data = rarefy_even_depth(phyloseq, replace = TRUE)
# console message: 8 OTUs were removed because they are no longer present in any sample after random subsampling
# compare library sizes before and after rarefying
sample_sums(phyloseq)
sample_sums(rare.data)
# extract the rarefied OTU table as a plain data frame
rare.otu = as(otu_table(rare.data), "matrix") %>%
  as.data.frame()
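If you prefer to keep the depth and seed inside the call, rarefy_even_depth() also accepts sample.size and rngseed arguments. A minimal sketch, equivalent to the call above as far as I understand the function's defaults:
rare.data2 = rarefy_even_depth(phyloseq,
                               sample.size = min(sample_sums(phyloseq)),
                               rngseed = 123, replace = TRUE, trimOTUs = TRUE)
sample_sums(rare.data2)   # every sample rarefied to the same depth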
As you can see, the phyloseq approach filters out some low-abundance OTUs, so if you rarefy this way it is best to call set.seed() first so the result can be reproduced.
Let's also look at the counts of those 8 removed OTUs in each sample:
otu[setdiff(rownames(otu), rownames(rare.otu)), ]   # OTUs present before rarefying but not after
Indeed, their counts are quite low, so losing them is no great loss. Neither method is inherently better or worse; pick whichever suits your workflow.
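For completeness, the DESeq2 route mentioned at the beginning would look roughly like the sketch below. The sample metadata data frame meta and its group column are hypothetical and only for illustration; they are not part of the original post.
library(DESeq2)
# meta: hypothetical data frame of sample metadata, with rows matching colnames(otu)
dds = DESeqDataSetFromMatrix(countData = as.matrix(otu), colData = meta, design = ~ group)
dds = estimateSizeFactors(dds)              # DESeq2's median-of-ratios normalization
# for very sparse OTU tables, estimateSizeFactors(dds, type = "poscounts") may be needed
otu_norm = counts(dds, normalized = TRUE)   # normalized counts, no reads discarded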