BAM files need to be sorted before downstream analysis. For installing samtools, see the article:
Converting SAM files to BAM files with SAMtools - Jianshu (jianshu.com)
1. samtools sort basic usage:
$ samtools sort
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-u Output uncompressed data (equivalent to -l 0)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-M Use minimiser for clustering unaligned/unplaced reads
-K INT Kmer size to use for minimiser [20]
-n Sort by read name (not compatible with samtools index command)
-t TAG Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
--no-PG do not add a PG line
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
--write-index
Automatically index the output files [off]
--verbosity INT
Set level of verbosity
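To show how several of these options fit together, here is a minimal sketch. The helper name sort_bam and its default values are my own assumptions for illustration, not part of samtools. Only options listed in the usage text above are used.

```shell
#!/usr/bin/env bash
# Sketch: coordinate-sort a BAM file with explicit resource settings.
# sort_bam is a hypothetical helper name, not a samtools command.
sort_bam() {
    local in_bam="$1"
    local out_bam="$2"
    local threads="${3:-4}"   # -@ : number of additional threads
    local mem="${4:-1G}"      # -m : maximum memory per thread
    # -T : prefix for temporary files, placed next to the output here
    samtools sort -@ "$threads" -m "$mem" -T "${out_bam%.bam}.tmp" \
        -o "$out_bam" "$in_bam"
}

# Example call, only run when the input file is present:
if [ -f LPF1_R1_MP.bam ]; then
    sort_bam LPF1_R1_MP.bam LPF1_R1_MP.sort.bam 8 2G
fi
```

Because -m is a per-thread limit, raising -@ and -m together multiplies the memory the sort may use, so it is worth checking both against the memory available on the machine.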
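2. Sorting:
By default, records are sorted first by reference sequence, in the order the sequences are listed in the BAM header (which normally matches the reference FASTA file), and then by leftmost mapped position within each sequence.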
$ samtools sort -@8 LPF1_R1_MP.bam -o LPF1_R1_MP.sort.bam
# View the BAM file
$ samtools view LPF1_R1_MP.sort.bam
-@8: use 8 additional threads
-o: output file
Sort by read name:
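One way to confirm that the sort took effect is to read the SO field of the @HD header line, which samtools sort updates. The sketch below assumes the file names used in this post; sort_order is a hypothetical helper name of my own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SO (sort order) value from the @HD header line.
sort_order() {
    samtools view -H "$1" |
        awk -F'\t' '$1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) print substr($i, 4)
        }'
}

# Only run when the sorted file is present:
if [ -f LPF1_R1_MP.sort.bam ]; then
    sort_order LPF1_R1_MP.sort.bam       # "coordinate" is expected after a default sort
    samtools index LPF1_R1_MP.sort.bam   # indexing requires coordinate order
fi
```

A coordinate-sorted file can then be indexed with samtools index, which is what most downstream tools and genome browsers expect.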
$ samtools sort -@8 -n LPF1_R1_MP.bam -o LPF1_R1_MP.name.sort.bam
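A quick sanity check for the name-sorted file is to print the first few read names and see that identical names (for example the two mates of a pair) sit next to each other. This is only a sketch; head_read_names is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the read name (column 1) of the first N records.
head_read_names() {
    local bam="$1"
    local n="${2:-10}"
    samtools view "$bam" | cut -f1 | head -n "$n"
}

# Only run when the name-sorted file is present:
if [ -f LPF1_R1_MP.name.sort.bam ]; then
    head_read_names LPF1_R1_MP.name.sort.bam 10
fi
```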
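At this point I noticed that the original .bam file, the .sort.bam file and the .name.sort.bam file differ in size, with .sort.bam being much smaller. This is most likely because coordinate sorting places reads with similar sequences next to each other, so they compress better. To make sure nothing was lost, check the number of records in the three files: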
$ samtools view -c LPF1_R1_MP.bam
44038570
$ samtools view -c LPF1_R1_MP.sort.bam
44038570
$ samtools view -c LPF1_R1_MP.name.sort.bam
44038570
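The record counts match, so nothing was lost. The default sort, by chromosome and position, is the one most commonly used.
If you are using SAMtools version 1.9, you can refer to this article:
//www.greatytc.com/p/6b7a442d293f
Please credit the source when quoting or reposting. If you find any errors, please point them out.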
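The three manual checks above can be wrapped in a small loop. This is a sketch under the assumption that the three files from this post are in the current directory; same_record_count is a hypothetical helper name, not a samtools command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every BAM given has the same record count.
# Prints "file<TAB>count" for each file it checks.
same_record_count() {
    local first="" n f
    for f in "$@"; do
        n=$(samtools view -c "$f") || return 2   # samtools itself failed
        printf '%s\t%s\n' "$f" "$n"
        if [ -z "$first" ]; then
            first="$n"
        elif [ "$n" != "$first" ]; then
            return 1                              # counts differ
        fi
    done
    return 0
}

# Only run when the original file is present:
if [ -f LPF1_R1_MP.bam ]; then
    ls -lh LPF1_R1_MP.bam LPF1_R1_MP.sort.bam LPF1_R1_MP.name.sort.bam
    same_record_count LPF1_R1_MP.bam LPF1_R1_MP.sort.bam LPF1_R1_MP.name.sort.bam \
        && echo "record counts match"
fi
```

The ls -lh line shows the size difference, while the function confirms that the difference comes from compression and not from missing records.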