1. Introduction
Snippy is a tool for SNP detection. Its analysis yields core SNPs, which can be aligned to build a phylogenetic tree.
Snippy finds SNPs between a haploid reference genome and your NGS sequence reads. It will find both substitutions (snps) and insertions/deletions (indels). It will use as many CPUs as you can give it on a single computer (tested to 64 cores). It is designed with speed in mind, and produces a consistent set of output files in a single folder. It can then take a set of Snippy results using the same reference and generate a core SNP alignment (and ultimately a phylogenomic tree).
2. Installation
Snippy can be installed with conda:
conda install -c bioconda snippy
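Alternatively, install the latest version directly from GitHub (in several attempts, conda kept installing an old version that did not include snippy-multi):
cd $HOME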
git clone https://github.com/tseemann/snippy.git
$HOME/snippy/bin/snippy --help
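3. Running Snippy
Commonly used options are the output folder (--outdir), the reference genome (--ref, listed as --reference below), the input, which can be single-end (--se) or paired-end (--R1, --R2) FASTQ files, a FASTA file of contigs (--ctgs) or a BAM file (--bam), and the number of CPUs (--cpus, default 8):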
snippy [options] --outdir <dir> --ref <ref> --R1 <R1.fq.gz> --R2 <R2.fq.gz> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --se <R.fq.gz> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --ctgs <contigs.fa> --cpus 10
snippy [options] --outdir <dir> --ref <ref> --bam <reads.bam> --cpus 10
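The choice between these four input modes can be sketched as a small shell function. snippy_cmd and SNIPPY_CPUS are hypothetical names used only for illustration, not part of Snippy. The function only prints a snippy command line (it does not run it), choosing --R1/--R2 when two files are given, --bam for .bam files, --ctgs for common FASTA extensions and --se otherwise. It assumes paths contain no spaces.

```shell
# snippy_cmd ID REF INPUT [INPUT2]
# Print (not run) a snippy command for one sample. Hypothetical helper.
# SNIPPY_CPUS (hypothetical variable) overrides the CPU count, default 10.
snippy_cmd() {
  local id="$1" ref="$2" in1="$3" in2="${4:-}" inputs
  if [ -n "$in2" ]; then
    # Two files: treat them as a paired-end R1/R2 pair.
    inputs="--R1 $in1 --R2 $in2"
  else
    # One file: decide the mode from the file extension.
    case "$in1" in
      *.bam)                    inputs="--bam $in1" ;;
      *.fa|*.fas|*.fasta|*.fna) inputs="--ctgs $in1" ;;
      *)                        inputs="--se $in1" ;;
    esac
  fi
  printf 'snippy --outdir %s --ref %s %s --cpus %s\n' \
    "$id" "$ref" "$inputs" "${SNIPPY_CPUS:-10}"
}

# Usage:
# snippy_cmd a ref.fas a.fq.gz
# prints: snippy --outdir a --ref ref.fas --se a.fq.gz --cpus 10
```

Printing rather than executing keeps the sketch safe to try and lets the command be checked before it is pasted or piped to sh.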
The full list of options:
RESOURCES
--cpus N Maximum number of CPU cores to use (default '8')
--ram N Try and keep RAM under this many GB (default '8')
--tmpdir F Fast temporary storage eg. local SSD (default '/tmp')
INPUT
--reference F Reference genome. Supports FASTA, GenBank, EMBL (not GFF) (default '')
--R1 F Reads, paired-end R1 (left) (default '')
--R2 F Reads, paired-end R2 (right) (default '')
--se F Single-end reads (default '')
--ctgs F Don't have reads? Use these contigs (default '')
--peil F Reads, paired-end R1/R2 interleaved (default '')
--bam F Use this BAM file instead of aligning reads (default '')
--targets F Only call SNPs from this BED file (default '')
--subsample n.n Subsample FASTQ to this proportion (default '1')
OUTPUT
--outdir F Output folder (default '')
--prefix F Prefix for output files (default 'snps')
--report Produce report with visual alignment per variant (default OFF)
--cleanup Remove most files not needed for snippy-core (inc. BAMs!) (default OFF)
--rgid F Use this @RG ID: in the BAM header (default '')
--unmapped Keep unmapped reads in BAM and write FASTQ (default OFF)
PARAMETERS
--mapqual N Minimum read mapping quality to consider (default '60')
--basequal N Minimum base quality to consider (default '13')
--mincov N Minimum site depth for calling alleles (default '10')
--minfrac n.n Minimum proportion for variant evidence (0=AUTO) (default '0')
--minqual n.n Minimum QUALITY in VCF column 6 (default '100')
--maxsoft N Maximum soft clipping to allow (default '10')
--bwaopt F Extra BWA MEM options, eg. -x pacbio (default '')
--fbopt F Extra Freebayes options, eg. --theta 1E-6 --read-snp-limit 2 (default '')
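snippy-multi can also generate a shell script that runs Snippy on many samples in one batch. It takes the following inputs:
- A tab-separated file listing each sample ID and the absolute path to its reads, for example:
abc.txt:
a	/absolute/path/a.fq.gz
b	/absolute/path/b.fq.gz
c	/absolute/path/c.fq.gz
...
- The reference sequence, as either a FASTA or a GenBank (.gbk) file.
- The FASTQ (or FASTA) files to be analysed.
Generate the script, then run it through sh (a file created by redirection is not executable, so ./run_snp.sh on its own fails with "Permission denied"):
snippy-multi abc.txt --ref ref.fas --cpus 10 > run_snp.sh
nohup sh ./run_snp.sh &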
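Writing abc.txt by hand gets tedious with many samples. A minimal sketch that builds it from a folder of single-end reads, assuming each sample is a single file named <ID>.fq.gz; make_snippy_tab is a hypothetical helper, not part of Snippy. Paired-end samples would need a third column for the R2 file (ID, R1, R2, per the Snippy README).

```shell
# make_snippy_tab DIR
# Print one "ID<TAB>absolute_path" line for every *.fq.gz file in DIR,
# i.e. the single-end layout of abc.txt. Hypothetical helper.
make_snippy_tab() {
  local dir abs f id
  dir="${1:-.}"
  abs="$(cd "$dir" && pwd)" || return 1   # absolute path of DIR
  for f in "$abs"/*.fq.gz; do
    [ -e "$f" ] || continue               # no matches: glob stays literal
    id="$(basename "$f" .fq.gz)"          # /x/a.fq.gz -> a
    printf '%s\t%s\n' "$id" "$f"
  done
}

# Usage (the directory name is only an example):
# make_snippy_tab /data/reads > abc.txt
```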
For example, more run_snp.sh shows:
snippy --outdir a --ref ref.fas --se a.fq.gz --cpus 10
snippy --outdir b --ref ref.fas --se b.fq.gz --cpus 10
snippy --outdir c --ref ref.fas --se c.fq.gz --cpus 10
...
snippy-core --ref 'a/ref.fa' a b c ...
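The generated run_snp.sh processes the samples one at a time. On a well-resourced server you can edit it to run several snippy commands at once (for example, launch each snippy line with nohup and a trailing &), and then run the snippy-core command separately once every snippy run has finished.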
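Instead of editing run_snp.sh by hand, the per-sample lines can be run a few at a time, with snippy-core started only after all of them have finished. A sketch, assuming the script layout shown above (one "snippy ..." line per sample followed by a "snippy-core ..." line); run_parallel_then_core is a hypothetical helper name, and it relies on xargs -0/-P/-r as provided by GNU xargs. Keep the number of parallel jobs multiplied by --cpus at or below the number of cores on the server.

```shell
# run_parallel_then_core SCRIPT [JOBS]
# Run the per-sample "snippy ..." lines of a snippy-multi script JOBS at a
# time, then run the "snippy-core ..." line only if all of them succeeded.
run_parallel_then_core() {
  local script="$1" jobs="${2:-2}"

  # '^snippy ' (with the trailing space) matches per-sample lines but not
  # snippy-core. NUL-terminating the lines makes xargs pass each one
  # verbatim to its own "sh -c"; -r runs nothing if there are no lines.
  grep '^snippy ' "$script" | tr '\n' '\0' |
    xargs -0 -r -n 1 -P "$jobs" sh -c || return 1

  # xargs returns only after every job has exited, so all per-sample
  # results exist by the time snippy-core starts.
  grep '^snippy-core ' "$script" | sh
}

# Usage, 3 samples at a time (each line in run_snp.sh already has --cpus 10):
# run_parallel_then_core run_snp.sh 3
```

If any sample fails, xargs returns a non-zero status and the function stops before snippy-core, which avoids building a core alignment from an incomplete set of samples.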
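Once all of the above has finished, build the phylogenetic tree from the snippy-core output. Each step reads the file written by the previous one, so run them one after another rather than launching them all in the background at once:
snippy-clean_full_aln core.full.aln > clean.full.aln
run_gubbins.py -p gubbins clean.full.aln # if this reports an error, try adding --filter_percentage 50
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln
FastTree -gtr -nt clean.core.aln > clean.core.tree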
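To launch the whole tree-building chain in the background with a single nohup, the four steps can be written into one script that stops at the first failing step. A sketch; build_tree.sh is a hypothetical file name, and the script contains the same four commands as above, run one after another.

```shell
# Write the tree-building steps into one script. With "set -e" the script
# stops at the first failing command instead of running the next step on
# a missing or incomplete file.
cat > build_tree.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
snippy-clean_full_aln core.full.aln > clean.full.aln
run_gubbins.py -p gubbins clean.full.aln
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > clean.core.aln
FastTree -gtr -nt clean.core.aln > clean.core.tree
EOF
chmod +x build_tree.sh

# Then, in the directory that contains core.full.aln:
# nohup ./build_tree.sh > build_tree.log 2>&1 &
```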
snippy, snippy-core, snippy-multi and snippy-clean_full_aln are in ~/snippy/bin/, and snp-sites is in ~/snippy/binaries/linux/. run_gubbins.py comes from Gubbins, which must be installed separately (conda install -c bioconda gubbins); if FastTree cannot be found, install it separately as well.
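To call these tools by name rather than by full path, the two directories can be added to PATH, and a short loop can report which of the commands used above are still missing. A sketch assuming the GitHub checkout sits in $HOME/snippy as in the installation section; check_tools is a hypothetical helper that relies only on the shell's command -v. Recent Snippy versions also provide snippy --check for checking Snippy's own dependencies.

```shell
# Make the Snippy scripts and its bundled Linux binaries (e.g. snp-sites)
# callable by name in this shell session. Assumes the clone is in
# $HOME/snippy; append this line to ~/.bashrc to make it permanent.
export PATH="$HOME/snippy/bin:$HOME/snippy/binaries/linux:$PATH"

# check_tools NAME...
# Print each command that is not found on PATH; return 0 only if all exist.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage:
# check_tools snippy snippy-multi snippy-core snippy-clean_full_aln \
#             snp-sites run_gubbins.py FastTree
```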