Verkko is a new tool for telomere-to-telomere (T2T) genome assembly.
Rautiainen, M., Nurk, S., Walenz, B.P. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol (2023)
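As shown in the figure above, the key components of the pipeline are Canu, MBG, GraphAligner, and Rukki. Integrating them lets Verkko process long-read (third-generation) sequencing data automatically and produce a highly contiguous, highly accurate, haplotype-resolved genome. With high-quality long-read input, the assembly can reach T2T level.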
# Installation; the environment requires python=3.7
conda install -c conda-forge -c bioconda -c defaults verkko
# Run; this example uses HiFi sequencing data alone
verkko -d /home/verkko_assemb --hifi hifiseq_data.fasta --no-nano --threads 30
## If ONT reads are available in addition to PacBio HiFi reads, drop --no-nano and pass the ONT files with --nano
# Verkko parameters
MANDATORY PARAMETERS:
-d <output-directory> Directory to use for verkko intermediate and final results.
Will be created if needed.
--hifi <files ...> List of files containing PacBio HiFi reads.
--nano <files ...> List of files containing Oxford Nanopore reads.
Input reads can be any combination of FASTA/FASTQ,
uncompressed or gzip/bzip2/xz compressed. Any
number of files can be supplied; *.gz works.
ALGORITHM PARAMETERS:
--no-correction Do not perform Canu correction on the HiFi reads.
--no-nano Assemble without ONT data.
--hap-kmers h1 h2 type Use rukki to assign paths to haplotypes. 'h1' and 'h2'
must be Meryl databases of homopolymer-compressed parental
kmers. 'type' must be 'trio', 'hic' or 'strandseq'.
--base-k
--max-k
--window
--threads
--split-bases
--split-reads
--min-ont-length
--correct-k-mer-size
--correct-mer-threshold
--correct-min-read-length
--correct-min-overlap-length
--correct-hash-bits
--seed-min-length
--seed-max-length
--align-bandwidth
--score-fraction
--min-identity
--min-score
--end-clipping
--incompatible-cutoff
--max-trace
COMPUTATIONAL PARAMETERS:
--python <interpreter> Path or name of a python interpreter. Default: 'python'.
--mbg <path> Path to MBG.
--graphaligner <path> Path to GraphAligner.
Default for both is the one packaged with verkko.
--cleanup Remove intermediate results.
--no-cleanup Retain intermediate results (default).
--local Run on the local machine (default).
--local-memory Specify the upper limit on memory to use, in GB, default 64
--local-cpus Specify the number of CPUs to use, default 'all'
--sge Enable Sun Grid Engine support.
--slurm Enable Slurm support.
--lsf Enable IBM Spectrum LSF support.
--snakeopts <string> Append snakemake options in "string" to the
snakemake command. Options MUST be quoted.
Set resource limits for various stages with the options below.
Format: number-of-cpus memory-in-gb time-in-hours
Example: --cns-run 8 32 2
--sto-run
--mer-run
--ovb-run
--ovs-run
--red-run
--mbg-run
--utg-run
--spl-run
--ali-run
--pop-run
--utp-run
--lay-run
--sub-run
--par-run
--cns-run```
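
The options listed above can be combined into a single invocation. The following is a minimal sketch rather than an official recipe: it assembles the command for a combined HiFi + ONT run submitted through Slurm and only prints it, so it can be checked before anything is launched. The ONT file name `ont_data.fastq.gz` and the resource values are placeholders assumed for illustration; every flag is taken from the help text above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder inputs: hifiseq_data.fasta comes from the example above,
# ont_data.fastq.gz is a hypothetical ONT read file.
outdir="/home/verkko_assemb"
hifi_reads="hifiseq_data.fasta"
ont_reads="ont_data.fastq.gz"

# Build the command as an array so each argument stays a separate word.
# --cns-run takes: number-of-cpus memory-in-gb time-in-hours
cmd=(verkko -d "$outdir"
     --hifi "$hifi_reads"
     --nano "$ont_reads"
     --slurm
     --cns-run 8 32 2
     --cleanup)

# Dry run: print the command instead of executing it.
verkko_cmd="${cmd[*]}"
echo "$verkko_cmd"
```

Once the printed command looks right, replacing the final `echo` with `"${cmd[@]}"` would execute it. Omitting `--slurm` falls back to the default local mode, where `--local-memory` and `--local-cpus` cap resource use.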