Separating eukaryotic and prokaryotic sequences in contigs with a single command

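This post introduces EukRep, a small classification tool for metagenomics. A single command sorts the contigs produced by an upstream assembly into prokaryotic and eukaryotic sequences.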

GitHub: https://github.com/patrickwest/EukRep


Installation

Install directly with Conda (Python 3 environment)

conda create -y -n eukrep-env -c bioconda scikit-learn==0.19.2 eukrep

Note that EukRep depends on scikit-learn, the Python machine-learning library. The command above pins it to version 0.19.2.
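Because the Conda command pins an exact scikit-learn version, it can be worth confirming that the active environment really has that version before running EukRep. The snippet below is a minimal sketch that reads `sklearn.__version__` and compares it to 0.19.2. The helper name `check_sklearn` is hypothetical and is not part of EukRep.

```python
# Hypothetical helper (not part of EukRep): check that the active environment
# has the scikit-learn version pinned in the conda command above.
REQUIRED_SKLEARN = "0.19.2"


def check_sklearn(required=REQUIRED_SKLEARN):
    """Return (installed_version, matches_required); version is None if missing."""
    try:
        import sklearn
    except ImportError:
        return None, False
    return sklearn.__version__, sklearn.__version__ == required


if __name__ == "__main__":
    installed, ok = check_sklearn()
    if installed is None:
        print("scikit-learn is not installed in this environment")
    elif ok:
        print("scikit-learn %s matches the pinned version" % installed)
    else:
        print("warning: scikit-learn %s, expected %s" % (installed, REQUIRED_SKLEARN))
```

Run it inside `eukrep-env` (for example, after `conda activate eukrep-env`).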

Install with pip

pip install EukRep

Usage

EukRep -h


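Only a few parameters are commonly used:

-i: input FASTA file
-o: output file
--min: minimum sequence length for a sequence to be classified; default 3 kb
--model: path to the trained linear SVM model used for prediction
--seq_names: output only the IDs of the identified sequences instead of the full sequences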
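To get a feel for how much of an assembly the default 3 kb `--min` cutoff keeps, you can count contigs on either side of it before running EukRep. The code below is a small illustrative sketch, not EukRep's own code. It includes a minimal FASTA reader and takes the first word of each header as the sequence ID. Treating the boundary as inclusive (length >= cutoff) is an assumption.

```python
# Illustrative only (not EukRep code): split contig IDs by the --min length
# cutoff. Treating the cutoff as inclusive (>=) is an assumption.


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(path, min_len=3000):
    """Return (long_ids, short_ids); the ID is the first word of the header."""
    long_ids, short_ids = [], []
    for header, seq in read_fasta(path):
        words = header.split()
        seq_id = words[0] if words else header
        (long_ids if len(seq) >= min_len else short_ids).append(seq_id)
    return long_ids, short_ids
```

For example, `split_by_length("contigs.fa")` (with a hypothetical file name) shows which contigs fall below the default cutoff.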

By default, the file given with -o receives the sequences predicted to be eukaryotic:

EukRep -i <Sequences in Fasta format> -o <Eukaryote sequence output file>

Adding --prokarya also writes the sequences predicted to be prokaryotic to a separate file:

EukRep -i <Sequences in Fasta format> -o <Eukaryote sequence output file> --prokarya <Prokaryote sequence output file>
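If EukRep is one step in a larger Python script, you can assemble the same command as an argument list and run it with `subprocess`. This is a hedged sketch. It uses only the flags described above (`-i`, `-o`, `--prokarya`, `--min`, `--seq_names`), and the helper names `eukrep_command` and `run_eukrep` are hypothetical.

```python
# Hypothetical wrapper: builds the EukRep command line from the flags
# described above and runs it only if the EukRep executable is on PATH.
import shutil
import subprocess


def eukrep_command(fasta, euk_out, prok_out=None, min_len=None, seq_names=False):
    """Return the EukRep argument list for the given options."""
    cmd = ["EukRep", "-i", fasta, "-o", euk_out]
    if prok_out is not None:
        cmd += ["--prokarya", prok_out]  # also write predicted prokaryotic sequences
    if min_len is not None:
        cmd += ["--min", str(min_len)]   # override the 3 kb default cutoff
    if seq_names:
        cmd.append("--seq_names")        # write sequence IDs instead of sequences
    return cmd


def run_eukrep(*args, **kwargs):
    """Run EukRep, failing early with a clear message if it is not installed."""
    if shutil.which("EukRep") is None:
        raise RuntimeError("EukRep not found on PATH; activate eukrep-env first")
    subprocess.run(eukrep_command(*args, **kwargs), check=True)
```

Passing a list instead of a shell string avoids quoting problems with file paths. `check=True` raises an error if EukRep exits with a non-zero status.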

Obtaining eukaryotic bins

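EukRep is meant to be used as one step in a downstream binning pipeline, where it helps produce high-quality predicted eukaryotic sequences or bins. For details, see the Methods section of "Genome-reconstruction for eukaryotes from complex natural microbial communities" (West et al.): https://doi.org/10.1101/171355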
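One simple way to combine EukRep predictions with bins you already have is to measure how much of each bin's total length lies on contigs predicted to be eukaryotic. The rule below is a hypothetical simplification: it is not the procedure from West et al. (see their Methods for that), and the 0.5 threshold is an arbitrary assumption. `euk_ids` could come from a `--seq_names` run, but check the exact format of that output file before parsing it.

```python
# Hypothetical rule (not the West et al. method): call a bin eukaryotic when
# more than `threshold` of its total length lies on EukRep-eukaryotic contigs.
from collections import defaultdict


def label_bins(contig_to_bin, contig_lengths, euk_ids, threshold=0.5):
    """Return {bin_id: (label, eukaryotic_length_fraction)}."""
    total = defaultdict(int)
    euk = defaultdict(int)
    for contig, bin_id in contig_to_bin.items():
        length = contig_lengths[contig]
        total[bin_id] += length
        if contig in euk_ids:
            euk[bin_id] += length
    labels = {}
    for bin_id, bin_len in total.items():
        frac = euk[bin_id] / bin_len if bin_len else 0.0
        labels[bin_id] = ("eukaryotic" if frac > threshold else "prokaryotic", frac)
    return labels
```

Weighting by length instead of counting contigs prevents a handful of short contigs from dominating the label.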

The author also provides an example workflow at https://github.com/patrickwest/EukRep_Pipeline for anyone who wants to try it.
