Installing and Using PartitionFinder2 on Linux 2021-07-23

PartitionFinder official website:
http://www.robertlanfear.com/partitionfinder/

Click DOWNLOAD on that page to reach the GitHub repository (PartitionFinder).

1. Download:

1) Download the release locally, then upload it to the server.

2) Download directly on the server

You can fetch the code with either git clone or wget:

git clone https://github.com/brettc/partitionfinder.git
wget https://codeload.github.com/brettc/partitionfinder/tar.gz/refs/tags/v2.1.1
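
The wget download is a gzipped tarball that still has to be unpacked. As a minimal sketch, assuming the archive was saved locally (with plain wget the file name is typically the last URL component, v2.1.1), it could be extracted with Python's standard tarfile module. The function name extract_archive and the destination directory are illustrative, not part of PartitionFinder.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return its top-level paths."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Collect the top-level entries, e.g. 'partitionfinder-2.1.1'.
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
        tar.extractall(dest_dir)
    return [os.path.join(dest_dir, name) for name in top_level]
```

Calling extract_archive("v2.1.1", "/apps") would be one way to end up with a folder such as /apps/partitionfinder-2.1.1, the location used later in this post.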

2. Environment setup

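PartitionFinder needs a Python 2 environment with its dependencies installed before use; it does not run under Python 3.
Python 3 is the default almost everywhere now, so I created a separate environment just for it.

# Create a Python 2.7 environment
conda create -n partitionfinder python=2.7
# Activate the environment
source activate partitionfinder
# Install the dependencies
conda install numpy pandas pyparsing scipy
pip install -U scikit-learn
pip install tables
# Reference: //www.greatytc.com/p/855bda1fb2c3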
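
Before running PartitionFinder itself, it can help to confirm that the dependencies really import inside the new environment. The following is a small sketch of such a check; the helper missing_modules and the list REQUIRED_MODULES are names made up for this example, and the list simply mirrors the packages installed above (scikit-learn imports as sklearn).

```python
import importlib

# Import names for the packages installed above; scikit-learn is imported
# as 'sklearn', and the pip package 'tables' is imported as 'tables'.
REQUIRED_MODULES = ["numpy", "pandas", "pyparsing", "scipy", "sklearn", "tables"]


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Running print(missing_modules(REQUIRED_MODULES)) with the partitionfinder environment activated should print an empty list if the installation succeeded.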

3. Test run

After activating the environment, check that the help text prints.

source activate partitionfinder
python PartitionFinder.py -h

(partitionfinder) animal1@animalia:/apps/partitionfinder-2.1.1$ python PartitionFinder.py -h
INFO     | 2021-07-23 16:45:25,998 | Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not s
INFO     | 2021-07-23 16:45:25,998 | NumExpr defaulting to 8 threads.
Usage: python PartitionFinder.py [options] <foldername>

    PartitionFinder and PartitionFinderProtein are designed to discover optimal
    partitioning schemes for nucleotide and amino acid sequence alignments.
    They are also useful for finding the best model of sequence evolution for datasets.

    The Input: <foldername>: the full path to a folder containing:
        - A configuration file (partition_finder.cfg)
        - A nucleotide/aa alignment in Phylip format
    Take a look at the included 'example' folder for more details.

    The Output: A file in the same directory as the .cfg file, named
    'analysis' This file contains information on the best
    partitioning scheme, and the best model for each partiiton

    Usage Examples:
        >python PartitionFinder.py example
        Analyse what is in the 'example' sub-folder in the current folder.

        >python PartitionFinder.py -v example
        Analyse what is in the 'example' sub-folder in the current folder, but
        show all the debug output

        >python PartitionFinder.py -c ~/data/frogs
        Check the configuration files in the folder data/frogs in the current
        user's home folder.

        >python PartitionFinder.py --force-restart ~/data/frogs
        Deletes any data produced by the previous runs (which is in
        ~/data/frogs/output) and starts afresh


Options:
  -h, --help            show this help message and exit
  -v, --verbose         show debug logging information (equivalent to --debug-
                        out=all)
  -c, --check-only      just check the configuration files, don't do any
                        processing
  -f, --force-restart   delete all previous output and start afresh (!)
  -p N, --processes=N   Number of concurrent processes to use. Use -1 to match
                        the number of cpus on the machine. The default is to
                        use -1.
  --show-python-exceptions
                        If errors occur, print the python exceptions
  --save-phylofiles     save all of the phyml or raxml output. This can take a
                        lot of space(!)
  --dump-results        Dump all results to a binary file. This is only of use
                        for testing purposes.
  --compare-results     Compare the results to previously dumped binary
                        results. This is only of use for testing purposes.
  -q, --quick           Avoid anything slow (like writing schemes at each
                        step),useful for very large datasets.
  -r, --raxml           Use RAxML (rather than PhyML) to do the analysis. See
                        the manual
  -n, --no-ml-tree      Estimate a starting tree with NJ (PhyML) or MP (RaxML)
                        instead of the default which is to estimate a starting
                        tree with ML  using in RAxML. Not recommended.
  --cmdline-extras=N    Add additional commands to the phyml or raxml
                        commandlines that PF uses.This can be useful e.g. if
                        you want to change the accuracy of lnL calculations
                        ('-e' option in raxml), or use multi-threaded versions
                        of raxml that require you to specify the number of
                        threads you will let raxml use ('-T' option in raxml.
                        E.g. you might specify this: --cmndline_extras ' -e
                        2.0 -T 10 ' N.B. MAKE SURE YOU PUT YOUR EXTRAS IN
                        QUOTES, and only use this command if you really know
                        what you're doing and are very familiar with raxml and
                        PartitionFinder
  --weights=N           Mainly for algorithm development. Only use it if you
                        know what you're doing.A list of weights to use in the
                        clustering algorithms. This list allows you to assign
                        different weights to: the overall rate for a subset,
                        the base/amino acid frequencies, model parameters, and
                        alpha value. This will affect how subsets are
                        clustered together. For instance: --cluster_weights
                        '1, 2, 5, 1', would weight the base freqeuncies 2x
                        more than the overall rate, the model parameters 5x
                        more, and the alpha parameter the same as the model
                        rate
  --kmeans=type         This defines which sitewise values to use: entropy or
                        tiger  --kmeans entropy: use entropies for sitewise
                        values --kmeans tiger: use TIGER rates for sitewise
                        values (only valid for Morphology)
  --rcluster-percent=N  This defines the proportion of possible schemes that
                        the relaxed clustering algorithm will consider before
                        it stops looking. The default is 10%. e.g. --rcluster-
                        percent 10.0
  --rcluster-max=N      This defines the number of possible schemes that the
                        relaxed clustering algorithm will consider before it
                        stops looking. The default is to look at the larger
                        value out of 1000, and 10 times the number of data
                        blocks you have. e.g. --rcluster-max 1000
  --min-subset-size=N   This defines the minimum subset size that the kmeans
                        and rcluster algorithm will accept. Subsets smaller
                        than this  will be merged at with other subsets at the
                        end of the algorithm (for kmeans) or at the start of
                        the algorithm (for rcluster). See manual for details.
                        The default value for kmeans is 100. The default value
                        for rcluster is to ignore this option. e.g. --min-
                        subset-size 100
  --debug-output=REGION,REGION,...
                        (advanced option) Provide a list of debug regions to
                        output extra information about what the program is
                        doing. Possible regions are 'all' or any of {subset,su
                        bset_ops,raxml,parser,model_util,results,entropy,numex
                        pr,alignment,concurrent.futures,threadpool,numexpr.uti
                        ls,progress,main,config,reporter,kmeans,util,concurren
                        t,morph_tige,analysis_m,neighbour,scheme,submodels,dat
                        abase,analysis,phyml,raxml_mode,model_load,phyml_mode,
                        sklearn}.
  --all-states          In the kmeans and rcluster algorithms, this stipulates
                        that PartitionFinder should not produce subsets that
                        do not have all possible states present. E.g. for DNA
                        sequence data, all subsets in the final scheme must
                        have A, C, T,  and G nucleotides present. This can
                        occasionally be useful for downstream  analyses,
                        particularly concerning amino acid datasets.
  --profile             Output profiling information after running (this will
                        slow everything down!)

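4. Usage

1) Prepare the alignment file and the configuration file

Put the .phy file and the .cfg file together in one folder.
The .phy file holds the sequence alignment; the .cfg file holds the configuration.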
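
A malformed alignment is a common reason for a run to fail early, so a quick sanity check of the .phy file can save time. The sketch below assumes the simplest layout, one taxon per line with the name and sequence separated by whitespace, which may not match every PHYLIP variant; check_phylip is a hypothetical helper, not a PartitionFinder function.

```python
def check_phylip(text):
    """Validate a one-line-per-taxon PHYLIP alignment; return (n_taxa, n_sites)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty alignment")
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("header must be '<n_taxa> <n_sites>'")
    n_taxa, n_sites = int(header[0]), int(header[1])
    records = lines[1:]
    if len(records) != n_taxa:
        raise ValueError("header says %d taxa, found %d" % (n_taxa, len(records)))
    for record in records:
        parts = record.split()
        # The sequence may be split into space-separated chunks, so join
        # everything after the taxon name.
        name, seq = parts[0], "".join(parts[1:])
        if len(seq) != n_sites:
            raise ValueError("%s has %d sites, expected %d" % (name, len(seq), n_sites))
    return n_taxa, n_sites
```

For example, check_phylip(open("Acan.phy").read()) would return the number of taxa and sites when the header and every sequence agree, and raise ValueError otherwise.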

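Inside partition_finder.cfg, the parts you normally need to change are the alignment file name and the data blocks (partitions). The other settings can be worked out by trial once and then reused.
Reference: https://bin-ye.com/post/2019/10/19/%E5%A5%BD%E5%A5%BD%E5%85%88%E7%94%9F-mrbayes-%E6%93%8D%E4%BD%9C%E8%AF%B4%E6%98%8E/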

## ALIGNMENT FILE (the sequence alignment) ##
alignment = Acan.phy;

## BRANCHLENGTHS: linked | unlinked (typical) ##
branchlengths = unlinked;

## MODELS OF EVOLUTION: all | allx | mrbayes | beast | gamma | gammai | <list> ##
models = mrbayes;

# MODEL SELECTION: AIC | AICc | BIC #
model_selection = bic;

## DATA BLOCKS: see manual for how to define (the partitions) ##
[data_blocks]
atp6 = 1-107;
cox1 = 108-566;
cox2 = 567-731;
cox3 = 732-870;
cytb = 871-1182;
nad1 = 1183-1382;
nad2 = 1383-1523;
nad3 = 1524-1574;
nad4L = 1575-1599;
nad4 = 1600-1818;
nad5 = 1819-2055;
nad6 = 2056-2087;

## SCHEMES, search: all | user | greedy | rcluster | rclusterf | kmeans ##
[schemes]
search = greedy;
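
Typing the data-block ranges by hand is error-prone, because every block has to start exactly one position after the previous one ends. As an illustration, the ranges can be derived from the gene lengths in a concatenated alignment. In this sketch the function data_blocks and the GENES list are names invented here; the lengths are computed from the ranges in the configuration above (end - start + 1).

```python
def data_blocks(gene_lengths):
    """Turn [(name, length), ...] into 'name = start-end;' lines (1-based, contiguous)."""
    lines = []
    start = 1
    for name, length in gene_lengths:
        if length <= 0:
            raise ValueError("length of %s must be positive" % name)
        end = start + length - 1
        lines.append("%s = %d-%d;" % (name, start, end))
        start = end + 1  # the next block begins right after this one
    return lines


# Gene order and lengths as implied by the configuration above.
GENES = [("atp6", 107), ("cox1", 459), ("cox2", 165), ("cox3", 139),
         ("cytb", 312), ("nad1", 200), ("nad2", 141), ("nad3", 51),
         ("nad4L", 25), ("nad4", 219), ("nad5", 237), ("nad6", 32)]

print("\n".join(data_blocks(GENES)))
```

The printed lines can be pasted under [data_blocks]; the last end position should equal the alignment length given in the .phy header.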

2) Run

Place the alignment file and the configuration file in the same folder:

(partitionfinder) animal1@animalia:~/Documents/20210723_MB/PartitionFinder$ l
Acan.phy  partition_finder.cfg
(partitionfinder) animal1@animalia:~/Documents/20210723_MB/PartitionFinder$ cd ../
(partitionfinder) animal1@animalia:~/Documents/20210723_MB$ l
PartitionFinder/

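How to run:
python <path to PartitionFinder>/PartitionFinder.py <folder containing the alignment and configuration files>
Note!
Use PartitionFinderProtein.py for amino acid sequences.
Use PartitionFinder.py for nucleotide sequences.
Here I am analysing amino acid sequences.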

python /apps/partitionfinder-2.1.1/PartitionFinderProtein.py full_path/PartitionFinder
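
When the same analysis is repeated for several datasets, the command can be assembled in a small wrapper. This is only a sketch: build_command and run_partitionfinder are hypothetical helper names, the -p and --force-restart flags are the ones listed in the help output above, and the python argument is assumed to resolve to the Python 2.7 interpreter of the activated environment.

```python
import subprocess


def build_command(pf_script, folder, processes=None, force_restart=False,
                  python="python"):
    """Assemble the argument list for one PartitionFinder run."""
    cmd = [python, pf_script]
    if processes is not None:
        cmd += ["-p", str(processes)]  # -p N: number of concurrent processes
    if force_restart:
        cmd.append("--force-restart")  # delete previous output and start afresh
    cmd.append(folder)
    return cmd


def run_partitionfinder(pf_script, folder, **options):
    """Run PartitionFinder and return the exit code of the process."""
    return subprocess.call(build_command(pf_script, folder, **options))
```

For instance, run_partitionfinder("/apps/partitionfinder-2.1.1/PartitionFinderProtein.py", "full_path/PartitionFinder", processes=8) would launch the same run as the command above, limited to 8 processes.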

The main result to use is the file analysis/schemes/start_scheme.txt, which lists the model suited to each partition for MrBayes.
