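Some say that the answer to every question in the life sciences should be sought in evolution; others say that a life scientist who does not understand evolution is just bluffing.
A phylogenetic tree, or evolutionary tree, is a tree diagram commonly used to represent the genealogical relationships among species. At the molecular level, how closely two species are related is usually measured by the differences between their DNA (or protein) sequences. There are many algorithms for building such trees; see the article listed in the references for details.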
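I. Distance-based methods
The core principle of distance-based methods is to build the tree from the genetic distances between sequences. The distance matrix is usually computed under assumptions about the rates at which nucleotides are substituted after two species diverge, and the relationships among species are inferred from it. The drawback is that the true genetic distances are unknown, so whatever estimator is used introduces noise.
The most commonly used algorithms of this kind are the unweighted pair group method with arithmetic means (UPGMA) and neighbor joining (NJ).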
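To make "genetic distance" concrete, here is a minimal sketch of how a distance matrix might be estimated from aligned sequences. It assumes the Jukes-Cantor (JC69) correction, which is only one of many possible models and is not named in the text above; the function names and the toy sequences are hypothetical.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.
    Positions where either sequence has a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc69_distance(a, b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # sequences are saturated; the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs):
    """Map each unordered pair of names to its JC69 distance."""
    return {frozenset((x, y)): jc69_distance(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}

# Toy alignment (made-up sequences, for illustration only)
toy = {"x": "ACGTACGT", "y": "ACGTACGA", "z": "ACGAACGA"}
toy_matrix = distance_matrix(toy)
```

The corrected distance is always at least as large as the raw proportion of differences, because it tries to account for sites that changed more than once. It is still an estimate, which is where the noise mentioned above comes from.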
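1. Cluster analysis: UPGMA
UPGMA assumes a constant substitution rate that does not vary over time or across lineages (a molecular clock). Because this assumption is so restrictive, the method is now rarely used. It first finds the two species with the smallest genetic distance (A and B) and clusters them into a new operational taxonomic unit (OTU), AB. It then recomputes the distance from AB to every remaining OTU as an arithmetic mean, merges the closest pair again, and repeats this process until only two OTUs remain.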
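The clustering loop described above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the input format (a dict keyed by `frozenset` pairs), the nested-tuple output and the example distances are all assumptions made here. The loop also joins the final two OTUs so that a single rooted tree is returned.

```python
from itertools import combinations

def upgma(names, dist):
    """names: list of leaf labels; dist: dict frozenset({a, b}) -> distance.
    Returns (tree, root_height), where tree is a nested tuple of labels."""
    d = dict(dist)
    size = {n: 1 for n in names}       # number of leaves in each cluster
    height = {n: 0.0 for n in names}   # distance from each node down to its leaves
    active = list(names)
    while len(active) > 1:
        # Pick the closest pair of current clusters
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        height[new] = d[frozenset((a, b))] / 2.0
        size[new] = size[a] + size[b]
        active = [c for c in active if c not in (a, b)]
        # Distance to the merged cluster is the size-weighted mean, i.e. the
        # arithmetic mean over all leaf pairs
        for c in active:
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        active.append(new)
    root = active[0]
    return root, height[root]

# Made-up ultrametric distances, for illustration only
example_names = ["A", "B", "C", "D"]
example_dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 4.0, frozenset(("B", "C")): 4.0,
    frozenset(("A", "D")): 6.0, frozenset(("B", "D")): 6.0,
    frozenset(("C", "D")): 6.0,
}
example_tree, example_height = upgma(example_names, example_dist)
```

Because every leaf ends up at the same depth below the root, the output is only trustworthy when the constant-rate assumption roughly holds.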
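2. Neighbor joining
This method is very fast, so it is widely used on large datasets and is one of the most common ways to build a tree.
The algorithm starts from a totally unresolved (star-shaped) tree and compares all pairs of sequences. The pair whose joining minimizes the total branch length of the tree is merged into one OTU, which gives a partially resolved tree; this continues until only three OTUs remain. BIONJ, FastME and Neighbor-Net are all modified versions of NJ.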
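A minimal sketch of this procedure is shown below. It assumes the standard pair-selection criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from i to all other OTUs. The input format, the `node0`, `node1`, ... labels for internal nodes and the example distances are hypothetical choices made for illustration.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """names: list of at least three labels; dist: dict frozenset({a, b}) -> distance.
    Returns the unrooted tree as a list of (internal_node, neighbor, branch_length)."""
    d = dict(dist)

    def get(x, y):
        return d[frozenset((x, y))]

    active = list(names)
    edges = []
    counter = 0
    while len(active) > 3:
        n = len(active)
        r = {x: sum(get(x, y) for y in active if y != x) for x in active}
        # Choose the pair that minimizes the Q criterion
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * get(*p) - r[p[0]] - r[p[1]])
        u = f"node{counter}"
        counter += 1
        # Branch lengths from the new internal node to the joined pair
        li = get(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = get(i, j) - li
        edges += [(u, i, li), (u, j, lj)]
        rest = [k for k in active if k not in (i, j)]
        for k in rest:
            d[frozenset((u, k))] = (get(i, k) + get(j, k) - get(i, j)) / 2
        active = rest + [u]
    # Three OTUs left: connect them to one central node
    a, b, c = active
    u = f"node{counter}"
    edges += [
        (u, a, (get(a, b) + get(a, c) - get(b, c)) / 2),
        (u, b, (get(a, b) + get(b, c) - get(a, c)) / 2),
        (u, c, (get(a, c) + get(b, c) - get(a, b)) / 2),
    ]
    return edges

# Distances read off a made-up tree with leaf branches A=1, B=2, C=3, D=4
# and an internal branch of length 1 separating (A, B) from (C, D)
nj_names = ["A", "B", "C", "D"]
nj_dist = {
    frozenset(("A", "B")): 3.0, frozenset(("A", "C")): 5.0,
    frozenset(("A", "D")): 6.0, frozenset(("B", "C")): 6.0,
    frozenset(("B", "D")): 7.0, frozenset(("C", "D")): 7.0,
}
nj_edges = neighbor_joining(nj_names, nj_dist)
```

Unlike UPGMA, the branch lengths on either side of a join are allowed to differ, so no constant substitution rate is assumed.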
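II. Maximum Parsimony
Maximum parsimony is a popular principle for building molecular phylogenies. The goal is to find the most parsimonious topology that explains the observed distribution of character states, that is, the topology that implies the fewest change events, such as nucleotide substitutions. For this reason, most of the most parsimonious trees are considered to reflect the genealogical relationships fairly accurately.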
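Counting the change events that a given topology implies can be sketched with Fitch's algorithm, which is one common way to score a tree under parsimony and is an assumption here, since the text above does not name a specific algorithm. The tree encoding (nested 2-tuples of leaf names) and the toy alignment are hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.
    tree: a leaf name or a 2-tuple of subtrees; states: dict leaf name -> state."""
    changes = 0

    def visit(node):
        nonlocal changes
        if not isinstance(node, tuple):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common       # children agree: no change needed here
        changes += 1            # children disagree: one change is implied
        return left | right

    visit(tree)
    return changes

def parsimony_score(tree, alignment):
    """Sum of Fitch scores over all columns of an alignment (dict name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[site] for name, seq in alignment.items()})
        for site in range(length)
    )

# Made-up alignment and two candidate topologies, for illustration only
mp_alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
score_ab_cd = parsimony_score(tree_ab_cd, mp_alignment)
score_ac_bd = parsimony_score(tree_ac_bd, mp_alignment)
```

Here the topology grouping A with B needs fewer changes than the one grouping A with C, so it is the more parsimonious of the two. A full analysis would also have to search over topologies, which this sketch leaves out.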
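III. Probabilistic Methods of Phylogenetic Inference
Broadly speaking, these methods use maximum likelihood to build the tree.
Among all possible trees, they search for the one under which the observed data are most probable.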
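As a rough illustration of what "the probability of the data given a tree" means, the sketch below computes the log-likelihood of an alignment on a fixed tree with fixed branch lengths, using Felsenstein's pruning algorithm. The JC69 substitution model, the tree encoding and the example values are all assumptions made here. Real programs additionally optimize branch lengths and model parameters and search over topologies.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """P(state j after branch length t | state i) under JC69,
    with t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, site_states):
    """Conditional likelihoods: P(data below node | node has state x), for each x.
    node is a leaf name, or a tuple of (child, branch_length) pairs."""
    if isinstance(node, str):
        return {x: 1.0 if x == site_states[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, t in node:
        below = partial(child, site_states)
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, t) * below[y] for y in BASES)
    return result

def log_likelihood(tree, alignment):
    """Log-likelihood of an alignment (dict name -> sequence), sites independent."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for site in range(length):
        states = {name: seq[site] for name, seq in alignment.items()}
        cond = partial(tree, states)
        # Uniform base frequencies (1/4 each) at the root under JC69
        total += math.log(sum(0.25 * cond[x] for x in BASES))
    return total

# Made-up alignment and two candidate trees with arbitrary branch lengths
ml_alignment = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "ACT"}
ab = (("A", 0.1), ("B", 0.1))
cd = (("C", 0.1), ("D", 0.1))
ac = (("A", 0.1), ("C", 0.1))
bd = (("B", 0.1), ("D", 0.1))
ll_ab_cd = log_likelihood(((ab, 0.05), (cd, 0.05)), ml_alignment)
ll_ac_bd = log_likelihood(((ac, 0.05), (bd, 0.05)), ml_alignment)
```

With these inputs the tree that groups A with B gets the higher log-likelihood, so it would be preferred over the alternative.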
IV. Software for building phylogenetic trees
Functionality | Title | Homepage | Main features | Availability |
---|---|---|---|---|
Alignment | ClustalW | http://www.ebi.ac.uk/Tools/clustalw/index.html | Outdated; GUI (ClustalX) | Free binaries and source code; web service |
Alignment | Dialign | http://dialign.gobics.de/ | Accurate | Source code; web service |
Alignment | MAFFT | http://align.bmr.kyushu-u.ac.jp/mafft/software/ | Fast and accurate | Free binaries and source code; web service |
Alignment | MUSCLE | http://www.drive5.com/muscle/ | Fast and accurate | Free binaries and source code; web service |
Alignment | POA | http://bioinfo.mbi.ucla.edu.poa | Fast and accurate | Source code; web service |
Tree reconstruction | BEAST | http://beast.bio.ed.ac.uk/ | Bayesian analysis under a molecular clock | Free binaries and source code |
Tree reconstruction | FastME | http://atgc.lirmm.fr/fastme/ | Very fast distance method | Free binaries and source code; web service |
Tree reconstruction | GARLI | http://www.bio.utexas.edu/faculty/antisense/garli/Garli.html | Very fast ML program | Free binaries and source code |
Tree reconstruction | IQPNNI | http://www.cibiv.at/software/iqpnni/ | Fast ML program | Free binaries and source code |
Tree reconstruction | Leaphy | http://www.bioinf.manchester.ac.uk/leaphy/Leaphy.htm | Fast ML program | Free binaries |
Tree reconstruction | MEGA | http://www.megasoftware.net/ | Distance and MP methods; GUI | Free binaries (Windows only) |
Tree reconstruction | MrBayes | http://sourceforge.net/projects/mrbayes/ | Bayesian analysis | Free binaries and source code |
Tree reconstruction | PAUP* | http://paup.csit.fsu.edu/downl.html | Rich set of methods; GUI for non-Intel-based Mac only | Commercial license |
Tree reconstruction | PHYLIP | http://evolution.genetics.washington.edu/phylip/getme.html | Rich set of methods | Free binaries and source code; some functionality as web service |
Tree reconstruction | PHYML | http://atgc.lirmm.fr/phyml/ | Fast ML program | Free binaries; web service |
Tree reconstruction | POY | http://research.amnh.org/scicomp/projects/poy.php | Direct optimization of unaligned sequences | Free binaries and source code |
Tree reconstruction | RAxML | http://www.kramer.in.tum.de/exelixis/software.html | Very fast ML program | Free binaries and source code; web service |
Tree reconstruction | TNT | http://www.zmuc.dk/public/phylogeny/TNT/ | Very fast MP program; GUI for Windows only | Commercial license and free test versions |
Tree reconstruction | Treefinder | http://www.treefinder.org/ | Fast ML program; GUI | Free binaries |
Network reconstruction | SplitsTree | http://www.splitstree.org/ | Rich set of methods; GUI | Free binaries |
Network reconstruction | T-Rex | http://www.labunix.uqam.ca/~makarenv/trex.html | Constructs reticulation networks; GUI | Free binaries (Windows only); web service |
Viewing and editing trees | Dendroscope | http://www-ab.informatik.uni-tuebingen.de/software/dendroscope/welcome.html | Suitable for very large trees; GUI | Free binaries |
Viewing and editing trees | FigTree | http://tree.bio.ed.ac.uk/software/figtree | GUI | Free binaries |
Viewing and editing trees | Njplot | http://pbil.univ-lyon1.fr/software/njplot.html | GUI | Free binaries and source code |
Viewing and editing trees | TreeView | http://darwin.zoology.gla.ac.uk/~rpage/treeviewx/download.html | GUI | Free binaries and source code |
Miscellaneous | MacClade | http://macclade.org/ | Reconstruction of character evolution; GUI | Commercial license (Mac only) |
Miscellaneous | Mesquite | http://mesquiteproject.org/ | Testing of various evolutionary hypotheses; GUI | Free binaries; web service |
Miscellaneous | Modeltest | http://darwin.uvigo.es/software/modeltest.html | Determines best model in an ML framework | Free binaries and source code; web service |
Miscellaneous | PAML | http://abacus.gene.ucl.ac.uk/software/paml.html | Testing of various evolutionary hypotheses | Free binaries and source code |
References
1. https://www.sciencedirect.com/science/article/pii/B9780444521491000124