## Introduction
A simple, multithreaded way to quickly compute Ka/Ks for homologous gene pairs.
## Dependencies
- ParaAT2.0
- KaKs_Calculator2.0
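With `-m muscle` and `-k`, ParaAT calls MUSCLE and KaKs_Calculator, so those programs need to be on `PATH` as well as ParaAT itself. The following is a minimal sketch of a pre-flight check. It assumes the executables are named `ParaAT.pl`, `KaKs_Calculator` and `muscle`. `check_deps` is a hypothetical helper and is not part of either tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report required executables that are missing from PATH.
# Assumes the tool names ParaAT.pl, KaKs_Calculator and muscle.
check_deps() {
    local missing=0 tool
    for tool in ParaAT.pl KaKs_Calculator muscle; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"   # 0 only if every tool was found
}
```

Run it after the `export PATH=...` lines, for example `check_deps || exit 1`.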
## ParaAT usage
```shell
# Test run inside the ParaAT directory (test.homologs, test.cds, test.pep, proc)
export PATH=/storage_wut/user/software/ParaAT2.0:$PATH
cd /storage_wut/user/software/ParaAT2.0
ParaAT.pl -h test.homologs -n test.cds -a test.pep -p proc -o output -f axt
```
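Options:
- `-h`: file listing the homologous genes
- `-n`: nucleotide (CDS) sequence file
- `-a`: protein sequence file
- `-p`: thread file; it contains the number of threads to use (default 6)
- `-m`: alignment tool (e.g. `muscle`)
- `-g`: remove codons that contain alignment gaps
- `-k`: compute Ka/Ks values with KaKs_Calculator
- `-o`: output directory
- `-f`: format of the output alignment files (e.g. `axt`)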
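Two inputs are easy to get wrong. The first is the `-p` file, which holds only a thread count. The second is the `-h` file, whose IDs must match the FASTA headers in the `-n` and `-a` files. Below is a sketch of two hypothetical helpers (the names are illustrative). It assumes the homolog file is tab-separated with one homolog group per line, and that a FASTA ID is the first word after `>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the thread count that ParaAT reads via -p.
make_proc_file() {
    local threads=$1 out=${2:-proc}
    echo "$threads" > "$out"
}

# Hypothetical helper: print homolog IDs with no matching FASTA header.
# Assumes a tab-separated homolog file (one group per line) and that the
# FASTA ID is the first whitespace-delimited word after '>'.
check_homolog_ids() {
    local homologs=$1 fasta
    shift
    for fasta in "$@"; do
        awk -F'\t' -v name="$fasta" '
            FILENAME == ARGV[1] {             # first file: collect FASTA IDs
                if (substr($0, 1, 1) == ">") {
                    split(substr($0, 2), w, /[ \t]+/)
                    seen[w[1]] = 1
                }
                next
            }
            {                                 # second file: check every ID
                for (i = 1; i <= NF; i++)
                    if ($i != "" && !($i in seen))
                        print name ": " $i
            }' "$fasta" "$homologs"
    done
}
```

For example, `make_proc_file 6 proc` followed by `check_homolog_ids test.homologs test.cds test.pep`. Empty output means every ID was found in both sequence files.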
## Computing Ka/Ks
```shell
echo start at time $(date '+%F %H:%M:%S')
export PATH=/storage_wut/user/software/ParaAT2.0:$PATH
export PATH=/storage_wut/user/software/KaKs_Calculator2.0/bin/Linux/:$PATH
cd /storage_wut/user/project/06lumeng_project/19.homologs_kaks/01.kaks
# proc must exist here (thread count); -g drops gapped codons, -k runs KaKs_Calculator on the axt files
ParaAT.pl -h ../00.data/A_CC.collinearity_one2one.dat \
  -n ../00.data/homo.gene.cds.fa -a ../00.data/homo.gene.pep.fa \
  -p proc -m muscle -f axt -g -k -o result_dir
# Merge the per-pair .kaks files, keeping only the first "Sequence" header line
awk 'NR==1 || $1 != "Sequence"' ./result_dir/*kaks > ../all.kaks.result.xls
# Extract the Ka/Ks column (5th field), skipping the header and NA values
awk -F'\t' 'NR>1 && $5 != "NA" {print $5}' ../all.kaks.result.xls > kaks.list
echo finish at time $(date '+%F %H:%M:%S')
```
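Ka/Ks below 1 is commonly read as purifying selection, around 1 as neutral evolution, and above 1 as positive selection. The sketch below is a quick sanity check on the merged table that counts the pairs in each class. It locates the Ka/Ks column by its header name rather than assuming position 5. `summarize_kaks` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the "Ka/Ks" column by header name and count
# pairs with Ka/Ks < 1, = 1 and > 1; "NA" values are counted separately.
summarize_kaks() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == "Ka/Ks") col = i
            if (!col) { print "no Ka/Ks column" > "/dev/stderr"; exit 1 }
            next
        }
        $col == "NA" { na++; next }
        $col + 0 < 1 { lt++; next }
        $col + 0 > 1 { gt++; next }
        { eq++ }
        END {
            if (!col) exit 1
            printf "Ka/Ks<1\t%d\nKa/Ks=1\t%d\nKa/Ks>1\t%d\nNA\t%d\n", lt, eq, gt, na
        }' "$1"
}
```

Usage: `summarize_kaks ../all.kaks.result.xls`.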
### Format of all.kaks.result.xls
```
Sequence Method Ka Ks Ka/Ks P-Value(Fisher) Length S-Sites N-Sites Fold-Sites(0:2:4) Substitutions S-Substitutions N-Substitutions
Cg-F_10146-gene7838 MA 0.0194491 0.172237 0.112921 6.96313e-06 303 67.5573 235.443 NA 14 10.0464 3.95362 NA
Cg-F_11450-gene46992 MA 0.018447 0.18238 0.101146 8.74657e-22 1335 376.13 958.87 NA 75 59.6254 15.3746 NA NA
Cg-F_11533-gene3021 MA 0.0364833 0.133713 0.272848 3.03892e-07 984 254.578 729.422 NA 56 31.4295 24.5705 NA
Cg-F_11705-gene4507 MA 0.043183 0.281557 0.153372 5.71615e-10 450 99.3644 350.636 NA 37 24.007 12.993 NA
Cg-F_11829-gene26952 MA 0.0670496 0.195014 0.343819 0.000123585 528 128.586 399.414 NA 47 22.7275 24.2725 NA
Cg-F_12075-gene67778 MA 0.163755 0.446331 0.366892 4.00233e-08 510 129.087 380.913 NA 96 46.0956 49.9044 NA
Cg-F_12095-gene37099 MA 0.0459748 0.131137 0.350585 3.28611e-05 1056 236.285 819.715 NA 64 28.8778 35.1222 NA
Cg-F_12212-gene32496 MA 0.0351454 0.113734 0.309015 0.000255903 639 182.649 456.351 NA 34 19.1865 14.8135 NA
Cg-F_12217-gene33956 MA 0.0545515 0.128713 0.423823 0.00831507 552 132.318 419.682 NA 37 15.7831 21.2169 NA
```
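As a worked check of how the columns fit together, take the first row (`Cg-F_10146-gene7838`). Write $S$ and $N$ for S-Sites and N-Sites, and $S_d$ and $N_d$ for S-Substitutions and N-Substitutions:

$$
\frac{K_a}{K_s} = \frac{0.0194491}{0.172237} \approx 0.11292,\qquad
S + N = 67.5573 + 235.443 \approx 303 = \text{Length},\qquad
S_d + N_d = 10.0464 + 3.95362 \approx 14 = \text{Substitutions}
$$

$$
p_N = \frac{N_d}{N} = \frac{3.95362}{235.443} \approx 0.0168 < K_a,\qquad
p_S = \frac{S_d}{S} = \frac{10.0464}{67.5573} \approx 0.149 < K_s
$$

The raw proportions $p_N$ and $p_S$ are smaller than the reported Ka and Ks. This is consistent with the MA method estimating distances under substitution models that correct for multiple hits.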
## Plotting a Ka/Ks histogram
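```r
rm(list = ls())
library(ggplot2)
# Windows only: registers Times New Roman as family "myFont" (not applied below)
windowsFonts(myFont = windowsFont("Times New Roman"))
setwd("D:\\gooagle_data\\work_r\\kaks")
# One Ka/Ks value per line; coerce to numeric and drop non-numeric lines
# (e.g. a "Ka/Ks" header or "NA") so that V1 is a continuous variable
data <- read.table("kaks.list", sep = '\t', colClasses = "character")
data$V1 <- suppressWarnings(as.numeric(data$V1))
data <- data[!is.na(data$V1), , drop = FALSE]
# Quick preview with wide bins
ggplot(data, aes(V1)) + geom_histogram(color = '#39A0FE', fill = '#39A0FE', binwidth = 0.5)
# Final figure; values outside the x limits c(-0.1, 5) are dropped with a warning
ggplot(data, aes(V1)) + geom_histogram(fill = '#39A0FE', binwidth = 0.03, color = 'white') +
  ylab(label = 'Number of gene pairs') + xlab(label = 'Ka/Ks') + theme_classic() +
  theme(axis.title = element_text(size = 20), axis.text = element_text(size = 18, color = "black")) +
  scale_x_continuous(limits = c(-0.1, 5), breaks = c(0, 1, 2, 3, 4, 5))
```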
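To look at the distribution on the server before copying `kaks.list` to a machine with R, awk can print a rough text histogram. `kaks_hist` is a hypothetical helper and is not part of ParaAT or KaKs_Calculator. Its bins are half-open intervals starting at 0, which is not necessarily the bin placement ggplot2 chooses. Non-numeric lines, such as a header or `NA`, are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: text histogram of a one-column numeric file.
# Usage: kaks_hist FILE [BINWIDTH] [MAX]   (defaults: 0.5 and 5)
# Bins are [k*w, (k+1)*w) starting at 0; non-numeric lines (header, NA)
# and values outside [0, MAX) are skipped.
kaks_hist() {
    awk -v w="${2:-0.5}" -v max="${3:-5}" '
        $1 ~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ && $1 + 0 < max + 0 {
            b = int($1 / w)
            n[b]++
            if (b > top) top = b
            found = 1
        }
        END {
            if (!found) exit
            for (b = 0; b <= top; b++) {
                c = n[b] + 0
                bar = ""
                for (i = 0; i < c; i++) bar = bar "#"
                printf "%.3f-%.3f\t%d\t%s\n", b * w, (b + 1) * w, c, bar
            }
        }' "$1"
}
```

For example, `kaks_hist kaks.list 0.5 5` prints one line per bin with its range, count and a bar of `#` characters.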