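Introduction: these study notes mainly follow the article "一文学会常规转录组分析" (Learn routine transcriptome analysis in one article), //www.greatytc.com/p/bdeebd669eb8
1. Data acquisition and quality control
Install SRA Toolkit (which provides prefetch), Aspera, FastQC and MultiQC in advance.
Create a file listing the SRA run accessions to download: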
#cat dir_6.txt
SRR3286802
SRR3286803
SRR3286804
SRR3286805
SRR3286806
SRR3286807
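The six runs are numbered consecutively, so dir_6.txt can also be generated with a loop instead of being typed by hand. This is a minimal sketch that uses only `seq` and `echo`, with the same numbering scheme as the alignment and samtools loops later in these notes:

```shell
# Write the run accessions SRR3286802..SRR3286807 into dir_6.txt, one per line
for i in $(seq 2 7); do
  echo "SRR328680${i}"
done > dir_6.txt

cat dir_6.txt
```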
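1.1 Download the SRA data with Aspera:
Download and extract the Aspera Connect installer, then install it:
#sh ibm-aspera-connect_4.1.3.93_linux.s
Then download the runs listed in dir_6.txt: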
#~/.aspera/connect/bin/ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.putty --mode recv --host ftp-private.ncbi.nlm.nih.gov --user anonftp --file-list dir_6.txt
Error 1:
ascp: destination required
Startup failed, exit
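Fix 1: give ascp a destination as the last argument (here Data/, the directory where the downloaded files are saved), which is what the "destination required" message asks for. The ~/.aspera paths were also changed to absolute paths: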
#/home/radish/.aspera/connect/bin/ascp -i /home/radish/.aspera/connect/etc/asperaweb_id_dsa.putty --mode recv --host ftp-private.ncbi.nlm.nih.gov --user anonftp --file-list dir_6.txt Data/
Error 2:
ascp: Failed to open TCP connection for SSH, exiting.
Session Stop (Error: Failed to open TCP connection for SSH)
2. Download the GFF/GTF annotation file and extract the gene/transcript intervals of interest
#less Arabidopsis_thaliana.TAIR10.42.gff3 | awk '{ if($3=="gene") print $0 }' > gene27655.gff
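The extraction works on column 3 (feature type) of the GFF3 file. Below is a hedged sketch on a small hypothetical annotation (made-up IDs and coordinates, standing in for Arabidopsis_thaliana.TAIR10.42.gff3). It splits explicitly on tabs with `-F'\t'`, and as an optional extra step converts the gene intervals to BED, whose start coordinate is 0-based while GFF3 is 1-based:

```shell
# Hypothetical toy GFF3: made-up records, tab-separated like a real GFF3
printf '##gff-version 3\n' > toy.gff3
printf 'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene:G1\n' >> toy.gff3
printf 'chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=transcript:G1.1;Parent=gene:G1\n' >> toy.gff3
printf 'chr1\tdemo\texon\t100\t400\t.\t+\t.\tParent=transcript:G1.1\n' >> toy.gff3
printf 'chr2\tdemo\tgene\t2000\t3500\t.\t-\t.\tID=gene:G2\n' >> toy.gff3

# Keep only lines whose 3rd column (feature type) is "gene";
# the "##" header line has no 3rd column, so it is dropped too
awk -F'\t' '$3 == "gene"' toy.gff3 > toy_genes.gff

# Optional: gene intervals as BED (chrom, start-1, end, name, score, strand)
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $4 - 1, $5, $9, ".", $7 }' \
  toy_genes.gff > toy_genes.bed

cat toy_genes.bed
```

On the real file, only the input and output file names change.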
3. Install HISAT2
3.1 Installed as root, so there is no need to edit ~/.bashrc
#anaconda search -t conda hisat2
#anaconda show bioconda/hisat2
#conda install --channel https://conda.anaconda.org/bioconda hisat2
Run:
#hisat2
It works.
3.2 As a regular user, add the HISAT2 directory to PATH in ~/.bashrc:
#vi ~/.bashrc
#export PATH=/home/radish/bio_soft/hisat2-2.2.0:$PATH
#source ~/.bashrc
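Two pitfalls are common when adding a directory to PATH: an entry must be a directory that contains the executables, not the executable file itself, and `~` already expands to the home directory (/home/radish here), so `~/home/radish/...` points to `$HOME/home/radish/...`. The sketch below demonstrates both with a throwaway directory and a hypothetical tool named `mytool`; it does not touch the real installation.

```shell
# 1) "~" already means $HOME, so this prints $HOME/home/radish/bio_soft
echo ~/home/radish/bio_soft

# 2) PATH entries must be directories. Hypothetical demo tool "mytool":
demo_dir="$(mktemp -d)"
printf '#!/bin/sh\necho "mytool ok"\n' > "$demo_dir/mytool"
chmod +x "$demo_dir/mytool"

# Wrong: the entry is the file itself, so the shell looks for .../mytool/mytool
wrong=$( PATH="$demo_dir/mytool:$PATH"; command -v mytool || echo "not found" )
# Right: the entry is the directory that contains the executable
right=$( PATH="$demo_dir:$PATH"; command -v mytool )

echo "wrong: $wrong"
echo "right: $right"
```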
3.3 Align the SRA data to the reference genome:
3.3.1 Build the index:
#hisat2-build Arabidopsis_thaliana.TAIR10.dna.toplevel.fa Arabidopsis_thaliana &
3.3.2 Align a single sample:
#hisat2 -p 6 -x Arabidopsis_thaliana -1 SRR3286802_1.fastq.gz -2 SRR3286802_2.fastq.gz -S SRR3286802.sam
3.3.3 Align all samples with a script:
#cat 3.sh
for i in `seq 2 7`
do
hisat2 -x ~/bio_soft/Arabidopsis_thaliana -p 8 \
-1 ~/bio_soft/SRR328680${i}_1.fastq.gz \
-2 ~/bio_soft/SRR328680${i}_2.fastq.gz \
-S ~/bio_soft/SRR328680${i}.sam
done
#sh 3.sh
Judging from the alignment summaries, all samples aligned well, with overall alignment rates above 97%.
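HISAT2 prints its alignment summary to stderr, ending with a line of the form `xx.xx% overall alignment rate`. If the hisat2 command in 3.sh is extended to keep that summary per sample (for example by appending `2> ~/bio_soft/SRR328680${i}.log`, an assumption that is not in the original script), the rates can be collected into one table with awk. The demo below uses two hypothetical log files with made-up rates (demo_A, demo_B), not real results, and the 97% threshold is only an example.

```shell
# Hypothetical logs: only the last line of a HISAT2 summary is mimicked
printf '97.00%% overall alignment rate\n' > demo_A.log
printf '95.50%% overall alignment rate\n' > demo_B.log

# One "<sample><TAB><rate>" line per log file
for log in demo_*.log; do
  rate=$(awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$log")
  printf '%s\t%s\n' "${log%.log}" "$rate"
done > rates.tsv
cat rates.tsv

# Flag samples below the example threshold
awk -F'\t' '$2 < 97 { print $1 " is below 97%" }' rates.tsv
```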
4. Convert SAM to BAM and sort. Installing Samtools failed with this error:
#libncurses.so.5: cannot open shared object file
Fix: locate the installed libncurses and symlink version 6.1 to the missing libncurses.so.5 name:
#whereis libncurses.so.5
#ln -s /usr/lib64/libncurses.so.6.1 /usr/lib64/libncurses.so.5
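Install Samtools with the same conda command as HISAT2 above (replace hisat2 with samtools).
Usage:
Convert and sort a single sample (-n sorts by read name, as in the script version):
#samtools view -bS SRR3286805.sam > SRR3286805.bam
#samtools sort -n SRR3286805.bam > SRR3286805.n.bam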
Convert and sort all samples with a script:
#cat 1.sh
for i in `seq 2 7`
do
samtools view -@ 8 -Sb SRR328680${i}.sam > SRR328680${i}.bam
samtools sort -@ 8 -n SRR328680${i}.bam > SRR328680${i}.n.bam
done
#sh 1.sh
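The conversion and the sort can also be chained in one pipe, which avoids writing the unsorted intermediate .bam. This is a hedged sketch: `DRY_RUN` and `run` are hypothetical helper names, and by default the loop only prints the commands so they can be checked before anything is executed. StringTie expects BAM input sorted by reference position, so for StringTie the `-n` option would be dropped. featureCounts accepts name-sorted input.

```shell
# DRY_RUN=1 only prints each command; set DRY_RUN=0 to actually run them
DRY_RUN=1

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    sh -c "$*"
  fi
}

for i in $(seq 2 7); do
  # SAM -> BAM -> name-sorted BAM; "-" makes samtools sort read from stdin
  run "samtools view -@ 8 -b SRR328680${i}.sam | samtools sort -@ 8 -n -o SRR328680${i}.n.bam -"
done | tee samtools_cmds.txt
```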
5. Quantify expression
5.1 Install featureCounts (part of the Subread package) and add its bin directory to PATH in ~/.bashrc:
#export PATH=/home/radish/bio_soft/subread-1.6.0-Linux-x86_64/bin:$PATH
5.2 Install StringTie
#wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.3b.Linux_x86_64.tar.gz
#tar -zvxf stringtie-1.3.3b.Linux_x86_64.tar.gz
#cd stringtie-1.3.3b.Linux_x86_64/
#pwd
Add the printed path to PATH in ~/.bashrc:
#vi ~/.bashrc
#export PATH=/home/radish/bio_soft/stringtie-1.3.3b.Linux_x86_64:$PATH
#source ~/.bashrc
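After reloading ~/.bashrc, a short loop with `command -v` can confirm that every tool installed so far is found on PATH. The tool list is an assumption based on the steps in these notes (featureCounts is the executable that Subread provides); adjust it as needed.

```shell
# Report whether each tool is found on PATH, and where
for tool in hisat2 hisat2-build samtools featureCounts stringtie; do
  if path=$(command -v "$tool"); then
    printf 'found\t%s\t%s\n' "$tool" "$path"
  else
    printf 'missing\t%s\n' "$tool"
  fi
done | tee tool_check.tsv
```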
To be continued.