脚本位置:/home/shen/biosoft/HiC-Pro_2.11.1/bin/utils
1.digest_genome.py
## Digest the mm9 genome by HindIII
HICPRO_PATH/bin/utils/digest_genome.py -r A^AGCTT -o mm9_hindiii.bed mm9.fasta
## The same ...
HICPRO_PATH/bin/utils/digest_genome.py -r hindiii -o mm9_hindiii.bed mm9.fasta
## Double digestion, HindIII + DpnII
HICPRO_PATH/bin/utils/digest_genome.py -r hindiii dpnii -o mm9_hindiii_dpnii.bed mm9.fasta
"mboi": ["^GATC"],
"dpnii": ["^GATC"],
"bglii": ["A^GATCT"],
"hindiii": ["A^AGCTT"]}
2.hicpro2fithic.py
python /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2fithic.py -i raw/100000/TU-2_100000.matrix -b raw/100000/TU-2_100000_abs.bed -s iced/100000/TU-2_100000_iced.matrix.biases
生成三个文件:
228K fithic.biases.gz
144K fithic.fragmentMappability.gz
2.7M fithic.interactionCounts.gz
接着:
ref:https://github.com/ay-lab/fithic#-x---chromosome_region
pip install fithic
fithic -h
报错:“No module named functools_lru_cache” 解决:在普通py环境下:pip install arrow==0.12.0
运行:
fithic -f fithic.fragmentMappability.gz -i fithic.interactionCounts.gz -t fithic.biases.gz -o fithic -l TU-2 -v -x intraOnly -r 100000
-l lable
-v 可视化
-x intraOnly interOnly All
-r, --resolution
他的文章:https://genome.cshlp.org/content/24/6/999
3.hicpro2higlass.sh
bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2higlass.sh -h
chrsize=/media/shen/*/sun/refdata/UCSC_mm10/mm10.chrom.sizes
bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2higlass.sh -i raw/100000/TU-2_100000.matrix -r 100000 -c $chrsize -n
生成两个文件:
接着: 可以用HiGlass浏览器来看。
ref:http://gehlenborglab.org/research/projects/higlass/
4.hicpro2juicebox.sh
bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh -h
bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh -i TU-2/TU-2.allValidPairs -g /home/shen/biosoft/HiC-Pro_2.11.1/annotation/chrom_mm10.sizes -j /media/shen/*/sun/biosoft/juicer_tools.jar -r /home/shen/biosoft/HiC-Pro_2.11.1/annotation/mm10.mbo1.bed
!!! 注意:这里需要矫正?
bash /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/hicpro2juicebox.sh \
-i TU-2/TU-2.allValidPairs \ validpairs
-g /home/shen/biosoft/HiC-Pro_2.11.1/annotation/chrom_mm10.sizes \ 基因组size信息
-j /media/shen/*/sun/biosoft/juicer_tools.jar \ juicer包
-r /home/shen/biosoft/HiC-Pro_2.11.1/annotation/mm10.mbo1.bed # 限制酶切文件
生成:hic文件
juicerbox:
java -jar /media/shen/*/sun/biosoft/Juicebox_1.9.8.jar
5.sparseToDense.py
python /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/sparseToDense.py -h
python /home/shen/biosoft/HiC-Pro_2.11.1/bin/utils/sparseToDense.py iced/100000/TU-2_100000_iced.matrix -b raw/100000/TU-2_100000_abs.bed -d -o TU-2.matrix
接着:
Create the DI file:
chrsize=/home/shen/biosoft/HiC-Pro_2.11.1/annotation/chrom_mm10.sizes
/home/shen/biosoft/domaincall_software/perl_scripts/DI_from_matrix.pl TU-2.matrix 100000 200000 $chrsize > TU-2.matrix.di
修改DI文件:(mouse)
cat TU-2.matrix.di | grep -v "M" | sed -e 's/^X/20/g' | sed -e 's/^Y/21/g' > TU-2.matrix.fine.di
cp /home/shen/biosoft/domaincall_software/HMM_calls.m ./
编辑:HMM_calls.m文件:
Note:
[1]In line 9 of the script HMM_calls.m,pleasehardcodeyour inputDIfilename.
[2]In line 77 of the script HMM_calls.m,pleasehardcodeyour outputfilename.
nice matlab < HMM_calls.m > dumpfile
# 用dekker的2015perl脚本:
ref: https://github.com/dekkerlab/crane-nature-2015
perl /home/shen/biosoft/crane-nature-2015-master/scripts/matrix2insulation.pl -i TU-2.matrix
6.R包:HiTC
library("HiTC")
x=importC("../rawdata_40000_iced.matrix", xgi.bed="../rawdata_40000_ord.bed")
R=extractRegion(x$chr1chr1, chr="chr1", from=1, to=249250621) #chesize
di<-directionalityIndex(R)
ord=read.delim(paste("rawdata_40000_ord.chr1.cut.bed",sep="."), header=F)
RM=as.matrix(intdata(R))
RF=cbind(ord[,c(1:3)], RM)
ord$DI=di
ord$V1=paste("1")
ord$DI[is.na(ord$DI)]<-0
ord$DI[is.nan(ord$DI)]<-0
FIN.DI=ord[,c(1,2,3,4)]
=> RF is the matrix from HiTC, FIN.DI is the DI file from HiTC