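Hi-C is currently one of the main methods for exploring the three-dimensional organization of the genome. This paper introduced the Hi-C method and went on to show that chromosomes partition into two compartments (A and B).
- Paper: Comprehensive mapping of long-range interactions reveals folding principles of the human genome
- PubMed ID: 19815776
- GEO: GSE18199
- Keywords: Hi-C method; A/B compartment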
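1、Building the Hi-C library
- (1) cells are crosslinked with formaldehyde;
Formaldehyde crosslinks the DNA and fixes it in place (where two DNA regions interact, the proteins involved hold the interacting regions together).
- (2) DNA is digested with a restriction enzyme that leaves a 5′-overhang;
- (3) the 5′-overhang is filled, including a biotinylated residue;
The sticky-end overhang is filled in to a blunt end, incorporating a biotinylated residue (the purple marker in the figure).
- (4) the resulting blunt-end fragments are ligated;
- (5) a Hi-C library is created by shearing the DNA and selecting the biotin-containing fragments with streptavidin beads;
The DNA is sheared into fragments, and the biotin-containing fragments are enriched and purified to form the Hi-C library for sequencing.
- (6) parallel DNA sequencing, producing a catalog of interacting fragments.
The library is paired-end sequenced. Each read pair (read1, read2) spans a biotin-marked junction, and its two reads are aligned to two genomic positions, A and B.
Positions A and B each fall in a region of the chromosome; the unit length of such a region is the bin size, also called the resolution.
The shorter the bin, the more precisely a contact is localized, but the fewer reads fall into each bin;
the longer the bin, the coarser the localization, but the more reads fall into each bin.
- Sequencing gave 8.4 million read pairs in total, of which 6.7 million corresponded to long-range contacts between segments >20 kb apart. In other words, more than six million read pairs join positions over 20 kb apart, so many bins that are far apart in one dimension are close together in space.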
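
The trade-off between bin size and reads per bin can be sketched in a few lines of Python. This is only an illustration: the function names `bin_index` and `count_bin_pairs` and the toy coordinates are made up for this note and do not come from the paper.

```python
from collections import Counter


def bin_index(position, resolution):
    """Return the 0-based bin that a 0-based genomic coordinate falls into."""
    return position // resolution


def count_bin_pairs(read_pairs, resolution):
    """Count read pairs per unordered pair of bins at the given resolution."""
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        i = bin_index(pos_a, resolution)
        j = bin_index(pos_b, resolution)
        # Sort so that (i, j) and (j, i) are counted as the same contact.
        counts[tuple(sorted((i, j)))] += 1
    return counts


# Made-up read-pair positions (bp) on a single chromosome, for illustration only.
toy_pairs = [(1_200_000, 7_900_000), (1_800_000, 7_100_000), (1_500_000, 7_500_000)]

coarse = count_bin_pairs(toy_pairs, 1_000_000)  # 1 Mb bins
fine = count_bin_pairs(toy_pairs, 500_000)      # 500 kb bins
print(coarse)
print(fine)
```

With 1 Mb bins all three toy pairs land in the same bin pair; with 500 kb bins they spread over three different bin pairs, each supported by a single read pair.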
2、genome-wide contact matrix M Heatmap
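- Purpose: visualize the sequencing alignment results above.
- Matrix cells: human chromosomes are typically on the order of 100 to 200 Mb long. chr14, the chromosome used in the paper, is taken here as 104 Mb long, and the bin size is set to 1 Mb.
Both axes span the same chromosome, with the bin size (1 Mb) as the tick unit, so the corresponding contact matrix M is a 104 × 104 matrix (see the figure below).
- Mij is defined as the number of ligation products between locus i and locus j (SOM).
For example, M(2,8) = 10 means that 10 read pairs have one read mapped to the 2nd bin and the other to the 8th bin of the chromosome; the count is then converted to the corresponding color level in the heatmap.
- This matrix reflects an ensemble average of the interactions present in the original sample of cells;
- It can be visually represented as a heatmap, with intensity indicating contact frequency.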
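The case discussed above, where read1 and read2 of a pair both align to the same chromosome, is called a cis interaction. A trans interaction is the case where the two reads of a pair align to different chromosomes.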
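
As a rough sketch of how such a matrix could be filled from aligned read pairs, the snippet below counts cis pairs on one chromosome into a symmetric NumPy array. The read representation `(chromosome, position)`, the helper names and the toy data are assumptions made for illustration; the 104 Mb length is the value used in these notes. Note that the code uses 0-based bin indices, whereas the M(2,8) example counts bins from 1.

```python
import numpy as np


def is_cis(read1, read2):
    """A pair is cis if both reads align to the same chromosome."""
    return read1[0] == read2[0]


def build_contact_matrix(pairs, chrom, chrom_length, resolution):
    """Count cis read pairs on `chrom` into a symmetric bin-by-bin matrix."""
    n_bins = -(-chrom_length // resolution)  # ceiling division
    M = np.zeros((n_bins, n_bins), dtype=int)
    for read1, read2 in pairs:
        if not (read1[0] == chrom and read2[0] == chrom):
            continue  # skip trans pairs and pairs on other chromosomes
        i = read1[1] // resolution
        j = read2[1] // resolution
        M[i, j] += 1
        if i != j:
            M[j, i] += 1  # keep the matrix symmetric
    return M


# Made-up aligned read pairs: ((chromosome, position), (chromosome, position)).
toy_pairs = [
    (("chr14", 1_500_000), ("chr14", 7_200_000)),
    (("chr14", 1_900_000), ("chr14", 7_800_000)),
    (("chr14", 3_000_000), ("chr1", 5_000_000)),  # a trans pair
]

M = build_contact_matrix(toy_pairs, "chr14", 104_000_000, 1_000_000)
n_trans = sum(not is_cis(r1, r2) for r1, r2 in toy_pairs)
print(M.shape, M[1, 7], n_trans)
```

Plotting `M` with any heatmap function then gives the kind of picture described above, with color intensity standing for the count in each cell.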
3、average intrachromosomal contact probability
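- Question: how is this computed? (Based on the 1D component?)
- Definition: In(s) for pairs of loci separated by a genomic distance s on chromosome n.
That is, the average contact probability of two positions on chromosome n that are separated by a (one-dimensional) distance s.
- Results (see the figure below): from top to bottom, the curves are within chromosome 1, chromosome 1 with chromosome 10, the interchromosomal average (a chromosome with all other chromosomes), and chromosome 1 with chromosome 21.
(1) contact probability decreases monotonically on every chromosome; that is, loci that are closer together along the chromosome generally have a higher contact probability (stronger interaction).
(2) chromosome territories: even for two loci far apart on the same chromosome (more than 200 Mb), the contact probability is much higher than the average contact probability between positions on two different chromosomes.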
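
On the open question of how this is computed: one plausible reading, offered here as an assumption rather than as the paper's exact procedure, is that all bin pairs at the same separation s lie on the same diagonal of the contact matrix, so averaging each diagonal gives the mean contact count as a function of distance. The function name `mean_contacts_by_distance` and the toy matrix are hypothetical.

```python
import numpy as np


def mean_contacts_by_distance(M):
    """Average the s-th diagonal of M for every separation s (in bins)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    return np.array([np.diagonal(M, offset=s).mean() for s in range(n)])


# Toy symmetric contact matrix for a 3-bin chromosome (made-up counts).
toy_M = np.array([
    [4, 2, 1],
    [2, 4, 2],
    [1, 2, 4],
])

profile = mean_contacts_by_distance(toy_M)
print(profile)  # one value per separation: 0 bins, 1 bin, 2 bins
```

Dividing these means by the total number of read pairs would turn them into frequencies. On the toy matrix the profile falls as the separation grows, which is the monotonic decrease described in point (1).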
4.1、observed contact matrix M Heatmap
- To correct for the fact that sequence proximity strongly influences contact probability, the raw data are normalized;
- Normalization method: dividing each entry in the contact matrix by the genome-wide average contact probability for loci at that genomic distance (see section 3 above).
The normalized matrix shows many large blocks of enriched and depleted interactions generating a ‘plaid’ pattern.
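
A minimal sketch of this observed/expected normalization follows. It makes one loud simplification: the expected value at each distance is averaged over a single toy chromosome, whereas the paper uses the genome-wide average. The function name `observed_over_expected` and the toy counts are made up.

```python
import numpy as np


def observed_over_expected(M):
    """Divide each entry by the mean of its diagonal (same genomic distance)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    expected = np.zeros_like(M)
    for s in range(n):
        mean_s = np.diagonal(M, offset=s).mean()
        idx = np.arange(n - s)
        expected[idx, idx + s] = mean_s  # upper diagonal at distance s
        expected[idx + s, idx] = mean_s  # mirrored lower diagonal
    # Leave entries at 0 where the expected value is 0 to avoid dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(expected > 0, M / expected, 0.0)


# Toy symmetric contact matrix (made-up counts).
toy_M = np.array([
    [4, 3, 1],
    [3, 4, 1],
    [1, 1, 4],
])

ratio = observed_over_expected(toy_M)
print(ratio)
```

Values above 1 mark bin pairs that contact more often than expected for their distance, and values below 1 mark depleted pairs; these are the enriched and depleted blocks of the plaid pattern.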
4.3、correlation matrix C
- If two loci (here 1 Mb regions) are nearby in space, we reasoned that they will share neighbors and have correlated interaction profiles. (I have not fully understood this sentence.)
The normalized matrix is therefore converted into a correlation matrix C:
Cij is the Pearson correlation between the i-th row and the j-th column of the normalized matrix M, which dramatically sharpened the plaid pattern (see the figure below).
- The plaid pattern suggests that each chromosome can be decomposed into two sets of loci (arbitrarily labeled A and B) such that contacts within each set are enriched and contacts between sets are depleted.
That is, the chromosome is divided into two sets of regions, and interactions between bins within a set are markedly higher than interactions between the sets.
- Principal component analysis is applied to the correlation matrix. In general, the first principal component (PC) clearly corresponded to the plaid pattern (positive values defining one set, negative values the other).
The entries of the PC vector reflected the sharp transitions from compartment to compartment observed within the plaid heatmaps.
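
The correlation and PCA steps can be sketched with NumPy alone. This is one common way to do it (row-wise Pearson correlation, then PCA via SVD of the column-centered correlation matrix) and is not claimed to match the paper's implementation; the function name and the synthetic plaid matrix are assumptions for illustration.

```python
import numpy as np


def correlation_and_pc1(normalized):
    """Return the Pearson correlation matrix C and the first PC score per bin."""
    C = np.corrcoef(normalized)                    # correlation between rows
    centered = C - C.mean(axis=0, keepdims=True)   # center columns before PCA
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = U[:, 0] * S[0]                           # score of each bin on PC1
    return C, pc1


# Synthetic plaid matrix: bins with the same label are enriched (1.5),
# bins with different labels are depleted (0.5). Labels are made up.
labels = np.array([1, 1, -1, -1, 1, -1])
toy_normalized = 1.0 + 0.5 * np.outer(labels, labels)

C, pc1 = correlation_and_pc1(toy_normalized)
print(np.sign(pc1))
```

On this toy input the sign of the first component separates the two sets of bins exactly. Which set comes out positive is arbitrary, which matches the remark that A and B are arbitrarily labeled.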
4.5、compartment A B
- The Hi-C data imply that regions tend to be closer in space if they belong to the same compartment.
- In addition, panel G of the figure above and other evidence in the paper indicate that compartment A is more closely associated with open, accessible, actively transcribed chromatin, whereas compartment B corresponds to closed chromatin domains with comparatively inactive expression.
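
Because the sign of a principal component is arbitrary, turning PC values into A/B labels needs some outside signal to decide which side is A. The sketch below assumes a hypothetical per-bin activity track (for example gene counts per bin) and flips the component so that positive values follow it. This orientation step and the name `label_compartments` are assumptions of this note, not something the paper text above spells out.

```python
import numpy as np


def label_compartments(pc1, activity):
    """Label bins 'A' or 'B' from PC1, oriented so that A tracks activity."""
    pc1 = np.asarray(pc1, dtype=float)
    activity = np.asarray(activity, dtype=float)
    # Flip the component if it is anti-correlated with the activity signal.
    if np.corrcoef(pc1, activity)[0, 1] < 0:
        pc1 = -pc1
    # Bins with a value of exactly 0 fall into 'B' in this simple sketch.
    return np.where(pc1 > 0, "A", "B")


# Made-up PC1 values and a made-up activity track (e.g. genes per bin).
toy_pc1 = [-0.4, -0.3, 0.5, 0.2]
toy_activity = [12, 9, 1, 2]

compartments = label_compartments(toy_pc1, toy_activity)
print(compartments)
```

Here the active bins have negative PC1 values, so the component is flipped before labeling and the first two bins end up in compartment A.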