Title: "Integrated single-cell heterogeneity analysis to define the human cardiac core transcription factor hierarchy"
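Abstract
Techniques for differentiating human induced pluripotent stem cells into cardiomyocytes (hiPSC-CMs) have become a powerful tool for disease modeling and therapeutic testing. However, their broader use remains limited by the immaturity and heterogeneity of the resulting cells. To clarify the causes of this heterogeneity, the authors applied single-cell and bulk transcriptome sequencing to hiPSC cardiac differentiation and to adult heart tissue. Through integrated transcriptomic and splicing analysis, they observed more than six distinct single-cell subpopulations, several of which were observed together at a single time point of differentiation (day 30). To dissect the regulatory roles of the cardiac core transcription factors associated with each population, the paper used single-cell and bulk RNA-seq, CRISPR, and ChIP-seq, together with electrophysiology, calcium imaging, and CyTOF analysis, to measure the effects of up- or downregulating three transcription factors (NR2F2, TBX5, and HEY2). Together, these targets, data, and genomic analysis methods provide a powerful platform for understanding cellular heterogeneity in vitro.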
First, an overview of the samples, library preparation and sequencing, and upstream RNA-seq analysis
- Two hiPSC lines were obtained from the Stanford Cardiovascular Institute biobank (CVI0076, CVI0059). CVI0059 was processed for single-cell RNA-seq at day 5, day 14, and day 45 of the cardiomyocyte differentiation protocol using the 10X Genomics single-cell RNA-seq v1 kit. CVI0076 was processed for single-cell RNA-seq at day 0, day 5, day 14, and day 45 using the v2 kit.
- Libraries were quantified using Bioanalyzer (Agilent) and qPCR (KAPA) analysis. Libraries were sequenced on the NextSeq 500 (Illumina).
- Unsupervised cell population discovery analyses were performed with Seurat-CCA and the ICGS software available in AltAnalyze version 2.1.1 (http://www.altanalyze.org).
- For these analyses, only protein-coding genes were considered, applying a correlation cutoff of 0.3 and Euclidean column HOPACH clustering. Associated t-SNE visualizations were obtained in AltAnalyze using the dynamically regulated genes identified by ICGS.
- ERCC spike-ins were included for further evaluation of sample quality.
- Libraries were pooled and sequenced on Illumina’s HiSeq 2000 using 2 × 100 paired-end sequencing (Macrogen, South Korea).
- Filtered reads were aligned to the reference genome hg19 using STAR.
- Using STAR BAM files, AltAnalyze was used to generate exon read counts for gene expression analysis and junction read counts for splicing analysis
- All retained single-cell libraries were required to have a minimum of 1 million uniquely aligning paired-end fragments and > 40% aligned fragments, based on STAR analysis. The retained libraries had an average of ~3 million aligned fragments.
- To calculate RPKM values for each gene, AltAnalyze was run on the junction and exon BED files using default settings
- To identify discrete cell states, unsupervised clustering was initially performed to define predominant populations (ICGS module of AltAnalyze, Pearson correlation coefficient > 0.4).
- Although this analysis identified three initial populations, we augmented these results with a supervised analysis of cardiac transcription factors from our 10X Genomics data, identified using the ICGS supervised correlation option.
- In agreement with our Fluidigm C1 microscopy analyses, no gene expression signatures with evident “doublet cell” profiles (more than one cell population signature) were discerned from this analysis.
- Furthermore, ERCC spike-in expression (ERCC92.fa, Kallisto TPM) ratios indicated single-cell transcriptome profiles were being assessed.
- The MarkerFinder algorithm in AltAnalyze was run to identify additional genes with population-restricted expression profiles (Pearson correlation coefficient > 0.4).
- Additional differentiations were performed on NR2F2GE1 (N = 2), TBX5GE1 (N = 2), HEY2GE1 (N = 2), NR2F2GE2 (N = 4), TBX5GE2 (N = 3), and HEY2GE2 (N = 2) lines and sequenced using Illumina’s HiSeq 4000 2 × 150 paired end sequencing (Novogene).
- Pseudotemporal ordering of these cells with the software Monocle designated SF1-expressing cardiomyocytes as the “earliest” population and HOPX-expressing cardiomyocytes as the latest, suggesting that cardiomyocyte subpopulations underlie distinct cardiac maturation states.
- Data availability
GSE81585;
10x Genomics synapse ID: syn7818379.
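
The library retention rule and the RPKM normalization described in the list above can be sketched in a few lines. This is a minimal illustration, not AltAnalyze's or STAR's implementation. The thresholds (at least 1 million uniquely aligned fragments, more than 40% aligned) come from the text, and the RPKM formula is the standard definition. The `Library` record, its field names, the choice to compute the aligned fraction from uniquely aligned fragments, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass

MIN_UNIQUE_FRAGMENTS = 1_000_000  # threshold stated in the text
MIN_ALIGNED_FRACTION = 0.40       # "> 40% aligned fragments"


@dataclass
class Library:
    """Hypothetical per-cell alignment summary (field names are assumptions)."""
    name: str
    total_fragments: int
    uniquely_aligned: int


def passes_qc(lib: Library) -> bool:
    """Apply the two retention criteria to one single-cell library."""
    if lib.total_fragments == 0:
        return False
    # Assumption: the aligned fraction is computed from uniquely aligned fragments.
    aligned_fraction = lib.uniquely_aligned / lib.total_fragments
    return (lib.uniquely_aligned >= MIN_UNIQUE_FRAGMENTS
            and aligned_fraction > MIN_ALIGNED_FRACTION)


def rpkm(read_count: int, gene_length_bp: int, total_mapped: int) -> float:
    """Standard RPKM: reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped)


# Toy values, invented for illustration only.
libraries = [
    Library("cell_A", 5_000_000, 3_000_000),  # meets both criteria
    Library("cell_B", 5_000_000, 900_000),    # too few aligned fragments
    Library("cell_C", 4_000_000, 1_200_000),  # only 30% aligned
]
retained = [lib.name for lib in libraries if passes_qc(lib)]
print(retained)
print(rpkm(read_count=600, gene_length_bp=2_000, total_mapped=3_000_000))
```

In a real pipeline the per-library counts would be read from the aligner's summary output rather than typed in by hand.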
Next, quality control, and how many genes and cells the final expression matrix contains
- 200 hiPSC-CMs at day 30 were run through a Fluidigm C1 microfluidic chip to capture single hiPSC-CMs (site 8 shown) and processed for single-cell RNA-seq.
- Cells were labeled with a viability dye (Calcein-AM) to ensure that RNA from live cells was processed. IHC markers: TNNT2, MYL2, ACTC1, MYL7.
- 54 hiPSC-CMs that expressed cardiac markers were successfully sequenced.
- Single-cell 10X Genomics RNA-seq clusters called transcription factors and GO terms related to cardiac developmental progression.
- Monocle applied to single-cell RNA-seq was used to identify a pseudotime progression of different populations of hiPSC-CMs in relation to each other.
Next, how the authors selected important genes and performed dimensionality reduction
- To visualize and interpret the high-dimensional dataset generated, we applied the t-SNE algorithm based on seven cardiac markers preselected for the dataset, in which individual cells in the high-dimensional space were projected onto a two-dimensional map but their neighboring relationship was preserved.
Heatmap of gene markers specific for each day of differentiation. Selected cardiac specific genes are overlaid in the right panel.
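
One way to read the marker-selection step is as a correlation test: a gene counts as a marker of a population when its expression across cells correlates with membership in that population. The sketch below is a simplified stand-in for that idea, not the MarkerFinder code in AltAnalyze. The function names, the binary membership profile, the gene names, and the toy expression values are all assumptions. Only the 0.4 Pearson cutoff is taken from the notes above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no signal
    return cov / math.sqrt(var_x * var_y)


def find_markers(expression, labels, population, cutoff=0.4):
    """Return genes whose expression correlates with membership in `population`."""
    membership = [1.0 if lab == population else 0.0 for lab in labels]
    scored = {gene: pearson(values, membership)
              for gene, values in expression.items()}
    hits = [gene for gene, r in scored.items() if r > cutoff]
    # Strongest correlation first.
    return sorted(hits, key=lambda gene: scored[gene], reverse=True)


# Toy data invented for illustration: 6 cells in 2 hypothetical populations.
labels = ["early", "early", "early", "late", "late", "late"]
expression = {
    "gene_late": [0.1, 0.0, 0.2, 5.0, 6.1, 5.5],
    "gene_early": [4.0, 4.5, 3.8, 0.2, 0.1, 0.0],
    "gene_flat": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}
print(find_markers(expression, labels, "late"))
print(find_markers(expression, labels, "early"))
```

A gene with flat expression never passes the cutoff, which is why uninformative genes drop out before visualization.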
Clustering after dimensionality reduction, and annotation of each cluster
Downstream analysis of the clusters (differential analysis, experimental validation, etc.)
- Given that our single-cell RNA-seq of the wildtype and genome-edited lines suggested that NR2F2, TBX5, and HEY2 can regulate atrial-like and ventricular-like signatures, we next quantified the expression of these transcription factors within the adult heart.
- RNA-seq of the human atria confirmed that NR2F2 and TBX5 are specifically enriched within the atria, and HEY2 is highly enriched within the ventricle (Supplementary Fig. 5E).
- RNA-seq quantification demonstrated that MYL2 is highly expressed within ventricular tissue, while MYL7 is enriched within atrial tissue (Supplementary Fig. 5F).
- Analysis of differentiating hiPSC-CMs reveals that MYL2 is only observed at later differentiation time points (e.g. day 30 and day 90) (Supplementary Fig. 5G).
Supplementary Fig. 5
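
The atrial versus ventricular comparison above amounts to asking, for each gene, in which tissue its expression is higher and by how much. A minimal sketch of that comparison as a log2 fold change follows. It is not the authors' analysis: the pseudocount of 1, the enrichment threshold of 1 (a two-fold change), the gene names, and every expression value are assumptions chosen for illustration.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group_a over group_b.

    The pseudocount (an assumption here) avoids division by zero
    for genes that are not detected in one tissue.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def classify(lfc, threshold=1.0):
    """Label a gene by the sign and size of its atrial/ventricular log2 ratio."""
    if lfc >= threshold:
        return "atrial-enriched"
    if lfc <= -threshold:
        return "ventricular-enriched"
    return "not enriched"


# Hypothetical RPKM-like values (atrial samples, ventricular samples).
toy_expression = {
    "gene_x": ([31.0, 31.0], [7.0, 7.0]),
    "gene_y": ([3.0, 3.0], [63.0, 63.0]),
    "gene_z": ([10.0, 10.0], [10.0, 10.0]),
}
for gene, (atrial, ventricular) in toy_expression.items():
    lfc = log2_fold_change(atrial, ventricular)
    print(gene, round(lfc, 2), classify(lfc))
```

A real comparison would also need replicates and a statistical test; the fold change alone only indicates direction and magnitude.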
Summary
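- The authors sampled human embryonic stem cell-derived cardiomyocytes (hESC-CMs) and human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) at different time points and, after up- or downregulating the relevant transcription factors, selected samples from specific time points for single-cell and bulk RNA-seq analysis. This identified hiPSC-CM subpopulations enriched for distinct gene expression profiles. The significance is as follows: cardiomyocytes regenerate poorly, injury is difficult to repair, and cardiac damage seriously harms population health, so researchers have studied hiPSC-CMs as a treatment for myocardial injury. However, the heterogeneity of hiPSC-CMs themselves leads to variable outcomes. The paper therefore uses single-cell sequencing to find marker genes that are highly expressed in subtypes at particular differentiation stages within this mixed population of stem cell-derived cardiomyocytes, which would make it possible to sort and enrich the corresponding cardiomyocyte subpopulations. Reducing this heterogeneity to improve therapeutic efficacy is a prospect well worth looking forward to.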