10X Spatial Transcriptomics Data Analysis: A Summary of Approaches (for Tumor Samples)

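Hello everyone. Today I want to share some approaches for studying tumor tissue sections with spatial transcriptomics. The main reference is the paper Comprehensive Analysis of Spatial Architecture in Primary Liver Cancer. Its methods are classic and deserve close attention. If you are interested in the paper's findings, read the paper itself; here I only summarize the methods and the reasoning behind them, in the hope that you can put them to use. Let's go through them step by step.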

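Part 1: Sampling. How the sections are taken from a tumor sample matters a great deal, as shown in the figure below. (Client sections cannot be shown, so I use the sections from the paper.)

[Figure]

The principle is that the sections must cover all three kinds of region: 1. the normal region, 2. the tumor region, 3. the boundary region. None of the three can be left out. Better still, cut at least three sections from each frozen block: one containing only normal tissue, one containing only tumor tissue, and one containing both normal and tumor tissue. This captures the most complete information, as shown below:

[Figure]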

Part 2: Basic analysis of the spatial transcriptomics data. This step also deserves attention.

For the gene-spot matrixes generated by Space Ranger, some routine statistical analyses were performed firstly, including calculating the number of the detected UMIs (nUMI), and genes (nGene) in each spot. Based on them, the basic quality controls (QC) were applied on the data. In detail, the spots with extremely low nUMI or nGene (outliers), and the spots isolated from the main tissue sections were removed. The genes expressed in less than 3 spots, and mitochondrial, ribosomal genes were filtered. Personally, I would suggest not removing them, but this has to be decided case by case; the filtering must not compromise the fidelity of the downstream analysis.
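The QC rules quoted above can be sketched in plain Python. This is only an illustration, not the authors' code: the function name `qc_filter`, the thresholds `min_umi` and `min_genes`, the `MT-`/`RPS`/`RPL` gene-name prefixes, and the dict-of-dicts input layout are all assumptions, and the removal of spots isolated from the main tissue is not covered.

```python
def qc_filter(counts, min_umi=500, min_genes=200, min_spots=3,
              drop_prefixes=("MT-", "RPS", "RPL")):
    """Filter a {spot_id: {gene: umi_count}} mapping.

    All thresholds and prefixes are hypothetical defaults for illustration.
    """
    # Step 1: keep spots whose nUMI and nGene reach the chosen thresholds.
    kept_spots = {}
    for spot, genes in counts.items():
        n_umi = sum(genes.values())
        n_gene = sum(1 for c in genes.values() if c > 0)
        if n_umi >= min_umi and n_gene >= min_genes:
            kept_spots[spot] = genes

    # Step 2: count in how many remaining spots each gene is detected.
    spots_per_gene = {}
    for genes in kept_spots.values():
        for gene, c in genes.items():
            if c > 0:
                spots_per_gene[gene] = spots_per_gene.get(gene, 0) + 1

    # Step 3: drop genes seen in fewer than min_spots spots, plus genes
    # matching the (assumed) mitochondrial / ribosomal name prefixes.
    kept_genes = {
        g for g, n in spots_per_gene.items()
        if n >= min_spots and not g.startswith(drop_prefixes)
    }
    return {
        spot: {g: c for g, c in genes.items() if g in kept_genes}
        for spot, genes in kept_spots.items()
    }
```

Passing `drop_prefixes=()` keeps the mitochondrial and ribosomal genes, which matches the more conservative option suggested above.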

Part 3: Integrated analysis of the spatial transcriptomics data.

After QC, we used the R package harmony (v1.0) (30) to integrate the expression data from different sections of each patient, and used the Seurat package (v3.1.5) to perform the basic downstream analysis and visualization. In detail, we firstly combined the expression matrixes of each patient’s all sections, and performed normalization, log-transformation, centering and scaling on them. Next, we identified 2,000 highly variable genes according to their expression means and variances. Based on them, principal components analysis (PCA) was performed to project the spots into a low-dimensional space, which was defined by the first 20 principal components (PCs). Then, by setting the section source as the batch factor and using the “RunHarmony” function, we iteratively corrected the spots’ low-dimensional PC representation to reduce the impact of batch effect. After this step, the corrected PC matrixes were used to perform unsupervised shared-nearest-neighbor-based clustering and UMAP (uniform manifold approximation and projection) visualization analysis further. And to compare the clusters at gene level, we identified differentially expressed genes of the all or selected clusters by using fold-change analysis and Wilcoxon Rank Sum test with Bonferroni correction.

Part 3 should be familiar to everyone: it is the same Harmony batch correction that is used for single-cell data. If you have never done it, take a moment to reflect on that.

Part 4: Cluster similarity analysis. This is also a fairly routine step, although it is mostly applied to 10X single-cell data. After integration, each cluster contains spots from different samples, and the spatial positions of those spots on each section vary widely. The fact that they still cluster together indicates similar expression, and the task here is to compare how correlated these clusters are.

For the clusters from different patients, we represented them by their spots’ average expression profiles (the log-transformed normalization values). To reduce the impact of extreme values, we excluded some outlier spots in advance, whose first three PC values were beyond the range of the mean ± 3 standard deviations of the cluster they belonged to. Moreover, only the genes with the mean above 0.1 and the variance above 0.05 across all the cluster expression vectors were retained for the downstream comparison analyses. This is well worth noting: outliers are removed using the criterion that the first three PC values lie beyond the mean ± 3 standard deviations of the cluster a spot belongs to, which I think is a very nice approach.
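The outlier rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `flag_pc_outliers` and the input layout are hypothetical, a spot is flagged when any one of its first three PC values falls outside the range (the paper's wording does not say whether "any" or "all" is meant), and the population standard deviation is used.

```python
from statistics import mean, pstdev


def flag_pc_outliers(pc_coords, n_pcs=3, n_sd=3.0):
    """Return the spot ids flagged as outliers within ONE cluster.

    pc_coords: {spot_id: [PC1, PC2, PC3, ...]} for the spots of a single cluster.
    Assumption: a spot is an outlier if ANY of its first n_pcs values lies
    more than n_sd standard deviations from the cluster mean on that PC.
    """
    outliers = set()
    for k in range(n_pcs):
        values = [coords[k] for coords in pc_coords.values()]
        center = mean(values)
        spread = pstdev(values)  # population SD; the sample SD is an equally plausible choice
        for spot, coords in pc_coords.items():
            if abs(coords[k] - center) > n_sd * spread:
                outliers.add(spot)
    return outliers
```

The function would be called once per cluster, and the flagged spots dropped before the cluster's average expression profile is computed.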

To measure the clusters’ similarities across patients, we performed two types of analyses, hierarchical clustering and low-dimensional projection. In detail, we firstly applied PCA on the centered and scaled clusters’ average expression profiles, and used the first five PCs to perform hierarchical clustering. Note that the hierarchical clustering here is run on the first five PCs; this way of doing it is worth learning.

[Figure]

The hierarchical clustering figure is also rich in information; the color bar shows mean values.

Besides, the diffusion map was used to project clusters of different patients into a two-dimension space (the first two diffusion components) based on the package destiny (34) with default parameter setting.

[Figure]

For convenience of comparison, we annotated each cluster with a region label (normal, stromal, or tumor), which was decided by integrating the information of the cluster’s marker genes and H&E staining images. This is another very good point: the diffusion map result is clearly region-specific, and clusters from the same kind of region generally group together.

Part 5: Cell type scoring by a signature-based strategy.

At the current Visium ST resolution, each spot may contain approximately 8-20 cells, so that we couldn’t assign a certain cell type for each spot (this is also the biggest factor limiting 10X spatial transcriptomics). Considering this, to compare the distribution of cell types across the tissue sections, we proposed a signature-based strategy to score the cell type enrichments in each spot (that is, marker gene enrichment, an approach I have mentioned in my open course: use marker gene enrichment to see how strongly each cell type is enriched at each location).

Let's look at how the paper carries out the marker gene enrichment.

Step 1: we curated a set of gene signatures of common cell types in liver cancer based on the Xcell signatures and biology prior knowledge (that is, finding marker genes).

[Figure]

Step 2, the key one: Then, we defined the average log-transformed normalization expression values of the genes in the signature as the corresponding cell type scores. (This way of computing the enrichment score caught me off guard 😄.)

Step 3: Taking advantage of these scores, the cell type relative enrichment degree across different tissue regions can be compared (a gradient comparison, which is fairly standard). By testing on some single cell RNA-seq datasets of liver cancer, we proved that our curated gene signatures had high sensitivity and specificity (validating the marker genes really is important).
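The score in Step 2 is simple enough to sketch directly. The code below is an illustration, not the authors' implementation: it assumes the "log-transformed normalization" values follow Seurat's LogNormalize convention, log1p(count / total × 10000), and it treats a signature gene missing from a spot as having value 0. The function names and the example gene names in the usage note are hypothetical.

```python
from math import log1p


def log_normalize(spot_counts, scale=10000):
    """Log-normalize one spot's {gene: umi_count} mapping.

    Assumption: Seurat-style LogNormalize, log1p(count / total * scale).
    """
    total = sum(spot_counts.values())
    return {gene: log1p(count / total * scale) for gene, count in spot_counts.items()}


def signature_score(log_norm, signature):
    """Average log-normalized expression of the signature genes in one spot.

    Assumption: signature genes absent from the spot contribute 0.
    """
    return sum(log_norm.get(gene, 0.0) for gene in signature) / len(signature)
```

For example, `signature_score(log_normalize(spot), ["CD3D", "CD3E", "CD2"])` would give an illustrative T cell score for one spot; computing it for every spot and every signature yields the score matrix that is then compared across tissue regions.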

The authors then also applied MIA (multimodal intersection analysis). I will not go into MIA here; see my post on using MIA for the joint analysis of single-cell and spatial data. In short, MIA determined the cell type enrichment degrees by performing hypergeometric test on the overlap between the tissue region-specific genes of ST data and the cell type-specific genes of single cell data.

Here, we took advantage of cell type annotation and differential expression gene results of a liver cancer single cell dataset and performed MIA on the clusters of our ST data, so that we can use the p-values of hypergeometric test to measure the enrichment of different cell types in each cluster (panel C of the figure below).

[Figure]

By comparing these enrichment degrees and the mean values of our signature-based cell type scores of the all ST clusters, we observed generally high correlation (panel D of the figure above), which proved the reliability of our signature-based cell type scoring method. At the same time, it had the advantage of not requiring single cell data, which was more flexible.
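The hypergeometric test at the core of MIA can be sketched with the standard library alone. This is an illustration under assumptions, not the published MIA code: the function name is hypothetical, gene lists are passed as plain collections, the background is whatever gene universe the caller supplies, and only the upper-tail (enrichment) p-value is computed.

```python
from math import comb


def hypergeom_enrichment_p(region_genes, celltype_genes, background_genes):
    """Upper-tail hypergeometric p-value for the overlap of two gene sets.

    Returns P(overlap >= observed) when len(region) genes are drawn at random
    from the background, of which len(celltype) are cell-type-specific.
    """
    background = set(background_genes)
    region = set(region_genes) & background
    celltype = set(celltype_genes) & background

    n_background = len(background)
    n_celltype = len(celltype)
    n_region = len(region)
    overlap = len(region & celltype)

    # Sum the hypergeometric probabilities from the observed overlap upward.
    tail = sum(
        comb(n_celltype, i) * comb(n_background - n_celltype, n_region - i)
        for i in range(overlap, min(n_celltype, n_region) + 1)
    )
    return tail / comb(n_background, n_region)
```

Running this for every (ST cluster, cell type) pair gives a matrix of p-values; a common way to display such a matrix is as −log10(p), although how the paper scales its figure is not stated in the excerpt above.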

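Part 6: Intratumor spatial heterogeneity measurement. Two measures are used, the transcriptome diversity degree and the spatial continuity degree; let's look at each in detail.

Transcriptome diversity degree: there is something interesting here.

For the transcriptome diversity degree, we firstly calculated the Pearson correlation coefficients between each pair of tumor region spots based on the highly variable genes (that is, first compute the Pearson correlation for every pair of spots). The transcriptome diversity degree of a sample is then defined as 1.4826 times the median absolute deviation (MAD) of these correlations. This approximates the standard deviation while avoiding the influence of outliers. A larger value means that the similarities among the sample's tumor spots have a larger variance, so the sample has higher intratumor heterogeneity. As a formula, it can be calculated as

[Formula image]

where e_i indicated the expression vector of the tumor region spot i, and the MAD was defined as

[Formula image]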
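Since the two formulas survive only as images, the definition in the text can be sketched in code instead: diversity = 1.4826 × MAD of all pairwise Pearson correlations, with MAD = median(|x − median(x)|). The sketch below follows that wording only; the function names are hypothetical, and each spot is assumed to be given as a list of expression values over the same highly variable genes.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def transcriptome_diversity(spot_vectors):
    """1.4826 * MAD of the Pearson correlations over all pairs of tumor spots."""
    correlations = [pearson(a, b) for a, b in combinations(spot_vectors, 2)]
    return 1.4826 * mad(correlations)
```

The number of pairs grows quadratically with the number of spots, so for a full section a vectorized correlation matrix would be the practical choice; the loop here is only meant to make the definition explicit.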

Spatial continuity degree

The authors first compared the cluster identities of each tumor region spot with its six neighbor spots.

Then the total fraction of the neighbor spots with the same cluster identity was defined as the spatial continuity degree. This metric measures the spatial heterogeneity of the tumor region.

The larger this metric meant the sample’s tumor region more tended to be block-like (higher spatial continuity degree and lower spatial mixed degree). Formulaically, it can be calculated as

[Formula image]

where i indicated a tumor region spot, and I() was the indicative function.
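The spatial continuity degree can be sketched from the verbal definition: count, over all tumor spots and their six neighbors, how often the neighbor carries the same cluster label. The code below is an illustration with several assumptions: spots are keyed by Visium-style array coordinates (row, col) on the hexagonal grid, where the six neighbors sit at (row, col ± 2) and (row ± 1, col ± 1); only neighbors that are themselves tumor spots present in the input are counted; and the function name is hypothetical.

```python
# Offsets of the six hexagonal neighbors in (array_row, array_col) coordinates.
# Assumption: Visium-style layout where columns advance by 2 within a row.
NEIGHBOR_OFFSETS = [(0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def spatial_continuity(cluster_by_spot):
    """Fraction of neighbor relations that link two spots of the same cluster.

    cluster_by_spot: {(row, col): cluster_label} for tumor region spots only.
    """
    same = 0
    total = 0
    for (row, col), cluster in cluster_by_spot.items():
        for d_row, d_col in NEIGHBOR_OFFSETS:
            neighbor = (row + d_row, col + d_col)
            if neighbor not in cluster_by_spot:
                continue  # missing or non-tumor neighbor: skipped by assumption
            total += 1
            if cluster_by_spot[neighbor] == cluster:
                same += 1  # this is the indicator function I() from the text
    return same / total if total else float("nan")
```

A value near 1 corresponds to large, block-like tumor clusters, and a lower value to spatially intermixed clusters.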

Part 7: GSVA analysis. Everyone should already know this part.

In detail, the log-transformed normalization expression matrix of tumor spots was inputted into the “gsva” function with the default parameters setting.

To compare the tumor clusters across patients at pathway level, we averaged the resulting GSVA score matrixes over each cluster and performed hierarchical clustering on them with Ward's minimum variance method (this part is relatively simple).

[Figure]

Part 8: Spatial gradient change analysis. This one is important.

The spatial gradient distributions of hallmark pathway activities were analyzed on our leading-edge samples (L-sections) and the intact HCC nodule (HCC-5).

For the leading-edge samples, we focused on analyzing the gradient changes from capsules or tumor-normal boundary lines to the both tumor and normal sides (that is, where the normal region transitions into the tumor region).

We divided the normal and tumor regions into continuous zones parallel to the shape of the boundary lines at intervals of 5 spots (an interesting idea). The gradient changes along these zones were then analyzed.

[Figure]
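The zoning idea can be sketched as a simple binning step. This is a rough illustration with strong assumptions: each spot's distance to the boundary line (in spot units) and its side are taken as already known, zones are left-closed bins of width 5 numbered from 0 at the boundary, and the function names and tuple layout are hypothetical. How the paper actually traces zones parallel to the boundary is not described in the excerpt.

```python
from collections import defaultdict


def zone_index(distance_to_boundary, interval=5):
    """0-based zone number: distances in [0, 5) map to 0, [5, 10) to 1, and so on."""
    return int(distance_to_boundary // interval)


def mean_score_by_zone(spots, interval=5):
    """Average a pathway score within each (side, zone) bin.

    spots: iterable of (side, distance_to_boundary, score) tuples, where side
    is e.g. "tumor" or "normal" and distance is measured in spots.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for side, distance, score in spots:
        key = (side, zone_index(distance, interval))
        sums[key] += score
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Plotting the per-zone means in order, from the outermost normal zone through the boundary to the innermost tumor zone, gives the kind of gradient curve discussed above.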

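Part 9: Spatial communication analysis (Cluster interaction analysis).

Here the authors ran the communication analysis only between neighboring clusters. For each pair of neighbor tumor clusters, we selected their interface regions with 4 spots wide (2 spots wide for each cluster) and excluded the spots identified as stromal clusters (so the selection is clearly not indiscriminate, which shows how much spatial position matters when analyzing communication).

[Figure]

The method used is CellPhoneDB.

[Figure]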

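Part 10: Copy number variation (CNV) comparison analysis.

The authors ran inferCNV directly on the spatial data. As for the results, those reported in the paper agree well with what is expected in practice.

[Figure]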

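Life is good, and it is waiting for you to go beyond it.

Reposting is prohibited. If you wish to repost, please contact the author by private message or comment.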

