作者,Evil Genius
周四了,变懒了,重申一遍,不要抄袭。
分享文章Screening cell-cell communication in spatial transcriptomics via collective optimal transport, 2023年1月发表于nature methods。
方法链接在COMMOT
网页示例在COMMOT — commot 0.0.3 documentation
非常棒的推断空间通讯的方法。
单细胞推断细胞通讯的缺点:
- non-spatial studies often contain significant false positives given that CCC takes place only within limited spatial distances that are not measured in scRNA-seq datasets
目前空间的个性化分析软件
- CellPhoneDB v3 restricts interactions to cell clusters in the same microenvironment defined based on spatial information;
- stLearn relates the co-expression of ligand and receptor genes to the spatial diversity of cell types;
- SVCA and MISTy use probabilistic and machine learning models, respectively, to identify the spatially constrained intercellular gene–gene interactions;
- NCEM fits a function to relate cell type and spatial context to gene expression。
空间推断通讯的不足
current methods examine CCC locally and on cell pairs independently, and focus on information between cells or in the neighborhoods of individual cells. As a result,
collective or global information in CCC, such as competition between cells, is neglected.
COMMOT软件特点
a package that infers CCC by simultaneously considering numerous ligand–receptor pairs for either spatial transcriptomics data or spatially annotated scRNA-seq data equipped with spatial distances between cells estimated from paired spatial imaging data; summarizes and compares directions of spatial signaling; identifies downstream effects of CCC on gene expressions using ensemble of trees models; and provides visualization utilities for the various analyses.
软件总览----COMMOT
配体和受体通常在有限的空间范围内与多种复合物相互作用。考虑到这一点,作者提出了具有三个重要特征的collective optimal transport:首先,the use of non-probability mass distributions to control the marginals of the transport plan to maintain comparability between species(需要一点数学背景知识);其次,对CCC实施空间距离约束,以避免连接空间上相距较远的细胞;最后,将多种配体分布结合到多中受体分布以解释多种相互作用。
由于缺乏配体和受体蛋白的空间共定位测量,对空间数据的CCC推断方法的直接验证是困难的。在这里,建立了PDE模型来模拟空间中的CCC。COMMOT模拟了不同数量的配体和受体以及不同的竞争模式,从生成的合成数据中准确地重建了CCC连接。comot强化空间限制和不需要概率分布的特点在其他真实的空间转录组数据集上得到了进一步的说明(文章示例)。
对于每一对配体-受体和每一对细胞或spot,CCC推理量化了由一个spot贡献给另一个spot的配体-受体复合物的配体。然后,进行了几个下游分析:首先,空间信号方向的补充分析和CCC区域之间差异的识别;二是在空间聚类层面上对CCC进行归纳与分组;最后,对受CCC影响的下游基因进行鉴定。空间信号方向是通过将每个细胞的CCC矩阵补充到一个向量场来确定接收或发送信号的方向。对于下游分析,首先识别与接收到的信号有差异表达的基因,然后通过结合机器学习模型(使用接收到的信号和其他相关基因预测目标基因水平)来量化CCC对这些基因的影响。
文章给到的分析示例1(human epidermal development)
文章给到的分析示例2(高精度的空间数据分析,例如华大平台)
文章给到的分析示例3(10X空间转录组数据)
最后,我们来分享代码
安装
pip install commot
加载并载入数据,python版本,与scanpy无缝衔接
import commot as ct
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
加载空间数据,大家有自己的数据就读取自身数据
adata = sc.datasets.visium_sge(sample_id='V1_Mouse_Brain_Sagittal_Posterior')
adata.var_names_make_unique()
基础处理
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
Specify ligand-receptor pairs
Here, we use an example with only three LR pairs.
A user-defined LR database can be specified in the same way or alternatively, built-in LR databases can be obtained with the function commot.pp.ligand_receptor_database().
For example df_ligrec=ct.pp.ligand_receptor_database(database='CellChat', species='mouse').
LR=np.array([['Tgfb1', 'Tgfbr1_Tgfbr2', 'Tgfb_pathway'],
['Tgfb2', 'Tgfbr1_Tgfbr2', 'Tgfb_pathway'],
['Tgfb3', 'Tgfbr1_Tgfbr2', 'Tgfb_pathway'],
['Fgf1', 'Fgfr1', 'FGF_pathway'],
['Fgf1', 'Fgfr2', 'FGF_pathway']],dtype=str)
df_ligrec = pd.DataFrame(data=LR)
构建通讯网络
Use optimal transport to construct CCC networks for the ligand-receptor pairs with a spatial distance constraint of 500um (coupling between cells with distance greater than 500 is prohibited). 这里的距离限制在500um以内。
For example, the spot-by-spot matrix for the pair Tgfb1 (ligand) and Tgfbr1_Tgfbr2 (receptor)is stored in adata.obsp['commot-user_database-Tgfb1-Tgfbr1_Tgfbr2']
. The total sent or received signal for each pair is stored in adata.obsm['commot-user_database-sum-sender']
and adata.obsm['commot-user_database-sum-receiver']
.
ct.tl.spatial_communication(adata,database_name='user_database', df_ligrec=df_ligrec, dis_thr=500, heteromeric=True, pathway_sum=True)
可视化
#Plot the amount of sent and received signal, for example, of the LR pair Fgf1-Fgfr1.
pts = adata.obsm['spatial']
s = adata.obsm['commot-user_database-sum-sender']['s-Fgf1-Fgfr1']
r = adata.obsm['commot-user_database-sum-receiver']['r-Fgf1-Fgfr1']
fig, ax = plt.subplots(1,2, figsize=(10,4))
ax[0].scatter(pts[:,0], pts[:,1], c=s, s=5, cmap='Blues')
ax[0].set_title('Sender')
ax[1].scatter(pts[:,0], pts[:,1], c=r, s=5, cmap='Reds')
ax[1].set_title('Receiver')
可视化信号方向
ct.tl.communication_direction(adata, database_name='user_database', lr_pair=('Fgf1','Fgfr1'), k=5)
ct.pl.plot_cell_communication(adata, database_name='user_database', lr_pair=('Fgf1','Fgfr1'), plot_method='grid', background_legend=True,
scale=0.00003, ndsize=8, grid_density=0.4, summary='sender', background='image', clustering='leiden', cmap='Alphabet',
normalize_v = True, normalize_v_quantile=0.995)
或者
ct.pl.plot_cell_communication(adata, database_name='user_database', lr_pair=('Fgf1','Fgfr1'), plot_method='grid', background_legend=True,
scale=0.00003, ndsize=8, grid_density=0.4, summary='receiver', background='summary', clustering='leiden', cmap='Reds',
normalize_v = True, normalize_v_quantile=0.995)
示例2,通路角度,前面处理一致
import os
import gc
import ot
import pickle
import anndata
import scanpy as sc
import pandas as pd
import numpy as np
from scipy import sparse
from scipy.stats import spearmanr, pearsonr
from scipy.spatial import distance_matrix
import matplotlib.pyplot as plt
import commot as ct
adata = sc.datasets.visium_sge(sample_id='V1_Mouse_Brain_Sagittal_Posterior')
adata.var_names_make_unique()
adata.raw = adata
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
adata_dis500 = adata.copy()
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata = adata[:, adata.var.highly_variable]
sc.tl.pca(adata, svd_solver='arpack')
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.4)
sc.pl.spatial(adata, color='leiden')
空间通讯推断
use the CellChatDB ligand-receptor database . Only the secreted signaling LR pairs will be used.
df_cellchat = ct.pp.ligand_receptor_database(species='mouse', signaling_type='Secreted Signaling', database='CellChat')
print(df_cellchat.shape)
# filter the LR pairs to keep only the pairs with both ligand and receptor expressed in at least 5% of the spots
df_cellchat_filtered = ct.pp.filter_lr_database(df_cellchat, adata_dis500, min_cell_pct=0.05)
print(df_cellchat_filtered.shape)
print(df_cellchat_filtered.head())
0 1 2 3
0 Tgfb1 Tgfbr1_Tgfbr2 TGFb Secreted Signaling
1 Tgfb2 Tgfbr1_Tgfbr2 TGFb Secreted Signaling
2 Tgfb3 Tgfbr1_Tgfbr2 TGFb Secreted Signaling
3 Tgfb1 Acvr1b_Tgfbr2 TGFb Secreted Signaling
4 Tgfb1 Acvr1c_Tgfbr2 TGFb Secreted Signaling
Now perform spatial communication inference for these 250 ligand-receptor pairs with a spatial distance limit of 500. CellChat database considers heteromeric units. The signaling results are stored as spot-by-spot matrices in the obsp slots. For example, the score for spot i signaling to spot j through the LR pair can be retrieved from adata_dis500.obsp['commot-cellchat-Wnt4-Fzd4_Lrp6'][i,j]
.
ct.tl.spatial_communication(adata_dis500,
database_name='cellchat', df_ligrec=df_cellchat_filtered, dis_thr=500, heteromeric=True, pathway_sum=True)
#adata_dis500.write("./adata.h5ad")
Determine the spatial direction of a signaling pathway, for example, the PSAP pathway. The interpolated signaling directions for where the signals are sent by the spots and where the signals received by the spots are from are stored in adata_dis500.obsm['commot_sender_vf-cellchat-PSAP']
and adata_dis500.obsm['commot_receiver_vf-cellchat-PSAP']
, respectively.
ct.tl.communication_direction(adata_dis500, database_name='cellchat', pathway_name='PSAP', k=5)
ct.pl.plot_cell_communication(adata_dis500, database_name='cellchat', pathway_name='PSAP', plot_method='grid', background_legend=True,
scale=0.00003, ndsize=8, grid_density=0.4, summary='sender', background='image', clustering='leiden', cmap='Alphabet',
normalize_v = True, normalize_v_quantile=0.995)
Downstream analysis
识别信号差异基因,还是以PSAP通路为例
Now we further examine what genes are likely differentially expressed with respect to the inferred signaling activity. tradeSeq will be used to model the DE genes. This analysis is similar to finding temporal DE genes just with the pseudotime variable replaced by the amount of received signal. Count data is needed in adata_dis500.layers['counts']
adata_dis500 = sc.read_h5ad("./adata.h5ad")
adata = sc.datasets.visium_sge(sample_id='V1_Mouse_Brain_Sagittal_Posterior')
adata_dis500.layers['counts'] = adata.X
Look for genes that are differentially expressed with respect to PSAP signaling.
df_deg, df_yhat = ct.tl.communication_deg_detection(adata_dis500,
database_name = 'cellchat', pathway_name='PSAP', summary = 'receiver')
import pickle
deg_result = {"df_deg": df_deg, "df_yhat": df_yhat}
with open('./deg_PSAP.pkl', 'wb') as handle:
pickle.dump(deg_result, handle, protocol=pickle.HIGHEST_PROTOCOL)
Cluster the downstream genes and visualize the expression trends with respect to increased level of received signal through PSAP pathway
with open("./deg_PSAP.pkl", 'rb') as file:
deg_result = pickle.load(file)
df_deg_clus, df_yhat_clus = ct.tl.communication_deg_clustering(df_deg, df_yhat, deg_clustering_res=0.4)
top_de_genes_PSAP = ct.pl.plot_communication_dependent_genes(df_deg_clus, df_yhat_clus, top_ngene_per_cluster=5,
filename='./heatmap_deg_PSAP.pdf', font_scale=1.2, return_genes=True)
Plot some example signaling DE genes
X_sc = adata_dis500.obsm['spatial']
fig, ax = plt.subplots(1,3, figsize=(15,4))
colors = adata_dis500.obsm['commot-cellchat-sum-receiver']['r-PSAP'].values
idx = np.argsort(colors)
ax[0].scatter(X_sc[idx,0],X_sc[idx,1], c=colors[idx], cmap='coolwarm', s=10)
colors = adata_dis500[:,'Ctxn1'].X.toarray().flatten()
idx = np.argsort(colors)
ax[1].scatter(X_sc[idx,0],X_sc[idx,1], c=colors[idx], cmap='coolwarm', s=10)
colors = adata_dis500[:,'Gpr37'].X.toarray().flatten()
idx = np.argsort(colors)
ax[2].scatter(X_sc[idx,0],X_sc[idx,1], c=colors[idx], cmap='coolwarm', s=10)
ax[0].set_title('Amount of received signal')
ax[1].set_title('An example negative DE gene (Ctxn1)')
ax[2].set_title('An example positive DE gene (Gpr37)')