Official Seurat tutorial | Using Seurat with multimodal data

Compiled: 2021-03-18

Source: vignettes/multimodal_vignette.Rmd

Tutorial link: https://satijalab.org/seurat/articles/multimodal_vignette.html

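Measuring several types of data in the same cell is called multimodal analysis. For example:

  • CITE-seq measures the transcriptome and cell-surface proteins of the same cell.
  • The 10x multiome kit gives paired measurements of the transcriptome and chromatin accessibility of each cell (i.e. scRNA-seq + scATAC-seq).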

We designed Seurat 4 to enable seamless storage, analysis, and exploration of diverse multimodal single-cell datasets.

1. Introducing and loading the data

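In this tutorial we use a dataset of 8,617 cord blood mononuclear cells (CBMCs), in which single-cell transcriptome measurements are paired with measurements of 11 surface proteins (note that the ADT matrix loaded below contains 13 antibody features).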

First, we load two count matrices: one for RNA and one for antibody-derived tags (ADT). The data can be downloaded here:

# Load the packages used in this tutorial (ggplot2 for ggtitle(), patchwork for combining plots with |)
library(Seurat)
library(ggplot2)
library(patchwork)

# Note that this dataset also contains ~5% of mouse cells, which we can use as negative controls
# for the protein measurements. For this reason, the gene expression matrix has HUMAN_ or MOUSE_
# appended to the beginning of each gene.
cbmc.rna <- as.sparse(read.csv(file = "data/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", sep = ",", header = TRUE, row.names = 1))

# To make life a bit easier going forward, we're going to discard all but the top 100 most
# highly expressed mouse genes, and remove the 'HUMAN_' prefix from the human gene names
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)

# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "data/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz", sep = ",", header = TRUE, row.names = 1))

# Note that since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))
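The CollapseSpeciesExpressionMatrix() step above can be pictured roughly as follows. This is a hypothetical base-R re-implementation written only for illustration: the helper name collapse_species_sketch() and its arguments are assumptions, not Seurat's API, and the sketch simply drops the remaining mouse genes, which may differ from what Seurat's own function does with them.

```r
# Hypothetical re-implementation for illustration only (not Seurat's source):
# keep all human genes with the prefix removed, plus the n.mouse mouse genes
# with the highest total counts; all other mouse genes are dropped.
collapse_species_sketch <- function(counts, n.mouse = 100,
                                    human.prefix = "HUMAN_", mouse.prefix = "MOUSE_") {
  genes <- rownames(counts)
  human <- counts[startsWith(genes, human.prefix), , drop = FALSE]
  mouse <- counts[startsWith(genes, mouse.prefix), , drop = FALSE]
  # strip the human prefix so genes can be referred to as e.g. "CD19"
  rownames(human) <- substring(rownames(human), nchar(human.prefix) + 1)
  # rank mouse genes by their total counts across all cells
  keep <- order(rowSums(mouse), decreasing = TRUE)[seq_len(min(n.mouse, nrow(mouse)))]
  rbind(human, mouse[keep, , drop = FALSE])
}

# Toy example: 2 human and 3 mouse genes in 2 cells (made-up counts)
toy <- matrix(c(1, 2,
                3, 4,
                10, 0,
                0, 1,
                5, 1), ncol = 2, byrow = TRUE,
              dimnames = list(c("HUMAN_A", "HUMAN_B", "MOUSE_X", "MOUSE_Y", "MOUSE_Z"),
                              c("cell1", "cell2")))
collapsed <- collapse_species_sketch(toy, n.mouse = 2)
rownames(collapsed)
```

With n.mouse = 2, the two mouse genes with the largest totals (MOUSE_X and MOUSE_Z) are kept and MOUSE_Y is dropped.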

2. Setting up a Seurat object

Now we create a Seurat object and add the ADT data as a second assay.

# creates a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)

# We can see that by default, the cbmc object contains an assay storing RNA measurement
Assays(cbmc)
[1] "RNA"

# create a new assay to store ADT information
adt_assay <- CreateAssayObject(counts = cbmc.adt)

# add this assay to the previously created Seurat object
cbmc[["ADT"]] <- adt_assay

# Validate that the object now contains multiple assays
Assays(cbmc)
[1] "RNA" "ADT"
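A second assay has to describe the same cells as the first, so it can help to align its columns to the RNA barcodes explicitly before calling CreateAssayObject(); the 10X example later in this tutorial does the same with `[, colnames(x = pbmc10k)]`. A minimal base-R sketch with made-up toy matrices; align_to_cells() is a hypothetical helper, not part of Seurat.

```r
# Toy matrices standing in for RNA and ADT counts (made-up values).
# The ADT columns are deliberately in a different order.
rna.counts.toy <- matrix(1:6, nrow = 2,
                         dimnames = list(c("GeneA", "GeneB"), c("cell1", "cell2", "cell3")))
adt.counts.toy <- matrix(7:12, nrow = 2,
                         dimnames = list(c("CD3", "CD19"), c("cell3", "cell1", "cell2")))

# Hypothetical helper: reorder/subset the columns of a second assay to match
# a given vector of cell barcodes, failing loudly if any barcode is missing.
align_to_cells <- function(assay.counts, cells) {
  missing <- setdiff(cells, colnames(assay.counts))
  if (length(missing) > 0) {
    stop("barcodes missing from second assay: ", paste(missing, collapse = ", "))
  }
  assay.counts[, cells, drop = FALSE]
}

adt.aligned <- align_to_cells(adt.counts.toy, colnames(rna.counts.toy))
stopifnot(identical(colnames(adt.aligned), colnames(rna.counts.toy)))
adt.aligned
```

Subsetting by name rather than by position means a reordered or partially filtered second matrix still ends up matched cell by cell.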

Extract the features (proteins) measured in the ADT assay:

# Extract a list of features measured in the ADT assay
rownames(cbmc[["ADT"]])
[1] "CD3"    "CD4"    "CD8"    "CD45RA" "CD56"   "CD16"   "CD10"   "CD11c"  "CD14"   "CD19"   "CD34"  "CCR5"   "CCR7" 

We can switch the default assay back and forth between RNA and ADT for downstream analysis and visualization:

# List the current default assay
DefaultAssay(cbmc)

# Switch the default to ADT
DefaultAssay(cbmc) <- "ADT"
DefaultAssay(cbmc)

3. Clustering cells on the basis of their scRNA-seq profiles

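The code here is a condensed version of the essential steps; it omits most parameter choices and the interpretation of the plots. For details, see the introductory tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html (Chinese version: //www.greatytc.com/p/2b5c5c849ec0).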

# Note that all operations below are performed on the RNA assay. Set and verify that the default
# assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)

# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)
[Figure: UMAP of the CBMC clusters, output of DimPlot]

4. Visualizing multiple modalities side by side

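Now that we have clustered the scRNA-seq data, we can visualize the expression of either RNA or protein. Seurat provides several ways to switch between modalities. This matters because in some cases the same feature can be present in several forms: for example, this dataset contains independent measurements of the B cell marker CD19 at both the protein and the RNA level.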

# Normalize ADT data
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"

# Note that the following command is an alternative but returns the same result
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")
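For intuition about what NormalizeData(..., normalization.method = "CLR", margin = 2) computes, here is a base-R sketch of a centered log-ratio transform. The function names are hypothetical and the exact formula is an assumption modeled on Seurat's CLR helper rather than a documented API: each value is divided by a geometric-mean-like term computed from log1p of the non-zero counts in that cell (or feature), and the ratio is log1p-transformed. With a features x cells matrix, margin = 2 transforms each column, i.e. each cell.

```r
# Sketch of centered log-ratio (CLR) normalization for a features x cells matrix.
# Modeled on the CLR transform Seurat applies for normalization.method = "CLR";
# treat the exact formula as an assumption and check it against your Seurat version.
clr_vector <- function(x) {
  # denominator: exp of the mean of log1p over the non-zero entries,
  # where the mean divides by the total number of entries (zeros included)
  log1p(x / exp(sum(log1p(x[x > 0])) / length(x)))
}

clr_normalize <- function(counts, margin = 2) {
  out <- counts
  storage.mode(out) <- "double"
  if (margin == 2) {
    # margin = 2: transform each column, i.e. each cell across all features
    for (j in seq_len(ncol(counts))) out[, j] <- clr_vector(counts[, j])
  } else if (margin == 1) {
    # margin = 1: transform each row, i.e. each feature across all cells
    for (i in seq_len(nrow(counts))) out[i, ] <- clr_vector(counts[i, ])
  } else {
    stop("margin must be 1 or 2")
  }
  out
}

# Toy ADT counts: 3 antibodies x 2 cells (made-up numbers)
adt.toy <- matrix(c(1, 1, 1,
                    0, 4, 9), nrow = 3,
                  dimnames = list(c("CD3", "CD4", "CD19"), c("cellA", "cellB")))
clr.toy <- clr_normalize(adt.toy, margin = 2)
clr.toy
```

Zero counts stay at zero after the transform, and a cell whose antibodies all have the same count gets the same value for every antibody.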

# Now, we will visualize CD19 levels for RNA and protein. By setting the default assay, we can
# visualize one or the other
DefaultAssay(cbmc) <- "ADT"
p1 <- FeaturePlot(cbmc, "CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
DefaultAssay(cbmc) <- "RNA"
p2 <- FeaturePlot(cbmc, "CD19") + ggtitle("CD19 RNA")

# place plots side-by-side
p1 | p2
[Figure: CD19 protein (left) and CD19 RNA (right) on the UMAP]

Alternatively, we can use assay-specific keys to specify a modality. First, identify the keys of the RNA and protein assays:

# Alternately, we can use specific assay keys to specify a specific modality. Identify the key for
# the RNA and protein assays
Key(cbmc[["RNA"]])
[1] "rna_"

Key(cbmc[["ADT"]])
[1] "adt_"

Now we can include the key in the feature name to choose the modality for plotting:

# Now, we can include the key in the feature name, which overrides the default assay
p1 <- FeaturePlot(cbmc, "adt_CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
p2 <- FeaturePlot(cbmc, "rna_CD19") + ggtitle("CD19 RNA")
p1 | p2
[Figure: adt_CD19 (left) and rna_CD19 (right) on the UMAP]
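Conceptually, a key-prefixed name such as "adt_CD19" is split into an assay and a feature name, and names without a recognized key fall back to the default assay. The sketch below only illustrates that idea with a hypothetical helper resolve_feature(); it is not how Seurat implements the lookup internally.

```r
# Hypothetical helper, for illustration only: route a "key_feature" name such as
# "adt_CD19" to an assay, falling back to the default assay when no key matches.
resolve_feature <- function(feature, keys, default.assay) {
  for (assay in names(keys)) {
    key <- keys[[assay]]
    if (startsWith(feature, key)) {
      # drop the key prefix to recover the plain feature name
      return(list(assay = assay, feature = substring(feature, nchar(key) + 1)))
    }
  }
  list(assay = default.assay, feature = feature)
}

# Keys as reported by Key() for the two assays in this tutorial
keys <- c(RNA = "rna_", ADT = "adt_")
str(resolve_feature("adt_CD19", keys, default.assay = "RNA"))
str(resolve_feature("CD19", keys, default.assay = "ADT"))
```

Because the prefix takes priority over the default assay, "adt_CD19" resolves to the ADT assay even when RNA is the default, which matches the behavior described in the code comment above.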

5. Identifying cell surface markers for scRNA-seq clusters

We can use the paired CITE-seq measurements to help annotate the clusters derived from scRNA-seq, and to identify both protein and RNA markers.

# as we know that CD19 is a B cell marker, we can identify cluster 5 as expressing CD19 on the surface
VlnPlot(cbmc, "adt_CD19")
[Figure: violin plot of adt_CD19 per cluster]

We can also identify candidate protein and RNA markers for a cluster through differential expression:

adt_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "ADT")
rna_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "RNA")

head(adt_markers)
               p_val  avg_logFC pct.1 pct.2     p_val_adj
CD19   7.164366e-218  2.0582747     1     1 9.313675e-217
CD45RA 7.330397e-110  0.8163007     1     1 9.529515e-109
CD4    1.736704e-108 -1.1652553     1     1 2.257715e-107
CD14   9.016660e-106 -0.7940111     1     1 1.172166e-104
CD3     9.578480e-89 -1.1131282     1     1  1.245202e-87
CD8     1.218701e-18 -0.8828616     1     1  1.584311e-17

head(rna_markers)
      p_val avg_logFC pct.1 pct.2 p_val_adj
IGHM      0  3.260649 0.971 0.044         0
TCL1A     0  2.917187 0.902 0.028         0
CD79A     0  2.888065 0.957 0.045         0
CD79B     0  2.615201 0.945 0.089         0
IGLC2     0  2.591397 0.286 0.005         0
MS4A1     0  2.493754 0.850 0.016         0
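To see roughly what the columns of these tables mean, here is a simplified per-feature comparison of one cluster against all other cells. It is a sketch under assumptions, not Seurat's implementation: markers_sketch() is hypothetical, it uses base R's wilcox.test() (FindMarkers defaults to a Wilcoxon rank-sum test), an assumed natural-log fold change formula for avg_logFC, pct.1/pct.2 as the fraction of cells with a value above zero, and a Bonferroni adjustment over all tested features. The Bonferroni factor is consistent with the ADT table above, where each p_val_adj equals 13 times p_val, i.e. the number of ADT features.

```r
# Simplified, illustrative cluster-vs-rest marker test (not Seurat's implementation).
# data: features x cells matrix of normalized values; in.cluster: logical vector over cells.
markers_sketch <- function(data, in.cluster) {
  stopifnot(is.matrix(data), length(in.cluster) == ncol(data))
  stats <- t(apply(data, 1, function(x) {
    x1 <- x[in.cluster]
    x2 <- x[!in.cluster]
    c(
      # Wilcoxon rank-sum test with the normal approximation (handles ties)
      p_val = wilcox.test(x1, x2, exact = FALSE)$p.value,
      # assumed formula: natural-log ratio of mean expression on the un-logged scale
      avg_logFC = log(mean(expm1(x1)) + 1) - log(mean(expm1(x2)) + 1),
      # fraction of cells with a value above zero in each group
      pct.1 = mean(x1 > 0),
      pct.2 = mean(x2 > 0)
    )
  }))
  out <- as.data.frame(stats)
  # Bonferroni correction over all tested features, capped at 1
  out$p_val_adj <- pmin(out$p_val * nrow(data), 1)
  out[order(out$p_val), ]
}

# Toy data: two features measured in six cells, the first three belonging to the cluster
toy <- rbind(geneUp   = c(3, 4, 5, 0, 0, 1),
             geneFlat = c(1, 2, 1, 2, 1, 2))
markers <- markers_sketch(toy, in.cluster = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
markers
```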

6. Additional visualizations of multimodal data

Draw ADT scatter plots (like the biaxial plots used in FACS). Note that you can even "gate" cells if desired by using HoverLocator and FeatureLocator.

FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3")
[Figure: scatter plot of adt_CD19 vs adt_CD3]
# view relationship between protein and RNA
FeatureScatter(cbmc, feature1 = "adt_CD3", feature2 = "rna_CD3E")
[Figure: scatter plot of adt_CD3 vs rna_CD3E]
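Gating amounts to selecting the cells that fall inside a region of such a scatter plot. As a non-interactive alternative, a rectangular gate can be written as thresholds on two features. A minimal sketch on made-up values: gate_cells() is a hypothetical helper and the thresholds are arbitrary. In practice the per-cell values could be pulled from the object with FetchData() (e.g. FetchData(cbmc, vars = c("adt_CD19", "adt_CD3"))) and the selected barcodes passed to subset(cbmc, cells = ...).

```r
# Toy normalized ADT values for four cells (made-up numbers, for illustration only)
adt.cd19 <- c(cell1 = 2.5, cell2 = 0.1, cell3 = 1.8, cell4 = 0.2)
adt.cd3  <- c(cell1 = 0.3, cell2 = 2.2, cell3 = 0.4, cell4 = 0.1)

# Hypothetical helper: return the barcodes of cells inside a rectangular gate
# (x above x.min and y below y.max), like drawing a gate on a biaxial plot.
gate_cells <- function(x, y, x.min, y.max) {
  # both vectors must list the same cells in the same order
  stopifnot(identical(names(x), names(y)))
  names(x)[x > x.min & y < y.max]
}

# CD19+ CD3- gate; the thresholds of 1 are arbitrary choices for this toy data
cd19.pos.cd3.neg <- gate_cells(adt.cd19, adt.cd3, x.min = 1, y.max = 1)
cd19.pos.cd3.neg
```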

Raw protein counts are much higher than raw RNA counts:

# Let's look at the raw (non-normalized) ADT counts. You can see the values are quite high, particularly in comparison to RNA values. This is due to the significantly higher protein copy number in cells, which significantly reduces 'drop-out' in ADT data

FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8", slot = "counts")
[Figure: scatter plot of raw adt_CD4 vs adt_CD8 counts]
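The comment above attributes the high ADT counts to reduced "drop-out". One simple way to compare this between your own matrices is the per-cell fraction of zero entries. A minimal sketch: zero_fraction() is a hypothetical helper and the toy matrices are made up. Written with colSums(counts > 0), it should also work on sparse dgCMatrix inputs such as those produced by as.sparse() above once library(Matrix) is attached, since the Matrix package provides colSums() methods for them.

```r
# Hypothetical helper: per-cell fraction of zero entries ("drop-out") in a
# features x cells count matrix.
zero_fraction <- function(counts) {
  1 - colSums(counts > 0) / nrow(counts)
}

# Made-up toy matrices (3 features x 2 cells), only to show the computation
rna.raw.toy <- matrix(c(0, 0, 3,
                        1, 0, 0), nrow = 3, dimnames = list(NULL, c("c1", "c2")))
adt.raw.toy <- matrix(c(25, 40, 310,
                        12, 80, 95), nrow = 3, dimnames = list(NULL, c("c1", "c2")))
zero_fraction(rna.raw.toy)
zero_fraction(adt.raw.toy)
```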

7. Loading data from 10X multimodal experiments

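Seurat can also analyze data from multimodal 10X experiments processed with CellRanger v3. As an example, we recreate the plots above using a dataset of 7,900 peripheral blood mononuclear cells (PBMCs) made freely available by 10X Genomics.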

Data download link: https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_protein_v3/pbmc_10k_protein_v3_filtered_feature_bc_matrix.tar.gz

pbmc10k.data <- Read10X(data.dir = "../data/pbmc10k/filtered_feature_bc_matrix/")
rownames(x = pbmc10k.data[["Antibody Capture"]]) <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", x = rownames(x = pbmc10k.data[["Antibody Capture"]]))

pbmc10k <- CreateSeuratObject(counts = pbmc10k.data[["Gene Expression"]], min.cells = 3, min.features = 200)
pbmc10k <- NormalizeData(pbmc10k)
pbmc10k[["ADT"]] <- CreateAssayObject(pbmc10k.data[["Antibody Capture"]][, colnames(x = pbmc10k)])
pbmc10k <- NormalizeData(pbmc10k, assay = "ADT", normalization.method = "CLR")

plot1 <- FeatureScatter(pbmc10k, feature1 = "adt_CD19", feature2 = "adt_CD3", pt.size = 1)
plot2 <- FeatureScatter(pbmc10k, feature1 = "adt_CD4", feature2 = "adt_CD8a", pt.size = 1)
plot3 <- FeatureScatter(pbmc10k, feature1 = "adt_CD3", feature2 = "CD3E", pt.size = 1)

(plot1 + plot2 + plot3) & NoLegend()
[Figure: scatter plots of adt_CD19 vs adt_CD3, adt_CD4 vs adt_CD8a, and adt_CD3 vs CD3E]
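The gsub() call that strips the suffixes from the antibody names in the 10X example is worth a closer look: in regular expressions, [control_]* is a character class, so it matches any run of the characters c, o, n, t, r, l and _, not specifically the word "control_". The sketch below compares it with a stricter pattern on antibody names of the same form; the names are illustrative, and CD3_lot_TotalSeqB is a made-up name included only to show where the two patterns differ.

```r
# Illustrative antibody names of the "<marker>_TotalSeqB" / "<isotype>_control_TotalSeqB" form;
# "CD3_lot_TotalSeqB" is hypothetical and only shows where the patterns differ.
adt.names <- c("CD3_TotalSeqB", "CD8a_TotalSeqB", "IgG1_control_TotalSeqB", "CD3_lot_TotalSeqB")

# Pattern used in the code above: the character class also swallows "_lot_"
loose <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", x = adt.names)

# Stricter alternative: only the literal, optional "control_" group before the suffix
strict <- sub(pattern = "_(control_)?TotalSeqB$", replacement = "", x = adt.names)

loose
strict
```

For the names that actually follow the expected form, both patterns give the same result, so the original code works; the stricter pattern just fails more visibly on unexpected names.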

8. Additional features for multimodal data in Seurat

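Seurat v4 also includes additional functionality for the analysis, visualization, and integration of multimodal datasets. For more information, see the following resources: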

  • Defining cellular identity from multimodal data using WNN analysis in Seurat v4 [vignette]
  • Mapping scRNA-seq data onto CITE-seq references [vignette]
  • Introduction to the analysis of spatial transcriptomics data [vignette]
  • Analysis of 10x multiome (paired scRNA-seq + ATAC) using WNN analysis [vignette]
  • Signac: Analysis, interpretation, and exploration of single-cell chromatin datasets [package]
  • Mixscape: an analytical toolkit for pooled single-cell genetic screens [vignette]