My old account was banned for no apparent reason, so I'm reposting this from a secondary account.
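More spatial transcriptomics articles:
1. New 10X Visium
- [10X Spatial Transcriptomics Visium] (1) Space Ranger 1.0.0 (updated 20191205)
- [10X Spatial Transcriptomics Visium] (2) Loupe Browser 4.0.0
- [10X Spatial Transcriptomics Visium] (3) Notes on running the full Visium workflow
- [10X Spatial Transcriptomics Visium] (4) Exploratory R code examples for downstream analysis
- [10X Spatial Transcriptomics Visium] (5) Visium principles, workflow and products
- [10X Spatial Transcriptomics Visium] (6) Hands-on code for analyzing Visium results with the new Seurat v3.2
- [10X Spatial Transcriptomics Visium] (7) Thoughts on the answers the Seurat v3.2 authors gave on GitHub
2. Legacy Spatial
- [Legacy Spatial Transcriptomics] (1) ST Spot Detector user guide
- [Legacy Spatial Transcriptomics] (2) Notes from a trial run of the workflow
- [Legacy Spatial Transcriptomics] (3) Hands-on notes for ST Spot Detector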
Download the dataset
https://support.10xgenomics.com/spatial-gene-expression/datasets
I chose: Mouse Brain Section (Coronal)
$ tar -xvf V1_Adult_Mouse_Brain_fastqs.tar
$ ls
V1_Adult_Mouse_Brain_S5_L001_I1_001.fastq.gz V1_Adult_Mouse_Brain_S5_L001_R2_001.fastq.gz V1_Adult_Mouse_Brain_S5_L002_R1_001.fastq.gz
V1_Adult_Mouse_Brain_S5_L001_I2_001.fastq.gz V1_Adult_Mouse_Brain_S5_L002_I1_001.fastq.gz V1_Adult_Mouse_Brain_S5_L002_R2_001.fastq.gz
V1_Adult_Mouse_Brain_S5_L001_R1_001.fastq.gz V1_Adult_Mouse_Brain_S5_L002_I2_001.fastq.gz
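- All of these files hold sequencing data from the same sample, spread over 2 lanes
- The library is dual-indexed, so each lane has 4 FASTQ files: I1, I2, R1 and R2
So there are 8 FASTQ files in total.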
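To double-check the lane/read breakdown, here is a minimal Python sketch that groups the FASTQ names by lane. It assumes the standard Illumina bcl2fastq naming convention `{sample}_S{n}_L{lane}_{read}_001.fastq.gz`. The regex and the `group_by_lane` helper are illustrative and are not part of Space Ranger.

```python
import re
from collections import defaultdict

# Assumed Illumina bcl2fastq naming: {sample}_S{n}_L{lane}_{read}_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_(?P<read>[IR][12])_001\.fastq\.gz$"
)

def group_by_lane(filenames):
    """Return {lane: sorted read types} for files that match the naming pattern."""
    lanes = defaultdict(list)
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m:  # files that do not follow the convention are skipped
            lanes[int(m.group("lane"))].append(m.group("read"))
    return {lane: sorted(reads) for lane, reads in lanes.items()}

files = [
    f"V1_Adult_Mouse_Brain_S5_L00{lane}_{read}_001.fastq.gz"
    for lane in (1, 2)
    for read in ("I1", "I2", "R1", "R2")
]
print(group_by_lane(files))
# {1: ['I1', 'I2', 'R1', 'R2'], 2: ['I1', 'I2', 'R1', 'R2']}
```

The `sample` group in the regex (here `V1_Adult_Mouse_Brain`) is the prefix that `spaceranger count --sample` expects.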
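Run spaceranger count
Here I went with automatic alignment.
Because the server has no internet access, I downloaded the slide file manually. The options are documented at: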
https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/count
$ spaceranger count --id=V1_Adult_Mouse_Brain \
--transcriptome=/share/nas1/Data/luohb/Visium/reference/refdata-cellranger-mm10-3.0.0/ \
--fastqs=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_fastqs \
--sample=V1_Adult_Mouse_Brain \
--image=/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain_image.tif \
--slide=V19L01-041 \
--area=C1 \
--slidefile=/share/nas1/Data/luohb/Visium/test2/V19L01-041.gpr \
--localcores=32 \
--localmem=128
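It finished without errors, but the server was running several other large jobs at the same time, so it took nearly 13 hours...
Inspect the output files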
$ ls
_cmdline _finalstate _jobmode _mrosource _perf _sitecheck _tags _uuid _vdrkill
_filelist _invocation _log outs _perf._truncated_ SPATIAL_RNA_COUNTER_CS _timestamp V1_Adult_Mouse_Brain.mri.tgz _versions
$ cd outs/
$ ls
analysis filtered_feature_bc_matrix metrics_summary.csv possorted_genome_bam.bam raw_feature_bc_matrix spatial
cloupe.cloupe filtered_feature_bc_matrix.h5 molecule_info.h5 possorted_genome_bam.bam.bai raw_feature_bc_matrix.h5 web_summary.html
- View web_summary.html
- Inspect the CSV files that the count pipeline writes with the results of its automated secondary analysis:
$cd analysis/
$ls
clustering diffexp pca tsne umap
1. PCA dimensionality reduction results:
$ cd pca/10_components
$ls
components.csv dispersion.csv features_selected.csv projection.csv variance.csv
Projection (coordinates of each spot on the first 10 PCs):
$head -3 projection.csv
Barcode,PC-1,PC-2,PC-3,PC-4,PC-5,PC-6,PC-7,PC-8,PC-9,PC-10
AAACAAGTATCTCCCA-1,-10.281241313083257,-24.67223115562252,-0.19850052930601336,-2.1734929997144388,6.630976878797487,-0.12128746693282366,6.040708059434257,4.657495740394594,16.344239212184327,6.523601903899456
AAACAATCTACTAGCA-1,17.830458684877186,-27.53526668134934,15.877302377060623,9.74572143694312,-0.7208195934715782,-4.339470398396214,2.5444608437485288,-5.084679351848514,2.9247276185469495,-1.0731021612191327
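For anyone who wants to load these coordinates without pandas, here is a minimal sketch using only the standard library. It assumes that projection.csv always has a `Barcode` column followed by one numeric column per component, as in the head output above. The toy input reuses the two rows shown, trimmed to three PCs.

```python
import csv
import io

def read_projection(handle):
    """Parse a projection.csv into (axis names, {barcode: [coordinates]})."""
    reader = csv.reader(handle)
    header = next(reader)  # "Barcode" followed by one column per component
    coords = {row[0]: [float(x) for x in row[1:]] for row in reader if row}
    return header[1:], coords

# Toy input: the two rows above, trimmed to the first three PCs
TOY = (
    "Barcode,PC-1,PC-2,PC-3\n"
    "AAACAAGTATCTCCCA-1,-10.281241313083257,-24.67223115562252,-0.19850052930601336\n"
    "AAACAATCTACTAGCA-1,17.830458684877186,-27.53526668134934,15.877302377060623\n"
)
axes, coords = read_projection(io.StringIO(TOY))
print(axes)         # ['PC-1', 'PC-2', 'PC-3']
print(len(coords))  # 2
```

On the real file, pass `open("projection.csv", newline="")` instead of the StringIO. The same function should also read tsne/2_components/projection.csv, whose axes are TSNE-1 and TSNE-2.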
Components matrix (loading of each feature on each PC):
$less -S components.csv
PC,ENSMUSG00000051951,ENSMUSG00000089699,ENSMUSG00000025900,ENSMUSG00000025902,ENSMUSG00000033845,ENSMUSG00000025903,ENSMUSG00000104217,ENSMUSG00000033813,(truncated...)
1,9.807402710059275e-05,-0.0007359419037463138,0.0018506647696503106,0.0019216677830155664,-0.009477278899046813,-0.005003056852125207,0.0,-0.008498306263180
2,-0.0013017257339919546,0.0015759310908915448,0.0013809836795030965,0.0009513422156874659,0.007418499981929492,0.003222355732773671,0.0,0.00887178686827463,
3,-0.001920230193482586,0.003378841598139873,-0.00012165106820253075,-0.00024897415838216264,-0.0031447165300072175,-0.007787586978438225,0.0,-0.003148852394
(truncated...)
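components.csv has one row per PC and one column per feature ID. A natural question is which genes drive a given PC. The sketch below ranks features by absolute loading. The toy input is the first four feature columns of PC 1 and PC 2 from the output above, because the real rows were cut off by `less -S`.

```python
import csv
import io

def top_loadings(handle, pc, n=3):
    """Return the n (feature, loading) pairs with the largest |loading| on PC `pc`."""
    reader = csv.reader(handle)
    features = next(reader)[1:]  # header: PC,<feature id 1>,<feature id 2>,...
    for row in reader:
        if row and int(row[0]) == pc:
            loadings = [float(x) for x in row[1:]]
            ranked = sorted(zip(features, loadings), key=lambda fl: abs(fl[1]), reverse=True)
            return ranked[:n]
    raise ValueError(f"PC {pc} not found")

# Toy input: first four feature columns of PC 1 and PC 2 from above
TOY = (
    "PC,ENSMUSG00000051951,ENSMUSG00000089699,ENSMUSG00000025900,ENSMUSG00000025902\n"
    "1,9.807402710059275e-05,-0.0007359419037463138,0.0018506647696503106,0.0019216677830155664\n"
    "2,-0.0013017257339919546,0.0015759310908915448,0.0013809836795030965,0.0009513422156874659\n"
)
print([f for f, _ in top_loadings(io.StringIO(TOY), pc=1, n=2)])
# ['ENSMUSG00000025902', 'ENSMUSG00000025900']
```
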
Proportion of total variance explained by each PC:
$head -3 variance.csv
PC,Proportion.Variance.Explained
1,0.030645967432188836
2,0.015067575203691749
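Summing these proportions shows how much variance the first k PCs capture together. With the two values shown, PC1 and PC2 explain only about 4.6% between them. Here is a minimal sketch that computes the running total, assuming the column names shown in the head output.

```python
import csv
import io
from itertools import accumulate

def cumulative_variance(handle):
    """Return [(pc, cumulative proportion of variance explained)] from variance.csv."""
    rows = [(int(r["PC"]), float(r["Proportion.Variance.Explained"]))
            for r in csv.DictReader(handle)]
    return list(zip((pc for pc, _ in rows), accumulate(p for _, p in rows)))

TOY = "PC,Proportion.Variance.Explained\n1,0.030645967432188836\n2,0.015067575203691749\n"
for pc, cum in cumulative_variance(io.StringIO(TOY)):
    print(f"PC1-PC{pc}: {cum:.4f}")
# PC1-PC1: 0.0306
# PC1-PC2: 0.0457
```
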
Normalized dispersion of each feature:
$head -3 dispersion.csv
Feature,Normalized.Dispersion
ENSMUSG00000051951,0.261762717719762
ENSMUSG00000089699,-1.5988672040435437
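Ranking features by normalized dispersion is a quick way to eyeball the most variable genes. The sketch below does that with the standard library. It drops non-finite values in case some features have an undefined dispersion. That filter is a defensive assumption; I have not checked whether the file contains any.

```python
import csv
import io
import math

def top_dispersed(handle, n=10):
    """Return the n (feature, dispersion) pairs with the highest finite normalized dispersion."""
    rows = []
    for r in csv.DictReader(handle):
        d = float(r["Normalized.Dispersion"])  # float() also accepts "nan"
        if math.isfinite(d):
            rows.append((r["Feature"], d))
    return sorted(rows, key=lambda fd: fd[1], reverse=True)[:n]

TOY = (
    "Feature,Normalized.Dispersion\n"
    "ENSMUSG00000051951,0.261762717719762\n"
    "ENSMUSG00000089699,-1.5988672040435437\n"
)
print(top_dispersed(io.StringIO(TOY), n=1)[0][0])  # ENSMUSG00000051951
```
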
2. t-SNE results:
$cd ../../tsne/2_components/
$ls
projection.csv
$head -5 projection.csv
Barcode,TSNE-1,TSNE-2
AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881
AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484
AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413
AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808
3. Clustering results:
$cd ../../clustering/
$ls
graphclust kmeans_2_clusters kmeans_4_clusters kmeans_6_clusters kmeans_8_clusters
kmeans_10_clusters kmeans_3_clusters kmeans_5_clusters kmeans_7_clusters kmeans_9_clusters
For each clustering, spaceranger assigns every spot to a cluster (cluster assignments).
Let's open the k-means result with 3 clusters:
$cd kmeans_3_clusters
$ls
clusters.csv
$head -5 clusters.csv
Barcode,Cluster
AAACAAGTATCTCCCA-1,1
AAACAATCTACTAGCA-1,3
AAACACCAATAACTGC-1,2
AAACAGAGCGACTCCT-1,1
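The first four barcodes in the t-SNE and k-means head outputs are the same. That makes it easy to join the two files, for example to compute where each cluster sits in t-SNE space. Here is a minimal standard-library sketch of that join. Averaging coordinates per cluster is just an illustration and is not something spaceranger reports.

```python
import csv
import io
from collections import defaultdict

def cluster_centroids(tsne_handle, clusters_handle):
    """Average the t-SNE coordinates of the spots in each cluster."""
    coords = {r["Barcode"]: (float(r["TSNE-1"]), float(r["TSNE-2"]))
              for r in csv.DictReader(tsne_handle)}
    members = defaultdict(list)
    for r in csv.DictReader(clusters_handle):
        if r["Barcode"] in coords:  # keep only barcodes present in both files
            members[int(r["Cluster"])].append(coords[r["Barcode"]])
    return {c: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
            for c, pts in sorted(members.items())}

tsne = io.StringIO(
    "Barcode,TSNE-1,TSNE-2\n"
    "AAACAAGTATCTCCCA-1,-18.47081216664088,7.240054873818881\n"
    "AAACAATCTACTAGCA-1,-4.219964329936257,-9.182632464702484\n"
    "AAACACCAATAACTGC-1,14.744060324279337,13.360913482080413\n"
    "AAACAGAGCGACTCCT-1,-11.72411901642397,-7.924228663324808\n"
)
clusters = io.StringIO(
    "Barcode,Cluster\n"
    "AAACAAGTATCTCCCA-1,1\n"
    "AAACAATCTACTAGCA-1,3\n"
    "AAACACCAATAACTGC-1,2\n"
    "AAACAGAGCGACTCCT-1,1\n"
)
centroids = cluster_centroids(tsne, clusters)
for cluster, (x, y) in centroids.items():
    print(cluster, round(x, 2), round(y, 2))
```
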
4. Differential expression analysis:
$cd ../../diffexp/
$ls
graphclust kmeans_2_clusters kmeans_4_clusters kmeans_6_clusters kmeans_8_clusters
kmeans_10_clusters kmeans_3_clusters kmeans_5_clusters kmeans_7_clusters kmeans_9_clusters
This time let's look at the combined table for the graph-based clustering:
$ cd graphclust
$ls
differential_expression.csv
$head -3 differential_expression.csv
Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,Cluster 2 Adjusted p value,Cluster 3 Mean Counts,Cluster 3 Log2 fold change,Cluster 3 Adjusted p value,Cluster 4 Mean Counts,Cluster 4 Log2 fold change,Cluster 4 Adjusted p value,Cluster 5 Mean Counts,Cluster 5 Log2 fold change,Cluster 5 Adjusted p value,Cluster 6 Mean Counts,Cluster 6 Log2 fold change,Cluster 6 Adjusted p value,Cluster 7 Mean Counts,Cluster 7 Log2 fold change,Cluster 7 Adjusted p value,Cluster 8 Mean Counts,Cluster 8 Log2 fold change,Cluster 8 Adjusted p value,Cluster 9 Mean Counts,Cluster 9 Log2 fold change,Cluster 9 Adjusted p value
ENSMUSG00000051951,Xkr4,0.09115907843838432,0.15688013442205495,0.9130108472807676,0.08789156406190936,0.094226986457139,1.0,0.059424476860418934,-0.5579910544947899,0.4792687534164091,0.09747791035014447,0.270272692975412,0.7950049780312995,0.08717356987748102,0.14776402072440886,1.0,0.05406634025868632,-0.6310298603360582,0.7980928917515894,0.15030400022885756,0.9570457266970553,0.22931236900985477,0.0606581027791399,-0.4319057525382224,1.0,0.10761817731957228,0.4400508833584902,1.0
ENSMUSG00000089699,Gm1992,0.0016574377897888059,1.3866145310996707,0.8220253607506287,0.0,0.423008752385563,1.0,0.0,0.22991150489664136,1.0,0.0033613072534532575,2.5793194965660433,0.5338242296758853,0.0,2.3542148981918345,1.0,0.003180372956393313,2.490599584065473,0.8676482778053517,0.0,1.5959470345290159,1.0,0.0,1.4568374963600368,1.0,0.0,2.146642828481177,1.0
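The table is wide, with three columns per cluster. Pulling out candidate markers for one cluster is mostly a matter of picking the right pair of columns. The sketch below assumes the column naming shown in the header above ("Cluster N Log2 fold change", "Cluster N Adjusted p value"). The thresholds are arbitrary defaults that I chose for illustration. The toy input is the two rows above, rounded and trimmed to clusters 1 and 2.

```python
import csv
import io

def marker_genes(handle, cluster, min_log2fc=1.0, max_padj=0.05):
    """Return (gene, log2FC, adj. p) for genes up-regulated in `cluster`, strongest first."""
    lfc_col = f"Cluster {cluster} Log2 fold change"
    p_col = f"Cluster {cluster} Adjusted p value"
    hits = []
    for row in csv.DictReader(handle):
        lfc, padj = float(row[lfc_col]), float(row[p_col])
        if lfc >= min_log2fc and padj <= max_padj:
            hits.append((row["Feature Name"], lfc, padj))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy input: the two rows above, rounded and trimmed to clusters 1 and 2
TOY = (
    "Feature ID,Feature Name,Cluster 1 Mean Counts,Cluster 1 Log2 fold change,"
    "Cluster 1 Adjusted p value,Cluster 2 Mean Counts,Cluster 2 Log2 fold change,"
    "Cluster 2 Adjusted p value\n"
    "ENSMUSG00000051951,Xkr4,0.0911,0.1569,0.9130,0.0879,0.0942,1.0\n"
    "ENSMUSG00000089699,Gm1992,0.0017,1.3866,0.8220,0.0,0.4230,1.0\n"
)
print(marker_genes(io.StringIO(TOY), cluster=1))                # []
print(marker_genes(io.StringIO(TOY), cluster=1, max_padj=1.0))  # [('Gm1992', 1.3866, 0.822)]
```

Neither gene shown is significant at the default cutoff. That is expected for these two low-count genes.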
5. Feature-Barcode Matrices
Each element of the matrix is the number of UMIs associated with a feature (row) and a barcode (column).
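matrix.mtx.gz is a sparse matrix in Matrix Market coordinate format. After the header and comment lines (starting with `%`) there is a "rows cols entries" line, followed by one "row col value" triple per non-zero element, with 1-based indices. `scipy.io.mmread` parses this for you. The tiny hand-written example below only shows the layout by summing UMIs per barcode. Its toy matrix is made up.

```python
def umis_per_barcode(lines):
    """Sum the UMI counts in each column (barcode) of a coordinate Matrix Market file."""
    it = (ln for ln in lines if not ln.startswith("%"))  # skip header/comment lines
    _n_rows, n_cols, _n_entries = map(int, next(it).split())
    totals = [0] * n_cols
    for ln in it:
        if not ln.strip():
            continue
        _row, col, value = ln.split()
        totals[int(col) - 1] += int(value)  # Matrix Market indices are 1-based
    return totals

# Made-up toy matrix: 3 features x 2 barcodes, 4 non-zero entries
TOY_MTX = """%%MatrixMarket matrix coordinate integer general
% toy example
3 2 4
1 1 5
3 1 2
2 2 7
3 2 1
""".splitlines()
print(umis_per_barcode(TOY_MTX))  # [7, 8]
```

For the real file, the lines can come from `gzip.open(path, "rt")`.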
$cd /share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs
$ls
analysis filtered_feature_bc_matrix metrics_summary.csv possorted_genome_bam.bam raw_feature_bc_matrix spatial
cloupe.cloupe filtered_feature_bc_matrix.h5 molecule_info.h5 possorted_genome_bam.bam.bai raw_feature_bc_matrix.h5 web_summary.html
$tree filtered_feature_bc_matrix
filtered_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
0 directories, 3 files
$tree raw_feature_bc_matrix
raw_feature_bc_matrix
├── barcodes.tsv.gz
├── features.tsv.gz
└── matrix.mtx.gz
0 directories, 3 files
$gzip -cd filtered_feature_bc_matrix/features.tsv.gz |head -3
ENSMUSG00000051951 Xkr4 Gene Expression
ENSMUSG00000089699 Gm1992 Gene Expression
ENSMUSG00000102343 Gm37381 Gene Expression
The columns are:
- Column 1: feature ID
- Column 2: gene name
- Column 3: feature type
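Gene names are convenient for plotting, but the feature ID is the unique key. As far as I know, gene symbols in a reference are not guaranteed to be unique. The sketch below builds a name-to-IDs lookup that keeps all IDs for a name and also counts the feature types. It uses the three lines shown above as input.

```python
from collections import Counter, defaultdict

def index_features(lines):
    """Build gene name -> [feature IDs] and count feature types from features.tsv lines."""
    name_to_ids = defaultdict(list)
    types = Counter()
    for ln in lines:
        feature_id, gene_name, feature_type = ln.rstrip("\n").split("\t")[:3]
        name_to_ids[gene_name].append(feature_id)  # keep every ID if a name repeats
        types[feature_type] += 1
    return dict(name_to_ids), types

TOY = [
    "ENSMUSG00000051951\tXkr4\tGene Expression",
    "ENSMUSG00000089699\tGm1992\tGene Expression",
    "ENSMUSG00000102343\tGm37381\tGene Expression",
]
name_to_ids, types = index_features(TOY)
print(name_to_ids["Xkr4"])  # ['ENSMUSG00000051951']
print(types)                # Counter({'Gene Expression': 3})
```
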
Loading the matrix into R
library(Matrix)
matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix/"
barcode.path <- paste0(matrix_dir, "barcodes.tsv.gz")
features.path <- paste0(matrix_dir, "features.tsv.gz")
matrix.path <- paste0(matrix_dir, "matrix.mtx.gz")
mat <- readMM(file = matrix.path)
feature.names = read.delim(features.path,
header = FALSE,
stringsAsFactors = FALSE)
barcode.names = read.delim(barcode.path,
header = FALSE,
stringsAsFactors = FALSE)
colnames(mat) = barcode.names$V1
rownames(mat) = feature.names$V1
dim(mat)
[1] 31053 2698
Loading the matrix into Python
import csv
import gzip
import os
import scipy.io
matrix_dir = "/share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs/filtered_feature_bc_matrix"
# Sparse count matrix: rows are features, columns are barcodes (spots)
mat = scipy.io.mmread(os.path.join(matrix_dir, "matrix.mtx.gz"))
# Open the gzipped TSVs in text mode ("rt"): in Python 3, csv.reader needs str, not bytes
features_path = os.path.join(matrix_dir, "features.tsv.gz")
with gzip.open(features_path, "rt") as f:
    features = list(csv.reader(f, delimiter="\t"))
feature_ids = [row[0] for row in features]
gene_names = [row[1] for row in features]
feature_types = [row[2] for row in features]
barcodes_path = os.path.join(matrix_dir, "barcodes.tsv.gz")
with gzip.open(barcodes_path, "rt") as f:
    barcodes = [row[0] for row in csv.reader(f, delimiter="\t")]
6. Images
$cd spatial/
$ls
aligned_fiducials.jpg detected_tissue_image.jpg scalefactors_json.json tissue_hires_image.png tissue_lowres_image.png tissue_positions_list.csv
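tissue_hires_image.png: a higher-resolution brightfield image
tissue_lowres_image.png: a lower-resolution brightfield image
aligned_fiducials.jpg (same dimensions as tissue_hires_image.png): used to verify that fiducial alignment succeeded
The scale factors for converting between pixel coordinates: scalefactors_json.json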
$cat scalefactors_json.json
{"spot_diameter_fullres": 89.44476048022638, "tissue_hires_scalef": 0.17011142, "fiducial_diameter_fullres": 144.48769000651953, "tissue_lowres_scalef": 0.05
PS: this step is somewhat like the ST Spot Detector step in the legacy workflow.
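Where:
- tissue_hires_scalef: scale factor that converts pixel positions in the original full-resolution image to pixel positions in tissue_hires_image.png.
- tissue_lowres_scalef: scale factor that converts pixel positions in the original full-resolution image to pixel positions in tissue_lowres_image.png.
- fiducial_diameter_fullres: number of pixels spanning the diameter of a fiducial spot in the original full-resolution image.
- spot_diameter_fullres: number of pixels spanning the diameter of a tissue (capture) spot in the original full-resolution image.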
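Since the scale factors are plain multipliers, moving a spot from the full-resolution image onto tissue_hires_image.png is a single multiplication per coordinate. The same factor also converts the spot diameter. The sketch below uses the pixel pair of spot ACGCCTGACACGCGCT-1 from tissue_positions_list.csv (1252, 1211) and the hires factor printed above. It scales both numbers identically, so it does not depend on which one is the row and which is the column. I left out the lowres factor because the JSON output above is cut off.

```python
import json

def to_image_pixels(fullres_xy, scalefactors, image="hires"):
    """Scale a full-resolution pixel coordinate pair to tissue_<image>_image.png pixels."""
    s = scalefactors[f"tissue_{image}_scalef"]
    return tuple(v * s for v in fullres_xy)

# Subset of scalefactors_json.json (values copied from the output above)
sf = json.loads('{"spot_diameter_fullres": 89.44476048022638, '
                '"tissue_hires_scalef": 0.17011142}')
hires_xy = to_image_pixels((1252, 1211), sf)
spot_diameter_hires = sf["spot_diameter_fullres"] * sf["tissue_hires_scalef"]
print([round(v, 1) for v in hires_xy], round(spot_diameter_hires, 1))  # [213.0, 206.0] 15.2
```
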
detected_tissue_image.jpg:
tissue_positions_list.csv:
$head -2 tissue_positions_list.csv
ACGCCTGACACGCGCT-1,0,0,0,1252,1211
TACCGATCCAACACTT-1,0,1,1,1372,1280
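The columns are:
- barcode: the sequence of the barcode associated with the spot.
- in_tissue: binary, indicating whether the spot falls inside (1) or outside (0) the tissue.
- array_row: the row coordinate of the spot in the array, from 0 to 77. The array has 78 rows.
- array_col: the column coordinate of the spot in the array. To represent the orange crate arrangement of the spots, this index uses even numbers from 0 to 126 for even rows and odd numbers from 1 to 127 for odd rows. Note that each row (even or odd) has 64 spots.
- pxl_col_in_fullres: the column pixel coordinate of the spot center in the full-resolution image.
- pxl_row_in_fullres: the row pixel coordinate of the spot center in the full-resolution image.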
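The layout rules above can be checked mechanically. Row and column must have the same parity, and coordinates must stay within 0 to 77 and 0 to 127. With 78 rows × 64 spots per row, the full file should have 4992 spots. The sketch below parses the file with the standard library and raises on any violation. The column order is assumed to match the list above, and the file has no header row, as the head output shows.

```python
import csv
import io

# Column order assumed to match the list above; the file has no header row
COLUMNS = ["barcode", "in_tissue", "array_row", "array_col",
           "pxl_col_in_fullres", "pxl_row_in_fullres"]

def load_positions(handle):
    """Parse tissue_positions_list.csv rows and check the array layout rules."""
    spots = []
    for rec in csv.reader(handle):
        if not rec:
            continue
        spot = dict(zip(COLUMNS, rec))
        for key in ("in_tissue", "array_row", "array_col"):
            spot[key] = int(spot[key])
        for key in ("pxl_col_in_fullres", "pxl_row_in_fullres"):
            spot[key] = float(spot[key])
        if not (0 <= spot["array_row"] <= 77 and 0 <= spot["array_col"] <= 127):
            raise ValueError(f"{spot['barcode']}: array coordinates out of range")
        # Orange-crate layout: even rows use even columns, odd rows use odd columns
        if spot["array_col"] % 2 != spot["array_row"] % 2:
            raise ValueError(f"{spot['barcode']}: column parity does not match row")
        spots.append(spot)
    return spots

TOY = "ACGCCTGACACGCGCT-1,0,0,0,1252,1211\nTACCGATCCAACACTT-1,0,1,1,1372,1280\n"
spots = load_positions(io.StringIO(TOY))
print(len(spots), sum(s["in_tissue"] for s in spots))  # 2 0
```
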
7. BAM: Barcoded BAM
$ cd /share/nas1/Data/luohb/Visium/test2/V1_Adult_Mouse_Brain/outs
$samtools view possorted_genome_bam.bam |head -5
A00984:21:HMKLFDMXX:2:2117:10357:1235 16 1 3000100 255 25M199730N72M23S * 0 0 TTTTTTTTTTTTTTTTTTTTTTTTGCAAGAAAAAAAATCAGATAACCGAGGAAAATTATTCATTATGAAGTACTACTTTCCACTTCATTTCATCCCATGTACTCTGCGTTGATACCACTG F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF NH:i:1 HI:i:1 AS:i:83 nM:i:1 RE:A:I xf:i:0 ts:i:21 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:GACGACGATCCGCGTT CY:Z:FFFFFFFFFFFFFFFF CB:Z:GACGACGATCCGCGTT-1 UR:Z:CCTGTTTGTTGT UY:Z:FFFFFFFFFFFF UB:Z:CCTGTTTGTTGT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
A00984:21:HMKLFDMXX:1:1306:5041:10034 16 1 3000100 255 25M199611N95M * 0 0 TTTTTTTTTTTTTTTTTTTTTTTTGAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCA FFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:108 nM:i:0 RE:A:I xf:i:0 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TGGTCTGTTGGGCGTA CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGGTCTGTTGGGCGTA-1 UR:Z:GTTACCCTATGT UY:Z:FFFFFFFFFFFF UB:Z:GTTACCCTATGT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
A00984:21:HMKLFDMXX:2:2345:21206:5087 16 1 3010019 255 98M22S * 0 0 ATAGTGTCCCAGATTTCCTGGCTGTTTCTTGTTAGGATTTTTTTAGATTTAACATTTCTGTCATAGATTAATCTATTTTGCAGATGTAATCCCATGTACTCTGCGTTGATACCACTGCTT F:FFFFFFFFFFF::FFF:FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFF NH:i:1 HI:i:1 AS:i:90 nM:i:3 RE:A:I xf:i:0 ts:i:30 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:ACGGTCACCGAGACCCY:Z:FFFFFFFFFFFFF,F: CB:Z:ACGGTCACCGAGAACA-1 UR:Z:TCGATCTCGTAA UY:Z:FFFFFFFFFFFF UB:Z:TCGATCTCGTAA RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:2
A00984:21:HMKLFDMXX:1:1164:15980:17738 16 1 3013014 255 17M186702N103M * 0 0 TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:108 nM:i:0 RE:A:I xf:i:0 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFF,FFFFFF CR:Z:TCAAGGTTACTACACC CY:Z:FFFFFFFFFFF:FFFF CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT UY:Z:FFFFFFFFFFFF UB:Z:CCGGGCAGTTAT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
A00984:21:HMKLFDMXX:1:1451:3477:33912 16 1 3013014 255 17M186702N103M * 0 0 TTTTTTTTTTTTTTTGTTTAAAATGACCACAGTGTACTTTATTTAATGATTTTTGTACTTTGTGTTGCAATAAAATAAAAAAAAAATCTACAAAATTCAAATATATAAAATTTCAAGTTT FFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:108 nM:i:0 RE:A:I xf:i:0 li:i:0 BC:Z:ACCAGACAAC QT:Z:FFFFFFFFFF CR:Z:TCAAGGTTACTACACC CY:Z:FFFFFFFFFFF:F,FF CB:Z:TCAAGGTTACTACACC-1 UR:Z:CCGGGCAGTTAT UY:Z:FFFFFFFFFFFF UB:Z:CCGGGCAGTTAT RG:Z:V1_Adult_Mouse_Brain:0:1:HMKLFDMXX:1
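The official docs say the spot barcode in the CB tag carries a suffix made of a dash followed by a number (e.g. AGAATGGTCTGCAT-1).
That suffix is present in the records above: the CB tag reads CB:Z:GACGACGATCCGCGTT-1, while the raw barcode in the CR tag (CR:Z:GACGACGATCCGCGTT) has no suffix.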
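To pull the CB tag out of a record programmatically, note that the SAM format stores optional fields as TAG:TYPE:VALUE from the 12th tab-separated column onward. The samtools output above shows spaces because of copy/paste; the real output is tab-separated. Here is a minimal pure-Python parser, applied to a trimmed version of the first record, with its sequence, quality and most tags shortened.

```python
def sam_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields (column 12 onward) of a SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)  # values such as RG may contain ':'
        tags[tag] = int(value) if typ == "i" else value
    return tags

# Trimmed version of the first record above (sequence/quality shortened, most tags dropped)
record = "\t".join([
    "A00984:21:HMKLFDMXX:2:2117:10357:1235", "16", "1", "3000100", "255",
    "25M199730N72M23S", "*", "0", "0", "ACGT", "FFFF",
    "NH:i:1", "CR:Z:GACGACGATCCGCGTT", "CB:Z:GACGACGATCCGCGTT-1", "UB:Z:CCTGTTTGTTGT",
])
tags = sam_tags(record)
barcode, _, suffix = tags["CB"].rpartition("-")
print(tags["CR"], barcode, suffix)  # GACGACGATCCGCGTT GACGACGATCCGCGTT 1
```

For real BAM files, pysam's `AlignedSegment.get_tag("CB")` returns the same string without any manual parsing.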
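Downstream analysis in R
At the time of writing there was no ready-made R package for 10X Visium spatial transcriptomics, so I followed the R code on the official site.
Official site: https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/rkit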
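Downstream analysis with Loupe Browser 4.0.0
- Fetch cloupe.cloupe from the server with Xftp and open it
- View the t-SNE plot
- View the UMAP plot
- Feature Plot
The Feature Plot view lets you visualize the expression level of one or two genes at each spot. It makes it easy to threshold groups of spots based on the expression levels of one or two genes. Features (genes, in this case) can be entered in the text boxes at the top of the Y axis or to the right of the X axis. These selectors also contain a control for switching the axis scale between linear and log.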
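To make the two-gene thresholding idea concrete outside Loupe Browser, the sketch below splits spots into the four quadrants that a pair of thresholds defines. It is not how Loupe Browser implements the view. The gene counts and spot names are hypothetical toy values, and the thresholds are applied to raw counts on a linear scale.

```python
def threshold_spots(expr_a, expr_b, min_a, min_b):
    """Group spots by whether they reach the threshold for gene A and/or gene B."""
    groups = {"both": [], "a_only": [], "b_only": [], "neither": []}
    for barcode in expr_a.keys() & expr_b.keys():  # spots measured for both genes
        high_a = expr_a[barcode] >= min_a
        high_b = expr_b[barcode] >= min_b
        if high_a and high_b:
            groups["both"].append(barcode)
        elif high_a:
            groups["a_only"].append(barcode)
        elif high_b:
            groups["b_only"].append(barcode)
        else:
            groups["neither"].append(barcode)
    return {name: sorted(spots) for name, spots in groups.items()}

# Hypothetical UMI counts for two genes in four spots
gene_a = {"spot1": 5, "spot2": 0, "spot3": 3, "spot4": 0}
gene_b = {"spot1": 2, "spot2": 4, "spot3": 0, "spot4": 0}
print(threshold_spots(gene_a, gene_b, min_a=1, min_b=1))
# {'both': ['spot1'], 'a_only': ['spot3'], 'b_only': ['spot2'], 'neither': ['spot4']}
```
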