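More spatial transcriptomics articles:
1. New 10X Visium
- [10X Spatial Transcriptomics Visium] (1) Space Ranger 1.0.0 (updated 2019-12-05)
- [10X Spatial Transcriptomics Visium] (2) Loupe Browser 4.0.0
- [10X Spatial Transcriptomics Visium] (3) Notes on running the full Visium workflow end to end
- [10X Spatial Transcriptomics Visium] (4) Exploratory code examples for downstream analysis in R
- [10X Spatial Transcriptomics Visium] (5) Visium principles, workflow, and products
- [10X Spatial Transcriptomics Visium] (6) Hands-on code for analyzing Visium spatial transcriptomics results with the new Seurat v3.2
- [10X Spatial Transcriptomics Visium] (7) Reflections on the answers the authors of the new Seurat v3.2 gave on GitHub
2. Legacy Spatial
- [Legacy Spatial Transcriptomics] (1) ST Spot Detector user guide
- [Legacy Spatial Transcriptomics] (2) Notes on a trial run of the pipeline
- [Legacy Spatial Transcriptomics] (3) ST Spot Detector hands-on notes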
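I. Running st_pipeline
Figure: workflow overview
1.1 Required input files
- FASTQ files (Read 1 contains the spatial information and the UMI; Read 2 contains the genomic sequence)
- A genome index generated with STAR
- An annotation file in GTF or GFF3 format (optional when using a transcriptome)
- A file containing the barcodes and array coordinates (look in the "ids" folder and pick the correct one). Basically, this file has 3 columns (BARCODE, X and Y). It is also optional if the data is not barcoded (for example, RNA-Seq data).
- A name for the dataset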
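To make the barcode file format concrete, here is a minimal Python sketch that reads a 3-column file (BARCODE, X, Y) into a dictionary. It assumes whitespace-delimited columns, integer coordinates, and no header line; the function name `read_ids` and the two example rows are invented for illustration and do not come from the pipeline.

```python
import io


def read_ids(handle):
    """Map barcode -> (x, y) from a 3-column, whitespace-delimited ids file.

    Assumption: no header line and integer array coordinates.
    """
    barcodes = {}
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(fields)}"
            )
        barcode, x, y = fields
        barcodes[barcode] = (int(x), int(y))
    return barcodes


# Made-up rows standing in for a real file from the "ids" folder.
example = io.StringIO("ACGTACGTACGTACGTAC\t1\t1\nTTGCATGCATGCATGCAA\t1\t2\n")
ids = read_ids(example)
print(len(ids), "barcodes loaded")
```

With a real file, replace the `io.StringIO` object with `open(path)`.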
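The ST Pipeline has many parameters, mostly related to trimming, mapping and annotation, but the defaults are usually sufficient. Once the ST Pipeline is installed, type "st_pipeline_run.py --help" to see a full description of the parameters.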
(base) [Robin@SC-201910280935 pipl_test]$ st_pipeline_run.py --help
usage: st_pipeline_run.py [-h] [--ids [FILE]] --ref-map [FOLDER]
[--ref-annotation [FILE]] --expName [STRING]
[--allowed-missed [INT]] [--allowed-kmer [INT]]
[--overhang [INT]]
[--min-length-qual-trimming [INT]]
[--mapping-rv-trimming [INT]]
[--contaminant-index [FOLDER]] [--qual-64]
[--htseq-mode [STRING]] [--htseq-no-ambiguous]
[--start-id [INT]] [--no-clean-up] [--verbose]
[--mapping-threads [INT]]
[--min-quality-trimming [INT]] [--bin-path [FOLDER]]
[--log-file [STR]] [--output-folder [FOLDER]]
[--temp-folder [FOLDER]]
[--umi-allowed-mismatches [INT]]
[--umi-start-position [INT]]
[--umi-end-position [INT]] [--keep-discarded-files]
[--remove-polyA [INT]] [--remove-polyT [INT]]
[--remove-polyG [INT]] [--remove-polyC [INT]]
[--remove-polyN [INT]] [--filter-AT-content [INT%]]
[--filter-GC-content [INT%]] [--disable-multimap]
[--disable-clipping]
[--umi-cluster-algorithm [STRING]]
[--min-intron-size [INT]] [--max-intron-size [INT]]
[--umi-filter] [--umi-filter-template [STRING]]
[--compute-saturation]
[--saturation-points SATURATION_POINTS [SATURATION_POINTS ...]]
[--include-non-annotated]
[--inverse-mapping-rv-trimming [INT]]
[--two-pass-mode] [--strandness [STRING]]
[--umi-quality-bases [INT]]
[--umi-counting-offset [INT]]
[--demultiplexing-metric [STRING]]
[--demultiplexing-multiple-hits-keep-one]
[--demultiplexing-trim-sequences DEMULTIPLEXING_TRIM_SEQUENCES [DEMULTIPLEXING_TRIM_SEQUENCES ...]]
[--homopolymer-mismatches [INT]]
[--star-genome-loading [STRING]]
[--star-sort-mem-limit STAR_SORT_MEM_LIMIT]
[--disable-barcode] [--disable-umi]
[--transcriptome] [--version]
fastq_files fastq_files
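1.2 Basic syntax
Per the usage text above, the call takes the form "st_pipeline_run.py [options] fastq_files fastq_files": --ref-map and --expName are the only options not shown in brackets, and the two positional arguments are the Read 1 and Read 2 FASTQ files, in that order.
1.3 Run the test program to check that the pipeline works end to end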
$ cp -r tests test2
$ cd test2
$ mkdir index
$ cd /opt/st_pipeline/test2/config
$ gzip -d Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz
$ cd ..
# Build the STAR genome index
$ STAR --runThreadN 10 --runMode genomeGenerate --genomeDir ./index \
--genomeFastaFiles ./config/Homo_sapiens.GRCh38.dna.chromosome.19.fa \
--sjdbGTFfile ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf
# Run st_pipeline_run.py
$ mkdir results
$ st_pipeline_run.py --expName test2 \
--ids ./config/idfiles/150204_arrayjet_1000L2_probes.txt \
--ref-map ./index --log-file log.txt --output-folder ./results \
--ref-annotation ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf \
./input/arrayjet_1002/testdata_R1.fastq \
./input/arrayjet_1002/testdata_R2.fastq
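If you prefer to launch the run from a script, the same command line can be assembled in Python. This is only a sketch: the helper `build_st_pipeline_cmd` is hypothetical (it is not part of st_pipeline), and it uses only flags that appear in the usage text above.

```python
import shlex


def build_st_pipeline_cmd(exp_name, ref_map, fastq_r1, fastq_r2, ids=None,
                          ref_annotation=None, output_folder=None,
                          log_file=None):
    """Return the argv list for st_pipeline_run.py (hypothetical helper)."""
    cmd = ["st_pipeline_run.py", "--expName", exp_name, "--ref-map", ref_map]
    optional = [
        ("--ids", ids),
        ("--ref-annotation", ref_annotation),
        ("--output-folder", output_folder),
        ("--log-file", log_file),
    ]
    for flag, value in optional:
        if value is not None:  # only emit flags the caller actually set
            cmd += [flag, value]
    cmd += [fastq_r1, fastq_r2]  # positional arguments: Read 1, then Read 2
    return cmd


cmd = build_st_pipeline_cmd(
    exp_name="test2",
    ref_map="./index",
    fastq_r1="./input/arrayjet_1002/testdata_R1.fastq",
    fastq_r2="./input/arrayjet_1002/testdata_R2.fastq",
    ids="./config/idfiles/150204_arrayjet_1000L2_probes.txt",
    ref_annotation="./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf",
    output_folder="./results",
    log_file="log.txt",
)
print(shlex.join(cmd))
# To actually start the run: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids the shell line-continuation pitfalls of a long multi-line command.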
Output:
$ cd results/
$ ls
test2_reads.bed test2_stdata.tsv
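A quick sanity check on the counts table can be done in a few lines of Python. The sketch below assumes that the TSV is tab-delimited, that the header row lists the gene IDs, and that each following row starts with a spot label of the form "XxY" followed by one count per gene; verify this against your own test2_stdata.tsv before relying on it. The function name and the example table are invented for illustration.

```python
import csv
import io


def summarize_counts(handle):
    """Return (gene_ids, {(x, y): total_count}) from a spots-by-genes TSV.

    Assumptions: header row holds gene IDs; first column is an "XxY" label.
    """
    reader = csv.reader(handle, delimiter="\t")
    genes = next(reader)[1:]  # drop the empty corner cell
    per_spot = {}
    for row in reader:
        x, y = (float(v) for v in row[0].split("x"))
        per_spot[(x, y)] = sum(float(v) for v in row[1:])
    return genes, per_spot


# Made-up table with two spots and three genes.
example = io.StringIO(
    "\tGENE_A\tGENE_B\tGENE_C\n"
    "10x12\t3\t0\t1\n"
    "10x13\t0\t2\t2\n"
)
genes, per_spot = summarize_counts(example)
print(len(per_spot), "spots,", len(genes), "genes")
```

For the real file, pass `open("test2_stdata.tsv", newline="")` instead of the in-memory example.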
II. Running Spatial Transcriptomics Analysis
(base) [Robin@SC-201910280935 data]$ unsupervised.py --help
usage: unsupervised.py [-h] --counts-table-files COUNTS_TABLE_FILES
[COUNTS_TABLE_FILES ...] [--normalization [STR]]
[--num-clusters [INT]] [--num-exp-genes [FLOAT]]
[--num-exp-spots [FLOAT]] [--min-gene-expression [INT]]
[--num-genes-keep [INT]] [--clustering [STR]]
[--dimensionality [STR]] [--use-log-scale]
[--alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]]
[--image-files IMAGE_FILES [IMAGE_FILES ...]]
[--num-dimensions [INT]] [--spot-size [INT]]
[--top-genes-criteria [STR]] [--use-adjusted-log]
[--tsne-perplexity [INT]] [--tsne-theta [FLOAT]]
[--outdir OUTDIR] [--color-space-plots]
optional arguments:
-h, --help show this help message and exit
--counts-table-files COUNTS_TABLE_FILES [COUNTS_TABLE_FILES ...]
One or more matrices with gene counts per feature/spot (genes as columns)
--normalization [STR]
Normalize the counts using:
RAW = absolute counts
DESeq2 = DESeq2::estimateSizeFactors(counts)
DESeq2PseudoCount = DESeq2::estimateSizeFactors(counts + 1)
DESeq2Linear = DESeq2::estimateSizeFactors(counts, linear=TRUE)
DESeq2SizeAdjusted = DESeq2::estimateSizeFactors(counts + lib_size_factors)
RLE = EdgeR RLE * lib_size
TMM = EdgeR TMM * lib_size
Scran = Deconvolution Sum Factors (Marioni et al)
REL = Each gene count divided by the total count of its spot
(default: DESeq2)
--num-clusters [INT] The number of clusters/regions expected to be found.
If not given the number of clusters will be computed.
Note that this parameter has no effect with DBSCAN clustering.
--num-exp-genes [FLOAT]
The percentage of number of expressed genes (>= --min-gene-expression) a spot
must have to be kept from the distribution of all expressed genes (default: 1)
--num-exp-spots [FLOAT]
The percentage of number of expressed spots a gene
must have to be kept from the total number of spots (default: 1)
--clustering [STR] What clustering algorithm to use after the dimensionality reduction:
Hierarchical = Hierarchical Clustering (Ward)
KMeans = Suitable for small number of clusters
DBSCAN = Number of clusters will be automatically inferred
Gaussian = Gaussian Mixtures Model
(default: KMeans)
--dimensionality [STR]
What dimensionality reduction algorithm to use:
tSNE = t-distributed stochastic neighbor embedding
PCA = Principal Component Analysis
ICA = Independent Component Analysis
SPCA = Sparse Principal Component Analysis
(default: tSNE)
--use-log-scale Use log2(counts + 1) values in the dimensionality reduction step
--alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]
One or more tab delimited files containing and alignment matrix for the images as
a11 a12 a13 a21 a22 a23 a31 a32 a33
Only useful is the image has extra borders, for instance not cropped to the array corners
or if you want the keep the original image size in the plots.
--image-files IMAGE_FILES [IMAGE_FILES ...]
When provided the data will plotted on top of the image
It can be one ore more, ideally one for each input dataset
It is desirable that the image is cropped to the array
corners otherwise an alignment file is needed
--num-dimensions [INT]
The number of dimensions to use in the dimensionality reduction (2 or 3). (default: 2)
--spot-size [INT] The size of the spots when generating the plots. (default: 20)
--top-genes-criteria [STR]
What criteria to use to keep top genes before doing
the dimensionality reduction (Variance or TopRanked) (default: Variance)
--use-adjusted-log Use adjusted log normalized counts (R Scater::normalized())
in the dimensionality reduction step (recommended with SCRAN normalization)
--tsne-perplexity [INT]
The value of the perplexity for the t-sne method. (default: 30)
--tsne-theta [FLOAT] The value of theta for the t-sne method. (default: 0.5)
--outdir OUTDIR Path to output dir
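The gene filter described under --num-exp-spots can be read as "keep a gene only if it is expressed in at least that percentage of spots". The Python sketch below illustrates that reading; it is not the tool's actual code. It assumes "expressed" means a count of at least --min-gene-expression, and the default of 1 used for that threshold here is a placeholder, since the help text above does not state it.

```python
def keep_genes(counts, genes, num_exp_spots=1.0, min_gene_expression=1):
    """Return genes expressed in at least num_exp_spots percent of spots.

    counts: list of rows, one row per spot, one column per gene.
    min_gene_expression=1 is an assumed placeholder, not the tool's default.
    """
    n_spots = len(counts)
    min_spots = n_spots * num_exp_spots / 100.0
    kept = []
    for j, gene in enumerate(genes):
        # Count the spots in which this gene reaches the expression threshold.
        expressed_in = sum(1 for row in counts if row[j] >= min_gene_expression)
        if expressed_in >= min_spots:
            kept.append(gene)
    return kept


# Made-up table: 4 spots (rows) by 3 genes (columns).
genes = ["A", "B", "C"]
counts = [
    [5, 0, 0],
    [2, 0, 0],
    [1, 3, 0],
    [0, 0, 0],
]
print(keep_genes(counts, genes, num_exp_spots=50))
```

Raising the percentage removes genes that are only detected in a handful of spots, which shrinks the matrix before dimensionality reduction.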
$ unsupervised.py --counts-table-files test2_stdata.tsv --normalization DESeq2 --num-clusters 5 \
--clustering KMeans --dimensionality tSNE --image-files HE_Rep6_MOB.jpg --use-log-scale
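Two of the simpler transformations named in the help text are easy to show by hand: the REL normalization (each gene count divided by the total count of its spot) and the log2(counts + 1) scaling applied by --use-log-scale. The Python sketch below follows those two definitions on a made-up table; it is an illustration, not the code unsupervised.py runs, and the function names are invented.

```python
import math


def rel_normalize(counts):
    """Divide each gene count by the total count of its spot (row)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # Leave an all-zero spot as zeros rather than dividing by zero.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def log2_plus_one(counts):
    """Apply log2(count + 1) to every entry."""
    return [[math.log2(c + 1) for c in row] for row in counts]


# Made-up table: 2 spots (rows) by 3 genes (columns).
counts = [[3, 1, 0], [0, 2, 2]]
rel = rel_normalize(counts)
logged = log2_plus_one(counts)
print(rel)
print(logged)
```

After REL normalization every non-empty spot sums to 1, so spots with very different sequencing depth become comparable; the log transform then compresses the range of highly expressed genes.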