[Legacy Spatial Transcriptomics] (2) Notes on getting the pipeline to run end to end

My old account was banned for no apparent reason, so I am reposting this from a secondary account.

More spatial transcriptomics articles:

1. New 10X Visium
2. Legacy Spatial

I. Running st_pipeline

[Figure: workflow overview]

[Figure: detailed workflow]

1.1 Required input files

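  • FASTQ files (read 1 contains the spatial information and the UMI, read 2 contains the genomic sequence)
  • A genome index generated with STAR
  • An annotation file in GTF or GFF3 format (optional when using a transcriptome)
  • A file containing the barcodes and array coordinates (look in the "ids" folder and pick the right one). This file has three columns (BARCODE, X and Y). It is also optional if the data is not barcoded (for example, RNA-Seq data).
  • A name for the dataset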
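To make the barcode file layout concrete, here is a minimal Python sketch that parses a three-column file of this kind into a barcode-to-coordinate mapping. It assumes whitespace-separated columns and no header line; the function name `read_ids` and the sample barcodes are made up for illustration and are not part of st_pipeline.

```python
import io


def read_ids(handle):
    """Parse lines of 'BARCODE X Y' into {barcode: (x, y)}.

    Assumption: columns are whitespace-separated and there is no header.
    """
    spots = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        barcode, x, y = fields
        spots[barcode] = (int(x), int(y))
    return spots


# Made-up example content, not taken from a real ids file
example = io.StringIO("ACGTACGTACGTACGTAC\t1\t2\nTTGCATTGCATTGCATTG\t3\t4\n")
spots = read_ids(example)
print(len(spots), "barcodes")
```

For a real file, pass an open file handle instead of the `io.StringIO` object.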

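The ST pipeline has many parameters, mostly related to trimming, mapping and annotation, but the default values are usually good enough. Once the ST pipeline is installed, you can see a full description of the parameters by typing "st_pipeline_run.py --help".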

(base) [Robin@SC-201910280935 pipl_test]$ st_pipeline_run.py --help
usage: st_pipeline_run.py [-h] [--ids [FILE]] --ref-map [FOLDER]
                          [--ref-annotation [FILE]] --expName [STRING]
                          [--allowed-missed [INT]] [--allowed-kmer [INT]]
                          [--overhang [INT]]
                          [--min-length-qual-trimming [INT]]
                          [--mapping-rv-trimming [INT]]
                          [--contaminant-index [FOLDER]] [--qual-64]
                          [--htseq-mode [STRING]] [--htseq-no-ambiguous]
                          [--start-id [INT]] [--no-clean-up] [--verbose]
                          [--mapping-threads [INT]]
                          [--min-quality-trimming [INT]] [--bin-path [FOLDER]]
                          [--log-file [STR]] [--output-folder [FOLDER]]
                          [--temp-folder [FOLDER]]
                          [--umi-allowed-mismatches [INT]]
                          [--umi-start-position [INT]]
                          [--umi-end-position [INT]] [--keep-discarded-files]
                          [--remove-polyA [INT]] [--remove-polyT [INT]]
                          [--remove-polyG [INT]] [--remove-polyC [INT]]
                          [--remove-polyN [INT]] [--filter-AT-content [INT%]]
                          [--filter-GC-content [INT%]] [--disable-multimap]
                          [--disable-clipping]
                          [--umi-cluster-algorithm [STRING]]
                          [--min-intron-size [INT]] [--max-intron-size [INT]]
                          [--umi-filter] [--umi-filter-template [STRING]]
                          [--compute-saturation]
                          [--saturation-points SATURATION_POINTS [SATURATION_POINTS ...]]
                          [--include-non-annotated]
                          [--inverse-mapping-rv-trimming [INT]]
                          [--two-pass-mode] [--strandness [STRING]]
                          [--umi-quality-bases [INT]]
                          [--umi-counting-offset [INT]]
                          [--demultiplexing-metric [STRING]]
                          [--demultiplexing-multiple-hits-keep-one]
                          [--demultiplexing-trim-sequences DEMULTIPLEXING_TRIM_SEQUENCES [DEMULTIPLEXING_TRIM_SEQUENCES ...]]
                          [--homopolymer-mismatches [INT]]
                          [--star-genome-loading [STRING]]
                          [--star-sort-mem-limit STAR_SORT_MEM_LIMIT]
                          [--disable-barcode] [--disable-umi]
                          [--transcriptome] [--version]
                          fastq_files fastq_files

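1.2 Basic syntax

As the usage message above shows, only --ref-map, --expName and the two FASTQ files are required; every other option appears in square brackets and is therefore optional.

1.3 Run the test dataset to check that the pipeline works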

# Make a working copy of the test data folder (tests/ in the st_pipeline repository)
$ cp -r tests test2
$ cd test2
$ mkdir index
$ cd /opt/st_pipeline/test2/config
$ gzip -d Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz
# Go back to test2 so that the relative paths below resolve
$ cd ..
# Build the STAR genome index
$ STAR --runThreadN 10 --runMode genomeGenerate --genomeDir ./index \
--genomeFastaFiles ./config/Homo_sapiens.GRCh38.dna.chromosome.19.fa \
--sjdbGTFfile ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf
# Run st_pipeline_run.py
$ mkdir results
$ st_pipeline_run.py --expName test2 \
     --ids ./config/idfiles/150204_arrayjet_1000L2_probes.txt \
     --ref-map ./index --log-file log.txt --output-folder ./results \
     --ref-annotation ./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf \
     ./input/arrayjet_1002/testdata_R1.fastq \
     ./input/arrayjet_1002/testdata_R2.fastq
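If you prefer to launch the run from a script, the same call can be assembled as an argument list, which avoids the line-continuation problems that long shell commands are prone to. This is only a sketch: the helper name `build_st_command` is hypothetical, and it uses only flags that appear in the `--help` output above.

```python
import shlex


def build_st_command(exp_name, ids, ref_map, annotation, out_dir, log_file, r1, r2):
    """Return the st_pipeline_run.py call as a list of arguments."""
    return [
        "st_pipeline_run.py",
        "--expName", exp_name,
        "--ids", ids,
        "--ref-map", ref_map,
        "--ref-annotation", annotation,
        "--log-file", log_file,
        "--output-folder", out_dir,
        r1,  # read 1: spatial barcode and UMI
        r2,  # read 2: genomic sequence
    ]


cmd = build_st_command(
    exp_name="test2",
    ids="./config/idfiles/150204_arrayjet_1000L2_probes.txt",
    ref_map="./index",
    annotation="./config/annotations/Homo_sapiens.GRCh38.79_chr19.gtf",
    out_dir="./results",
    log_file="log.txt",
    r1="./input/arrayjet_1002/testdata_R1.fastq",
    r2="./input/arrayjet_1002/testdata_R2.fastq",
)
# Print the command for inspection; nothing is executed here
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` from inside the test2 folder.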

Results:

$ cd results/
$ ls
test2_reads.bed  test2_stdata.tsv
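A quick sanity check on the counts matrix is to load it and sum the counts per spot. The sketch below assumes the file is a tab-separated table with spots as rows, genes as columns (the layout the unsupervised.py help describes for counts tables), a header row whose first cell is empty, and the spot ID in the first column. The function name `read_counts` and the toy data are made up for illustration.

```python
import csv
import io


def read_counts(handle):
    """Read a spots-by-genes TSV into (gene_names, {spot_id: [counts]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # first header cell is assumed to be the empty row-label cell
    counts = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        counts[row[0]] = [float(value) for value in row[1:]]
    return genes, counts


# Toy matrix with two spots and two genes, not real pipeline output
example = io.StringIO("\tGENE_A\tGENE_B\n1x2\t3\t0\n2x2\t1\t5\n")
genes, counts = read_counts(example)
totals = {spot: sum(values) for spot, values in counts.items()}
print(len(counts), "spots x", len(genes), "genes")
```

For the real output, replace the `io.StringIO` object with `open("test2_stdata.tsv", newline="")`.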

II. Running Spatial Transcriptomics Analysis

(base) [Robin@SC-201910280935 data]$ unsupervised.py --help
usage: unsupervised.py [-h] --counts-table-files COUNTS_TABLE_FILES
                       [COUNTS_TABLE_FILES ...] [--normalization [STR]]
                       [--num-clusters [INT]] [--num-exp-genes [FLOAT]]
                       [--num-exp-spots [FLOAT]] [--min-gene-expression [INT]]
                       [--num-genes-keep [INT]] [--clustering [STR]]
                       [--dimensionality [STR]] [--use-log-scale]
                       [--alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]]
                       [--image-files IMAGE_FILES [IMAGE_FILES ...]]
                       [--num-dimensions [INT]] [--spot-size [INT]]
                       [--top-genes-criteria [STR]] [--use-adjusted-log]
                       [--tsne-perplexity [INT]] [--tsne-theta [FLOAT]]
                       [--outdir OUTDIR] [--color-space-plots]


optional arguments:
  -h, --help            show this help message and exit
  --counts-table-files COUNTS_TABLE_FILES [COUNTS_TABLE_FILES ...]
                        One or more matrices with gene counts per feature/spot (genes as columns)
  --normalization [STR]
                        Normalize the counts using:
                        RAW = absolute counts
                        DESeq2 = DESeq2::estimateSizeFactors(counts)
                        DESeq2PseudoCount = DESeq2::estimateSizeFactors(counts + 1)
                        DESeq2Linear = DESeq2::estimateSizeFactors(counts, linear=TRUE)
                        DESeq2SizeAdjusted = DESeq2::estimateSizeFactors(counts + lib_size_factors)
                        RLE = EdgeR RLE * lib_size
                        TMM = EdgeR TMM * lib_size
                        Scran = Deconvolution Sum Factors (Marioni et al)
                        REL = Each gene count divided by the total count of its spot
                        (default: DESeq2)
  --num-clusters [INT]  The number of clusters/regions expected to be found.
                        If not given the number of clusters will be computed.
                        Note that this parameter has no effect with DBSCAN clustering.
  --num-exp-genes [FLOAT]
                        The percentage of number of expressed genes (>= --min-gene-expression) a spot
                        must have to be kept from the distribution of all expressed genes (default: 1)
  --num-exp-spots [FLOAT]
                        The percentage of number of expressed spots a gene
                        must have to be kept from the total number of spots (default: 1)
  --clustering [STR]    What clustering algorithm to use after the dimensionality reduction:
                        Hierarchical = Hierarchical Clustering (Ward)
                        KMeans = Suitable for small number of clusters
                        DBSCAN = Number of clusters will be automatically inferred
                        Gaussian = Gaussian Mixtures Model
                        (default: KMeans)
  --dimensionality [STR]
                        What dimensionality reduction algorithm to use:
                        tSNE = t-distributed stochastic neighbor embedding
                        PCA = Principal Component Analysis
                        ICA = Independent Component Analysis
                        SPCA = Sparse Principal Component Analysis
                        (default: tSNE)
  --use-log-scale       Use log2(counts + 1) values in the dimensionality reduction step
  --alignment-files ALIGNMENT_FILES [ALIGNMENT_FILES ...]
                        One or more tab delimited files containing and alignment matrix for the images as
                                 a11 a12 a13 a21 a22 a23 a31 a32 a33
                        Only useful is the image has extra borders, for instance not cropped to the array corners
                        or if you want the keep the original image size in the plots.
  --image-files IMAGE_FILES [IMAGE_FILES ...]
                        When provided the data will plotted on top of the image
                        It can be one ore more, ideally one for each input dataset
                         It is desirable that the image is cropped to the array
                        corners otherwise an alignment file is needed
  --num-dimensions [INT]
                        The number of dimensions to use in the dimensionality reduction (2 or 3). (default: 2)
  --spot-size [INT]     The size of the spots when generating the plots. (default: 20)
  --top-genes-criteria [STR]
                        What criteria to use to keep top genes before doing
                        the dimensionality reduction (Variance or TopRanked) (default: Variance)
  --use-adjusted-log    Use adjusted log normalized counts (R Scater::normalized())
                        in the dimensionality reduction step (recommended with SCRAN normalization)
  --tsne-perplexity [INT]
                        The value of the perplexity for the t-sne method. (default: 30)
  --tsne-theta [FLOAT]  The value of theta for the t-sne method. (default: 0.5)
  --outdir OUTDIR       Path to output dir

unsupervised.py --counts-table-files test2_stdata.tsv --normalization DESeq2 --num-clusters 5 \
     --clustering KMeans --dimensionality tSNE --image-files HE_Rep6_MOB.jpg --use-log-scale 
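Two of the transformations named in the help text are simple enough to write out by hand: REL ("each gene count divided by the total count of its spot") and the log2(counts + 1) scaling behind --use-log-scale. The sketch below only illustrates those two formulas on a single spot; it is not the tool's own code, and returning zeros for a spot with no counts is my own choice.

```python
import math


def rel_normalize(row):
    """Divide each gene count by the total count of the spot."""
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]  # assumption: an empty spot stays all zeros
    return [value / total for value in row]


def log2_plus_one(row):
    """Apply log2(count + 1) to every value."""
    return [math.log2(value + 1) for value in row]


spot = [3, 0, 1]  # toy counts for three genes in one spot
print(rel_normalize(spot))  # [0.75, 0.0, 0.25]
print(log2_plus_one(spot))  # [2.0, 0.0, 1.0]
```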