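Heatmaps come up all the time in scientific papers. In our earlier post "Applications of heatmaps in single-cell data analysis" (热图在单细胞数据分析中的应用) we gave a fairly systematic overview of the general rules for heatmaps. In practice, though, some details still cause trouble, such as the order of the labels.
A good heatmap should reveal the pattern in the data, which intuitively means it shows distinct color blocks. Where do those blocks come from? They depend on the order of the rows and columns. A good heatmap will most likely look like this:
But if we change the order, the same data can look like this:
What matters to us is obtaining that order and then passing it to the plotting function. Let's explore this with the familiar pheatmap. First, generate some example data: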
library(pheatmap)
# Create test matrix
test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = c(paste("CGene", 6:10, sep = ""),
                   paste("AGene", 1:5, sep = ""),
                   paste("BGene", 11:15, sep = ""),
                   paste("DGene", 16:20, sep = ""))
Take a look at the data:
test
Test1 Test2 Test3 Test4 Test5 Test6
CGene6 3.32676462 -2.16507595 4.232450403 -0.73583213 3.94062305 -0.1935842619
CGene7 3.30040713 0.08865765 3.721572091 0.33449053 2.73292952 0.4583932832
CGene8 1.50450030 0.64337406 3.407904162 -1.24057682 3.14174263 -0.0007014311
CGene9 2.89088634 -0.55950670 2.060130582 -1.75583323 1.07926694 2.2556162284
CGene10 1.90369857 1.40255666 1.760750107 -0.76906325 2.64811141 -0.6957942691
AGene1 3.42061019 1.14950064 4.268530703 0.05037557 1.84633305 0.5137683525
AGene2 2.88919835 -1.02100837 2.957415715 -1.09980021 3.67011986 0.7510053428
AGene3 5.24239748 0.02736920 4.045355782 -0.08883342 4.06748687 -0.9685845021
AGene4 2.19006433 1.37861550 2.337982108 -0.94394769 3.83553785 -0.8334859349
AGene5 4.48235967 -1.48192686 5.028429364 0.15901242 3.49067895 0.5836504001
BGene11 0.93281128 0.60297065 0.877725891 2.68570163 -0.52096014 0.5303119758
BGene12 -0.82352032 4.13015350 -0.007314182 2.56230292 -1.22882126 2.0095278472
BGene13 1.07999506 2.00713092 1.185458666 1.13050138 0.15584559 2.3795046412
BGene14 1.11955349 2.84165755 0.220021162 1.63569739 0.99095614 3.3335572441
BGene15 1.77628153 6.37128696 1.004835310 4.90696601 0.75322787 5.3301565398
DGene16 0.04400472 6.33588183 -1.293469424 5.43806241 0.53726670 6.2000870073
DGene17 -2.63598249 7.79111111 -0.204355079 6.85814507 -0.87600545 6.8738334335
DGene18 -0.48197063 6.21941112 0.841207756 6.19352280 0.12741642 6.0838277426
DGene19 0.96229006 5.79064015 1.319576057 7.18360581 -0.05522554 5.6089813401
DGene20 -1.42032585 4.29067156 0.589306112 5.99965957 0.43606552 5.7949180143
Test7 Test8 Test9 Test10
CGene6 4.138778102 1.6304399 1.4972186 0.6664516
CGene7 4.205202621 0.7133720 1.3688061 0.4749147
CGene8 3.675146838 -0.8371708 2.4173558 -0.8573423
CGene9 0.911284470 0.6367740 1.8973446 0.5885573
CGene10 2.381027675 2.0743930 3.6874262 1.1493406
AGene1 3.416045270 -0.0662255 2.1358439 -1.3471116
AGene2 4.091088541 0.2684579 3.6841199 -1.7729912
AGene3 2.746024503 0.3570507 2.2417769 -0.1226907
AGene4 2.734958681 -0.7147136 1.8119604 -0.9273917
AGene5 2.131046458 0.5774685 4.1504215 -1.0478849
BGene11 0.367833875 1.5309153 -1.0897623 3.3879448
BGene12 0.003437035 1.1982992 -1.1184832 1.2544010
BGene13 0.124903765 2.0180698 -1.1180846 4.0343573
BGene14 -1.623291426 2.4192553 -1.3206414 0.7060437
BGene15 0.576155533 7.4567201 1.3057335 5.6594995
DGene16 0.542256420 5.8187826 -1.6232905 7.1829024
DGene17 -0.711543153 7.1164359 -0.8563482 7.9621794
DGene18 0.632542083 5.9143762 -0.9905354 7.6225081
DGene19 -0.659880146 5.0144296 -0.5088869 4.9703428
DGene20 -0.445718763 4.8705198 -1.5070905 6.2237708
With the default parameters:
p1 <- pheatmap(test, main = "pheatmap")
Here the rows and columns are arranged in the order produced by the clustering.
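To make "clustering order" concrete, here is a minimal base-R sketch of how such an order can be derived. It assumes pheatmap's documented defaults (Euclidean distance, complete linkage, no scaling); under that assumption the result should match `p1$tree_row$order`, but treat it as an illustration rather than a description of pheatmap's internals. The matrix `m` and the seed are hypothetical stand-ins for the `test` matrix above.

```r
# Hypothetical stand-in for the `test` matrix (the seed is an assumption).
set.seed(42)
m <- matrix(rnorm(200), 20, 10)
m[1:10, seq(1, 10, 2)] <- m[1:10, seq(1, 10, 2)] + 3
m[11:20, seq(2, 10, 2)] <- m[11:20, seq(2, 10, 2)] + 2
rownames(m) <- paste0("Gene", 1:20)
colnames(m) <- paste0("Test", 1:10)

# Cluster rows and columns with base R: Euclidean distance, complete linkage.
row_hc <- hclust(dist(m), method = "complete")
col_hc <- hclust(dist(t(m)), method = "complete")

# `order` gives the positions of the rows/columns along the dendrogram leaves.
row_order <- rownames(m)[row_hc$order]
col_order <- colnames(m)[col_hc$order]

# Reordering the matrix by these names reproduces the clustered layout.
m_ordered <- m[row_order, col_order]
print(row_order)
```

Because the order is just a character vector of names, it can be stored, edited by hand, or handed to any other plotting function.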
p2 <- pheatmap(test, cluster_rows = FALSE, main = "cluster_rows = FALSE")
Without clustering, the row order is simply the order of the rows in the input matrix.
Now let's sort the rows alphabetically by name.
p3 <- pheatmap(test[order(rownames(test)), ], cluster_rows = FALSE,
               main = 'test[order(rownames(test)),]\ncluster_rows = FALSE')
Now the rows are in alphabetical order.
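Alphabetical sorting compares names character by character, so `CGene10` lands before `CGene6`. If the numeric suffix is what matters, one option is to sort on the number alone. The sketch below is only an illustration: the helper names `gene_num` and `natural_order` are made up here, and it assumes every row name contains exactly one run of digits.

```r
# Row names in the same (unsorted) order as the example matrix.
rn <- c(paste0("CGene", 6:10), paste0("AGene", 1:5),
        paste0("BGene", 11:15), paste0("DGene", 16:20))

# Drop every non-digit character, then convert what is left to a number.
gene_num <- as.numeric(gsub("[^0-9]", "", rn))

# Reorder the names by that number instead of by the full string.
natural_order <- rn[order(gene_num)]
print(natural_order)
```

The resulting vector can be used the same way as any other order, for example `test[natural_order, ]` together with `cluster_rows = FALSE`.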
Sometimes we want to keep the order produced by clustering but not show the dendrogram. How do we do that?
nr=rownames(test)[p1$tree_row[["order"]]]
nr # this order can also be passed to DoHeatmap
[1] "DGene17" "DGene16" "DGene18" "BGene15" "DGene19" "DGene20" "BGene11" "BGene13"
[9] "BGene12" "BGene14" "CGene6" "AGene3" "AGene5" "CGene8" "AGene4" "AGene2"
[17] "CGene7" "AGene1" "CGene9" "CGene10"
nc=colnames(test)[p1$tree_col[["order"]]]
p4 <- pheatmap(test[nr, nc], main = "pheatmap\nremove cluster label", cluster_rows = FALSE)
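An alternative worth knowing, offered as a sketch rather than as the approach used in this post: pheatmap has `treeheight_row` and `treeheight_col` arguments, and setting them to 0 leaves clustering switched on while drawing no dendrogram. The matrix `m` and the seed below are hypothetical; `silent = TRUE` only suppresses immediate drawing so that the returned object can be inspected first.

```r
library(pheatmap)

set.seed(42)  # the seed is an assumption, used only for reproducibility
m <- matrix(rnorm(200), 20, 10)
m[1:10, seq(1, 10, 2)] <- m[1:10, seq(1, 10, 2)] + 3
rownames(m) <- paste0("Gene", 1:20)
colnames(m) <- paste0("Test", 1:10)

# Clustering stays on, so rows and columns keep the clustered order,
# but both dendrograms get zero height and are not drawn.
p <- pheatmap(m, treeheight_row = 0, treeheight_col = 0,
              main = "clustered order, no dendrogram", silent = TRUE)

# The hclust objects are still returned, so the order remains available.
clustered_rows <- rownames(m)[p$tree_row$order]
clustered_cols <- colnames(m)[p$tree_col$order]

# Draw the heatmap stored in the returned gtable.
grid::grid.newpage()
grid::grid.draw(p$gtable)
```

This avoids reordering the matrix by hand, whereas extracting the order explicitly, as above, is the better fit when the order has to be reused in a different plotting function.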
Finally, we combine the four plots into a single figure so the reader can compare them side by side.
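library(ggplotify)
# Convert each pheatmap object to a ggplot object so cowplot can arrange it
p1 <- as.ggplot(p1)
p2 <- as.ggplot(p2)
p3 <- as.ggplot(p3)
p4 <- as.ggplot(p4)
# Top row: panels A and B; bottom row: panels C and D
p12 <- cowplot::plot_grid(p1, p2, labels = c('A', 'B'), align = 'h',
                          rel_widths = c(1, 1.3))
p34 <- cowplot::plot_grid(p3, p4, labels = c('C', 'D'), align = 'h',
                          rel_widths = c(1, 1.3))
# Stack the two rows
comb <- cowplot::plot_grid(p12, p34, ncol = 1,
                           rel_heights = c(1, 1))
comb # display the combined figure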
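When the panels do not need different relative widths, the same kind of layout can be built in one call. The following is a hedged sketch, not a replacement for the code above: it uses a small hypothetical matrix `m`, relies on `plot_grid()` accepting a `plotlist` and `labels = "AUTO"`, and assumes `pheatmap`, `ggplotify` and `cowplot` are installed.

```r
library(pheatmap)
library(ggplotify)
library(cowplot)

set.seed(42)  # the seed is an assumption, used only for reproducibility
m <- matrix(rnorm(60), 10, 6,
            dimnames = list(paste0("Gene", 1:10), paste0("Test", 1:6)))

# Build the heatmaps without drawing them immediately (silent = TRUE).
heatmaps <- list(
  pheatmap(m, main = "clustered", silent = TRUE),
  pheatmap(m, cluster_rows = FALSE, main = "rows as given", silent = TRUE),
  pheatmap(m[order(rownames(m)), ], cluster_rows = FALSE,
           main = "rows sorted by name", silent = TRUE),
  pheatmap(m, treeheight_row = 0, main = "clustered, no row tree",
           silent = TRUE)
)

# Convert every pheatmap object to ggplot, then arrange them in a 2 x 2 grid.
panels <- lapply(heatmaps, as.ggplot)
comb <- plot_grid(plotlist = panels, ncol = 2, labels = "AUTO")
print(comb)
```

Nesting `plot_grid()` calls, as in the original code, remains the more flexible choice when each row needs its own `rel_widths`.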
DoHeatmap clustering specific genes and not top x genes #2261
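More interesting things about pheatmap (继续来看pheatmap那些有趣的事情)
How to remove the dendrogram from a heatmap while keeping the clustering order? (热图如何去掉聚类树的同时保留聚类的顺序?)
[r<-ggplot2] Arranging plots in a grid with cowplot (【r<-ggplot2】cowplot在网格中排列图形)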
Arranging plots in a grid
https://github.com/satijalab/seurat/issues/2222