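An intermediate R assignment set by jimmy mentioned several R packages. I read the reference manual for the org.Hs.eg.db package on Bioconductor and wrote the following notes to understand and use it better.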
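First, some background on gene chips (microarrays). A gene chip can directly measure which mRNAs are present and how abundant they are. It relies on DNA base pairing: a nucleic acid of known sequence is used as a probe to detect the sequences that pair with it. Depending on how the probes are prepared and immobilized, gene chips fall into two main classes: (1) oligonucleotide microarrays and (2) printed cDNA microarrays.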
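The base-pairing principle can be illustrated with a toy sketch in base R. This is not part of org.Hs.eg.db or of any real microarray pipeline: the helper names `reverse_complement()` and `probe_binds()` and the sequences are hypothetical, and the sketch assumes uppercase DNA letters and exact matches only, whereas real hybridization tolerates some mismatches. Because the two strands are antiparallel, a probe pairs with a target when the probe's reverse complement occurs in the target.

```r
# Toy illustration of probe-target base pairing (hypothetical helpers).
# Assumes uppercase DNA letters (A, C, G, T) and exact matching only.

reverse_complement <- function(seq) {
  comp <- chartr("ACGT", "TGCA", seq)                  # A<->T, C<->G
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")   # reverse: strands are antiparallel
}

# TRUE if the probe can pair perfectly with some stretch of the target
probe_binds <- function(probe, target) {
  grepl(reverse_complement(probe), target, fixed = TRUE)
}

target <- "ATGGCGTACGTTAGC"      # hypothetical cDNA fragment
probe_binds("CGTACGC", target)   # TRUE: its reverse complement "GCGTACG" is in the target
probe_binds("AAAAAAA", target)   # FALSE: "TTTTTTT" is not in the target
```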
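Bioconductor hosts many gene annotation packages; org.Hs.eg.db is the one for human genes. Most of these annotation packages are built on the AnnotationDb class from the AnnotationDbi package.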
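Because org.Hs.eg.db is built on AnnotationDbi, the generic AnnotationDbi functions can be used on it directly. A minimal sketch, assuming org.Hs.eg.db is already installed: it checks that the database object is an AnnotationDb, then lists which identifier types can be used as keys (`keytypes()`) and which fields can be retrieved (`columns()`).

```r
library(org.Hs.eg.db)   # attaching it also attaches AnnotationDbi, which it depends on

db <- org.Hs.eg.db
class(db)                 # an "OrgDb" object
is(db, "AnnotationDb")    # OrgDb is a subclass of AnnotationDb
head(keytypes(db))        # identifier types usable as keys (e.g. ENTREZID, SYMBOL)
head(columns(db))         # fields that can be retrieved with select()
```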
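if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install()                # install/update the core Bioconductor packages
BiocManager::install("org.Hs.eg.db")  # install org.Hs.eg.db; its dependencies are installed too
library(org.Hs.eg.db)                 # attach the package so that ls("package:...") can find it
ls("package:org.Hs.eg.db")            # list the objects (mostly mapping objects) in the package; output below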
"org.Hs.eg" # Bioconductor annotation data package
"org.Hs.eg.db" # Bioconductor annotation data package
"org.Hs.eg_dbconn"
"org.Hs.eg_dbfile"
"org.Hs.eg_dbInfo"
"org.Hs.eg_dbschema"
"org.Hs.egACCNUM" #Map Entrez Gene identifiers to GenBank Accession Numbers
"org.Hs.egACCNUM2EG"
"org.Hs.egALIAS2EG" #Map between Common Gene Symbol Identifiers and Entrez Gene
"org.Hs.egCHR" # Map Entrez Gene IDs to Chromosomes
"org.Hs.egCHRLENGTHS" # A named vector for the length of each of the chromosomes
"org.Hs.egCHRLOC" # Entrez Gene IDs to Chromosomal Location
"org.Hs.egCHRLOCEND"
"org.Hs.egENSEMBL" # Map Ensembl gene accession numbers with Entrez Gene identifiers
"org.Hs.egENSEMBL2EG"
"org.Hs.egENSEMBLPROT" # Map Ensembl protein accession numbers with Entrez Gene identifiers
"org.Hs.egENSEMBLPROT2EG"
"org.Hs.egENSEMBLTRANS" # Map Ensembl transcript accession numbers with Entrez Gene identifiers
"org.Hs.egENSEMBLTRANS2EG"
"org.Hs.egENZYME" # Map between Entrez Gene IDs and Enzyme Commission (EC) Numbers
"org.Hs.egENZYME2EG"
"org.Hs.egGENENAME" # Map between Entrez Gene IDs and gene names
"org.Hs.egGO" # Maps between Entrez Gene IDs and Gene Ontology (GO) IDs
"org.Hs.egGO2ALLEGS"
"org.Hs.egGO2EG"
"org.Hs.egMAP" # Map between Entrez Gene Identifiers and cytogenetic maps/bands
"org.Hs.egMAP2EG"
"org.Hs.egMAPCOUNTS" # Number of mapped keys for the maps in package org.Hs.eg.db
"org.Hs.egOMIM" # Map between Entrez Gene Identifiers and Mendelian Inheritance in Man (MIM) identifiers
"org.Hs.egOMIM2EG"
"org.Hs.egORGANISM" # The Organism for org.Hs.eg
"org.Hs.egPATH" # Mappings between Entrez Gene identifiers and KEGG pathway identifiers
"org.Hs.egPATH2EG"
"org.Hs.egPFAM" # Map between Entrez Gene Identifiers and PFAM Identifiers
"org.Hs.egPMID" # Map between Entrez Gene Identifiers and PubMed Identifiers
"org.Hs.egPMID2EG"
"org.Hs.egPROSITE" # Map between Entrez Gene Identifiers and PROSITE Identifiers
"org.Hs.egREFSEQ" # Map between Entrez Gene Identifiers and RefSeq Identifiers
"org.Hs.egREFSEQ2EG"
"org.Hs.egSYMBOL" # Map between Entrez Gene Identifiers and Gene Symbols
"org.Hs.egSYMBOL2EG"
"org.Hs.egUCSCKG" # This mapping has been deprecated and will no longer be available after Bioconductor 2.6. See the Details section of ?org.Hs.egUCSCKG for how you can live without it. For now, it is a map of UCSC "Known Gene" accession numbers with Entrez Gene identifiers
"org.Hs.egUNIGENE" #Map between Entrez Gene Identifiers and UniGene cluster identifiers
"org.Hs.egUNIGENE2EG"
"org.Hs.egUNIPROT" #Map Uniprot accession numbers with Entrez Gene identifiers
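Each org.Hs.egXXX object in the list above is a Bimap keyed by Entrez Gene ID, and the ...2EG objects go in the opposite direction. A minimal sketch, assuming the package is installed, of looking up a single gene in a few of these maps with `[[`; Entrez Gene ID 7157 is TP53, which lies on chromosome 17.

```r
library(org.Hs.eg.db)

# [[ ]] returns the value(s) mapped to one key of a Bimap
org.Hs.egSYMBOL[["7157"]]     # Entrez ID -> gene symbol
org.Hs.egGENENAME[["7157"]]   # Entrez ID -> full gene name
org.Hs.egCHR[["7157"]]        # Entrez ID -> chromosome
org.Hs.egSYMBOL2EG[["TP53"]]  # reverse direction: gene symbol -> Entrez ID
```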
Following the examples in the official documentation, I ran a few snippets in RStudio to understand how they work.
## select() interface:
## Objects in this package can be accessed using the select() interface
## from the AnnotationDbi package. See ?AnnotationDbi::select for details.
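A minimal sketch of the select() interface, assuming org.Hs.eg.db is installed; the two gene symbols are just examples. `select()` returns one row per key/value pair as a data frame, while AnnotationDbi's `mapIds()` returns one value per key as a named vector (`multiVals = "first"` keeps the first hit when a key maps to several values). Calling `AnnotationDbi::select` explicitly avoids a clash with `dplyr::select` if dplyr is also loaded.

```r
library(org.Hs.eg.db)

symbols <- c("TP53", "BRCA1")

# One row per key/value pair
res <- AnnotationDbi::select(org.Hs.eg.db,
                             keys    = symbols,
                             columns = c("ENTREZID", "GENENAME"),
                             keytype = "SYMBOL")
res

# One value per key, named by the input symbols
ids <- mapIds(org.Hs.eg.db, keys = symbols, column = "ENTREZID",
              keytype = "SYMBOL", multiVals = "first")
ids
```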
## Bimap interface:
x <- org.Hs.egACCNUM # Bimap object: Entrez Gene IDs -> GenBank accession numbers
# Get the entrez gene identifiers that are mapped to an ACCNUM
mapped_genes <- mappedkeys(x) # keep only the Entrez Gene IDs that have at least one accession number
# Convert to a list
xx <- as.list(x[mapped_genes]) # subset the map to those keys and convert it to a named list
if(length(xx) > 0) {
# Get the ACCNUM for the first five genes
print(xx[1:5]) # accession numbers of the first five genes (print() is needed inside an if block)
# Get the first one
xx[[1]] #获取第一个
}
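To see why `mappedkeys()` is used above, the sketch below (assuming the package is installed) compares all keys of the ACCNUM map with only those that actually have a value. `keys()`, `mappedkeys()` and `count.mappedkeys()` are AnnotationDbi functions for Bimap objects.

```r
library(org.Hs.eg.db)

x <- org.Hs.egACCNUM
n_all    <- length(keys(x))        # every Entrez Gene ID on the left side of the map
n_mapped <- length(mappedkeys(x))  # only IDs with at least one accession number
c(all = n_all, mapped = n_mapped)
count.mappedkeys(x)                # same count as n_mapped, without building the key vector
```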
#For the reverse map ACCNUM2EG:
# Convert to a list
xx <- as.list(org.Hs.egACCNUM2EG)
if(length(xx) > 0){
# Get the entrez gene identifiers for the first five accession numbers
print(xx[1:5])
# Get the first one
xx[[1]]
}
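Converting many identifiers with as.list() loads the whole map; for a handful of keys, `mget()` on a Bimap is lighter. A minimal sketch, assuming the package is installed and using two example symbols: `ifnotfound = NA` returns NA instead of an error for unknown keys, and `revmap()` flips the direction of a Bimap, so the second lookup goes through the forward SYMBOL map and should give the same Entrez IDs.

```r
library(org.Hs.eg.db)

genes <- c("TP53", "BRCA1")

# Several keys at once; the result is a list named by the keys
ids <- mget(genes, org.Hs.egSYMBOL2EG, ifnotfound = NA)
unlist(ids)

# Same lookup through the reversed SYMBOL map
ids2 <- mget(genes, revmap(org.Hs.egSYMBOL), ifnotfound = NA)
unlist(ids2)
```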
That's all.
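The fastest way to get started in bioinformatics: search for 生信技能树 (Bioinformatics Skill Tree)
- 生信技能树 free global lecture tour
https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
- 74-hour collection of free bioinformatics engineer training videos on Bilibili
https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
- Apprentices wanted
https://mp.weixin.qq.com/s/KgbilzXnFjbKKunuw7NVfw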