[Bioinformatics Analysis] Text-mining target genes and assessing their oncogenic potential: did you know?

Yuque: 左手柳叶刀右手炭火烧
WeChat official account: 研平方 | Jianshu: 研平方

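Recently, while combing through the literature, I came across an application that really caught my interest: using text mining to assess the association between selected genes and cancer. With text mining involved, I naturally had to take a closer look.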

1. The original text

Literature evidence for the identified target genes in cancer

We used OncoScore, a text mining tool to assess the associations between each gene and specific cancers based on the literature. A cutoff value of 21.09 was suggested to determine true positives and the true negatives in cancer gene identification.

2. Looking it up

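Out of habit I opened my browser, ready to get to the bottom of it, and was pleasantly surprised to find that OncoScore is a ready-made R package hosted on Bioconductor, so it can be installed and used directly. Although the paper appeared in Sci Rep, I still think it is worth a try.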


3. What it can do

The OncoScore analysis consists of two parts. One can estimate a score to assess the oncogenic potential of a set of genes, given the literature knowledge at the time of the analysis, or one can study the trend of such a score over time.

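In other words, OncoScore can not only score the oncogenic potential of a given list of target genes based on what is known in the literature, but also track how that score changes over time.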

4. Showtime: grab a seat and watch

4.1 Preparation

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("OncoScore")

# load the library
library(OncoScore)
# Define a query
query = perform.query(c("ASXL1","IDH1","IDH2","SETBP1","TET2"))

### Starting the queries for the selected genes.

### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 923 
    Number of papers found in PubMed for IDH1 was: 3691 
    Number of papers found in PubMed for IDH2 was: 1318 
    Number of papers found in PubMed for SETBP1 was: 177 
    Number of papers found in PubMed for TET2 was: 1609 

### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 1018 
    Number of papers found in PubMed for IDH1 was: 3902 
    Number of papers found in PubMed for IDH2 was: 1499 
    Number of papers found in PubMed for SETBP1 was: 229 
    Number of papers found in PubMed for TET2 was: 2117

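As the output shows, the query returns two counts for each gene: the number of cancer-related papers that mention it, and the total number of papers that mention it.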
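The two counts per gene can be turned into a simple share of cancer-related papers. The sketch below uses base R only and re-enters the counts printed above by hand; the `counts` data frame and the `cancer_share` column are illustrative names, not OncoScore objects, and a live PubMed query today will return different numbers.

```r
# Counts copied from the query output above (PubMed totals change over time).
counts <- data.frame(
  gene = c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2"),
  CitationsGene = c(1018, 3902, 1499, 229, 2117),
  CitationsGeneInCancer = c(923, 3691, 1318, 177, 1609)
)

# Percentage of each gene's papers that are cancer-related.
counts$cancer_share <- 100 * counts$CitationsGeneInCancer / counts$CitationsGene

# Show the genes from the highest to the lowest share.
print(counts[order(-counts$cancer_share), ])
```

Even the gene with the fewest papers here (SETBP1) has well over half of its literature in a cancer context.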

OncoScore provides a function to merge gene names if requested by the user. This function is useful when there are aliases in the gene list.

combine.query.results(query, c('IDH1', 'IDH2'), 'new_gene')
         CitationsGene CitationsGeneInCancer
ASXL1             1018                   923
SETBP1             229                   177
TET2              2117                  1609
new_gene          5401                  5009
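Judging from the output above, the merged row is the sum of the rows being combined (3902 + 1499 = 5401 and 3691 + 1318 = 5009). A minimal base-R sketch of that bookkeeping follows; `merge_rows` is a hypothetical helper written for illustration, not the package's implementation, and the matrix is typed in from the earlier query output.

```r
# Counts typed in from the query output above, as a gene-by-count matrix.
query_counts <- matrix(
  c(1018, 3902, 1499, 229, 2117,
    923, 3691, 1318, 177, 1609),
  ncol = 2,
  dimnames = list(c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2"),
                  c("CitationsGene", "CitationsGeneInCancer"))
)

# Hypothetical helper: sum the rows listed in `genes` into one row `new_name`.
merge_rows <- function(m, genes, new_name) {
  merged <- colSums(m[genes, , drop = FALSE])
  kept <- m[!rownames(m) %in% genes, , drop = FALSE]
  out <- rbind(kept, merged)
  rownames(out)[nrow(out)] <- new_name
  out
}

merged_counts <- merge_rows(query_counts, c("IDH1", "IDH2"), "new_gene")
print(merged_counts)
```

With plain summation, a paper that mentions both names contributes to the merged count twice, which is worth keeping in mind when merging true aliases.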

OncoScore can also retrieve genes by chromosomal location; that is not demonstrated here.

4.2 Now for the main part

4.2.1 Computing the oncogenic score of each gene
result = compute.oncoscore(query)

### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 81.59349 
     IDH1 -> 86.66355 
     IDH2 -> 79.59096 
     SETBP1 -> 67.43283 
     TET2 -> 69.12424
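The scores printed above can be reproduced from the two citation counts: take the percentage of cancer-related papers and scale it by 1 - 1/log2(total citations), which penalizes genes with few papers. This formula is inferred here from the numbers in this post (it matches the reported scores to the printed precision), so treat it as a sketch rather than the package's documented definition; `oncoscore_sketch` is a made-up name. The 21.09 cutoff is the one quoted from the paper in section 1.

```r
# Sketch of the score, inferred from the counts and scores shown in this post.
oncoscore_sketch <- function(citations_gene, citations_gene_in_cancer) {
  perc_cancer <- 100 * citations_gene_in_cancer / citations_gene
  # Down-weight genes with few papers: the factor approaches 1 as counts grow.
  perc_cancer * (1 - 1 / log2(citations_gene))
}

# Counts copied from the query output in section 4.1.
genes <- c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2")
citations_gene <- c(1018, 3902, 1499, 229, 2117)
citations_gene_in_cancer <- c(923, 3691, 1318, 177, 1609)

scores <- oncoscore_sketch(citations_gene, citations_gene_in_cancer)
names(scores) <- genes
print(round(scores, 2))

# Cutoff quoted from the paper; genes above it are called cancer-associated.
cutoff <- 21.09
print(scores > cutoff)
```

All five genes land far above the cutoff, which is expected for this list of well-studied leukemia-related genes.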
4.2.2 Time-trend analysis (OncoScore timeline analysis)
query.timepoints = perform.query.timeseries(c("ASXL1","IDH1","IDH2","SETBP1","TET2"),
                                            c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01"))

### Starting the queries for the selected genes.
### Quering PubMed for timepoint 2012/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 86 
    Number of papers found in PubMed for IDH1 was: 409 
    Number of papers found in PubMed for IDH2 was: 173 
    Number of papers found in PubMed for SETBP1 was: 5 
    Number of papers found in PubMed for TET2 was: 173 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 92 
    Number of papers found in PubMed for IDH1 was: 489 
    Number of papers found in PubMed for IDH2 was: 235 
    Number of papers found in PubMed for SETBP1 was: 10 
    Number of papers found in PubMed for TET2 was: 197 
### Quering PubMed for timepoint 2013/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 135 
    Number of papers found in PubMed for IDH1 was: 662 
    Number of papers found in PubMed for IDH2 was: 267 
    Number of papers found in PubMed for SETBP1 was: 11 
    Number of papers found in PubMed for TET2 was: 258 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 150 
    Number of papers found in PubMed for IDH1 was: 753 
    Number of papers found in PubMed for IDH2 was: 336 
    Number of papers found in PubMed for SETBP1 was: 18 
    Number of papers found in PubMed for TET2 was: 303 
### Quering PubMed for timepoint 2014/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 188 
    Number of papers found in PubMed for IDH1 was: 904 
    Number of papers found in PubMed for IDH2 was: 365 
    Number of papers found in PubMed for SETBP1 was: 29 
    Number of papers found in PubMed for TET2 was: 347
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 209 
    Number of papers found in PubMed for IDH1 was: 1003 
    Number of papers found in PubMed for IDH2 was: 440 
    Number of papers found in PubMed for SETBP1 was: 36 
    Number of papers found in PubMed for TET2 was: 431 
### Quering PubMed for timepoint 2015/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 257 
    Number of papers found in PubMed for IDH1 was: 1198 
    Number of papers found in PubMed for IDH2 was: 468 
    Number of papers found in PubMed for SETBP1 was: 51 
    Number of papers found in PubMed for TET2 was: 461 
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 286 
    Number of papers found in PubMed for IDH1 was: 1304 
    Number of papers found in PubMed for IDH2 was: 551 
    Number of papers found in PubMed for SETBP1 was: 66 
    Number of papers found in PubMed for TET2 was: 583 
### Quering PubMed for timepoint 2016/03/01 
    ### Performing queries for cancer literature 
    Number of papers found in PubMed for ASXL1 was: 323 
    Number of papers found in PubMed for IDH1 was: 1506 
    Number of papers found in PubMed for IDH2 was: 569 
    Number of papers found in PubMed for SETBP1 was: 68 
    Number of papers found in PubMed for TET2 was: 587
    ### Performing queries for all the literature 
    Number of papers found in PubMed for ASXL1 was: 359 
    Number of papers found in PubMed for IDH1 was: 1625 
    Number of papers found in PubMed for IDH2 was: 661 
    Number of papers found in PubMed for SETBP1 was: 89 
    Number of papers found in PubMed for TET2 was: 745 

The perform.query.timeseries() function retrieves the literature counts as of each of the specified dates.

result.timeseries = compute.oncoscore.timeseries(query.timepoints)

### Computing oncoscore for timepoint 2012/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 79.14893 
     IDH1 -> 74.27776 
     IDH2 -> 64.27063 
     SETBP1 -> 34.9485 
     TET2 -> 76.29579 
### Computing oncoscore for timepoint 2013/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 77.54983 
     IDH1 -> 78.71551 
     IDH2 -> 69.99559 
     SETBP1 -> 46.4559 
     TET2 -> 74.81894 
### Computing oncoscore for timepoint 2014/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 78.28121 
     IDH1 -> 81.08963 
     IDH2 -> 73.50788 
     SETBP1 -> 64.97398 
     TET2 -> 71.31087 
### Computing oncoscore for timepoint 2015/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 78.84769 
     IDH1 -> 82.99363 
     IDH2 -> 75.60886 
     SETBP1 -> 64.48853 
     TET2 -> 70.46695 
### Computing oncoscore for timepoint 2016/03/01 
### Processing data
### Computing frequencies scores 
### Estimating oncogenes
### Results:
     ASXL1 -> 79.37202 
     IDH1 -> 83.9881 
     IDH2 -> 76.89328 
     SETBP1 -> 64.60591 
     TET2 -> 70.53378 
4.2.3 Visualization
## Oncogenetic potential of the considered genes
plot.oncoscore(result, col = 'darkblue')

## Absolute values of the oncogenetic potential of the considered genes over times
plot.oncoscore.timeseries(result.timeseries)

## Variations of the oncogenetic potential of the considered genes over times
plot.oncoscore.timeseries(result.timeseries,
                          incremental = TRUE,
                          ylab='absolute variation')

## Variations as relative values of the oncogenetic potential of the considered genes over times
plot.oncoscore.timeseries(result.timeseries,
                          incremental = TRUE,
                          relative = TRUE,
                          ylab='relative variation')
[Figures: plot1.png, plot2.png, plot3.png, plot4.png]
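The last two plotting calls show variations between time points rather than the scores themselves. As a rough numerical illustration, the sketch below takes the time-series scores printed in 4.2.2 and computes the difference between consecutive time points, plus the same difference divided by the earlier score. How the package itself defines `incremental` and `relative` is an assumption here; `ts_scores`, `abs_var`, and `rel_var` are illustrative names.

```r
# Scores copied from the time-series output in 4.2.2 (rows: genes, cols: dates).
genes <- c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2")
dates <- c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01")
ts_scores <- matrix(
  c(79.14893, 77.54983, 78.28121, 78.84769, 79.37202,
    74.27776, 78.71551, 81.08963, 82.99363, 83.9881,
    64.27063, 69.99559, 73.50788, 75.60886, 76.89328,
    34.9485,  46.4559,  64.97398, 64.48853, 64.60591,
    76.29579, 74.81894, 71.31087, 70.46695, 70.53378),
  nrow = 5, byrow = TRUE,
  dimnames = list(genes, dates)
)

last <- ncol(ts_scores)

# Absolute variation: score at each time point minus the previous one.
abs_var <- ts_scores[, -1] - ts_scores[, -last]

# Relative variation: the same difference divided by the previous score.
rel_var <- abs_var / ts_scores[, -last]

print(round(abs_var, 2))
print(round(100 * rel_var, 1))  # in percent
```

In these numbers SETBP1 stands out: its score rises sharply up to 2014 and then levels off, while the other four genes change by only a few points per year.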
