刚发了time C-index,收到一条留言
就是机器学习算法应该分训练集与测试集,上次分享的时单个模型,或者多个模型的time C-index对比,这次分享的是同一个模型的训练集和测试集的time C-index对比。
新鲜出炉,现学现卖,代码参考自pec::cindex帮助文档,里面有提到第一个参数object:
| A named list of prediction models, where allowed entries are (1) R-objects for which a [predictSurvProb] method exists (see details), (2) a
call
that evaluates to such an R-object (see examples), (3) a matrix with predicted probabilities having as many rows asdata
and as many columns astimes
. For cross-validation all objects in this list must include theircall
.
而predictSurvProb是一个用于预测的函数。
rm(list = ls())
library(rms)
library(pec)
library(ggplot2)
library(prodlim)
编造三个示例数据,一个做训练集,两个做测试集,SimSurv是个方便的编生存数据的函数
set.seed(13)
dat <- SimSurv(100)
head(dat)
## eventtime censtime time event X1 X2 status
## 1 3.068009 24.716896 3.068009 1 0 0.5543269 1
## 2 9.666322 17.105853 9.666322 1 1 -0.2802719 1
## 3 1.200405 2.101732 1.200405 1 1 1.7751634 1
## 4 2.749020 5.286182 2.749020 1 1 0.1873201 1
## 5 5.974245 14.870069 5.974245 1 0 1.1425261 1
## 6 1.853016 7.541804 1.853016 1 1 0.4155261 1
set.seed(100)
test1 <- SimSurv(100)
set.seed(14)
test2 <- SimSurv(100)
fit <- cph(Surv(time,status)~X1+X2,data=dat,x=TRUE,y=TRUE,surv = T)
Cpec <- pec::cindex(fit,
formula=Surv(time,status)~1,
data=dat)
times <- c(1, 3, 5, 7, 10)
p1 <- predictSurvProb(fit,newdata=test1,times=times)
p2 <- predictSurvProb(fit,newdata=test2,times=times)
Cpec2 <- pec::cindex(list(train =fit,
test1= p1,
test2= p2),
formula=Surv(time,status)~1,
eval.times = times,
data=dat)
Cpec2$AppCindex
## $train
## [1] 0.7544231 0.7481185 0.7325740 0.7440132 0.7357568
##
## $test1
## [1] 0.4032685 0.5604834 0.4831370 0.5171105 0.5183455
##
## $test2
## [1] 0.6932702 0.6020585 0.5202996 0.5085714 0.4999364
plot(Cpec2)
用ggplot2画更好看的版本,cindex$AppCindex里面是3个向量
Cpec2$AppCindex
## $train
## [1] 0.7544231 0.7481185 0.7325740 0.7440132 0.7357568
##
## $test1
## [1] 0.4032685 0.5604834 0.4831370 0.5171105 0.5183455
##
## $test2
## [1] 0.6932702 0.6020585 0.5202996 0.5085714 0.4999364
所以需要宽变长才能给ggplot2用
cindex_df <- data.frame(
Time = times,
do.call(cbind,Cpec2$AppCindex)
)
cindex_df
## Time train test1 test2
## 1 1 0.7544231 0.4032685 0.6932702
## 2 3 0.7481185 0.5604834 0.6020585
## 3 5 0.7325740 0.4831370 0.5202996
## 4 7 0.7440132 0.5171105 0.5085714
## 5 10 0.7357568 0.5183455 0.4999364
#宽变长
library(tidyr)
dat = pivot_longer(cindex_df,cols = 2:4,
names_to = "model",
values_to = "cindex")
head(dat)
## # A tibble: 6 × 3
## Time model cindex
## <dbl> <chr> <dbl>
## 1 1 train 0.754
## 2 1 test1 0.403
## 3 1 test2 0.693
## 4 3 train 0.748
## 5 3 test1 0.560
## 6 3 test2 0.602
library(ggplot2)
ggplot(dat, aes(x = Time, y = cindex)) +
geom_line(aes(color = model),linewidth = 2) +
scale_color_brewer(palette = "Set1")+
geom_hline(yintercept = 0.5,linetype = 4)+
ylim(0.4,1)+
labs(title = "Time-dependent C-index", x = "Time (years)", y = "C-index") +
theme_bw()