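Preface.
In clinical-trial statistical programming, a rookie like me spends most of the time as a "parameter filler": prepare the ADaM datasets, fill in the parameters described in the documentation of the company's shared macros, and the statistical tables come out.
I now plan to use my spare time to learn how to write macros. I have no access to the source code of the company's shared macros, so I will write macros that reproduce the same functionality, based on my own understanding from work.
The macro recorded in this article counts, hierarchically by system organ class (SOC) and preferred term (PT), the number of subjects with adverse events/reactions (AE) and the number of events. It is named AESOCPT.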
This article covers:
- Breaking down the target table
- Sample data
- Macro parameters
- Macro structure
- Writing the macro
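1. Breaking down the target table
The figure above is the summary table, produced from the adae dataset, of the number of subjects with AEs and the number of AE events by system organ class (SOC) and preferred term (PT).
- By row, the statistics required are:
  - the overall subject count and event count across all subjects;
  - the subject count and event count for each SOC;
  - the subject count and event count for each PT within each SOC.
- By column, the table shows:
  - first, the variable that defines the hierarchy;
  - then the subject count and event count for each treatment group in turn;
  - finally, two columns with the subject count and event count across all groups.
Note that an incidence rate is calculated for the subject counts, using the number of subjects in each group as the denominator, but not for the event counts.
Also, when the same subject has several AEs with the same SOC and the same PT, they count once toward the subject count, while every occurrence counts toward the event count.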
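The difference between the two counts can be sketched on a tiny, made-up dataset. The dataset name toy_ae and its four records are assumptions for illustration only; they are not part of the macro.

```sas
/* Hypothetical toy data: subject S1 has the same SOC/PT twice */
data toy_ae;
  length usubjid $4 soc $8 pt $8;
  input usubjid $ soc $ pt $;
  datalines;
S1 SOC1 PT11
S1 SOC1 PT11
S2 SOC1 PT11
S2 SOC1 PT12
;
run;

proc sql;
  select soc, pt,
         count(distinct usubjid) as n_subj,  /* subject count: each subject once */
         count(*)                as n_event  /* event count: every record */
  from toy_ae
  group by soc, pt;
quit;
```

For PT11 this gives 2 subjects and 3 events, because S1's repeated event adds to the event count only.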
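2. Sample data
This table is normally produced from the adae data. The total number of subjects and the number of subjects in each group also have to be taken from the adsl data.
The following program generates sample data. There are many ways to do this; this is only one of them.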
%let seed1 = 111111111;
/* adae: 3 groups of 100 subjects, one AE record per subject with a random SOC and PT */
data adae;
do ii = 1 to 3;
armn = ii;
arm = catx(" ", "Group", put(ii, best.));
do jj = 1 to 100;
usubjid = cats( "X",put(ii, best.) ,"-", put(jj, z3.));
soc = cats( "SOC",put(ranbin( &seed1., 5, 0.2) + 1, best.) );
pt = cats( "PT", compress(soc, , "kd") , put(ranbin( &seed1., 10, 0.1) + 1, best.) );
output;
end;
end;
run;
/* adsl: one record per subject, used for the denominators */
data adsl;
do ii = 1 to 3;
armn = ii;
arm = catx(" ", "Group", put(ii, best.));
do jj = 1 to 100;
usubjid = cats( "X",put(ii, best.) ,"-", put(jj, z3.));
output;
end;
end;
run;
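3. Macro parameters
SAS works on datasets, so the macro needs an input dataset and its library, and an output dataset and its library. I named these parameters libin, dtin, libout and dtout.
To count subjects, it also needs the adsl dataset, the subject identifier USUBJID, and the grouping variable grpvarn. Note that grpvarn must be a numeric variable, derived from the group information if necessary. The code in section 5 loops from 1 to the number of groups, so the groups are expected to be coded 1, 2, 3 and so on.
Most importantly, since the statistics are by SOC and PT, the SOC and PT variables must be specified as well (parameters l1var and l2var).
Finally, to choose whether row totals or column totals are computed, there are the parameters rowsumyn and colsumyn, which accept Y or N.
I named this macro AESOCPT: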
%AESOCPT(libin = work,
dtin = adae,
adsl = adsl,
usubjid = usubjid,
l1var = soc,
l2var = pt,
grpvarn = armn,
rowsumyn = Y,
colsumyn = Y,
libout = work,
dtout = table);
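The article does not show the %macro statement itself, so here is a minimal sketch of a definition that would accept the call above. The default values and the Y/N check are my assumptions; they are not taken from the company macro or from the code in section 5.

```sas
/* Sketch only: the defaults (work, usubjid, Y) are assumed, not from the source */
%macro AESOCPT(libin=work, dtin=, adsl=, usubjid=usubjid,
               l1var=, l2var=, grpvarn=,
               rowsumyn=Y, colsumyn=Y,
               libout=work, dtout=);

  /* Hypothetical guard: stop early if a flag is neither Y nor N */
  %if "%upcase(&rowsumyn.)" ne "Y" and "%upcase(&rowsumyn.)" ne "N" %then %do;
    %put ERROR: rowsumyn must be Y or N but is &rowsumyn.;
    %return;
  %end;
  %if "%upcase(&colsumyn.)" ne "Y" and "%upcase(&colsumyn.)" ne "N" %then %do;
    %put ERROR: colsumyn must be Y or N but is &colsumyn.;
    %return;
  %end;

  /* The pre-processing, statistical, post-processing and output steps go here */

%mend AESOCPT;
```

Keyword parameters make the call order-independent, which matches how the call above names every argument.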
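4. Macro structure
I am still a beginner at writing macros, and this is a process of continual practice, exploration and learning.
For now, I design the overall structure of a macro as the following major steps:
*_1. pre-processing;
*_2. main statistical step;
*_3. processing step of stat;
*_4. output steps;
- Step 1: pre-processing
In this step I mainly create and process the macro variables and prepare the input datasets. For this macro, that includes:
*_1. pre-processing;
*_1.1 macro variables;
*subjid number;
*_1.2 input datasets processing;
*_1.2.1 for times of case;
*_1.2.2 for number of case;
- creating the macro variables that hold the subject counts;
- preparing the input dataset for the event counts;
- preparing the input dataset for the subject counts.
- Step 2: main statistical step
*_2. main statistical step;
*_2.1 number of case;
*_2.2 times of case;
*_2.3 calculation of the sum for each row;
*_2.4 calculation of the sum for each column;
The core statistical step is split into the following sub-steps:
- subject counts
- event counts
- the total for each row
- the total for each column
- Step 3: post-processing of the statistics
- Step 4: output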
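5. Writing the macro
Below is all the code I wrote for this macro. I am a beginner, so please bear with any oversights.
5.1 Pre-processing
5.1.1 Assigning the macro variables
Following the structure above, I first assign macro variables for the total number of subjects and for the number of subjects in each group.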
*_1. pre-processing;
*_1.1 macro variables;
*subjid number;
proc sql noprint;
select count(distinct &grpvarn.), count(distinct &usubjid.) into :grpnum trimmed, :SUBN999 trimmed from &adsl.;
quit;
%put Number of subjects: &SUBN999. Number of groups: &grpnum.;
/* The loop assumes the groups are coded 1, 2, 3 and so on in the grouping variable */
%do xx = 1 %to &grpnum.;
proc sql noprint;
select count(distinct &usubjid.) into :SUBN&xx. trimmed from &adsl. where &grpvarn. = &xx.;
quit;
%put Number of subjects with &grpvarn. = &xx.: &&SUBN&xx.;
%end;
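Because the loop above selects each group with &grpvarn. = &xx., it only works when the groups are coded 1 to N. A rough check could look like the sketch below. The macro name chk_grpcodes is hypothetical, and the check assumes integer codes; it is not part of AESOCPT.

```sas
/* Hypothetical helper, not part of the original macro */
%macro chk_grpcodes(adsl=, grpvarn=);
  %local gmin gmax gcnt;
  proc sql noprint;
    select min(&grpvarn.), max(&grpvarn.), count(distinct &grpvarn.)
      into :gmin trimmed, :gmax trimmed, :gcnt trimmed
      from &adsl.;
  quit;
  /* With integer codes, min = 1 and max = number of groups means 1..N */
  %if &gmin. ne 1 or &gmax. ne &gcnt. %then
    %put WARNING: &grpvarn. is not coded 1 to &gcnt. (min=&gmin., max=&gmax.);
  %else
    %put NOTE: &grpvarn. is coded 1 to &gcnt.;
%mend chk_grpcodes;

%chk_grpcodes(adsl=adsl, grpvarn=armn);
```

With the sample adsl from section 2, armn takes the values 1, 2 and 3, so the NOTE branch is taken.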
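5.1.2 Preparing the input dataset
*_1.2 input datasets processing;
data stdt0;
set &libin..&dtin.;
run;
/* Keep only the SOC variable so the lookup cannot overwrite other variables in the later merge */
proc sort data=stdt0(keep=&l1var.) out=socn_ nodupkey;
by &l1var.;
run;
/* Number the SOCs in sorted order; the overall total row later uses 0 */
data &l1var.n;
set socn_;
&l1var.n = _N_;
run;
proc sort data=stdt0;
by &l1var.;
run;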
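5.1.2.1 Preparing the dataset for event counts
*_1.2.1 for times of case;
data times1;
merge stdt0
&l1var.n;
by &l1var.;
run;
/* Overall total row: every event is relabelled "合计" (total) with SOC number 0 */
data times2;
set times1;
&l1var. = "合计";
&l1var.n = 0;
&l2var. = "合计";
run;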
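5.1.2.2 Preparing the dataset for subject counts
*_1.2.2 for number of case;
data case1;
merge stdt0
&l1var.n;
by &l1var.;
run;
/* NODUPKEY deduplicates on the BY variables; NODUP only drops records identical in every variable */
/* One record per subject, SOC and PT */
proc sort data=case1 out=cs1nodup nodupkey dupout=cs1dup;
by &usubjid. &l1var.n &l1var. &l2var.;
run;
/* One record per subject and SOC */
proc sort data=case1 out=cs2nodup nodupkey dupout=cs2dup;
by &usubjid. &l1var.n &l1var.;
run;
/* One record per subject, so a subject with several SOCs is counted once in the overall total */
proc sort data=case1 out=cs0nodup nodupkey;
by &usubjid.;
run;
data case2;
set cs0nodup;
&l1var. = "合计";
&l1var.n = 0;
&l2var. = "合计";
run;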
5.2 Main statistical step
5.2.1 Subject counts and incidence
*_2.main statistical step;
*_2.1 number of case;
%do aa = 1 %to &grpnum.;
proc sql noprint;
create table ST_&aa. as
select &l1var.n, &l1var., "合计" as &l2var.,
cats(sum(&grpvarn. = &aa.), "(", put(sum(&grpvarn. = &aa.)/&&SUBN&aa.*100, 8.2), ")") as CASE_&aa.,
sum(&grpvarn. > 0) + 0.2 as seq1
from cs2nodup /* one record per subject and SOC, so each subject is counted once per SOC */
group by &l1var.n, &l1var.
union
select &l1var.n, &l1var., &l2var.,
cats(sum(&grpvarn. = &aa.), "(", put(sum(&grpvarn. = &aa.)/&&SUBN&aa.*100, 8.2), ")") as CASE_&aa.,
sum(&grpvarn. > 0) +0.1 as seq1
from cs1nodup
group by &l1var.n, &l1var., &l2var.
union
select &l1var.n, &l1var., "合计" as &l2var.,
cats(sum(&grpvarn. = &aa.), "(", put(sum(&grpvarn. = &aa.)/&&SUBN&aa.*100, 8.2), ")") as CASE_&aa.,
sum(&grpvarn. > 0) + 1 as seq1
from case2
group by &l1var.n, &l1var.
;
quit;
proc sort data= ST_&aa.;
by &l1var.n &l1var. &l2var.;
run;
%end;
5.2.2 Event counts
*_2.2 times of case;
%do aa = 1 %to &grpnum.;
proc sql noprint;
create table ST_&aa._ as
select &l1var.n, &l1var., "合计" as &l2var.,
cats(sum(&grpvarn. = &aa.)) as CASE_&aa._ ,
sum(&grpvarn. > 0) + 0.2 as seq2
from times1
group by &l1var.n, &l1var.
union
select &l1var.n, &l1var., &l2var.,
cats(sum(&grpvarn. = &aa.)) as CASE_&aa._ ,
sum(&grpvarn. > 0) +0.1 as seq2
from times1
group by &l1var.n, &l1var., &l2var.
union
select &l1var.n, &l1var., "合计" as &l2var.,
cats(sum(&grpvarn. = &aa.)) as CASE_&aa._ ,
sum(&grpvarn. > 0) + 1 as seq2
from times2
group by &l1var.n, &l1var.
;
quit;
proc sort data= ST_&aa._;
by &l1var.n &l1var. &l2var.;
run;
%end;
5.2.3 Whether to compute the total for each row
%if %upcase(&rowsumyn.) = Y %then
%do;
%put NOTE: Row totals are computed;
5.2.3.1 Row totals for subject counts and incidence
*_2.3.1 calculation of each row for number of case;
proc sql noprint;
create table ST_99 as
select &l1var.n, &l1var., "合计" as &l2var., 1 as idid,
cats(sum(&grpvarn. > 0), "(", put(sum(&grpvarn. > 0)/&SUBN999.*100, 8.2), ")") as CASE_99,
sum(&grpvarn. > 0) + 0.2 as seq1
from cs2nodup /* one record per subject and SOC, so each subject is counted once per SOC */
group by &l1var.n, &l1var.
union
select &l1var.n, &l1var., &l2var., 2 as idid,
cats(sum(&grpvarn. > 0), "(", put(sum(&grpvarn. > 0)/&SUBN999.*100, 8.2), ")") as CASE_99,
sum(&grpvarn. > 0) + 0.1 as seq1
from cs1nodup
group by &l1var.n, &l1var., &l2var.
union
select &l1var.n, &l1var., "合计" as &l2var., 3 as idid,
cats(sum(&grpvarn. > 0), "(", put(sum(&grpvarn. > 0)/&SUBN999.*100, 8.2), ")") as CASE_99,
sum(&grpvarn. > 0) + 1 as seq1
from case2
group by &l1var.n, &l1var.
;
quit;
proc sort data= ST_99;
by &l1var.n &l1var. &l2var.;
run;
5.2.3.2 Row totals for event counts
*_2.3.2 calculation of each row for times of case;
proc sql noprint;
create table ST_99_ as
select &l1var.n, &l1var., "合计" as &l2var., 1 as idid,
cats(sum(&grpvarn. > 0) ) as CASE_99_,
sum(&grpvarn. > 0) + 0.2 as seq2
from times1
group by &l1var.n, &l1var.
union
select &l1var.n, &l1var., &l2var., 2 as idid,
cats(sum(&grpvarn. > 0) ) as CASE_99_,
sum(&grpvarn. > 0) + 0.1 as seq2
from times1
group by &l1var.n, &l1var., &l2var.
union
select &l1var.n, &l1var., "合计" as &l2var., 3 as idid,
cats(sum(&grpvarn. > 0) ) as CASE_99_,
sum(&grpvarn. > 0) + 1 as seq2
from times2
group by &l1var.n, &l1var.
;
quit;
proc sort data= ST_99_;
by &l1var.n &l1var. &l2var.;
run;
%end;
Otherwise, report that row totals are not computed:
%else
%do;
%put NOTE: Row totals are not computed;
%end;
5.2.4 Whether to compute the total for each column
*_2.4 calculation of the sum for each column;
data _0&dtout.;
merge ST_:
;
by &l1var.n &l1var. &l2var.;
%if %upcase(&colsumyn.) = Y %then
%do;
%put NOTE: Column totals are computed;
%end;
%else
%do;
%put NOTE: Column totals are not computed;
/* The overall total row has SOC number 0 */
if &l1var.n = 0 then
delete;
%end;
run;
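5.3 Post-processing of the statistics
*_3 processing step of stat;
/* Within each SOC, the SOC total row sorts first, followed by the PTs in descending order of count */
proc sort data=_0&dtout.;
by &l1var.n descending seq1 descending seq2;
run;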
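data _1&dtout.;
set _0&dtout.;
by &l1var.n descending seq1 descending seq2;
/* The first row of each SOC block is the SOC total and keeps the SOC label; */
/* the other rows show the indented PT. FIRST. flags exist only for BY variables. */
if not first.&l1var.n then
&l1var. = " "||&l2var.;
keep &l1var. CASE_:;
run;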
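5.4 Output
* _4.output steps;
proc contents data= _1&dtout. out= _1outs noprint;
run;
proc sql noprint;
/* ORDER BY varnum keeps the column names in dataset order */
select NAME into :col1-:col99 from _1outs order by varnum;
quit;
/* SQLOBS holds the number of rows returned by the SELECT above */
%let varn = &sqlobs.;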
data &libout..&dtout.;
set _1&dtout.;
%do ii = 1 %to &varn.;
if &&col&ii. = "0(0.00)" then
&&col&ii. ="0";
%let jj = %eval(&ii. - 1);
rename &&col&ii. = C&jj.;
%end;
run;
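/* Delete only the intermediate datasets created above; broad prefixes such as st: or _: could also remove unrelated datasets */
proc datasets lib=work nolist nowarn;
delete socn_ &l1var.n stdt0 times1 times2 case1 case2 cs: st_: _0&dtout. _1&dtout. _1outs;
quit;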
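To look at the result, the macro can be called on the sample data from section 2 and the output printed. This is only a usage sketch: it assumes the code above has been wrapped in a %macro AESOCPT definition and compiled, and the label text is my own choice.

```sas
/* Assumes AESOCPT is compiled and the adae/adsl sample data exist in WORK */
%AESOCPT(libin=work, dtin=adae, adsl=adsl, usubjid=usubjid,
         l1var=soc, l2var=pt, grpvarn=armn,
         rowsumyn=Y, colsumyn=N,
         libout=work, dtout=table);

/* C0 holds the SOC or indented PT label; C1 onward hold the count columns */
proc print data=work.table noobs label;
  label C0 = "SOC / PT";
run;
```

With colsumyn=N the overall total row (SOC number 0) is dropped, so the listing starts with the first SOC.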
That is all. If I have missed anything, corrections are welcome.