1. Preparation

autodock-gpu

Install CUDA and set the environment variables (adjust the paths to your CUDA version, 11.2 here):

export GPU_INCLUDE_PATH=/usr/local/cuda-11.2/include
export GPU_LIBRARY_PATH=/usr/local/cuda-11.2/lib64
export PATH="/usr/local/cuda-11.2/bin:$PATH"
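Before building, it can help to confirm that these variables point at real directories and that nvcc is on PATH. The following is a minimal POSIX-shell sketch; the function names check_dir_var and check_cuda_env are hypothetical, and the cuda.h check assumes the standard CUDA toolkit layout.

```shell
# Hypothetical helper: fail if the named variable ($1) is empty or its value ($2)
# is not an existing directory.
check_dir_var() {
  if [ -z "$2" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "error: $1=$2 is not a directory" >&2
    return 1
  fi
}

# Hypothetical helper: sanity-check the CUDA variables exported above.
check_cuda_env() {
  check_dir_var GPU_INCLUDE_PATH "${GPU_INCLUDE_PATH:-}" || return 1
  check_dir_var GPU_LIBRARY_PATH "${GPU_LIBRARY_PATH:-}" || return 1
  # Assumes the usual toolkit layout, where cuda.h lives in the include directory.
  if [ ! -f "$GPU_INCLUDE_PATH/cuda.h" ]; then
    echo "error: cuda.h not found in $GPU_INCLUDE_PATH" >&2
    return 1
  fi
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "error: nvcc is not on PATH" >&2
    return 1
  fi
  echo "CUDA environment OK (nvcc: $(command -v nvcc))"
}
```

Run `check_cuda_env` in the same shell after the export lines; a non-zero exit status means one of the paths needs fixing before running make.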

Compilation for different GPU devices

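Choose the compute capability (ComputeCapability) that matches your GPU; see the table at:
https://en.wikipedia.org/wiki/CUDA#GPUs_supported

For example, the RTX 3090 has compute capability 8.6, so TARGETS should be 86.

You can either edit the following line in the Makefile, or pass TARGETS="86" as an argument to make: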

TARGETS = 52 60 61 70 86
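Turning a compute capability into a TARGETS entry is just a matter of dropping the dot (8.6 becomes 86). A small shell sketch that does this for one or more GPUs; the function name cc_to_targets is hypothetical.

```shell
# Hypothetical helper: turn compute capabilities ("8.6") into TARGETS entries ("86").
# Reads one capability per line on stdin and prints a sorted, de-duplicated,
# space-separated list suitable for TARGETS="...".
cc_to_targets() {
  tr -d '. \r' | grep -E '^[0-9]+$' | sort -u -n | paste -s -d ' ' -
}
```

For example, `printf '8.6\n7.0\n8.6\n' | cc_to_targets` prints `70 86`. On machines whose driver is recent enough for nvidia-smi to support the `compute_cap` query field (an assumption; older drivers lack it), the value can be taken from the installed GPUs directly: `make DEVICE=CUDA NUMWI=256 TARGETS="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | cc_to_targets)"`. Otherwise, look the capability up in the Wikipedia table.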

Related issues:
https://github.com/ccsb-scripps/AutoDock-GPU/issues/172
https://github.com/ccsb-scripps/AutoDock-GPU/issues/180
https://github.com/ccsb-scripps/AutoDock-GPU/issues/249

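Check whether nvcc supports the target compute capability

In the output below, nvcc lists compute_86 among its allowed values, so TARGETS="86" can be used. If the requested architecture is not listed, make fails with an error.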

$ nvcc --help
...
--gpu-code <code>,...                           (-code)                         
        Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
        nvcc embeds a compiled code image in the resulting executable for each specified
        <code> architecture, which is a true binary load image for each 'real' architecture
        (such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
        During runtime, such embedded PTX code is dynamically compiled by the CUDA
        runtime system if no binary load image is found for the 'current' GPU.
        Architectures specified for options '--gpu-architecture' and '--gpu-code'
        may be 'virtual' as well as 'real', but the <code> architectures must be
        compatible with the <arch> architecture.  When the '--gpu-code' option is
        used, the value for the '--gpu-architecture' option must be a 'virtual' PTX
        architecture.
        For instance, '--gpu-architecture=compute_60' is not compatible with '--gpu-code=sm_52',
        because the earlier compilation stages will assume the availability of 'compute_60'
        features that are not present on 'sm_52'.
        Note: the values compute_30, compute_32, compute_35, compute_37, compute_50,
        sm_30, sm_32, sm_35, sm_37 and sm_50 are deprecated and may be removed in
        a future release.
        Allowed values for this option:  'compute_35','compute_37','compute_50',
        'compute_52','compute_53','compute_60','compute_61','compute_62','compute_70',
        'compute_72','compute_75','compute_80','compute_86','lto_35','lto_37','lto_50',
        'lto_52','lto_53','lto_60','lto_61','lto_62','lto_70','lto_72','lto_75',
        'lto_80','lto_86','sm_35','sm_37','sm_50','sm_52','sm_53','sm_60','sm_61',
        'sm_62','sm_70','sm_72','sm_75','sm_80','sm_86'.
...
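Rather than reading the help text by eye, each TARGETS entry can be checked against the "Allowed values" list. This is a shell sketch under the assumption that `nvcc --help` prints the allowed values quoted as in the output above; the function names and the NVCC override variable are hypothetical.

```shell
# Hypothetical helper: succeed if nvcc lists 'compute_<target>' in its --help output.
# NVCC is an optional, hypothetical override for the nvcc path (default: nvcc on PATH).
nvcc_supports_target() {
  local help_text
  help_text=$("${NVCC:-nvcc}" --help 2>/dev/null) || return 1
  case "$help_text" in
    *"'compute_$1'"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check every entry of a TARGETS string such as "52 60 61 70 86".
check_targets() {
  local t status=0
  for t in $1; do
    if nvcc_supports_target "$t"; then
      echo "compute_$t: supported"
    else
      echo "compute_$t: NOT supported by this nvcc"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_targets "52 60 61 70 86"` before make reports each architecture; a non-zero status means at least one entry should be removed from TARGETS (or a newer CUDA toolkit is needed).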

2. Compilation

The first step is to set the environment variables GPU_INCLUDE_PATH and GPU_LIBRARY_PATH (see section 1),
as described here: https://github.com/ccsb-scripps/AutoDock-GPU/wiki/Guideline-for-users

Template

make DEVICE=<TYPE> NUMWI=<NWI>

Example

make DEVICE=CUDA NUMWI=256 OVERLAP=ON TARGETS="86"

| Parameter | Description | Values |
|-----------|-------------|--------|
| `<TYPE>` | Accelerator chosen | `CPU`, `GPU`, `CUDA`, `OCLGPU` |
| `<NWI>` | Work-group/thread block size (number of work-items, wi) | `1`, `2`, `4`, `8`, `16`, `32`, `64`, `128`, `256` |
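The table above constrains the allowed DEVICE and NUMWI values. A shell sketch that validates them and prints the corresponding make command (without running it); the function name autodock_make_cmd is hypothetical, and TARGETS is treated as an optional third argument.

```shell
# Hypothetical helper: validate DEVICE/NUMWI against the table above and print
# the matching make command line. It only prints; it does not run make.
autodock_make_cmd() {
  local device="$1" numwi="$2" targets="${3:-}" cmd
  case "$device" in
    CPU|GPU|CUDA|OCLGPU) ;;
    *) echo "invalid DEVICE: $device" >&2; return 1 ;;
  esac
  case "$numwi" in
    1|2|4|8|16|32|64|128|256) ;;
    *) echo "invalid NUMWI: $numwi" >&2; return 1 ;;
  esac
  cmd="make DEVICE=$device NUMWI=$numwi"
  # Optional CUDA compute capabilities, e.g. "86".
  if [ -n "$targets" ]; then
    cmd="$cmd TARGETS=\"$targets\""
  fi
  echo "$cmd"
}
```

For example, `autodock_make_cmd CUDA 256 86` prints `make DEVICE=CUDA NUMWI=256 TARGETS="86"`, which matches the example above without OVERLAP=ON.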

When DEVICE=GPU is chosen, the Makefile automatically tests whether it can compile CUDA successfully. To override this detection, use DEVICE=CUDA or DEVICE=OCLGPU. The CPU target is supported only through OpenCL. Furthermore, an OpenMP-enabled overlapped pipeline (for setup and processing) can be compiled with OVERLAP=ON.

Hints: The best work-group size depends on the GPU and workload. Try NUMWI=128 or NUMWI=64 for modern cards with the example workloads. On macOS, use NUMWI=1 for CPUs.

After successful compilation, the host binary autodock_<type>_<N>wi is placed under bin.

| Binary-name portion | Description | Values |
|---------------------|-------------|--------|
| `<type>` | Accelerator chosen | `cpu`, `gpu` |
| `<N>` | Work-group/thread block size | `1`, `2`, `4`, `8`, `16`, `32`, `64`, `128`, `256` |
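To predict where a given build lands, the naming scheme autodock_<type>_<N>wi can be derived from the make parameters. This shell sketch assumes that DEVICE=CPU yields the `cpu` type and every GPU device (GPU, CUDA, OCLGPU) yields `gpu`; the function name autodock_binary_path is hypothetical.

```shell
# Hypothetical helper: map DEVICE/NUMWI to the expected binary path,
# following the scheme autodock_<type>_<N>wi described above.
autodock_binary_path() {
  local device="$1" numwi="$2" type
  case "$device" in
    CPU) type=cpu ;;
    GPU|CUDA|OCLGPU) type=gpu ;;
    *) echo "unknown DEVICE: $device" >&2; return 1 ;;
  esac
  echo "bin/autodock_${type}_${numwi}wi"
}
```

For the example build (DEVICE=CUDA NUMWI=256), `autodock_binary_path CUDA 256` prints `bin/autodock_gpu_256wi`, the binary used in the test below.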

3. Testing

$ ./autodock_gpu_256wi --lfile /home/shuzhang/ai/code/moldock/autodock/output/tmpnnuuab_g.pdbqt --ffile /data/autodock/grid/0cb544cb1474ff6d917fe409598886cb/protein.maps.fld --devnum 2 --ngen 1 --nrun 2 --stopstd 1.999
AutoDock-GPU version: v1.5.3-73-gf5cf6ffdd0c5b3f113d5cc424fabee51df04da7e

Running 1 docking calculation

Cuda device:                              NVIDIA GeForce RTX 3090 (#2 / 6)
Available memory on device:               21182 MB (total: 24268 MB)

CUDA Setup time 0.248527s
(Thread 52 is setting up Job #1)

Running Job #1
    Using heuristics: (capped) number of evaluations set to 6122449
    Warning: The set number of evals is 48.98% of the uncapped heuristics estimate of 12500000 evals.
             This means this docking may not be able to converge. Increasing --heurmax may improve
             convergence but will also increase runtime.
             AutoStop will not stop before 10.50% (643004) of the set number of evaluations.
    Local-search chosen method is: ADADELTA (ad)

Rest of Setup time 0.006511s

Executing docking runs, stopping automatically after either reaching 2.00 kcal/mol standard deviation of
the best molecules of the last 4 * 5 generations, 1 generations, or 6122449 evaluations:

Generations |  Evaluations |     Threshold    |  Average energy of best 10%  | Samples | Best Inter + Intra
------------+--------------+------------------+------------------------------+---------+-------------------
          0 |          150 | 1145.95 kcal/mol |  312.45 +/-  222.27 kcal/mol |       4 |   80.37 kcal/mol
          1 |         9167 | 1145.95 kcal/mol |   91.60 +/-  157.57 kcal/mol |      56 |   -7.85 kcal/mol
------------+--------------+------------------+------------------------------+---------+-------------------

                                   Finished evaluation after reaching
                          9167 evaluations. Best inter + intra    -7.85 kcal/mol.

Docking time 0.002292s

Shutdown time 0.002002s

Job #1 took 0.011 sec after waiting 0.347 sec for setup

(Thread 52 is processing Job #1)
Run time of entire job set (1 file): 0.360 sec
Processing time: 0.002 sec

All jobs ran without errors.
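When runs are scripted, the key numbers can be pulled out of a saved log. This shell sketch assumes the summary line format shown above ("Best inter + intra    -7.85 kcal/mol.") and the final "All jobs ran without errors." line; the function name summarize_autodock_log is hypothetical.

```shell
# Hypothetical helper: print the best "inter + intra" energy (kcal/mol) of every
# finished job found in an AutoDock-GPU log, one value per line, and return
# non-zero unless the log contains "All jobs ran without errors."
summarize_autodock_log() {
  local log="$1"
  grep -o 'Best inter + intra *-\{0,1\}[0-9.]* kcal/mol' "$log" | awk '{print $(NF-1)}'
  grep -q 'All jobs ran without errors\.' "$log"
}
```

For example, saving the output above with `./autodock_gpu_256wi ... | tee dock.log` and then running `summarize_autodock_log dock.log` should print `-7.85` and exit with status 0.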

4. References

https://github.com/ccsb-scripps/AutoDock-GPU

AutoDock - Compilation and Testing