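Usage example
If you need to run jobs in parallel, install GNU Parallel and learn the basic command below; the other options can wait.
for f in /public/project/RNA/airway/raw_fq/*gz; do echo "name=$(basename "$f" .gz); gunzip -c $f > ~/\$name"; done | parallel -j 2
# Parallelized loop: -j sets how many jobs run at the same time
# No need to split files, hand-roll background shell loops, or add conditional checks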
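The same task can also be written with GNU Parallel's own replacement strings, which removes the intermediate echo. This is an illustrative sketch rather than the author's command: it creates two small gzip files in a scratch directory (the file names and the workdir variable are made up) and decompresses them two at a time. {} and {/.} are the replacement strings listed in parallel --help.

```shell
# Hypothetical demo data: two tiny gzip files in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/raw_fq" "$workdir/out"
printf 'read1\n' | gzip -c > "$workdir/raw_fq/sampleA.fastq.gz"
printf 'read2\n' | gzip -c > "$workdir/raw_fq/sampleB.fastq.gz"

# Decompress every *.gz file, two jobs at a time.
# {}   = the full input path
# {/.} = the basename without its last extension (e.g. sampleA.fastq)
parallel -j 2 "gunzip -c {} > $workdir/out/{/.}" ::: "$workdir"/raw_fq/*.gz

ls "$workdir/out"
```

Compared with the loop, the input list is passed directly after ::: and the output name is derived by parallel itself.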
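Installation
Installation without administrator rights: download the release tarball and build it under your home directory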
# https://www.gnu.org/software/
# https://www.gnu.org/manual/manual.html
# https://www.gnu.org/software/parallel/
# http://ftp.gnu.org/gnu/parallel/
wget -c https://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2
tar -jxvf parallel-latest.tar.bz2
cd parallel-20190622
cat README
mkdir $HOME/parallel
./configure --prefix=$HOME/parallel && make && make install
# $HOME/parallel/ is the custom installation path
$HOME/parallel/bin/parallel --help
The output of configure, make and make install looks like this:
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether ln -s works... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating config.h
make all-recursive
make[1]: Entering directory '/home/qmcui/parallel-20190622'
Making all in src
make[2]: Entering directory '/home/qmcui/parallel-20190622/src'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/qmcui/parallel-20190622/src'
make[2]: Entering directory '/home/qmcui/parallel-20190622'
make[2]: Leaving directory '/home/qmcui/parallel-20190622'
make[1]: Leaving directory '/home/qmcui/parallel-20190622'
Making install in src
make[1]: Entering directory '/home/qmcui/parallel-20190622/src'
make[2]: Entering directory '/home/qmcui/parallel-20190622/src'
/bin/mkdir -p '/home/qmcui/parallel/bin'
/usr/bin/install -c parallel sql niceload parcat parset env_parallel env_parallel.ash env_parallel.bash env_parallel.csh env_parallel.dash env_parallel.fish env_parallel.ksh env_parallel.mksh env_parallel.pdksh env_parallel.sh env_parallel.tcsh env_parallel.zsh '/home/qmcui/parallel/bin'
make install-exec-hook
make[3]: Entering directory '/home/qmcui/parallel-20190622/src'
rm /home/qmcui/parallel/bin/sem || true
rm: cannot remove '/home/qmcui/parallel/bin/sem': No such file or directory
ln -s parallel /home/qmcui/parallel/bin/sem
make[3]: Leaving directory '/home/qmcui/parallel-20190622/src'
/bin/mkdir -p '/home/qmcui/parallel/share/doc/parallel'
/usr/bin/install -c -m 644 parallel.html env_parallel.html sem.html sql.html niceload.html parallel_tutorial.html parallel_book.html parallel_design.html parallel_alternatives.html parcat.html parset.html parallel.texi env_parallel.texi sem.texi sql.texi niceload.texi parallel_tutorial.texi parallel_book.texi parallel_design.texi parallel_alternatives.texi parcat.texi parset.texi parallel.pdf env_parallel.pdf sem.pdf sql.pdf niceload.pdf parallel_tutorial.pdf parallel_book.pdf parallel_design.pdf parallel_alternatives.pdf parcat.pdf parset.pdf parallel_cheat.pdf '/home/qmcui/parallel/share/doc/parallel'
/bin/mkdir -p '/home/qmcui/parallel/share/man/man1'
/usr/bin/install -c -m 644 parallel.1 env_parallel.1 sem.1 sql.1 niceload.1 parcat.1 parset.1 '/home/qmcui/parallel/share/man/man1'
/bin/mkdir -p '/home/qmcui/parallel/share/man/man7'
/usr/bin/install -c -m 644 parallel_tutorial.7 parallel_book.7 parallel_design.7 parallel_alternatives.7 '/home/qmcui/parallel/share/man/man7'
make[2]: Leaving directory '/home/qmcui/parallel-20190622/src'
make[1]: Leaving directory '/home/qmcui/parallel-20190622/src'
make[1]: Entering directory '/home/qmcui/parallel-20190622'
make[2]: Entering directory '/home/qmcui/parallel-20190622'
make[2]: Nothing to be done for 'install-exec-am'.
make[2]: Nothing to be done for 'install-data-am'.
make[2]: Leaving directory '/home/qmcui/parallel-20190622'
make[1]: Leaving directory '/home/qmcui/parallel-20190622'
parallel --help
Usage:
parallel [options] [command [arguments]] < list_of_arguments
parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
cat ... | parallel --pipe [options] [command [arguments]]
-j n Run n jobs in parallel
-k Keep same order
-X Multiple arguments with context replace
--colsep regexp Split input on regexp for positional replacements
{} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
{3} {3.} {3/} {3/.} {=3 perl code =} Positional replacement strings
With --plus: {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
{+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
-S sshlogin Example: foo@server.example.com
--slf .. Use ~/.parallel/sshloginfile as the list of sshlogins
--trc {}.bar Shorthand for --transfer --return {}.bar --cleanup
--onall Run the given command with argument on all sshlogins
--nonall Run the given command with no arguments on all sshlogins
--pipe Split stdin (standard input) to multiple jobs.
--recend str Record end separator for --pipe.
--recstart str Record start separator for --pipe.
See 'man parallel' for details
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
DOI https://doi.org/10.5281/zenodo.1146014
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
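Configuring the PATH environment variable
vim ~/.bashrc   # add the line below, then save
export PATH=$HOME/parallel/bin:$PATH
. ~/.bashrc     # reload the settings in the current shell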
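If you would rather not open an editor, the same change can be scripted. A minimal sketch, assuming the install prefix $HOME/parallel used above; the helper name add_path_line and the demo_rc variable are made up for this illustration, and the demo writes to a scratch file instead of your real ~/.bashrc.

```shell
# add_path_line FILE DIR: append an export line for DIR to FILE
# unless exactly that line is already present.
add_path_line() {
    rc_file=$1
    bin_dir=$2
    line="export PATH=$bin_dir:\$PATH"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}

# Demo on a scratch file; pass ~/.bashrc instead to apply it for real.
demo_rc=$(mktemp)
add_path_line "$demo_rc" "$HOME/parallel/bin"
add_path_line "$demo_rc" "$HOME/parallel/bin"   # second call changes nothing
cat "$demo_rc"
```

Checking for the line first keeps ~/.bashrc from accumulating duplicate entries when the setup is run more than once.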
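Removing the parallel notice
This step is optional. If you are not comfortable reading code, skip it rather than editing the script at random.
vim $HOME/parallel/bin/parallel # or: vim parallel
# Edit as follows so that the notice is not printed on every run
# Delete it, then save the file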
Examples
$ parallel echo ::: a b c d e | tee a.txt
a
b
c
d
e
$ parallel echo ::: A B C ::: D E F | tee b.txt
A D
A E
A F
B D
B E
B F
C D
C E
C F
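Jobs run concurrently, so on a busy machine the lines can arrive in a different order than the listing above. A small sketch of -k (keep order, listed in parallel --help), which prints results in input order; the variable names are only for illustration.

```shell
# Two input sources produce every combination (3 x 3 = 9 jobs).
# -k prints the results in input order even if jobs finish out of order.
combos=$(parallel -k echo ::: A B C ::: D E F)
printf '%s\n' "$combos"

# Each job sleeps for a different time; the first argument takes longest.
# With -k the output still follows the argument order 0.3, 0.2, 0.1.
ordered=$(parallel -k 'sleep {}; echo slept {}' ::: 0.3 0.2 0.1)
printf '%s\n' "$ordered"
```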
qmcui 12:23:41 ~/parallel/bin
$ parallel -a a.txt -a b.txt echo
a A D
a A E
a A F
a B D
a B E
a B F
a C D
a C E
a C F
......
e C E
e C F
# Same as: cat a.txt | parallel -a - -a b.txt echo
# "-" stands for standard input; it is a placeholder for the piped data
# Same as: cat a.txt | parallel echo :::: - b.txt
# Same as: parallel echo ::: a b c d e :::: b.txt
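The equivalence of these forms can be checked directly. The sketch below uses two small scratch files (their contents and the tmp variable are made up for the demo) and adds -k so that the outputs are comparable line by line.

```shell
tmp=$(mktemp -d)
printf '%s\n' a b > "$tmp/a.txt"
printf '%s\n' 1 2 > "$tmp/b.txt"

# Form 1: one -a per input file.
out1=$(parallel -k -a "$tmp/a.txt" -a "$tmp/b.txt" echo)
# Form 2: :::: followed by the input files.
out2=$(parallel -k echo :::: "$tmp/a.txt" "$tmp/b.txt")
# Form 3: first source read from stdin ("-"), second from a file.
out3=$(cat "$tmp/a.txt" | parallel -k echo :::: - "$tmp/b.txt")
# Form 4: literal values after ::: combined with a file after ::::
out4=$(parallel -k echo ::: a b :::: "$tmp/b.txt")

printf '%s\n' "$out1"
```

All four variables should hold the same four combinations, with the first input source varying slowest.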
# GNU Parallel uses --no-run-if-empty to skip empty lines:
qmcui 12:32:43 ~/parallel/bin
$ (echo 1; echo; echo 2) | parallel --no-run-if-empty echo
1
2
qmcui 12:32:45 ~/parallel/bin
$ (echo 1; echo; echo 2) | parallel echo
1
# (an empty line is printed here: without the flag, echo also runs for the empty input line)
2
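Because a blank output line is easy to overlook in a terminal, counting the output lines makes the difference explicit. A minimal sketch; the variable names are illustrative, and tr only strips the padding that some wc implementations add.

```shell
# Three input lines, the middle one empty.
# Without the flag, echo runs three times (once with an empty argument).
with_empty=$( (echo 1; echo; echo 2) | parallel -k echo | wc -l | tr -d ' ')

# With --no-run-if-empty the empty line starts no job at all.
skip_empty=$( (echo 1; echo; echo 2) | parallel -k --no-run-if-empty echo | wc -l | tr -d ' ')

echo "output lines without the flag: $with_empty"
echo "output lines with the flag:    $skip_empty"
```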
Options explained
Usage:
parallel [options] [command [arguments]] < list_of_arguments
parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
cat ... | parallel --pipe [options] [command [arguments]]
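Common options:
::: followed by arguments given on the command line
:::: followed by files that contain the arguments
-j, --jobs number of jobs to run in parallel
-N number of arguments passed to each job
--xargs put as many arguments as possible on one command line
--xapply take one argument (or one line of a file) from each input source
--header use the first value of each input source as the parameter name
-m with multiple arguments, do not repeat the surrounding "context" text for each argument
-X the opposite of -m: repeat the context text for each argument
-q quote the command that follows
--trim lr strip whitespace from both ends of each argument; only spaces are removed, not newlines or tabs
-k, --keep-order force the output to follow the order of the arguments
--tmpdir / --results both save output to files, but the latter stores it in a structured directory layout
--delay delay the start of each job
--halt stop the run when jobs fail or succeed
--pipe split the input (stdin) into multiple blocks
--block set the size of each block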
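A few of the options above are easier to grasp from their output. The following sketch exercises -N, --xapply and --pipe with --block on made-up inputs; seq and awk are assumed to be available, and -k is added only to make the output order predictable.

```shell
# -N 2: pass two arguments to each job.
pairs=$(parallel -k -N 2 echo ::: a b c d)
printf '%s\n' "$pairs"

# --xapply: take one argument from each input source
# instead of building every combination.
linked=$(parallel -k --xapply echo ::: a b c ::: 1 2 3)
printf '%s\n' "$linked"

# --pipe with --block: split stdin into chunks of roughly 1 kB,
# count the lines of each chunk in its own job, then add the counts up.
total=$(seq 1 1000 | parallel --pipe --block 1k wc -l | awk '{ sum += $1 } END { print sum }')
echo "$total"
```

The per-chunk counts depend on where the blocks are cut, but their sum always equals the number of input lines, which is why the example checks the total rather than the individual chunks.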
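Common usage:
Write all commands into a text file work.sh, one per line, so that each line could be copied and run on the Linux command line by itself. The file does not need execute permission.
The advantage is that you do not have to split the file and submit the pieces separately, so you avoid the situation where some pieces finish early while others are still running.
cat work.sh | parallel -j 4
# 4 jobs run in parallel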
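A self-contained version of this pattern, for trying it out. The contents of work.sh and the job_dir variable are hypothetical; the redirection < work.sh is equivalent to cat work.sh | parallel.

```shell
job_dir=$(mktemp -d)

# Hypothetical work.sh: one complete shell command per line.
cat > "$job_dir/work.sh" <<EOF
echo step1 > $job_dir/1.txt
echo step2 > $job_dir/2.txt
sleep 0.1; echo step3 > $job_dir/3.txt
echo step4 | tr a-z A-Z > $job_dir/4.txt
EOF

# Run the lines as independent jobs, at most 4 at a time.
# The file is only read, never executed, so it needs no execute permission.
parallel -j 4 < "$job_dir/work.sh"

cat "$job_dir"/[1-4].txt
```

As soon as one line finishes, parallel starts the next one, so all four slots stay busy until the file is exhausted.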
Further reading
- https://www.gnu.org/software/parallel/man.html
- https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice
- sem: http://www.gnu.org/software/parallel/sem.html
- //www.greatytc.com/p/cc54a72616a1