理解顺序过程间通讯(CSP)

Understand Communicating Sequencial Processes

主要内容为论文解读：1978年ACM Communicating Sequential Process
此论文主要是一个编程语言的实现，并通过程序样例来举例不同的问题如何解决。

背景

程序的常见结构

并行的兴起

引入并行：硬件（CDC 6600 并行单元）到软件（I/O 控制包、多编程操作系统）
semaphore（ Dutch computer scientist Edsger Dijkstra in 1962 or 1963 提出）
公共存储的检查与更新：Algol 68, PL/I, 各种不同的机器码（开销较大）
events: PL/I
条件临界区
monitors and queues: Concurrent Pascal
path expression

Motivation: 并行的实现太多且繁杂，以上多种方法没有一个统一的选择标准，因此论文作者提出此编程语言来整合这些并行的实现（通过6个结构囊括并行的实现）

内容

一种新的设计

Dijkstra's guarded command
- 条件分支，类似 Go 的 Select 语句
- 主要的区别：如果多个条件为真，则随机选择一个分支执行
Dijkstra's parbegin parallel command
- 启动并发过程
input and output command
- 过程间通信手段；通信值通过复制进行，而不是共享；没有缓存；指令推迟到其他的过程能够处理该消息
- 等价于 Go 的无缓存的 channel
input + guarded command
- 等价于 Go 的 select 语句，条件分支仅当 input 源 ready 时开始执行
- 多个 input guards 同时就绪，只随机选择一个执行，其他 guards 没有影响
repetitive command
- 循环；终止性取决于所有的 guards 是否已经终止
pattern-matching
- 做条件判断使用（比如如何判断两个 bool 是否相等）

概念与记号

程序结构指令

<cmd>            ::= <simple cmd> | <structured cmd>
<simple cmd>     ::= <assignment cmd> | <input cmd> | <output cmd>
<structured cmd> ::= <alternative cmd> | <repetitive cmd> | <parallel cmd>
<null cmd>       ::= skip
<cmd list>       ::= {<declaration>; | <cmd>; } <cmd>

并行指令

<parallel cmd>     ::= [<proc>{||<proc>}]
<proc>             ::= <proc label> <cmd list>
<proc label>       ::= <empty> | <identifier> :: | <identifier>(<label subscript>{,<label subscript>}) ::
<label subscript>  ::= <integer const> | <range>
<integer constant> ::= <numeral> | <bound var>
<bound var>        ::= <identifier>
<range>            ::= <bound variable>:<lower bound>..<upper bound>
<lower bound>      ::= <integer const>
<upper bound>      ::= <integer const>

并行语句举例——循环构造并行语句

其他并行语句举例

(1) [cardreader ? cardimage || lineprinter ! lineimage]
    // 两个并行过程
(2) [west::DISASSEMBLE || X::SQUASH || east::ASSEMBLE]
    // 三个并行过程
(3) [room::ROOM || fork(i:0..4)::FORK || phil(i:0..4)::PHIL]
    // 十一个个并行过程，其中第二个和第三个并行过程分别包含五个并行的子过程

赋值指令

<assignment cmd>    ::= <target var> := <expr>
<expr>              ::= <simple expr> | <structured expr>
<structured expr>   ::= <constructor> ( <expr list> )
<constructor>       ::= <identifier> | <empty>
<expr list>         ::= <empty> | <expr> {, <expr> }
<target var>        ::= <simple var> | <structured target>
<structured target> ::= <constructor> ( <target var list> )
<target var list>   ::= <empty> | <target var> {, <target var> }

举例

(1) x := x + 1                  // 普通的赋值
(2) (x, y) := (y, x)            // 普通的赋值
(3) x := cons(left, right)      // 结构体赋值
(4) cons(left, right) := x      // 如果 x 是一个结构体 cons(y, z)，则赋值成功 left:=y, right:=z，否则失败
(5) insert(n) := insert(2*x+1)  // 等价于 n := 2*x + 1
(6) c := P()                    // 空结构体, 或称‘信号’（比如Go里面的 struct{}{} 充当一个信号，并不会实际占用内存）
(7) P() := c                    // 如果 c 同为信号，则无实际效果，否则失败
(8) insert(n) := has(n)         // 失败。不允许不匹配的结构体之间进行赋值

输入与输出指令

<input cmd>   ::= <source> ? <target var>
<output cmd>  ::= <destination> ! <expr>
<source>      ::= <proc name>
<destination> ::= <proc name>
<proc name>   ::= <identifier> | <identifier> ( <subscripts> )
<subscripts>  ::= <integer expr> {, <integer expr> }

举例

(1) cardreader?cardimage      // 类似于 cardimage <- cardreader
(2) lineprinter!lineimage     // 类似于 lineprinter := <- lineimage
(3) X?(x,y)                   // 从过程 X 接受两个值，并赋值给 x 和 y
(4) DIV!(3*a + b, 13)         // 向过程 DIV 发送两个值，分别为 3*a+b 和 13  
(5) console(i)?c              // 向第 i 个 console 发送值 c
(6) console(j-1)!"A"          // 向第 j-1 个 console 发送字符 "A"
(7) X(i)?V()                  // 从第 i 个 X 接受一个信号 V
(8) sem!P()                   // 向 sem 发送一个信号 P

通过 (3) 和 (4)，思考：[X :: DIV!(3*a+b, 13) || DIV :: X?(x,y)] 表达什么意思？
X被定义为 DIV 指令，DIV 被定义为 X 指令；X是图个输出指令，DIV 是一个输入指令；看上去是并发执行，但是实际上是顺序执行，X 首先执行完，把 3*a+b, 13 输出给 DIV，然后 DIV 再接受 X 的两个值 x y。

选择与重复指令

<repetitive cmd>  ::= * <alternative cmd>
<alternative cmd> ::= [<guarded cmd> { □ <guarded cmd> }]
<guarded cmd>     ::= <guard> → <cmd list> | ( <range> {, <range> }) <guard> → <cmd list>
<guard>           ::= <guard list> | <guard list>;<input cmd> | <input cmd>
<guard list>      ::= <guard elem> {; <guard elem> }
<guard elem>      ::= <boolean expr> | <declaration>

举例

(1) [x ≥ y → m := x □ y ≥ x → m := y] 
    // 如果某个条件成立，则执行 → 后的语句；若均成立则随机选择一个执行
(2) i := 0; *[i < size; content(i) ≠ n → i := i+1] 
    // 依次检查 content 中的值是否与 n 相同
(3) *[c:character; west?c → east!c] 
    // 从 west 复制一个字符串并输出到 east 中
(4) *[(i:1..10)continue(i); console(i)?c → X!(i,c); console(i)!ack(); continue(i) := (c ≠ sign off)]
    // 监听来自 10 个 console 的值，并将其发送给 X，当收到 sign off 时终止监听
(5) *[n:integer; X?insert(n) → INSERT □ n:integer; X?has(n) → SEARCH; X!(i<size)]
    // insert 操作: 执行 INSERT
    // has 操作: 执行 SEARCH，并当 i < size 时，向 X 发送 true，否则发送 false
(6) *[X?V() → val := val+1 □ val > 0; Y?P() → val := val-1]
    // V 操作: val++
    // P 操作: val > 0 时, val--，否则失败

算符

顺序算符 ; 
并行算符 ||  
赋值算符 := 
输入算符 ? （发送）
输出算符 ! （接受）
选择算符 □ 
守卫算符 → 
重复算符 *

协程（Coroutine）

S31 COPY

程序描述：编写一个过程 X，复制从 west 过程输出的字符到 east 过程

COPY :: *[c:character; west?c → east!c]

该语句等价于以下 go 语句：

func S31_COPY(west, east chan rune) {
    for c := range west {
        east <- c
    }
    close(east)
}

S32 SQUASH

程序描述：调整前面的程序，替换成对出现的「**」为「↑」，假设最后一个字符不是「*」。

SQUASH :: *[c:character; west?c → 
   [ c != asterisk → east!c
    □ c = asterisk → west?c;
          [ c != asterisk → east!asterisk; east!c
           □ c = asterisk → east!upward arrow
   ] ]    ]

练习：调整上面的程序，处理最后以奇数个「*」结尾的输入数据。
SQUASH_EX :: *[c:character; west?c → 
   [ c != asterisk → east!c
    □ c = asterisk → west?c;
          [ c != asterisk → east!asterisk; east!c
           □ c = asterisk → east!upward arrow
+         ] □ east!asterisk
   ]    ]

该程序等价于以下 Go 程序：

func S32_SQUASH_EX(west, east chan rune) {
    for {
        c, ok := <-west
        if !ok {
            break
        }
        if c != '*' {
            east <- c
        }
        if c == '*' {
            c, ok = <-west
            if !ok {
+               east <- '*'
                break
            }
            if c != '*' {
                east <- '*'
                east <- c
            }
            if c == '*' {
                east <- '↑'
            }
        }
    }
    close(east)
}

S33 DISASSEMBLE

程序描述：从卡片盒中读取卡片上的内容，并以流的形式将它们输出到过程 X ，并在每个卡片的最后插入一个空格。

DISASSEMBLE::
*[cardimage:(1..80)characters; cardfile?cardimage →
    i:integer; i := 1;
    *[i <= 80 → X!cardimage(i); i := i+1 ]
    X!space
]

等价于以下 Go 语言程序：

func S33_DISASSEMBLE(cardfile chan []rune, X chan rune) {
    cardimage := make([]rune, 0, 80)
    for tmp := range cardfile {
        if len(tmp) > 80 {
            cardimage = append(cardimage, tmp[:80]...)
        } else {
            cardimage = append(cardimage, tmp[:len(tmp)]...)
        }
        for i := 0; i < len(cardimage); i++ {
            X <- cardimage[i]
        }
        X <- ' '
        cardimage = cardimage[:0]
    }
    close(X)
}

S34 ASSEMBLE

程序描述：将一串流式字符串从过程 X 打印到每行 125 个字符的 lineprinter 上。必要时将最后一行用空格填满。

ASSEMBLE::
lineimage:(1..125)character;
i:integer, i:=1;
*[c:character; X?c →
    lineimage(i) := c;
    [i <= 124 → i := i+1
    □ i = 125 → lineprinter!lineimage; i:=1
]   ];
[ i = 1 → skip
□ i > 1 → *[i <= 125 → lineimage(i) := space; i := i+1];
  lineprinter!lineimage
]

等价于以下 Go 语言程序：

func S34_ASSEMBLE(X chan rune, lineprinter chan string) {
    lineimage := make([]rune, 125)

    i := 0
    for c := range X {
        lineimage[i] = c
        if i <= 124 {
            i++
        }
        if i == 125 {
            lineimage[i-1] = c
            lineprinter <- string(lineimage)
            i = 0
        }
    }
    if i > 0 {
        for i <= 124 {
            lineimage[i] = ' '
            i++
        }
        lineprinter <- string(lineimage)
    }

    close(lineprinter)
    return
}

S35 REFORMAT

程序描述：从长度为 80 的卡片上进行读取，并打印到长度为 125 个字符的 lineprinter 上。每个卡片必须跟随一个额外的空格，最后一行须使用空格进行填充。

REFORMAT::
[west::DISASSEMBLE || X::COPY || east::ASSEMBLE]

等价于以下 Go 语言程序：

func S35_Reformat(cardfile chan []rune, lineprinter chan string) {
    west, east := make(chan rune), make(chan rune)
    go S33_DISASSEMBLE(cardfile, west)
    go S31_COPY(west, east)
    S34_ASSEMBLE(east, lineprinter)
}

S36 Conway's Problem

程序描述：调整前面的程序，使用「↑」替换每个成对出现的「*」。

CONWAY::
[west::DISASSEMBLE || X::SQUASH || east::ASSEMBLE]

等价于以下 Go 语言程序：

func S35_ConwayProblem(cardfile chan []rune, lineprinter chan string) {
    west, east := make(chan rune), make(chan rune)
    go S33_DISASSEMBLE(cardfile, west)
    go S32_SQUASH_EX(west, east)
    S34_ASSEMBLE(east, lineprinter)
}

子程、数据表示与递归（Subroutine\Data Representation\Recursive）

不展开阐述例子，可以直接看论文，以及对应的Go实现

子程

[subr:SUBROUTINE || X::USER] // 子程和用户程序并行执行

SUBROUTINE::[X?(value params) → …; X!(result params)]
// 子程从用户程序读取一个输入，执行对应分支，最后输出给用户程序结果
USER::[ …; subr!(arguments); …; subr?(results)]
// 用户程序输出给子程一个参数，从子程读取一个输入（即返回的结果）

子程分为一个子程序和一个用户程序。用户往子程发送一个输入，监听子程获取输出；子程则监听用户程序的输入，并执行语句，最后输出给用户程序。
可以视为Go语言里面，子程序和用户程序通过 channel 来传递数据。

例子：

S41 带余除法
S42 阶乘
S43 S44 S45 S46 集合：实现一个集合的 insert 与 has 方法

数据表示

*[X? method1(value params) → … 
□ X? method2(value params) → … ]

数据表示可以视为一个具有多入口的子过程，根据 guarded command 进行分支选择。
可以参考 Go 的 Select 实现：满足 case 条件，则执行一个分支。

递归

[recsub(0)::USER || recsub(i:1..reclimit):: RECSUB]
USER::recsub(1)!(arguments); … ; recsub(1)?(results);

递归可以通过一个子程数组进行模拟，用户过程（零号子程）向一号子程发送必要的参数，再从一号子程接受递归后的结果。
用户程序输出给一号子程，一号子程获取输入后，处理并输出给二号子程，二号子程输出给三号子程，...，最后reclimit号子程输出给reclimit-1号子程，...，二号子程输出给一号子程，一号子程输出给用户程序

监控与调度（Subroutine\Data Representation\Recursive）

监控可以被视为与多个用户过程通信的单一过程，且总是能跟用户过程进行通信。例如：

*[(i:1..100)X(i)?(value param) → …; X(i)!(results)]

当两个用户过程同时选择一个 X(i) 时，guarded cmd(顺序执行) 保护了监控结果不会被错误地发送到错误的用户过程中。
例子：S51 Buffered Channel；S52 信号量；S53 哲学家进餐问题

算法（algorithom）

例子：S61 Eratosthenes 素数筛法；S62 矩阵乘法

反思和总结

这篇论文中讨论的 CSP 设计是 Tony Hoare 的早期提出的设计，与随后将理论完整化后的 CSP（1985）存在两大差异：

缺陷1: 未对 channel 命名

并行过程的构造具有唯一的名词，并以一对冒号作为前缀：[a::P || b::Q || … || c::R]
在过程 P 中，命令 b!v 将 v 输出到名为 b 的过程。该值由在过程 Q 中出现的命令 a?x 输入。
过程名称对于引入它们的并行命令是局部的，并且组件过程间的通信是隐藏的。
虽然其优点是不需要在语言中引入任何 channel 或者 channel 声明的概念。

缺点：

子过程需要知道其使用过程的名称，使得难以构建封装性较好的子过程库
并行过程组合本质上是具有可变数量参数的运算，不能进行简化（见 CSP 1985）

缺陷2: 重复指令的终止性模糊

重复指令默认当每个 guard 均已终止则指令中终止，这一假设过强。具体而言，对于 *[a?x → P □ b?x → Q □ ...]
要求当且仅当输入的所有过程 a,b,… 均终止时整个过程才自动终止。

缺点：

定义和实现起来很复杂
证明程序正确性的方法似乎比没有简单。

一种可能的弱化条件为：直接假设子过程一定会终止。

一些要点

CSP 1978 中描述的编程语言（与 Go 所构建的基于通道的 channel/select 同步机制进行对比）：

channel 没有被显式命名（作者描述了过程之间通信，但是隐藏了具体实现，没有明确命名出channel）
channel 没有缓冲，对应 Go 中 unbuffered channel
buffered channel 不是一种基本的编程源语，并展示了一个使用 unbuffered channel 实现其作用的例子
guarded command 等价于 if 与 select 语句的组合，分支的随机触发性是为了提供公平性保障
guarded command 没有对确定性分支选择与非确定性（即多个分支有效时随机选择）分支选择进行区分
repetitive command 的本质是一个无条件的 for 循环，但终止性所要求的条件太苛刻，不利于理论的进一步形式化
CSP 1978 中描述的编程语言对程序终止性的讨论几乎为零（作者自己在论文中也还没想清楚）
此时与 Actor 模型进行比较，CSP 与 Actor 均在实体间直接通信，区别在于 Actor 支持异步消息通信，而 CSP 1978 是同步通信
…

Q&A

Q: Tony Hoare 提出 CSP 的时代背景是什么？
A: 并行计算的兴起、需要一种形式化的并行编程语言

Q: CSP 1978 理论到底有哪些值得我们研究的地方？
A: 一种面向并行过程的演算代数

Q: CSP 1978 理论是否真的就是我们目前熟知的基于通道的同步方式？
A: 不是，早期的设计中没有显式的对通道进行命名。

Q: CSP 1978 理论的早期设计存在什么样的缺陷？
A: 早期设计中没有对通道进行命名是设计的一个主要缺陷，此外，对程序终止性的条件描述过于模糊，形式化程度不够完善。

进一步阅读的参考文献

完成阅读此论文之后，就可以继续看之后的理论发展。

[Hoare 78]
Hoare, C. A. R. (1978). Communicating sequential processes. Communications of the ACM, 21(8), 666–677.

[Brookes 84]
S. D. Brookes, C. A. R. Hoare, and A. W. Roscoe. 1984. A Theory of Communicating Sequential Processes. J. ACM 31, 3 (June 1984), 560-599.

[Hoare 85]
C. A. R. Hoare. 1985. Communicating Sequential Processes. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

[Milner 82]
R. Milner. 1982. A Calculus of Communicating Systems. Springer-Verlag, Berlin, Heidelberg.

[Fidge 94]
Fidge, C., 1994. A comparative introduction to CSP, CCS and LOTOS. Software Verification Research Centre, University of Queensland, Tech. Rep, pp.93-24.

推荐书籍

理解顺序过程间通讯(CSP)

理解顺序过程间通讯(CSP)

Understand Communicating Sequencial Processes

背景

程序的常见结构

并行的兴起

内容

一种新的设计

概念与记号

程序结构指令

并行指令

赋值指令

输入与输出指令

选择与重复指令

算符

协程（Coroutine）

S31 COPY

S32 SQUASH

S33 DISASSEMBLE

S34 ASSEMBLE

S35 REFORMAT

S36 Conway's Problem

子程、数据表示与递归（Subroutine\Data Representation\Recursive）

子程

数据表示

递归

监控与调度（Subroutine\Data Representation\Recursive）

算法（algorithom）

反思和总结

缺陷1: 未对 channel 命名

缺点：

缺陷2: 重复指令的终止性模糊

缺点：

一些要点

Q&A

进一步阅读的参考文献

相关链接

推荐阅读更多精彩内容