Bash Process Substitution

In addition to the fairly common forms of input/output redirection, the shell recognizes something called process substitution. Although it is not documented as a form of input/output redirection, its syntax and effects are similar.

The syntax for process substitution is:

  <(list)
or
  >(list)
where each list is a command or a pipeline of commands. The effect of process substitution is to make each list act like a file. This is done by giving the list a name in the file system and then substituting that name in the command line. The list is given a name either by connecting it to a named pipe or by using a file in /dev/fd (if supported by the O/S). As a result, the command simply sees a file name and is unaware that it is reading from or writing to a command pipeline.

 

To substitute a command pipeline for an input file the syntax is:

  command ... <(list) ...
To substitute a command pipeline for an output file the syntax is:
  command ... >(list) ...
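
As a minimal sketch of both forms (the sample data and the variable names from_list and sorted are illustrative only, not part of the original text), the following bash snippet passes a list to cat as an input file and to tee as an output file:

```shell
#!/usr/bin/env bash
# Input form: cat is handed a file name (often /dev/fd/N) that yields the list's output.
from_list=$(cat <(printf 'one\ntwo\n'))

# Output form: tee is handed a file name; whatever it writes there becomes sort's stdin.
# sort inherits the stdout of the command substitution, so $( ... ) collects its output
# and does not return until sort has finished.
sorted=$(printf 'b\na\n' | tee >(sort) > /dev/null)

printf '%s\n' "$from_list" "$sorted"
```

In both cases the outer command (cat or tee) only ever sees a file name on its command line.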

 

At first process substitution may seem rather pointless. For example, you might imagine something simple like:

  uniq <(sort a)
to sort a file and then find the unique lines in it, but this is more commonly (and more conveniently) written as:
  sort a | uniq
The power of process substitution comes when you have multiple command pipelines that you want to connect to a single command.

 

For example, given the two files:

  # cat a
  e
  d
  c
  b
  a
  # cat b
  g
  f
  e
  d
  c
  b
To view the lines unique to each of these two unsorted files you might do something like this:
  # sort a | uniq >tmp1
  # sort b | uniq >tmp2
  # comm -3 tmp1 tmp2
  a
        f
        g
  # rm tmp1 tmp2
With process substitution we can do all this with one line:
  # comm -3 <(sort a | uniq) <(sort b | uniq)
  a
        f
        g
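
The same comparison can be tried in a self-contained way. This is only a sketch: it recreates the two sample files in a temporary directory (workdir is a name chosen here for illustration), uses sort -u as a shorthand for sort | uniq, and splits the result into two variables with comm -23 and comm -13 instead of comm -3:

```shell
#!/usr/bin/env bash
# Recreate the sample files a and b in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' e d c b a   > "$workdir/a"
printf '%s\n' g f e d c b > "$workdir/b"

# comm -23 keeps only the lines unique to the first input,
# comm -13 keeps only the lines unique to the second input.
only_in_a=$(comm -23 <(sort -u "$workdir/a") <(sort -u "$workdir/b"))
only_in_b=$(comm -13 <(sort -u "$workdir/a") <(sort -u "$workdir/b"))

echo "only in a: $only_in_a"
echo "only in b:" $only_in_b   # unquoted on purpose, to print f and g on one line

rm -r "$workdir"
```

No temporary tmp1 or tmp2 files are needed; each sorted listing exists only as a pipe for the lifetime of the comm command.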

 

Depending on your shell settings you may get an error message similar to:

  syntax error near unexpected token `('
when you try to use process substitution, particularly if you try to use it within a shell script. Process substitution is not a POSIX-compliant feature, so if bash is running in POSIX mode (for example, because it was invoked as sh) it may have to be enabled via:
  set +o posix
Be careful not to try something like:
  if [[ $use_process_substitution -eq 1 ]]; then
    set +o posix
    comm -3 <(sort a | uniq) <(sort b | uniq)
  fi
The command set +o posix enables not only the execution of process substitution but also the recognition of its syntax. The shell parses the entire if ... fi compound command before executing any part of it, so in the example above it encounters the process substitution syntax before the "set" command has run and still rejects it as illegal.
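
One way around the parsing problem, sketched here under the assumption that the script is run by bash, is to issue set +o posix as a separate top-level command before the compound command that uses process substitution. The variable result and the trivial cat <(echo ok) command are placeholders chosen for this illustration:

```shell
#!/usr/bin/env bash
# Executed on its own, before the shell reads and parses the if block below.
set +o posix

use_process_substitution=1
if [[ $use_process_substitution -eq 1 ]]; then
  # By the time this block is parsed, POSIX mode is already off.
  result=$(cat <(echo ok))
fi
echo "$result"
```

Because bash reads and executes a script one complete command at a time, the set command takes effect before the if block is parsed.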

 

Note that not all shells support process substitution; these examples work with bash.


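Process substitution is very similar to command substitution. Command substitution assigns the result of a command to a variable, as in dir_contents=`ls -al` or xref=$( grep word datafile ). Process substitution feeds the output of one process to another process (in other words, it sends the result of one command to another command).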

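Template for process substitution

A command enclosed in parentheses:

  >(command)
  <(command)

These forms start process substitution. It uses /dev/fd/<n> files to send the results of the process inside the parentheses to another process. [1] (Translator's note: on modern UNIX-like operating systems the /dev/fd/n files correspond to file descriptors; the integer n is the number of a file descriptor open in the running process.)

There must be no space between the "<" or ">" and the parenthesis. Adding a space produces an error.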

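  bash$ echo >(true)
  /dev/fd/63
  bash$ echo <(true)
  /dev/fd/63

Bash creates a pipe with two file descriptors, fIn and fOut. The stdin of true is connected to fOut (dup2(fOut, 0)), and Bash then passes /dev/fd/fIn as an argument to echo. On systems that lack /dev/fd/<n> files, Bash uses temporary named pipes instead. (Thanks, S.C.)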

Process substitution can compare the output of two different commands, or even the output of the same command run with different options.

  bash$ comm <(ls -l) <(ls -al)
  total 12
  -rw-rw-r-- 1 bozo bozo 78 Mar 10 12:58 File0
  -rw-rw-r-- 1 bozo bozo 42 Mar 10 12:58 File2
  -rw-rw-r-- 1 bozo bozo 103 Mar 10 12:58 t2.sh
  total 20
  drwxrwxrwx 2 bozo bozo 4096 Mar 10 18:10 .
  drwx------ 72 bozo bozo 4096 Mar 10 17:58 ..
  -rw-rw-r-- 1 bozo bozo 78 Mar 10 12:58 File0
  -rw-rw-r-- 1 bozo bozo 42 Mar 10 12:58 File2
  -rw-rw-r-- 1 bozo bozo 103 Mar 10 12:58 t2.sh

Using process substitution to compare the contents of two directories (to see which file names are shared and which are not):

  diff <(ls "$first_directory") <(ls "$second_directory")
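
To try the directory comparison without touching real data, here is a small sketch that builds two scratch directories first. The file names File0, File1, File2 and t2.sh and the variable listing_diff are made up for the illustration:

```shell
#!/usr/bin/env bash
# Two scratch directories that share some file names but not all.
first_directory=$(mktemp -d)
second_directory=$(mktemp -d)
touch "$first_directory"/File0  "$first_directory"/File2  "$first_directory"/t2.sh
touch "$second_directory"/File0 "$second_directory"/File1 "$second_directory"/t2.sh

# diff exits with status 1 when its inputs differ, so that status is ignored here.
listing_diff=$(diff <(ls "$first_directory") <(ls "$second_directory")) || true
echo "$listing_diff"

rm -r "$first_directory" "$second_directory"
```

Lines starting with "<" name files found only in the first directory, and lines starting with ">" name files found only in the second.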

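Some other uses of process substitution:

  cat <(ls -l)
  # Same as ls -l | cat

  sort -k 9 <(ls -l /bin) <(ls -l /usr/bin) <(ls -l /usr/X11R6/bin)
  # Lists all the files in the 3 main 'bin' directories, sorted by file name.
  # Note that the output of three (count them: three sets of parentheses)
  # distinct commands is fed to 'sort'.

  diff <(command1) <(command2) # Shows the differences between the output of two commands.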

  tar cf >(bzip2 -c > file.tar.bz2) "$directory_name"
  # Calls "tar cf /dev/fd/?? $directory_name" and "bzip2 -c > file.tar.bz2".
  #
  # Because of the /dev/fd/<n> system feature,
  # the pipe between the two commands does not need to be named.
  #
  # This effect can be emulated.
  #
  mkfifo pipe   # the named pipe has to exist before it can be used
  bzip2 -c < pipe > file.tar.bz2 &
  tar cf pipe "$directory_name"
  rm pipe
  # or
  exec 3>&1
  tar cf /dev/fd/4 "$directory_name" 4>&1 >&3 3>&- | bzip2 -c > file.tar.bz2 3>&-
  exec 3>&-

  # Thanks, Stephane Chazelas
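
The named-pipe emulation can be tried in isolation. The sketch below deliberately swaps tr in for bzip2 and printf in for tar so that it runs without an archive to compress; workdir, pipe and result are names invented for the illustration:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
pipe="$workdir/pipe"
mkfifo "$pipe"

# Reader: plays the role of the command inside >( ... ); it runs in the background.
tr '[:lower:]' '[:upper:]' < "$pipe" > "$workdir/out.txt" &

# Writer: sees nothing but a file name, just as tar does in the example above.
printf 'hello\n' > "$pipe"

wait                        # let the background reader finish
result=$(cat "$workdir/out.txt")
echo "$result"

rm -r "$workdir"
```

With process substitution the same thing collapses to a single command along the lines of printf 'hello\n' > >(tr '[:lower:]' '[:upper:]'), and the shell takes care of creating and cleaning up the pipe.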

A reader sent me the following interesting example of process substitution.

  # Script fragment taken from the SuSE distribution:

  while read des what mask iface; do
    # Some commands omitted here...
  done < <(route -n)

  # To test it, let's make it do something.
  while read des what mask iface; do
    echo $des $what $mask $iface
  done < <(route -n)

  # Output:
  # Kernel IP routing table
  # Destination Gateway Genmask Flags Metric Ref Use Iface
  # 127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo

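  # As Stephane Chazelas points out, an easier-to-understand equivalent is:
  route -n |
    while read des what mask iface; do # Variables are set from the output of the pipe.
      echo $des $what $mask $iface
    done # This yields the same output as above.
  # However, as Ulrich Gayer points out . . .
  #+ this simplified equivalent runs the while loop in a subshell,
  #+ so the variables disappear when the pipe terminates.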

  # Furthermore, Filip Moritz explains that there is a subtle difference
  #+ between the two examples above, as the following shows.

  (
  route -n | while read x; do ((y++)); done
  echo $y # $y is still unset

  while read x; do ((y++)); done < <(route -n)
  echo $y # $y holds the number of lines output by route -n
  )

  # More generally (translator's note: the original author omitted the "#" comment marker here, presumably by mistake)
  (
  : | x=x
  # appears to start a subshell, like
  : | ( x=x )
  # whereas
  x=x < <(:)
  # does not
  )

  # This is very useful when parsing csv and the like.
  # In effect, that is what the SuSE code fragment above does.
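
The difference Filip Moritz describes can be reproduced without route. This is a sketch that assumes bash with its default settings (in particular, the lastpipe option is off); printf stands in for route -n, and count_pipe and count_procsub are names chosen for the illustration:

```shell
#!/usr/bin/env bash
# Pipeline version: the while loop runs in a subshell, so its increments are lost.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do ((count_pipe++)); done

# Process substitution version: the loop runs in the current shell.
count_procsub=0
while read -r line; do ((count_procsub++)); done < <(printf 'a\nb\nc\n')

echo "pipeline: $count_pipe"                    # prints "pipeline: 0"
echo "process substitution: $count_procsub"     # prints "process substitution: 3"
```

This is why the done < <(command) form is the usual choice when the loop has to leave results behind in variables, for example while accumulating fields parsed from csv-like input.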

Notes

[1] This has the same effect as a named pipe (temporary file), and in fact named pipes are also used in process substitution.