Tuesday, November 4, 2008

Process substitution (in *nix shells)

Sometimes we need to feed the output of several processes (commands) as input to another command. The use case I kept running into was diffing two config files located on two different servers while logged in only to my workstation.

For example, to diff /etc/rc.local from server1 and server2, the typical flow would be:
1) get rc.local from server1 and store it as, say, tmp1
2) get rc.local from server2 and store it as tmp2
3) diff the two and delete both after use.
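
The three steps above can be sketched roughly as follows. This is only an illustration: fetch_config is a hypothetical stand-in for "ssh serverN cat /etc/rc.local" (and the lines it prints are made up), so that the snippet runs without any servers.

```shell
# Hypothetical stand-in for "ssh <server> cat /etc/rc.local":
# prints two made-up lines, one of which differs per server.
fetch_config() {
    printf 'touch /var/lock/subsys/local\n# host: %s\n' "$1"
}

tmp1=$(mktemp)                    # 1) copy from server1
tmp2=$(mktemp)                    # 2) copy from server2
fetch_config server1 > "$tmp1"
fetch_config server2 > "$tmp2"

# 3) compare the copies (diff exits non-zero when they differ,
#    hence the "|| true") and clean up afterwards
diff_out=$(diff "$tmp1" "$tmp2" || true)
rm -f "$tmp1" "$tmp2"
printf '%s\n' "$diff_out"
```

Two temporary files, and a cleanup step that is easy to forget, just to compare two outputs.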

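I wanted a slightly more elegant solution. After some searching, I found out about bash's 'process substitution' feature. Bash lets a command read from, or write to, other processes using the '<(command)' and '>(command)' forms. With '<(command)', the command's output can be read as if it were a file; with '>(command)', whatever is written to the substituted file becomes the command's input, much like a pipe. The '<(...)' form is the more useful of the two.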
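
Here is a small sketch of both forms that can be tried locally in bash. The printf commands and their data are made up for illustration; any command could take their place.

```shell
# <(cmd): the command's output can be read like a file.
# comm -12 prints the lines common to two sorted inputs,
# and no temporary files are needed.
common=$(comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n'))
printf '%s\n' "$common"

# >(cmd): whatever is written to the substituted "file" becomes
# cmd's standard input. tee writes its input to that path, and
# tr reads it and upper-cases it.
upper=$(printf 'hello\n' | tee >(tr 'a-z' 'A-Z') > /dev/null)
printf '%s\n' "$upper"
```

In the second command tee's own standard output is discarded, so the only thing captured is what tr prints.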

The command for the above use case would be:

diff <(ssh server1 cat /etc/rc.local) <(ssh server2 cat /etc/rc.local)

The substitutions can be nested, for example:

sed 's/Nov/November/g' <(sort -r <(grep sshd /var/log/secure|tail) <(grep pppd /var/log/messages|tail))

This (rather useless) command greps for 'sshd' and 'pppd' in two different logs and collects up to twenty rows (the last ten matches from each log), which are given to sort via process substitution. The sorted output is then fed to sed via another substitution. Finally, sed replaces every occurrence of 'Nov' with 'November'.
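
The same nesting can be tried without root access to the logs. In this sketch printf stands in for the "grep ... | tail" parts, and the log lines are made up for illustration.

```shell
# printf replaces "grep ... | tail"; the log lines are invented.
nested=$(sed 's/Nov/November/g' \
    <(sort -r <(printf 'Nov 3 sshd\nNov 1 sshd\n') \
              <(printf 'Nov 2 pppd\n')))
printf '%s\n' "$nested"
```

sort sees two file names and merges their contents in reverse order; sed then sees a single file name holding the sorted result.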

Process substitution doesn't make anything impossible possible; it just makes the code look cleaner and more intuitive. In theory it should perform better too, since no temporary files are written and the commands run concurrently (assuming the same logic is used otherwise, of course).

To see how process substitution works, let us run cat <(sleep 15m)
This keeps cat waiting for input for 15 minutes, which gives us some time to experiment.

Then, on another terminal, we can check where cat is trying to read its input from.

ps -ef|grep cat

Output is -

[root@pranav ~]# ps -ef|grep cat
root 6310 4448 0 12:11 pts/4 00:00:00 cat /proc/self/fd/63

So cat is reading from /proc/self/fd/63. From another shell we can't use that path as-is, because /proc/self always refers to the process doing the lookup; we have to go through cat's pid instead, i.e. /proc/6310/fd/63

Doing a stat /proc/6310/fd/63 reveals interesting details -

[root@pranav ~]# stat /proc/6310/fd/63
File: `/proc/6310/fd/63' -> `pipe:[61741]'
Size: 64 Blocks: 0 IO Block: 1024 symbolic link
Device: 3h/3d Inode: 61746 Links: 1
Access: (0500/lr-x------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2008-11-01 12:21:02.860555104 +0530
Modify: 2008-11-01 12:17:10.310805099 +0530
Change: 2008-11-01 12:17:10.310805099 +0530

The /proc/self/fd/* entries are what a process uses to read the output of the other. Entry 63 is a symlink to a pipe whose write end is held by the subshell running sleep 15m. cat reads from the read end of that pipe, which is visible to us as /proc/6310/fd/63. Thus process substitution is implemented using pipes. Since cat is reading from that pipe, anything we echo into /proc/6310/fd/63 will show up on the terminal where cat is running.
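
The same thing can be checked from within a single bash session, without a second terminal. This is a minimal sketch; the exact path printed depends on the system (for example /dev/fd/63, or /proc/self/fd/63 as in the ps output above).

```shell
# The shell replaces <(...) with a path before the command runs,
# so echo simply prints that path.
fd_path=$(echo <(true))
printf '%s\n' "$fd_path"

# The path refers to a pipe: "test -p" is true for pipes and FIFOs.
if [ -p <(true) ]; then
    kind="pipe"
else
    kind="not a pipe"
fi
printf '%s\n' "$kind"
```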

Although the syntax looks similar, process substitution is very different from the commonly used command substitution, i.e. cat <(ls) is quite different from cat `ls`. In the former, the parenthesized command is replaced by the name of a file (a file descriptor path) from which its output can be read; in the latter, the backquoted command is replaced directly by its own output.
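
The difference is easy to see in a scratch directory. In this sketch the directory, the file name notes.txt and its contents are made up for illustration, and $(ls) is used as the equivalent of `ls`.

```shell
workdir=$(mktemp -d)
printf 'hello from notes.txt\n' > "$workdir/notes.txt"

# Process substitution: cat gets ONE argument, a path to a pipe,
# and prints what ls wrote into it, i.e. the file listing.
listing=$(cd "$workdir" && cat <(ls))

# Command substitution: the output of ls becomes cat's arguments,
# so cat prints the CONTENTS of the listed files.
contents=$(cd "$workdir" && cat $(ls))

printf 'cat <(ls) -> %s\n' "$listing"
printf 'cat $(ls) -> %s\n' "$contents"
rm -rf "$workdir"
```

The first cat prints the name notes.txt; the second prints what is inside notes.txt.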
