UNIX pipes and shell scripting for NMRPipe
From NMR Wiki
NMRPipe is a powerful UNIX command-line tool for processing NMR data.
Shell command
In order to use nmrPipe, one needs to understand how the Unix shell interprets the command line. The command line accepts commands (of course).
So let's take a look at a command:
nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di
The shell (the blank screen program accepting your typing) will split the line at the spaces into words (there are more details in how the shell treats the command line, but we'll skip them for now).
The first word is the name of the program we want to run, nmrPipe. The remaining words are arguments passed to the program.
If you have nmrPipe already installed and your shell "knows" where nmrPipe can be found in the file system, the shell will try to run nmrPipe with the arguments that you provide. That is, if all is right, the nmrPipe program will be loaded into memory and the processor will start running its executable instructions.
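The word splitting described above can be watched directly. The following is a minimal sketch in POSIX sh syntax; `show_words` is a hypothetical helper written for this illustration (it is not part of nmrPipe), and `command -v` is the standard way to ask the shell where it would find a program.

```shell
# show_words is a hypothetical helper: it reports the words it receives,
# exactly as any program would receive them from the shell.
show_words() {
    echo "number of words: $#"
    for word in "$@"; do
        printf '[%s]\n' "$word"
    done
}

# The same command line as above, handed to the helper instead of being run.
show_words nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di

# Ask the shell whether it "knows" where nmrPipe is.
command -v nmrPipe || echo "nmrPipe was not found on the search path"
```

The helper reports eight words: the program name followed by seven arguments. Quoting changes the split: `show_words "a b"` reports a single word.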
Input and output
Type the command line above and see what happens. You should get:
NMRPipe Error: input from a terminal:
Check for missing input argument.
Check for trailing spaces after any "\" line-continuations.
NMRPipe Error opening NMR streams.
NMRPipe Aborting with null header.
NMRPipe Error Status: 1
NMRPipe Function PS
Nothing very useful, but it is good to know why that happens.
Every program in UNIX has three standard data streams "attached" by default: standard input (STDIN), standard output (STDOUT), and standard error (STDERR).
So in our case nmrPipe is expecting data from the standard input, and by default the shell attaches your keyboard to nmrPipe's standard input.
For nmrPipe that is useless, since raw NMR data is never typed by hand. There is another way to feed data into nmrPipe, and that is discussed in the next section.
Standard output is by default your screen (terminal output), as well as standard error (a separate stream usually reserved for reporting errors).
Running a little bit ahead, nmrPipe won't output data to the terminal either, so again there is another way to get data in and out of nmrPipe, which is discussed in the next section.
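A program can find out whether one of its standard streams is attached to a terminal. The sketch below uses the standard `test -t` check in POSIX sh syntax; how nmrPipe itself performs this check is not described in this article, so treat the sketch only as an illustration of the idea behind the "input from a terminal" error.

```shell
# "[ -t 0 ]" is true when file descriptor 0 (standard input) is a terminal.
if [ -t 0 ]; then
    stdin_kind="terminal"
else
    stdin_kind="file or pipe"
fi
echo "standard input is attached to a $stdin_kind"

# Descriptor 1 is standard output; the same check applies to it.
if [ -t 1 ]; then
    stdout_kind="terminal"
else
    stdout_kind="file or pipe"
fi
echo "standard output is attached to a $stdout_kind"
```

Typed directly at the prompt, both checks report a terminal; the answers change once the streams are connected to files or to other programs.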
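The point of this section is to introduce the concept of standard data streams, which are available to all programs running in the Unix environment.
"Cat" is one good program for the demonstration of standard input and standard output. Try typing
cat
Program "cat" will be launched. Now type some more stuff, then hit enter... The line you typed is printed back. Cat is a very simple program. Hit Ctrl-D on an empty line to signal the end of input (or Ctrl-C to interrupt).
When started without arguments, cat reads data from standard input and sends it unchanged to the standard output. (The similarly named "echo" does not read standard input; it only prints its arguments.)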
Input and output 2.0
So we've mentioned that there is another way to get data in and out of Unix programs (e.g. nmrPipe).
There are three ways: (1) using files, (2) using pipes and (3) using a combination of these methods.
Input and output with files
Files are simply files on your disk or some other data storage device.
Here is a command line that will take data from file and put it through nmrPipe:
nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di < test.DAT
(You'll probably get an error message that nmrPipe can't output to the terminal; we'll fix that soon.)
Notice the "<" symbol. When the shell sees that symbol, it interprets the word to the right of it as a file name, and everything to the left as the command to run.
So now the shell takes the file test.DAT and feeds it to nmrPipe.
Now let's fix the last problem:
nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di < test.DAT > output.DAT
This will either just work, or give some message about the content of test.DAT.
As you might guess, the shell now stores the output of
nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di < test.DAT
in the file output.DAT.
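The same redirection symbols work with any program, so they can be tried without NMR data. Here is a minimal sketch in which the standard `tr` utility stands in for nmrPipe; the file names `demo_input.txt` and `demo_output.txt` are hypothetical.

```shell
# Create a small text file to play the role of test.DAT.
printf 'one\ntwo\nthree\n' > demo_input.txt

# "<" attaches the file to standard input, ">" sends standard output to a file.
# tr only ever reads standard input, much like nmrPipe in the examples above.
tr 'a-z' 'A-Z' < demo_input.txt > demo_output.txt

# Look at the result.
cat demo_output.txt
```

By default ">" replaces any existing file with the same name, so it is worth double-checking the output file name before running a command like this on real data.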
Input and output with pipes (and some files)
You won't always want to store everything in files. Sometimes your work produces many intermediate results which are not very useful yet.
In those cases you will want to use pipes. Pipes directly connect input and output streams of programs that are running in the processing chain. Output of the upstream program is connected to the input of the downstream program.
Example:
nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di < test.DAT | nmrPipe -fn EXT -x1 7.0ppm -xn 12.5ppm -sw > output.DAT
For now just notice the symbol "|". That's the pipe. It takes the output of whatever is to the left and passes it as input to whatever is to the right.
Now the data flow is: (1) file test.DAT → (2) command nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di → (3) command nmrPipe -fn EXT -x1 7.0ppm -xn 12.5ppm -sw → (4) file output.DAT
Chains like these can be made very long. This one has one extra step inserted in the middle.
nmrPipe -fn PS -p0 -0.0 -p1 0.0 -di < test.DAT | nmrPipe -fn POLY -auto | nmrPipe -fn EXT -x1 7.0ppm -xn 12.5ppm -sw > output.DAT
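The shape of such a chain (file, three processing stages, file) can be reproduced with standard utilities. In this sketch `sort`, `tr` and `head` are stand-ins for the three nmrPipe stages, and the file names are hypothetical.

```shell
# Stand-in input file so the sketch runs without NMR data.
printf 'cherry\napple\nbanana\n' > demo_fruit.txt

# file -> sort -> tr -> head -> file
# Only the first stage reads a file and only the last one writes a file;
# everything in between travels through pipes.
sort < demo_fruit.txt | tr 'a-z' 'A-Z' | head -n 2 > demo_first_two.txt

cat demo_first_two.txt
```

No intermediate files are created: the sorted list and the upper-cased list exist only inside the pipes.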
The limitation is that each process can have only one standard input and one standard output stream. If your program has to read more than one file, extra file names will have to be given as arguments.
An important thing to understand when using pipes is that every command involved in the "pipeline" is run in a separate process in your system. All processes run concurrently (which is almost synonymous with simultaneously). Each program reads its own input, does its job and sends the result to its output, while having no idea what program is supplying the input and what is reading the output. The shell takes care of connecting programs with other programs and files.
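The concurrency can be observed with a small timing experiment. This is only a sketch: it uses `sleep` as a stand-in for a slow processing stage and assumes that `date +%s` (seconds since the epoch) is available, which is the case on common Linux and BSD systems.

```shell
# Three stages that each take two seconds and pass no data along.
start=$(date +%s)
sleep 2 | sleep 2 | sleep 2
end=$(date +%s)

# If the stages ran one after another this would be about six seconds;
# because they are started together it is close to two.
elapsed=$((end - start))
echo "pipeline finished in about ${elapsed} seconds"
```

In a real nmrPipe chain the downstream stages spend part of their time waiting for data from upstream, but they are all started at the same moment in the same way.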
IO redirection
In summary, the standard input and standard output streams of any Unix process can be redirected from the default keyboard and screen to disk files and to the streams of other programs.
In computing terms this is called input/output (IO) redirection.
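Standard error can be redirected as well, separately from standard output. The sketch below uses POSIX sh syntax (`2>`), which differs from the C shell used for the scripts later in this article; `report` is a hypothetical function that writes one line to each stream, and the file names are hypothetical too.

```shell
# A stand-in program that writes to both output streams.
report() {
    echo "result data"
    echo "warning: something went wrong" >&2
}

# ">" captures standard output, "2>" captures standard error (sh syntax).
report > demo_stdout.txt 2> demo_stderr.txt

echo "captured on standard output:"
cat demo_stdout.txt
echo "captured on standard error:"
cat demo_stderr.txt
```

Keeping the two streams apart is what allows error messages to reach the terminal even when the data itself goes into a file or a pipe.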
Putting it all in scripts
In practice you will only very rarely or even never issue nmrPipe commands directly by hand into the shell.
NMR data processing involves many steps and parameters, so typing would be tedious. Also, you will often find that you need to apply the same type of processing many times (while perhaps changing a few parameters).
Saving nmrPipe processing directives in a more permanent form - files on your disk - makes good sense.
Take a look at this script (content of an actual file):
#!/bin/csh
nmrPipe -in test.fid \
| nmrPipe -fn SP -off 0.5 -end 1.00 -pow 2 -c 1.0 \
| nmrPipe -fn ZF -size 1024 \
| nmrPipe -fn FT \
| nmrPipe -fn PS -p0 0.0 -p1 0.0 -di \
| nmrPipe -fn POLY -auto \
| nmrPipe -fn TP \
| nmrPipe -fn SP -off 0.5 -end 1.00 -pow 2 -c 1.0 \
| nmrPipe -fn ZF -size 1024 \
| nmrPipe -fn FT \
| nmrPipe -fn PS -p0 0.0 -p1 0.0 -di \
| nmrPipe -fn POLY -auto \
| nmrPipe -fn TP \
| nmrPipe -fn NULL -out test.DAT -verb 2 -ov
In principle it is not much different from the examples shown in the previous sections, but there are some differences.
Script interpreter
The first line
#!/bin/csh
has a special meaning. The #! beginning ("hash-bang", also called "shebang") tells the system that this file needs a special interpreter program. Whatever comes after #! (/bin/csh in this case) is a command that launches the interpreter for the remaining content of the script file.
The command must be executable, i.e. /bin/csh must be an actual program located in your file system.
At least check that there is something at the path by typing
which /bin/csh
If you get
which: /bin/csh: No such file or directory
that means that your script won't work.
If everything is right, your command line shell will launch the script interpreter (/bin/csh here) in a separate process, then feed the remaining contents into the launched interpreter process.
That said, using a separate interpreter process is optional - in case your command line shell already has the nmrPipe environment properly set up.
Also, you might have seen Perl programs or programs written in other scripting languages. In those files, the first line might look like:
#!/usr/bin/perl -w
Same idea here, but of course the content of the script body will be quite different.
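The interpreter line can be tried out with a script that does not need nmrPipe at all. This sketch uses /bin/sh as the interpreter (rather than /bin/csh) because it is present on practically every Unix system; the file name `demo_script.sh` is hypothetical.

```shell
# Write a two-line script: the interpreter line and one command.
cat > demo_script.sh <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF

# The file must be marked executable before it can be run by name.
chmod +x demo_script.sh

# Run it; the system starts /bin/sh and hands it the rest of the file.
script_output=$(./demo_script.sh)
echo "$script_output"
```

The same two steps (make the file executable, then run it by its path) apply to an nmrPipe processing script.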
Line wrapping in shell scripts
The second difference from the examples in the previous sections is that a backslash symbol "\" is used on almost every line.
In a C-Shell script, the backslash is used to suppress any special meaning of the character that follows. A backslash as the very last typed character in a line therefore suppresses the meaning of the "invisible" newline character which follows, and this has the effect of continuing the command onto the next line.
Accordingly, the script shown above is treated as if it has only two lines: one with #! and the next one with the long nmrPipe pipeline series.
Of course, not every line of a script has to end with \. The backslash is only used to wrap lines that need to be wrapped.
An important point about line wrapping: if you are using a backslash to wrap a line, nothing should be there after the backslash - including any empty space characters.
Trailing empty space is a frequent cause of broken nmrPipe scripts.
You may have noticed that no > or < symbols are used in the example above. Even though every Unix program has access to its STDIN and STDOUT streams, it does not have to use them. nmrPipe can read and write files using names provided with the -in and -out arguments.
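Because a trailing space is invisible in most editors, it helps to search for it. The following sketch (POSIX sh syntax, hypothetical file names) builds one correctly wrapped script and one with a space after the backslash, then uses `grep` to count lines on which a backslash is followed only by blanks.

```shell
# A correctly wrapped command: the backslash is the last character on the line.
printf '%s\n' 'echo one \' 'two' > demo_good.sh

# The same command with a single space typed after the backslash.
printf '%s\n' 'echo one \ ' 'two' > demo_bad.sh

# Count lines where a backslash is followed by whitespace up to the line end.
# grep exits with a non-zero status when nothing matches, hence "|| true".
bad_lines=$(grep -c '\\[[:space:]]\{1,\}$' demo_bad.sh) || true
good_lines=$(grep -c '\\[[:space:]]\{1,\}$' demo_good.sh) || true
echo "suspicious lines in demo_bad.sh: $bad_lines"
echo "suspicious lines in demo_good.sh: $good_lines"

# The correctly wrapped script runs as a single command.
good_output=$(sh demo_good.sh)
echo "$good_output"
```

Running the same `grep` pattern with `-n` instead of `-c` on a real processing script prints the offending lines together with their line numbers.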