Tips on running and analyzing NMR structure calculations with XPLOR
From NMR Wiki
Finding/installing XPLOR on your system
Chances are you already have XPLOR available on your computer. Type
which xplor
If this returns something like
/opt/software/xplor3851_linux/xplor
then you already have it.
If you instead get nothing at all, or a long and meaningless-looking message, then the xplor binary is not on your system path.
Also notice that the actual binary file might have a different name, e.g. xploron3851_linux_ELF, in which case the "which xplor" command will fail too. If XPLOR really is not installed, obtain a copy of the NIH-XPLOR software and install it on your system.
Once you have XPLOR installed, you need to know the path to the xplor executable binary. The directory part of that path contains a lot of useful material, including sample scripts and various topology and parameter sets.
In the case above the directory is /opt/software/xplor3851_linux/. In this document this string will be referred to as $XPLOR_DIR
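The lookup above can be wrapped in a small shell function that prints the directory holding an executable. This is only a sketch: the function name xplor_dir_of is made up for this page, and it assumes the binary is on your PATH under the name you pass in.

```shell
# Hypothetical helper: print the directory that holds an executable found on PATH.
# Usage: xplor_dir_of NAME
xplor_dir_of() {
    bin=$(command -v "$1") || return 1   # fails if NAME is not on PATH
    dirname "$bin"
}
```

For example, XPLOR_DIR=$(xplor_dir_of xplor) would set the variable if the binary is called xplor; substitute the actual binary name used on your system.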
Analyzing output files
This part relies on use of several unix tools:
- grep (text pattern search)
- sort (sorting utility)
- Unix shell pipes
- builtin shell commands cd, mkdir, cp
- backtick-based command embedding into the command line
- input and output redirection with < and > symbols
- tail (printing ending portion of the file)
- sed (text stream editor)
- cat (tool for sticking files together and printing them to the terminal)
Suppose you have generated a number of pdb files named sa_<number>.pdb and you want to analyse those files, select the best ones, and so on. Some frequently used operations are listed below.
Analysis of violated restraints
Find numbers of violated restraints per file
Run command
grep viol sa_*.pdb
The command above assumes that xplor prints the violations inside each pdb file in a form similar to:
REMARK violations.: 7, 0
This format may be different (and depends on the xplor script that was used to generate the pdb files), so the search pattern used by grep may need to be adjusted.
In the example above, the number 7 is the count of violated NOE distance restraints.
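If you only want the NOE count itself, sed can pull it out of the REMARK line. The sketch below assumes exactly the format shown above ("REMARK violations.: 7, 0"); the function name noe_violations is invented for illustration, and the pattern needs adjusting if your script writes the line differently.

```shell
# Hypothetical helper: print "<file> <NOE violation count>" for each given file.
# Assumes a line of the form "REMARK violations.: 7, 0" in every file.
noe_violations() {
    for f in "$@"; do
        # keep only the first number after "violations.:"
        n=$(sed -n 's/^REMARK violations\.: *\([0-9][0-9]*\),.*/\1/p' "$f" | head -n 1)
        echo "$f $n"
    done
}
```

Calling noe_violations sa_*.pdb would then print one line per structure.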
Sort files by the number of violated NOE restraints
grep viol sa_*.pdb | sort -grk3
The vertical bar | is the symbol for the Unix pipe, which connects the output of grep to the input of sort.
The sort options used here are -g (general numeric sort), -r (reverse output order) and -k3 (sort by column 3, which contains the number of violated NOE restraints in this case).
Select some least violating structures
For example, you might want to extract the 20 least violating structures from the entire set.
grep viol sa_*.pdb | sort -grk3 | tail -n20
By this time the output should look something like:
sa_6.pdb:REMARK violations.: 3, 0
sa_69.pdb:REMARK violations.: 3, 0
sa_47.pdb:REMARK violations.: 3, 0
sa_75.pdb:REMARK violations.: 2, 0
sa_57.pdb:REMARK violations.: 2, 0
... 15 more lines
At this point you may want to extract the list of files from this output by building the command a little further:
grep viol sa_*.pdb | sort -grk3 | tail -n20 | sed 's/pdb.*/pdb/'
sed is a stream editor, and the expression 's/pdb.*/pdb/' tells it to replace the first occurrence of pdb and everything after it with just pdb, which leaves only the file name.
Let's save this list into a file:
grep viol sa_*.pdb | sort -grk3 | tail -n20 | sed 's/pdb.*/pdb/' > list20.txt
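The whole pipeline can also be kept as a reusable function. This is a sketch under the same assumptions as above (the count sits in column 3 of the grep output, and file names end in pdb); the name least_violated is hypothetical, and /dev/null is added only so that grep prints the file name prefix even when a single file is given.

```shell
# Hypothetical helper: print the names of the N files with the fewest NOE violations.
# Usage: least_violated N file...
least_violated() {
    n=$1; shift
    grep viol "$@" /dev/null | sort -grk3 | tail -n "$n" | sed 's/pdb.*/pdb/'
}
```

With it, least_violated 20 sa_*.pdb > list20.txt is equivalent to the command above.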
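Likewise, you can select the lowest energy structures (again 20 in this case). The search pattern and the column given to -k depend on how your script writes the energy REMARK line, so adjust them as needed. Note that this command writes to the same list20.txt and therefore overwrites the list from the previous step: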
grep ener sa_*.pdb | sort -grk3 | tail -n20 | sed 's/pdb.*/pdb/' > list20.txt
Now you have a file called list20.txt which contains just file names, one per line.
Copy these files into a separate directory (after creating it first):
mkdir list20
cp `cat list20.txt` list20
Here backticks are used to first run the command cat list20.txt (try it separately too), so that the list of files itself is put onto the cp command line and the files end up copied into the directory list20.
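An alternative to the backtick form is to read the list line by line, which also copes with file names that contain spaces. The function below is a sketch; copy_listed is a made-up name, and it assumes the list holds one file name per line.

```shell
# Hypothetical helper: copy every file named in a list (one name per line)
# into a directory, creating the directory if needed.
# Usage: copy_listed LIST DESTDIR
copy_listed() {
    mkdir -p "$2" || return 1
    while IFS= read -r f; do
        # skip empty lines, stop at the first failed copy
        if [ -n "$f" ]; then cp -- "$f" "$2"/ || return 1; fi
    done < "$1"
}
```

Here copy_listed list20.txt list20 does the same job as the mkdir and cp commands above.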
Overlay the structures by fitting
Now you have a directory list20 containing the 20 files. It is possible to fit them so that structures overlay well in the 3D structure display software.
Copy the fitting script
First get inside that directory:
cd list20
Copy/paste contents of Xplor_fit_backbone.inp into your own file.
Xplor_fit_backbone.inp was prepared starting from file $XPLOR_DIR/tutorial/nmr/average.inp that is supplied together with XPLOR software.
There are several parts that need editing:
- structure topology definition (all the information about bonds, angles etc - anything that translates to energy terms in the calculation)
- atom selection used for the fitting routine
- list of file names (in two places)
Specify location of the topology file
Topology of proteins and nucleic acids is often defined via PSF files. A PSF file is not the only way to give XPLOR the topology: a native XPLOR script also works, but you would need to prepare it, and that is outside the scope of this tutorial.
The entry in Xplor_fit_backbone.inp looks like this:
structure @g_protein.psf end {*Read the structure file.*}
g_protein.psf is a file name in the current directory, which you of course don't have. Maybe your file is at ../my_protein.psf or similar?
You can specify the path to the file either relative to the current directory or as an absolute path, i.e. with respect to the root of the UNIX file system.
Relative path will look like:
structure @../some_dir/my_protein.psf end
Absolute path may be
structure @/path/to/some_dir/my_protein.psf end
Notice that the path starts at '/' - root of the file system.
Either format will work as long as the psf file can indeed be found at that location, which can be tested by
ls ../some_dir/my_protein.psf
ls /path/to/some_dir/my_protein.psf
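The same check can be run against the script itself, so that the path you test is the one xplor will actually read. This is a rough sketch: check_psf_paths is a hypothetical name, and the pattern assumes lower-case lines of the form "structure @path end" as in the example above, with no spaces inside the path.

```shell
# Hypothetical helper: find paths on "structure @... end" lines of a script
# and report whether each one is readable from the current directory.
# Usage: check_psf_paths SCRIPT
check_psf_paths() {
    status=0
    for p in $(sed -n 's/^ *structure  *@\([^ ]*\).*/\1/p' "$1"); do
        if [ -r "$p" ]; then
            echo "ok: $p"
        else
            echo "missing: $p"
            status=1
        fi
    done
    return $status
}
```

Run it from the directory where you will start xplor, since relative paths are resolved from there.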
Check the overlay atom selection statement
The default atom selection statement (that works for the proteins) is
vector idend ( store9 ) ( name ca or name n or name c )
That should be appropriately adjusted. For example you might want to exclude floppy terminal residues and maybe some loops, or may have to specify custom selection if you have any non-standard residues or parts of the structure.
For example this will select all CA, C, N and O atoms of all residues (notice that atom names are not case sensitive), and all carbon atoms of residue number 1 (in this case residue 1 was part of the loop forming a cycle in the peptide).
vector idend (store9) ( (name ca or name c or name n or name o) or (resid 1 and name c#) )
It's important to notice that Xplor_fit_backbone.inp uses store9 on each atom to mark the selection: all atoms that have that storage area (extra data space allotted to each atom) marked will be selected for the least-squares fitting procedure.
The script will take the first structure and then fit all the remaining ones to it by rotations and translations, so that the RMSD of the selected atom coordinates is at a minimum.
Insert pdb file names into the script
Here another unix command will be handy:
ls sa_*.pdb | sed 's/\(.*\)/"\1"/' >> Xplor_fit_backbone.inp
This will list the files matching the sa_*.pdb wildcard and append the list to the file Xplor_fit_backbone.inp
Notice the double greater-than sign >>. This is important. The double >> instructs the shell to append the list (produced by the ls sa_*.pdb command) to Xplor_fit_backbone.inp. If there were a single greater-than sign, the file Xplor_fit_backbone.inp would be overwritten.
Also notice the command sed 's/\(.*\)/"\1"/'. It instructs sed to capture each input line into the group \1 and write it back surrounded with double quotes; then, as mentioned above, >> appends the result to Xplor_fit_backbone.inp.
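To see what the quoting step does without touching the script, the sed expression can be tried on its own. In this sketch the function name quote_lines and the output file name quoted_names.txt are both made up; writing to a separate file is simply one way to avoid editing an appended listing out of the script later.

```shell
# Wrap every input line in double quotes, using the same sed expression as above.
quote_lines() {
    sed 's/\(.*\)/"\1"/'
}
# Example (output file name is an assumption):
#   ls sa_*.pdb | quote_lines > quoted_names.txt
```

Each input line such as sa_6.pdb comes out as "sa_6.pdb", quotes included.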
Now open the file, paste the list into the two key locations, and then remove the appended listing.
The snippets you need to find look like:
for $1 in ( "file1.pdb" "file2.pdb" ) loop main
Just replace "file1.pdb" and "file2.pdb" (which are there only as examples) with your real file names. You have appended the quoted list of input pdb files to the script: cut that list from the end and paste it over "file1.pdb" and "file2.pdb".
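If you prefer not to cut and paste, the loop line can be generated from the file names and then copied into the script. This is a sketch only: loop_header is a hypothetical name, and the layout of the line is taken from the snippet shown above.

```shell
# Hypothetical helper: print the XPLOR loop line for the given file names.
# Usage: loop_header file...
loop_header() {
    printf 'for $1 in ('                  # single quotes keep $1 literal
    for f in "$@"; do printf ' "%s"' "$f"; done
    printf ' ) loop main\n'
}
```

Running loop_header sa_*.pdb inside the list20 directory prints a line you can paste over each of the two example lines.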
Now you should be ready to run the script this way:
xplor < Xplor_fit_backbone.inp > fit.out&
In the command above, the program xplor is invoked as the first token. The symbols < and > tell the shell to connect xplor's input to the file Xplor_fit_backbone.inp and its output to fit.out. This technique is also called IO (input/output) redirection. If you just type xplor, the program will also run, but will expect you to type the input by hand and read the output from the screen.
An interesting detail is that we've used the ampersand symbol & at the end of the command line. It sends the xplor process into the background so that you can continue using the command line as the calculation proceeds. Here it is not really necessary, as the fitting script is fast, but for larger xplor jobs that may take hours it will be important.
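When a background job is started from a script rather than typed by hand, the shell's $! variable and the wait builtin let you pick up its exit status later. The function below is a minimal sketch with an invented name, run_redirected; it works for any program that reads standard input and writes standard output.

```shell
# Hypothetical helper: run a program in the background with redirected
# input and output, then wait for it and return its exit status.
# Usage: run_redirected PROGRAM INPUT OUTPUT
run_redirected() {
    "$1" < "$2" > "$3" &
    pid=$!                                   # process id of the background job
    echo "started $1 as process $pid" >&2
    wait "$pid"
}
```

For the fitting job this would be run_redirected xplor Xplor_fit_backbone.inp fit.out.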
The file fit.out is a regular text file with the output log printed by xplor. It is useful to inspect this file to locate errors. The easiest way to do that is with the grep utility:
grep ERR fit.out
The script will actually print an error like
%READC-ERR: multiple coordinates for 543 atoms
This error is not a big problem, but the script can probably be fixed to avoid it.
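For a long log it can help to count how often each kind of error occurs instead of reading every line. The sketch below assumes that error tags always look like the one above, a percent sign followed by a name and -ERR; that format is inferred from this single example, and the function name summarize_errors is hypothetical.

```shell
# Hypothetical helper: print each distinct "%NAME-ERR" tag found in a log
# together with the number of lines it occurs on.
# Usage: summarize_errors LOGFILE
summarize_errors() {
    sed -n 's/.*\(%[A-Za-z0-9_]*-ERR\).*/\1/p' "$1" | sort | uniq -c | awk '{print $2, $1}'
}
```

Calling summarize_errors fit.out prints one line per error type, with the tag first and the count second.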