Tips on running and analyzing NMR structure calculations with XPLOR

From NMR Wiki

Finding/installing XPLOR on your system

Chances are you already have XPLOR available on your computer. Type

which xplor

If this returns something like

/opt/software/xplor3851_linux/xplor

then you already have it. Otherwise, obtain a copy of the NIH-XPLOR software and install it on your system.

If instead you get a long, meaningless-looking output, the xplor binary is not on your system path.

Also note that the actual binary file might have a different name, e.g. xploron3851_linux_ELF, in which case the which command above will fail too.

Once you have XPLOR installed, it is important to know the path to the xplor executable. The directory that contains it also holds a lot of useful material, including sample scripts and various topology and parameter sets.

In the case above the directory is /opt/software/xplor3851_linux/. In this document this string will be referred to as $XPLOR_DIR
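
The lookup described above can be sketched as a small shell helper. This is only an illustration: xplor_dir_of is a hypothetical function name, and the sketch assumes the binary is reachable through your PATH under the name you pass in.

```shell
# Print the directory that holds a given executable, or fail if it is not
# on PATH. xplor_dir_of is a hypothetical helper name, not part of XPLOR.
xplor_dir_of() {
    bin_path=$(command -v "$1") || return 1
    dirname "$bin_path"
}

# Try the plain name first; substitute your platform-specific binary name
# if your installation uses a different one.
if XPLOR_DIR=$(xplor_dir_of xplor); then
    echo "XPLOR_DIR=$XPLOR_DIR"
else
    echo "xplor not found on PATH" >&2
fi
```

If the lookup fails, pass the actual binary name to the helper, or set XPLOR_DIR by hand.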

Analyzing output files

This part relies on use of several unix tools:

  • grep (text pattern search)
  • sort (sorting utility)
  • Unix shell pipes
  • basic commands cd, mkdir, cp
  • backtick-based command substitution on the command line
  • input and output redirection with the < and > symbols
  • tail (printing ending portion of the file)
  • sed (text stream editor)
  • cat (tool for sticking files together and printing them to the terminal)

Suppose you have generated a number of pdb files named sa_<number>.pdb and you want to analyse those files, select the best ones, and so on. Some frequently used operations are listed below.

Analysis of violated restraints

Find numbers of violated restraints per file

Run the command

grep viol sa_*.pdb

The command above assumes that the xplor script writes the violations into each pdb file in a form similar to:

REMARK violations.: 7, 0

This format may be different (and depends on the xplor script that was used to generate the pdb files), so the search pattern used by grep may need to be adjusted.

In the example above, the number 7 is the count of violated NOE distance restraints.

Sort files by the number of violated NOE restraints

grep viol sa_*.pdb | sort -grk3

The vertical bar | above is the symbol for the Unix pipe, which connects the output of grep to the input of sort.

The sort options used here are -g (general numeric sort), -r (reverse the output order) and -k3 (sort by column 3, which contains the number of violated NOE restraints in this case).
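
To check how the sort behaves before running it on real data, the pipeline can be tried on a few stand-in files. The sketch below writes three synthetic one-line files (the violation counts are made up) into a scratch directory and sorts them in the same way.

```shell
# Build three tiny stand-in files in a scratch directory (synthetic data).
demo=$(mktemp -d)
printf 'REMARK violations.: 7, 0\n'  > "$demo/sa_1.pdb"
printf 'REMARK violations.: 2, 0\n'  > "$demo/sa_2.pdb"
printf 'REMARK violations.: 12, 1\n' > "$demo/sa_3.pdb"

# Same pipeline as above: most violated first, least violated last.
sorted=$(cd "$demo" && grep viol sa_*.pdb | sort -grk3)
echo "$sorted"
```

Because the order is reversed, the least violated structures end up at the bottom of the listing. Note that 12 sorts above 7 here, which a plain alphabetical sort would not guarantee.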

Select some least violating structures

For example, you might want to extract the 20 least violating structures of the entire set.

grep viol sa_*.pdb | sort -grk3 | tail -n20

At this point the output should look something like:

sa_6.pdb:REMARK violations.: 3, 0
sa_69.pdb:REMARK violations.: 3, 0
sa_47.pdb:REMARK violations.: 3, 0
sa_75.pdb:REMARK violations.: 2, 0
sa_57.pdb:REMARK violations.: 2, 0
... 15 more lines

At this point you may want to extract the list of files from this output by building the command a little further:

grep viol sa_*.pdb | sort -grk3 | tail -n20 | sed 's/pdb.*/pdb/'

sed is a stream editor, and the parameter 's/pdb.*/pdb/' instructs it to replace everything that follows the first substring pdb on each line with nothing, leaving only the file name.
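
To see what the substitution does, it can be tried on a single line of sample output. The sketch below uses a made-up input line in the format shown above. The cut variant is an alternative that assumes the file name itself contains no colon.

```shell
line='sa_6.pdb:REMARK violations.: 3, 0'   # sample line, same format as above

# sed: keep everything up to and including the first "pdb", drop the rest
name_sed=$(printf '%s\n' "$line" | sed 's/pdb.*/pdb/')

# cut: keep the text before the first colon (grep's file-name separator)
name_cut=$(printf '%s\n' "$line" | cut -d: -f1)

echo "$name_sed"
echo "$name_cut"
```

Both commands print sa_6.pdb for this input.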

Let's save this list into a file:

grep viol sa_*.pdb | sort -grk3 | tail -n20 | sed 's/pdb.*/pdb/' > list20.txt

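Likewise, you can select some (again 20 in this case) lowest-energy structures. Note that this command writes to the same list20.txt and so replaces the list made in the previous step; use a different file name if you want to keep both.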

grep ener sa_*.pdb | sort -grk3 | tail -n20 | sed 's/pdb.*/pdb/' > list20.txt

Now you have a file called list20.txt that contains just the file names, one per line.
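
Before using the list it may be worth confirming that it has the expected number of entries and that every listed file exists. A minimal sketch follows. check_list is a hypothetical helper name, and the demo uses synthetic files in a scratch directory.

```shell
# check_list is a hypothetical helper: it reports how many entries a list
# file has and how many of the listed files are missing.
check_list() {
    total=0
    missing=0
    while IFS= read -r name; do
        total=$((total + 1))
        [ -f "$name" ] || missing=$((missing + 1))
    done < "$1"
    echo "$total listed, $missing missing"
}

# Demo on synthetic data: two names listed, only one file present.
demo=$(mktemp -d)
touch "$demo/sa_6.pdb"
printf 'sa_6.pdb\nsa_69.pdb\n' > "$demo/list20.txt"
report=$(cd "$demo" && check_list list20.txt)
echo "$report"
```

On real data you would run check_list list20.txt in the directory that holds the pdb files and expect 20 listed and 0 missing.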

Copy these files into a separate directory (after creating it first):

mkdir list20
cp `cat list20.txt` list20 

Here backticks are used to first run the command cat list20.txt (try it separately too), so that the list of files itself is put onto the cp command line and the files end up copied into the directory list20.
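
An alternative to the backtick form is to read the list one line at a time. The sketch below is only an illustration on synthetic files in a scratch directory. It copies each listed file separately, which also works when a name contains spaces.

```shell
# Synthetic setup: three empty stand-in files, two of them on the list.
demo=$(mktemp -d)
touch "$demo/sa_6.pdb" "$demo/sa_69.pdb" "$demo/sa_47.pdb"
printf 'sa_6.pdb\nsa_69.pdb\n' > "$demo/list20.txt"
mkdir -p "$demo/list20"

# Copy every file named in list20.txt into list20, one line at a time.
(
    cd "$demo" || exit 1
    while IFS= read -r name; do
        cp -- "$name" list20/
    done < list20.txt
)

copied=$(ls "$demo/list20" | wc -l)
echo "copied $copied files"
```

For a short list of simple file names the backtick command above does the same job with less typing.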

Overlay the structures by fitting

Now you have a directory list20 containing the 20 files. The structures can be fitted to each other so that they overlay well in 3D structure display software.

Copy the fitting script

First get inside that directory:

cd list20

Copy and paste the contents of Xplor_fit_backbone.inp into your own file.

Xplor_fit_backbone.inp was prepared starting from the file $XPLOR_DIR/tutorial/nmr/average.inp, which is supplied together with the XPLOR software.

There are several parts that need editing:

  • structure topology definition (all the information about bonds, angles etc - anything that translates to energy terms in the calculation)
  • atom selection used for the fitting routine
  • list of file names (in two places)

Specify location of the topology file

The topology of proteins and nucleic acids is often defined via PSF files. A PSF file is not the only way to enter the topology into XPLOR; it is also possible to use a native XPLOR script, but you would need to prepare it, and that is outside the scope of this tutorial.

The entry in Xplor_fit_backbone.inp looks like this:

structure @g_protein.psf end                     {*Read the structure file.*}

g_protein.psf is a file name in the current directory, which you of course do not have. Maybe your file is at ../my_protein.psf or similar?

You can specify the path to the file either relative to the current directory or as an absolute path, starting from the root of the UNIX file system.

A relative path will look like:

structure @../some_dir/my_protein.psf end

An absolute path may be:

structure @/path/to/some_dir/my_protein.psf end 

Notice that the path starts at '/' - root of the file system.

Either format will work as long as the psf file can indeed be found at that location, which can be tested by

ls ../some_dir/my_protein.psf
ls /path/to/some_dir/my_protein.psf
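
The same check can be done on the path that is actually written in the script. The sketch below is an illustration under stated assumptions: fit.inp and my_protein.psf are synthetic stand-ins created in a scratch directory, and the sed pattern assumes the statement is written in lower case on one line, as in the example above.

```shell
# Synthetic stand-ins for the input script and the PSF file.
demo=$(mktemp -d)
printf 'structure @my_protein.psf end   {*Read the structure file.*}\n' > "$demo/fit.inp"
touch "$demo/my_protein.psf"

# Pull the file name that follows "structure @" out of the script.
psf=$(sed -n 's/^ *structure *@\([^ ]*\) .*/\1/p' "$demo/fit.inp")

# Test the path from the directory where the script would be run.
if (cd "$demo" && [ -r "$psf" ]); then
    psf_status="found $psf"
else
    psf_status="missing $psf"
fi
echo "$psf_status"
```

A relative path is resolved from the directory where xplor is started, so the test is made from that directory.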

Check the overlay atom selection statement

The default atom selection statement (which works for proteins) is

vector idend ( store9 ) ( name ca or name n or name c )

This should be adjusted to suit your molecule. For example, you might want to exclude floppy terminal residues and maybe some loops, or you may have to specify a custom selection if you have any non-standard residues or parts of the structure.

For example, this will select the CA, C, N and O atoms of all residues (notice that atom names are not case sensitive), and all carbon atoms of residue number 1 (in this case residue 1 was part of the loop forming a cycle in the peptide).

vector idend (store9) (
          (name ca or name c or name n or name o)
           or
          (resid 1 and name c#)
)

It's important to notice that Xplor_fit_backbone.inp uses store9 on each atom to mark the selection. That is, all atoms that have that storage area (extra data space allotted to each atom) marked will be selected for the least-squares fitting procedure.

The script will take the first structure and then fit all the remaining ones to it by rotations and translations, so that the RMSD of the selected atom coordinates is at a minimum.

Insert pdb file names into the script

Here another unix command comes in handy:

ls sa_*.pdb | sed 's/\(.*\)/"\1"/' >> Xplor_fit_backbone.inp

This will list the files matching the sa_*.pdb wildcard and append the list to the file Xplor_fit_backbone.inp

Notice the double greater-than sign >>. This is important. The double >> instructs the shell to append the list (produced by the ls sa_*.pdb command) to Xplor_fit_backbone.inp. If there were a single greater-than sign, the file Xplor_fit_backbone.inp would be overwritten.

Also notice the command sed 's/\(.*\)/"\1"/'. It instructs sed to capture each input line into the group referred to as \1 and to write its content to the output surrounded with double quotes. Then, as mentioned above, >> appends the result to Xplor_fit_backbone.inp.
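
The effect of the quoting step can be checked on a couple of sample names before touching the script. This sketch uses two made-up file names. The second form uses &, which in sed stands for the whole matched text and gives the same result with less escaping.

```shell
# Form used above: capture the whole line as \1 and wrap it in quotes
quoted=$(printf 'sa_6.pdb\nsa_69.pdb\n' | sed 's/\(.*\)/"\1"/')

# Equivalent shorter form: & is the entire match
quoted_amp=$(printf 'sa_6.pdb\nsa_69.pdb\n' | sed 's/.*/"&"/')

echo "$quoted"
```

If you would rather not append to the script at all, the same output can be redirected with a single > into a separate file of your choice and pasted from there.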

Now open the file, paste the list into the two key locations, and then remove the appended listing.

The snippets you need to find look like:

for $1 in
(
"file1.pdb"
"file2.pdb"
)
loop main

Just replace "file1.pdb" and "file2.pdb" (which are there as examples) with your real file names: cut the quoted list that you appended to the end of the file and paste it over "file1.pdb" and "file2.pdb". Remember that the list appears in two places.

Now you should be ready to run the script this way:

xplor < Xplor_fit_backbone.inp > fit.out&

In the command above the program xplor is invoked as the first token. The symbols < and > tell the shell to connect xplor's input to the file Xplor_fit_backbone.inp and its output to fit.out. This technique is called I/O (input/output) redirection. If you just type xplor, the program will also run, but it will expect you to type the input by hand and will print the output to the screen.

An interesting detail is the & (ampersand) symbol at the end of the command line. It sends the xplor process into the background so that you can continue using the command line while the calculation proceeds. Here it is not really necessary, as the fitting script is fast, but for larger xplor jobs that may take hours it will be important.
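
The combination of redirection and a background job can be tried without xplor. In the sketch below cat is only a stand-in for xplor (an assumption made so that the example runs anywhere), and job.inp and job.out are synthetic files in a scratch directory.

```shell
demo=$(mktemp -d)
printf 'line one\nline two\n' > "$demo/job.inp"

# "cat" stands in for xplor here; replace it with the real program.
cat < "$demo/job.inp" > "$demo/job.out" &
job_pid=$!            # process ID of the background job

wait "$job_pid"       # block until the background job is done
job_status=$?         # exit status of the job (0 means success)
echo "job $job_pid finished with status $job_status"
```

The wait step is optional at an interactive prompt, but it is useful in a script that must not read the output file before the job has finished.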

The file fit.out is a regular text file with the output log printed by xplor. It is useful to inspect this file to locate errors. The easiest way to do that is with the grep utility:

grep ERR fit.out

The script will actually print an error like

%READC-ERR: multiple coordinates for    543 atoms

This error is not a big problem, but the script can probably be fixed to avoid it.
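
For a quick summary of a long log it can help to count the error lines and print them with their line numbers. The sketch below runs on a synthetic fit.out: only the error line is taken from the example above, and the other lines are placeholders.

```shell
# Synthetic log: placeholder lines around the error message shown above.
demo=$(mktemp -d)
printf '%s\n' \
    'placeholder log line' \
    '%READC-ERR: multiple coordinates for    543 atoms' \
    'another placeholder log line' > "$demo/fit.out"

n_err=$(grep -c ERR "$demo/fit.out")   # number of lines containing ERR
echo "lines with ERR: $n_err"
grep -n ERR "$demo/fit.out"            # the same lines, with line numbers
```

The line numbers make it easier to jump to the surrounding context in a text editor.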
